When Does Structure Matter in Continual Learning? Dimensionality Controls When Modularity Shapes Representational Geometry

Kathrin Korte, Joachim Winter Pedersen, Eleni Nisioti, Sebastian Risi

Original source

arXiv cs.NE

https://arxiv.org/abs/2604.27656v1

Problem
This preprint examines when structural separation in neural networks helps or hurts the stability-plasticity trade-off in continual learning (CL). Prior work has explored the balance between retaining old knowledge and acquiring new information, but the conditions under which modular architectures improve or degrade performance remain unclear. The authors study the interplay between network architecture, task similarity, and representational dimensionality to pin down when structural modularity is beneficial in CL.

Method
The study employs a task-partitioned modular recurrent neural network (RNN) and compares it against a single-module baseline. The authors systematically manipulate task similarity (low, medium, high) and the scale of weight initialization to induce different learning regimes. The effective dimensionality of learned representations is characterized empirically, allowing for an analysis of how these factors interact. The modular architecture is designed to facilitate task-specific subspaces, while the single-module baseline serves as a control to assess the impact of structural separation on learning dynamics.
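The setup described above can be sketched minimally: a task-partitioned RNN with one recurrent module per task, routed by task identity, against a single-module baseline of matching total hidden size, with a gain parameter scaling the recurrent weight initialization. All names and sizes here are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def init_rnn(n_hidden, n_in, g):
    """Random RNN weights; `g` scales the recurrent init (the knob the
    paper varies to move between learning regimes)."""
    return {
        "W_rec": g * rng.standard_normal((n_hidden, n_hidden)) / np.sqrt(n_hidden),
        "W_in": rng.standard_normal((n_hidden, n_in)) / np.sqrt(n_in),
    }

def run_rnn(params, inputs):
    """Roll the RNN over `inputs` of shape (T, n_in); return hidden
    states of shape (T, n_hidden)."""
    h = np.zeros(params["W_rec"].shape[0])
    states = []
    for x in inputs:
        h = np.tanh(params["W_rec"] @ h + params["W_in"] @ x)
        states.append(h)
    return np.array(states)

class ModularRNN:
    """One recurrent module per task; the task id selects the module,
    giving each task its own weight subspace."""
    def __init__(self, n_tasks, n_hidden_per_module, n_in, g):
        self.modules = [init_rnn(n_hidden_per_module, n_in, g)
                        for _ in range(n_tasks)]

    def forward(self, task_id, inputs):
        return run_rnn(self.modules[task_id], inputs)

# Single-module baseline with the same total hidden size as all modules.
n_tasks, n_per, n_in, g = 3, 32, 4, 1.0
modular = ModularRNN(n_tasks, n_per, n_in, g)
baseline = init_rnn(n_tasks * n_per, n_in, g)

x = rng.standard_normal((20, n_in))
h_mod = modular.forward(task_id=0, inputs=x)   # shape (20, 32)
h_base = run_rnn(baseline, x)                  # shape (20, 96)
```

Matching the total parameter budget between the two conditions, as the baseline above does for hidden units, is what lets any performance gap be attributed to structural separation rather than capacity.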

Results
The findings reveal that in high-dimensional regimes, where representations are less constrained, the architecture has minimal impact on performance. However, in lower-dimensional (rich) regimes, the modular network demonstrates significant advantages: it achieves graded alignment of task-specific subspaces for similar tasks, partial orthogonalization for moderately dissimilar tasks, and stronger separation for dissimilar tasks. In contrast, the single-module baseline fails to exhibit this graded geometry, leading to increased interference. The results suggest that representational dimensionality is a critical factor in determining the effectiveness of structural separation, with implications for the design of continual learning systems.
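Two quantities from this analysis can be computed directly from hidden-state matrices: effective dimensionality via the participation ratio, and pairwise subspace alignment via principal angles. The paper does not publish its exact metrics, so treat these as one common formulation, not the authors' code.

```python
import numpy as np

def participation_ratio(states):
    """Effective dimensionality of a (T, n) state matrix:
    PR = (sum_i lam_i)^2 / sum_i lam_i^2 over covariance eigenvalues.
    Near n when variance is spread evenly; near 1 when one direction dominates."""
    X = states - states.mean(axis=0)
    lam = np.linalg.eigvalsh(np.cov(X, rowvar=False))
    lam = np.clip(lam, 0.0, None)
    return lam.sum() ** 2 / (lam ** 2).sum()

def subspace_alignment(A, B, k=3):
    """Mean squared cosine of the principal angles between the top-k PCA
    subspaces of two state matrices: 1 = fully aligned, 0 = orthogonal."""
    def topk_basis(X, k):
        X = X - X.mean(axis=0)
        _, _, Vt = np.linalg.svd(X, full_matrices=False)
        return Vt[:k].T
    Qa, Qb = topk_basis(A, k), topk_basis(B, k)
    s = np.linalg.svd(Qa.T @ Qb, compute_uv=False)  # cosines of principal angles
    return float(np.mean(s ** 2))

rng = np.random.default_rng(1)
H = rng.standard_normal((200, 50))
pr = participation_ratio(H)        # large (tens) for near-isotropic states;
                                   # finite sampling biases it below n
align = subspace_alignment(H, H)   # identical data -> alignment of 1.0
```

Under this formulation, the "graded geometry" result corresponds to `subspace_alignment` between tasks decreasing smoothly as task similarity decreases in the modular network, while the baseline shows no such ordering.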

Limitations
The authors acknowledge that their findings are contingent on the specific architectures and tasks employed in the study. They do not explore the effects of varying other hyperparameters, such as learning rates or regularization techniques, which could influence the outcomes. Additionally, the study is limited to a specific class of tasks and may not generalize to all continual learning scenarios. The reliance on empirical characterization of dimensionality may also introduce variability based on the chosen metrics.

Why it matters
This work has significant implications for the design of continual learning systems, particularly in how modular architectures can be leveraged to mitigate interference when learning sequential tasks. By identifying representational dimensionality as a key organizing variable, the study provides a framework for future research to explore adaptive geometry in neural networks. This understanding can guide the development of more effective CL systems that balance the need for stability and plasticity, ultimately enhancing the performance of AI systems in dynamic environments.


Published
Apr 30, 2026 — 09:50 UTC