Abstract
Continual learning (CL) equips deep models to learn a sequence of tasks under resource constraints while retaining previously acquired knowledge. This is especially valuable in multilingual and low-resource NLP, where new data and tasks arrive over time and retraining from scratch is impractical.
The tutorial introduces major CL methodologies—regularization, replay, and architecture-based approaches—and shows how they support NLP-specific scenarios such as task-incremental, language-incremental, and joint task–language incremental learning. We discuss datasets and evaluation protocols for measuring forgetting and transfer in these settings.
A central part of the tutorial is devoted to continual learning for LLMs. We examine how CL ideas apply to large-scale pretraining, continual finetuning and instruction tuning, and evolving alignment objectives, as well as how they relate to practices like model merging and retrieval-augmented generation. The goal is to bridge classical CL and current LLM practice, and to highlight open challenges and opportunities.
Key Themes
- Foundations of continual learning and catastrophic forgetting in deep models.
- Core CL families: regularization-based, replay-based, and architecture/parameter-isolation methods.
- NLP continual learning setups: task-incremental, language-incremental, and joint task–language incremental learning.
- Datasets and metrics for evaluating forgetting, transfer, and stability–plasticity trade-offs in NLP.
- Continual pretraining for LLMs on evolving corpora (web, code, scientific text) and mixture control.
- Continual instruction tuning and finetuning: prompts, adapters, and interference-aware merging.
- Continual alignment as preferences and safety constraints evolve (e.g., RLHF-style updates over time).
- Connections between CL and LLM practices such as model merging and retrieval-augmented generation.
- Opportunities for CL in LLM agents and multilingual, low-resource deployment scenarios.
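To make the replay-based family mentioned above concrete, here is a minimal sketch of a fixed-size replay buffer based on reservoir sampling, a common choice in CL because every example seen so far is retained with equal probability. The class name and toy integer stream are illustrative, not from any particular library:

```python
import random

class ReservoirReplayBuffer:
    """Fixed-size replay buffer via reservoir sampling: after n examples,
    each one has probability capacity/n of being in the buffer."""

    def __init__(self, capacity, seed=0):
        self.capacity = capacity
        self.buffer = []
        self.n_seen = 0
        self.rng = random.Random(seed)

    def add(self, example):
        self.n_seen += 1
        if len(self.buffer) < self.capacity:
            self.buffer.append(example)      # buffer not yet full: always keep
        else:
            j = self.rng.randrange(self.n_seen)
            if j < self.capacity:            # replace a random slot with prob. capacity/n
                self.buffer[j] = example

    def sample(self, k):
        """Draw a rehearsal minibatch of up to k stored examples."""
        return self.rng.sample(self.buffer, min(k, len(self.buffer)))

# Toy usage: stream 1000 examples through a 100-slot buffer
buf = ReservoirReplayBuffer(capacity=100)
for x in range(1000):
    buf.add(x)
print(len(buf.buffer))  # 100
```

During training, rehearsal batches drawn with `sample` are mixed into each new task's minibatches to counteract forgetting.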
Tutorial Outline
Half-day format (~3.5 hours including a 30-minute break), organized into three main blocks.
1. Continual Learning Basics (~45 min)
Motivation, catastrophic forgetting, and core CL scenarios (task-, domain-, and class-incremental). Overview of regularization, replay, and architecture-based methods, including elastic weight consolidation (EWC), distillation-based functional regularization, hypernetworks, parameter isolation, dynamic expansion, and replay buffers.
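As a concrete illustration of the regularization family, EWC adds a quadratic penalty that anchors parameters important for earlier tasks, weighted by a diagonal Fisher information estimate. This is a minimal NumPy sketch of the penalty term only; the parameter vectors and Fisher values are toy placeholders, not a full training loop:

```python
import numpy as np

def ewc_penalty(theta, theta_star, fisher_diag, lam=1.0):
    """EWC regularizer: (lam / 2) * sum_i F_i * (theta_i - theta*_i)^2.

    theta       -- current parameters (flat array)
    theta_star  -- parameters saved after the previous task
    fisher_diag -- diagonal Fisher estimate, one importance value per parameter
    lam         -- strength of the penalty
    """
    return 0.5 * lam * np.sum(fisher_diag * (theta - theta_star) ** 2)

# Toy example: the parameter with high Fisher value is penalized far more
theta      = np.array([1.0, 1.0])
theta_star = np.array([0.0, 0.0])
fisher     = np.array([10.0, 0.1])   # first parameter mattered for task 1
print(ewc_penalty(theta, theta_star, fisher, lam=1.0))  # 5.05
```

In practice this penalty is added to the new task's loss, so drift is discouraged mainly along directions the Fisher estimate marks as important.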
2. Continual Learning in NLP (~45 min)
CL for NLP tasks and multilingual setups: task-incremental and language-incremental learning, and joint task–language incremental learning. Discussion of replay-, regularization-, and adapter-based methods, as well as datasets and metrics tailored to NLP.
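The evaluation metrics mentioned above are typically computed from a task-by-task accuracy matrix, where row i holds accuracies on all tasks after training on task i. This sketch uses the common definitions of average accuracy and backward transfer (BWT); the accuracy matrix is made up for illustration:

```python
import numpy as np

def avg_accuracy(A):
    """Mean accuracy over all tasks after training on the final task (last row)."""
    return A[-1].mean()

def backward_transfer(A):
    """BWT = mean over tasks i < T of A[T-1, i] - A[i, i].
    Negative values indicate forgetting; positive values indicate backward transfer."""
    T = A.shape[0]
    return np.mean([A[T - 1, i] - A[i, i] for i in range(T - 1)])

# A[i, j] = accuracy on task j after training on tasks 0..i (toy numbers;
# zeros below the diagonal mean "task not yet seen")
A = np.array([
    [0.90, 0.00, 0.00],
    [0.80, 0.85, 0.00],
    [0.70, 0.75, 0.88],
])
print(avg_accuracy(A))       # ≈ 0.777
print(backward_transfer(A))  # -0.15
```

The same matrix supports forward-transfer and per-task forgetting measures, which is why most NLP CL benchmarks report it directly.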
3. Continual Learning in LLMs (~90 min)
CL at the pretraining, finetuning, and alignment stages of large language models: continual pretraining on evolving corpora (e.g., temporal and domain-shifted data), continual instruction tuning (progressive prompts, adapter-based updates, model merging), and continual preference/alignment methods. Connections to retrieval-augmented generation and CL for LLM agents.
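As one example of the model-merging connection, the two simplest schemes, parameter averaging and task-vector arithmetic, can be sketched over flat parameter vectors. Real merging methods add interference-aware weighting and per-layer handling, which this toy version deliberately omits:

```python
import numpy as np

def merge_average(param_sets, weights=None):
    """Uniform or weighted average of per-task parameter vectors."""
    stacked = np.stack(param_sets)
    if weights is None:
        weights = np.full(len(param_sets), 1.0 / len(param_sets))
    return np.average(stacked, axis=0, weights=weights)

def task_vector_merge(base, finetuned_sets, alpha=1.0):
    """Task arithmetic: add scaled task vectors (finetuned - base) to the base model."""
    merged = base.copy()
    for ft in finetuned_sets:
        merged += alpha * (ft - base)
    return merged

# Hypothetical 2-parameter "models": base plus two task-specific finetunes
base = np.array([0.0, 0.0])
ft_a = np.array([1.0, 0.0])
ft_b = np.array([0.0, 2.0])
print(merge_average([ft_a, ft_b]))            # [0.5 1. ]
print(task_vector_merge(base, [ft_a, ft_b]))  # [1. 2.]
```

Task arithmetic keeps both tasks' updates when they touch disjoint parameters, as here; when task vectors overlap and conflict, interference-aware merging methods down-weight or resolve the clashing components.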
Tutorial Organizers
Selected References
- French, R. M. (1993). Catastrophic forgetting in connectionist networks: Can it be predicted? Proc. Cognitive Science Society.
- Liu, B. (2017). Lifelong machine learning: A paradigm for continuous learning. Frontiers of Computer Science.
- Kirkpatrick, J. et al. (2017). Overcoming catastrophic forgetting in neural networks. PNAS.
- Wang, L., Zhang, X., Su, H., & Zhu, J. (2024). A comprehensive survey of continual learning: Theory, method and application. IEEE TPAMI.
- Biesialska, M., Biesialska, K., & Costa-jussà, M. R. (2020). Continual lifelong learning in natural language processing: A survey. COLING.
- De Lange, M. et al. (2021). A continual learning survey: Defying forgetting in classification tasks. IEEE TPAMI.
- Satapara, S. & Srijith, P. K. (2024). TL-CL: Task and language incremental continual learning. EMNLP.
- Wu, T., Luo, L., Li, Y., Pan, S., Vu, T., & Haffari, G. (2024). Continual learning for large language models: A survey. arXiv:2402.01364.
- Lewis, P. et al. (2020). Retrieval-augmented generation for knowledge-intensive NLP tasks. NeurIPS.