Abstract
Continual learning (CL) equips deep models to learn a sequence of tasks under resource constraints while retaining previously acquired knowledge. This is especially valuable in multilingual and low-resource NLP, where new data and tasks arrive over time and retraining from scratch is impractical.
The tutorial introduces major CL methodologies—regularization, replay, and architecture-based approaches—and shows how they support NLP-specific scenarios such as task-incremental, language-incremental, and joint task–language incremental learning. We discuss datasets and evaluation protocols for measuring forgetting and transfer in these settings.
A central part of the tutorial is devoted to continual learning for LLMs. We examine how CL ideas apply to large-scale pretraining, continual finetuning and instruction tuning, and evolving alignment objectives, as well as how they relate to practices like model merging and retrieval-augmented generation. The goal is to bridge classical CL and current LLM practice, and to highlight open challenges and opportunities.
Key Themes
- Foundations of continual learning and catastrophic forgetting in deep models.
- Core CL families: regularization-based, replay-based, and architecture/parameter-isolation methods.
- NLP continual learning setups: task-incremental, language-incremental, and joint task–language incremental learning.
- Datasets and metrics for evaluating forgetting, transfer, and stability–plasticity trade-offs in NLP.
- Continual pretraining for LLMs on evolving corpora (web, code, scientific text) and mixture control.
- Continual instruction tuning and finetuning: prompts, adapters, and interference-aware merging.
- Continual alignment as preferences and safety constraints evolve (e.g., RLHF-style updates over time).
- Connections between CL and LLM practices such as model merging and retrieval-augmented generation.
- Opportunities for CL in LLM agents and multilingual, low-resource deployment scenarios.
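To make the replay-based family mentioned above concrete, here is a minimal sketch of a fixed-size replay buffer based on reservoir sampling, a common choice in CL because every example seen so far is retained with equal probability. The class name and toy integer stream are illustrative, not from any particular library:

```python
import random

class ReservoirReplayBuffer:
    """Fixed-size replay buffer via reservoir sampling: after n examples,
    each one has probability capacity/n of being in the buffer."""

    def __init__(self, capacity, seed=0):
        self.capacity = capacity
        self.buffer = []
        self.n_seen = 0
        self.rng = random.Random(seed)

    def add(self, example):
        self.n_seen += 1
        if len(self.buffer) < self.capacity:
            self.buffer.append(example)      # buffer not yet full: always keep
        else:
            j = self.rng.randrange(self.n_seen)
            if j < self.capacity:            # replace a random slot with prob. capacity/n
                self.buffer[j] = example

    def sample(self, k):
        """Draw a rehearsal minibatch of up to k stored examples."""
        return self.rng.sample(self.buffer, min(k, len(self.buffer)))

# Toy usage: stream 1000 examples through a 100-slot buffer
buf = ReservoirReplayBuffer(capacity=100)
for x in range(1000):
    buf.add(x)
print(len(buf.buffer))  # 100
```

During training, rehearsal batches drawn with `sample` are mixed into each new task's minibatches to counteract forgetting.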
Tutorial Outline
Half-day format (~3.5 hours including a 30-minute break), organized into three main blocks.
1. Continual Learning Basics (~45 min)
Motivation, catastrophic forgetting, and core CL scenarios (task-, domain-, and class-incremental). Overview of regularization, replay, and architecture-based methods, including elastic weight consolidation (EWC), distillation-based functional regularization, hypernetworks, parameter isolation, dynamic expansion, and replay buffers.
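As a concrete illustration of the regularization family, EWC adds a quadratic penalty that anchors parameters important for earlier tasks, weighted by a diagonal Fisher information estimate. This is a minimal NumPy sketch of the penalty term only; the parameter vectors and Fisher values are toy placeholders, not a full training loop:

```python
import numpy as np

def ewc_penalty(theta, theta_star, fisher_diag, lam=1.0):
    """EWC regularizer: (lam / 2) * sum_i F_i * (theta_i - theta*_i)^2.

    theta       -- current parameters (flat array)
    theta_star  -- parameters saved after the previous task
    fisher_diag -- diagonal Fisher estimate, one importance value per parameter
    lam         -- strength of the penalty
    """
    return 0.5 * lam * np.sum(fisher_diag * (theta - theta_star) ** 2)

# Toy example: the parameter with high Fisher value is penalized far more
theta      = np.array([1.0, 1.0])
theta_star = np.array([0.0, 0.0])
fisher     = np.array([10.0, 0.1])   # first parameter mattered for task 1
print(ewc_penalty(theta, theta_star, fisher, lam=1.0))  # 5.05
```

In practice this penalty is added to the new task's loss, so drift is discouraged mainly along directions the Fisher estimate marks as important.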
2. Continual Learning in NLP (~45 min)
CL for NLP tasks and multilingual setups: task-incremental and language-incremental learning, and joint task–language incremental learning. Discussion of replay-, regularization-, and adapter-based methods, as well as datasets and metrics tailored to NLP.
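The evaluation metrics mentioned above are typically computed from a task-by-task accuracy matrix, where row i holds accuracies on all tasks after training on task i. This sketch uses the common definitions of average accuracy and backward transfer (BWT); the accuracy matrix is made up for illustration:

```python
import numpy as np

def avg_accuracy(A):
    """Mean accuracy over all tasks after training on the final task (last row)."""
    return A[-1].mean()

def backward_transfer(A):
    """BWT = mean over tasks i < T of A[T-1, i] - A[i, i].
    Negative values indicate forgetting; positive values indicate backward transfer."""
    T = A.shape[0]
    return np.mean([A[T - 1, i] - A[i, i] for i in range(T - 1)])

# A[i, j] = accuracy on task j after training on tasks 0..i (toy numbers;
# zeros below the diagonal mean "task not yet seen")
A = np.array([
    [0.90, 0.00, 0.00],
    [0.80, 0.85, 0.00],
    [0.70, 0.75, 0.88],
])
print(avg_accuracy(A))       # ≈ 0.777
print(backward_transfer(A))  # -0.15
```

The same matrix supports forward-transfer and per-task forgetting measures, which is why most NLP CL benchmarks report it directly.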
3. Continual Learning in LLMs (~90 min)
CL at the pretraining, finetuning, and alignment stages of large language models: continual pretraining on evolving corpora (e.g., temporal and domain-shifted data), continual instruction tuning (progressive prompts, adapter-based updates, model merging), and continual preference/alignment methods. Connections to retrieval-augmented generation and CL for LLM agents.
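As one example of the model-merging connection, the two simplest schemes, parameter averaging and task-vector arithmetic, can be sketched over flat parameter vectors. Real merging methods add interference-aware weighting and per-layer handling, which this toy version deliberately omits:

```python
import numpy as np

def merge_average(param_sets, weights=None):
    """Uniform or weighted average of per-task parameter vectors."""
    stacked = np.stack(param_sets)
    if weights is None:
        weights = np.full(len(param_sets), 1.0 / len(param_sets))
    return np.average(stacked, axis=0, weights=weights)

def task_vector_merge(base, finetuned_sets, alpha=1.0):
    """Task arithmetic: add scaled task vectors (finetuned - base) to the base model."""
    merged = base.copy()
    for ft in finetuned_sets:
        merged += alpha * (ft - base)
    return merged

# Hypothetical 2-parameter "models": base plus two task-specific finetunes
base = np.array([0.0, 0.0])
ft_a = np.array([1.0, 0.0])
ft_b = np.array([0.0, 2.0])
print(merge_average([ft_a, ft_b]))            # [0.5 1. ]
print(task_vector_merge(base, [ft_a, ft_b]))  # [1. 2.]
```

Task arithmetic keeps both tasks' updates when they touch disjoint parameters, as here; when task vectors overlap and conflict, interference-aware merging methods down-weight or resolve the clashing components.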
Tutorial Organizers
Selected References
- French, R. M. (1993). Catastrophic forgetting in connectionist networks: Can it be predicted? Proc. Cognitive Science Society.
- Liu, B. (2017). Lifelong machine learning: A paradigm for continuous learning. Frontiers of Computer Science.
- Kirkpatrick, J. et al. (2017). Overcoming catastrophic forgetting in neural networks. PNAS.
- Wang, L., Zhang, X., Su, H., & Zhu, J. (2024). A comprehensive survey of continual learning: Theory, method and application. IEEE TPAMI.
- Biesialska, M., Biesialska, K., & Costa-jussà, M. R. (2020). Continual lifelong learning in natural language processing: A survey. COLING.
- De Lange, M. et al. (2021). A continual learning survey: Defying forgetting in classification tasks. IEEE TPAMI.
- Satapara, S. & Srijith, P. K. (2024). TL-CL: Task and language incremental continual learning. EMNLP.
- Wu, T., Luo, L., Li, Y., Pan, S., Vu, T., & Haffari, G. (2024). Continual learning for large language models: A survey. arXiv:2402.01364.
- Lewis, P. et al. (2020). Retrieval-augmented generation for knowledge-intensive NLP tasks. NeurIPS.