Continual Learning in Large Language Models: Foundations to Frontiers

A half-day tutorial on continual learning methods for NLP and large language models, covering core CL techniques, NLP-specific setups across tasks and languages, and emerging strategies for continual pretraining, instruction tuning, and alignment in LLMs.

Continual Learning · NLP · LLMs · Multilingual & Low-resource Settings
Date & Time: 23 Dec 2025, 14:00 - 17:30 IST
Overview

Abstract

Continual learning (CL) equips deep models to learn a sequence of tasks under resource constraints while retaining previously acquired knowledge. This is especially valuable in multilingual and low-resource NLP, where new data and tasks arrive over time and retraining from scratch is impractical.

The tutorial introduces major CL methodologies—regularization, replay, and architecture-based approaches—and shows how they support NLP-specific scenarios such as task-incremental, language-incremental, and joint task–language incremental learning. We discuss datasets and evaluation protocols for measuring forgetting and transfer in these settings.
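
As a concrete illustration of the regularization family mentioned above, the sketch below shows an EWC-style quadratic penalty in PyTorch. It is a minimal sketch under assumed names (model, fisher, old_params, lam) and an externally computed diagonal Fisher estimate, not code from the tutorial itself.

    import torch

    def ewc_penalty(model, old_params, fisher, lam=100.0):
        # Quadratic penalty pulling parameters back toward their values after the
        # previous task, weighted by a diagonal Fisher importance estimate per parameter.
        penalty = torch.zeros((), device=next(model.parameters()).device)
        for name, p in model.named_parameters():
            if name in fisher:
                penalty = penalty + (fisher[name] * (p - old_params[name]) ** 2).sum()
        return 0.5 * lam * penalty

    # After finishing task t, snapshot weights for use while training on task t+1:
    #   old_params = {n: p.detach().clone() for n, p in model.named_parameters()}
    # Training on the new task then minimizes: task_loss + ewc_penalty(model, old_params, fisher)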

A central part of the tutorial is devoted to continual learning for LLMs. We examine how CL ideas apply to large-scale pretraining, continual finetuning and instruction tuning, and evolving alignment objectives, as well as how they relate to practices like model merging and retrieval-augmented generation. The goal is to bridge classical CL and current LLM practice, and to highlight open challenges and opportunities.

What You Will Learn

Key Themes

  • Foundations of continual learning and catastrophic forgetting in deep models.
  • Core CL families: regularization-based, replay-based, and architecture/parameter-isolation methods.
  • NLP continual learning setups: task-incremental, language-incremental, and joint task–language incremental learning.
  • Datasets and metrics for evaluating forgetting, transfer, and stability–plasticity trade-offs in NLP (see the metric sketch after this list).
  • Continual pretraining for LLMs on evolving corpora (web, code, scientific text) and mixture control.
  • Continual instruction tuning and finetuning: prompts, adapters, and interference-aware merging.
  • Continual alignment as preferences and safety constraints evolve (e.g., RLHF-style updates over time).
  • Connections between CL and LLM practices such as model merging and retrieval-augmented generation.
  • Opportunities for CL in LLM agents and multilingual, low-resource deployment scenarios.
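
To make the evaluation theme concrete, here is a minimal NumPy sketch of standard CL metrics (average accuracy, backward transfer, forgetting) computed from a task-accuracy matrix; the matrix values below are hypothetical, not results from any benchmark.

    import numpy as np

    # R[i, j] = accuracy on task j after training on task i (hypothetical values).
    R = np.array([
        [0.90, 0.10, 0.05],   # after task 1
        [0.75, 0.88, 0.12],   # after task 2
        [0.70, 0.80, 0.85],   # after task 3
    ])
    T = R.shape[0]

    avg_acc = R[-1].mean()                                      # accuracy after the final task
    bwt = np.mean([R[-1, j] - R[j, j] for j in range(T - 1)])   # backward transfer (< 0 means forgetting)
    forgetting = np.mean([R[:-1, j].max() - R[-1, j] for j in range(T - 1)])

    print(f"ACC={avg_acc:.3f}  BWT={bwt:.3f}  Forgetting={forgetting:.3f}")
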
Structure

Tutorial Outline

Half-day format (~3.5 hours including a 30-minute break), organized into three main blocks.

  • 1. Continual Learning Basics (~45 min)

    Motivation, catastrophic forgetting, and core CL scenarios (task-, domain-, and class-incremental). Overview of regularization, replay, and architecture-based methods, including EWC, distillation-based functional regularization, hypernetworks, parameter isolation, dynamic expansion, and replay buffers.

  • 2. Continual Learning in NLP (~45 min)

    CL for NLP tasks and multilingual setups: task-incremental and language-incremental learning, and joint task–language incremental learning. Discussion of replay-, regularization-, and adapter-based methods, as well as datasets and metrics tailored to NLP.

  • 3. Continual Learning in LLMs (~90 min)

    CL at the pretraining, finetuning, and alignment stages of large language models: continual pretraining on evolving corpora (e.g., temporal and domain-shifted data), continual instruction tuning (progressive prompts, adapter-based updates, model merging), and continual preference/alignment methods. Connections to retrieval-augmented generation and CL for LLM agents. A minimal merging sketch follows this outline.
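
As a rough illustration of the merging-based updates covered in block 3, the sketch below applies task-arithmetic-style merging of finetuned checkpoints onto a base model's state_dict. All names and the scaling factor alpha are illustrative assumptions, and this is one simple merging recipe rather than the method taught in the tutorial.

    import torch

    def merge_task_vectors(base_sd, finetuned_sds, alpha=0.3):
        # Task vector = finetuned weights minus base weights; the merged model adds a
        # scaled sum of task vectors back onto the base weights.
        merged = {k: v.clone() for k, v in base_sd.items()}
        for sd in finetuned_sds:
            for k in merged:
                if merged[k].is_floating_point():   # skip integer buffers (e.g., step counters)
                    merged[k] = merged[k] + alpha * (sd[k] - base_sd[k])
        return merged

    # Usage with hypothetical checkpoints finetuned on different instruction sets:
    #   merged_sd = merge_task_vectors(base.state_dict(),
    #                                  [ft_qa.state_dict(), ft_summarization.state_dict()])
    #   model.load_state_dict(merged_sd)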

Organizers

Tutorial Organizers

P. K. Srijith
Associate Professor, Dept. of CSE & AI, IIT Hyderabad, India
Shrey Satapara
Researcher-II, AI Lab, Fujitsu Research of India
Sarath Chandar
Canada CIFAR AI Chair & Associate Professor, Polytechnique Montréal · Core Faculty, Mila
Reading

Selected References

  1. French, R. M. (1993). Catastrophic forgetting in connectionist networks: Can it be predicted? Proc. Cognitive Science Society.
  2. Liu, B. (2017). Lifelong machine learning: A paradigm for continuous learning. Frontiers of Computer Science.
  3. Kirkpatrick, J. et al. (2017). Overcoming catastrophic forgetting in neural networks. PNAS.
  4. Wang, L., Zhang, X., Su, H., & Zhu, J. (2024). A comprehensive survey of continual learning: Theory, method and application. IEEE TPAMI.
  5. Biesialska, M., Biesialska, K., & Costa-jussà, M. R. (2020). Continual lifelong learning in natural language processing: A survey. COLING.
  6. De Lange, M. et al. (2021). A continual learning survey: Defying forgetting in classification tasks. IEEE TPAMI.
  7. Satapara, S. & Srijith, P. K. (2024). TL-CL: Task and language incremental continual learning. EMNLP.
  8. Wu, T., Luo, L., Li, Y., Pan, S., Vu, T., & Haffari, G. (2024). Continual learning for large language models: A survey. arXiv:2402.01364.
  9. Lewis, P. et al. (2020). Retrieval-augmented generation for knowledge-intensive NLP tasks. NeurIPS.