Research Daily: Top AI papers of the day


ArXiv Paper Title:
Boosting LLM Translation Skills without General Ability Loss via Rationale Distillation

October 21, 2024

Keywords:
Large Language Models, Machine Translation, Rationale Distillation, Catastrophic Forgetting

Read the paper on arXiv

Figure: Graph showing translation performance and general-ability preservation using RaDis

Boosting LLM Translation Without Losing Smarts

Introduction: The Problem of Catastrophic Forgetting

Large Language Models (LLMs) excel at a remarkable range of tasks. But when it comes to machine translation (MT), traditional fine-tuning often causes a frustrating problem: catastrophic forgetting. The model gets better at translating, but loses some of its general knowledge and its ability to perform other tasks. This is a big deal for safety and instruction-following capabilities in particular: they are often developed with proprietary data, so once they are forgotten they cannot simply be restored by replaying the original training data.

This research paper introduces a clever solution called RaDis (Rationale Distillation) that tackles this head-on. Instead of just training the LLM on translations, RaDis uses the LLM's own ability to generate rationales – basically explanations – for its translations. These rationales act as a form of "replay data," helping the LLM retain its general knowledge while learning new translation skills.

RaDis: A Novel Approach to LLM Fine-tuning

The core of RaDis is simple yet effective:

  1. LLM Rationale Generation: The researchers observed that instruction-tuned LLMs often generate detailed rationales when asked to translate. These rationales contain general knowledge and safety principles.

  2. Enriched Training Data: RaDis uses the LLM to generate these rationales for the training data's reference translations. These rationales are then combined with the reference translations to create an enriched dataset.

  3. Combined Loss Function: The model is trained on this enriched dataset with a loss that combines a standard translation loss with a self-distillation loss on the rationale tokens. This self-distillation encourages the model to retain the knowledge encapsulated in the rationales (a minimal sketch of steps 2 and 3 follows this list).
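To make the recipe concrete, here is a minimal sketch of steps 2 and 3. Everything in it is illustrative rather than the paper's actual implementation: the llm_generate stand-in, the prompt wording, the build_enriched_example and radis_loss helpers, the equal weighting of the two loss terms, and the choice of plain cross-entropy on the self-generated rationale tokens as the self-distillation term are all assumptions.

```python
import torch
import torch.nn.functional as F

# --- Step 2: build an enriched training example --------------------------
# llm_generate is a stand-in for whatever decoding call you use (for example
# Hugging Face model.generate plus decoding); it is an assumption, not part
# of the paper.
def build_enriched_example(llm_generate, tokenizer, source, reference):
    # Ask the instruction-tuned model to explain the reference translation.
    prompt = (
        "Translate the following sentence and explain your translation.\n"
        f"Source: {source}\n"
        f"Translation: {reference}\n"
        "Explanation:"
    )
    rationale = llm_generate(prompt)  # self-generated rationale text

    # The training target is the reference translation followed by the rationale.
    target_text = reference + "\n" + rationale
    target_ids = tokenizer(target_text, return_tensors="pt").input_ids[0]

    # Boundary handling is simplified here; real code would track exactly
    # which token positions belong to the reference translation.
    n_translation = tokenizer(reference, return_tensors="pt").input_ids.shape[1]
    is_translation = torch.zeros_like(target_ids, dtype=torch.bool)
    is_translation[:n_translation] = True
    return target_ids, is_translation


# --- Step 3: combined loss over translation and rationale tokens ---------
def radis_loss(logits, target_ids, is_translation, rationale_weight=1.0):
    # logits: (seq_len, vocab) outputs of the model being fine-tuned on this
    # enriched sequence. Shift so position t predicts token t+1 (causal LM).
    logits = logits[:-1]
    labels = target_ids[1:]
    mask = is_translation[1:]

    per_token = F.cross_entropy(logits, labels, reduction="none")
    translation_loss = per_token[mask].mean()   # supervision from the reference
    rationale_loss = per_token[~mask].mean()    # self-distillation on own rationale
    return translation_loss + rationale_weight * rationale_loss
```

In this reading, the rationale tokens the model itself generated act as replay data: the loss on them pushes the fine-tuned model to keep assigning high probability to its own pre-fine-tuning outputs, while the translation tokens carry the new MT supervision. The paper may compute or weight the self-distillation term differently; the sketch only illustrates splitting the loss across the two parts of the enriched sequence.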

Experiment Results: Success Without Forgetting

The researchers tested RaDis on two popular LLMs: LLaMA-2-7B-Chat and Mistral-7B-Instruct-v0.2. In both cases, RaDis improved translation quality while preserving performance on general-ability benchmarks, in line with the paper's central claim.
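The post does not reproduce the benchmark numbers, but the translation half of such an evaluation is typically scored with standard MT metrics on held-out test sets, while general ability is checked separately on instruction-following and knowledge benchmarks. A small hypothetical sketch of the translation-quality side using sacrebleu (the translate helper and the test-set variables are assumptions, not the paper's evaluation code):

```python
import sacrebleu

def evaluate_translation(translate, sources, references):
    # translate is a stand-in for the fine-tuned model's decoding call.
    hypotheses = [translate(src) for src in sources]
    # corpus_bleu expects a list of hypothesis strings and a list of
    # reference streams (here a single reference per sentence).
    bleu = sacrebleu.corpus_bleu(hypotheses, [references])
    return bleu.score
```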

RaDis: Strengths, Limitations, and Future Directions

Strengths:

Limitations:

Future Directions:

Conclusion: A Promising Step Forward

RaDis presents a promising new approach to fine-tuning LLMs for translation. By cleverly leveraging the LLMs' own ability to generate rationales, RaDis offers a path towards creating more versatile and robust LLMs that excel in specialized tasks without sacrificing their overall intelligence and safety. The research opens exciting avenues for future work in mitigating catastrophic forgetting and building more capable and reliable LLMs.