arxiv:2602.05879

EuroLLM-22B: Technical Report

Published on Feb 5

AI-generated summary

EuroLLM-22B is a multilingual language model designed to support European languages with strong performance across reasoning, instruction following, and translation tasks.

Abstract

This report presents EuroLLM-22B, a large language model trained from scratch to support the needs of European citizens by covering all 24 official European Union languages and 11 additional languages. EuroLLM addresses the issue of European languages being underrepresented and underserved in existing open large language models. We provide a comprehensive overview of EuroLLM-22B's development, including tokenizer design, architectural specifications, data filtering, and training procedures. Across a broad set of multilingual benchmarks, EuroLLM-22B demonstrates strong performance in reasoning, instruction following, and translation, achieving results competitive with models of comparable size. To support future research, we release our base and instruction-tuned models, our multilingual web pretraining data and updated EuroBlocks instruction datasets, as well as our pre-training and evaluation codebases.
