Synthetic baselines trained for our paper "Scaling Low-Resource MT via Synthetic Data Generation with LLMs", accepted to the main conference of EMNLP 2025.
AI & ML interests
At the University of Helsinki, we focus on:
- NLP for morphologically-rich languages
- Cross-lingual NLP
- NLP in the humanities
Organization Card
Helsinki-NLP is the language technology research group at the University of Helsinki. Here, we publish various resources related to multilingual NLP, machine translation, and text simplification, to name a few application areas. We focus on wide language coverage, open data sets, and public pre-trained models.
Models (1,536)
Helsinki-NLP/opus-mt-eo-caenes • Translation • 76.9M • 1
Helsinki-NLP/opus-mt-caenes-eo • Translation • 76.9M • 2
Helsinki-NLP/opus-mt-fr-en • Translation • 75.2M • 800k • 50
Helsinki-NLP/opus-mt-synthetic-en-eu • 68 • 1
Helsinki-NLP/opus-mt-synthetic-en-mk • 110
Helsinki-NLP/opus-mt-synthetic-en-ka • 167
Helsinki-NLP/opus-mt-synthetic-en-so • 121 • 1
Helsinki-NLP/opus-mt-synthetic-en-is • 120 • 1
Helsinki-NLP/opus-mt-synthetic-en-uk • 143
Helsinki-NLP/opus-mt-synthetic-en-gd • 129
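The translation models listed above share a visible naming pattern, `Helsinki-NLP/opus-mt-{source}-{target}`. As a minimal sketch of how one of them might be loaded, the snippet below builds a repository ID from that pattern and runs it through the Marian classes of the `transformers` library. The helper names `opus_mt_model_id` and `translate` are hypothetical, and the sketch assumes that `transformers`, `torch`, and `sentencepiece` are installed and that the chosen language pair actually exists on the Hub. It does not cover the `opus-mt-synthetic-*` models, whose loading procedure is not stated here.

```python
from typing import List


def opus_mt_model_id(src: str, tgt: str) -> str:
    """Build a repository ID following the naming pattern seen in the listing.

    Hypothetical helper: it only formats a string and does not check
    that the repository exists.
    """
    return f"Helsinki-NLP/opus-mt-{src}-{tgt}"


def translate(texts: List[str], src: str, tgt: str) -> List[str]:
    """Translate a batch of sentences with an OPUS-MT model (downloads weights)."""
    # Imported lazily so opus_mt_model_id works without transformers installed.
    from transformers import MarianMTModel, MarianTokenizer

    model_id = opus_mt_model_id(src, tgt)
    tokenizer = MarianTokenizer.from_pretrained(model_id)
    model = MarianMTModel.from_pretrained(model_id)

    # Tokenize the batch, padding to the longest sentence.
    batch = tokenizer(texts, return_tensors="pt", padding=True)
    generated = model.generate(**batch)
    return tokenizer.batch_decode(generated, skip_special_tokens=True)
```

For example, `translate(["Bonjour le monde."], "fr", "en")` would fetch `Helsinki-NLP/opus-mt-fr-en` on first use and return a list with one English string. The multi-language targets in the listing (such as `caenes`) may need a target-language token in the input, so check the individual model card before relying on this sketch for them.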
Datasets (51)
Helsinki-NLP/nemotron-cc-translated • Viewer • 5.79B • 9.93k • 2
Helsinki-NLP/fineweb-edu-translated • Preview • 206k • 4
Helsinki-NLP/OpenSubtitles2024 • Viewer • 570M • 108 • 2
Helsinki-NLP/shroom • Preview • 2
Helsinki-NLP/mu-shroom • Viewer • 11.5k • 169 • 4
Helsinki-NLP/tatoeba_mt_train • Viewer • 13.7B • 176 • 5
Helsinki-NLP/tatoeba_mt • 2.26k • 61
Helsinki-NLP/un_pc • Viewer • 323M • 1.6k • 26
Helsinki-NLP/un_ga • Viewer • 1.11M • 717 • 3
Helsinki-NLP/opus_books • Viewer • 1.25M • 16.1k • 86