The Copenhagen NLP Symposium highlights NLP researchers from Denmark and other Nordic countries. The symposium features keynotes from renowned international researchers, a poster session for attendees to present their work, and dedicated time and space for networking among academics, industry practitioners, and students. The symposium is a free full-day event, taking place in the heart of Copenhagen.
Loubna ben Allal is a Research Engineer at Hugging Face. Loubna leads efforts on training small language models (SmolLM & SmolLM2) and building pre-training datasets like Cosmopedia and FineWeb-Edu.
Title: The Rise of Smol Models
Abstract: On-device language models are revolutionizing AI by making advanced models accessible in resource-constrained environments. In this talk, we will explore the rise of small models and how they are reshaping the AI landscape, moving beyond the era of scaling to ever-larger models. We will also cover SmolLM, a series of compact yet powerful LLMs, focusing on data curation, and ways to leverage these models for on-device applications.
Marzieh Fadaee is a staff research scientist at Cohere Labs (formerly Cohere For AI) whose work centers on multilingual language models, data-efficient learning, and robust evaluation methods.
Title: Evaluating Language Models: A Mirror, a Microscope, and a Map
Abstract: Evaluation plays a central, but often underestimated, role in how large language models are developed and understood. This talk critically examines current evaluation practices, highlighting how they shape perceptions of model progress while often overlooking key challenges in robustness, multilingual performance, and real-world reliability. By reflecting on these gaps, I make the case for rethinking evaluation as a guiding force, and not just a final checkpoint, in building more capable, inclusive, and trustworthy LLMs.
Najoung Kim is an Assistant Professor in the Department of Linguistics and an affiliate faculty member in the Department of Computer Science at Boston University.
Title: What does it take to convince ourselves that a system is exhibiting compositionality?
Abstract: Compositionality is often stated to be a desirable property for AI systems. But how do we evaluate this claim, and what evidence do we need to convince ourselves that a system is exhibiting this property? In this talk, I will start from the not-so-controversial bottom line that there can be no meaningful claims of compositionality without nontrivial commitments about the compositional machinery, building up towards the main claim that what we really want from an AI system is the availability of a process-compositional route. Then, I will discuss the role of behavioral and mechanistic evidence in convincing ourselves that such a route exists, featuring work on contextual inferences from adjective + noun compositions (with Hayley Ross and Kate Davidson) and on using Tensor Product Operations as a means to investigate symbol manipulation in neural networks (with Aditya Yedetore).
Kyle Lo is a research scientist at the Allen Institute for AI (Ai2), where he co-leads the OLMo project on open language modeling research. His current work focuses on data-driven approaches to model behavior and efficient language model experimentation. His research on language model development and adaptation, evaluation methods, and human-AI interaction has won awards at ACL, NAACL, EMNLP, EACL and CHI. Kyle’s work on language models for scientific research assistance, including fact checking, summarization, and augmented reading, has been featured in Nature, Science, TechCrunch and other publications. Kyle holds a degree in Statistics from the University of Washington. Outside of work, he enjoys board games, boba tea, D&D, and spending time with his cat Belphegor.
Title: The OLMo Cookbook: Open Recipes for Language Model Data Curation
Abstract: Information about pretraining corpora used to train the current best-performing language models is seldom discussed: commercial models rarely detail their data, and even open models are often released without accompanying training data or recipes to reproduce them. As a result, it can be challenging to conduct and advance scientific research on language modeling, such as understanding how training data impacts model capabilities, risks and limitations. In this talk, I'll present how we approach data curation research for OLMo, our project to develop and share fully open language models. Reflecting on our journey from OLMo 1 to our latest release of OLMo 2, I'll explore how data curation practices have matured across our work and the broader open data research ecosystem. Finally, I'll examine key challenges and opportunities for open data amid a rapidly changing language model landscape.
Yohei Oseki is an Associate Professor at the University of Tokyo. Yohei investigates human linguistic intelligence and builds machines that process and learn natural language like humans, by comparing human language processing, measured experimentally in the cognitive and brain sciences, with machine language processing, implemented computationally in natural language processing (NLP) and artificial intelligence.
Title: Small Language Models through Human-Like Learning Strategies
Abstract: Large language models (LLMs) have achieved remarkable success, thanks to the rapid development of NLP and AI, and outperform humans at various downstream tasks. However, despite their super-human performance, these LLMs are often noted to be inefficient in terms of training data, model parameters, and computational resources. In this talk, I propose human-like learning strategies to efficiently train small language models (SLMs), building on insights from human language acquisition. Specifically, SLMs are trained on 100 million words of child-directed speech (CDS) through learning strategies such as curriculum learning, batch learning, indirect evidence, variation set, and working memory. The results suggest that inductive biases, inherent in both training data and language models, play an important role in efficiently training SLMs, with scientific implications for human language acquisition, as well as engineering applications to edge AI and low-resource languages.
08:30 - 09:00 | Welcome |
09:00 - 09:15 | Opening Remarks |
09:30 - 10:10 | Keynote Talk 1 |
10:10 - 10:30 | Break |
10:45 - 11:25 | Keynote Talk 2 |
11:30 - 12:30 | Poster Session |
12:30 - 13:30 | Lunch |
13:30 - 14:10 | Keynote Talk 3 |
14:20 - 15:00 | Keynote Talk 4 |
15:10 - 15:50 | Keynote Talk 5 |
15:50 - 16:00 | Closing Remarks |
16:00 - 17:00 | Reception |
We invite researchers to present posters on any topic related to Natural Language Processing. Posters may showcase recently published work (e.g., at conferences or in journals) or ongoing research. The registration form contains a dedicated section to express an interest in presenting a poster. Please include the poster title, an abstract, and—if applicable—details about the publication venue. We are interested in receiving posters on any of the following topics:
Poster Submission Deadline | May 30, 2025 |
Registration Notifications | June 13, 2025 |
Symposium Date | June 20, 2025 |
The closest station is Nørreport Station (metro and S-train).
You can buy tickets and travel cards at any station, or use the Rejsekort app for convenient access to public transport.
Read an extensive list of restaurants and bars here
Visit the Copenhagen Neighborhood Guide here
Check what to see and do in Copenhagen here
Russa Biswas
Aalborg University Copenhagen
Ernests Lavrinovics
Aalborg University Copenhagen
Ingo Ziegler
University of Copenhagen
Danae Sanchez Villegas
University of Copenhagen
Arzu Burcu Güven
IT University of Copenhagen
Andreas Geert Motzfeldt
IT University of Copenhagen
Please send all inquiries to email or contact any of the organisers via their email addresses.