Copenhagen NLP Symposium 2025

The Copenhagen NLP Symposium is centered around highlighting NLP researchers from Denmark and other Nordic countries. The symposium features keynotes from renowned international researchers, a poster session for attendees to present their work, and dedicated time and space for networking across academia, industry, and students. The symposium is a free full-day event, taking place in the heart of Copenhagen.

Keynote Speakers

Loubna ben Allal is a Research Engineer at Hugging Face. Loubna leads efforts on training small language models (SmolLM and SmolLM2) and on building pre-training datasets such as Cosmopedia and FineWeb-Edu.
Title: The Rise of Smol Models
Abstract: On-device language models are revolutionizing AI by making advanced models accessible in resource-constrained environments. In this talk, we will explore the rise of small models and how they are reshaping the AI landscape, moving beyond the era of scaling to ever-larger models. We will also cover SmolLM, a series of compact yet powerful LLMs, with a focus on data curation, as well as ways to leverage these models for on-device applications.

Marzieh Fadaee is a staff research scientist at Cohere Labs (formerly Cohere For AI) whose work centers on multilingual language models, data-efficient learning, and robust evaluation methods.
Title: Evaluating Language Models: A Mirror, a Microscope, and a Map
Abstract: Evaluation plays a central, but often underestimated, role in how large language models are developed and understood. This talk critically examines current evaluation practices, highlighting how they shape perceptions of model progress while often overlooking key challenges in robustness, multilingual performance, and real-world reliability. By reflecting on these gaps, I make the case for rethinking evaluation as a guiding force, and not just a final checkpoint, in building more capable, inclusive, and trustworthy LLMs.

Najoung Kim is an Assistant Professor in the Department of Linguistics and an affiliate faculty member of the Department of Computer Science at Boston University.
Title: What does it take to convince ourselves that a system is exhibiting compositionality?
Abstract: Compositionality is often stated to be a desirable property for AI systems. But how do we evaluate this claim, and what evidence do we need to convince ourselves that a system is exhibiting this property? In this talk, I will start from the not-so-controversial bottom line that there can be no meaningful claims of compositionality without nontrivial commitments about the compositional machinery, and build up towards the main claim that what we really want from an AI system is the availability of a process-compositional route. Then, I will discuss the role of behavioral and mechanistic evidence in convincing ourselves that such a route exists, featuring work on contextual inferences from adjective + noun compositions (with Hayley Ross and Kate Davidson) and on using Tensor Product Operations as a means to investigate symbol manipulation in neural networks (with Aditya Yedetore).

Kyle Lo is a research scientist at the Allen Institute for AI (Ai2), where he co-leads the OLMo project on open language modeling research. His current work focuses on data-driven approaches to model behavior and efficient language model experimentation. His research on language model development and adaptation, evaluation methods, and human-AI interaction has won awards at ACL, NAACL, EMNLP, EACL, and CHI. Kyle's work on language models for scientific research assistance, including fact checking, summarization, and augmented reading, has been featured in Nature, Science, TechCrunch, and other publications. Kyle holds a degree in Statistics from the University of Washington. Outside of work, he enjoys board games, boba tea, D&D, and spending time with his cat Belphegor.
Title: The OLMo Cookbook: Open Recipes for Language Model Data Curation
Abstract: Information about pretraining corpora used to train the current best-performing language models is seldom discussed: commercial models rarely detail their data, and even open models are often released without accompanying training data or recipes to reproduce them. As a result, it can be challenging to conduct and advance scientific research on language modeling, such as understanding how training data impacts model capabilities, risks and limitations. In this talk, I'll present how we approach data curation research for OLMo, our project to develop and share fully open language models. Reflecting on our journey from OLMo 1 to our latest release of OLMo 2, I'll explore how data curation practices have matured across our work and the broader open data research ecosystem. Finally, I'll examine key challenges and opportunities for open data amid a rapidly changing language model landscape.

Yohei Oseki is an Associate Professor at the University of Tokyo. Yohei both investigates human linguistic intelligence and builds machines that process and learn natural language like humans, by comparing human language processing, as measured experimentally in the cognitive and brain sciences, with machine language processing, as implemented computationally in natural language processing (NLP) and artificial intelligence.
Title: Small Language Models through Human-Like Learning Strategies
Abstract: Large language models (LLMs) have achieved remarkable success, thanks to the rapid development of NLP and AI, and outperform humans at various downstream tasks. However, despite their super-human performance, these LLMs have been criticized as inefficient in terms of training data, model parameters, and computational resources. In this talk, I propose human-like learning strategies to efficiently train small language models (SLMs), building on insights from human language acquisition. Specifically, SLMs are trained on 100 million words of child-directed speech (CDS) through learning strategies such as curriculum learning, batch learning, indirect evidence, variation sets, and working memory. The results suggest that inductive biases, inherent in both the training data and the language models, play an important role in efficiently training SLMs, with scientific implications for human language acquisition as well as engineering applications to edge AI and low-resource languages.

Program

08:30 - 09:00	Welcome
09:15 - 09:30	Opening Remarks
09:30 - 10:10	Speaker: Marzieh Fadaee
10:10 - 10:40	Break
10:45 - 11:25	Speaker: Najoung Kim
11:30 - 12:30	Poster Session
12:30 - 13:30	Lunch
13:30 - 14:10	Speaker: Yohei Oseki
14:20 - 15:00	Speaker: Loubna ben Allal
15:10 - 15:50	Speaker: Kyle Lo
15:50 - 16:00	Closing Remarks
16:00 - 17:00	Reception

Call for Posters

We invite researchers to present posters on any topic related to Natural Language Processing. Posters may showcase recently published work (e.g., at conferences or in journals) or ongoing research. The registration form contains a dedicated section for expressing interest in presenting a poster. Please include the poster title, an abstract, and, if applicable, details about the publication venue. We are particularly interested in posters on the following topics:

Nordic Language Processing
Safety and Alignment in LLMs
AI/LLM Agents
Human-AI Interaction/Cooperation
Retrieval-Augmented Language Models
Mathematical, Symbolic, and Logical Reasoning in NLP
Computational Social Science, Cultural Analytics, and NLP for Social Good
Code Models
Interpretability, Model Editing, Transparency, and Explainability
LLM Efficiency
Generalizability and Transfer
Dialogue and Interactive Systems
Discourse, Pragmatics, and Reasoning
Low-resource Methods for NLP
Ethics, Bias, and Fairness
Natural Language Generation
Information Extraction and Retrieval
Linguistic Theories, Cognitive Modeling, and Psycholinguistics
Machine Translation
Multilinguality and Language Diversity
Multimodality and Language Grounding to Vision, Robotics and Beyond
Neurosymbolic Approaches to NLP
Phonology, Morphology and Word Segmentation
Question Answering
Resources and Evaluation
Semantics: Lexical, Sentence-level Semantics, Textual Inference and Other areas
Sentiment Analysis, Stylistic Analysis, and Argument Mining
Speech Processing and Spoken Language Understanding
Summarization
Hierarchical Structure Prediction, Syntax, and Parsing
NLP Applications

Important Dates

Poster Submission Deadline	May 30, 2025
Registration Notifications	June 13, 2025
Workshop Date	June 20, 2025

Attend

How to get to Arbejdermuseet

The closest station is Nørreport Station, served by both the Metro and S-trains.
You can buy tickets and travel cards at any station, or use the Rejsekort app for convenient access to public transport.

Useful Links

Read an extensive list of restaurants and bars here
Visit the Copenhagen Neighborhood Guide here
Check what to see and do in Copenhagen here

Dining Near the Venue

Slurp Ramen Joint
Cuisine: Japanese Ramen
Address: Nansensgade 90, 1366 Copenhagen
Notes: Handmade noodles and rich broth; expect possible queues
Hanoi Alley
Cuisine: Vietnamese
Address: Nørrebrogade 62A, 2200 Copenhagen
Notes: Excellent Vietnamese food at a decent price
Torvehallerne KBH
Type: Food Market
Address: Frederiksborggade 21, 1360 København K
Notes: Indoor market with a wide range of food stalls (open daily)
Flindt & Ørsted
Type: Café/Bar
Address: Nørre Farimagsgade 6, 1364 Copenhagen (inside Ørstedsparken)
Notes: Cozy café and bar with outdoor seating
Sporvejen
Cuisine: Burgers
Address: Gråbrødretorv 17, 1154 København K
Notes: Retro-themed diner with great-value burgers
Poulette
Cuisine: Fried Chicken & Tofu Sandwiches
Address: Møllegade 1, 2200 København N
Notes: Nashville-style spicy chicken sandwiches
Diamond Slice
Cuisine: New York-Style Pizza
Address: Blågårdsgade 27, 2200 København N
Notes: Creative pizza toppings, casual vibe
Ramen to Bíiru (Nørrebro)
Cuisine: Ramen & Craft Beer
Address: Griffenfeldsgade 28, 2200 København N
Notes: Japanese ramen with Danish craft beer
Gasoline Grill
Cuisine: Burgers
Address: Landgreven 10, 1301 København K
Notes: Famous for juicy, organic burgers served from a former gas station
Bangkok Cantine
Cuisine: Thai
Address: Nørre Allé 13, 2200 København
Notes: Family-run Thai restaurant. Pretty small, so it may not accommodate large groups