We invite submissions to the First Workshop on Evaluating LLMs for Specialized Domains (Eval4SD), co-located with KONVENS 2026 in Hamburg, Germany (September 14–17, 2026).
The workshop focuses on the evaluation of large language models (LLMs) in specialized domains such as (but not limited to) law, medicine, science, finance, digital humanities, social sciences, education, and politics. Within this space, we have identified three core areas, detailed below: LLM Benchmarking, Domain Research Replication, and Metrics and Evaluation Methodology. Work that fits the general theme but none of the focus areas is also welcome!
Topics of Interest
- LLM Benchmarking: We invite contributions that evaluate multiple models, datasets, inference methods, or prompting techniques on existing data, or that introduce novel, specialized benchmarking datasets. Papers in this direction may seek to answer questions like: "Which model should I use for my social science project?", "Are open-weight models inferior for specialized tasks?", or "Given a limited budget, what is my best choice of LLM for my digital humanities question?" We especially encourage submissions that evaluate performance in low- and medium-resource languages.
- Domain Research Replication: Does information automatically extracted with a different model or a slightly altered approach still support the same domain conclusions? We invite submissions that attempt to replicate existing domain research using a modified LLM setup. Testing open-weight models is especially important to us in light of replicability. We are excited to see how robust domain research is to changes in the automation setup, from prompting to model weights and training data.
- Metrics and Evaluation Methodology: We invite submissions on methodology for assessing LLM outputs in complex tasks. This includes work on LLM-as-judge setups and novel rule-based metrics for specialized tasks.
Submission Types
We accept submissions in two categories:
- Long Papers (up to 8 pages + references): Complete research contributions with novel findings, experimental results, and thorough analysis. Suitable for mature work on LLM evaluation methodology or new benchmark proposals.
- Short & Position Papers (up to 4 pages + references): Preliminary results, position papers, system descriptions, and focused contributions. Great for provocative arguments or narrowly scoped empirical studies.
Submissions must follow the ACL template; reviewing is double-blind and conducted via OpenReview.
Additionally, we welcome non-archival submissions to present recently published work or to seek feedback on work in progress without violating dual-submission policies. Accepted non-archival papers will be presented at the workshop but will not be included in the official proceedings.
Important Dates
| Event | Date |
|---|---|
| Submissions open | May 04, 2026 |
| Submission deadline | July 03, 2026 |
| Notification of acceptance | July 31, 2026 |
| Camera-ready deadline | August 15, 2026 |
| Workshop date | September 2026 (during KONVENS, September 14–17) |
All deadlines are 11:59 PM CEST.
Submission Link
Submissions are accepted via OpenReview.
All submissions undergo double-blind peer review. Archival submissions must not be under review elsewhere.