Profile

Currently co-founder at Kapitel Software GmbH.
I completed my PhD in December 2022 and successfully defended it in February 2023. The thesis is available via the DOI listed under Thesis below.
(And if you like, you can call me Dr. rer. nat. David Lassner!)

Contact

firstname at kapitel-software.de
@scholar.social/@davidlassner
twitter.com/davidlassner
github.com/millawell
orcid.org/0000-0001-9013-0834

Research

Thesis

David Lassner. Analysis of textual variants with robust machine learning methods: Towards novel insights for the digital humanities. 2023.
Referees: Klaus-Robert Müller, Anne Baillot, Christiane Fellbaum.
doi.org/10.14279/depositonce-17717

Prizes

Winner of the 2021 Rahtz Prize for TEI Ingenuity.
It was awarded for the Standoff Converter, an open-source Python package for bidirectional conversion between TEI and standoff annotation. It makes it easy to apply NLP models to existing TEI documents and add the results back to the TEI document.
github.com/standoff-nlp/standoffconverter

Journal Articles

Domain-Specific Word Embeddings with Structure Prediction
David Lassner*, Stephanie Brandl*, Anne Baillot and Shinichi Nakajima
Transactions of the Association for Computational Linguistics 2023
(*equal contribution)

From Graphs to Word Embeddings. Mathematical and visualization tools for literary studies.
Anne Baillot, David Lassner.
Germanica, 71 | 2022, 191-203.

Publishing an OCR ground truth data set for reuse in an unclear copyright setting.
David Lassner, Julius Coburger, Clemens Neudecker, Anne Baillot.
Zeitschrift für digitale Geisteswissenschaften and Melusina Press 2021 (special volume 5).
zfdg.de/sb005_006

Automatic Identification of Types of Alterations in Historical Manuscripts.
David Lassner, Anne Baillot, Sergej Dogadov, Klaus-Robert Müller, Shinichi Nakajima.
Digital Humanities Quarterly 15.2 (2021).

Book Chapters

Bridging the Gap Between Digital Humanities and Natural Language Processing: A Pedagogical Imperative for Humanistic NLP.
Toma Tasovac, Nick Budak, Natalia Ermolaev, Andrew Janco, David Lassner
In: Multilingual Digital Humanities, ed. by Lorella Viola and Paul Spence.
Routledge 2024, forthcoming.

Conference Talks

Humanistic NLP: Bridging the Gap Between Digital Humanities and Natural Language Processing
Toma Tasovac, Natalia Ermolaev, Andrew Janco, David Lassner, Nick Budak
DH 2023 Conference. Graz Jul 2023.

The Standoff Converter. A standoff-based approach to work on TEI documents in Python that connects the world of digital philology with NLP.
David Lassner.
TEI Conference and Members’ Meeting 2021.
Rehearsal recording on YouTube: youtu.be/JEQ3ChonZz8

Attributions Of Early German Shakespeare Translations.
David Lassner, Anne Baillot, Julius Coburger.
DH 2019 Conference. Utrecht Jul 2019.

What comes next? Finding connections between word embeddings.
David Lassner, Stephanie Brandl.
EADH 2018 Conference. Galway 2018.

Finding reasons for modifications in historical manuscripts
David Lassner, Anne Baillot, Sergej Dogadov, Klaus-Robert Müller, Shinichi Nakajima.
AIUCD 2017 Conference. Rome 2017.

Poster Presentations

Times are changing: Investigating the pace of language change in diachronic word embeddings.
Stephanie Brandl, David Lassner.
Workshop on Computational Approaches to Historical Language Change at ACL 2019. Florence 2019.

Variational Inference: Finding reasons for modifications in historical manuscripts.
David Lassner.
DH 2017 Conference. Montreal 2017.

Panels

Multilingual NLP as Interface.
David Bamman, Quinn Dombrowski, Natalia Ermolaev, Andrew Janco, Toma Tasovac, Melanie Walsh, David Lassner.
DARIAH Annual Event 2021: Interfaces.

Zeitgeist in NLP.
Moderating the topical table on ML and NLP in the Digital Humanities.
Oliver Eberle and David Lassner.
ICML 2021 Social Event.

Datasets

Data set of the paper “Publishing an OCR ground truth data set for reuse in an unclear copyright setting” (Version 1.1).
David Lassner, Julius Coburger, Clemens Neudecker, & Anne Baillot. (2021).
doi.org/10.5281/zenodo.4742068

Early Corona Twitter Dataset.
Stephanie Brandl, David Lassner.
hal.archives-ouvertes.fr/hal-02861167

Workshops

Digital Humanities Applications of spaCy's Span Categorizer
Edward Schmul, Ákos Kádár, Andrew Janco, David Lassner, Nick Budak, Toma Tasovac, Natalia Ermolaev, Jajwalya Karajgikar
DH 2023 Conference. Graz Jul 2023.

Organiser and instructor of the workshop on DH and ML.
BIFOLD and TU Berlin. Jul 2021.
workshop-dh.ml.tu-berlin.de

Instructor at a series of workshops on NLP for low-resource languages, original title: “New Languages for NLP. Building Linguistic Diversity in the Digital Humanities.”
Center for Digital Humanities, Princeton University. 2021–2022.
newnlp.princeton.edu

Workshop on ML bias in DH, original title: “Bias in Datensätzen und ML-Modellen. Erkennung und Umgang in den DH.”
DHd 2020 Conference. Paderborn Mar 2020.
Together with Stephanie Brandl and Anne Baillot.
bias-ml-dh.davidlassner.com

Workshop on NLP for DH with spaCy.
COST Action. Budapest Sep 2019.
Together with Andrew Janco and Leonard Konle.

Workshop on NLP for DH with spaCy.
DH 2019 Conference. Utrecht Jul 2019.
Together with Andrew Janco and Seth Bernstein.

Invited Talks

David Lassner:
Machine Learning and its Applications in Literary Studies.
MathEnJeans Congrès de Berlin-Potsdam 2023.
Potsdam Mar. 2023

Anne Baillot, David Lassner:
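Krieg gegen Frankreich? Frankreich-Hass bei Fontane im Kontext der Kriegsberichte seiner Zeit (War against France? Hatred of France in Fontane in the context of the war reports of his time)
Conference of the Fontane Archive: Zwischen den Linien.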
Theodor Fontane und der Deutsch-Französische Krieg 1870/71
Potsdam Jun. 2021

Natural Language Processing and its Application to the Digital Humanities
Tempton Next Level: Meet the Geek.
Cologne Oct. 2020.

Document Variants, Translation Styles and Dispute on Social Media
Workshop Digital Tools for Letters, Languages and Humanities
Le Mans Feb 2020.

HR in the era of Machine Learning
Workshop of the Bundesverband der Personalmanager at PwC
Berlin May 2019.

Blog Posts

On the workshop on Digital Humanities and Machine Learning that I organized at BIFOLD/TU Berlin.
digitalintellectuals.hypotheses.org/4256

Bericht aus dem Workshop zu Bias in Datensätzen und ML-Modellen. Erkennung und Umgang in den DH.
Guest post on Anne Baillot's blog.
digitalintellectuals.hypotheses.org/3262

Preprints

Von Graphen zu Word Embeddings. Zur Entwicklung des mathematischen und visuellen Instrumentariums der Literaturwissenschaft.
Anne Baillot and David Lassner. 2022.
hal.archives-ouvertes.fr/hal-03687146

Balancing the composition of word embeddings across heterogenous data sets.
Stephanie Brandl, David Lassner, Maximilian Alber.
arxiv.org/abs/2001.04693

Automatic Identification of Types of Alterations in Historical Manuscripts.
David Lassner, Anne Baillot, Sergej Dogadov, Klaus-Robert Müller, Shinichi Nakajima.
hal.archives-ouvertes.fr/hal-02512217v2

Pet Projects

Budenblaetter, a printed paper calendar full of life's found-footage collages, found objects, accumulations and experiments, ranging from short prose to pictures to poetry.
budenblaetter.de

Alphabattle, a word-guessing game I made with a friend.
Use your vocabulary, your detective skills and your knowledge of your friend's personality to guess the word in a game of Alphabattle.
alphabattle.xyz
