  • Budget: EUR 120,000
  • Duration: 1.1.2009 – 31.12.2012
  • Funding: University of Helsinki Research Funds
  • Project leader: Dr. Teemu Roos
  • Keywords: stemmatology, textual criticism, phylogenetics

Algorithmic Methods in Stemmatology (STAM)
Research Project at HIIT

Given a collection of imperfect copies of a textual document, the aim of stemmatology is to reconstruct the history of the text, indicating for each variant the source text from which it was copied. The project develops theory and methods for computer-assisted stemmatology and evaluates the accuracy of such methods on simulated and real data sets.
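The project's publications include a study of normalized compression distance (NCD) applied to stemmatology (Merivuori & Roos, 2009). To give a rough sense of how a computer can compare imperfect copies, the Python sketch below computes NCD between a few variants, using `zlib` as the compressor. The variant strings, the choice of `zlib`, and the helper names `compressed_size` and `ncd` are hypothetical and chosen for illustration. This is not the project's own implementation. A pairwise distance table like this is only an input to tree-building methods, not a stemma in itself.

```python
import zlib


def compressed_size(data: bytes) -> int:
    """Length in bytes of the zlib-compressed data (the compressor C in NCD)."""
    return len(zlib.compress(data, 9))


def ncd(x: str, y: str) -> float:
    """Normalized compression distance:
    NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)).
    Values near 0 suggest very similar texts; values near 1 suggest unrelated ones.
    With a real compressor the result can slightly exceed 1."""
    bx, by = x.encode("utf-8"), y.encode("utf-8")
    cx, cy = compressed_size(bx), compressed_size(by)
    cxy = compressed_size(bx + by)
    return (cxy - min(cx, cy)) / max(cx, cy)


# Hypothetical toy variants of one passage (invented for illustration only).
variants = {
    "A": "In the beginning was the word, and the word was with the scribe, "
         "and the scribe copied it faithfully onto fresh parchment.",
    "B": "In the beginning was the word, and the word was with the scribe, "
         "and the scribe copied it carelessly onto fresh parchment.",
    "C": "Quick brown foxes jump over lazy dogs while monks illuminate "
         "manuscripts by candlelight late at night.",
}

if __name__ == "__main__":
    names = sorted(variants)
    # Print every pairwise distance once.
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            print(f"NCD({a}, {b}) = {ncd(variants[a], variants[b]):.3f}")
```

Running the script prints the three pairwise distances. The two variants that differ by a single word should come out much closer to each other than either is to the unrelated passage. NCD needs no alignment of the texts, which is one reason it is attractive as a baseline. The trade-off is that it is sensitive to the choice of compressor and to very short inputs.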

Stemmatology lies at the intersection of several scientific disciplines. It is associated, on the one hand, with the humanities, where texts are used as sources. On the other hand, it is associated with mathematics, statistics, and computer science. Finally, it is associated with evolutionary biology and cladistics, which are concerned with evolution and speciation. The aim of traditional stemmatology, or textual criticism, has been to infer the original content of a textual source from a number of different versions. Modern computer-assisted stemmatology has proven to be a powerful tool for studying how texts were altered. It also gives insight into how they were distributed geographically. In doing so, stemmatology helps answer several central questions in historical, philological, and theological research.

Figure: Solving ancestral states by dynamic programming. Photo: Tuula Roos.

Figure: Computer-based visualization of multiple texts; see (Merivuori & Roos, 2009).

Our objective is to develop reliable methods and tools for studying the origins, variation, and distribution of texts. An easy-to-use, methodologically sound tool available on the internet would significantly benefit a large group of scholars in a variety of humanistic disciplines. In computer science, applications include, for example, the study of computer viruses and chain letters. Advances in methods for textual scholarship also contribute to cladistics and evolutionary biology.

The project is associated with two other projects on related topics. The first is the project Suomen keskiajan kirjallinen kulttuuri ("The Literary Culture of Medieval Finland", 2007–2010), led by Prof. Tuomas Heikkilä at the Department of History. The second is a Science Workshop on Stemmatology (2009–2010), funded by the Finnish Cultural Foundation and led by Prof. Petri Myllymäki, Prof. Heikkilä, and Dr. Roos.

The work is carried out mainly within the Cosco group at the Helsinki Institute for Information Technology HIIT.

People

Teemu Roos, PhD
Senior researcher (project leader)
Petri Myllymäki, PhD
Professor, Department of Computer Science
Tuomas Heikkilä, PhD
Professor, Department of History
Simo Linkola
Research assistant
Yuan Zou, MSc
PhD student

Past students

Anupam Arohi, MSc
Toni Merivuori, MSc

Publications

  1. T. Roos and T. Heikkilä, (2009). Evaluating methods for computer-assisted stemmatology using artificial benchmark data sets, Literary and Linguistic Computing 24:4, pp. 417–433.
  2. T. Merivuori, (2009). Normalisoitu kompressioetäisyys: katsaus sovelluksiin ("Normalized compression distance: A review on applications"), Master's Thesis, Department of Computer Science, University of Helsinki.
  3. T. Merivuori and T. Roos, (2009). Some observations on the applicability of normalized compression distance to stemmatology, in Proc. 2nd Workshop on Information Theoretic Methods in Science and Engineering (WITMSE-09), Tampere, Finland, August 17–19.
  4. P.-H. Lai, T. Roos, and J. O'Sullivan, (2010). MDL hierarchical clustering for stemmatology, in Proc. IEEE International Symposium on Information Theory (ISIT-10), Austin, Texas, June 13–18.
  5. Y. Zou, (2010). Structural EM methods in phylogenetics and stemmatology, Master's Thesis, Department of Computer Science, University of Helsinki.
  6. A. Arohi, (2011). Structural EM: An Algorithmic Method in Stemmatology, Master's Thesis, Department of Computer Science, University of Helsinki.
  7. T. Roos and Y. Zou, (2011). Analysis of Textual Variation by Latent Tree Structures, in Proc. IEEE International Conference on Data Mining (ICDM-11), Vancouver, Canada, December 11–14.

News

May 18, 2009, Helsinki. "Darwin – banaanikärpänen – stemmatologia" ("Darwin – fruit fly – stemmatology"). Colloquium organized by the VARIANTTI network on textual criticism (in Finnish). Speakers: Tuomas Heikkilä and Teemu Roos.

May 29, 2009, Bern. Tuomas Heikkilä speaks about experiments with artificial manuscript traditions in a one-day symposium organized by H.F. Windram, C.J. Howe, and M. Stolz.

June 17, 2009, Tikkurila. Teemu Roos gives an introduction to computer-assisted stemmatology at the VARIENG Spring Excursion. Place: Finnish Science Center, Heureka.

August 1, 2009. Yuan Zou joins STAM as a full-time employee.

August 19, 2009, Tampere. Toni Merivuori and Teemu Roos present a paper on stemmatology at the 2nd WITMSE workshop.

August 27, 2009, Helsinki. Final seminar of HIIT summer interns. Anu Sulander and Anupam Arohi present results of summer internship in STAM. Place: Kumpula Campus.

October 28, 2009, Helsinki. Open House at the Department of Computer Science. Demonstration by STAM project. Place: Kumpula Campus.

January 28–30, 2010, Helsinki. We organized the first stemmatology workshop in our series.

April 2, 2010, New Haven. Teemu Roos gives a talk on algorithms for stemmatology at Yale University.

April 22, 2010, St. Louis. Teemu Roos gives a talk on algorithms for stemmatology at Washington University in St. Louis.

June 21–23, 2010, Uppsala. The second workshop in our series. Attendance is by invitation only.

November 21–24, 2010, Pisa. The third workshop in our series. Attendance by invitation.

December 17, 2010, Helsinki. Yuan Zou graduates (MSc) with a Master's thesis on stemmatology.

March 22–25, 2011, Cambridge, UK. Fourth workshop in our series. Attendance by invitation.

August 15, 2011. Anupam Arohi graduates with an MSc thesis on stemmatology.

October 5–8, 2011, Rome, Italy. Fifth workshop in our series. Attendance by invitation.

December 11–14, 2011, Vancouver, Canada. Teemu Roos and Yuan Zou present a paper on a new stemmatological method, Semstem, at the ICDM conference.

March 23, 2012, University of East Anglia, UK. Teemu Roos speaks at the Computational Biology Seminar.

November 22–24, 2012, University of Bern, Switzerland. Teemu Roos speaks at the Phylomemetic and Phylogenetic Approaches in the Humanities Workshop.


University of Helsinki | Department of Computer Science | Helsinki Institute for Information Technology