Daniel Apollon: Joint Representation of Readings and Source Documents Manuscripts Families

Possibilities and limitations of Correspondence Analysis and related methods

In recent years, the awareness of analogies between textual variation and biological evolutionary variation has encouraged the application of cladistics to textual phylogenies. Cladistics has been approached as a general engine that may be used to identify original readings, select the best stemma among a collection of most probable stemmata, and, where possible, suggest the original state of a textual tradition.

Cladistics may be thought of as a family of algorithms exploiting:

  • graph theoretical methods,
  • classical probabilistic and Bayesian methods,
  • aspects of classical geometrical and distance methods,
  • search and select algorithms.
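To make the graph-theoretical and search-and-select components concrete, here is a minimal sketch of one classical cladistic criterion, Fitch's small-parsimony count, used to score a handful of candidate stemmata and keep the cheapest. The tree representation, function names and toy readings are assumptions for illustration only. Exhaustive scoring is feasible only for a few candidates; real programs use heuristic tree search, and other criteria (likelihood, Bayesian) are equally part of the family listed above.

```python
from typing import Dict, List, Set, Tuple, Union

# A rooted binary stemma: a witness siglum (leaf) or a pair of subtrees.
Tree = Union[str, Tuple["Tree", "Tree"]]


def fitch(tree: Tree, readings: Dict[str, str]) -> Tuple[Set[str], int]:
    """Fitch's small-parsimony pass for a single variant location.

    Returns the set of candidate readings at this node and the minimum
    number of reading changes needed inside the subtree.
    """
    if isinstance(tree, str):
        return {readings[tree]}, 0
    left_set, left_cost = fitch(tree[0], readings)
    right_set, right_cost = fitch(tree[1], readings)
    common = left_set & right_set
    if common:
        return common, left_cost + right_cost
    # No shared reading: one change must be placed on one of the two branches
    return left_set | right_set, left_cost + right_cost + 1


def parsimony_score(tree: Tree, matrix: Dict[str, List[str]]) -> int:
    """Total number of changes summed over all variant locations."""
    n_locations = len(next(iter(matrix.values())))
    return sum(
        fitch(tree, {w: seq[i] for w, seq in matrix.items()})[1]
        for i in range(n_locations)
    )


def select_best(candidates: List[Tree], matrix: Dict[str, List[str]]) -> Tree:
    """Simplest 'search and select': score every candidate, keep the cheapest."""
    return min(candidates, key=lambda t: parsimony_score(t, matrix))


# Toy data (hypothetical): four witnesses, three variant locations.
matrix = {
    "A": ["x", "p", "m"],
    "B": ["x", "p", "n"],
    "C": ["y", "q", "n"],
    "D": ["y", "q", "m"],
}
candidates = [(("A", "B"), ("C", "D")), (("A", "C"), ("B", "D")), (("A", "D"), ("B", "C"))]
for t in candidates:
    print(t, parsimony_score(t, matrix))
print("best:", select_best(candidates, matrix))
```

Here the grouping ((A, B), (C, D)) needs the fewest changes, because it is supported by two of the three variant locations.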

However, cladistics as a purely graph-based approach has unresolved issues:

  • How many trees are the best candidates for the prototype?
  • Which kind of trees are the best candidates to model the textual tradition?
  • Which search and select algorithms should be used?
  • Is textual change evolutionary in the same sense as, e.g., change in the morphological characters of living organisms?

Furthermore, cladistic methods offer a full space of reconstruction only partially, if at all: they depict either manuscripts by means of units of variation (readings) or, more rarely, a selection of readings.

Correspondence analysis does not search within a space of possible trees, but visualises readings and manuscripts as a cloud of points in a high-dimensional space. As a dual scaling method, it offers distinctive benefits to the textual analyst, the most obvious being:

  • the simultaneous, visualisable projection of variants and their sources within a common space;
  • a characterisation of variants by their source or family of sources, and a corresponding characterisation of sources by unique or shared variant readings;
  • an efficient dimensionality reduction;
  • diagnostic tools to measure the respective contributions of variants to their family and, vice versa, of sources to individual variants or groups of variants;
  • one data set, one solution.
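The benefits listed above can be illustrated with a minimal sketch of classical correspondence analysis, assuming a manuscripts × readings incidence table (1 if the manuscript attests the reading, 0 otherwise). It takes the singular value decomposition of the standardised residuals, places manuscripts and readings in principal coordinates within one common space, and computes each point's contribution to each axis, the kind of diagnostic mentioned in the list. The function names and the toy table are illustrative assumptions, not Apollon's implementation. Because the result is an exact decomposition rather than a search, the same table always yields the same solution (up to the sign of each axis).

```python
import numpy as np


def correspondence_analysis(N: np.ndarray, k: int = 2):
    """Classical correspondence analysis of a contingency table N.

    Rows are manuscripts (sources), columns are readings (variants).
    Returns row and column principal coordinates on the first k axes,
    the principal inertias of all axes, and the row and column masses.
    """
    P = N / N.sum()                          # correspondence matrix
    r = P.sum(axis=1)                        # row masses
    c = P.sum(axis=0)                        # column masses
    expected = np.outer(r, c)
    S = (P - expected) / np.sqrt(expected)   # standardised residuals
    U, sv, Vt = np.linalg.svd(S, full_matrices=False)
    # Principal coordinates: rows and columns share one low-dimensional space
    F = U[:, :k] * sv[:k] / np.sqrt(r)[:, None]
    G = Vt.T[:, :k] * sv[:k] / np.sqrt(c)[:, None]
    return F, G, sv ** 2, r, c


def contributions(coords: np.ndarray, masses: np.ndarray, inertias: np.ndarray) -> np.ndarray:
    """Share of each point in the inertia of each axis (each column sums to 1)."""
    k = coords.shape[1]
    return masses[:, None] * coords ** 2 / inertias[:k]


# Toy incidence table (hypothetical): 4 manuscripts x 6 readings,
# i.e. 3 variant locations with 2 readings each; 1 = reading attested.
N = np.array([
    [1, 0, 1, 0, 1, 0],   # A
    [1, 0, 1, 0, 0, 1],   # B
    [0, 1, 0, 1, 0, 1],   # C
    [0, 1, 0, 1, 1, 0],   # D
], dtype=float)
F, G, inertias, r, c = correspondence_analysis(N)
print("principal inertias:", np.round(inertias, 3))
print("manuscripts:\n", np.round(F, 3))
print("readings:\n", np.round(G, 3))
print("manuscript contributions:\n", np.round(contributions(F, r, inertias), 3))
```

In this toy table, A and B share readings at the first two locations, as do C and D, so the first axis separates {A, B} from {C, D}, and the readings responsible for that split lie on the same side of the axis as the manuscripts that attest them.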

Hendrik Blockeel: From Decision Trees to Phylogenetic Trees

Decision tree induction is a machine learning technique that has been studied for decades. It is usually seen as a method for learning classifiers, but in fact a decision tree implicitly also represents a cluster hierarchy. Decision tree learning can therefore be seen as a top-down conceptual clustering method, where "conceptual" refers to the fact that clusters are defined by listing properties of their members, rather than listing the members themselves. In this talk, we argue that such conceptual clustering can have advantages over other clustering methods when used in the context of phylogenetic tree construction (or the construction of stemmata). We list advantages and disadvantages of the decision-tree-based approach, and present some preliminary experimental results. Our main conclusions are that this new approach to phylogenetic tree construction is a viable alternative to existing methods, but that certain aspects of its behaviour are not yet fully understood, including a tendency to work better on more balanced target trees.

Celine Vens, Eduardo Costa, Hendrik Blockeel (2010). Top-Down Induction of Phylogenetic Trees. In: Evolutionary Computation, Machine Learning and Data Mining in Bioinformatics, 8th European Conference, EvoBIO 2010, Istanbul, Turkey, April 7–9, 2010, Proceedings. Lecture Notes in Computer Science 6023, pp. 62–73. Springer.
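The idea of a tree whose internal nodes are tests on readings can be sketched as follows. This is not the method of Vens, Costa and Blockeel (2010); it only illustrates top-down conceptual clustering under assumptions of our own: each witness is a string with one character per variant location, each split is a test "reading at location i is v", and the split chosen is the one minimising the summed within-cluster Hamming distance. All names are hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Union

# A node is either a leaf cluster (list of sigla) or a dict with a test and two children.
Node = Union[List[str], Dict[str, object]]


def within_distance(group: List[str], matrix: Dict[str, str]) -> int:
    """Sum of pairwise Hamming distances between the witnesses in a group."""
    return sum(
        sum(a != b for a, b in zip(matrix[x], matrix[y]))
        for x, y in combinations(group, 2)
    )


def induce(group: List[str], matrix: Dict[str, str]) -> Node:
    """Top-down (divisive) clustering with decision-tree-style splits.

    Each internal node is labelled by a test "reading at location i is v",
    so every cluster is described by a property shared by its members.
    """
    if len(group) < 2:
        return group
    best = None
    for i in range(len(matrix[group[0]])):
        for v in sorted({matrix[w][i] for w in group}):
            yes = [w for w in group if matrix[w][i] == v]
            no = [w for w in group if matrix[w][i] != v]
            if not yes or not no:
                continue  # the test does not split this group
            cost = within_distance(yes, matrix) + within_distance(no, matrix)
            if best is None or cost < best[0]:
                best = (cost, i, v, yes, no)
    if best is None:
        return group  # identical witnesses: no informative test left
    _, i, v, yes, no = best
    return {"test": (i, v), "yes": induce(yes, matrix), "no": induce(no, matrix)}


# Toy data (hypothetical): one character per variant location.
matrix = {"A": "xpm", "B": "xpn", "C": "yqn", "D": "yqm"}
print(induce(sorted(matrix), matrix))
```

The root test (location 0 has reading "x") describes the cluster {A, B} by a shared reading rather than by enumerating its members, which is the "conceptual" aspect the abstract refers to.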

Marina Buzzoni: Is reductio ad unum always desirable? The Hêliand stemma revisited

The 'Electronic Hêliand Project' was started in June 2006 at the University of Venice (see Buzzoni 2009a; Buzzoni 2009b). Its main aim was to show how the electronic medium can capture the often disregarded differences among the witnesses of the ninth-century Old Saxon poem, i.e. its inner mouvance, which some uses of Stemmatology help to highlight. Whereas a printed edition offers a static text, an electronic edition presents the text in a variety of forms and permits users to choose between visualizing only one tag scenario or several (cf. Ciula and Stella 2006; Burnard, O'Brien O'Keeffe and Unsworth 2006). In this paper I will critically reconsider some of the readings traditionally attributed to the Hêliand "archetype", in the light of, among other things, the Leipzig fragment found in April 2006. Finally, I intend to show how this revised information can be profitably used within the aforementioned 'Electronic Hêliand Project'.

Odd Einar Haugen: Stemmatology: what comes before, what comes after?

In this talk, I would like to discuss the place of stemmatology within a wider context. By analogy with other disciplines, I believe it can be helpful to draw a distinction between pure or theoretical stemmatology on the one hand and applied stemmatology on the other. Theoretical stemmatology will be taken to mean the study of a manuscript tradition from a purely formal perspective, while applied stemmatology will be taken to mean a stemmatological analysis made as part of an editorial project. My focus will be on the latter, i.e. the construction and use of a stemma for an edition of a text. Part of the editorial work precedes the construction of the stemma, such as deciding on the text to be edited, locating its sources, transcribing the manuscripts and collating them; part of the work comes after the stemma has been established, such as the construction of the text (constitutio textus), the selection of variants and the choice of user interface for the edition. In the recent debate between traditional (or old) philology and material (or new) philology, stemmatology is firmly rooted in traditional philology, but I will argue that the study of the transmission of a text will never be outdated, and that stemmatology is as important to our generation as it was to that of Karl Lachmann almost 200 years ago. The qualitative method of Lachmann's generation, i.e. the genealogical method, has now been supplemented (not supplanted, I think) by a number of quantitative methods. Yet no method can claim to give an undisputed analysis of any but the simplest manuscript traditions. I therefore believe it is essential to encourage the use of several methods, including the traditional genealogical method, in the recension of manuscript traditions. For this reason, a standard output format for the collation of manuscripts should be developed, so that the use of several methods, qualitative as well as quantitative, is facilitated.
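The abstract does not specify what such a format should look like. Purely as an illustration of one possible interchange step, here is a sketch that turns a collation, represented as a hypothetical table mapping each witness siglum to its reading at every variant location (None for a lacuna), into a NEXUS character matrix, a format read by many phylogenetic programs. The table layout, the function name and the coding of readings as 0, 1, 2, ... in order of first appearance are assumptions, not a proposed standard.

```python
from typing import Dict, List, Optional


def collation_to_nexus(table: Dict[str, List[Optional[str]]]) -> str:
    """Encode a collation table as a NEXUS standard character matrix.

    table maps each witness siglum to its reading at every variant location,
    with None marking a lacuna. Readings at each location are numbered
    0, 1, 2, ... in order of first appearance.
    """
    witnesses = list(table)
    n_locations = len(table[witnesses[0]])
    codes: Dict[str, List[str]] = {w: [] for w in witnesses}
    max_states = 1
    for i in range(n_locations):
        states: Dict[str, int] = {}
        for w in witnesses:
            reading = table[w][i]
            if reading is None:
                codes[w].append("?")  # missing data
            else:
                codes[w].append(str(states.setdefault(reading, len(states))))
        if len(states) > 10:
            raise ValueError(f"location {i}: more than 10 readings need multi-symbol coding")
        max_states = max(max_states, len(states))
    symbols = "".join(str(s) for s in range(max_states))
    lines = [
        "#NEXUS",
        "BEGIN DATA;",
        f"  DIMENSIONS NTAX={len(witnesses)} NCHAR={n_locations};",
        f'  FORMAT DATATYPE=STANDARD SYMBOLS="{symbols}" MISSING=?;',
        "  MATRIX",
    ]
    lines += [f"    {w} {''.join(codes[w])}" for w in witnesses]
    lines += ["  ;", "END;"]
    return "\n".join(lines)


# Toy collation (hypothetical): three witnesses, three variant locations.
table = {
    "A": ["sunne", "sah", None],
    "B": ["sunne", "sach", "he"],
    "C": ["sonne", "sah", "er"],
}
print(collation_to_nexus(table))
```

Such a conversion discards the wording of the readings and keeps only which witnesses agree, which is what most quantitative methods need; a genealogical (qualitative) analysis would still work from the full collation table.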

Chris Howe: Extraction of DNA from Parchment

Those of us who use phylogenetic analysis of texts are familiar with the idea of treating them as though they were DNA sequences. But what of the possibility of extracting 'real' DNA from parchment? This might be used to identify the animal species used to make the parchment, and finer-scale genetic analysis (as with genetic fingerprinting in forensic science) might be used to identify source populations. I shall describe work done in Cambridge using 18th- and 19th-century land-transfer documents as test specimens. We see a complex pattern of DNA in extracts from this material. Some of the complexity may reflect transfer of DNA between skins during the parchment preparation process.

  1. "A flock of sheep, goats and cattle: ancient DNA analysis reveals complexities of historical parchment manufacture." Campana MG, Bower MA, Bailey MJ, Stock F, O'Connell TC, Edwards CJ, Checkley-Scott C, Knight B, Spencer M, Howe CJ (2010) J. Archaeol. Sci. 37, 1317–1325. doi: 10.1016/j.jas.2009.12.036

  2. "The potential for extraction and exploitation of DNA from parchment: a review of the opportunities and hurdles." Bower MA, Campana MG, Checkley-Scott C, Knight B, Howe CJ (2010) J. Inst. Conserv. 33, 1–11. doi: 10.1080/19455220903509937

Wendy J Phillips-Rodriguez: What is the use of a Stemmatologist once a Critical Edition has been Made?

Originally, the discipline of Stemmatics was created as a method to edit texts, and as such it is still probably the preferred method among textual scholars due to its cogency. Nevertheless, are all stemmatologists working on an edition of their text nowadays?

The information revealed by a stemma used to be helpful only at the beginning of the editorial process, for it was precisely the stemma that made it easier to choose the readings that would form the reconstituted text. However, it seems that new technologies (e.g. phylogenetic algorithms), besides helping textual scholars to achieve their traditional goals, have also made it possible to use Stemmatology for purposes it was not initially meant for.

Grouping manuscripts according to their characteristics, analysing certain patterns of contamination, and even spotting some scribal behaviours may reveal important information about the text, and all of these can now be visualized in ways that were not possible in the past. If stemmata were once the exclusive tools of so-called "lower criticism", perhaps more sophisticated computer-generated trees might also have something to say about "higher criticism".

In this paper I will present some uses of Stemmatology that are not related to the creation of an edition. Instead, their purpose is to analyze aspects of the text that are related to the cultural environment in which it was created and transmitted.

Steven Schwager: Entropy Information in Refining the Reconstruction of Classical Texts, and the Anticipation of Information Theory by Early Philologists

Philologists reconstructing ancient texts from variously miscopied manuscripts anticipated information theorists by centuries in conceptualizing information in terms of probability. An example is the editorial principle difficilior lectio potior (DLP): in choosing between otherwise acceptable alternative wordings in different manuscripts, "the more difficult reading [is] preferable". As philologists at least as early as Erasmus observed (and as information theory's version of the second law of thermodynamics would predict), scribal errors tend to replace less frequent and hence entropically more information-rich wordings with more frequent ones. Without measurements, it has been unclear how effectively DLP has been used in the reconstruction of texts, and how effectively it could be used. We analyze a case history of acknowledged editorial excellence that mimics an experiment: the reconstruction of Lucretius's De Rerum Natura, beginning with Lachmann's landmark 1850 edition based on the two oldest manuscripts then known. Treating words as characters in a code, and taking the occurrence frequencies of words from a current, more broadly based edition, we calculate the difference in entropy information between the two alternatives in each of Lachmann's 756 pairs of grammatically acceptable alternatives. His choices average 0.26±0.20 bits/word higher in entropy information (95% confidence interval, P = 0.005). As a channel width, this corresponds to a likelihood of the rarer word being the one accepted in the reference edition, which is consistent with the observed 547/756 = 0.72±0.03 (95%). Statistically informed application of DLP can recover substantial amounts of semantically meaningful entropy information from noise; hence the extension copiosior informatione lectio potior, "the reading richer in information [is] preferable."
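The quantities in this abstract can be sketched as follows, assuming that "entropy information" means the self-information -log2 p of a word, with p its relative frequency in a reference edition. The word counts and editorial pairs below are invented toy data, not Lucretius data, and the function names are hypothetical. The 95% interval uses the normal approximation to the binomial, which may not be the author's procedure; applied to the observed 547/756, it gives about 0.72 ± 0.03, matching the figure quoted above.

```python
import math
from collections import Counter
from typing import List, Tuple


def self_information(word: str, counts: Counter, total: int) -> float:
    """Entropy information of a word in bits: -log2 of its relative frequency.

    Assumes every word occurs at least once in the reference counts.
    """
    return -math.log2(counts[word] / total)


def dlp_summary(pairs: List[Tuple[str, str]], counts: Counter) -> Tuple[float, int, int]:
    """Summarise (accepted, rejected) reading pairs.

    Returns the mean extra information of the accepted reading (bits/word),
    the number of pairs in which the rarer word was accepted, and the
    number of pairs.
    """
    total = sum(counts.values())
    diffs = [
        self_information(acc, counts, total) - self_information(rej, counts, total)
        for acc, rej in pairs
    ]
    rarer_accepted = sum(1 for d in diffs if d > 0)
    return sum(diffs) / len(diffs), rarer_accepted, len(diffs)


def proportion_ci95(k: int, n: int) -> Tuple[float, float]:
    """Observed proportion k/n and its normal-approximation 95% half-width."""
    p = k / n
    return p, 1.96 * math.sqrt(p * (1 - p) / n)


# Toy word counts and editorial choices (hypothetical, not Lucretius data).
counts = Counter({"et": 50, "res": 10, "natura": 5, "primordia": 1})
pairs = [("primordia", "res"), ("natura", "et"), ("et", "natura")]
print(dlp_summary(pairs, counts))

# The proportion reported in the abstract
print(proportion_ci95(547, 756))
```

Note that the difference in self-information between two words depends only on the ratio of their frequencies, so the total size of the reference corpus cancels out of each pairwise comparison.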

Heather Windram: A Phylogenetic and Stemmatic Analysis of the 'Sack' Poems by Robert Herrick, a 17th-century English Poet

The 17th century in England was a time of political, religious and cultural turbulence. Against the backdrop of a civil war which divided the country into parliamentarian and royalist factions, Robert Herrick (1591-1674), a staunch but not unquestioning royalist, spent most of his adult life as the Anglican vicar of Dean Prior in Devon, where he wrote pastoral poetry in the classical tradition. During the Protectorate, temporarily evicted from his living, he became one of the many writers and musicians trying to survive in London in the austere climate of the Commonwealth. Living on the brink of the age of popular print, he left poetry that has come down to us in both print and manuscript form; while in London, he oversaw the printing and publication of much of his poetical output (some 1400 poems) in the book Hesperides. After the Restoration, he was restored to his living at Dean Prior, where he lived and worked for the remaining years of his life.

We were approached by Ruth Connolly from the English Department at the University of Newcastle and asked to perform a phylogenetic analysis of several of Herrick's poems to complement a stemmatic analysis that she and her colleagues were then preparing. The aim was for the two analyses to be done in parallel, so that the results of each could inform and guide the other.

I will present the results obtained as part of this collaboration, focussing on the pair of poems entitled 'Farewell to Sack' and 'Welcome to Sack'. Sack is a sweet fortified wine much favoured in England at this time. It was exported from the Canary Isles and often referred to as Canary Sack. In Shakespeare's Henry IV, Falstaff declares:

"If I had a thousand sons, the first humane principle I would teach them should be, to forswear thin potations and to addict themselves to sack." (Part II, Act 4, Scene 3)

Herrick takes a classical poetic form and mischievously turns it into a pair of poems, first renouncing and then welcoming back his true love: Canary Sack.