TY - GEN
T1 - Balisage Paper
T2 - Balisage: The Markup Conference 2022
AU - Beshero-Bondar, Elisa E.
N1 - Publisher Copyright:
Copyright © 2022 Elisa Beshero-Bondar.
PY - 2022
Y1 - 2022
AB - The process of instructing a computer to compare texts, known as computer-aided collation, might resemble trying to fix a power loom when the threads it is supposed to weave together become tangled. The automated weaving continues at full power, with the threads improperly aligned and the pattern broken in a way that can make it difficult to isolate the cause of the problem. Automating a tedious process magnifies the complexity of error correction, sometimes calling for new tooling to help us perfect the weaving or collating process. The authors are attempting to refine a collation algorithm to improve its alignment of variant passages in the Frankenstein Variorum project. We began with a Python script that tokenizes and normalizes the texts of the editions and delivers them to CollateX, which performs the collation and produces TEI-conformant output for our project. In post-processing stages after running the collation, we apply a series of XSLT transformations to the collation output. This post-collation XSLT pipeline publishes the digital variorum edition, preparing each output witness in TEI XML to store information about its own variance from the other editions. We have discussed that pipeline elsewhere; our interest in this paper is in efforts to repair, correct, and improve the collation process itself. We applied Schematron and XSLT in post-processing to correct patterns of erroneous alignments, but eventually realized that the problems we were trying to solve required repairing the collation algorithm. We are now experimenting with revising the collation algorithm in two ways: 1) by fine-tuning the text preparation algorithms we apply in the Python script that delivers text to the CollateX software, and 2) by attempting to implement those same text preparation algorithms entirely in XSLT, using the Text Alignment Network's XSLT functions tan:diff() and tan:collate(), introduced by Joel Kalvesmaki at the 2021 Balisage conference.
In this paper we discuss the challenges of determining where and how to intervene in the collation process, and what we are learning about how far XSLT and Schematron can take us in automating the preparation, collation, and correction process.
UR - http://www.scopus.com/inward/record.url?scp=85140295502&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85140295502&partnerID=8YFLogxK
U2 - 10.4242/BalisageVol27.Beshero-Bondar01
DO - 10.4242/BalisageVol27.Beshero-Bondar01
M3 - Conference contribution
AN - SCOPUS:85140295502
T3 - Balisage Series on Markup Technologies
BT - Proceedings of Balisage
PB - Mulberry Technologies, Inc.
Y2 - 1 August 2022 through 5 August 2022
ER -