Up to now – and this means for its first year, my first post here dating from 12 months ago – this blog has been primarily concerned, with respect to text analysis, with quantitative approaches. However, this is of course only one part of computational text analysis, and computationally supported manual annotation of texts is another one. This is what this post is about.
CATMA – Computer Aided Textual Markup & Analysis (not just for girls, obviously.)
To be perfectly honest, I am indeed more interested in quantitative text analysis right now than in manual annotation. But a large part of my Ph.D. thesis was an exercise in computer-assisted manual annotation of literary descriptions in French eighteenth-century novels, and I have recently had the occasion to go back to this technique and apply it to a small number of nineteenth-century novels. The basic question was what kind of relation exists between descriptive techniques in the Enlightenment novel on the one hand, and in the realist novel on the other.
Compared to my work for the Ph.D. thesis, many things were different. First of all, instead of working my way through 32 novels, here I was dealing with just two, namely Balzac’s Eugénie Grandet and his La Peau de Chagrin. Also, instead of dealing with a variety of different issues, here I was dealing with just one very specific issue, namely the techniques for integrating descriptions into their narrative context. And, most importantly, I took the typology of such techniques which was the result of a long process in the thesis and simply “applied” it to two more novels. Of course, part of the aim was to see how well such a typology, developed for the eighteenth century, would work for the nineteenth.
The last difference is that, while I was working with Bibliographix during my thesis (a clear case of useful tool abuse), I no longer work with Windows and therefore need another solution. Two came to mind: CATMA, developed at the University of Hamburg, on the one hand, and Zotero, developed by the Roy Rosenzweig Center for History and New Media, on the other. Both are web-based and therefore platform-independent, which is nice. So I tried out both of them!
I started with CATMA, the textual markup and analysis tool. What appealed to me in CATMA was the possibility to identify very specific portions of text and add tags to them, and the possibility to run nicely complex queries on the text. Also, CATMA is really built for my use case. So I loaded my two novels into CATMA, then read the texts, identified descriptions, and categorized both the descriptions as a whole and smaller parts of them according to my now well-established categories. It sounds easy and really is not very complicated, but there are some not-so-intuitive details you need to take care of. Tag libraries need to be established before you start tagging, and they need to be loaded and activated for each document you want to tag. The web-based interface is also a bit slow to react at times; I had to cut my novels into several pieces to work comfortably with them. And the same goes for the query functions – very powerful but a bit clunky to use.
After going through all of my novels with CATMA, and wanting to do quick searches for certain combinations of features and see the resulting descriptions, I decided to try Zotero as an alternative approach. I know Zotero well from using it as a bibliographic database in various research contexts, both personal (Bibliography on Literary Description) and official (DARIAH Bibliography on Doing Digital Humanities), so this was an easy choice. And I had already identified all of the descriptions in my two novels and was able to grab them easily from CATMA.
So I took the descriptions from CATMA and entered them into a fresh Zotero collection one by one, adding each individual description as a new item and putting the text into a “note” (that took a while). Then I went through all of them, marked relevant passages with different colors and added tags for all kinds of phenomena to each entry. The upside of Zotero is that it is easy to create tags as you go along, and that it is quite snappy when you use the Firefox plugin or the standalone version (the purely web-based version is also a bit slow and limited). The downside is that it is not possible to add tags on a textual level, but only on the item level. Also, working in this way, I did not have the full text of the novel at my direct disposal, so there is a certain effect of de-contextualisation.
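To make the difference between tagging on a textual level and on the item level a bit more concrete, here is a minimal Python sketch. It is purely illustrative: the class names, the tag names and the sample sentence are my own assumptions, and it does not reflect the actual data model of either CATMA or Zotero.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class SpanTag:
    """A tag attached to a character range of the text (textual level)."""
    start: int
    end: int
    tag: str


@dataclass
class Item:
    """One description stored as a note, with tags on the whole item."""
    text: str
    item_tags: Set[str] = field(default_factory=set)
    span_tags: List[SpanTag] = field(default_factory=list)


def items_with_all(items, wanted):
    """Item-level query: return the items carrying every tag in `wanted`."""
    wanted = set(wanted)
    return [item for item in items if wanted <= item.item_tags]


def tagged_passages(item, tag):
    """Textual-level query: return the exact substrings marked with `tag`."""
    return [item.text[s.start:s.end] for s in item.span_tags if s.tag == tag]


# Invented sample text and hypothetical tag names.
text = "The door opened. A tall man stood there. The door closed."
passage = "A tall man stood there."
start = text.index(passage)

item = Item(
    text=text,
    item_tags={"portrait", "implicit-integration"},
    span_tags=[SpanTag(start, start + len(passage), "description")],
)

print(items_with_all([item], {"portrait", "implicit-integration"}))
print(tagged_passages(item, "description"))
```

With item-level tags only, a query for a combination of features gets you whole entries back; with span-level tags, you can also recover exactly which stretch of text a tag refers to.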
So, what is my personal bottom line? Zotero is simple and snappy and I know it well, but it is not really designed for my use case. CATMA, on the other hand, is designed exactly for my needs with this little research project, but I did not have the patience to stick with it. Once some of the rough edges get smoothed out, however, CATMA is undoubtedly the more adequate and more powerful tool.
And what was the research result? The typology developed for the eighteenth century worked well on Balzac, but some interesting differences showed up. For example, the nineteenth-century novel has a reputation (established by Philippe Hamon) for having a preference for symmetrical implicit integration techniques (such as a sequence like: opening of a door – description of person appearing in the doorframe – closing of the door). Such symmetrical sequences are quite rare in the eighteenth century, where very simple configurations dominate and complex ones are almost always asymmetrical. And it turns out that these symmetrical configurations are also quite rare in the two Balzac novels I studied. Whether this is true more generally would need to be decided on the basis of a much larger sample. And indeed, I would argue, this would need to be done in an at least partly automated manner, i.e., in a way that combines qualitative analysis and quantitative techniques, possibly through a combination of rule-based annotation and machine learning.
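As a very rough idea of what the rule-based part of such an approach might look like, the sketch below classifies the framing of a description from the events immediately before and after it. The event labels, the table of mirrored pairs, the category names and the function name are all hypothetical illustrations and not the typology from the thesis. Real texts would of course require these events to be annotated or detected first.

```python
# Hypothetical pairs of framing events that mirror each other.
MIRRORED = {
    ("door_opens", "door_closes"),
    ("character_enters", "character_leaves"),
    ("gaze_begins", "gaze_ends"),
}


def classify_framing(before, after):
    """Classify how a description is framed by its neighbouring events.

    `before` and `after` are event labels, or None when the description
    has no framing event on that side.
    """
    if before is None and after is None:
        return "unframed"
    if before is None or after is None:
        return "one-sided"
    if (before, after) in MIRRORED:
        return "symmetrical"
    return "asymmetrical"


# The door example: opening, description, closing.
print(classify_framing("door_opens", "door_closes"))
```

Counting the labels such a function returns over a large, pre-annotated corpus would be one simple way to test the claim about symmetrical configurations on more than two novels.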
In other words, it doesn’t matter whether it’s manual or automatic, qualitative or quantitative, as long as it is computational. ;-)
Notes