NeDiMAH & CLARIN-EU Event: Exploring Historical Sources with Language Technology 8-9 December 2014

Joint NeDiMAH | CLARIN-EU Workshop

The proliferation of digital resources in the Humanities is leading to the elaboration of new methods, concepts, and theories by means of which researchers can query and interpret large-scale textual collections. The first goal of the workshop was to demonstrate how the application of language technology has produced a new understanding of texts in different fields of the Humanities. The workshop brought together researchers who already apply language technology and those who would like to learn about the current state of the art in this new and evolving area. The organizers invited researchers (especially early-career scholars) who plan to apply language technology but do not yet have the necessary skills and technical background. The second main goal of the workshop was to encourage the exchange of experiences, disseminate know-how, and explore potential future collaborations.

In the two workshop days, twenty-four short and long papers were presented that together offered a wide panorama of historical corpora, research questions, and the digital tools used to enrich, query, and analyse them. Archives and libraries make a vast number of digital texts and document collections, from many different countries and periods, available to researchers. The availability of digitized newspaper collections in particular has caught the attention of researchers, who use them to retrieve opinions about all sorts of important events and developments. For earlier periods, newspapers are not available as a source, so researchers turn to a variety of other sources. Keynote speaker Tony McEnery, a linguist, studied views on prostitution in seventeenth-century England in collaboration with historian Helen Baker, using the large collection of digitized books of the Early English Books Online programme. He used a variety of computational methods to process the texts, but pointed out that his distant reading was not meant as a form of 'culturomics'; rather, it offered different ways of approximating an overview of what is in the texts and of developing intuitions. Actual reading of (a selection of) the texts will always be necessary to gain a proper understanding of the subject at hand. In his words: 'close reading is the key'. The need for interplay between exploration and quantitative analysis on the one hand, and the refinement of questions and qualitative research on the other, emerged especially from the projects in which historians and linguists collaborated intensively. Victor de Boer presented an interesting and slick example: an interface from the DIVE project that enables users to explore the project's texts, events, and audiovisual materials and to zoom in on them in different ways. Of course, linguistic or computational tools cannot deliver ready-made results from a vast corpus of texts. Like all research, it is hard and intensive work.

The presenters did not propose one method as a favorite tool for text analysis. In fact, several presenters proposed method triangulation, that is, comparing the results of different methods of analysis and elaboration, as the preferred way of arriving at results. Most of the linguistic and computational tools were not in themselves very new; improvements in this field are gradual and incremental. It is not efficient to build a set of tools that caters exactly to the needs of an individual user (in this case a historian) or group of users. Historians therefore also have to learn to apply the tools to their own material and to query and analyse the results themselves. In the final discussion, it was noted that the presented work had many useful aspects for historians, but that the emphasis on corpus linguistics did not quite reflect historians' way of working. They usually combine a variety of sources, of which only a part is digitally available. Even when sources are digitally available, there is frequently no digital text, because the originals are manuscripts that cannot all be transcribed; and where digital text does exist, computer-recognized text is often hampered by poor accuracy. Linguistic methods can be a useful addition to the historian's toolkit. Their results will have to be combined with those of other methods, but they are worth the effort, as they enable historians to do distant reading and research on amounts of text that would be inconceivable using traditional methods.

For a full report of the event, including the list of speakers and links to abstracts and presentation slides, see the PDF below.