Atomic: An annotation platform to meet the demands of current and future research

2015-03-05T00:00:00Z (GMT) by Druskat, Stephan; Gast, Volker

This poster presents Atomic, a multi-level annotation tool for linguistic data. A number of annotation tools, such as Synpathy, RSTTool, and MMAX2, have been developed in linguistics in the past. Most of them were implemented within research projects for a specific research question or a specific type of annotation, e.g., syntactic annotation or coreference chains. Unfortunately, some of these tools were not developed further once their respective projects had ended. In contrast, Atomic aims not to be Yet Another Annotation Tool for Yet Another Research Question. It has been designed with a focus on adaptability, to future-proof it for new use cases. Atomic is powered by Salt (Zipser & Romary, 2010), a graph-based, theory-neutral, and semantic-free metamodel for linguistic data. The data abstraction via Salt and the inclusion of a generic, graph-based editor enable Atomic to handle potentially all types of annotations. Atomic is built on top of the Eclipse RCP (McAffer et al., 2010), an application framework with a sophisticated plugin technology that makes it easy to extend the software for different research needs, e.g., with editors for specific annotation types. For different types of annotations there are tried and tested annotation techniques, and corresponding editor types. Atomic can include all of these and have them all operate on the same data model. The combination of a generic data model and a pluggable architecture therefore allows Atomic to outlive its original development context, as new editors and other tooling can always be added to support future theories and annotation types. To demonstrate its extensibility, Atomic ships with dedicated editors, such as a coreference editor. To further ensure the sustainability of the software, Atomic integrates the Pepper converter framework (Zipser et al., 2011), which enables it to read and write established linguistic formats such as TigerXML, the EXMARaLDA and RST formats, MMAX2, TCF, ANNIS, and many more.
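
The idea that a generic graph model plus pluggable editors lets different annotation types share one data model can be illustrated with a minimal sketch. It assumes a simplified, hypothetical model in which nodes and edges carry no built-in linguistic meaning and all information lives in (namespace, name) to value annotations, loosely in the spirit of a theory-neutral metamodel like Salt. None of the names below (Node, Edge, Graph, register_editor, EDITORS, syntax_editor, coreference_editor, build_tokens) are Salt's or Atomic's actual API, and the "editors" are plain functions standing in for Eclipse plugins.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

# Hypothetical sketch of a theory-neutral annotation graph.
# These names are illustrative only; they are NOT the Salt or Atomic API.

AnnoKey = Tuple[str, str]  # (namespace, name), e.g. ("syntax", "cat")


@dataclass
class Node:
    id: str
    annotations: Dict[AnnoKey, str] = field(default_factory=dict)


@dataclass
class Edge:
    source: str
    target: str
    annotations: Dict[AnnoKey, str] = field(default_factory=dict)


@dataclass
class Graph:
    nodes: Dict[str, Node] = field(default_factory=dict)
    edges: List[Edge] = field(default_factory=list)

    def add_node(self, node_id: str) -> Node:
        node = Node(node_id)
        self.nodes[node_id] = node
        return node

    def add_edge(self, source: str, target: str) -> Edge:
        # Edges may only connect existing nodes; any meaning they have
        # comes solely from their annotations.
        if source not in self.nodes or target not in self.nodes:
            raise KeyError(f"unknown node in edge {source!r} -> {target!r}")
        edge = Edge(source, target)
        self.edges.append(edge)
        return edge


# Hypothetical plugin registry: each "editor" is a function that reads
# and writes the same shared graph.
Editor = Callable[[Graph], None]
EDITORS: Dict[str, Editor] = {}


def register_editor(name: str) -> Callable[[Editor], Editor]:
    def decorator(fn: Editor) -> Editor:
        EDITORS[name] = fn
        return fn
    return decorator


@register_editor("syntax")
def syntax_editor(g: Graph) -> None:
    # Add an NP node that dominates the first token ("Mary").
    phrase = g.add_node("np1")
    phrase.annotations[("syntax", "cat")] = "NP"
    edge = g.add_edge("np1", "t1")
    edge.annotations[("syntax", "type")] = "dominance"


@register_editor("coreference")
def coreference_editor(g: Graph) -> None:
    # Link the pronoun "she" (t3) to its antecedent "Mary" (t1).
    edge = g.add_edge("t3", "t1")
    edge.annotations[("coref", "type")] = "anaphoric"


def build_tokens(text: str) -> Graph:
    # Create one node per whitespace-separated token: t1, t2, ...
    g = Graph()
    for i, word in enumerate(text.split(), start=1):
        tok = g.add_node(f"t{i}")
        tok.annotations[("base", "form")] = word
    return g


# Usage: run every registered editor on one shared graph.
graph = build_tokens("Mary said she left")
for editor in EDITORS.values():
    editor(graph)
print(len(graph.nodes), len(graph.edges))  # prints "5 2"
```

Because neither editor knows about the other, a further editor could be registered without changing the model or the existing editors, and syntactic and coreference annotations end up side by side in the same graph.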

CC BY 4.0