Universität Duisburg-Essen

WALU

WALU (WIKINGER Annotations- und Lern-Umgebung, i.e. WIKINGER annotation and learning environment) is a Java-based tool developed within the project WIKINGER (Wiki Next Generation Enhanced Repository). WIKINGER is part of the BMBF-funded E-Science initiative and aims at developing semantic wikis for scientific communities. Such wikis are built on a network of domain-relevant concepts and the relations between them. To set up and maintain this network, tools for Information Extraction are applied to large amounts of scientific text. Named Entity Recognition (NER) techniques play a crucial role in this process, since they are used to extract the domain-relevant concepts.

An NER tool applicable within a generic E-Science framework needs to be adaptable to new (scientific) domains and has to accommodate the fact that only domain experts can select and define the semantic categories and concepts relevant for the particular application. WALU is designed to meet these requirements by enabling domain experts to create training data and to control the process of training and (semi-)automatic semantic markup.

Designing such a tool requires a trade-off between power and convenience. On the one hand, the tool has to provide the necessary functionality, i.e. manual annotation of documents, configuration and initiation of the training process, application of automatic annotation components, as well as inspection and correction of the resulting annotations. On the other hand, intuitive interfaces and convenient facilities that support this functionality while encapsulating its complexity are crucial to ensure usability for professionals of any domain. In addition, the tool has to be integrated into the overall WIKINGER infrastructure. To our knowledge, no other available tool currently meets all these requirements, which is why we are developing WALU.

Fig.: Screenshot of the WALU GUI for manual annotation


WALU supports manual annotation with a GUI that is easy to use. It offers convenient navigation through the annotations, and simple but effective annotation support such as the automatic adjustment of markup boundaries and a dynamic markup dictionary. This dictionary is built up during the annotation process and is used to propose markup labels for text passages that match dictionary entries. Using a context-sensitive menu, the annotator confirms or rejects these proposals and/or removes the entry from the dictionary. In our experience, the immediate feedback of the dynamic markup dictionary also helps domain experts to understand the task of string-based identification of domain-relevant concepts. WALU also provides an automatic annotator, based on regular expressions, for strings belonging to the category DATE. This is a simple prototype of a series of automatic mechanisms that will be used to annotate all the available documents. Except for a few annotators based on regular expressions that classify entities with unique patterns (such as email addresses and URLs), most of these automatic annotators are based on machine learning algorithms that will be accessible via WALU.
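The two string-based mechanisms described above, dictionary proposals and a regular-expression DATE annotator, can be illustrated with a minimal sketch. It is written in Python purely for illustration and is not WALU's Java code; the names `Annotation`, `MarkupDictionary`, `annotate_dates` and the date pattern itself are assumptions made for this example.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    start: int   # inclusive character offset
    end: int     # exclusive character offset
    label: str   # semantic category, e.g. "DATE" or "PERSON"
    source: str  # "dictionary" or "regex" (hypothetical provenance tag)


# Assumed pattern: numeric dates such as 27.9.2007 or 10/13/2007.
# The actual expressions used by WALU are not documented here.
DATE_PATTERN = re.compile(r"\b\d{1,2}[./]\d{1,2}[./]\d{4}\b")


def annotate_dates(text):
    """Return one DATE annotation per match of the date pattern."""
    return [Annotation(m.start(), m.end(), "DATE", "regex")
            for m in DATE_PATTERN.finditer(text)]


class MarkupDictionary:
    """Maps surface strings to labels and proposes markup for matches."""

    def __init__(self):
        self._entries = {}  # surface string -> label

    def add(self, surface, label):
        # Called when the annotator marks a passage manually.
        self._entries[surface] = label

    def remove(self, surface):
        # Called when the annotator drops an entry via the context menu.
        self._entries.pop(surface, None)

    def propose(self, text):
        """Propose a label for every whole-word occurrence of an entry."""
        proposals = []
        for surface, label in self._entries.items():
            # Lookarounds prevent matches inside longer words.
            pattern = r"(?<!\w)" + re.escape(surface) + r"(?!\w)"
            for m in re.finditer(pattern, text):
                proposals.append(
                    Annotation(m.start(), m.end(), label, "dictionary"))
        return sorted(proposals, key=lambda a: a.start)
```

In this sketch a proposal is just a candidate annotation. Confirming or rejecting it, as the annotator does in the GUI, would be a separate step that is not modelled here.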

Both the training of the machine learning components and their annotation of new text can be initiated via the WALU GUI. The annotation results can be displayed and manually corrected. Automatic annotations are displayed in a distinct way (only the lower half of each annotated token is highlighted) so that the user can spot them immediately. WALU is designed both as a part of the WIKINGER infrastructure and as a stand-alone tool. Web-service-based communication facilities allow WALU to load documents from the WIKINGER document repository and to load and store the corresponding annotations from and to the metadata repository. As a stand-alone tool, WALU can currently import text documents (other import formats will be supported later) and export annotated documents in a straightforward XML standoff format. Conversion between the different data formats is done via a special internal format we call 'WaRP (WALU Rich Paragraph) stream', which is also processed by the automatic annotation components.
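To make the idea of standoff export concrete: in a standoff format the text is stored unchanged and the annotations refer to it by character offsets instead of being embedded as inline tags. The following Python sketch shows one possible layout. The element and attribute names (`document`, `text`, `annotations`, `annotation`, `start`, `end`, `label`, `source`) are hypothetical and do not describe WALU's actual export schema or the WaRP stream.

```python
import xml.etree.ElementTree as ET


def export_standoff(text, annotations):
    """Serialize text plus offset-based annotations as standoff XML.

    `annotations` is an iterable of (start, end, label, source) tuples,
    where start is inclusive and end is exclusive. The schema is an
    assumption made for this illustration.
    """
    root = ET.Element("document")
    # The source text is kept verbatim, without inline markup.
    ET.SubElement(root, "text").text = text
    container = ET.SubElement(root, "annotations")
    for start, end, label, source in annotations:
        ET.SubElement(container, "annotation", {
            "start": str(start),
            "end": str(end),
            "label": label,
            "source": source,  # e.g. "manual" or "automatic"
        })
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    sample = "Meeting on 27.9.2007."
    print(export_standoff(sample, [(11, 20, "DATE", "automatic")]))
```

Because the text is stored untouched, manual and automatic annotations over the same passage can coexist, and a correction changes only the annotation entries.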

Availability: In the near future, WALU will be released under a public open-source license.

Contacts: Andreas Wagner, Marc Rössler

References

  • Lars Bröcker, Marc Rössler, Andreas Wagner. "Knowledge Capturing Tools for Domain Experts". In: SAAKM 2007 - Semantic Authoring, Annotation and Knowledge Markup Workshop. Co-located with the 4th International Conference on Knowledge Capture (K-Cap 2007), Whistler, British Columbia, Canada, October 28-31, 2007. [to appear]

  • M. Rössler, A. Wagner, F. Jungermann, W. Hoeppner. "Applying WALU to Annotate Named Entities in Italian". Shared task contribution. In: Intelligenza Artificiale - Periodico trimestrale dell' Associazione Italiana per l' Intelligenza Artificiale (http://ia.di.uniba.it/), Proceedings of EVALITA 2007 - Evaluation of NLP Tools for Italian, workshop organized in conjunction with AI*IA 2007 (http://aiia.info.uniroma2.it/), Roma, Italy, September 10-13, 2007.

  • Lars Bröcker, Stefan Paal, Andreas Burtscheidt, Bernhard Frings, Marc Rössler, Andreas Wagner, Wolfgang Hoeppner. "WIKINGER - Wiki Next Generation Enhanced Repositories". German e-Science Conference 2007. Baden-Baden, Germany, 2007.

  • Andreas Wagner and Marc Rössler. "WALU – Eine Annotations- und Lern-Umgebung für semantisches Tagging". GLDV-Frühjahrstagung, Tübingen, 2007.

Last modified: Thursday, 27.9.2007