WALU (WIKINGER Annotations- und Lern-Umgebung, i.e. WIKINGER annotation and learning environment) is a Java-based tool developed within the project WIKINGER (Wiki Next Generation Enhanced Repository). WIKINGER is part of the BMBF-funded E-Science initiative and aims at developing semantic Wikis for scientific communities. In order to set up and maintain the network of domain-relevant concepts and relations underlying such a Wiki, tools for Information Extraction are applied to large amounts of scientific text. Named Entity Recognition (NER) techniques play a crucial role within this process, since they are used to extract the domain-relevant concepts.
An NER tool applicable within a generic E-Science framework needs to be
adaptive to new (scientific) domains and has to take into account that
only domain experts are able to select and define the semantic
categories and concepts relevant for the particular application. WALU is
designed to meet these requirements by enabling domain experts to create
training data and to control the process of training and
(semi-)automatic semantic markup.
Designing such a tool requires a trade-off between power and
convenience. On the one hand, such a tool has to provide the necessary
functionalities, i.e. manual annotation of documents, configuration and
initiation of the training process, application of automatic annotation
components, as well as inspection and correction of the resulting
annotations. On the other hand, intuitive interfaces and convenient
facilities supporting these functionalities while encapsulating their
complexity are crucial to ensure usability for professionals of any
domain. In addition, this tool has to be integrated into the overall
WIKINGER infrastructure. To our knowledge, no other tool currently
meets all these requirements. Therefore, we are developing WALU.
Fig.: Screenshot of the GUI for manual annotation
WALU supports manual annotation with an easy-to-use GUI. It offers
convenient navigation through the annotations and simple but effective
annotation support, such as the automatic adjustment of markup
boundaries or a dynamic markup dictionary. This dictionary is created
during the annotation process and is used to propose markup labels for
text passages corresponding to dictionary entries. Using a
context-sensitive menu, the annotator confirms or rejects these
proposals and/or removes the entry from the dictionary. In our
experience the immediate feedback of the dynamic markup dictionary also
helps the domain experts to clarify the task of string-based
identification of domain-relevant concepts. In addition, WALU provides
an automatic annotator, based on regular expressions, for strings
referring to the category DATE. This is a simple prototype of a series
of automatic mechanisms that will be used to annotate all the available
documents. Except for a few annotators that use regular expressions to
classify entities with unique patterns (such as email addresses and
URLs), most of these automatic annotators are based on machine learning
algorithms that will be accessible via WALU.
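The dynamic markup dictionary described above can be illustrated with a small sketch. This is not WALU's implementation: the class `MarkupDictionary`, its methods `learn`, `forget` and `propose`, and the exact-substring matching strategy are assumptions made purely for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of a dynamic markup dictionary (not WALU's actual code). */
class MarkupDictionary {

    /** A proposed label for the character span [start, end) of a text. */
    static final class Proposal {
        final int start;
        final int end;
        final String label;

        Proposal(int start, int end, String label) {
            this.start = start;
            this.end = end;
            this.label = label;
        }
    }

    // Maps an annotated surface string to the label the annotator assigned.
    private final Map<String, String> entries = new LinkedHashMap<>();

    /** Records a manual annotation so that it can be proposed later. */
    void learn(String surface, String label) {
        if (surface.isEmpty()) {
            return; // an empty string would match everywhere
        }
        entries.put(surface, label);
    }

    /** Removes an entry, e.g. when the annotator discards it via the menu. */
    void forget(String surface) {
        entries.remove(surface);
    }

    /** Proposes a label for every exact occurrence of a dictionary entry. */
    List<Proposal> propose(String text) {
        List<Proposal> proposals = new ArrayList<>();
        for (Map.Entry<String, String> entry : entries.entrySet()) {
            String surface = entry.getKey();
            int from = text.indexOf(surface);
            while (from >= 0) {
                int end = from + surface.length();
                proposals.add(new Proposal(from, end, entry.getValue()));
                from = text.indexOf(surface, end);
            }
        }
        return proposals;
    }
}
```

In this sketch, each manual annotation would call `learn`, and the GUI would offer the result of `propose` for confirmation or rejection. A real implementation would presumably also respect token boundaries and handle overlapping entries, which the sketch ignores.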
Training the machine learning facilities as well as their annotation of
new text can be initiated via the WALU GUI. The annotation results can
be displayed and manually corrected. Automatic annotations are displayed
in a distinct way (only the lower half of each annotated token is
highlighted) so that the user can spot them immediately.
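As a rough illustration of how a regular-expression annotator might produce annotations that remain distinguishable from manual ones, consider the following sketch. The class `RegexAnnotator`, the `automatic` flag, and the date pattern (day.month.year, as in 28.10.2007) are hypothetical; the patterns and data structures WALU actually uses are not described here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch of a regex-based annotator (not WALU's actual code). */
class RegexAnnotator {

    /** Assumed example pattern for dates written as day.month.year. */
    static final String DATE_REGEX = "\\b\\d{1,2}\\.\\d{1,2}\\.\\d{4}\\b";

    /** An annotation over the character span [start, end) of a text. */
    static final class Annotation {
        final int start;
        final int end;
        final String category;
        final boolean automatic; // lets a GUI render automatic markup differently

        Annotation(int start, int end, String category, boolean automatic) {
            this.start = start;
            this.end = end;
            this.category = category;
            this.automatic = automatic;
        }
    }

    private final String category;
    private final Pattern pattern;

    RegexAnnotator(String category, String regex) {
        this.category = category;
        this.pattern = Pattern.compile(regex);
    }

    /** Returns one automatic annotation per non-overlapping pattern match. */
    List<Annotation> annotate(String text) {
        List<Annotation> result = new ArrayList<>();
        Matcher matcher = pattern.matcher(text);
        while (matcher.find()) {
            result.add(new Annotation(matcher.start(), matcher.end(), category, true));
        }
        return result;
    }
}
```

A caller could create `new RegexAnnotator("DATE", RegexAnnotator.DATE_REGEX)` and pass the resulting annotations to the display layer, which would check the `automatic` flag to decide how to highlight each token.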
WALU is designed both as a part of the WIKINGER infrastructure and as a
stand-alone tool. Web-service-based communication facilities allow WALU
to load documents from the WIKINGER document repository and load/store
corresponding annotations from/to the metadata repository. As a
stand-alone tool, WALU can currently import text documents (other import
formats will be supported later) and export annotated documents in a
straightforward XML standoff format. Conversion between the different
data formats is achieved via a special internal format we call ‘WaRP
(WALU Rich Paragraph) stream’, which is also processed by the automatic
annotation components.
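In a standoff format, the text itself stays unmodified and annotations refer to it by position. The sketch below shows the general idea only. WALU's actual export schema is not specified here, so the element and attribute names (`document`, `text`, `annotations`, `annotation`, `start`, `end`, `category`) and the class `StandoffExporter` are invented for illustration.

```java
import java.util.List;

/** Hypothetical sketch of an XML standoff export (not WALU's actual format). */
class StandoffExporter {

    /** An annotation over the character span [start, end) of the raw text. */
    static final class Annotation {
        final int start;
        final int end;
        final String category;

        Annotation(int start, int end, String category) {
            this.start = start;
            this.end = end;
            this.category = category;
        }
    }

    /** Escapes the characters that are special in XML text and attributes. */
    private static String escape(String s) {
        return s.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;");
    }

    /**
     * Serializes the text once and lists the annotations separately.
     * Offsets refer to the original, unescaped text.
     */
    static String export(String text, List<Annotation> annotations) {
        StringBuilder xml = new StringBuilder();
        xml.append("<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n");
        xml.append("<document>\n");
        xml.append("  <text>").append(escape(text)).append("</text>\n");
        xml.append("  <annotations>\n");
        for (Annotation a : annotations) {
            xml.append("    <annotation start=\"").append(a.start)
               .append("\" end=\"").append(a.end)
               .append("\" category=\"").append(escape(a.category))
               .append("\"/>\n");
        }
        xml.append("  </annotations>\n");
        xml.append("</document>\n");
        return xml.toString();
    }
}
```

Keeping the annotations apart from the text means that several annotation layers, manual and automatic, can refer to the same document without interfering with each other.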
Availability: In the near future, WALU will be released under a public open-source license.
Lars Bröcker, Marc Rössler, Andreas Wagner. "Knowledge Capturing Tools for Domain Experts". In: SAAKM 2007 - Semantic Authoring, Annotation and Knowledge Markup Workshop. Co-located with the 4th International Conference on Knowledge Capture (K-Cap 2007), Whistler, British Columbia, Canada, October 28-31, 2007. [to appear]
M. Rössler, A. Wagner, F. Jungermann, W. Hoeppner. "Applying WALU to Annotate Named Entities in Italian". Shared task contribution. In: Intelligenza Artificiale - Periodico trimestrale dell' Associazione Italiana per l' Intelligenza Artificiale (http://ia.di.uniba.it/), Proceedings of EVALITA 2007 - Evaluation of NLP Tools for Italian, workshop organized in conjunction with AI*IA 2007 (http://aiia.info.uniroma2.it/), Roma, Italy, September 10-13, 2007.
Lars Bröcker, Stefan Paal, Andreas Burtscheidt, Bernhard Frings, Marc Rössler, Andreas Wagner, Wolfgang Hoeppner. "WIKINGER - Wiki Next Generation Enhanced Repositories". German e-Science Conference 2007. Baden-Baden, Germany, 2007.
Andreas Wagner and Marc Rössler. "WALU – Eine Annotations- und Lern-Umgebung für semantisches Tagging". GLDV-Frühjahrstagung, Tübingen, 2007.