
Checkers

Checkers is a project of the CatchPlus program in cooperation with the Public Service for Cultural Heritage (RCE) and Naturalis, and is funded by the Netherlands Organization for Scientific Research (NWO), the Ministry of Education, Culture and Science and the Ministry of Economic Affairs.

The goal of CatchPlus is to further develop functionalities created by various universities in the Catch (Continuous Access to Cultural Heritage) program, and to turn them into applications that can be put to practical use. CatchPlus is therefore about valorization. In the Checkers subproject, Trezorix builds on functionalities that were originally developed by the University of Tilburg.

The Checkers project consists of two parts: the EntityChecker and the ValueChecker.

The EntityChecker recognizes so-called named entities in free text and extracts them. Examples of named entities are dates in various formats, geographic locations, and expressions that also occur in keyword lists. The EntityChecker can be used as a tool for enriching the metadata of unstructured data such as articles, letters and reports.
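To make the idea of entity extraction more concrete, here is a minimal rule-based sketch. It is not how OpenBoek works: the date formats, the function name `extract_entities` and the idea of treating place names as just another keyword list (a simple gazetteer) are all assumptions made for illustration.

```python
import re
from typing import List, Tuple

# Illustrative date formats only (an assumption, not OpenBoek's rules):
# ISO "2011-03-15", numeric "15-03-2011" / "15/03/2011", and "15 March 2011".
MONTHS = ("January|February|March|April|May|June|July|"
          "August|September|October|November|December")
DATE_PATTERN = re.compile(
    r"\b(?:\d{4}-\d{2}-\d{2}"
    r"|\d{1,2}[-/]\d{1,2}[-/]\d{4}"
    r"|\d{1,2} (?:" + MONTHS + r") \d{4})\b"
)


def extract_entities(text: str, keywords: List[str]) -> List[Tuple[str, str]]:
    """Hypothetical helper: return (entity_type, matched_text) pairs from free text."""
    # Dates: every non-overlapping match of the date pattern, in text order.
    entities = [("DATE", m.group(0)) for m in DATE_PATTERN.finditer(text)]
    # Keyword list terms (e.g. place names): whole-word, case-insensitive matches.
    for term in keywords:
        pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
        for m in pattern.finditer(text):
            entities.append(("KEYWORD", m.group(0)))
    return entities
```

For example, `extract_entities("Letter dated 12 May 1887, sent from Leiden; reply received 1887-06-01.", ["leiden", "naturalis"])` yields two dates and the keyword `Leiden`, which could then be stored as metadata for the letter. A production system would use far richer patterns or a trained model.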

The ValueChecker analyses databases and spreadsheets for inconsistencies and errors in field values. Reference structures can be used in this analysis to obtain more specific results, and datasets that have already been cleaned can also serve as reference material for a better analysis. After the analysis, the ValueChecker generates suggestions for cleansing the data.
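The following sketch illustrates what checking field values against reference material and suggesting corrections can look like. It does not reproduce Tinpute: it assumes the reference structure is a plain list of valid values (for instance taken from an already cleaned dataset) and swaps in simple string similarity from Python's standard `difflib` module. The function name `check_values` and the `cutoff` parameter are hypothetical.

```python
from difflib import get_close_matches
from typing import Dict, List, Optional


def check_values(values: List[str], reference: List[str],
                 cutoff: float = 0.8) -> Dict[str, Optional[str]]:
    """Hypothetical helper: map each value missing from the reference list
    to a suggested replacement, or None when nothing is similar enough."""
    known = set(reference)
    # Lower-cased lookup so formatting variants ("amsterdam ") can be fixed.
    by_lower = {ref.lower(): ref for ref in reference}
    suggestions: Dict[str, Optional[str]] = {}
    for value in values:
        if value in known or value in suggestions:
            continue  # valid value, or already handled
        normalised = " ".join(value.split()).lower()
        if normalised in by_lower:
            suggestions[value] = by_lower[normalised]
            continue
        # Otherwise suggest the most similar reference value, if any.
        close = get_close_matches(value, reference, n=1, cutoff=cutoff)
        suggestions[value] = close[0] if close else None
    return suggestions
```

With the reference list `["Amsterdam", "Rotterdam", "Utrecht"]`, the column values `"amsterdam "` and `"Amsterdm"` both receive the suggestion `"Amsterdam"`, while an unrecognisable value such as `"Xyz"` is flagged without a suggestion and left for manual review.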

The EntityChecker uses the OpenBoek module and the ValueChecker uses the Tinpute module. Both modules were developed by the University of Tilburg. As part of the CatchPlus program, Trezorix makes both the EntityChecker and the ValueChecker available as web services.

See also www.nwo.nl/catch and www.catchplus.nl
