
Google Award for tool that "understands" Dutch texts

Google has awarded a European Digital Humanities Award to a project in which language technologists and computer scientists from Radboud University Nijmegen will work together to create a tool that "understands" Dutch texts. Such a tool is intended to give academics better answers to their questions. Google is awarding a gift of 50,000 US dollars in support of this effort.

Researchers in the humanities are often disappointed by the results they get from a general-purpose search engine. If they type in "Johan van Oldenbarnevelt", for example, the information they are given is much too diverse. If they want to know how Van Oldenbarnevelt died, why he was sentenced to death, or when he was executed, they must dig deeper into the documents suggested by the search engine.

"Extracting" concealed data
It would be considerably more efficient if, rather than having to search using keywords, researchers could access the facts implicitly contained in the source texts. A system that is able to retrieve structured facts from text makes this possible. For example, the Dutch Wikipedia article on Johan van Oldenbarnevelt contains the following fragment [translated from the original]:

"Prince Maurice staged a coup. He disbanded the waardgelders and on August 29 (1618) he had Johan van Oldenbarnevelt and his chief supporters Hugo Grotius, Rombout Hogerbeets and Gilles van Leedenberch arrested on suspicion of high treason." 

This passage contains many facts, but also a lot of surplus information. One of the facts is "Prince Maurice arrested Johan van Oldenbarnevelt for high treason." 

The aim of the project that has received acclaim from Google is to design a tool that will extract the important facts from a Dutch text and store them in a database as an unconjugated, "stripped" phrase: Prince Maurice, arrest, Johan van Oldenbarnevelt, high treason. Enhancing the original text with a database like this would make it easier for researchers to find answers to their questions.
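
As a rough illustration of what such a database entry might look like, the stripped phrase above can be stored as a small record and looked up by field. This is only a minimal sketch: the field names (agent, predicate, patient, reason) and the find_factoids helper are hypothetical and are not taken from the project itself.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Factoid:
    """One stripped fact; the field names are illustrative assumptions."""
    agent: str                     # who performs the action
    predicate: str                 # unconjugated verb
    patient: str                   # who or what the action is done to
    reason: Optional[str] = None   # optional extra role


# The example fact from the Wikipedia fragment, entered by hand.
FACTOIDS = [
    Factoid("Prince Maurice", "arrest", "Johan van Oldenbarnevelt", "high treason"),
]


def find_factoids(factoids: List[Factoid], **criteria: str) -> List[Factoid]:
    """Return the factoids whose fields equal all of the given criteria."""
    return [
        fact for fact in factoids
        if all(getattr(fact, field) == value for field, value in criteria.items())
    ]


# "Who arrested Johan van Oldenbarnevelt, and why?"
matches = find_factoids(FACTOIDS, predicate="arrest", patient="Johan van Oldenbarnevelt")
for fact in matches:
    print(fact.agent, "/", fact.reason)
```

A researcher's question then becomes a lookup on named fields rather than a keyword search through the full text.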

Google helps academics help academics
The tool for extracting facts must be able to identify and label syntactic and semantic roles. Language technologists from the Faculty of Arts at Radboud University and computer scientists from the Faculty of Science at the same university already have some experience in building similar software. The Google European Digital Humanities Award will enable them to design a tool for the Dutch language.

The project "Extracting Dutch Factoids from Text" will be launched in January 2011. The main applicant is Suzan Verberne, researcher at the Centre for Language and Speech Technology / Centre for Language Studies at Radboud University Nijmegen. Verberne is delighted with the funding: "This will allow us to take the first steps towards extracting knowledge from Dutch texts, which will be of huge help to history and literature researchers in their search through source material." Google awards the European Digital Humanities Awards to show its support for finding better search methods for large digitised documents.

Source: Radboud University Nijmegen

 
