Structuring Wikipedia for “a machine that can explain its decision in language”

Satoshi Sekine - RIKEN Center for Advanced Intelligence Project, Japan

The final goal of our project is to build “a machine that can explain its decision in language.” One of the resources needed to achieve this goal is world knowledge that machines can easily handle. Wikipedia is a great source of world knowledge, in particular for named entities. However, it is written for human readers, and for machines to access it, we need to make it well-structured. DBpedia, Freebase and YAGO contain a lot of noise because they are largely categorized and structured by crowds. By contrast, we believe the structure, e.g. categories and attributes, has to be designed in a top-down manner. We adopted the “Extended Named Entity” definition created at NYU, and we are trying to map most Wikipedia entries into that structure. This task is known as Knowledge Base Population (KBP), and the technologies have been improving through shared tasks. However, the fruits of these advances have not been used for resource construction. We conducted the “SHINRA” project under the “Resource by Collaborative Contribution (RbCC)” scheme. We ran a shared task on structuring Japanese Wikipedia for five categories, which also aims to create a resource based on the outputs of the participating systems. The SHINRA-2018 project started in December 2017 and concluded its first trial in September 2018. The tasks in SHINRA-2019 will include a categorization task for nine languages as well as a structuring task for 39 categories of Japanese Wikipedia.

Satoshi Sekine

Dr. Satoshi Sekine is a team leader at AIP, Japan. AIP is a recently established, Japanese-government-sponsored research laboratory focusing on AI and related technologies. He leads the language information access technology team. He received his PhD from New York University and worked at NYU as an associate research professor for more than 20 years. His main interests are information extraction, named entities, ontologies, question answering and related areas. According to Google Scholar, his work has received more than 7,000 citations, including his extensive survey paper on named entity technologies. He joined AIP in 2017.