Resources

DeScript (Describing Script Structure)

DeScript is a corpus of event sequence descriptions (ESDs) for different scenarios, crowdsourced via Amazon Mechanical Turk. It covers 40 scenarios with approximately 100 ESDs each. The corpus also includes partial alignments of event descriptions that are semantically similar with respect to the given scenario.

Link to the resource

Reference: Wanzare, L., Zarcone, A., Thater, S. & Pinkal, M. (2016). DeScript: A Crowdsourced Database for the Acquisition of High-quality Script Knowledge. Proceedings of the 10th International Conference on Language Resources and Evaluation (LREC 16), Portorož, Slovenia.

Contact person: Lilian Wanzare

Disco-SPICE (Spoken conversations from the SPICE-Ireland corpus annotated with discourse relations)

The resource contains all texts from the Broadcast interview and Telephone conversation genres from the SPICE-Ireland corpus, annotated with discourse relations according to the PDTB 3.0 and CCR frameworks.

Link to the resource

Reference: Rehbein, I., Scholman, M.C.J. & Demberg, V. (2016). Annotating discourse relations in spoken language: A comparison of the PDTB and CCR frameworks. Proceedings of the 10th International Conference on Language Resources and Evaluation (LREC 16), Portorož, Slovenia.

Contact person: Merel Scholman

InScript (Narrative texts annotated with script information)

The InScript corpus contains a total of 1000 narrative texts crowdsourced via Amazon Mechanical Turk. The texts cover 10 different scenarios describing everyday situations such as taking a bath or baking a cake. The corpus is annotated with script information in the form of scenario-specific event and participant labels. The texts are also annotated with coreference chains linking different mentions of the same entity within a document.

Link to the resource

Reference: Modi, A., Anikina, T., Ostermann, S. & Pinkal, M. (2016). InScript: Narrative texts annotated with script information. Proceedings of the 10th International Conference on Language Resources and Evaluation (LREC 16), Portorož, Slovenia.

Contact person: Simon Ostermann

Modeling Semantic Expectations

This resource contains human predictions of discourse referents (DRs) on the InScript corpus, collected using Amazon Mechanical Turk. For details, please refer to the paper below.

Link to the resource

Reference: Modi, A., Titov, I., Demberg, V., Sayeed, A. & Pinkal, M. (2016). Modeling Semantic Expectation: Using Script Knowledge for Referent Prediction. Transactions of the Association for Computational Linguistics (TACL).

Back-translation Annotated Implicit Discourse Relations

This resource contains annotated implicit discourse relation instances. The instances are annotated automatically via back-translation of parallel corpora. For details, please refer to the reference below.

Link to the resource

Reference: Shi, W., Yung, F., Rubino, R. & Demberg, V. (2017). Using Explicit Discourse Connectives in Translation for Implicit Discourse Relation Classification. Proceedings of the Eighth International Joint Conference on Natural Language Processing.

MCScript

MCScript is a dataset for the task of machine comprehension focussing on commonsense knowledge. Questions were collected based on script scenarios rather than individual texts, which resulted in question–answer pairs that explicitly involve commonsense knowledge. It comprises 13,939 questions on 2,119 narrative texts and covers 110 different everyday scenarios; each text is annotated with one of these scenarios. Each question carries a crowdsourced type label indicating whether it can be answered from the text or whether commonsense knowledge is needed to find an answer.

Link to the resource

Reference: Ostermann, S., Modi, A., Roth, M., Thater, S. & Pinkal, M. (to appear). MCScript: A Novel Dataset for Assessing Machine Comprehension Using Script Knowledge. Proceedings of the 11th International Conference on Language Resources and Evaluation (LREC 2018), Miyazaki, Japan.

Contact persons: Simon Ostermann and Michael Roth