This textual entailment test suite aims to provide developers of Textual Entailment systems with additional test and training datasets.
We followed a previously proposed algorithm (see the papers listed below) that increases the size of a Textual Entailment corpus by using Machine Translation systems to generate additional <t,h> pairs.
We used this algorithm to generate an additional training dataset, starting from RTEx and following a double translation process. We chose Spanish as the intermediate language and Microsoft Bing Translator as the only Machine Translation system in this process. Additional datasets will be provided soon.
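The double translation process described above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the `translate` callable is a hypothetical stand-in for whatever Machine Translation client is used (no real Bing Translator API is called here), and the sketch assumes that both the text and the hypothesis are round-tripped through the intermediate language and that the entailment label is carried over unchanged.

```python
from typing import Callable, Iterable, List, Tuple

# (text, hypothesis, label) triple; the layout is an assumption for this sketch.
Pair = Tuple[str, str, str]

# Hypothetical translator signature: (sentence, source_lang, target_lang) -> str.
Translator = Callable[[str, str, str], str]


def round_trip(sentence: str, translate: Translator, pivot: str = "es") -> str:
    """Translate English -> pivot language -> English (double translation)."""
    intermediate = translate(sentence, "en", pivot)
    return translate(intermediate, pivot, "en")


def expand_pairs(pairs: Iterable[Pair], translate: Translator,
                 pivot: str = "es") -> List[Pair]:
    """Build one additional <t,h> pair per original pair.

    Assumption: the label of the original pair is kept for the new pair.
    """
    expanded = []
    for text, hypothesis, label in pairs:
        new_text = round_trip(text, translate, pivot)
        new_hypothesis = round_trip(hypothesis, translate, pivot)
        expanded.append((new_text, new_hypothesis, label))
    return expanded


def fake_translate(sentence: str, source: str, target: str) -> str:
    """Toy translator used only to make the sketch runnable offline."""
    return f"[{source}->{target}] {sentence}"


if __name__ == "__main__":
    original = [("A man is sleeping.", "A person sleeps.", "ENTAILMENT")]
    for pair in expand_pairs(original, fake_translate):
        print(pair)
```

In practice `fake_translate` would be replaced by a function wrapping a real translation service; the rest of the sketch does not depend on which service that is.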
Additionally, we provide a Cross-Lingual Textual Entailment (CLTE) dataset based on the monolingual RTE3 dataset used in the Third PASCAL Recognizing Textual Entailment Challenge. The texts (T) are written in English and the hypotheses (H) are written in Spanish. The procedure used to generate this dataset is described in the paper on the bilingual corpus listed below.
Several datasets are provided; a description of their contents can be found below:
These datasets are marked for a 3-way decision in terms of entailment: "ENTAILMENT", "CONTRADICTION" and "UNKNOWN" (same format as RTE5 PASCAL).
3-Way based on RTE Stanford datasets:
These datasets are marked for a 2-way decision in terms of entailment: "ENTAILMENT" and "NO ENTAILMENT".
2-Way based on RTE TAC datasets: These datasets were converted to a 2-way task by treating contradiction and unknown pairs as "NO ENTAILMENT".
- English to Spanish CLTE datasets:
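As an aside on the 2-way datasets described above, the conversion from 3-way to 2-way labels can be sketched as follows. This is only an illustration under stated assumptions: the function names are hypothetical, and the XML layout (a `pair` element carrying an `entailment` attribute) is assumed from the usual RTE distribution format rather than confirmed for these particular files.

```python
import xml.etree.ElementTree as ET


def to_two_way(label: str) -> str:
    """Map a 3-way label to a 2-way label.

    CONTRADICTION and UNKNOWN both collapse to "NO ENTAILMENT".
    """
    normalized = label.strip().upper()
    if normalized == "ENTAILMENT":
        return "ENTAILMENT"
    if normalized in {"CONTRADICTION", "UNKNOWN"}:
        return "NO ENTAILMENT"
    raise ValueError(f"Unexpected 3-way label: {label!r}")


def convert_corpus(xml_text: str) -> str:
    """Rewrite the `entailment` attribute of every <pair> element.

    Assumption: pairs look like <pair entailment="..."><t>..</t><h>..</h></pair>.
    """
    root = ET.fromstring(xml_text)
    for pair in root.iter("pair"):
        pair.set("entailment", to_two_way(pair.get("entailment", "")))
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    sample = (
        '<entailment-corpus>'
        '<pair id="1" entailment="CONTRADICTION"><t>a</t><h>b</h></pair>'
        '</entailment-corpus>'
    )
    print(convert_corpus(sample))
```

Raising an error on unknown labels, rather than silently mapping them, makes it easier to spot files that do not follow the assumed labeling.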
This test suite may be downloaded and used without restriction. An acknowledgement would be appreciated if you publish results using it, and we would also be interested to hear what performance you obtain.
1. The algorithm to generate the Monolingual corpus and a description can be found in the following paper:
2. The algorithm to generate the Bilingual corpus and a description can be found in the following paper:
J. Castillo, M. Cardenas, "Using Sentence Semantic Similarity Based on WordNet in Recognizing Textual Entailment". 12th Ibero-American Conference on AI, IBERAMIA 2010, Bahía Blanca, Argentina, November 1-5, 2010, Springer LNAI, in press.
J. Castillo, "A Semantic Oriented Approach to Textual Entailment using WordNet-based Measures". 9th Mexican International Conference on Artificial Intelligence, MICAI 2010, Pachuca, Mexico, November 8-13, 2010, Springer LNAI, in press.
J. Castillo, M. Cardenas, "An Approach to Cross-Lingual Textual Entailment using Web Machine Translation Systems". 10th Mexican International Conference on Artificial Intelligence. Polibits, Research journal on computer science and computer engineering with applications, ISSN 1870-9044, Issue 44, December 2011.
For questions, please send an e-mail to Julio Castillo (jotacastillo A T gmail.com).