Causal Corpus: A Survey of Event Causality Corpora

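This article surveys how datasets in the field of causal relation extraction are annotated and whether they are openly available. Besides corpora annotated for causality proper, some similar corpora are included as well, so that readers have the flexibility to choose a corpus that fits their goal.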

Field overview

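A causal relation is usually annotated as a (cause, effect, signal) triple: cause and effect denote the cause event and the effect event, and signal is the linguistic trigger of the causal construction, such as because, so, or thus.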
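As a minimal sketch, the triple can be held in a small record type. The class name `CausalRelation`, its field layout, and the example sentence are illustrative assumptions and are not taken from any of the corpora listed in this article.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CausalRelation:
    """One annotated (cause, effect, signal) triple (hypothetical schema)."""
    cause: str                    # text span of the cause event
    effect: str                   # text span of the effect event
    signal: Optional[str] = None  # trigger word; None when no trigger is annotated

    def is_explicit(self) -> bool:
        # A relation counts as explicit when a trigger word was annotated.
        return self.signal is not None


# Made-up example sentence: "The match was cancelled because it rained."
example = CausalRelation(
    cause="it rained",
    effect="The match was cancelled",
    signal="because",
)
print(example.is_explicit())
```

Making `signal` optional leaves room for implicit relations, which some corpora count separately from explicit ones.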

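Note that causal corpora differ widely in how they define both causality and events, which is why there is still no large-scale unified corpus to support open-domain research in this field. How to arrive at a good definition is itself a focus of academic debate.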

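Causal event corpora typically serve as the basis for tasks such as causal event extraction and causal inference, allowing event chains to be analyzed with rule-based, machine learning, or deep learning methods.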
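The simplest of these approaches, a hand-written rule, might look like the sketch below. The single pattern for `because` and the function name `extract_because` are illustrative assumptions; a usable system would need far more patterns and would still miss implicit causality.

```python
import re
from typing import Optional, Tuple

# Hypothetical rule: "<effect> because <cause>", with optional final punctuation.
BECAUSE_PATTERN = re.compile(
    r"^(?P<effect>.+?)\s+because\s+(?P<cause>.+?)[.!?]?$",
    re.IGNORECASE,
)


def extract_because(sentence: str) -> Optional[Tuple[str, str, str]]:
    """Return (cause, effect, signal) if the rule matches, else None."""
    match = BECAUSE_PATTERN.match(sentence.strip())
    if match is None:
        return None
    return match.group("cause"), match.group("effect"), "because"


print(extract_because("The match was cancelled because it rained."))
print(extract_because("It rained."))
```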

Sampling strategy

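The corpora in this article were collected by using domain keywords (such as causal, relation, causality) to obtain a seed set of papers from Google Scholar, then repeatedly expanding the set of relevant documents along the citation links between papers, which finally yields the collection of corpora relevant to the field.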
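This kind of citation-driven expansion can be sketched as a breadth-first traversal. The sketch assumes the citation links are already available as an in-memory dictionary; it does not query Google Scholar, the paper ids are placeholders, and the `max_hops` limit is an added assumption to keep the expansion bounded.

```python
from collections import deque
from typing import Dict, Iterable, List, Set


def snowball(seeds: Iterable[str],
             citations: Dict[str, List[str]],
             max_hops: int = 2) -> Set[str]:
    """Breadth-first expansion from seed papers along citation links."""
    collected: Set[str] = set(seeds)
    queue = deque((paper, 0) for paper in collected)
    while queue:
        paper, hops = queue.popleft()
        if hops >= max_hops:
            continue  # do not expand beyond the hop limit
        for neighbour in citations.get(paper, []):
            if neighbour not in collected:
                collected.add(neighbour)
                queue.append((neighbour, hops + 1))
    return collected


# Toy graph with placeholder paper ids (not real bibliographic data).
toy_citations = {
    "seed_a": ["paper_b", "paper_c"],
    "paper_b": ["paper_d"],
    "paper_d": ["paper_e"],
}
print(sorted(snowball(["seed_a"], toy_citations, max_hops=2)))
```

In practice each newly reached paper would still need a manual relevance check before its own citations are followed.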

arXiv preprints are not included for now; only published papers are counted.

Statistics

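| Name | Year | Size (number of causal relations) | Availability | Notes |
| --- | --- | --- | --- | --- |
| SemEval-2007 task 4 | 2007 | 210 | | ~ |
| The Penn Discourse Treebank 2.0 | 2008 | ~ | | Causality is not annotated in its own right; it is recorded as a subclass of the contingency relation. Covers explicit causality only, and the trigger words are incomplete, so causality cannot be fully expressed and many cases are left unmarked. BECauSE Corpus 2.0 is more complete by comparison. |
| Bethard et al., 2008 | 2008 | - | The link given in the paper is dead | Annotates a small corpus with binary causal labels for events joined by 'and'. |
| SemEval-2010 task 8 | 2010 | 1,331 | | Only one cause-effect pair is annotated per sentence, even when other causal events are present. Entities are not annotated in full; only the head is marked. |
| Richer Event Descriptions | 2014 | 1,147 | | Enriches the annotation of the THYME clinical corpus: adds event coreference annotation, annotates event relations across adjacent sentences, and distinguishes two kinds of causality, 'PRECONDITION' and 'CAUSE'. |
| Causal-TimeBank | 2014 | 298 | | Proposes a broader-coverage, linguistically motivated way to enrich the TimeML corpus with causal relations and trigger words. Events must already be annotated in TimeML, and annotation is based on linguistic features. The guidelines are not precise enough and lean heavily on subjective judgment. |
| The Chinese Discourse TreeBank | 2015 | 261 | | One of only two Chinese corpora found. |
| CaTeRS | 2016 | about 700 | | 320 stories, 1,600 sentences, 2,708 events, 2,715 relations, 13 relation types. Entities are not annotated in full; only the head is marked. Annotates not real-world causality but the causal conclusions a reader can infer from the story. Focuses on script and narrative structure learning. |
| AltLex | 2016 | 9,190 | | Uses PDTB and Wikipedia data with distant supervision to propose a method for automatically building a causally annotated set. At the end of the paper the authors note that they did not validate annotation quality in detail; the data only serves as a component of a classifier to improve final performance. |
| BECauSE Corpus 2.0 | 2017 | 1,803 | | Explicit causality. High agreement with other annotation schemes and complete coverage of linguistic causal constructions. Other relations are annotated in parallel, so the same event pair may carry several relations, and the overlap between relations is discussed. The best corpus found so far. |
| Event StoryLine Corpus | 2017 | 5,519 PLOT_LINK | | Annotates stories. The PLOT_LINK label expresses explanatory relations, that is, information that helps the reader understand the narrative structure of the story. The result looks very similar to causal annotation, but the starting point is different: the aim is to make the coherence or logical connection between events in a (news) story clear. It is a loose causal or temporal relation between events, in which the mention of one event explains or justifies the occurrence of another. |
| HIT-CDTB | ? | 2,138 (explicit) + 1,526 (implicit) | | HIT discourse relation corpus. Uncertain. |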
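Since the purpose of the table is to help pick a corpus that fits a given goal, its rows can be treated as records and filtered. The sketch below copies only the rows whose size is a plain number; the `Corpus` type and the `select` helper are hypothetical names introduced for illustration.

```python
from typing import List, NamedTuple


class Corpus(NamedTuple):
    name: str
    year: int
    causal_relations: int


# Rows copied from the table above (only those with a plain numeric size).
CORPORA = [
    Corpus("SemEval-2007 task 4", 2007, 210),
    Corpus("SemEval-2010 task 8", 2010, 1331),
    Corpus("Richer Event Descriptions", 2014, 1147),
    Corpus("Causal-TimeBank", 2014, 298),
    Corpus("The Chinese Discourse TreeBank", 2015, 261),
    Corpus("AltLex", 2016, 9190),
    Corpus("BECauSE Corpus 2.0", 2017, 1803),
]


def select(corpora: List[Corpus], min_relations: int) -> List[Corpus]:
    """Keep corpora with at least `min_relations` relations, largest first."""
    chosen = [c for c in corpora if c.causal_relations >= min_relations]
    return sorted(chosen, key=lambda c: c.causal_relations, reverse=True)


for corpus in select(CORPORA, min_relations=1000):
    print(corpus.name, corpus.causal_relations)
```

Size alone is a weak criterion; the notes column (explicit versus implicit causality, head-only entities, automatic versus manual annotation) usually matters more for choosing a corpus.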

The detailed analysis of each corpus has not been fully written up yet; readers who need it can contact me by email.

References

  1. Girju R, Nakov P, Nastase V, et al. Semeval-2007 task 04: Classification of semantic relations between nominals[C]//Proceedings of the 4th International Workshop on Semantic Evaluations. Association for Computational Linguistics, 2007: 13-18.
  2. Prasad R, Dinesh N, Lee A, et al. The Penn Discourse TreeBank 2.0[C]//LREC. 2008.
  3. Bethard S, Corvey W J, Klingenstein S, et al. Building a Corpus of Temporal-Causal Structure[C]//LREC. 2008.
  4. Hendrickx I, Kim S N, Kozareva Z, et al. Semeval-2010 task 8: Multi-way classification of semantic relations between pairs of nominals[C]//Proceedings of the Workshop on Semantic Evaluations: Recent Achievements and Future Directions. Association for Computational Linguistics, 2009: 94-99.
  5. O’Gorman T, Wright-Bettner K, Palmer M. Richer Event Description: Integrating event coreference with temporal, causal and bridging annotation[C]//Proceedings of the 2nd Workshop on Computing News Storylines (CNS 2016). 2016: 47-56.
  6. Mirza P, Sprugnoli R, Tonelli S, et al. Annotating causality in the TempEval-3 corpus[C]//EACL 2014 Workshop on Computational Approaches to Causality in Language (CAtoCL). Association for Computational Linguistics, 2014: 10-19.
  7. Zhou Y, Xue N. The Chinese Discourse TreeBank: a Chinese corpus annotated with discourse relations[J]. Language Resources and Evaluation, 2015, 49(2): 397-431.
  8. Mostafazadeh N, Grealish A, Chambers N, et al. CaTeRS: Causal and temporal relation scheme for semantic annotation of event structures[C]//Proceedings of the Fourth Workshop on Events. 2016: 51-61.
  9. Hidey C, McKeown K. Identifying causal relations using parallel Wikipedia articles[C]//Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 2016: 1424-1433.
  10. Dunietz J, Levin L, Carbonell J. The BECauSE corpus 2.0: Annotating causality and overlapping relations[C]//Proceedings of the 11th Linguistic Annotation Workshop. 2017: 95-104.
  11. Caselli T, Vossen P. The event storyline corpus: A new benchmark for causal and temporal relation extraction[C]//Proceedings of the Events and Stories in the News Workshop. 2017: 77-86.
  12. T. N. de Silva, X. Zhibo, Z. Rui, M. Kezhi, Causal relation identification using convolutional neural networks and knowledge based features, World Academy of Science, Engineering and Technology, International Journal of Computer, Electrical, Automation, Control and Information Engineering 11 (6) (2017) 697–702.
  13. C. Kruengkrai, K. Torisawa, C. Hashimoto, J. Kloetzer, J. Oh, M. Tanaka, Improving event causality recognition with multiple background knowledge sources using multi-column convolutional neural networks, in: Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence, February 4-9, 2017, San Francisco, California, USA., 2017, pp. 3466–3473.
  14. J. Dunietz, J. G. Carbonell, L. S. Levin, Deepcx: A transition-based approach for shallow semantic parsing with complex constructional triggers, in: Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, Brussels, Belgium, October 31 - November 4, 2018, 2018, pp. 1691–1701.

License

This article was compiled by ArrogantL and is released under the CC BY-NC-SA 3.0 license. For any questions, please email arrogant262@gmail.com.

Please follow Markdown: License and the sharing licenses of the other references when using, modifying, and redistributing this material.
