Automatic Learning Common Definitional Patterns from Multi-domain Wikipedia Pages

2014 IEEE International Conference on Data Mining Workshop | Pub Date: 2014-12-01 | DOI: 10.1109/ICDMW.2014.107

Jingsong Zhang, Yinglin Wang, Dingyu Yang


Citations: 6

Abstract

Automatic definition extraction has attracted wide interest in the NLP field and in knowledge-based applications. One primary task of definition extraction is mining patterns from definitional sentences. Existing methods for extracting definitional patterns either rely on manual extraction guided by intuition or observation, or aim to mine intricate definitional patterns automatically. The manual approach requires substantial human effort to identify definitional patterns because lexico-syntactic structures are so diverse, and it inevitably performs poorly, especially on cross-domain corpora. The automatic approach mainly targets precision in definition extraction, at the cost of lower recall. Both are therefore unsuitable for cross-domain definition extraction. To address these issues, this paper proposes a method for automatically extracting definitional patterns from multi-domain definitional sentences in Wikipedia. Our method, FIND-SS, is a modification of the FIND-S algorithm and addresses the problems of definition extraction on cross-domain corpora. FIND-SS adopts a "the more similar, the higher the priority" scheme to improve learning performance. It can tolerate some noisy information and does not require any pattern seeds for pattern learning. The experimental results indicate that our approach is significantly superior to the previous method.
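The abstract does not spell out how FIND-SS works, so the following is only a rough sketch of the general idea it names: FIND-S-style generalization (start from specific examples and relax only what disagrees) combined with a "most similar first" merge order. It is not the authors' algorithm. The `TERM` placeholder, the `*` wildcard, the similarity threshold, and the use of Python's `difflib.SequenceMatcher` for token alignment are all assumptions made for illustration.

```python
from difflib import SequenceMatcher

WILDCARD = "*"  # hypothetical slot marker, not taken from the paper


def similarity(a, b):
    """Token-level similarity in [0, 1] between two token lists."""
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def _push(pattern, token):
    """Append a token, collapsing consecutive wildcards into one."""
    if token == WILDCARD and pattern and pattern[-1] == WILDCARD:
        return
    pattern.append(token)


def generalize(a, b):
    """FIND-S-style step: keep tokens both sequences share (in order)
    and replace every stretch where they differ with a wildcard."""
    pattern = []
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    for tag, i1, i2, _j1, _j2 in matcher.get_opcodes():
        if tag == "equal":
            for token in a[i1:i2]:
                _push(pattern, token)
        else:
            _push(pattern, WILDCARD)
    return pattern


def learn_patterns(sentences, threshold=0.4):
    """Greedily merge the most similar pair first, stopping once the
    best remaining pair falls below the (assumed) threshold."""
    patterns = [s.split() for s in sentences]
    while len(patterns) > 1:
        score, i, j = max(
            (similarity(patterns[i], patterns[j]), i, j)
            for i in range(len(patterns))
            for j in range(i + 1, len(patterns))
        )
        if score < threshold:
            break
        merged = generalize(patterns[i], patterns[j])
        patterns = [p for k, p in enumerate(patterns) if k not in (i, j)]
        patterns.append(merged)
    return [" ".join(p) for p in patterns]


if __name__ == "__main__":
    examples = [
        "TERM is a programming language that supports objects",
        "TERM is a citrus fruit that grows on trees",
        "TERM refers to the study of stars",
    ]
    for pattern in learn_patterns(examples):
        print(pattern)
```

In this toy run the two "is a ... that ..." sentences are merged into one shared pattern, while the dissimilar third sentence is left unmerged, which loosely mirrors how a similarity-first scheme could keep unrelated or noisy sentences from over-generalizing a pattern. No seed patterns are supplied; everything is derived from the example sentences.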

