A Practical Approach for Term and Relationship Extraction for Automatic Ontology Creation from Agricultural Text
Neha Kaushik, N. Chatterjee
2016 International Conference on Information Technology (ICIT), published 2016-12-01
DOI: 10.1109/ICIT.2016.056 (https://doi.org/10.1109/ICIT.2016.056)
Large amounts of data are created and stored in electronic media, and agriculture is no exception. Large volumes of unprocessed text are available on various government and other websites. Despite its volume and availability, this data is underutilized. It should be converted into a form that facilitates better information dissemination, and an ontology is an efficient medium for this task. This paper presents a simple and practical approach for automatic term and relationship extraction. The term extraction scheme uses domain-specific patterns to identify seed terms in the crops subdomain of agriculture. Subsequently, NLP techniques are used to expand the collection of terms. The term extraction scheme outperforms Termine, a software tool for term extraction. The relationship extraction scheme employs patterns, position vectors and WordNet similarity to identify four types of relations in agricultural text pertaining to crops. The relationship extraction scheme is evaluated using 10-fold cross-validation. It achieves an average precision of 88% on training data and 87% on test data. The resulting ontology is encouraging for future work.
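To make the idea of pattern-based seed term extraction more concrete, the following is a minimal sketch and not the authors' implementation. The abstract does not list the domain-specific patterns, so the three regular expressions, the function names `extract_seed_terms` and `precision`, and the sample sentences are all hypothetical. The sketch also shows how precision, the metric reported above, could be computed against a hand-labelled term set.

```python
import re

# Hypothetical lexical patterns for the crops subdomain. The paper's
# actual domain-specific patterns are not given in the abstract.
SEED_PATTERNS = [
    re.compile(r"\bcultivation of (\w+)", re.IGNORECASE),
    re.compile(r"\bvarieties of (\w+)", re.IGNORECASE),
    re.compile(r"\b(\w+) is grown\b", re.IGNORECASE),
]


def extract_seed_terms(text):
    """Return the set of lowercased candidate terms captured by any pattern."""
    terms = set()
    for pattern in SEED_PATTERNS:
        for match in pattern.finditer(text):
            terms.add(match.group(1).lower())
    return terms


def precision(extracted, gold):
    """Fraction of extracted terms that also appear in the gold set."""
    if not extracted:
        return 0.0
    return len(extracted & gold) / len(extracted)


# Illustrative input only; not taken from the paper's corpus.
sample = (
    "Cultivation of wheat requires cool weather. "
    "Rice is grown in flooded fields. "
    "Several varieties of maize are available."
)
seeds = extract_seed_terms(sample)
print(sorted(seeds))
print(precision(seeds, {"wheat", "rice"}))
```

In a fuller pipeline, seeds found this way would be the starting point for the NLP-based expansion step the abstract mentions, and precision would be averaged over the folds of a cross-validation split rather than computed on a single sample.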