Jogeswar Tripathy, Rasmita Dash, B. K. Pattanayak, Bibhuranjan Mohanty
2021 1st Odisha International Conference on Electrical Power Engineering, Communication and Computing Technology (ODICON), 2021-01-08
DOI: 10.1109/ODICON50556.2021.9429014 (https://doi.org/10.1109/ODICON50556.2021.9429014)
Automated Phrase Mining Using POST: The Best Approach
Phrase mining is the process of extracting phrases from a collection of texts. Its applications include information retrieval and extraction, taxonomy construction, and topic modeling. Existing approaches rely on a trained linguistic analyzer and perform poorly in new domains because they need human experts to label phrases. The phrases these systems generate for a given input text contain only a single word, which may often be ambiguous to the user. The aim of this work is to automate the phrase mining process and improve its performance. The proposed method is a framework that requires minimal human labeling effort and only shallow linguistic analysis. A POS tagger extracts the important words (nouns and noun phrases) from a text, after which text ranking is applied. Cosine similarity is then used to identify the quality phrases in the text. Phrase quality is estimated at two stages: once after POS-guided segmentation, and again at the end, when the score is re-estimated. Compared to the existing method, the proposed method shows a significant improvement in effectiveness and efficiency across different domains. The technique can be extended to any language, provided a general knowledge base (e.g. Wikipedia) with a comparable vocabulary is available.
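
The abstract gives no implementation details, so the following is only a minimal sketch of how the POS-guided extraction and cosine-similarity steps might fit together, not the authors' method. It assumes the input is already tagged with Penn Treebank-style tags (hand-written here to avoid depending on a tagger library), treats maximal adjective/noun runs as candidate phrases, and scores each candidate by the cosine similarity between its bag-of-words vector and that of the whole document. The function names, the tag sets, and the scoring scheme are all illustrative assumptions; the text-ranking step and the two-stage re-estimation are not reproduced.

```python
import math
from collections import Counter

# Assumption: Penn Treebank-style tags; nouns may be preceded by adjectives.
NOUN_TAGS = {"NN", "NNS", "NNP", "NNPS"}
MODIFIER_TAGS = {"JJ"}


def candidate_phrases(tagged):
    """Return maximal adjective/noun runs that end in a noun, as token tuples."""
    phrases, run = [], []
    # A sentinel with a non-phrase tag flushes the final run.
    for token, tag in list(tagged) + [("", "END")]:
        if tag in NOUN_TAGS or tag in MODIFIER_TAGS:
            run.append((token.lower(), tag))
            continue
        # Drop trailing adjectives so every candidate ends in a noun.
        while run and run[-1][1] not in NOUN_TAGS:
            run.pop()
        if run:
            phrases.append(tuple(tok for tok, _ in run))
        run = []
    return phrases


def cosine(a, b):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def rank_phrases(tagged):
    """Score each distinct candidate against the document vector, best first."""
    doc_vec = Counter(
        tok.lower() for tok, tag in tagged if tag in NOUN_TAGS | MODIFIER_TAGS
    )
    scored = {p: cosine(Counter(p), doc_vec) for p in set(candidate_phrases(tagged))}
    # Sort by descending score, then alphabetically for a stable order.
    return sorted(scored.items(), key=lambda kv: (-kv[1], kv[0]))


# Hand-tagged toy sentence (hypothetical input, not data from the paper).
tagged = [
    ("Phrase", "NN"), ("mining", "NN"), ("supports", "VBZ"),
    ("topic", "NN"), ("modeling", "NN"), ("and", "CC"),
    ("automatic", "JJ"), ("phrase", "NN"), ("mining", "NN"),
]

for phrase, score in rank_phrases(tagged):
    print(" ".join(phrase), round(score, 3))
```

In this toy input, phrases whose words recur across the document score higher than phrases that appear once, which is the intuition behind using similarity to the document as a quality signal. A real system would replace the hand-tagged input with the output of an actual POS tagger and a richer vector representation.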