{"title":"复杂形态语言的生物医学文本检索","authors":"S. Schulz, Martin Honeck, U. Hahn","doi":"10.3115/1118149.1118158","DOIUrl":null,"url":null,"abstract":"Document retrieval in languages with a rich and complex morphology - particularly in terms of derivation and (single-word) composition - suffers from serious performance degradation with the stemming-only query-term-to-text-word matching paradigm. We propose an alternative approach in which morphologically complex word forms are segmented into relevant subwords (such as stems, named entities, acronyms), and subwords constitute the basic unit for indexing and retrieval. We evaluate our approach on a large biomedical document collection.","PeriodicalId":339993,"journal":{"name":"ACL Workshop on Natural Language Processing in the Biomedical Domain","volume":"17 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2002-07-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"22","resultStr":"{\"title\":\"Biomedical text retrieval in languages with a complex morphology\",\"authors\":\"S. Schulz, Martin Honeck, U. Hahn\",\"doi\":\"10.3115/1118149.1118158\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Document retrieval in languages with a rich and complex morphology - particularly in terms of derivation and (single-word) composition - suffers from serious performance degradation with the stemming-only query-term-to-text-word matching paradigm. We propose an alternative approach in which morphologically complex word forms are segmented into relevant subwords (such as stems, named entities, acronyms), and subwords constitute the basic unit for indexing and retrieval. We evaluate our approach on a large biomedical document collection.\",\"PeriodicalId\":339993,\"journal\":{\"name\":\"ACL Workshop on Natural Language Processing in the Biomedical Domain\",\"volume\":\"17 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2002-07-11\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"22\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"ACL Workshop on Natural Language Processing in the Biomedical Domain\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.3115/1118149.1118158\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"ACL Workshop on Natural Language Processing in the Biomedical Domain","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.3115/1118149.1118158","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Biomedical text retrieval in languages with a complex morphology
Document retrieval in languages with a rich and complex morphology - particularly in terms of derivation and (single-word) composition - suffers from serious performance degradation with the stemming-only query-term-to-text-word matching paradigm. We propose an alternative approach in which morphologically complex word forms are segmented into relevant subwords (such as stems, named entities, acronyms), and subwords constitute the basic unit for indexing and retrieval. We evaluate our approach on a large biomedical document collection.