Towards Answering How do I Questions Using Classification
Kelvin Wu, Lei Yu, M. Cutler
21st International Conference on Advanced Information Networking and Applications Workshops (AINAW'07), 2007-05-21
DOI: 10.1109/AINAW.2007.356 (https://doi.org/10.1109/AINAW.2007.356)
Citations: 1
Abstract
Interest is growing in open-domain question answering systems that leverage the massive amount of knowledge available on the Web. In this investigation, we address the problem of answering "How do I" questions. Our goal is to use the top results returned by a search engine to extract and present correct answers. Identifying correct answers to such questions is a hard problem that seems to require deep natural language understanding. Fortunately, answers to "How do I" questions are often procedural, typically containing a sequence of successive actions. Learning to label text as procedural or non-procedural is an easier problem, which we attempted to solve by extracting 12 informative features and training classifiers on them. However, the corpus built from the top documents retrieved for a set of "How do I"-equivalent queries turned out to be highly imbalanced. To tackle this issue, we applied sampling techniques with a variety of classification methods, yielding reasonable recall and precision for the minority class of procedural texts.
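The abstract does not say which sampling techniques or classifiers were used, so the following is only a minimal sketch of the general idea, assuming plain random undersampling and random oversampling on a two-class corpus. The function names and the label strings "procedural" and "other" are hypothetical, chosen for illustration. It also shows how precision and recall are computed for the minority class alone, which is the figure of merit the abstract reports.

```python
import random
from collections import Counter


def undersample(rows, labels, minority, seed=0):
    """Randomly drop majority-class rows until both classes have equal size.

    Assumption: binary labels; `minority` names the rarer class.
    """
    rng = random.Random(seed)
    minority_idx = [i for i, y in enumerate(labels) if y == minority]
    majority_idx = [i for i, y in enumerate(labels) if y != minority]
    n_keep = min(len(minority_idx), len(majority_idx))
    kept = sorted(minority_idx + rng.sample(majority_idx, n_keep))
    return [rows[i] for i in kept], [labels[i] for i in kept]


def oversample(rows, labels, minority, seed=0):
    """Randomly duplicate minority-class rows until both classes have equal size.

    Assumption: binary labels with at least one minority example.
    """
    rng = random.Random(seed)
    minority_idx = [i for i, y in enumerate(labels) if y == minority]
    n_majority = len(labels) - len(minority_idx)
    n_extra = max(0, n_majority - len(minority_idx))
    extra = [rng.choice(minority_idx) for _ in range(n_extra)]
    kept = list(range(len(labels))) + extra
    return [rows[i] for i in kept], [labels[i] for i in kept]


def precision_recall(y_true, y_pred, positive):
    """Precision and recall for one class, treated as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Toy corpus: 2 procedural texts among 10 (feature vectors are placeholders).
rows = [[i] for i in range(10)]
labels = ["procedural"] * 2 + ["other"] * 8

_, y_under = undersample(rows, labels, "procedural")
_, y_over = oversample(rows, labels, "procedural")
print(Counter(y_under))  # 2 procedural, 2 other
print(Counter(y_over))   # 8 procedural, 8 other
```

Resampling would be applied to the training split only; the held-out data keeps its original imbalance so that minority-class precision and recall reflect the real distribution.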