Daoud M. Daoud, S. El-Seoud, Fuad Alhosban, Ali Farhat
Published in: 2022 International Conference on Computer and Applications (ICCA), 20 December 2022. DOI: 10.1109/ICCA56443.2022.10039617
Methods for Handling Spontaneous Health Arabic Queries using unsupervised machine learning
The goal of this work is to demonstrate that combining sublanguage analysis with linguistic processing techniques is both essential and feasible for building robust natural language (NL) based systems. We hypothesize that merging accurate language processing with analysis of the sublanguage improves the correctness and resilience of the processing. As a proof of concept, we built an experimental system, HASE, to test this hypothesis. HASE is a search system for Arabic documents in the health and medical domain. To study the sublanguage, we applied unsupervised machine learning techniques to an initial corpus of 40,000 unedited queries. HASE is built on top of SOLR and integrates an Arabic linguistic processing component; responses are generated using an information retrieval (IR) approach. Altibby, described as the largest health content provider, is actively deploying HASE in Jordan. When tested on real, noisy free-text queries, the IR component achieves an F-measure of 90%.
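The abstract does not detail HASE's Arabic linguistic processing component or its evaluation setup. As a hedged illustration only, the following sketch assumes a typical orthographic normalization step for noisy Arabic queries (modeled on the character mappings of Lucene/Solr's Arabic normalization filter: stripping diacritics and tatweel, unifying alef variants, teh marbuta and alef maksura), together with the balanced F-measure (F1) as one plausible reading of the reported "f-measure". The function names `normalize_query` and `f_measure` are hypothetical; this is not the authors' implementation.

```python
import re

# Arabic diacritics (harakat, U+064B..U+0652) plus tatweel (U+0640).
# Assumption: mirrors what Lucene/Solr's Arabic normalizer removes.
_DIACRITICS_AND_TATWEEL = re.compile(r"[\u064B-\u0652\u0640]")

# Orthographic variants commonly merged in Arabic IR (assumed mapping).
_CHAR_MAP = str.maketrans({
    "\u0622": "\u0627",  # alef with madda above -> bare alef
    "\u0623": "\u0627",  # alef with hamza above -> bare alef
    "\u0625": "\u0627",  # alef with hamza below -> bare alef
    "\u0629": "\u0647",  # teh marbuta -> heh
    "\u0649": "\u064A",  # alef maksura -> yeh
})


def normalize_query(text: str) -> str:
    """Hypothetical normalizer for a noisy Arabic free-text query."""
    text = _DIACRITICS_AND_TATWEEL.sub("", text)  # drop vowel marks and elongation
    text = text.translate(_CHAR_MAP)              # unify spelling variants
    return " ".join(text.split())                 # collapse irregular whitespace


def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure (F1): harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

Because both the indexed documents and incoming queries would pass through the same normalizer, a query typed with or without diacritics, hamza, or elongation maps to the same search terms. Note that an F-measure of 90% could arise, for example, from precision and recall both at 0.9; the abstract does not report the individual values.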