{"title":"Research Paper Classification and Recommendation System based-on Fine-Tuning BERT","authors":"Dipto Biswas, Joon-Min Gil","doi":"10.1109/IRI58017.2023.00058","DOIUrl":"https://doi.org/10.1109/IRI58017.2023.00058","url":null,"abstract":"In this paper, we compare the performance of two popular NLP models, pre-train fine-tuned BERT and BiLSTM with combined CNN, in terms of the classification and recommendation tasks of research papers. We conduct the performance evaluation of these two models with research journal benchmark dataset. Performance results show that the pre-train fine-tuned BERT model is superior to CNN-BiLSTM combined model in terms of classification performance.","PeriodicalId":290818,"journal":{"name":"2023 IEEE 24th International Conference on Information Reuse and Integration for Data Science (IRI)","volume":"84 7","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"113932860","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"LeCAR: Leveraging Context for Enhanced Automotive Specification Retrieval","authors":"Kuan-Wei Wu, Tz-Huan Hsu, Yen-Hao Huang, Yi-Shin Chen, Ho-Lung Wang, Bing-Jing Hsieh, Chi-Hung Hsu","doi":"10.1109/IRI58017.2023.00038","DOIUrl":"https://doi.org/10.1109/IRI58017.2023.00038","url":null,"abstract":"In the domain of automotive manufacturing, specification documents represent intricate descriptions detailing every aspect of a product, design, or service. Conventionally, these specifications demand the deployment of expert teams to manually identify crucial data from the extensive documentation. The need to automate the extraction of candidate information from these documents is increasingly pressing in this industry. This research encounters two central challenges: Firstly, the queries for the specifications input by users are typically concise and ambiguous; secondly, not every word in a query carries the same significance. In response to these challenges, we propose LeCAR, which exploits contextual data to clarify query sentences and concentrate the search scope. Our experiments validate that the proposed method outperforms existing techniques that employ pre-trained language models, all without necessitating additional training data.","PeriodicalId":290818,"journal":{"name":"2023 IEEE 24th International Conference on Information Reuse and Integration for Data Science (IRI)","volume":"65 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121862562","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"GraphDAC: A Graph-Analytic Approach to Dynamic Airspace Configuration","authors":"Ke-ke Feng, Dahai Liu, Yongxin Liu, Hong Liu, H. Song","doi":"10.1109/IRI58017.2023.00048","DOIUrl":"https://doi.org/10.1109/IRI58017.2023.00048","url":null,"abstract":"The current National Airspace System (NAS) is reaching capacity due to increased air traffic, and is based on outdated pre-tactical planning. This study proposes a more dynamic airspace configuration (DAC) approach that could increase throughput and accommodate fluctuating traffic, ideal for emergencies. The proposed approach constructs the airspace as a constraints-embedded graph, compresses its dimensions, and applies a spectral clustering-enabled adaptive algorithm to generate collaborative airport groups and evenly distribute workloads among them. Under various traffic conditions, our experiments demonstrate a 50% reduction in workload imbalances. This research could ultimately form the basis for a recommendation system for optimized airspace configuration. Code available at https://github.com/KeFenge2022/GraphDAC.git.","PeriodicalId":290818,"journal":{"name":"2023 IEEE 24th International Conference on Information Reuse and Integration for Data Science (IRI)","volume":"15 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-07-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116475916","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Improving the Reusability of Pre-trained Language Models in Real-world Applications","authors":"Somayeh Ghanbarzadeh, H. Palangi, Yan Huang, R. C. Moreno, Hamed Khanpour","doi":"10.1109/IRI58017.2023.00015","DOIUrl":"https://doi.org/10.1109/IRI58017.2023.00015","url":null,"abstract":"The reusability of state-of-the-art Pre-trained Language Models (PLMs) is often limited by their generalization problem, where their performance drastically decreases when evaluated on examples that differ from the training dataset, known as Out-of-Distribution (OOD)/unseen examples. This limitation arises from PLMs’ reliance on spurious correlations, which work well for frequent example types but not for general examples. To address this issue, we propose a training approach called Mask-tuning, which integrates Masked Language Modeling (MLM) training objectives into the fine-tuning process to enhance PLMs’ generalization. Comprehensive experiments demonstrate that Mask-tuning surpasses current state-of-the-art techniques and enhances PLMs’ generalization on OOD datasets while improving their performance on in-distribution datasets. The findings suggest that Mask-tuning improves the reusability of PLMs on unseen data, making them more practical and effective for real-world applications.","PeriodicalId":290818,"journal":{"name":"2023 IEEE 24th International Conference on Information Reuse and Integration for Data Science (IRI)","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-07-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132987692","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Comparing Open Arabic Named Entity Recognition Tools","authors":"Abdullah Aldumaykhi, Saad Otai, Abdulkareem Alsudais","doi":"10.1109/IRI58017.2023.00016","DOIUrl":"https://doi.org/10.1109/IRI58017.2023.00016","url":null,"abstract":"The main objective of this paper is to compare and evaluate the performances of three open Arabic Named Entity Recognition (NER) tools: CAMeL, Hatmi, and Stanza. We collected a corpus consisting of 30 articles written in Modern Standard Arabic (MSA) and manually annotated all the entities of the person, organization, and location types at the article (document) level. Our results suggest a similarity between Stanza and Hatmi with the latter receiving the highest F1 score for the three entity types. However, CAMeL achieved the highest precision values for names of people and organizations. Following this, we implemented a “merge” method that combined the results from the three tools and a “vote” method that tagged named entities only when two of the three identified them as entities. Our results showed that merging achieved the highest overall F1 scores. Moreover, merging had the highest recall values while voting had the highest precision values for the three entity types. This indicates that merging is more suitable when recall is desired, while voting is optimal when precision is required. Finally, we collected a corpus of 21,635 articles related to COVID-19 and applied the merge and vote methods. 
Our analysis demonstrates the tradeoff between precision and recall for the two methods.","PeriodicalId":290818,"journal":{"name":"2023 IEEE 24th International Conference on Information Reuse and Integration for Data Science (IRI)","volume":"14 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-05-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121107360","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}