{"title":"Machine Learning Offers Opportunities to Advance Library Services","authors":"Samantha Kaplan","doi":"10.18438/eblip30527","DOIUrl":null,"url":null,"abstract":"A Review of:\nWang, Y. (2022). Using machine learning and natural language processing to analyze library chat reference transcripts. Information Technology and Libraries, 41(3). https://doi.org/10.6017/ital.v41i3.14967\nObjective – The study sought to develop a model to predict if library chat questions are reference or non-reference.\nDesign – Supervised machine learning and natural language processing.\nSetting – College of New Jersey academic library.\nSubjects – 8,000 Springshare LibChat transactions collected from 2014 to 2021.\nMethods – The chat logs were downloaded into Excel, cleaned, and individual questions were labelled reference or non-reference by hand. Labelled data were preprocessed to remove nonmeaningful and stop words, and reformatted to lowercase. Data were then stemmed to group words with similar meaning. The feature of question length was then added and data were transformed from text to numeric for text vectorization. Data were then divided into training and testing sets. The Python packages Natural Language Toolkit (NLTK) and scikit-learn were used for analysis, building random forest and gradient boosting models which were evaluated via confusion matrix.\nMain Results – Both models performed very well in precision, recall and accuracy, with the random forest model having better overall results than the gradient boosting model, as well as a more efficient fit time, though slightly longer prediction time.\nConclusion – High volume library chat services could benefit from utilizing machine learning to develop models that inform plugins or chat enhancements to filter chat queries quickly.","PeriodicalId":45227,"journal":{"name":"Evidence Based Library and Information Practice","volume":null,"pages":null},"PeriodicalIF":0.4000,"publicationDate":"2024-06-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Evidence Based Library and Information Practice","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.18438/eblip30527","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q4","JCRName":"INFORMATION SCIENCE & LIBRARY SCIENCE","Score":null,"Total":0}
Abstract
A Review of:
Wang, Y. (2022). Using machine learning and natural language processing to analyze library chat reference transcripts. Information Technology and Libraries, 41(3). https://doi.org/10.6017/ital.v41i3.14967
Objective – The study sought to develop a model to predict whether library chat questions are reference or non-reference.
Design – Supervised machine learning and natural language processing.
Setting – The College of New Jersey academic library.
Subjects – 8,000 Springshare LibChat transactions collected from 2014 to 2021.
Methods – The chat logs were downloaded into Excel and cleaned, and individual questions were labelled by hand as reference or non-reference. The labelled data were preprocessed to remove non-meaningful words and stop words and were converted to lowercase, then stemmed to group words with similar meanings. A question-length feature was added, and the text was vectorized from text into numeric form. The data were then divided into training and testing sets. The Python packages Natural Language Toolkit (NLTK) and scikit-learn were used to build random forest and gradient boosting models, which were evaluated via confusion matrices.
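A minimal sketch of a pipeline like the one described above, using NLTK and scikit-learn, is shown below. The file name, column names, TF-IDF vectorization, and 80/20 split are illustrative assumptions; the original study's exact preprocessing and vectorization choices may differ.

```python
import nltk
import pandas as pd
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer
from scipy.sparse import csr_matrix, hstack
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split

nltk.download("stopwords", quiet=True)

# Load hand-labelled chat questions; the file name and the columns
# "question" and "label" (1 = reference, 0 = non-reference) are assumptions.
df = pd.read_csv("chat_questions.csv")

stemmer = PorterStemmer()
stop_words = set(stopwords.words("english"))

def preprocess(text: str) -> str:
    # Lowercase, keep alphabetic tokens, drop stop words, and stem
    # so that words with similar meanings are grouped together.
    tokens = [w for w in text.lower().split() if w.isalpha() and w not in stop_words]
    return " ".join(stemmer.stem(w) for w in tokens)

df["clean"] = df["question"].astype(str).apply(preprocess)
df["length"] = df["question"].astype(str).str.len()  # added question-length feature

# Vectorize the cleaned text (TF-IDF here is an assumption; the paper only
# says the text was transformed to numeric form) and append the length feature.
vectorizer = TfidfVectorizer()
X_text = vectorizer.fit_transform(df["clean"])
X = hstack([X_text, csr_matrix(df[["length"]].values)]).tocsr()
y = df["label"]

# Divide into training and testing sets (the 80/20 split is an assumption).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Fit the two classifiers compared in the study.
rf = RandomForestClassifier(random_state=42).fit(X_train, y_train)
gb = GradientBoostingClassifier(random_state=42).fit(X_train, y_train)
```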
Main Results – Both models performed very well on precision, recall, and accuracy, with the random forest model achieving better overall results than the gradient boosting model, as well as a shorter fit time, though a slightly longer prediction time.
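Continuing the sketch above, the confusion-matrix evaluation on the held-out test set might look like the following; the printed metrics are whatever the fitted models produce on the illustrative data, not the study's reported results.

```python
from sklearn.metrics import accuracy_score, confusion_matrix, precision_score, recall_score

# Evaluate both fitted models from the sketch above on the held-out test set.
for name, model in [("random forest", rf), ("gradient boosting", gb)]:
    y_pred = model.predict(X_test)
    print(name)
    print(confusion_matrix(y_test, y_pred))  # rows: true class, columns: predicted class
    print("precision:", precision_score(y_test, y_pred))
    print("recall:   ", recall_score(y_test, y_pred))
    print("accuracy: ", accuracy_score(y_test, y_pred))
```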
Conclusion – High-volume library chat services could benefit from using machine learning to develop models that inform plugins or chat enhancements to filter chat queries quickly.