Natural Language Processing for Virtual Reference Analysis

IF 0.4 Q4 INFORMATION SCIENCE & LIBRARY SCIENCE

Evidence Based Library and Information Practice Pub Date : 2022-03-15 DOI:10.18438/eblip30014

Ansh Sharma, Kathryn Barrett, Kirsta Stapelfeldt


Citations: 3

Abstract


Natural Language Processing for Virtual Reference Analysis

Objective – Chat transcript analysis can illuminate user needs by identifying common question topics, but traditional hand coding methods for topic analysis are time-consuming and poorly suited to large datasets. The research team explored the viability of automatic and natural language processing (NLP) strategies to perform rapid topic analysis on a large dataset of transcripts from a consortial chat service.

Methods – The research team developed a toolchain for data processing and analysis, which incorporated targeted searching for query terms using regular expressions and natural language processing using the Python spaCy library for automatic topic analysis. Processed data was exported to Tableau for visualization. Results were compared to hand-coded data to test the accuracy of conclusions.

Results – The processed data provided insights about the volume of chats originating from each participating library, the proportion of chats answered by operator groups for each library, and the percentage of chats answered by different staff types. The data also captured the top referring URLs for the service, course codes and file extensions mentioned, and query hits. Natural language processing revealed that the most common topics were related to citation, subscription databases, and finding full-text articles, which aligns with common question types identified in hand-coded transcripts.

Conclusion – Compared to hand coding, automatic and NLP processing approaches have benefits in terms of the volume of data that can be analyzed and the time frame required for analysis, but they come with a trade-off in accuracy, such as false hits. Therefore, computational approaches should be used to supplement traditional hand coding methods. As NLP becomes more accurate, approaches such as these may widen avenues of insight into virtual reference and patron needs.
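The abstract describes targeted searching with regular expressions and a comparison against hand-coded data, but it does not give the expressions or the comparison procedure. The following is a minimal sketch of what such a step might look like, not the authors' toolchain. The patterns, the course-code format, the extension list, the sample chats, and the function names are all assumptions made for illustration. The spaCy topic-analysis step is not reproduced here.

```python
import re
from collections import Counter

# Hypothetical query-term patterns; the article does not publish its expressions.
QUERY_TERMS = {
    "citation": re.compile(r"\b(?:cite|citing|citation|apa|mla)\b", re.IGNORECASE),
    "full_text": re.compile(r"\bfull[\s-]?text\b", re.IGNORECASE),
}
# Assumed course-code shape: 3-4 capital letters followed by 3-4 digits ("PSY100").
COURSE_CODE = re.compile(r"\b[A-Z]{3,4}\s?\d{3,4}\b")
# Assumed short list of file extensions of interest.
FILE_EXT = re.compile(r"\.(pdf|docx?|pptx?|xlsx?|csv)\b", re.IGNORECASE)


def scan_transcript(text):
    """Return query terms, course codes, and file extensions found in one chat."""
    return {
        "terms": sorted(name for name, pat in QUERY_TERMS.items() if pat.search(text)),
        "course_codes": sorted(set(COURSE_CODE.findall(text))),
        "extensions": sorted({ext.lower() for ext in FILE_EXT.findall(text)}),
    }


def term_counts(transcripts):
    """Count how many transcripts hit each query term."""
    counts = Counter()
    for text in transcripts:
        counts.update(scan_transcript(text)["terms"])
    return counts


def precision(auto_hits, hand_coded):
    """Share of automatic hits that hand coding confirms; None if there are no hits."""
    if not auto_hits:
        return None
    return len(auto_hits & hand_coded) / len(auto_hits)


# Invented sample chats, including one deliberate false hit for "citation".
chats = [
    "Hi, how do I cite a website in APA for PSY100?",
    "I can't open the full-text article, the link is broken: paper.pdf",
    "My friend got a parking citation near the library.",
]

if __name__ == "__main__":
    for chat in chats:
        print(scan_transcript(chat))
    print(term_counts(chats))
    # Automatic "citation" hits are chats 0 and 2; a hand coder would keep only chat 0.
    print(precision({0, 2}, {0}))
```

The third sample chat shows the kind of false hit the Conclusion mentions: the pattern matches the word "citation" even though the chat is not a citation-help question, which is why a check against hand-coded labels remains useful.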


Source journal

Evidence Based Library and Information Practice (Information Science & Library Science)

CiteScore: 0.80

Self-citation rate: 12.50%

Review time: 12 weeks