Scientific reference style using rule-based machine learning

Afrida Helen, Aditya Pradana, Muhammad Afif
Published: 2023-11-30. DOI: 10.26555/ijain.v9i3.1056 (https://doi.org/10.26555/ijain.v9i3.1056)

Abstract

Regular expressions (RegEx) can serve as a supervised-learning technique for defining and searching for specific patterns in text. This work presents a method that uses regular expressions to convert the reference style of academic papers into other styles, depending on the requirements of the target publication or conference. We aimed to detect the distinctive patterns of each reference style with RegEx and compare them against a dataset covering various reference styles. The dataset spans seven reference format classes collected from sources such as academic papers, journals, conference proceedings, and books. RegEx is then used to convert one referencing format into another according to the user's preference. The proposed model achieved an accuracy of 57.26% for book references and 57.56% for journal references. We evaluated the model's output on the dataset using the similarity ratio and Levenshtein distance: it achieved a 97.8% similarity ratio with a Levenshtein distance of 2. The APA style for journal references yielded the best results, although the effectiveness of the extraction function varies with the reference style. For APA style, the model reached a 99.97% similarity ratio with a Levenshtein distance of 1. Overall, the proposed model outperforms baseline machine learning models on this task. The study introduces an automated program that uses regular expressions to modify academic reference formats, which is intended to improve the efficiency, precision, and adaptability of academic publishing.
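To make the idea concrete, the following is a minimal Python sketch of regex-based style conversion and of the two evaluation measures named in the abstract. It is not the authors' implementation. The pattern `APA_JOURNAL`, the function names, the simplified IEEE-like output layout, and the sample reference are all hypothetical, and the use of `difflib.SequenceMatcher` for the similarity ratio is an assumption, since the abstract does not say how the ratio was computed.

```python
import re
from difflib import SequenceMatcher
from typing import Optional

# Hypothetical pattern for a simplified APA-style journal reference:
#   Authors (Year). Title. Journal, Volume(Issue), Pages.
APA_JOURNAL = re.compile(
    r"^(?P<authors>.+?)\s+\((?P<year>\d{4})\)\.\s+"
    r"(?P<title>.+?)\.\s+"
    r"(?P<journal>.+?),\s+"
    r"(?P<volume>\d+)\((?P<issue>\d+)\),\s+"
    r"(?P<pages>\d+[-–]\d+)\.$"
)


def apa_to_ieee_like(reference: str) -> Optional[str]:
    """Rewrite an APA-style journal reference in a simplified IEEE-like layout.

    Returns None when the reference does not match the assumed pattern.
    Author names are copied unchanged; a real converter would reorder them.
    """
    match = APA_JOURNAL.match(reference.strip())
    if match is None:
        return None
    f = match.groupdict()
    return (
        f'{f["authors"]}, "{f["title"]}," {f["journal"]}, '
        f'vol. {f["volume"]}, no. {f["issue"]}, pp. {f["pages"]}, {f["year"]}.'
    )


def levenshtein(a: str, b: str) -> int:
    """Edit distance (insertions, deletions, substitutions) by dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute, free if equal
            ))
        prev = curr
    return prev[-1]


def similarity_ratio(a: str, b: str) -> float:
    """Similarity in [0, 1]; difflib's ratio is an assumed stand-in here."""
    return SequenceMatcher(None, a, b).ratio()


# Made-up reference, used only for illustration.
sample = "Doe, J., & Roe, R. (2020). A sample title. Journal of Examples, 12(3), 45-67."
converted = apa_to_ieee_like(sample)
expected = 'Doe, J., & Roe, R., "A sample title," Journal of Examples, vol. 12, no. 3, pp. 45-67, 2020.'
```

Comparing `converted` against a hand-written `expected` string with `levenshtein` and `similarity_ratio` mirrors the kind of evaluation the abstract describes: a distance of 0 and a ratio of 1.0 mean the conversion reproduced the target format exactly, while small distances indicate minor punctuation or spacing differences.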
Source journal: International Journal of Advances in Intelligent Informatics (subject area: Computer Science, Computer Vision and Pattern Recognition). CiteScore: 3.00.