Scientific document retrieval using structure encoded string with trie indexing

JCR: Q3 (Social Sciences)
Sourish Dhar, S. Roy, Arnab Paul
DOI: 10.3233/isu-220155 (https://doi.org/10.3233/isu-220155)
Journal: Information Services and Use, pp. 241-259
Publication date: 2022-03-29 (journal article)
Citations: 0

Abstract

Retrieving mathematical expressions from scientific documents is a challenging task because mathematical expressions, or formulae, differ markedly from ordinary text: they are highly symbolic and complex. Moreover, the structure of a mathematical formula conveys semantic meaning that cannot be overlooked. This paper proposes a scientific document retrieval system driven by mathematical formula queries. It explores the concept of the Structure Encoded String (SES), which is applied to mathematical expressions to capture the relations among formula structures. A pattern-based trie indexing scheme is proposed for faster retrieval, and Jaro-Winkler similarity is adopted for matching and ranking. Experiments are conducted, and the results are reported using standard evaluation measures and compared with those of similar existing systems.
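The abstract names three building blocks: formulae encoded as strings, a trie index for fast candidate lookup, and Jaro-Winkler similarity for ranking. The paper's actual SES encoding and its pattern-based trie are not described here, so the following is only an illustrative sketch under stated assumptions: formulae are assumed to be already linearised into strings (the LaTeX-like keys are hypothetical stand-ins for real SES encodings), `TrieIndex`, `search` and the `prefix_len` parameter are hypothetical names, and the trie is a plain character trie that pulls candidates sharing a short prefix with the query. Ranking uses the standard Jaro-Winkler formula with the usual defaults (scaling factor p = 0.1, common prefix capped at 4 characters).

```python
# Sketch: candidate retrieval with a plain character trie, ranking with Jaro-Winkler.
# Assumption: formulae are already linearised into strings; the LaTeX-like keys
# below are hypothetical stand-ins for real Structure Encoded Strings (SES).


def jaro(s1: str, s2: str) -> float:
    """Standard Jaro similarity in [0, 1]."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    # Characters match only if equal and no farther apart than this window.
    window = max(max(len1, len2) // 2 - 1, 0)
    used1 = [False] * len1
    used2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        for j in range(max(0, i - window), min(i + window + 1, len2)):
            if not used2[j] and s2[j] == ch:
                used1[i] = used2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Matched characters that appear in a different order count as half-transpositions.
    half_transpositions = 0
    k = 0
    for i in range(len1):
        if used1[i]:
            while not used2[k]:
                k += 1
            if s1[i] != s2[k]:
                half_transpositions += 1
            k += 1
    t = half_transpositions / 2
    return (matches / len1 + matches / len2 + (matches - t) / matches) / 3


def jaro_winkler(s1: str, s2: str, p: float = 0.1, max_prefix: int = 4) -> float:
    """Jaro similarity boosted by the common prefix length (capped at max_prefix)."""
    sim = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1, s2):
        if a != b or prefix == max_prefix:
            break
        prefix += 1
    return sim + prefix * p * (1.0 - sim)


class TrieNode:
    def __init__(self) -> None:
        self.children: dict[str, "TrieNode"] = {}
        self.doc_ids: set[str] = set()  # documents whose key ends at this node


class TrieIndex:
    """Hypothetical character trie mapping encoded formula strings to document IDs."""

    def __init__(self) -> None:
        self.root = TrieNode()

    def insert(self, key: str, doc_id: str) -> None:
        node = self.root
        for ch in key:
            node = node.children.setdefault(ch, TrieNode())
        node.doc_ids.add(doc_id)

    def with_prefix(self, prefix: str) -> list[tuple[str, str]]:
        """Return (key, doc_id) pairs for every stored key that starts with prefix."""
        node = self.root
        for ch in prefix:
            node = node.children.get(ch)
            if node is None:
                return []
        found, stack = [], [(node, prefix)]
        while stack:  # depth-first walk of the subtree below the prefix
            current, key = stack.pop()
            found.extend((key, doc) for doc in current.doc_ids)
            for ch, child in current.children.items():
                stack.append((child, key + ch))
        return found


def search(index: TrieIndex, query: str, prefix_len: int = 2, top_k: int = 5):
    """Fetch candidates sharing the query's first prefix_len characters; rank by Jaro-Winkler."""
    scored = [(jaro_winkler(query, key), key, doc)
              for key, doc in index.with_prefix(query[:prefix_len])]
    scored.sort(key=lambda r: (-r[0], r[1], r[2]))
    return scored[:top_k]


index = TrieIndex()
for key, doc in [("x^{2}+y^{2}", "d1"), ("x^{2}+y^{3}", "d2"),
                 ("a^{2}+b^{2}", "d3"), ("x_{1}+x_{2}", "d4")]:
    index.insert(key, doc)

results = search(index, "x^{2}+y^{2}")
for score, key, doc in results:
    print(f"{doc}  {key}  {score:.4f}")
```

In this sketch the trie only narrows the candidate set, and Jaro-Winkler decides the order; the exact match `d1` ranks first and the near match `d2` second. Note that `a^{2}+b^{2}` is never scored because it fails the prefix filter. That trade-off of recall for speed is a property of this simplified design, and a pattern-based trie such as the one the paper proposes may organise keys differently.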
Source journal: Information Services and Use
Subject area: Social Sciences / Library and Information Sciences
CiteScore: 0.90
Self-citation rate: 0.00%
Articles published: 41
Journal description: Information Services & Use is an information- and information-technology-oriented publication covering a wide range of subjects. International in both audience and authorship, the journal is aimed at leaders in information management and applications, seeking to keep them fully informed of fast-moving developments in fields such as online systems, offline systems, electronic publishing, library automation, education and training, word processing, and telecommunications. These areas are treated not only in general but also in specific contexts; applications to business and scientific fields are sought so that the reader is offered a balanced view.