{"title":"Knowledge models from PDF textbooks","authors":"Isaac Alpizar Chacon, Sergey Sosnovsky","doi":"10.1080/13614568.2021.1889692","DOIUrl":null,"url":null,"abstract":"ABSTRACT Textbooks are educational documents created, structured and formatted by domain experts with the primary purpose to explain the knowledge in the domain to a novice. Authors use their understanding of the domain when structuring and formatting the content of a textbook to facilitate this explanation. As a result, the formatting and structural elements of textbooks carry the elements of domain knowledge implicitly encoded by their authors. Our paper presents an extensible approach towards automated extraction of knowledge models from textbooks and enrichment of their content with additional links (both internal and external). The textbooks themselves essentially become hypertext documents where individual pages are annotated with important concepts in the domain. The evaluation experiments examine several aspects and stages of the approach, including the accuracy of model extraction, the pragmatic quality of extracted models using one of their possible applications— semantic linking of textbooks in the same domain, the accuracy of linking models to external knowledge sources and the effect of integration of multiple textbooks from the same domain. The results indicate high accuracy of model extraction on symbolic, syntactic and structural levels across textbooks and domains, and demonstrate the added value of the extracted models on the semantic level.","PeriodicalId":54386,"journal":{"name":"New Review of Hypermedia and Multimedia","volume":"27 1","pages":"128 - 176"},"PeriodicalIF":1.4000,"publicationDate":"2021-02-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1080/13614568.2021.1889692","citationCount":"10","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"New Review of Hypermedia and Multimedia","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.1080/13614568.2021.1889692","RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"COMPUTER SCIENCE, INFORMATION SYSTEMS","Score":null,"Total":0}
引用次数: 10
Abstract
ABSTRACT Textbooks are educational documents created, structured and formatted by domain experts with the primary purpose to explain the knowledge in the domain to a novice. Authors use their understanding of the domain when structuring and formatting the content of a textbook to facilitate this explanation. As a result, the formatting and structural elements of textbooks carry the elements of domain knowledge implicitly encoded by their authors. Our paper presents an extensible approach towards automated extraction of knowledge models from textbooks and enrichment of their content with additional links (both internal and external). The textbooks themselves essentially become hypertext documents where individual pages are annotated with important concepts in the domain. The evaluation experiments examine several aspects and stages of the approach, including the accuracy of model extraction, the pragmatic quality of extracted models using one of their possible applications— semantic linking of textbooks in the same domain, the accuracy of linking models to external knowledge sources and the effect of integration of multiple textbooks from the same domain. The results indicate high accuracy of model extraction on symbolic, syntactic and structural levels across textbooks and domains, and demonstrate the added value of the extracted models on the semantic level.
期刊介绍:
The New Review of Hypermedia and Multimedia (NRHM) is an interdisciplinary journal providing a focus for research covering practical and theoretical developments in hypermedia, hypertext, and interactive multimedia.