A new methodology for automatic creation of concept maps of Turkish texts
Authors: Merve Bayrak, Deniz Dal
Journal: Language Resources and Evaluation (impact factor 1.7; JCR Q3, Computer Science, Interdisciplinary Applications)
Published: 2024-01-28
DOI: https://doi.org/10.1007/s10579-023-09713-9
Citations: 0
Abstract
Concept maps are two-dimensional visual tools that describe the relationships between concepts belonging to a particular subject. Creating these maps manually poses several problems, especially for text-intensive documents: it requires expertise in the relevant field, visual complexity must be kept low, and separate maps must be integrated. Automatic creation of concept maps is needed to overcome these problems. However, producing a fully automated, human-quality concept map from a document has not yet been achieved satisfactorily. Motivated by this observation, this study develops a new methodology for automatically creating concept maps from Turkish text documents, for the first time in the literature. To this end, a new heuristic algorithm has been developed that uses the Turkish Natural Language Processing software chain and the Graphviz tool to automatically extract concept maps from Turkish texts. The proposed algorithm obtains concepts from the dependencies between Turkish words in sentences. It also uses a new sentence scoring mechanism to determine which sentences are added to the concept map. The algorithm has been applied to a total of 20 data sets in the fields of Turkish Literature, Geography, Science, and Computer Sciences. Its effectiveness has been analyzed with three performance evaluation criteria: precision, recall, and F-score. The findings show that the proposed algorithm is quite effective on Turkish texts containing concepts. The sentence selection algorithm was also observed to produce results close to the average value of the evaluated performance criteria. According to the findings, the concept maps obtained automatically by the proposed algorithm are quite similar to those extracted manually. A limitation of the algorithm is that it depends on a natural language processing tool and therefore requires manual intervention in some cases.
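The three evaluation criteria named in the abstract can be illustrated with a minimal Python sketch. It assumes that an automatically extracted map and a manually built reference map are each represented as a set of (concept, linking phrase, concept) triples and that a triple counts as correct only on an exact match; the abstract does not state which units the authors compared or how matches were decided, so the `Triple` type, the `evaluate` function, and the sample data are all hypothetical.

```python
from typing import Iterable, Tuple

# Assumed representation: (concept, linking phrase, concept).
Triple = Tuple[str, str, str]


def evaluate(extracted: Iterable[Triple],
             reference: Iterable[Triple]) -> Tuple[float, float, float]:
    """Return (precision, recall, F-score) for exact-match triples."""
    extracted_set = set(extracted)
    reference_set = set(reference)

    # Triples found by the algorithm that also appear in the manual map.
    true_positives = len(extracted_set & reference_set)

    precision = true_positives / len(extracted_set) if extracted_set else 0.0
    recall = true_positives / len(reference_set) if reference_set else 0.0

    # F-score is the harmonic mean of precision and recall.
    total = precision + recall
    f_score = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_score


# Made-up example data, not taken from the paper's data sets.
manual_map = [
    ("plant", "produces", "oxygen"),
    ("plant", "needs", "sunlight"),
    ("plant", "absorbs", "water"),
    ("leaf", "contains", "chlorophyll"),
    ("chlorophyll", "captures", "light"),
]
automatic_map = [
    ("plant", "produces", "oxygen"),
    ("plant", "needs", "sunlight"),
    ("leaf", "contains", "chlorophyll"),
    ("root", "stores", "sugar"),
]

precision, recall, f_score = evaluate(automatic_map, manual_map)
print(f"precision={precision:.2f} recall={recall:.2f} F={f_score:.2f}")
```

Here three of the four extracted triples appear in the reference map of five, giving a precision of 0.75 and a recall of 0.60. Exact matching is the strictest choice; an evaluation that tolerates inflectional variants of Turkish words would need a looser comparison.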
About the journal:
Language Resources and Evaluation is the first publication devoted to the acquisition, creation, annotation, and use of language resources, together with methods for evaluation of resources, technologies, and applications.
Language resources include language data and descriptions in machine readable form used to assist and augment language processing applications, such as written or spoken corpora and lexica, multimodal resources, grammars, terminology or domain specific databases and dictionaries, ontologies, multimedia databases, etc., as well as basic software tools for their acquisition, preparation, annotation, management, customization, and use.
Evaluation of language resources concerns assessing the state of the art for a given technology, comparing different approaches to a given problem, assessing the availability of resources and technologies for a given application, benchmarking, and assessing system usability and user satisfaction.