{"title":"开放数据集之间共享概念的揭示","authors":"Miloš Bogdanović, Nataša Veljković, Milena Frtunić Gligorijević, Darko Puflović, Leonid Stoimenov","doi":"10.1016/j.websem.2020.100624","DOIUrl":null,"url":null,"abstract":"<div><p><span>Openness and transparency initiatives are not only milestones of science progress but have also influenced various fields of organization and industry. Under this influence, varieties of government institutions worldwide have published a large number of datasets through open data<span><span> portals. Government data covers diverse subjects and the scale of available data is growing every year. Published data is expected to be both accessible and discoverable. For these purposes, portals take advantage of metadata accompanying datasets. However, a part of metadata is often missing which decreases users’ ability to obtain the desired information. As the scale of published datasets grows, this problem increases. An approach we describe in this paper is focused towards decreasing this problem by implementing knowledge structures and algorithms capable of proposing the best match for the category where an uncategorized dataset should belong to. By doing so, our aim is twofold: enrich datasets metadata by suggesting an appropriate category and increase its visibility and discoverability. Our approach relies on information regarding open datasets provided by users — dataset description contained within dataset tags. Since dataset tags express low consistency due to their origin, in this paper we will present a method of optimizing their usage through means of </span>semantic similarity measures based on </span></span>natural language processing<span> mechanisms. Optimization is performed in terms of reducing the number of distinct tag values used for dataset description. Once optimized, dataset tags are used to reveal shared conceptualization originating from their usage by means of Formal Concept Analysis. We will demonstrate the advantage of our proposal by comparing concept lattices generated using Formal Concept Analysis before and after the optimization process and use generated structure as a knowledge base to categorize uncategorized open datasets. Finally, we will present a categorization mechanism based on the generated knowledge base that takes advantage of semantic similarity measures to propose a category suitable for an uncategorized dataset.</span></p></div>","PeriodicalId":49951,"journal":{"name":"Journal of Web Semantics","volume":null,"pages":null},"PeriodicalIF":2.1000,"publicationDate":"2021-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":"{\"title\":\"On revealing shared conceptualization among open datasets\",\"authors\":\"Miloš Bogdanović, Nataša Veljković, Milena Frtunić Gligorijević, Darko Puflović, Leonid Stoimenov\",\"doi\":\"10.1016/j.websem.2020.100624\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><p><span>Openness and transparency initiatives are not only milestones of science progress but have also influenced various fields of organization and industry. Under this influence, varieties of government institutions worldwide have published a large number of datasets through open data<span><span> portals. Government data covers diverse subjects and the scale of available data is growing every year. Published data is expected to be both accessible and discoverable. For these purposes, portals take advantage of metadata accompanying datasets. However, a part of metadata is often missing which decreases users’ ability to obtain the desired information. As the scale of published datasets grows, this problem increases. An approach we describe in this paper is focused towards decreasing this problem by implementing knowledge structures and algorithms capable of proposing the best match for the category where an uncategorized dataset should belong to. By doing so, our aim is twofold: enrich datasets metadata by suggesting an appropriate category and increase its visibility and discoverability. Our approach relies on information regarding open datasets provided by users — dataset description contained within dataset tags. Since dataset tags express low consistency due to their origin, in this paper we will present a method of optimizing their usage through means of </span>semantic similarity measures based on </span></span>natural language processing<span> mechanisms. Optimization is performed in terms of reducing the number of distinct tag values used for dataset description. Once optimized, dataset tags are used to reveal shared conceptualization originating from their usage by means of Formal Concept Analysis. We will demonstrate the advantage of our proposal by comparing concept lattices generated using Formal Concept Analysis before and after the optimization process and use generated structure as a knowledge base to categorize uncategorized open datasets. Finally, we will present a categorization mechanism based on the generated knowledge base that takes advantage of semantic similarity measures to propose a category suitable for an uncategorized dataset.</span></p></div>\",\"PeriodicalId\":49951,\"journal\":{\"name\":\"Journal of Web Semantics\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":2.1000,\"publicationDate\":\"2021-01-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"2\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Journal of Web Semantics\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S1570826820300573\",\"RegionNum\":3,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q3\",\"JCRName\":\"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Web Semantics","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S1570826820300573","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
On revealing shared conceptualization among open datasets
Openness and transparency initiatives are not only milestones of science progress but have also influenced various fields of organization and industry. Under this influence, varieties of government institutions worldwide have published a large number of datasets through open data portals. Government data covers diverse subjects and the scale of available data is growing every year. Published data is expected to be both accessible and discoverable. For these purposes, portals take advantage of metadata accompanying datasets. However, a part of metadata is often missing which decreases users’ ability to obtain the desired information. As the scale of published datasets grows, this problem increases. An approach we describe in this paper is focused towards decreasing this problem by implementing knowledge structures and algorithms capable of proposing the best match for the category where an uncategorized dataset should belong to. By doing so, our aim is twofold: enrich datasets metadata by suggesting an appropriate category and increase its visibility and discoverability. Our approach relies on information regarding open datasets provided by users — dataset description contained within dataset tags. Since dataset tags express low consistency due to their origin, in this paper we will present a method of optimizing their usage through means of semantic similarity measures based on natural language processing mechanisms. Optimization is performed in terms of reducing the number of distinct tag values used for dataset description. Once optimized, dataset tags are used to reveal shared conceptualization originating from their usage by means of Formal Concept Analysis. We will demonstrate the advantage of our proposal by comparing concept lattices generated using Formal Concept Analysis before and after the optimization process and use generated structure as a knowledge base to categorize uncategorized open datasets. Finally, we will present a categorization mechanism based on the generated knowledge base that takes advantage of semantic similarity measures to propose a category suitable for an uncategorized dataset.
期刊介绍:
The Journal of Web Semantics is an interdisciplinary journal based on research and applications of various subject areas that contribute to the development of a knowledge-intensive and intelligent service Web. These areas include: knowledge technologies, ontology, agents, databases and the semantic grid, obviously disciplines like information retrieval, language technology, human-computer interaction and knowledge discovery are of major relevance as well. All aspects of the Semantic Web development are covered. The publication of large-scale experiments and their analysis is also encouraged to clearly illustrate scenarios and methods that introduce semantics into existing Web interfaces, contents and services. The journal emphasizes the publication of papers that combine theories, methods and experiments from different subject areas in order to deliver innovative semantic methods and applications.