Sandaru Seneviratne, Sergio J. Rodr'iguez M'endez, Xuecheng Zhang, Pouya Ghiasnezhad Omran, K. Taylor, A. Haller
{"title":"TNNT: The Named Entity Recognition Toolkit","authors":"Sandaru Seneviratne, Sergio J. Rodr'iguez M'endez, Xuecheng Zhang, Pouya Ghiasnezhad Omran, K. Taylor, A. Haller","doi":"10.1145/3460210.3493550","DOIUrl":"https://doi.org/10.1145/3460210.3493550","url":null,"abstract":"Extraction of categorised named entities from text is a complex task given the availability of a variety of Named Entity Recognition (NER) models and the unstructured information encoded in different source document formats. Processing the documents to extract text, identifying suitable NER models for a task, and obtaining statistical information is important in data analysis to make informed decisions. This paper presentsfootnoteThe manuscript follows guidelines to showcase a demonstration that introduces an overview of how the toolkit works: input document set, initial settings, processing, and output set. The input document set is artificial in order to show various toolkit capabilities. TNNT, a toolkit that automates the extraction of categorised named entities from unstructured information encoded in source documents, using diverse state-of-the-art (SOTA) Natural Language Processing (NLP) tools and NER models.TNNT integrates 21 different NER models as part of a Knowledge Graph Construction Pipeline (KGCP) that takes a document set as input and processes it based on the defined settings, applying the selected blocks of NER models to output the results. The toolkit generates all results with an integrated summary of the extracted entities, enabling enhanced data analysis to support the KGCP, and also, to aid further NLP tasks.","PeriodicalId":377331,"journal":{"name":"Proceedings of the 11th on Knowledge Capture Conference","volume":"56 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-08-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132041438","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Luigi Asprino, Valentina Anita Carriero, V. Presutti
{"title":"Extraction of Common Conceptual Components from Multiple Ontologies","authors":"Luigi Asprino, Valentina Anita Carriero, V. Presutti","doi":"10.1145/3460210.3493542","DOIUrl":"https://doi.org/10.1145/3460210.3493542","url":null,"abstract":"Understanding large ontologies is still an issue, and has an impact on many ontology engineering tasks. We describe a novel method for identifying and extracting conceptual components from domain ontologies, which are used to understand and compare them. The method is applied to two corpora of ontologies in the Cultural Heritage and Conference domain, respectively. The results, which show good quality, are evaluated by manual inspection and by correlation with datasets and tool performance from the ontology alignment evaluation initiative.","PeriodicalId":377331,"journal":{"name":"Proceedings of the 11th on Knowledge Capture Conference","volume":"40 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-06-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117158816","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
I. Abdelaziz, Julian T Dolby, Jamie McCusker, Kavitha Srinivas
{"title":"A Toolkit for Generating Code Knowledge Graphs","authors":"I. Abdelaziz, Julian T Dolby, Jamie McCusker, Kavitha Srinivas","doi":"10.1145/3460210.3493578","DOIUrl":"https://doi.org/10.1145/3460210.3493578","url":null,"abstract":"Knowledge graphs have been proven extremely useful in powering diverse applications in semantic search and natural language understanding. In this work, we present GraphGen4Code, a toolkit to build code knowledge graphs that can similarly power various applications such as program search, code understanding, bug detection, and code automation. GraphGen4Code uses generic techniques to capture code semantics with the key nodes in the graph representing classes, functions and methods. Edges indicate function usage (e.g., how data flows through function calls, as derived from program analysis of real code), and documentation about functions (e.g., code documentation, usage documentation, or forum discussions such as StackOverflow). Our toolkit uses named graphs in RDF to model graphs per program, or can output graphs as JSON. We show the scalability of the toolkit by applying it to 1.3 million Python files drawn from GitHub, 2,300 Python modules, and 47 million forum posts. This results in an integrated code graph with over 2 billion triples. We make the toolkit to build such graphs as well as the sample extraction of the 2 billion triples graph publicly available to the community for use.","PeriodicalId":377331,"journal":{"name":"Proceedings of the 11th on Knowledge Capture Conference","volume":"72 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-02-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128195734","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}