Proceedings of the 11th on Knowledge Capture Conference: Latest Articles

TNNT: The Named Entity Recognition Toolkit

Proceedings of the 11th on Knowledge Capture Conference Pub Date : 2021-08-31 DOI: 10.1145/3460210.3493550

Sandaru Seneviratne, Sergio J. Rodríguez Méndez, Xuecheng Zhang, Pouya Ghiasnezhad Omran, K. Taylor, A. Haller

Abstract: Extraction of categorised named entities from text is a complex task given the availability of a variety of Named Entity Recognition (NER) models and the unstructured information encoded in different source document formats. Processing the documents to extract text, identifying suitable NER models for a task, and obtaining statistical information is important in data analysis to make informed decisions. This paper presents TNNT, a toolkit that automates the extraction of categorised named entities from unstructured information encoded in source documents, using diverse state-of-the-art (SOTA) Natural Language Processing (NLP) tools and NER models. TNNT integrates 21 different NER models as part of a Knowledge Graph Construction Pipeline (KGCP) that takes a document set as input and processes it based on the defined settings, applying the selected blocks of NER models to output the results. The toolkit generates all results with an integrated summary of the extracted entities, enabling enhanced data analysis to support the KGCP, and also, to aid further NLP tasks.

Footnote: The manuscript follows guidelines to showcase a demonstration that introduces an overview of how the toolkit works: input document set, initial settings, processing, and output set. The input document set is artificial in order to show various toolkit capabilities.

Citations: 1

Extraction of Common Conceptual Components from Multiple Ontologies

Proceedings of the 11th on Knowledge Capture Conference Pub Date : 2021-06-24 DOI: 10.1145/3460210.3493542

Luigi Asprino, Valentina Anita Carriero, V. Presutti

Citations: 2

A Toolkit for Generating Code Knowledge Graphs

Proceedings of the 11th on Knowledge Capture Conference Pub Date : 2020-02-21 DOI: 10.1145/3460210.3493578

I. Abdelaziz, Julian T Dolby, Jamie McCusker, Kavitha Srinivas

Abstract: Knowledge graphs have been proven extremely useful in powering diverse applications in semantic search and natural language understanding. In this work, we present GraphGen4Code, a toolkit to build code knowledge graphs that can similarly power various applications such as program search, code understanding, bug detection, and code automation. GraphGen4Code uses generic techniques to capture code semantics with the key nodes in the graph representing classes, functions and methods. Edges indicate function usage (e.g., how data flows through function calls, as derived from program analysis of real code), and documentation about functions (e.g., code documentation, usage documentation, or forum discussions such as StackOverflow). Our toolkit uses named graphs in RDF to model graphs per program, or can output graphs as JSON. We show the scalability of the toolkit by applying it to 1.3 million Python files drawn from GitHub, 2,300 Python modules, and 47 million forum posts. This results in an integrated code graph with over 2 billion triples. We make the toolkit to build such graphs as well as the sample extraction of the 2 billion triples graph publicly available to the community for use.

Citations: 16
