Semantic Analysis for Conversational Datasets: Improving Their Quality Using Semantic Relationships

Int. J. Semantic Comput. | Pub Date: 2020-09-01 | DOI: 10.1142/s1793351x2050004x

Maria Krommyda, Verena Kantere


Citations: 3

Abstract


As more datasets become available, their use across applications grows. Their volume and production rate, however, mean that quality and content control are in most cases nonexistent, so many datasets contain inaccurate, low-quality information. The problem is aggravated in the field of conversational assistants, where the datasets come from many heterogeneous sources with no quality assurance. We present here an integrated platform that creates task- and topic-specific conversational datasets for training conversational agents. The platform explores available conversational datasets, extracts information based on semantic similarity and relatedness, and applies a weight-based score function to rank the information by its value for the specific task and topic. The finalized dataset can then be used to train an automated conversational assistant on accurate, high-quality data.
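The abstract does not give the platform's actual score function, so the following is only a minimal sketch of what a weight-based ranking of utterances could look like. Everything in it is an assumption made for illustration: bag-of-words cosine similarity stands in for semantic similarity, Jaccard overlap with a list of task terms stands in for relatedness, and the names `score`, `rank`, `w_sim`, and `w_rel` as well as the default weights are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a string and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def cosine_similarity(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def jaccard(a, b):
    """Jaccard overlap between two token collections."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 0.0


def score(utterance, topic, task_terms, w_sim=0.7, w_rel=0.3):
    """Weighted sum of a similarity proxy and a relatedness proxy.

    Assumption: both proxies and both weights are placeholders, not the
    measures or weights used by the platform described in the abstract.
    """
    tokens = tokenize(utterance)
    sim = cosine_similarity(Counter(tokens), Counter(tokenize(topic)))
    rel = jaccard(tokens, task_terms)
    return w_sim * sim + w_rel * rel


def rank(utterances, topic, task_terms, top_k=None):
    """Sort utterances by descending score and optionally keep the top_k."""
    ranked = sorted(
        utterances, key=lambda u: score(u, topic, task_terms), reverse=True
    )
    return ranked[:top_k] if top_k is not None else ranked


# Hypothetical example data, not taken from the paper.
candidates = [
    "What time does the train to Athens leave?",
    "I like pizza.",
    "Book a train ticket to Athens for tomorrow.",
]
best = rank(
    candidates,
    topic="train ticket booking to Athens",
    task_terms=["book", "ticket", "train"],
    top_k=2,
)
print(best)
```

In a real pipeline the two proxies would likely be replaced by embedding-based similarity and a knowledge-graph or ontology-based relatedness measure; the weighted-sum structure is the only part this sketch is meant to convey.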

