Supporting Employer Name Normalization at both Entity and Cluster Level

Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. Pub Date: 2017-08-13. DOI: 10.1145/3097983.3098093

Qiaoling Liu, F. Javed, Vachik S. Dave, Ankita Joshi

Citations: 9

Abstract

In the recruitment domain, the employer name normalization task, which links employer names in job postings or resumes to entities in an employer knowledge base (KB), is important to many business applications. In previous work, we proposed the CompanyDepot system, which used machine learning techniques to address the problem. After applying it to several applications at CareerBuilder, we faced several new challenges: 1) how to avoid duplicate normalization results when the KB is noisy and contains many duplicate entities; 2) how to address the vocabulary gap between query names and entity names in the KB; and 3) how to use the context available in jobs and resumes to improve normalization quality. To address these challenges, in this paper we extend the previous CompanyDepot system to normalize employer names not only at the entity level, but also at the cluster level, by mapping a query to the cluster in the KB that best matches the query. We also propose a new metric based on success rate and diversity reduction ratio for evaluating cluster-level normalization. Moreover, we perform query expansion based on five data sources to address the vocabulary gap challenge, and we leverage the URL context available for the employer names in many jobs and resumes to improve normalization quality. We show that the proposed CompanyDepot-V2 system outperforms the previous CompanyDepot system and several other baseline systems over multiple real-world datasets. We also demonstrate a large improvement in normalization quality when moving from entity-level to cluster-level normalization.
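The difference between entity-level and cluster-level normalization can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the toy KB, the entity and cluster ids, and the token-overlap (Jaccard) matcher, which merely stands in for the machine learning techniques the abstract mentions. The helper `distinct_result_reduction` is a hypothetical quantity and is not the paper's diversity reduction ratio, whose formula the abstract does not give.

```python
# Toy knowledge base: entity id -> (entity name, cluster id).
# All names and ids are invented for illustration.
KB = {
    "e1": ("Acme Corporation", "c1"),
    "e2": ("Acme Corp", "c1"),   # duplicate of e1, placed in the same cluster
    "e3": ("Globex Inc", "c2"),
}

def tokens(name):
    """Lowercase a name, drop periods and commas, and split into a token set."""
    return set(name.lower().replace(".", "").replace(",", "").split())

def jaccard(a, b):
    """Token-set Jaccard similarity (a stand-in matcher, not the paper's model)."""
    ta, tb = tokens(a), tokens(b)
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0

def normalize_entity(query):
    """Return the id of the best-matching entity, or None if nothing overlaps."""
    best_id, best_score = None, 0.0
    for entity_id, (name, _cluster) in KB.items():
        score = jaccard(query, name)
        if score > best_score:
            best_id, best_score = entity_id, score
    return best_id

def normalize_cluster(query):
    """Map the query to the cluster of its best-matching entity."""
    entity_id = normalize_entity(query)
    return KB[entity_id][1] if entity_id is not None else None

def success_rate(predicted, gold):
    """Fraction of queries whose predicted id equals the gold id."""
    return sum(p == g for p, g in zip(predicted, gold)) / len(gold)

def distinct_result_reduction(entity_results, cluster_results):
    """Hypothetical: relative drop in distinct outputs after clustering."""
    n_entities = len({r for r in entity_results if r is not None})
    n_clusters = len({r for r in cluster_results if r is not None})
    return 1 - n_clusters / n_entities if n_entities else 0.0

queries = ["Acme Corporation", "ACME Corp."]
entity_results = [normalize_entity(q) for q in queries]
cluster_results = [normalize_cluster(q) for q in queries]
print(entity_results, cluster_results)
```

In this toy setup the two spellings of the same employer resolve to different duplicate entities but to a single cluster, which is the duplicate-result problem that cluster-level normalization is meant to avoid.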
