SIGKDD explorations : newsletter of the Special Interest Group (SIG) on Knowledge Discovery & Data Mining: latest articles

Authorship Attribution in the Era of LLMs: Problems, Methodologies, and Challenges.

SIGKDD explorations : newsletter of the Special Interest Group (SIG) on Knowledge Discovery & Data Mining Pub Date : 2024-12-01 DOI: 10.1145/3715073.3715076

Baixiang Huang, Canyu Chen, Kai Shu

Abstract: Accurate attribution of authorship is crucial for maintaining the integrity of digital content, improving forensic investigations, and mitigating the risks of misinformation and plagiarism. Addressing the imperative need for proper authorship attribution is essential to uphold the credibility and accountability of authentic authorship. The rapid advancements of Large Language Models (LLMs) have blurred the lines between human and machine authorship, posing significant challenges for traditional methods. We present a comprehensive literature review that examines the latest research on authorship attribution in the era of LLMs. This survey systematically explores the landscape of this field by categorizing four representative problems: (1) Human-written Text Attribution; (2) LLM-generated Text Detection; (3) LLM-generated Text Attribution; and (4) Human-LLM Co-authored Text Attribution. We also discuss the challenges related to ensuring the generalization and explainability of authorship attribution methods. Generalization requires the ability to generalize across various domains, while explainability emphasizes providing transparent and understandable insights into the decisions made by these models. By evaluating the strengths and limitations of existing methods and benchmarks, we identify key open problems and future research directions in this field. This literature review serves as a roadmap for researchers and practitioners interested in understanding the state of the art in this rapidly evolving field. Additional resources and a curated list of papers are available and regularly updated at https://llm-authorship.github.io/.

Volume 26, issue 2, pages 21-43. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12019761/pdf/

Citations: 0

Machine learning for streaming data: state of the art, challenges, and opportunities

SIGKDD explorations : newsletter of the Special Interest Group (SIG) on Knowledge Discovery & Data Mining Pub Date : 2019-11-26 DOI: 10.1145/3373464.3373470

Heitor Murilo Gomes, Jesse Read, A. Bifet, J. P. Barddal, João Gama

Citations: 146

Tracking and analyzing dynamics of news-cycles during global pandemics: a historical perspective

SIGKDD explorations : newsletter of the Special Interest Group (SIG) on Knowledge Discovery & Data Mining Pub Date : 2019-11-26 DOI: 10.1145/3373464.3373476

Sorour E. Amiri, Anika Tabassum, E. Ewing, B. Prakash

Abstract: How does the tone of reporting during a disease outbreak change in relation to the number of cases, categories of victims, and accumulating deaths? How do newspapers and medical journals contribute to the narrative of a historical pandemic? Can data mining experts help history scholars to scale up the process of examining articles, extracting new insights and understanding the public opinion of a pandemic? We explore these problems in this paper, using the 19th-century Russian Flu epidemic as an example. We study two different types of historical data sources: the US medical discussion and popular reporting during the epidemic, from its outbreak in late 1889 through the successive waves that lasted through 1893. We analyze and compare these articles and reports to answer three major questions. First, we analyze how newspapers and medical journals report the Russian flu and describe the situation. Next, we help historians understand the tone of related reports and how it varies across data sources. We also examine the temporal changes in the discussion to get an in-depth understanding of how public opinion about the pandemic changed. Finally, we aggregate all of the algorithms in an easy-to-use framework, GrippeStory, to help history scholars investigate historical pandemic data in general, across chronological periods and locations. Our extensive experiments and analysis on a large number of historical articles show that GrippeStory gives meaningful and useful results for historians and that it outperforms the baselines.

Pages 91-100.

Citations: 0

An Interview with Dr. Balaji Krishnapuram, Winner of SIGKDD Service Award

SIGKDD explorations : newsletter of the Special Interest Group (SIG) on Knowledge Discovery & Data Mining Pub Date : 2019-11-26 DOI: 10.1145/3373464.3373466

Balaji Krishnapuram

Citations: 0

Misinformation in Social Media: Definition, Manipulation, and Detection

SIGKDD explorations : newsletter of the Special Interest Group (SIG) on Knowledge Discovery & Data Mining Pub Date : 2019-11-26 DOI: 10.1145/3373464.3373475

Liang Wu, Fred Morstatter, Kathleen M. Carley, Huan Liu

Abstract: The widespread dissemination of misinformation in social media has recently received a lot of attention in academia. While the problem of misinformation in social media has been intensively studied, there are seemingly different definitions for the same problem, and inconsistent results in different studies. In this survey, we aim to consolidate the observations, and investigate how an optimal method can be selected given specific conditions and contexts. To this end, we first introduce a definition for misinformation in social media and we examine the difference between misinformation detection and classic supervised learning. Second, we describe the diffusion of misinformation and introduce how spreaders propagate misinformation in social networks. Third, we explain characteristics of individual methods of misinformation detection, and provide commentary on their advantages and pitfalls. By reflecting on the applicability of different methods, we hope to enable the intensive research in this area to be conveniently reused in real-world applications and to open up potential directions for future studies.

Pages 80-90.

Citations: 218

Solve for Good: A Data Science for Social Good Marketplace

SIGKDD explorations : newsletter of the Special Interest Group (SIG) on Knowledge Discovery & Data Mining Pub Date : 2019-11-26 DOI: 10.1145/3373464.3373468

R. Ghani, Lisa Green, Alberto Bengoa, Mohak Shah

Abstract: Solve for Good is a platform for social good organizations to pose their problems that need data-intensive help, and for volunteers to help solve those problems. Once the projects are submitted by the organization, they go through a scoping process (done by scoping volunteers and guided by our Data Science Scoping Process). Once a project scope is finalized, it becomes available for data science volunteers to start working on. The finished work is reviewed by a QA team consisting of volunteers and staff of the organization that submitted the project. Solve for Good comes out of our experience working with government agencies, non-profits, universities, volunteers, professionals, students, and the private sector over the last several years. We are repeatedly contacted by governments, non-profits, and other social good organizations asking for help with data projects. We also have smart, passionate individuals who contact us offering their help, often in a volunteer capacity, on weekends, evenings, or for a few days or weeks. Solve for Good is our attempt at linking these two. We are just starting out, and are looking for help in doing this better and for feedback from you. Join us at http://www.solveforgood.org as a volunteer to help solve problems, as an organization to submit problems, as partners to help us expand the platform and provide resources to run it, and as corporations or foundations to loan volunteers and donate resources used in solving these problems.

Pages 3-5.

Citations: 0

A Survey of Multi-Label Topic Models

SIGKDD explorations : newsletter of the Special Interest Group (SIG) on Knowledge Discovery & Data Mining Pub Date : 2019-11-26 DOI: 10.1145/3373464.3373474

Sophie Burkhardt, S. Kramer

Citations: 10

The Holy Grail of: Teaming humans and machine learning for detecting cyber threats

SIGKDD explorations : newsletter of the Special Interest Group (SIG) on Knowledge Discovery & Data Mining Pub Date : 2019-11-26 DOI: 10.1145/3373464.3373472

Ignacio Arnaldo, K. Veeramachaneni

Citations: 4

Gene Expression and Protein Function: A Survey of Deep Learning Methods

SIGKDD explorations : newsletter of the Special Interest Group (SIG) on Knowledge Discovery & Data Mining Pub Date : 2019-11-26 DOI: 10.1145/3373464.3373471

Saket K. Sathe, Sayani Aggarwal, Jiliang Tang

Abstract: Deep learning methods have found increasing interest in recent years because of their wide applicability for prediction and inference in numerous disciplines such as image recognition, natural language processing, and speech recognition. Computational biology is a data-intensive field in which the types of data can be very diverse. These different types of structured data require different neural architectures. The problems of gene expression and protein function prediction are related areas in computational biology (since genes control the production of proteins). This survey provides an overview of the various types of problems in this domain and the neural architectures that work for these data sets. Since deep learning is a new field compared to traditional machine learning, much of the work in this area corresponds to traditional machine learning rather than deep learning. However, as the sizes of protein and gene expression data sets continue to grow, the possibility of using data-hungry deep learning methods continues to increase. Indeed, the previous five years have seen a sudden increase in deep learning models, although some areas of protein analytics and gene expression still remain relatively unexplored. Therefore, aside from the survey on the deep learning work directly related to these problems, we also point out existing deep learning work from other domains that has the potential to be applied to these domains.

Pages 23-38.

Citations: 3

Top Challenges from the first Practical Online Controlled Experiments Summit

SIGKDD explorations : newsletter of the Special Interest Group (SIG) on Knowledge Discovery & Data Mining Pub Date : 2019-05-13 DOI: 10.1145/3331651.3331655

Somit Gupta, Ron Kohavi, Diane Tang, Ya Xu

Abstract: Online controlled experiments (OCEs), also known as A/B tests, have become ubiquitous in evaluating the impact of changes made to software products and services. While the concept of online controlled experiments is simple, there are many practical challenges in running OCEs at scale. To understand the top practical challenges in running OCEs at scale and encourage further academic and industrial exploration, representatives with experience in large-scale experimentation from thirteen different organizations (Airbnb, Amazon, Booking.com, Facebook, Google, LinkedIn, Lyft, Microsoft, Netflix, Twitter, Uber, Yandex, and Stanford University) were invited to the first Practical Online Controlled Experiments Summit. All thirteen organizations sent representatives. Together these organizations tested more than one hundred thousand experiment treatments last year. Thirty-four experts from these organizations participated in the summit in Sunnyvale, CA, USA on December 13-14, 2018. While there are papers from individual organizations on some of the challenges and pitfalls in running OCEs at scale, this is the first paper to provide the top challenges faced across the industry for running OCEs at scale and some common solutions.

Pages 20-35.

Citations: 101