Ruifang He , Fei Huang , Jinsong Ma , Jinpeng Zhang , Yongkai Zhu , Shiqi Zhang , Jie Bai
{"title":"Few-shot cross domain event discovery in narrative text","authors":"Ruifang He , Fei Huang , Jinsong Ma , Jinpeng Zhang , Yongkai Zhu , Shiqi Zhang , Jie Bai","doi":"10.1016/j.ipm.2024.103901","DOIUrl":"10.1016/j.ipm.2024.103901","url":null,"abstract":"<div><div>Cross-domain event detection presents notable challenges in the form of data scarcity, and existing few-shot algorithms only consider events whose types are predefined, resulting in low coverage or an excess of trivial identification results. To address this issue, this paper proposes the task <em>Few-shot Cross Domain Event Discovery</em>, which includes two subtasks: <em>Domain Event Discovery</em> and <em>Few-shot Domain Adaptation</em>. The former aims to identify the <em>type-agnostic event triggers</em>, and the latter completes domain adaptation with only a few annotated domain samples. Additionally, we introduce a positive–negative balanced sampling mechanism and a novel domain parameter adapter for these two subtasks, respectively. Extensive experiments on the DuEE dataset and the ACE2005 dataset show that our proposed method outperforms the current state-of-the-art method by 6.3% in Mix-F1 score on average. Moreover, we achieve SOTA performance in all domains of the DuEE dataset.</div></div>","PeriodicalId":50365,"journal":{"name":"Information Processing & Management","volume":null,"pages":null},"PeriodicalIF":7.4,"publicationDate":"2024-10-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142437744","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"管理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
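The abstract above names a "positive–negative balanced sampling mechanism" without describing it. As an illustration only, the sketch below shows one generic form of the idea, random negative downsampling: it assumes each candidate token is labeled 1 (event trigger) or 0 (non-trigger), keeps every positive, and keeps a random subset of negatives. The function name `balanced_sample` and the `neg_ratio` parameter are hypothetical and are not taken from the paper.

```python
import random
from typing import List, Sequence, Tuple

def balanced_sample(
    examples: Sequence[Tuple[str, int]], neg_ratio: float = 1.0, seed: int = 0
) -> List[Tuple[str, int]]:
    """Keep every positive (label 1) and a random subset of negatives (label 0).

    At most round(neg_ratio * n_positives) negatives are kept. This is a
    generic negative-downsampling scheme assumed for illustration, not the
    paper's own mechanism.
    """
    rng = random.Random(seed)  # seeded so the sample is reproducible
    positives = [e for e in examples if e[1] == 1]
    negatives = [e for e in examples if e[1] == 0]
    # Never ask for more negatives than actually exist
    k = min(len(negatives), round(neg_ratio * len(positives)))
    sampled = positives + rng.sample(negatives, k)
    rng.shuffle(sampled)  # avoid all positives appearing first
    return sampled

# 3 hypothetical trigger tokens among 20 candidates
tokens = [("w%d" % i, 1 if i < 3 else 0) for i in range(20)]
print(balanced_sample(tokens))
```

With the default `neg_ratio=1.0` the result contains the 3 positives and 3 randomly chosen negatives; raising `neg_ratio` trades balance for more negative coverage.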
Zhaorui Ma , Xinhao Hu , Fenlin Liu , Xiangyang Luo , Shicheng Zhang , Wenxin Tai , Guoming Ren , Zheng Er , Mingming Xu
{"title":"Landmark-v6: A stable IPv6 landmark representation method based on multi-feature clustering","authors":"Zhaorui Ma , Xinhao Hu , Fenlin Liu , Xiangyang Luo , Shicheng Zhang , Wenxin Tai , Guoming Ren , Zheng Er , Mingming Xu","doi":"10.1016/j.ipm.2024.103921","DOIUrl":"10.1016/j.ipm.2024.103921","url":null,"abstract":"<div><div>Highly reliable network entity landmarks are crucial for applications like geolocation-aware personalized service recommendations, traceability, and fraud detection. Traditionally, landmark acquisition methods have relied on data mining of rules or network behaviours to establish mappings between IP addresses and geolocation information. However, IPv6 address allocation policies, due to their dynamic nature and the multi-homing phenomenon, pose a risk of IPv6 address deactivation for traditional IPv6 landmarks. To address the issues of reduced numbers and instability in traditional IPv6 landmarks, we propose a novel IPv6 landmark representation method, “landmark-v6”, which is grounded in multi-feature clustering. Firstly, IPv6 addresses are filtered based on multiple attributes derived from network entity fingerprints and routing features. Subsequently, a set of IPv6 addresses is associated with another set through multi-feature clustering. Next, the fine-grained IPv6 addresses are further refined by clustering based on precise physical spatial geolocation information, resulting in candidate landmarks that consist of IPv6 prefixes and geolocation data. Finally, the reliability of these landmarks is determined and evaluated using the voting resolution mechanism in the Candidate Landmark Evaluation task. Our experimental evaluation, spanning 10 months and conducted in three real-world areas, Zhengzhou, Hong Kong, and Shanghai, demonstrates the effectiveness of landmark-v6. Specifically, landmark-v6 obtains 933, 746, and 859 IPv6 prefix landmarks in Zhengzhou, Hong Kong, and Shanghai, respectively.
These results surpass those obtained with existing rule or network behaviour-based methods such as Structon, SVMM, and SLG. Landmark-v6 offers a more robust and accurate approach to acquiring IPv6 landmarks, making it well-suited for various applications that necessitate reliable geolocation information. It effectively tackles the challenges posed by the dynamic nature of IPv6 addresses, thereby enhancing landmark stability.</div></div>","PeriodicalId":50365,"journal":{"name":"Information Processing & Management","volume":null,"pages":null},"PeriodicalIF":7.4,"publicationDate":"2024-10-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142437743","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"管理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Guanghua Liu , Jia Zhang , Peng Lv , Chenlong Wang , Huan Wang , Di Wang
{"title":"TAAD: Time-varying adversarial anomaly detection in dynamic graphs","authors":"Guanghua Liu , Jia Zhang , Peng Lv , Chenlong Wang , Huan Wang , Di Wang","doi":"10.1016/j.ipm.2024.103912","DOIUrl":"10.1016/j.ipm.2024.103912","url":null,"abstract":"<div><div>The timely detection of anomalous nodes that can cause significant harm is essential in real-world networks. One challenge for anomaly detection in dynamic graphs is the identification of abnormal nodes at newly emerged moments. Unfortunately, existing methods tend to learn nontransferable features from historical moments that do not generalize well to newly emerged moments. In response to this challenge, we propose Time-varying Adversarial Anomaly Detection (TAAD), a generalizable model to learn transferable features from historical moments, which can transfer prior anomaly knowledge to newly emerged moments. It comprises four components: the feature extractor, the anomaly detector, the time-varying discriminator and the score generator. The time-varying discriminator cooperates with the feature extractor to conduct adversarial training, which decreases the distributional differences in the feature representations of nodes between historical and newly emerged moments to learn transferable features. The score generator measures the distributional differences of feature representations between normal and abnormal nodes, and further learns discriminable features. 
Extensive experiments on four datasets show that the proposed TAAD outperforms state-of-the-art methods.</div></div>","PeriodicalId":50365,"journal":{"name":"Information Processing & Management","volume":null,"pages":null},"PeriodicalIF":7.4,"publicationDate":"2024-10-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142433120","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"管理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Idoia Gamiz , Cristina Regueiro , Eduardo Jacob , Oscar Lage , Marivi Higuero
{"title":"PRoT-FL: A privacy-preserving and robust Training Manager for Federated Learning","authors":"Idoia Gamiz , Cristina Regueiro , Eduardo Jacob , Oscar Lage , Marivi Higuero","doi":"10.1016/j.ipm.2024.103929","DOIUrl":"10.1016/j.ipm.2024.103929","url":null,"abstract":"<div><div>Federated Learning has emerged as a promising solution to enable collaborative training between organizations while avoiding centralization. However, it remains vulnerable to privacy breaches and to attacks that compromise model robustness, such as data and model poisoning. This work presents PRoT-FL, a privacy-preserving and robust Training Manager capable of coordinating different training sessions at the same time. PRoT-FL conducts each training session through a Federated Learning scheme that is resistant to privacy attacks while ensuring robustness. To do so, the model exchange is conducted by a “Private Training Protocol” through secure channels, and the protocol is combined with a public blockchain network to provide auditability, integrity and transparency. The original contributions of this work include: (i) the proposal of a “Private Training Protocol” that breaks the link between a model and its generator, (ii) the integration of this protocol into a complete system, PRoT-FL, which acts as an orchestrator and manages multiple trainings and (iii) a privacy, robustness and performance evaluation. The theoretical analysis shows that PRoT-FL is suitable for a wide range of scenarios, being capable of dealing with multiple privacy attacks while maintaining a flexible selection of methods against attacks that compromise robustness. Experiments are conducted on three benchmark datasets, and the results are compared with traditional Federated Learning under different robust aggregation rules.
The results show that those rules still apply to PRoT-FL and that the accuracy of the final model is not degraded while maintaining data privacy.</div></div>","PeriodicalId":50365,"journal":{"name":"Information Processing & Management","volume":null,"pages":null},"PeriodicalIF":7.4,"publicationDate":"2024-10-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142418001","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"管理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
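The PRoT-FL abstract above compares against traditional Federated Learning under "different robust aggregation rules" but does not name them. For orientation only, the sketch below shows one widely used robust rule, coordinate-wise median aggregation, in plain Python; it is an assumption that this is representative, not a claim that it is one of the rules evaluated in the paper, and the function name `coordinate_median` is hypothetical.

```python
from statistics import median
from typing import List

def coordinate_median(updates: List[List[float]]) -> List[float]:
    """Aggregate client model updates by taking the median of each coordinate.

    Each update is a flat list of parameters and all updates must share a
    length. A minority of poisoned updates cannot drag a median arbitrarily
    far, unlike the plain mean used in FedAvg-style averaging.
    """
    if not updates:
        raise ValueError("need at least one client update")
    dim = len(updates[0])
    if any(len(u) != dim for u in updates):
        raise ValueError("all updates must have the same length")
    # zip(*updates) groups the i-th parameter of every client together
    return [median(column) for column in zip(*updates)]

honest = [[0.9, 1.1], [1.0, 1.0], [1.1, 0.9]]
poisoned = [[100.0, -100.0]]  # one hypothetical model-poisoning client
# Both coordinates stay close to 1.0 despite the outlier
print(coordinate_median(honest + poisoned))
```

The design trade-off is that the median discards information from honest clients too, so it typically converges more slowly than averaging when no attacker is present.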
Bingqin Han , Shuang Song , Diyi Liu , Jiapei Mo
{"title":"Mechanism of online public opinion formation in major risk events in China: A qualitative comparative analysis","authors":"Bingqin Han , Shuang Song , Diyi Liu , Jiapei Mo","doi":"10.1016/j.ipm.2024.103924","DOIUrl":"10.1016/j.ipm.2024.103924","url":null,"abstract":"<div><div>Understanding societal attitudes toward major risk events after they occur poses a significant challenge for governments. This study employs fuzzy set qualitative comparative analysis (fsQCA) to examine 88 cases of major risk events in China from 2019 to 2023, categorized into natural disasters, accidents, social security threats, and public health crises. We propose an integrated theoretical framework combining information ecology theory and social mentality theory, aiming to uncover the driving pathways that shape positive and negative online social mentalities during these events. The findings reveal the following insights: (1) Media and government significantly influence online community attitudes during major natural disasters. (2) In major accidents, the social environment predominantly shapes stable aspects of societal mentality, yet media and government also adapt dynamically, influencing online societal attitudes accordingly. (3) Major social security events exhibit a diverse trajectory in online social mentality, underscoring the intricate factors affecting public sentiment. The study emphasizes the role of free agents in generating negative online attitudes. (4) During major public health crises, the scale of the event and media coverage exert considerable influence, with media responsiveness varying with shifts in event magnitude. Furthermore, coordinated ecological factors influence the trajectory of online societal attitude changes.
These findings offer valuable insights and strategies for managing public opinion during significant risk events.</div></div>","PeriodicalId":50365,"journal":{"name":"Information Processing & Management","volume":null,"pages":null},"PeriodicalIF":7.4,"publicationDate":"2024-10-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142417983","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"管理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
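The abstract above states that the study uses fuzzy set qualitative comparative analysis (fsQCA) but does not show how candidate pathways are scored. As background, the sketch below computes the two standard fsQCA measures for a sufficiency claim "X is sufficient for Y": consistency = Σ min(x_i, y_i) / Σ x_i and coverage = Σ min(x_i, y_i) / Σ y_i. The membership scores are hypothetical toy values, not data from the 88 cases, and `sufficiency_scores` is an illustrative name rather than the authors' tooling.

```python
from typing import Sequence, Tuple

def sufficiency_scores(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Return (consistency, coverage) for the claim "X is sufficient for Y".

    x[i] is case i's fuzzy membership in the causal configuration X and
    y[i] is its membership in the outcome Y; both must lie in [0, 1].
    """
    if len(x) != len(y):
        raise ValueError("x and y must describe the same cases")
    if any(not 0.0 <= v <= 1.0 for v in [*x, *y]):
        raise ValueError("membership scores must lie in [0, 1]")
    if sum(x) == 0 or sum(y) == 0:
        raise ValueError("X and Y need non-zero total membership")
    # Overlap: the part of each case's membership in X that is also in Y
    overlap = sum(min(xi, yi) for xi, yi in zip(x, y))
    consistency = overlap / sum(x)  # share of X that falls inside Y
    coverage = overlap / sum(y)     # share of Y accounted for by X
    return consistency, coverage

# Toy membership scores for three hypothetical cases (not data from the study)
print(sufficiency_scores([1.0, 0.5, 0.5], [1.0, 0.5, 0.0]))  # (0.75, 1.0)
```

In this toy example the third case belongs partly to X but not at all to Y, which lowers consistency, while every unit of Y membership is explained by X, so coverage is 1.0.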
Andreas Grivas , Claire Grover , Richard Tobin , Clare Llewellyn , Eleojo Oluwaseun Abubakar , Chunyu Zheng , Chris Dibben , Alan Marshall , Jamie Pearce , Beatrice Alex
{"title":"Perceptions of Edinburgh: Capturing neighbourhood characteristics by clustering geoparsed local news","authors":"Andreas Grivas , Claire Grover , Richard Tobin , Clare Llewellyn , Eleojo Oluwaseun Abubakar , Chunyu Zheng , Chris Dibben , Alan Marshall , Jamie Pearce , Beatrice Alex","doi":"10.1016/j.ipm.2024.103910","DOIUrl":"10.1016/j.ipm.2024.103910","url":null,"abstract":"<div><div>The communities that we live in affect our health in ways that are complex and hard to define. Moreover, our understanding of the place-based processes affecting health and inequalities is limited. This undermines the development of robust policy interventions to improve local health and well-being.</div><div>News media provides social and community information that may be useful in health studies. Here we propose a methodology for characterising neighbourhoods by using local news articles. More specifically, we show how we can use Natural Language Processing (NLP) to unlock further information about neighbourhoods by analysing, geoparsing and clustering news articles.</div><div>Our work is novel because we combine street-level geoparsing tailored to the locality with clustering of full news articles, enabling a more detailed examination of neighbourhood characteristics. We evaluate our outputs and show via a confluence of evidence, both from a qualitative and a quantitative perspective, that the themes we extract from news articles are sensible and reflect many characteristics of the real world. This is significant because it allows us to better understand the effects of neighbourhoods on health. 
Our findings on neighbourhood characterisation using news data will support a new generation of place-based research which examines a wider set of spatial processes and how they affect health, enabling new epidemiological research.</div></div>","PeriodicalId":50365,"journal":{"name":"Information Processing & Management","volume":null,"pages":null},"PeriodicalIF":7.4,"publicationDate":"2024-10-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142418000","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"管理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Dong Li, Jintao Tang, Pancheng Wang, Shasha Li, Ting Wang
{"title":"Maximizing discrimination masking for faithful question answering with machine reading","authors":"Dong Li, Jintao Tang, Pancheng Wang, Shasha Li, Ting Wang","doi":"10.1016/j.ipm.2024.103915","DOIUrl":"10.1016/j.ipm.2024.103915","url":null,"abstract":"<div><div>Despite recent advancements in Question Answering with Machine Reading (QAMR), such as Large Language Models (LLMs), improving the factuality and faithfulness of QAMR models remains a significant challenge. QAMR models require both language knowledge and world knowledge to answer questions. Language knowledge encompasses syntax, semantics, pragmatics, and other language-specific elements. The extent of language knowledge reflects the model’s language understanding capabilities. World knowledge, which refers to people’s cognition of the world, may be parameterized knowledge of the pre-trained language models or textual knowledge of passages. We conduct a comparative study on these two kinds of knowledge and find that language knowledge is stable, while only part of world knowledge is stable and reliable. This motivates us to utilize textual knowledge of passages and avoid parameterized unstable world knowledge of pre-trained language models for the QAMR task. To this end, this paper introduces the concept of <em>Answerable without relying on unstable world knowledge external to the passage (AUKE) to determine whether a question can be answered without using parameterized unstable world knowledge of pre-trained language models</em>. We then define <em>evidence</em> as the simplest substring in the passage that supports AUKE. Based on <em>evidence</em>, we introduce a novel faithfulness metric for the QAMR task. We propose a methodology that combines automated processes with manual refinement to augment QAMR datasets with evidence annotations to facilitate faithfulness evaluations.
We apply this method to the Chinese QAMR datasets CMRC 2018 and DRCD to build two extended datasets that support evidence-based faithfulness evaluation, CMRCFF (CMRC with Faithfulness) and DRCDFF (DRCD with Faithfulness). To alleviate the potential factuality and faithfulness issues induced by unstable world knowledge, we propose a method called Maximizing Discrimination Masking (MDM), which masks the word with the highest degree of distinguishability. MDM is an approximation method designed to circumvent the reliance on parameterized unstable world knowledge embedded within pre-trained language models utilized by QAMR systems. We conduct experiments under fine-tuning and few-shot settings on CMRCFF and DRCDFF. The results verify that our MDM approach can effectively improve the factuality and faithfulness of the models.</div></div>","PeriodicalId":50365,"journal":{"name":"Information Processing & Management","volume":null,"pages":null},"PeriodicalIF":7.4,"publicationDate":"2024-10-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142417981","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"管理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Yi-Kun Tang , Heyan Huang , Xuewen Shi , Xian-Ling Mao
{"title":"Bridging insight gaps in topic dependency discovery with a knowledge-inspired topic model","authors":"Yi-Kun Tang , Heyan Huang , Xuewen Shi , Xian-Ling Mao","doi":"10.1016/j.ipm.2024.103911","DOIUrl":"10.1016/j.ipm.2024.103911","url":null,"abstract":"<div><div>Discovering intricate dependencies between topics in topic modeling is challenging due to the noisy and incomplete nature of real-world data and the inherent complexity of topic dependency relationships. In practice, certain basic dependency relationships have been manually annotated and can serve as valuable knowledge resources, enhancing the learning of topic dependencies. To this end, we propose a novel topic model, called Knowledge-Inspired Dependency-Aware Dirichlet Neural Topic Model (KDNTM). Specifically, we first propose Dependency-Aware Dirichlet Neural Topic Model (DepDirNTM), which can discover semantically coherent topics and complex dependencies between these topics from textual data. Then, we propose three methods to leverage accessible external dependency knowledge under the framework of DepDirNTM to enhance the discovery of topic dependencies. Extensive experiments on real-world corpora demonstrate that our models outperform 12 state-of-the-art baselines in terms of topic quality and multi-labeled text classification in most cases, achieving up to a 14% improvement in topic quality over the best baseline. 
Visualizations of the learned dependency relationships further highlight the benefits of integrating external knowledge, confirming its substantial impact on the effectiveness of topic modeling.</div></div>","PeriodicalId":50365,"journal":{"name":"Information Processing & Management","volume":null,"pages":null},"PeriodicalIF":7.4,"publicationDate":"2024-10-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142417982","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"管理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Qingfeng Zeng , Li Lin , Rui Jiang , Weiyu Huang , Dijia Lin
{"title":"NNEnsLeG: A novel approach for e-commerce payment fraud detection using ensemble learning and neural networks","authors":"Qingfeng Zeng , Li Lin , Rui Jiang , Weiyu Huang , Dijia Lin","doi":"10.1016/j.ipm.2024.103916","DOIUrl":"10.1016/j.ipm.2024.103916","url":null,"abstract":"<div><div>The proliferation of fraud in online shopping has accompanied the development of e-commerce, leading to substantial economic losses and undermining consumer trust in online shopping. However, few studies have focused on fraud detection in e-commerce due to its diversity and dynamism. In this work, we construct a feature set specifically for e-commerce payment fraud, covering transactions, user behavior, and account relevance. We propose a novel comprehensive model called Neural Network Based Ensemble Learning with Generation (NNEnsLeG) for fraud detection. In this model, ensemble learning, data generation, and parameter-passing are designed to cope with extreme data imbalance and overfitting and to simulate the dynamics of fraud patterns. We evaluate the model performance in e-commerce payment fraud detection with more than 310,000 e-commerce account records. Then we verify the effectiveness of the model design and feature engineering through ablation experiments, and validate the generalization ability of the model in other payment fraud scenarios.
The experimental results show that NNEnsLeG outperforms all baselines and confirm the effectiveness of the generative data and parameter-passing designs, demonstrating the practical applicability of the NNEnsLeG model in e-commerce payment fraud detection.</div></div>","PeriodicalId":50365,"journal":{"name":"Information Processing & Management","volume":null,"pages":null},"PeriodicalIF":7.4,"publicationDate":"2024-10-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142417979","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"管理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Runzhe Zhang , Xiang Yu , Ben Zhang , Qinglan Ren , Yakun Ji
{"title":"Discovering technology opportunities of latecomers based on RGNN and patent data: The example of Huawei in self-driving vehicle industry","authors":"Runzhe Zhang , Xiang Yu , Ben Zhang , Qinglan Ren , Yakun Ji","doi":"10.1016/j.ipm.2024.103908","DOIUrl":"10.1016/j.ipm.2024.103908","url":null,"abstract":"<div><div>Emerging technologies provide competitive opportunities for latecomers to catch up with leading giants. As most of the extant literature indicates, technology opportunity discovery (TOD) research has mainly revealed single-dimensional relations from patent data. Still, few studies have considered the more complex characteristics extracted from higher-dimensional patent information, such as the patentee-technology relation. To derive this valuable relation for more robust results, this article introduces a novel TOD method that uses a recursive graph neural network (RGNN) to transform this high-dimensional information into measures of heterogeneity as internal capability, and combines it with external challenges evaluated by the competitiveness index to identify technological opportunities. Using 33,347 patent families from the self-driving vehicle (SDV) industry, spanning 2010 to 2021, as the initial dataset, the method shows significant performance improvements over previous analogous TOD models. Moreover, when tested against recently filed patent data, the predicted opportunities are consistent with the recent filings of Huawei and other enterprises.
By illuminating, as a case study, the intense technological competition among the leading SDV firms worldwide, this research contributes theoretical and practical insights to TOD research and network analysis.</div></div>","PeriodicalId":50365,"journal":{"name":"Information Processing & Management","volume":null,"pages":null},"PeriodicalIF":7.4,"publicationDate":"2024-10-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142417980","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"管理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}