On the use of trajectory data for tackling data scarcity

IF 3.4 · Zone 2 (Computer Science) · JCR Q2 (Computer Science, Information Systems)

Information Systems · Pub Date: 2025-01-13 · DOI: 10.1016/j.is.2025.102523

Gerard Pons, Besim Bilalli, Alberto Abelló, Santiago Blanco Sánchez

Information Systems, Volume 130, Article 102523 (journal article). https://www.sciencedirect.com/science/article/pii/S0306437925000080

Citations: 0

Abstract

In recent years, the availability of GPS-equipped mobile devices and other inexpensive location-tracking technologies has enabled the ubiquitous capture of the locations of moving objects. As a result, trajectory data are abundantly available, and there is an increasing trend toward analyzing them in the context of mobility data science. However, this abundance makes trajectory data compelling for other tasks too. In this paper, we propose using these data to tackle the data scarcity problem in data analysis by appropriately transforming them to extract relevant knowledge. The challenge lies not just in leveraging these abundant trajectory data, but in accurately deriving from them information that closely approximates the target variable of interest. Such knowledge can be used to generate or supplement the scarce datasets available in a data analytics problem, thereby enhancing model learning. We showcase the feasibility of our approach in the fishing domain, where there is an abundance of trajectory data but a scarcity of detailed catch information. Using environmental data as explanatory variables, we build and compare models that predict fishing productivity from the actual catches in fishing reports and/or the knowledge inferred from the vessels' trajectories. The results show that, mainly because trajectory data are larger in volume than fishing data, models trained with the former obtain a precision 7.9% higher, despite the simplicity of the applied transformations.
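The abstract does not say which transformations are applied to the trajectories. Purely as an illustration of the idea of deriving a proxy for a scarce target variable, the sketch below assumes a simple speed-based heuristic: time a vessel spends within a low-speed band is treated as fishing activity and summed per grid cell. The function name, the speed band, and the grid resolution are all hypothetical and are not taken from the paper.

```python
from collections import defaultdict
from math import floor

# Hypothetical values, chosen only for illustration (not from the paper).
FISHING_SPEED_KNOTS = (2.0, 4.0)  # speed band assumed to indicate fishing
CELL_DEG = 0.5                    # grid resolution in degrees


def fishing_hours_per_cell(points, speed_band=FISHING_SPEED_KNOTS, cell_deg=CELL_DEG):
    """Sum the hours spent at 'fishing-like' speeds in each grid cell.

    points: list of (hours_since_start, lat, lon, speed_knots) for one vessel,
    sorted by time. Each interval is attributed to its starting point.
    """
    lo, hi = speed_band
    hours = defaultdict(float)
    for (t0, lat, lon, speed), (t1, *_rest) in zip(points, points[1:]):
        if lo <= speed <= hi:
            cell = (floor(lat / cell_deg), floor(lon / cell_deg))
            hours[cell] += t1 - t0
    return dict(hours)


# Toy trajectory: steaming out, two slow segments, steaming back.
track = [
    (0.0, 41.10, 2.30, 9.5),
    (1.0, 41.20, 2.60, 3.0),
    (3.0, 41.22, 2.70, 3.5),
    (4.0, 41.30, 2.40, 10.0),
]
proxy = fishing_hours_per_cell(track)
print(proxy)
```

A per-cell value of this kind could then stand in for, or supplement, reported catches as the target when fitting a model on environmental variables for the same cells; whether it approximates productivity well enough is exactly the challenge the abstract points to.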

Source journal

Information Systems (Engineering & Technology; Computer Science: Information Systems)

CiteScore

9.40

Self-citation rate

2.70%

Articles published

112

Review time

53 days

About the journal: Information systems are the software and hardware systems that support data-intensive applications. The journal Information Systems publishes articles concerning the design and implementation of languages, data models, process models, algorithms, software and hardware for information systems. Subject areas include data management issues as presented in the principal international database conferences (e.g., ACM SIGMOD/PODS, VLDB, ICDE and ICDT/EDBT) as well as data-related issues from the fields of data mining/machine learning, information retrieval coordinated with structured data, internet and cloud data management, business process management, web semantics, visual and audio information systems, scientific computing, and data science. Implementation papers having to do with massively parallel data management, fault tolerance in practice, and special purpose hardware for data-intensive systems are also welcome. Manuscripts from application domains, such as urban informatics, social and natural science, and Internet of Things, are also welcome. All papers should highlight innovative solutions to data management problems, such as new data models and performance enhancements, and show how those innovations contribute to the goals of the application.