Gerard Pons , Besim Bilalli , Alberto Abelló , Santiago Blanco Sánchez
{"title":"On the use of trajectory data for tackling data scarcity","authors":"Gerard Pons , Besim Bilalli , Alberto Abelló , Santiago Blanco Sánchez","doi":"10.1016/j.is.2025.102523","DOIUrl":null,"url":null,"abstract":"<div><div>In recent years, the availability of GPS-equipped mobile devices and other inexpensive location-tracking technologies have enabled the ubiquitous capturing of the location of moving objects. As a result, trajectory data are abundantly available and there is an increasing trend in analyzing them in the context of mobility data science. However, the abundant availability of trajectory data makes them compelling for other tasks too. In this paper, we propose the use of these data to tackle the data scarcity problem in data analysis by appropriately transforming them to extract relevant knowledge. The challenge lies not just in leveraging these abundant trajectory data, but in accurately deriving information from them that closely approximates the target variable of interest. Such knowledge can be used to generate or supplement the scarcely available datasets in a data analytics problem, thereby enhancing model learning. We showcase the feasibility of our approach in the domain of fishing where there is an abundance of trajectory data but a scarcity of detailed catch information. By using environmental data as explanatory variables, we build and compare models to predict fishing productivity using the actual catches from fishing reports and/or the inferred knowledge from the vessel’s trajectories. The results show that, mainly due to trajectory data being larger in volume than fishing data, models trained with the former obtain a precision 7.9% higher, despite the simplicity of the applied transformations.</div></div>","PeriodicalId":50363,"journal":{"name":"Information Systems","volume":"130 ","pages":"Article 102523"},"PeriodicalIF":3.0000,"publicationDate":"2025-01-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Information Systems","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0306437925000080","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, INFORMATION SYSTEMS","Score":null,"Total":0}
引用次数: 0
Abstract
In recent years, the availability of GPS-equipped mobile devices and other inexpensive location-tracking technologies have enabled the ubiquitous capturing of the location of moving objects. As a result, trajectory data are abundantly available and there is an increasing trend in analyzing them in the context of mobility data science. However, the abundant availability of trajectory data makes them compelling for other tasks too. In this paper, we propose the use of these data to tackle the data scarcity problem in data analysis by appropriately transforming them to extract relevant knowledge. The challenge lies not just in leveraging these abundant trajectory data, but in accurately deriving information from them that closely approximates the target variable of interest. Such knowledge can be used to generate or supplement the scarcely available datasets in a data analytics problem, thereby enhancing model learning. We showcase the feasibility of our approach in the domain of fishing where there is an abundance of trajectory data but a scarcity of detailed catch information. By using environmental data as explanatory variables, we build and compare models to predict fishing productivity using the actual catches from fishing reports and/or the inferred knowledge from the vessel’s trajectories. The results show that, mainly due to trajectory data being larger in volume than fishing data, models trained with the former obtain a precision 7.9% higher, despite the simplicity of the applied transformations.
期刊介绍:
Information systems are the software and hardware systems that support data-intensive applications. The journal Information Systems publishes articles concerning the design and implementation of languages, data models, process models, algorithms, software and hardware for information systems.
Subject areas include data management issues as presented in the principal international database conferences (e.g., ACM SIGMOD/PODS, VLDB, ICDE and ICDT/EDBT) as well as data-related issues from the fields of data mining/machine learning, information retrieval coordinated with structured data, internet and cloud data management, business process management, web semantics, visual and audio information systems, scientific computing, and data science. Implementation papers having to do with massively parallel data management, fault tolerance in practice, and special purpose hardware for data-intensive systems are also welcome. Manuscripts from application domains, such as urban informatics, social and natural science, and Internet of Things, are also welcome. All papers should highlight innovative solutions to data management problems such as new data models, performance enhancements, and show how those innovations contribute to the goals of the application.