{"title":"A Novel Method for Classifying Function of Spatial Regions Based on Two Sets of Characteristics Indicated by Trajectories","authors":"Haitao Zhang, Che Yu, Yan Jin","doi":"10.4018/ijdwm.2020070101","DOIUrl":"https://doi.org/10.4018/ijdwm.2020070101","url":null,"abstract":"Trajectory is a significant factor for classifying functions of spatial regions. Many spatial classification methods use trajectories to detect buildings and districts in urban settings. However, methods that only take into consideration the local spatiotemporal characteristics indicated by trajectories may generate inaccurate results. In this article, a novel method for classifying function of spatial regions based on two sets of characteristics indicated by trajectories is proposed, in which the local spatiotemporal characteristics as well as the global connection characteristics are obtained through two sets of calculations. The method was evaluated in two experiments: one that measured changes in the classification metric through a splits ratio factor, and one that compared the classification performance between the proposed method and methods based on a single set of characteristics. The results showed that the proposed method is more accurate than the two traditional methods, with a precision value of 0.93, a recall value of 0.77, and an F-Measure value of 0.84. Keywords: Function of Spatial Regions, Global Connection Characteristics, Local Spatiotemporal Characteristics, Spatial Classification, Trajectory","PeriodicalId":54963,"journal":{"name":"International Journal of Data Warehousing and Mining","volume":"128 1","pages":"1-19"},"PeriodicalIF":1.2,"publicationDate":"2020-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"77056513","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Boosting-Aided Adaptive Cluster-Based Undersampling Approach for Treatment of Class Imbalance Problem","authors":"D. Devi, S. Namasudra, Seifedine Kadry","doi":"10.4018/ijdwm.2020070104","DOIUrl":"https://doi.org/10.4018/ijdwm.2020070104","url":null,"abstract":"Class imbalance is a well-investigated topic that addresses the performance degradation of standard learning models caused by the uneven distribution of classes in a dataspace. Cluster-based undersampling is a popular solution in this domain that eliminates majority class instances from a definite number of clusters to balance the training data. However, distance-based elimination of instances is often affected by the underlying data distribution. Recently, ensemble learning techniques have emerged as an effective solution owing to their weighted learning of rare instances. In this article, a boosting-aided adaptive cluster-based undersampling technique is proposed to facilitate the elimination of learning-insignificant majority class instances from the clusters, detected through the AdaBoost ensemble learning model. The proposed work is validated against seven existing cluster-based undersampling techniques on six binary datasets and three classification models. The experimental results establish that the proposed technique is more effective than the existing methods.","PeriodicalId":54963,"journal":{"name":"International Journal of Data Warehousing and Mining","volume":"15 1","pages":"60-86"},"PeriodicalIF":1.2,"publicationDate":"2020-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"81780316","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Recommender Systems Using Collaborative Tagging","authors":"Latha Banda, Karan Singh, Le Hoang Son, Mohamed Abdel-Basset, Pham Huy Thong, H. Huynh, D. Taniar","doi":"10.4018/ijdwm.2020070110","DOIUrl":"https://doi.org/10.4018/ijdwm.2020070110","url":null,"abstract":"Collaborative tagging is a useful and effective way of classifying items for search and information sharing, so that users can be tagged via online social networking. This article proposes a novel recommender system for collaborative tagging in which the genre interestingness measure and gradual decay are utilized with diffusion similarity. The comparison has been done on the benchmark recommender system datasets, namely the MovieLens and Amazon datasets, against existing approaches such as collaborative filtering based on tagging using the E-FCM and E-GK clustering algorithms, hybrid recommender systems based on tagging using GA, and collaborative tagging using incremental clustering with trust. The experimental results indicate that the proposed approach achieves a maximum prediction accuracy ratio of 9.25% on average over various data splits of 100 users, which is higher than the prediction accuracy of only 5.76% obtained by the existing approaches.","PeriodicalId":54963,"journal":{"name":"International Journal of Data Warehousing and Mining","volume":"7 1","pages":"183-200"},"PeriodicalIF":1.2,"publicationDate":"2020-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"87852321","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Serialized Co-Training-Based Recognition of Medicine Names for Patent Mining and Retrieval","authors":"Na Deng, Caiquan Xiong","doi":"10.4018/ijdwm.2020070105","DOIUrl":"https://doi.org/10.4018/ijdwm.2020070105","url":null,"abstract":"In the retrieval and mining of traditional Chinese medicine (TCM) patents, a key step is Chinese word segmentation and named entity recognition. However, the alias phenomenon of traditional Chinese medicines causes great challenges to Chinese word segmentation and named entity recognition in TCM patents, which directly affects the effect of patent mining. Because of the lack of a comprehensive Chinese herbal medicine name thesaurus, traditional thesaurus-based Chinese word segmentation and named entity recognition are not suitable for medicine identification in TCM patents. In view of the present situation, using the language characteristics and structural characteristics of TCM patent texts, a modified and serialized co-training method to recognize medicine names from TCM patent abstract texts is proposed. Experiments show that this method can maintain high accuracy under relatively low time complexity. In addition, this method can also be expanded to the recognition of other named entities in TCM patents, such as disease names, preparation methods, and so on. Keywords: Annotation, Co-Training, Machine Learning, Medicine Name, Patent Mining, Patent Retrieval, Traditional Chinese Medicine","PeriodicalId":54963,"journal":{"name":"International Journal of Data Warehousing and Mining","volume":"42 1","pages":"87-107"},"PeriodicalIF":1.2,"publicationDate":"2020-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"73526987","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Data Mining in Programs: Clustering Programs Based on Structure Metrics and Execution Values","authors":"Tiantian Wang, Kechao Wang, Xiaohong Su, Lin Liu","doi":"10.4018/ijdwm.2020040104","DOIUrl":"https://doi.org/10.4018/ijdwm.2020040104","url":null,"abstract":"Software exists in various control systems, such as security-critical systems. Existing program clustering methods are limited in identifying functionally equivalent programs with different syntactic representations. To solve this problem, a clustering method based on structured metric vectors was first proposed to quickly identify structurally similar programs from a large number of existing programs. Next, a clustering method based on similar execution value sequences was proposed to accurately identify functionally equivalent programs with code variations. This approach has been applied in automatic program repair to identify sample programs from a large pool of template programs. The average purity value is 0.95576 and the average entropy is 0.15497. This means that the clustering partition is consistent with the expected partition.","PeriodicalId":54963,"journal":{"name":"International Journal of Data Warehousing and Mining","volume":"73 1","pages":"48-63"},"PeriodicalIF":1.2,"publicationDate":"2020-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84572447","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Collective Entity Disambiguation Based on Hierarchical Semantic Similarity","authors":"Bingjing Jia, Hu Yang, Bin Wu, Ying Xing","doi":"10.4018/ijdwm.2020040101","DOIUrl":"https://doi.org/10.4018/ijdwm.2020040101","url":null,"abstract":"Entity disambiguation involves mapping mentions in texts to the corresponding entities in a given knowledge base. Most previous approaches were based on handcrafted features and failed to capture semantic information over multiple granularities. To disambiguate entities accurately, various information aspects of mentions and entities should be used. This article proposes a hierarchical semantic similarity model to find important clues related to mentions and entities based on multiple sources of information, such as contexts of the mentions, entity descriptions, and categories. This model can effectively measure the semantic matching between mentions and target entities. Global features, including prior popularity and global coherence, are also added to improve the performance. To verify the effect of the hierarchical semantic similarity model combined with global features, named HSSMGF, experiments were carried out on five publicly available benchmark datasets. Results demonstrate that the proposed method is very effective when documents have more mentions.","PeriodicalId":54963,"journal":{"name":"International Journal of Data Warehousing and Mining","volume":"65 1","pages":"1-17"},"PeriodicalIF":1.2,"publicationDate":"2020-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"91002662","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}