{"title":"A Graph Theoretic Approach for the Identification of Objects Shape Taken from MPEG-7 Database","authors":"J. Pujari, J. Karur, K. Kale, V. Swamy","doi":"10.14257/IJDTA.2017.10.3.02","DOIUrl":"https://doi.org/10.14257/IJDTA.2017.10.3.02","url":null,"abstract":"Objects never occur in isolation, instead, vary with other objects and in particular environment. In order to recognize the objects efficiently which are similar, there is a need for automating this problem. In this paper, we have proposed an approach to identify objects from MPEG-7 database consisting of 69 classes using graph theory. Graph parameters like graph eccentricity, graph diameter, graph radius and graph center values were used to form the feature vector. Back propagation neural network (BPNN) is used as a classifier. Features were reduced based on their performance in identification. Experimental results prove that an average identification accuracy of 91% is attained. The study is extended by combining other feature extraction techniques to train the neural network. This work finds its applications to train the robots in automobile industries to handle the objects.","PeriodicalId":13926,"journal":{"name":"International journal of database theory and application","volume":"151 1","pages":"11-30"},"PeriodicalIF":0.0,"publicationDate":"2017-03-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"77372620","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Mining High-Utility Itemsets Based on Multiple Minimum Support and Multiple Minimum Utility Thresholds","authors":"Fazla Elahe, Kun Zhang","doi":"10.14257/IJDTA.2017.10.3.03","DOIUrl":"https://doi.org/10.14257/IJDTA.2017.10.3.03","url":null,"abstract":"Mining high utility itemsets from a transactional database refer to the discovery of high utility itemsets that generate high profit and several approaches have been proposed for this task in recent years. Algorithms like HUIM-MMU and MHU-Growth overcome the limitation of using a single threshold for the whole database. However, they still generate a large number of candidate itemsets and thus it degrades the performance of the algorithms. In this paper, we address this issue by combining two different kinds of thresholds used by HUIM-MMU and MHU-Growth. By using these two thresholds we propose two algorithms namely HUIM-MMSU and HUIM-IMMSU. HUIM-MMSU is a candidate generation and retest based algorithm, which relies on sorted downward closure (SDC) property. On the other hand, HUIM-IMMSU uses a tree-like data structure. Experiment result shows that the proposed two algorithms can effectively discover high utility itemsets from the transactional database.","PeriodicalId":13926,"journal":{"name":"International journal of database theory and application","volume":"64 1","pages":"31-44"},"PeriodicalIF":0.0,"publicationDate":"2017-03-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"82844495","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The Application of TF-IDF with Time Factor in the Cluster of Micro-blog Theme","authors":"Song Yu, Yangchen Wang, Tianchi Mo, Mingyan Liu, Hui Liu, Zhifang Liao","doi":"10.14257/ijdta.2017.10.2.03","DOIUrl":"https://doi.org/10.14257/ijdta.2017.10.2.03","url":null,"abstract":"Time factor is of great significance for the topic clustering for Micro-blog. Usually, the topics discussed most frequently during a certain period may become the hot issues. Therefore, this article has successfully obtained the method of TF-IDF-TF by different division of periods and setting of different weights, then applied it to the ULPIR Microblog content corpus, with the hierarchical clustering method and k-means method being used to make statistic analysis. The result of the experiment shows that, compared with the traditional TF-IDF( term frequency- inverse document frequency ), the TF-IDF-TF method could provide more accurate clustering result, especially for specific topics during the period when users play most frequently.","PeriodicalId":13926,"journal":{"name":"International journal of database theory and application","volume":"8 1","pages":"31-40"},"PeriodicalIF":0.0,"publicationDate":"2017-02-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"78727732","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Comparative Analysis of Various Similarity Measures for Finding Similarity of Two Documents","authors":"Maedeh Afzali, Suresh Kumar","doi":"10.14257/IJDTA.2017.10.2.02","DOIUrl":"https://doi.org/10.14257/IJDTA.2017.10.2.02","url":null,"abstract":"Similarity measurements are elemental concepts in text mining and information retrieval that helps us to quantify the similarity between documents, which is effective in the improvement of the performance of search engines and browsing techniques. Nowadays, varieties of similarity measures are available, but it is not clear that which similarity measure is more effective in finding the similarity of text documents. The aim of this paper is to provide a comparative analysis of various term based similarity measures such as Cosine similarity, Jaccard and Dice coefficient in order to evaluate the performance of this similarity measures in finding the similarity of two text documents.","PeriodicalId":13926,"journal":{"name":"International journal of database theory and application","volume":"51 1","pages":"23-30"},"PeriodicalIF":0.0,"publicationDate":"2017-02-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"80933381","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Novel Event-centric Trend Detection Algorithm for Online Social Graph Analysis","authors":"Ling Wang, Haijing Jiang, T. Zhou, Wei Ding, Chen Zhiyuan","doi":"10.14257/IJDTA.2017.10.2.04","DOIUrl":"https://doi.org/10.14257/IJDTA.2017.10.2.04","url":null,"abstract":"Nowadays, identifying the most popular and important topics discussed over social networks has become a vital societal concern. To track hot topics in real time, we propose a novel event-centric trend detection algorithm, called Ec_TD, which adds event attributes to the structure of the social network and then mines the subgraphs induced by specific attributes, using a correlation function to measure the correlation of event-changing attributes on the attribute-extended social network structure. Our experiments show that the Ec_TD algorithm performs significantly better in real-time event detection and in mining the potential relationships between attributes and vertices. Moreover, we tested the algorithm on real big data, where it substantially reduced response time, demonstrating the feasibility of the idea.","PeriodicalId":13926,"journal":{"name":"International journal of database theory and application","volume":"33 1","pages":"41-50"},"PeriodicalIF":0.0,"publicationDate":"2017-02-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"81684601","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Chinese Small Business Credit Scoring: Application of Multiple Hybrids Neural Network","authors":"Chi Guo-tai, Mohammad Zoynul Abedin, F. Moula","doi":"10.14257/IJDTA.2017.10.2.01","DOIUrl":"https://doi.org/10.14257/IJDTA.2017.10.2.01","url":null,"abstract":"In recent years, hybrid models have proven to be a promising approach for the forecasting of credit status, therefore, the aim of this project is to examine the prediction performance of hybrid classifiers. Particularly, the combination of the feature engineering with popular neural network (NN) classifiers; an hybridization approach, is compared with hybrid classifier, NN classifiers, and three well-known baseline classifiers, i.e. stepwise discriminant analysis (SDA), stepwise logistic regression (SLR), and decision trees (DTs). Overall, we executed a 12+8+ (8×8) experimental design that resulted in 84 unique classification models; i.e., 12 baseline models, 8 NN models, and 64 hybrid models, a multiple hybrid; are examined over a large credit scoring dataset from a Chinese commercial bank. Besides, thirteen evaluation measures are used for the assessment task and this may be the first effort to link up multiple hybrid classifiers with multiple performance metrics for the evaluation of small business credit. The results reveal that the predictive and distinguish ability of the F ratio based SDA with multilayer perceptron based NN classifier (SDA FR +MLP), a hybrid model, outperforms both of the one–dimensional scoring models (baseline model and NN model) and its hybrid counterparts.","PeriodicalId":13926,"journal":{"name":"International journal of database theory and application","volume":"15 1","pages":"1-22"},"PeriodicalIF":0.0,"publicationDate":"2017-02-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"73366564","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Analysis of Research Trends in Regional Innovation Using Text Mining","authors":"Ju Seop Park, Soongoo Hong, N. R. Kim, Bo Ra Kang","doi":"10.14257/IJDTA.2017.10.8.09","DOIUrl":"https://doi.org/10.14257/IJDTA.2017.10.8.09","url":null,"abstract":"To aid local governments in solving various regional innovation issues related to regional development, trend analyses should first be conducted. In this study, 579 abstracts published in academic journals between year 2003 and year 2015 were analyzed to examine the research trends of topics related to regional innovation through a keyword frequency analysis and a social network analysis, both of which are text mining techniques. As a result of these analyses, the most frequent keyword that appeared through the clustering of participating entities was regional innovation system during the Roh Moo-Hyun administration. During the Lee Myung-Bak administration, the most frequent keyword obtained through the participation of local residents was regional innovation focused on overall business development, which continued through to the Park Geun-Hye administration. This study suggests a big data analysis method to derive the core problems related to regional innovation and may trigger follow-up research. Furthermore, the results of this study can be used as basic data for local governments and administrative agencies to establish regional innovation policies.","PeriodicalId":13926,"journal":{"name":"International journal of database theory and application","volume":"18 1","pages":"91-98"},"PeriodicalIF":0.0,"publicationDate":"2017-02-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"80129375","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Research on Teacher Workload Control Strategy Based on Conductive Knowledge Mining","authors":"Ye Guangzi, Chen Yuqiang, Liao Weihua","doi":"10.14257/IJDTA.2017.10.1.23","DOIUrl":"https://doi.org/10.14257/IJDTA.2017.10.1.23","url":null,"abstract":"With the extension data mining technology, the conductive knowledge mining method is applied into the management of college teachers’ workload. Under the active transformation of control strategy, the conductive effect and its confidence of teachers’ workload are calculated, to obtain the conductivity and the conductivity interval, and mine the conductive knowledge of quantitative or qualitative change. A case study of a college shows that the conductive knowledge with a higher support and confidence is helpful for the management departments of colleges to understand the degree of positive or negative effects of some strategies on teachers’ research and teaching workload in quantity, so that they can find an appropriate strategy used to control the teachers’ workload.","PeriodicalId":13926,"journal":{"name":"International journal of database theory and application","volume":"29 1","pages":"245-258"},"PeriodicalIF":0.0,"publicationDate":"2017-01-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"72761799","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Novel Random Forest Approach Using Specific Under Sampling Strategy","authors":"L. Prasanthi, R. K. Kumar, K. Srinivas","doi":"10.1007/978-981-10-3223-3_24","DOIUrl":"https://doi.org/10.1007/978-981-10-3223-3_24","url":null,"abstract":"","PeriodicalId":13926,"journal":{"name":"International journal of database theory and application","volume":"35 1","pages":"259-270"},"PeriodicalIF":0.0,"publicationDate":"2017-01-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"74803370","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Review on Software Defect Prediction Techniques Using Product Metrics","authors":"R. Jayanthi, L. Florence, Arti Arya","doi":"10.14257/IJDTA.2017.10.1.15","DOIUrl":"https://doi.org/10.14257/IJDTA.2017.10.1.15","url":null,"abstract":"Presently, complexity and volume of software systems are increasing with a rapid rate. In some cases it improves performance and brings efficient outcome, but unfortunately in several situations it leads to elevated cost for testing, meaningless outcome and inferior quality, even there is no trustworthiness of the products. Fault prediction in software plays a vital role in enhancing the software excellence as well as it helps in software testing to decrease the price and time. Conventionally, to describe the difficulty and calculate the duration of the programming, software metrics can be utilized. To forecast the amount of faults in module and utilizing software metrics, an extensive investigation is performed. With the purpose of recognizing the causes which importantly enhances the fault prediction models related to product metrics, this empirical research is made. This paper visits various software metrics and suggested procedures through which software defect prediction is enhanced and also summarizes those techniques.","PeriodicalId":13926,"journal":{"name":"International journal of database theory and application","volume":"32 1","pages":"163-174"},"PeriodicalIF":0.0,"publicationDate":"2017-01-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"80467481","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}