{"title":"Analyzing immediate correlations between names and pop culture of North America in the 21st century","authors":"A. Gurnett, Robin Besson, M. O. Shafiq, R. Alhajj","doi":"10.1109/IRI.2014.7051923","DOIUrl":"https://doi.org/10.1109/IRI.2014.7051923","url":null,"abstract":"In this paper we looked at effect of pop culture on naming of babies in the subsequent years from its appearance. By employing a data mining based dynamic estimation algorithm we attempt to predict the most popular names in the year after the data in the database ends as well the names which will be unique. Our proposed solution is based on the rules found in this paper through the use of algorithms such as rough set theory which determined how much of an effect each subset of pop culture has on new parents. With the results of this effect an algorithm has been developed which is analyzed here. This algorithm has been created with the intended target audience of future parents as well as businesses looking to create personalized items and build targeted marketing strategies.","PeriodicalId":360013,"journal":{"name":"Proceedings of the 2014 IEEE 15th International Conference on Information Reuse and Integration (IEEE IRI 2014)","volume":"48 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128097618","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Detection of implied scenarios in multiagent systems with clustering agents' communications","authors":"F. H. Fard, B. Far","doi":"10.1109/IRI.2014.7051895","DOIUrl":"https://doi.org/10.1109/IRI.2014.7051895","url":null,"abstract":"Software agents in Multiagent Systems (MAS) have several interactions that are designed and represented in the scenarios of the system. These communications should be verified to detect whether the agents will show a new behavior in their execution, which is known as emergent behavior or implied scenario. Most research use different versions of state machines modeling for the detection of implied scenarios, which consider the states of one/all agents. The existing detection processes ignore the interactions among agents. In this paper, besides modeling the states and agents' behaviors, we model the agents' interactions derived from their designs, to detect implied scenarios. A new type of implied scenario that occurs when a process misses the information about its common communications in multiple scenarios is studied in this paper. This type of implied scenario cannot be detected with other approaches. Various situations that can lead to this implied scenario are ruled. Moreover, a detection methodology based on clustering the agents' communications from the scenarios of the system is presented. The results are verified through a case study.","PeriodicalId":360013,"journal":{"name":"Proceedings of the 2014 IEEE 15th International Conference on Information Reuse and Integration (IEEE IRI 2014)","volume":"4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121776673","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A hidden treasure? Evaluating and extending latent methods for link-based classification","authors":"Aaron Fleming, Luke K. McDowell, Zane Markel","doi":"10.1109/IRI.2014.7051954","DOIUrl":"https://doi.org/10.1109/IRI.2014.7051954","url":null,"abstract":"Many information tasks involve objects that are explicitly or implicitly connected in a network, such as webpages connected by hyperlinks or people linked by \"friendships\" in a social network. Research on link-based classification (LBC) has studied how to leverage these connections to improve classification accuracy. This research broadly falls into two groups. First, there are methods that use the original attributes and/or links of the network, via a link-aware supervised classifier or via a non-learning method based on label propagation or random walks. Second, there are recent methods that first compute a set of latent features or links that summarize the network, then use a (hopefully simpler) supervised classifier or label propagation method. Some work has claimed that the latent methods can improve accuracy, but has not adequately compared with the best non-latent methods. In response, this paper provides the first substantial comparison between these two groups. We find that certain non-latent methods typically provide the best overall accuracy, but that latent methods can be competitive when a network is densely-labeled or when the attributes are not very informative. Moreover, we introduce two novel combinations of these methods that in some cases substantially increase accuracy.","PeriodicalId":360013,"journal":{"name":"Proceedings of the 2014 IEEE 15th International Conference on Information Reuse and Integration (IEEE IRI 2014)","volume":"54 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125025910","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Using incremental clustering technique in collaborative filtering data update","authors":"Xiwei Wang, Jun Zhang","doi":"10.1109/IRI.2014.7051920","DOIUrl":"https://doi.org/10.1109/IRI.2014.7051920","url":null,"abstract":"Collaborative filtering (CF) techniques are widely used by online shops in their recommender systems. It is well known that the nonnegative matrix factorization (NMF) based CF algorithms are popular and can provide reasonable product recommendations. However, the dimensions of the factor matrices in NMF need to be predetermined and updated when necessary. Moreover, data arrives in every second so the recommender systems must be capable of updating the fast growing data in a timely manner. In this paper, we propose an approach that incorporates incremental clustering technique into NMF based data update algorithm which can determine the dimensions of the factor matrices and update them automatically. The approach clusters users' and items' auxiliary information and uses them as constraints in NMF for data update. The cluster quantities are used as the dimensions of the factor matrices. With more data coming in, the incremental clustering algorithm determines whether to increase the number of clusters or merge the existing clusters. Experiments on three different datasets (MovieLens, Sushi and LibimSeTi) are conducted to examine the proposed approach. The results show that our approach can update the data quickly and provide encouraging prediction accuracy.","PeriodicalId":360013,"journal":{"name":"Proceedings of the 2014 IEEE 15th International Conference on Information Reuse and Integration (IEEE IRI 2014)","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131757799","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Peer review in online forums: Classifying feedback-sentiment","authors":"G. Harris, A. Panangadan, V. Prasanna","doi":"10.1109/IRI.2014.7051947","DOIUrl":"https://doi.org/10.1109/IRI.2014.7051947","url":null,"abstract":"Replies posted in technical online forums often contain feedback to the author of the parent comment in the form of agreement, doubt, gratitude, contradiction, etc. We call this feedback-sentiment. Inference of feedback-sentiment has application in expert finding, fact validation, and answer validation. To study feedback-sentiment, we use nearly 25 million comments from a popular discussion forum (Slashdot.org), spanning over 10 years. We propose and test a heuristic that feedback-sentiment most commonly appears in the first sentence of a forum reply. We introduce a novel interactive decision tree system that allows us to train a classifier using principles from active learning. We classify individual reply sentences as positive, negative, or neutral, and then test the accuracy of our classifier against labels provided by human annotators (using Amazon's Mechanical Turk). We show how our classifier outperforms three general-purpose sentiment classifiers for the task of finding feedback-sentiment.","PeriodicalId":360013,"journal":{"name":"Proceedings of the 2014 IEEE 15th International Conference on Information Reuse and Integration (IEEE IRI 2014)","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130675964","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Using feature selection and classification to build effective and efficient firewalls","authors":"Randall Wald, Flavio Villanustre, T. Khoshgoftaar, R. Zuech, J. Robinson, Edin A. Muharemagic","doi":"10.1109/IRI.2014.7051979","DOIUrl":"https://doi.org/10.1109/IRI.2014.7051979","url":null,"abstract":"Firewalls form an essential element of modern network security, detecting and discarding malicious packets before they can cause harm to the network being protected. However, these firewalls must process a large number of packets very quickly, and so can't always make decisions based on all of the packets' properties (features). Thus, it is important to understand which features are most relevant in determining if a packet is malicious, and whether a simple model built from these features can be as effective as a model which uses all information on each packet. We explore a dataset with real-world firewall data to answer these questions, ranking the features with 22 feature selection techniques and building classification models using four classifiers (learners). Our results show that the top two features are proto and dst (representing the network protocol and destination IP address, respectively), and that models built using these two features in combination with the Naive Bayes learner are highly effective while being minimally computationally expensive. Such models have the potential to replace conventional firewalls while lowering computational needs.","PeriodicalId":360013,"journal":{"name":"Proceedings of the 2014 IEEE 15th International Conference on Information Reuse and Integration (IEEE IRI 2014)","volume":"259 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132272359","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Classification performance of three approaches for combining data sampling and gene selection on bioinformatics data","authors":"T. Khoshgoftaar, Alireza Fazelpour, D. Dittman, Amri Napolitano","doi":"10.1109/IRI.2014.7051906","DOIUrl":"https://doi.org/10.1109/IRI.2014.7051906","url":null,"abstract":"Bioinformatics datasets pose two major challenges to researchers and data-mining practitioners: class imbalance and high dimensionality. Class imbalance occurs when instances of one class vastly outnumber instances of the other class(es), and high dimensionality occurs when a dataset has many independent features (genes). Data sampling is often used to tackle the problem of class imbalance, and the problem of excessive features in the dataset may be alleviated through feature selection. In this work, we examine various approaches for applying these techniques simultaneously to tackle both of these challenges and build effective classification models. In particular, we ask whether the order of these techniques and the use of unsampled or sampled datasets for building classification models makes a difference. We conducted an empirical study on a series of seven high-dimensional and severely imbalanced biological datasets using six commonly used learners and four feature selection rankers from three different families of feature selection techniques. We compared three different data-sampling approaches: data sampling followed by feature selection using the unsampled data (DS-FS-UnSam) and selected features; data sampling followed by feature selection using the sampled data (DS-FS-Sam) and selected features; and feature selection followed by data sampling (FS-DS) using sampled data and selected features. We used Random Undersampling (RUS) to achieve the minority:majority class ratios of 35:65 and 50:50. The experimental results show that there are statistically significant differences among the three data-sampling approaches only when using the class ratio of 50:50, with a multiple comparison test showing that DS-FS-UnSam outperforms the other approaches. Thus, although specific combinations of learner and ranker may favor other approaches, across all choices of learner and ranker we would recommend the use of the DS-FS-UnSam approach for this class ratio. On the other hand, with the 35:65 class ratio, DS-FS-Sam was most frequently the top-performing approach, and although it was not statistically significantly better than the other approaches, we would generally recommend this approach be used for the 35:65 class ratio (although specific choices of learner and ranker may vary). Overall, we can see that the optimal approach will depend on the choice of class ratio.","PeriodicalId":360013,"journal":{"name":"Proceedings of the 2014 IEEE 15th International Conference on Information Reuse and Integration (IEEE IRI 2014)","volume":"19 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129816208","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Next-generation technologies for preventing accidental death of children trapped in parked vehicles","authors":"V. Aiello, Parnian Najafi Borazjani, Ermanno Battista, Massimiliano Albanese","doi":"10.1109/IRI.2014.7051931","DOIUrl":"https://doi.org/10.1109/IRI.2014.7051931","url":null,"abstract":"Integration of computational and physical elements into cyber-physical systems is increasingly finding application in a number of different domains, including smart power grids, medical technologies, and building automation. In this paper, we study how the notion of cyber-physical integration can be applied to the design of the next generation of safety devices for saving the life of children inadvertently left into parked vehicles. In the United States alone, an average 38 children die from heatstroke after being left into parked vehicles by their caregivers. To be effective, next-generation safety devices will need to have the capability of sensing the environment in and around the vehicle, integrating and processing data from an array of different sensors, assessing the risk in real time, and triggering appropriate corrective actions aimed at removing or mitigating the risk factors for the child.","PeriodicalId":360013,"journal":{"name":"Proceedings of the 2014 IEEE 15th International Conference on Information Reuse and Integration (IEEE IRI 2014)","volume":"16 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132480293","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Data-oriented intelligent transportation systems","authors":"H. Ibrahim, B. Far","doi":"10.1109/IRI.2014.7051907","DOIUrl":"https://doi.org/10.1109/IRI.2014.7051907","url":null,"abstract":"Real-time analysis of traffic data is a key challenge in intelligent transportation system. It aims at discovering useful traffic patterns that can help decision makers better manage the transportation system and test and introduce new policies. Discovered patterns can also be used to support road users to reach their destination safely and with reasonable commuting time. In this paper, a number of key challenges associated with transportation systems and possible solutions are discussed. A method that analyzes real-time traffic data to predict future status of traffic flow and incidents is introduced. The proposed method includes three phases: offline, real-time, and decision support phases. In this paper, a decision tree classification model is constructed and validated for an accident dataset. Possible benefits of using the constructed model are demonstrated using results of the classification analysis.","PeriodicalId":360013,"journal":{"name":"Proceedings of the 2014 IEEE 15th International Conference on Information Reuse and Integration (IEEE IRI 2014)","volume":"50 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115165753","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"UML activity diagram to event-B: A model transformation approach based on the institution theory","authors":"Amine Achouri, Leila Jemni Ben Ayed","doi":"10.1109/IRI.2014.7051974","DOIUrl":"https://doi.org/10.1109/IRI.2014.7051974","url":null,"abstract":"Making jointly a semi formal language and a formal language can be seen as the transformation of a semi formal model into a formal model. Thus, this task can be considered as a model transformation from an abstract model into another concrete one. In this context, the paper at hand comes up with an approach to model transformation from a UML Activity Diagram (UML AD) into the Event-B model. The approach will be fully detailed in the paper by putting the stress on its different steps. The issue of semantic preserving during the transformation process will also be discussed. The latter is performed after defining two local semantics for the UML AD and Event-B specification. The mathematical foundation of our approach is the institution theory. Such a theory establishes a new notion of algebraic semantic for the source and target formalisms. Additionally, with the institution morphisms, we define a semantic correctness and coherence of the model transformation.","PeriodicalId":360013,"journal":{"name":"Proceedings of the 2014 IEEE 15th International Conference on Information Reuse and Integration (IEEE IRI 2014)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116148321","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}