{"title":"Software Quality Imputation in the Presence of Noisy Data","authors":"T. Khoshgoftaar, A. Folleco, J. V. Hulse, Lofton A. Bullard","doi":"10.1109/IRI.2006.252462","DOIUrl":"https://doi.org/10.1109/IRI.2006.252462","url":null,"abstract":"The detrimental effects of noise in a dependent variable on the accuracy of software quality imputation techniques were studied. The imputation techniques used in this work were Bayesian multiple imputation, mean imputation, instance-based learning, regression imputation, and the REPTree decision tree. These techniques were used to obtain software quality imputations for a large military command, control, and communications system dataset (CCCS). The underlying quality of data was a significant factor affecting the accuracy of the imputation techniques. Multiple imputation and regression imputation were top performers, while mean imputation was ineffective","PeriodicalId":402255,"journal":{"name":"2006 IEEE International Conference on Information Reuse & Integration","volume":"67 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-12-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125165129","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Hybrid Chinese Text Chunking","authors":"PanPan Liao, Y. Liu, Lin Chen","doi":"10.1109/IRI.2006.252475","DOIUrl":"https://doi.org/10.1109/IRI.2006.252475","url":null,"abstract":"Text chunking is an effective method to decrease the difficulty of natural language parsing. In this paper, a statistical method based on a hidden Markov model (HMM) is used for Chinese text chunking. Moreover, a transformation-based error-driven learning approach is adopted to improve the performance. The definition of transformation rule templates is the key problem of this machine learning approach. In this paper, all the templates are learned automatically from the corpus. The precision using HMM alone is 88.19%, rising to 92.67% when HMM is combined with transformation-based error-driven learning","PeriodicalId":402255,"journal":{"name":"2006 IEEE International Conference on Information Reuse & Integration","volume":"17 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-12-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121812055","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Flexible Architecture for Integrating Analysis Operations Into a Scientific Data Repository","authors":"Brian Harrington, R. Brazile, K. Swigger","doi":"10.1109/IRI.2006.252394","DOIUrl":"https://doi.org/10.1109/IRI.2006.252394","url":null,"abstract":"In this paper, we propose a flexible architecture for integrating analysis operations into a scientific data repository. The architecture is based on a Web services framework to ensure interoperability and to avoid problems associated with common firewalls. Furthermore, to aid in ease of use, the services and operations can be searched and accessed similar to files in the repository. Services can be built around existing tools and libraries allowing us to leverage the large number of existing analysis tools","PeriodicalId":402255,"journal":{"name":"2006 IEEE International Conference on Information Reuse & Integration","volume":"102 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-12-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121818337","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Using Markov Chain and Nearest Neighbor Criteria in an Experience Based Study Planning System with Linear Time Search and Scalability","authors":"Juan Carlos Segura-Ramirez, Willie Chang","doi":"10.1109/IRI.2006.252447","DOIUrl":"https://doi.org/10.1109/IRI.2006.252447","url":null,"abstract":"Most automated rule-based expert systems developed to aid student study planning and advising have proved ephemeral due to ever-changing curricular requirements and rules. We propose a novel case-based study planning system whose search criteria are based on the experience-indicated probability in Markov chains and nearest-neighbor matching. We provide query results of course sequences, drawn from past graduate records, to students who need to meet certain constraints, such as graduating within a certain number of academic terms or maintaining a minimum grade-point average. The time complexity of computing the nearest-neighbor indices to find the maximum similarity can be very large. Our implementation achieves linear-time complexity in both searching and scaling the system. When updating with a new record, each parametric combination represented by a sorted list of the records is linearly looked up, and the new record value is inserted to keep the list sorted. Since each query input is a set of constraints in a pre-determined order, the parametric combinations have an associated sorted list to look up in a one-pass linear process. The first-order Markov chains can also be updated in linear time whenever a new graduate record is introduced. The probability matrix is first looked up by row and then by column, representing a pair of courses taken in two adjacent academic terms, and the look-up time is also linear","PeriodicalId":402255,"journal":{"name":"2006 IEEE International Conference on Information Reuse & Integration","volume":"21 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-12-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121958032","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Labeling Network Event Records for Intrusion Detection in a Wireless LAN","authors":"T. Khoshgoftaar, Chris Seiffert, Naeem Seliya","doi":"10.1109/IRI.2006.252413","DOIUrl":"https://doi.org/10.1109/IRI.2006.252413","url":null,"abstract":"A data mining approach to network intrusion detection requires a training dataset of network event records labeled as either normal or an attack type. Since there are too many events to track, such datasets are typically very large. This is particularly so in a WLAN, where the number of wireless devices communicating with the WLAN can be large and ad hoc. This makes it difficult for the domain expert to label all records in the training dataset used to train a machine learner. We present a simple approach by which the number of network records the expert has to examine is a relatively small proportion of the given training dataset. A clustering algorithm is used to form relatively coherent groups, which the expert examines as a whole to label records as one of four classes: red (definite intrusion), yellow (possible intrusion), blue (probable normal), and green (definite normal). An ensemble classifier-based data cleansing approach is then used to detect records that were likely mislabeled by the expert. The proposed approach is investigated with a case study of a real-world WLAN. An ensemble classifier-based intrusion detection model built using the labeled training dataset demonstrates the effectiveness of the labeling approach and good generalization accuracy","PeriodicalId":402255,"journal":{"name":"2006 IEEE International Conference on Information Reuse & Integration","volume":"57 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-12-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122383969","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Automatic Morphological Tagging of Contemporary Uighur Corpus","authors":"G. Altenbek","doi":"10.1109/IRI.2006.252474","DOIUrl":"https://doi.org/10.1109/IRI.2006.252474","url":null,"abstract":"In this paper, we propose methods for Uighur word lemmatization using morphemic analysis and word structural analysis, integrating morphological processing and part-of-speech (POS) tagging, with the final aim of extracting linguistic information and automatically POS-tagging a Uighur corpus. For regular words, the accuracy of word lemmatization reaches 85% and that of POS tagging reaches 80%","PeriodicalId":402255,"journal":{"name":"2006 IEEE International Conference on Information Reuse & Integration","volume":"33 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-12-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127155670","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Fuzzy Continuous Resource Allocation Mechanisms in Workflow Management Systems","authors":"Joslaine Cristina Jeske de Freitas, S. Julia, R. Valette","doi":"10.1109/IRI.2006.252460","DOIUrl":"https://doi.org/10.1109/IRI.2006.252460","url":null,"abstract":"In this paper, an approach based on a fuzzy hybrid Petri net model is proposed to solve the resource allocation problem of workflow management systems. Hybrid resource allocation mechanisms are modeled by hybrid Petri net with discrete transitions where discrete resources represent equipment and continuous resources represent employees availability. To express in a more realistic way the resource allocation mechanisms when human behavior is considered, fuzzy sets delimited by possibility distributions of the triangular form are associated with the marking of the places which represent human availability. New firing rules and the fuzzy marking evolution of the new model are defined. Such a fuzzy resource allocation model is applied to an example of resource allocation mechanism of a \"handle complaint process\"","PeriodicalId":402255,"journal":{"name":"2006 IEEE International Conference on Information Reuse & Integration","volume":"35 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-12-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121229580","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Using JPL Micro-Imagers for the Johnson Space Center International Space Station Cell Growth Experiment","authors":"A. Behar, F. Bruhn, J. Matthews, E. Means","doi":"10.1109/IRI.2006.252384","DOIUrl":"https://doi.org/10.1109/IRI.2006.252384","url":null,"abstract":"A development program is underway to produce the Bio Tech Facility (BTF) for the International Space Station (ISS) at the Johnson Space Center (JSC). This facility will allow numerous experiments related to micro-gravity cell development, growth and adaptation to be performed on the International Space Station. JSC desired an upgrade to current capabilities that included micro-imagers, and JPL has provided these micro-imagers for the JSC ISS cell growth experiment program. The imagers are to be used to make heuristic decisions on the growth and health of growing cells and as a means of recording data on their development. This paper describes the imager produced, its intended application and the subsequent testing performed to ensure functionality of such a system in the space radiation environment","PeriodicalId":402255,"journal":{"name":"2006 IEEE International Conference on Information Reuse & Integration","volume":"31 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-12-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115786563","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Classification of Ships in Surveillance Video","authors":"Qiming Luo, T. Khoshgoftaar, A. Folleco","doi":"10.1109/IRI.2006.252453","DOIUrl":"https://doi.org/10.1109/IRI.2006.252453","url":null,"abstract":"Object classification is an important component in a complete visual surveillance system. In the context of coastline surveillance, we present an empirical study on classifying 402 instances of ship regions into 6 types based on their shape features. The ship regions were extracted from surveillance videos and the 6 types of ships as well as the ground truth classification labels were provided by human observers. The shape feature of each region was extracted using MPEG-7 region-based shape descriptor. We applied k nearest neighbor to classify ships based on the similarity of their shape features, and the classification accuracy based on stratified ten-fold cross validation is about 91%. The proposed classification procedure based on MPEG-7 region-based shape descriptor and k nearest neighbor algorithm is robust to noise and imperfect object segmentation. It can also be applied to the classification of other rigid objects, such as airplanes, vehicles, etc","PeriodicalId":402255,"journal":{"name":"2006 IEEE International Conference on Information Reuse & Integration","volume":"18 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-12-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114483343","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Multi-world Intelligent Genetic Algorithm to Interactively Optimize Large-scale TSP","authors":"Yoshitaka Sakurai, T. Onoyama, S. Kubota, Yoshihiro Nakamura, S. Tsuruta","doi":"10.1109/IRI.2006.252421","DOIUrl":"https://doi.org/10.1109/IRI.2006.252421","url":null,"abstract":"To optimize large-scale distribution networks, solving about 1000 middle scale (around 40 cities) TSPs (traveling salesman problems) within an interactive length of time (max. 30 seconds) is required. Yet, expert-level (less than 3% of errors) accuracy is necessary. To realize the above requirements, a multi-world intelligent GA method was developed. This method combines a high-speed GA with an intelligent GA holding problem-oriented knowledge that is effective for some special location patterns. If conventional methods were applied, solutions for more than 20 out of 20,000 cases were below expert-level accuracy. However, the developed method could solve all of 20,000 cases at expert-level","PeriodicalId":402255,"journal":{"name":"2006 IEEE International Conference on Information Reuse & Integration","volume":"21 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-12-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128373673","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}