{"title":"A Combination-based Semantic Similarity Measure using Multiple Information Sources","authors":"Hoa A. Nguyen, H. Al-Mubaid","doi":"10.1109/IRI.2006.252484","DOIUrl":"https://doi.org/10.1109/IRI.2006.252484","url":null,"abstract":"The semantic similarity techniques are interested in determining how much two concepts, or terms, are similar according to a given ontology. This paper proposes a method for measuring semantic similarity/distance between terms. The measure combines strengths and complements weaknesses of existing measures that use ontology as primary source. The proposed measure uses a new feature of common specificity (CSpec) besides the path length feature. The CSpec feature is derived from (1) information content of concepts, and (2) information content of the ontology given a corpus. We evaluated the proposed measure with benchmark test set of term pairs scored for similarity by human experts. The experimental results demonstrated that our similarity measure is effective and outperforms the existing measures. The proposed semantic similarity measure gives the best correlation (0.874) with human scores in the benchmark test set compared to the existing measures","PeriodicalId":402255,"journal":{"name":"2006 IEEE International Conference on Information Reuse & Integration","volume":"63 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-12-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116337919","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Similarity issues in attribute implications from data with fuzzy attributes","authors":"R. Belohlávek, Vilém Vychodil","doi":"10.1109/IRI.2006.252401","DOIUrl":"https://doi.org/10.1109/IRI.2006.252401","url":null,"abstract":"We study similarity in formal concept analysis of data tables with fuzzy attributes. We focus on similarity related to attribute implications, i.e. rules A ⇒ B describing dependencies \"each object which has all attributes from A has also all attributes from B\". We present several formulas for estimation of similarity of outputs in terms of similarity of inputs. The results answer some natural questions such as how much do truth degrees of A1 ⇒ B and A2 ⇒ B differ in terms of similarity of A1 to A2?","PeriodicalId":402255,"journal":{"name":"2006 IEEE International Conference on Information Reuse & Integration","volume":"2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-12-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114261377","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Randomized Local Extrema for Heuristic Selection in TSP","authors":"Q. Liang, S. Rubin","doi":"10.1109/IRI.2006.252436","DOIUrl":"https://doi.org/10.1109/IRI.2006.252436","url":null,"abstract":"It follows from the search randomizations in space-time among candidate heuristics that the optimality of an arbitrary heuristic is unsolvable. There are a countable infinite number of theories that may be decomposed into stronger local proofs. Local inductive randomization depends on domain symmetry for tractability. TSP problems exhibit tentative domain symmetry and potential space-time randomness in domain solution evolution. Heuristics in the domain of the TSP can be found and selected with a suitable representation, randomization, and symmetric induction with a significantly reduced time. Better representation of the TSP problem facilitates a better solution","PeriodicalId":402255,"journal":{"name":"2006 IEEE International Conference on Information Reuse & Integration","volume":"79 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-12-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127170935","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Enrollment Prediction through Data Mining","authors":"Svetlana S. Aksenova, Du Zhang, M. Lu","doi":"10.1109/IRI.2006.252466","DOIUrl":"https://doi.org/10.1109/IRI.2006.252466","url":null,"abstract":"In this paper, we describe our study on enrollment prediction using support vector machines and rule-based predictive models. The goal is to predict the total enrollment headcount that is composed of new (freshman and transfer), continued and returned students. The proposed approach builds predictive models for new, continued and returned students, respectively first, and then aggregates their predictive results from which the model for the total headcount is generated. The types of data utilized during the mining process include population, employment, tuition and fees, household income, high school graduates, and historical enrollment data. Support vector machines produce the initial predictive results, which are then used by a tool called Cubist to generate easy-to-understand rule-based predictive models. Finally we present some empirical results on enrollment prediction for computer science students at California State University, Sacramento","PeriodicalId":402255,"journal":{"name":"2006 IEEE International Conference on Information Reuse & Integration","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-12-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127782289","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Sensing Super-position: Human Sensing Beyond the Visual Spectrum","authors":"D. Maluf, P. Tran","doi":"10.1109/IRI.2006.252481","DOIUrl":"https://doi.org/10.1109/IRI.2006.252481","url":null,"abstract":"The coming decade of fast, cheap and miniaturized electronics and sensory devices opens new pathways for the development of sophisticated equipment to overcome limitations of the human senses. This paper addresses the technical feasibility of augmenting human vision through sensing super-position by mixing natural human sensing. The current implementation of the device translates visual and other passive or active sensory instruments into sounds, which become relevant when the visual resolution is insufficient for very difficult and particular sensing tasks. A successful sensing super-position meets many human and pilot vehicle system requirements. The system can be further developed into cheap, portable, and low power taking into account the limited capabilities of the human user as well as the typical characteristics of his dynamic environment. The system operates in real time, giving the desired information for the particular augmented sensing tasks. The sensing super-position device increases the image resolution perception and is obtained via an auditory representation as well as the visual representation. Auditory mapping is performed to distribute an image in time. The three-dimensional spatial brightness and multi-spectral maps of a sensed image are processed using real-time image processing techniques (e.g. histogram normalization) and transformed into a two-dimensional map of an audio signal as a function of frequency and time. This paper details the approach of developing sensing super-position systems as a way to augment the human vision system by exploiting the capabilities of the human hearing system as an additional neural input. The human hearing system is capable of learning to process and interpret extremely complicated and rapidly changing auditory patterns. The known capabilities of the human hearing system to learn and understand complicated auditory patterns provided the basic motivation for developing an image-to-sound mapping system. The human brain is superior to most existing computer systems in rapidly extracting relevant information from blurred, noisy, and redundant images. From a theoretical viewpoint, this means that the available bandwidth is not exploited in an optimal way. While image-processing techniques can manipulate, condense and focus the information (e.g., Fourier transforms), keeping the mapping as direct and simple as possible might also reduce the risk of accidentally filtering out important clues. After all, especially a perfect non-redundant sound representation is prone to loss of relevant information in the non-perfect human hearing system. Also, a complicated non-redundant image-to-sound mapping may well be far more difficult to learn and comprehend than a straightforward mapping, while the mapping system would increase in complexity and cost. This work demonstrates some basic information processing for optimal information capture for head-mounted systems","PeriodicalId":402255,"journal":{"name":"2006 IEEE International Conference on Information Reuse & Integration","volume":"100 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-12-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125974966","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Axiomatizing Relational Network for Knowledge Engineering - Exploring WordNet and FrameNet","authors":"I. Chow, B. Wong","doi":"10.1109/IRI.2006.252423","DOIUrl":"https://doi.org/10.1109/IRI.2006.252423","url":null,"abstract":"The focus of this paper is showing how linguistic information can be modeled in an ontological engineering environment for knowledge management and acquisition, and on this basis made accessible for hierarchical and axiomatic processing. The simplicity of relational network notation models stratal linguistic information solely with reference to sets of interconnecting nodes. Axioms can be effortlessly declared upon the simplicity of the notation such that the knowledge base can be easily extended with the power of inference. Fruitful new knowledge can thus be acquired through axiomatic inference in terms of uncovering latent links between concepts and/or instances in the knowledge base. With this model, various linguistic resources, WordNet and FrameNet originally encoding different domains of linguistic knowledge, are now capable of interfacing with each other, retrieving and generating underlying linguistic information, serving as a more comprehensive NLP tool","PeriodicalId":402255,"journal":{"name":"2006 IEEE International Conference on Information Reuse & Integration","volume":"4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-12-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126031241","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Toward Component Non-functional Interoperability Analysis: A UML-based and Goal-Oriented Approach","authors":"Sam Supakkul, E. Oladimeji, L. Chung","doi":"10.1109/IRI.2006.252439","DOIUrl":"https://doi.org/10.1109/IRI.2006.252439","url":null,"abstract":"Component-based development (CBD) has a great potential of reducing development cost and time by integrating existing software components. But it also faces many challenges one of which is ensuring interoperability of the components that may have been developed with different functional and non-functional goals. The software community has traditionally focused more on the functional aspect of the interoperability such as syntactic and semantic compatibility. However, incompatibility from the non-functional aspect could lead to poor quality such as insufficient security or even inoperable system. This paper presents a preliminary framework for analyzing non-functional requirements (NFRs) defined for the component required and provided interfaces. The components are considered non-functionally interoperable when they agree on the definition and implementation techniques used to achieve the NFRs. Any detected mismatches can be resolved using a combination of the three presented tactics, including replacing the server component, negotiating for more attainable NFRs, or using an adapter component to bridge the non-functional differences. A running example based on a simplified Web-based conference management system is used to illustrate the application of this framework","PeriodicalId":402255,"journal":{"name":"2006 IEEE International Conference on Information Reuse & Integration","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-12-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126568208","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Robust Method for Biological Sequence Clustering","authors":"Wei-bang Chen, Chengcui Zhang","doi":"10.1109/IRI.2006.252427","DOIUrl":"https://doi.org/10.1109/IRI.2006.252427","url":null,"abstract":"In this paper, we proposed a two-phase hybrid method for biological sequence clustering, which combines the strengths of the hierarchical agglomerative clustering methods and the partition clustering methods. In phase I, the hybrid method uses the hierarchical agglomerative clustering algorithm to pre-cluster the aligned sequences, while in the second phase it takes the pre-clustering result as the initial partition for the profile hidden Markov models (HMMs) based k-means partition clustering method. Such initial partitions (generated from phase I), as against random initial partitions, are usually more reasonable and thus can avoid the inconsistency problem in the partition clustering methods due to the randomness in initial partitions. In addition, the inaccuracy of the hierarchical agglomerative clustering methods can be compensated by the profile HMM based k-means partition clustering since the latter is model-based and can better describe the dynamic properties of the data in a cluster. Experiments on a molecular sequence dataset demonstrate the effectiveness and the efficiency of the proposed hybrid clustering algorithm","PeriodicalId":402255,"journal":{"name":"2006 IEEE International Conference on Information Reuse & Integration","volume":"23 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-12-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121646356","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Chinese Word Segmentation with Minimal Linguistic Knowledge: An Improved Conditional Random Fields Coupled with Character Clustering and Automatically Discovered Template Matching","authors":"Richard Tzong-Han Tsai, Hong-Jie Dai, Hsieh-Chuan Hung, Cheng-Lung Sung, Min-Yuh Day, W. Hsu","doi":"10.1109/IRI.2006.252425","DOIUrl":"https://doi.org/10.1109/IRI.2006.252425","url":null,"abstract":"This paper addresses three major problems of closed task Chinese word segmentation (CWS): word overlap, tagging sentences interspersed with non-Chinese words, and long named entity (NE) identification. For the first, we use additional bigram features to approximate trigram and tetragram features. For the second, we first apply K-means clustering to identify non-Chinese characters. Then, we employ a two-tagger architecture: one for Chinese text and the other for non-Chinese text. Finally, we post-process our CWS output using automatically generated templates. Our results show that additional bigrams can effectively identify more unknown words. Secondly, using our two-tagger method, segmentation performance on sentences containing non-Chinese words is significantly improved when non-Chinese characters are sparse in the training corpus. Lastly, identification of long NEs and long words is also enhanced by template-based post-processing. Using corpora in closed task of SIGHAN CWS, our best system achieves F-scores of 0.956, 0.947, and 0.965 on the AS, HK, and MSR corpora respectively, compared to the best context scores of 0.952, 0.943, and 0.964 in SIGHAN Bakeoff 2005. In AS, this performance is comparable to the best result (F = 0.956) in the open task","PeriodicalId":402255,"journal":{"name":"2006 IEEE International Conference on Information Reuse & Integration","volume":"24 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-12-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133823336","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Concept-based Web Search using Domain Prediction and Parallel Query Expansion","authors":"Rahul Joshi, Y. Aslandogan","doi":"10.1109/IRI.2006.252407","DOIUrl":"https://doi.org/10.1109/IRI.2006.252407","url":null,"abstract":"We address the problem of irrelevant results for short queries on Web search engines using latent semantic indexing in the WordSpace model and query expansion. First, we predict the potential concept topics, which are the domains for the search terms. Next, we expand the search terms in each of the predicted domains in parallel. We then submit separate queries, specialized for each domain, to a general-purpose search engine. The user is presented with categorized search results under the predicted domains. We prepared a categorized text collection (corpus) using Web directory listing to build word association models. We compare the results obtained using this corpus with those using Reuters corpus. User evaluations indicate that our approach helps the users avoid having to examine irrelevant Web search results, especially with short queries","PeriodicalId":402255,"journal":{"name":"2006 IEEE International Conference on Information Reuse & Integration","volume":"58 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-12-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133015122","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}