Protein-Protein Interaction Extraction from Text by Selecting Linguistic Features
T. Phan, T. Ohkawa, Akihiro Yamamoto
2017 IEEE 17th International Conference on Bioinformatics and Bioengineering (BIBE), October 2017
DOI: 10.1109/BIBE.2017.00-58
Extracting protein-protein interactions (PPIs) from articles is important for understanding the underlying biological processes. With advances in natural language processing, many methods for automatically extracting PPIs from articles have been developed, notably machine learning-based methods, which include feature-based and kernel-based approaches. However, the performance of these methods still leaves considerable room for improvement. We propose a novel method to extract PPIs from articles. We use a diverse set of features, including lexical features obtained from sentences and features obtained from parse trees. We also devise new features extracted from the shortest dependency paths obtained from dependency trees. In our method, the training and test data are first partitioned into subsets based on the basic structures of their sentences, and feature selection (FS) is performed. Then, for each instance, we decrease the values of the features in each group of similar features by multiplying them by the corresponding shrink coefficients. These shrink coefficients are determined automatically. Experimental results on five corpora show the usefulness of the proposed method.
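The abstract does not specify how the shortest-dependency-path features are built or how the shrink coefficients are applied, so the following Python sketch only illustrates the general idea under stated assumptions. It is not the authors' implementation. Assumptions: the dependency parse is given by hand as hypothetical (head, dependent, relation) triples with Universal Dependencies-style labels; features are named with a hypothetical group prefix (`lex:` for lexical, `sdp:` for shortest-dependency-path); each group has a single shrink coefficient; and the coefficients are fixed by hand, whereas the paper determines them automatically. Sentence-structure partitioning and feature selection are not reproduced.

```python
from collections import deque


def shortest_dependency_path(n_tokens, edges, src, dst):
    """BFS over the dependency tree treated as an undirected graph.

    edges: (head, dependent, relation) triples with 0-based token indices.
    Returns (token_path, relation_path) from src to dst, or None if unconnected.
    """
    adj = {i: [] for i in range(n_tokens)}
    for head, dep, rel in edges:
        adj[head].append((dep, rel))
        adj[dep].append((head, rel))
    prev = {src: None}  # node -> (parent, relation) on the BFS tree
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            break
        for nxt, rel in adj[node]:
            if nxt not in prev:
                prev[nxt] = (node, rel)
                queue.append(nxt)
    if dst not in prev:
        return None
    # Walk back from dst to src, then reverse.
    tokens, rels, node = [dst], [], dst
    while prev[node] is not None:
        parent, rel = prev[node]
        rels.append(rel)
        tokens.append(parent)
        node = parent
    return tokens[::-1], rels[::-1]


def extract_features(tokens, edges, p1, p2):
    """Hypothetical feature groups for a candidate protein pair (p1, p2)."""
    feats = {}
    # Lexical group: bag of words strictly between the two protein mentions.
    lo, hi = sorted((p1, p2))
    for w in tokens[lo + 1:hi]:
        key = "lex:between=" + w.lower()
        feats[key] = feats.get(key, 0.0) + 1.0
    # Shortest-dependency-path group: inner words, relation sequence, length.
    path = shortest_dependency_path(len(tokens), edges, p1, p2)
    if path is not None:
        tok_path, rel_path = path
        for idx in tok_path[1:-1]:
            key = "sdp:word=" + tokens[idx].lower()
            feats[key] = feats.get(key, 0.0) + 1.0
        feats["sdp:rels=" + "-".join(rel_path)] = 1.0
        feats["sdp:len"] = float(len(rel_path))
    return feats


def shrink(feats, coeffs):
    """Multiply each feature by the coefficient of its group (prefix before ':').

    Groups missing from coeffs are left unscaled (coefficient 1.0).
    """
    return {k: v * coeffs.get(k.split(":", 1)[0], 1.0) for k, v in feats.items()}


if __name__ == "__main__":
    tokens = ["PROT1", "binds", "to", "PROT2"]
    edges = [(1, 0, "nsubj"), (1, 3, "nmod"), (3, 2, "case")]
    feats = extract_features(tokens, edges, 0, 3)
    print(shrink(feats, {"lex": 0.5, "sdp": 0.8}))
```

Scaling a whole group by one coefficient lowers that group's influence on a linear classifier without discarding it, which is one plausible reading of "shrink coefficients"; a per-feature coefficient would be an equally valid variant.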