{"title":"ResidueFinder: extracting individual residue mentions from protein literature.","authors":"Ton E Becker, Eric Jakobsson","doi":"10.1186/s13326-021-00243-3","DOIUrl":"https://doi.org/10.1186/s13326-021-00243-3","url":null,"abstract":"<p><strong>Background: </strong>The revolution in molecular biology has shown how protein function and structure are based on specific sequences of amino acids. Thus, an important feature in many papers is the mention of the significance of individual amino acids in the context of the entire sequence of the protein. MutationFinder is a widely used program for finding mentions of specific mutations in texts. We report on augmenting the positive attributes of MutationFinder with a more inclusive regular expression list to create ResidueFinder, which finds mentions of native amino acids as well as mutations. We also consider parameter options for both ResidueFinder and MutationFinder to explore trade-offs between precision, recall, and computational efficiency. We test our methods and software in full text as well as abstracts.</p><p><strong>Results: </strong>We find there is much more variety of formats for mentioning residues in the entire text of papers than in abstracts alone. Failure to take these multiple formats into account results in many false negatives in the program. Since MutationFinder, like several other programs, was primarily tested on abstracts, we found it necessary to build an expanded regular expression list to achieve acceptable recall in full text searches. We also discovered a number of artifacts arising from PDF to text conversion, which we wrote elements in the regular expression library to address. Taking into account those factors resulted in high recall on randomly selected primary research articles. We also developed a streamlined regular expression (called \"cut\") which enables a several hundredfold speedup in both MutationFinder and ResidueFinder with only a modest compromise of recall. 
All regular expressions were tested using expanded F-measure statistics, i.e., we compute F<sub>β</sub> for various values of β, where the larger the value of β, the more recall is weighted, and the smaller the value of β, the more precision is weighted.</p><p><strong>Conclusions: </strong>ResidueFinder is a simple, effective, and efficient program for finding individual residue mentions in primary literature starting with text files, implemented in Python, and available on SourceForge.net. The most computationally efficient versions of ResidueFinder could enable creation and maintenance of a database of residue mentions encompassing all articles in PubMed.</p>","PeriodicalId":15055,"journal":{"name":"Journal of Biomedical Semantics","volume":" ","pages":"14"},"PeriodicalIF":1.9,"publicationDate":"2021-07-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1186/s13326-021-00243-3","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"39210088","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"BioVerbNet: a large semantic-syntactic classification of verbs in biomedicine.","authors":"Olga Majewska, Charlotte Collins, Simon Baker, Jari Björne, Susan Windisch Brown, Anna Korhonen, Martha Palmer","doi":"10.1186/s13326-021-00247-z","DOIUrl":"https://doi.org/10.1186/s13326-021-00247-z","url":null,"abstract":"<p><strong>Background: </strong>Recent advances in representation learning have enabled large strides in natural language understanding; however, verbal reasoning remains a challenge for state-of-the-art systems. External sources of structured, expert-curated verb-related knowledge have been shown to boost model performance in different Natural Language Processing (NLP) tasks where accurate handling of verb meaning and behaviour is critical. The costliness and time required for manual lexicon construction have been a major obstacle to porting the benefits of such resources to NLP in specialised domains, such as biomedicine. To address this issue, we combine a neural classification method with expert annotation to create BioVerbNet. This new resource comprises 693 verbs assigned to 22 top-level and 117 fine-grained semantic-syntactic verb classes. We make this resource available complete with semantic roles and VerbNet-style syntactic frames.</p><p><strong>Results: </strong>We demonstrate the utility of the new resource in boosting model performance in document- and sentence-level classification in biomedicine. We apply an established retrofitting method to harness the verb class membership knowledge from BioVerbNet and transform a pretrained word embedding space by pulling together verbs belonging to the same semantic-syntactic class. 
The BioVerbNet knowledge-aware embeddings surpass the non-specialised baseline by a significant margin on both tasks.</p><p><strong>Conclusion: </strong>This work introduces the first large, annotated semantic-syntactic classification of biomedical verbs, providing a detailed account of the annotation process, the key differences in verb behaviour between the general and biomedical domain, and the design choices made to accurately capture the meaning and properties of verbs used in biomedical texts. The demonstrated benefits of leveraging BioVerbNet in text classification suggest the resource could help systems better tackle challenging NLP tasks in biomedicine.</p>","PeriodicalId":15055,"journal":{"name":"Journal of Biomedical Semantics","volume":" ","pages":"12"},"PeriodicalIF":1.9,"publicationDate":"2021-07-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1186/s13326-021-00247-z","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"39188796","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Note on the Cardinalities of Sets of Scalar Alternatives","authors":"S. Mascarenhas","doi":"10.1093/jos/ffab011","DOIUrl":"https://doi.org/10.1093/jos/ffab011","url":null,"abstract":"Formal theories of scalar implicature appeal crucially to a set of alternatives. These are the alternative statements that a speaker could have made but chose not to in pragmatic accounts, and the alternative statements that figure in the computation of exhaustivity operators in grammatical approaches. I show that the three sufficiently explicit theories of alternatives in the literature generate sets of alternatives that grow at least exponentially as a function of the input, and that these theories generate very large sets even for relatively small inputs. For pragmatic accounts of scalar implicature, I argue these results are hard or impossible to square with what we know independently about manipulating alternatives from the psychology of human reasoning. I propose that they pose a weaker but more general challenge for grammatical approaches, since alternatives as required by exhaustivity operators occur elsewhere in grammar, for example as part of the semantics of operators like “only” and “even.”","PeriodicalId":15055,"journal":{"name":"Journal of Biomedical Semantics","volume":"73 1","pages":"473-482"},"PeriodicalIF":1.9,"publicationDate":"2021-06-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"86077672","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The Contribution of Gestures to the Semantics of Non-Canonical Questions","authors":"Michela Ippolito","doi":"10.1093/JOS/FFAB007","DOIUrl":"https://doi.org/10.1093/JOS/FFAB007","url":null,"abstract":"The symbolic gesture MAT (mano a tulipano) used by native speakers of Italian characterizes non-canonical wh questions when used both as a co-speech and pro-speech gesture. MAT can be executed with either a fast tempo contour or a slow tempo contour. Tempo is semantically significant: descriptively, a fast tempo characterizes a biased but information-seeking non-canonical question; a slow tempo characterizes a rhetorical non-canonical question. I argue that the fast contour is the default tempo of MAT and that it brings about a biased interpretation. Slowing down the movement occurs when the feature [slow] is added: the semantic contribution of this feature is to add the presupposition that the question is resolved in the conversational context, resulting in the rhetorical interpretation of the question.","PeriodicalId":15055,"journal":{"name":"Journal of Biomedical Semantics","volume":"13 1","pages":"363-392"},"PeriodicalIF":1.9,"publicationDate":"2021-06-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84995235","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Subclausal Local Contexts","authors":"A. Anvari, Kyle Blumberg","doi":"10.1093/JOS/FFAB004","DOIUrl":"https://doi.org/10.1093/JOS/FFAB004","url":null,"abstract":"One of the central topics in semantic theory over the last few decades concerns the nature of local contexts. Recently, theorists have tried to develop general, non-stipulative accounts of local contexts (Ingason, 2016; Mandelkern & Romoli, 2017a; Schlenker, 2009). In this paper, we contribute to this literature by drawing attention to the local contexts of subclausal expressions. More specifically, we focus on the local contexts of quantificational determiners, e.g. ‘all’, ‘both’, etc. Our central tool for probing the local contexts of subclausal elements is the principle Maximize Presupposition! (Percus, 2006; Singh, 2011). The empirical basis of our investigation concerns some data discussed by Anvari (2018b), e.g. the fact that sentences such as ‘All of the two presidential candidates are crooked’ are unacceptable. In order to explain this, we suggest that the local context of determiners needs to contain the information carried by their restrictor. However, no existing non-stipulative account predicts this. Consequently, we think that the local contexts of subclausal expressions will likely have to be stipulated. This result has important consequences for debates in semantics and pragmatics, e.g. 
those around the so-called “explanatory problem” for dynamic semantics (Heim, 1990; Schlenker, 2009; Soames, 1982).","PeriodicalId":15055,"journal":{"name":"Journal of Biomedical Semantics","volume":"174 1","pages":"393-414"},"PeriodicalIF":1.9,"publicationDate":"2021-06-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"78542982","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Learning adaptive representations for entity recognition in the biomedical domain.","authors":"Ivano Lauriola, Fabio Aiolli, Alberto Lavelli, Fabio Rinaldi","doi":"10.1186/s13326-021-00238-0","DOIUrl":"https://doi.org/10.1186/s13326-021-00238-0","url":null,"abstract":"<p><strong>Background: </strong>Named Entity Recognition is a common task in Natural Language Processing applications, whose purpose is to recognize named entities in textual documents. Several systems exist to solve this task in the biomedical domain, based on Natural Language Processing techniques and Machine Learning algorithms. A crucial step of these applications is the choice of the representation which describes data. Several representations have been proposed in the literature, some of which are based on a strong knowledge of the domain, and they consist of features manually defined by domain experts. Usually, these representations describe the problem well, but they require a lot of human effort and annotated data. On the other hand, general-purpose representations like word-embeddings do not require human domain knowledge, but they could be too general for a specific task.</p><p><strong>Results: </strong>This paper investigates methods to learn the best representation from data directly, by combining several knowledge-based representations and word embeddings. Two mechanisms have been considered to perform the combination, which are neural networks and Multiple Kernel Learning. To this end, we use a hybrid architecture for biomedical entity recognition which integrates dictionary look-up (also known as gazetteers) with machine learning techniques. Results on the CRAFT corpus clearly show the benefits of the proposed algorithm in terms of F<sub>1</sub> score.</p><p><strong>Conclusions: </strong>Our experiments show that the principled combination of general, domain specific, word-, and character-level representations improves the performance of entity recognition. 
We also discussed the contribution of each representation in the final solution.</p>","PeriodicalId":15055,"journal":{"name":"Journal of Biomedical Semantics","volume":" ","pages":"10"},"PeriodicalIF":1.9,"publicationDate":"2021-05-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1186/s13326-021-00238-0","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"38990725","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"IrGO: Iranian traditional medicine General Ontology and knowledge base.","authors":"Ayeh Naghizadeh, Mahdi Salamat, Donya Hamzeian, Shaghayegh Akbari, Hossein Rezaeizadeh, Mahdi Alizadeh Vaghasloo, Reza Karbalaei, Mehdi Mirzaie, Mehrdad Karimi, Mohieddin Jafari","doi":"10.1186/s13326-021-00237-1","DOIUrl":"https://doi.org/10.1186/s13326-021-00237-1","url":null,"abstract":"<p><strong>Background: </strong>Iranian traditional medicine, also known as Persian Medicine, is a holistic school of medicine with a long prolific history. It describes numerous concepts and the relationships between them. However, no unified language system has been proposed for the concepts of this medicine up to the present time. Considering the extensive terminology in the numerous textbooks written by the scholars over centuries, comprehending the totality of concepts is obviously a very challenging task. To resolve this issue, overcome the obstacles, and code the concepts in a reusable manner, constructing an ontology of the concepts of Iranian traditional medicine seems a necessity.</p><p><strong>Construction and content: </strong>Makhzan al-Advieh, an encyclopedia of materia medica compiled by Mohammad Hossein Aghili Khorasani, was selected as the resource to create an ontology of the concepts used to describe medicinal substances. The steps followed to accomplish this task included (1) compiling the list of classes via examination of textbooks, and text mining the resource followed by manual review to ensure comprehensiveness of extracted terms; (2) arranging the classes in a taxonomy; (3) determining object and data properties; (4) specifying annotation properties including ID, labels (English and Persian), alternative terms, and definitions (English and Persian); (5) ontology evaluation. 
The ontology was created using Protégé with adherence to the principles of ontology development provided by the Open Biological and Biomedical Ontology (OBO) foundry.</p><p><strong>Utility and discussion: </strong>The ontology was finalized with inclusion of 3521 classes, 15 properties, and 20,903 axioms in the Iranian traditional medicine General Ontology (IrGO) database, freely available at http://ir-go.net/ . An indented list and an interactive graph view using WebVOWL were used to visualize the ontology. All classes were linked to their instances in UNaProd database to create a knowledge base of ITM materia medica.</p><p><strong>Conclusion: </strong>We constructed an ontology-based knowledge base of ITM concepts in the domain of materia medica to help offer a shared and common understanding of this concept, enable reuse of the knowledge, and make the assumptions explicit. This ontology will aid Persian medicine practitioners in clinical decision-making to select drugs. Extending IrGO will bridge the gap between traditional and conventional schools of medicine, helping guide future research in the process of drug discovery.</p>","PeriodicalId":15055,"journal":{"name":"Journal of Biomedical Semantics","volume":" ","pages":"9"},"PeriodicalIF":1.9,"publicationDate":"2021-04-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1186/s13326-021-00237-1","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"38881446","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Project Rosetta: a childhood social, emotional, and behavioral developmental feature mapping.","authors":"Alyson Maslowski, Halim Abbas, Kelley Abrams, Sharief Taraman, Ford Garberson, Susan Segar","doi":"10.1186/s13326-021-00242-4","DOIUrl":"https://doi.org/10.1186/s13326-021-00242-4","url":null,"abstract":"<p><strong>Background: </strong>A wide array of existing instruments are commonly used to assess childhood behavior and development for the evaluation of social, emotional and behavioral disorders such as Autism Spectrum Disorder (ASD), attention-deficit/hyperactivity disorder (ADHD), and anxiety. Many of these instruments either focus on one diagnostic category or encompass a broad set of childhood behaviors. We analyze a wide range of standardized behavioral instruments and identify a comprehensive, structured semantic hierarchical grouping of child behavioral observational features. We use the hierarchy to create Rosetta: a new set of behavioral assessment questions, designed to be minimal yet comprehensive in its coverage of clinically relevant behaviors. 
We maintain a full mapping from every functional feature in every covered instrument to a corresponding question in Rosetta.</p><p><strong>Results: </strong>In all, 209 Rosetta questions are shown to cover all the behavioral concepts targeted in the eight existing standardized instruments.</p><p><strong>Conclusion: </strong>The resulting hierarchy can be used to create more concise instruments across various ages and conditions, as well as create more robust overlapping datasets for both clinical and research use.</p>","PeriodicalId":15055,"journal":{"name":"Journal of Biomedical Semantics","volume":" ","pages":"8"},"PeriodicalIF":1.9,"publicationDate":"2021-04-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8051063/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"38876641","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"SIENA: Semi-automatic semantic enhancement of datasets using concept recognition.","authors":"Andreea Grigoriu, Amrapali Zaveri, Gerhard Weiss, Michel Dumontier","doi":"10.1186/s13326-021-00239-z","DOIUrl":"https://doi.org/10.1186/s13326-021-00239-z","url":null,"abstract":"<p><strong>Background: </strong>The amount of available data, which can facilitate answering scientific research questions, is growing. However, the different formats of published data are expanding as well, creating a serious challenge when multiple datasets need to be integrated for answering a question.</p><p><strong>Results: </strong>This paper presents a semi-automated framework that provides semantic enhancement of biomedical data, specifically gene datasets. The framework involves a concept recognition task using machine learning, in combination with the BioPortal annotator. Compared to methods that rely only on the BioPortal annotator for semantic enhancement, the proposed framework achieves the best results.</p><p><strong>Conclusions: </strong>Using concept recognition combined with machine learning techniques and annotation with a biomedical ontology, the proposed framework can help datasets reach their full potential of providing meaningful information that can answer scientific research questions.</p>","PeriodicalId":15055,"journal":{"name":"Journal of Biomedical Semantics","volume":" ","pages":"5"},"PeriodicalIF":1.9,"publicationDate":"2021-03-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1186/s13326-021-00239-z","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"25514924","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Why and how to engage expert stakeholders in ontology development: insights from social and behavioural sciences.","authors":"Emma Norris, Janna Hastings, Marta M Marques, Ailbhe N Finnerty Mutlu, Silje Zink, Susan Michie","doi":"10.1186/s13326-021-00240-6","DOIUrl":"https://doi.org/10.1186/s13326-021-00240-6","url":null,"abstract":"<p><strong>Background: </strong>Incorporating the feedback of expert stakeholders in ontology development is important to ensure content is appropriate, comprehensive, meets community needs and is interoperable with other ontologies and classification systems. However, domain experts are often not formally engaged in ontology development, and there is little available guidance on how this involvement should best be conducted and managed. Social and behavioural science studies often involve expert feedback in the development of tools and classification systems but have had little engagement with ontology development. This paper aims to (i) demonstrate how expert feedback can enhance ontology development, and (ii) provide practical recommendations on how to conduct expert feedback in ontology development using methodologies from the social and behavioural sciences.</p><p><strong>Main body: </strong>Considerations for selecting methods for engaging stakeholders are presented. Mailing lists and issue trackers as existing methods used frequently in ontology development are discussed. Advisory boards and working groups, feedback tasks, consensus exercises, discussions and workshops are presented as potential methods from social and behavioural sciences to incorporate in ontology development.</p><p><strong>Conclusions: </strong>A variety of methods from the social and behavioural sciences exist to enable feedback from expert stakeholders in ontology development. 
Engaging domain experts in ontology development enables depth and clarity in ontology development, whilst also establishing advocates for an ontology upon its completion.</p>","PeriodicalId":15055,"journal":{"name":"Journal of Biomedical Semantics","volume":"12 1","pages":"4"},"PeriodicalIF":1.9,"publicationDate":"2021-03-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7985588/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"10339022","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}