Natural language processing-guided meta-analysis and structure factor database extraction from glass literature
Authors: Mohd Zaki, Sahith Reddy Namireddy, Tanu Pittie, Vaibhav Bihani, Shweta Rani Keshri, Vineeth Venugopal, Nitya Nand Gosvami, Jayadeva, N.M. Anoop Krishnan
Journal: Journal of Non-Crystalline Solids: X, volume 15, Article 100103
Publication date: 2022-09-01
DOI: 10.1016/j.nocx.2022.100103
URL: https://www.sciencedirect.com/science/article/pii/S2590159122000231
Although scientific journals are a reliable, peer-reviewed source of data, manually extracting relevant information from papers is often too tedious. This difficulty can be attributed to the unstructured nature of the data, such as images, text, and captions, and to the non-standard reporting of data in tables. Here, using natural language processing (NLP), we introduce a corpus of ~100,000 glass science-related research papers and the 106,238 images published in them, which allows easy navigation and query-based searching through the database. We perform a meta-analysis of the literature in the corpus employing NLP tools. Specifically, we analyze trends in the number of publications by country, research area, and journal, thereby giving a broad overview of the progress in glass science over the last six decades. Further, as a demonstration of information extraction, we extract the structure factor data of ~450 glass compositions, thereby creating the first-ever public repository on the structure factor of glasses.