Template-based NLG for tabular data using BERT
Srushti Gajbhiye, M. Lopes
2021 Grace Hopper Celebration India (GHCI), 2021-02-19
DOI: https://doi.org/10.1109/GHCI50508.2021.9514032

As data volumes grow exponentially, machines need to be well equipped to understand all kinds of data. Humans prefer tabular content to textual content because it presents interrelated data in a simplified way. Humans can also correlate two or more tables with each other, even when the relationship is not explicitly stated. Machines lack both of these abilities, which makes working directly with tables taxing. This paper proposes an approach that summarizes tabular data from PDF documents and converts it to textual content, which is better suited for machine consumption. The generated content also delivers insights to humans and minimizes redundant effort. We tested our hypothesis on financial credit notes, with promising results that attest to its applicability to PDF documents containing tables of various formats.
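
The abstract does not describe the pipeline in detail, so the following is only a minimal sketch of the general idea of template-based generation from table rows, not the authors' method. It assumes the table has already been extracted from the PDF into a list of dictionaries. The template wording, the field names (`note_id`, `customer`, `amount`, `currency`, `date`), and the helper names are all hypothetical, and the role BERT plays in the paper is not modeled here.

```python
from string import Template

# Hypothetical sentence template for one credit-note row.
# Field names are illustrative assumptions, not taken from the paper.
CREDIT_NOTE_TEMPLATE = Template(
    "Credit note $note_id was issued to $customer "
    "for $amount $currency on $date."
)


def row_to_sentence(row, template=CREDIT_NOTE_TEMPLATE):
    """Fill the template with the values of a single table row.

    Raises KeyError if the row lacks a field the template needs.
    """
    return template.substitute(row)


def table_to_text(rows, template=CREDIT_NOTE_TEMPLATE):
    """Render every row as a sentence and join them into one passage."""
    return " ".join(row_to_sentence(row, template) for row in rows)


# Illustrative rows, standing in for a table already extracted from a PDF.
example_rows = [
    {"note_id": "CN-001", "customer": "Acme Ltd", "amount": "1200.00",
     "currency": "USD", "date": "2021-01-15"},
    {"note_id": "CN-002", "customer": "Globex", "amount": "350.50",
     "currency": "EUR", "date": "2021-02-01"},
]

if __name__ == "__main__":
    print(table_to_text(example_rows))
```

A fixed template keeps the generated text predictable and easy to audit, at the cost of requiring a template per table layout; handling tables of various formats would need some way to choose or adapt templates, which this sketch leaves out.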