Applying topic modeling techniques to degraded texts: Spanish historical press during the Transición (1977-1982)
C. G. Figuerola
Proceedings of the Sixth International Conference on Technological Ecosystems for Enhancing Multiculturality, 2018-10-24
DOI: 10.1145/3284179.3284319
Topic modeling techniques are applied in the field of Digital Humanities, quite often to historical texts. However, digitizing documents often produces texts with poor readability. This is the case with the historical press, where the degradation of the physical medium is compounded by the page layout and the presence of advertisements, illustrations, and similar elements. This paper describes the application of topic modeling to a Spanish newspaper affected by these difficulties, as well as the same application, over the same period, to another newspaper whose text was transcribed manually. A comparison of the results shows consistency between the two newspapers.
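The abstract does not state how consistency between the two newspapers' topic models was measured. As one illustrative possibility (an assumption, not the author's stated method), topics learned separately on each corpus can be aligned by comparing their topic-word distributions, here with cosine similarity, and pairing each topic from the OCR-derived model with its most similar topic from the manually transcribed one. The function names `cosine` and `match_topics`, the dictionary representation of a topic, and the toy topics are all hypothetical.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical representation: a topic is a mapping word -> probability.
Topic = Dict[str, float]


def cosine(a: Topic, b: Topic) -> float:
    """Cosine similarity between two sparse topic-word distributions."""
    dot = sum(p * b.get(w, 0.0) for w, p in a.items())
    norm_a = math.sqrt(sum(p * p for p in a.values()))
    norm_b = math.sqrt(sum(p * p for p in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def match_topics(model_a: List[Topic], model_b: List[Topic]) -> List[Tuple[int, int, float]]:
    """For each topic in model_a, find the most similar topic in model_b.

    Returns (index_in_a, best_index_in_b, similarity) triples. The alignment is
    greedy and one-directional: several topics in A may map to the same topic in B.
    """
    matches = []
    for i, ta in enumerate(model_a):
        scores = [cosine(ta, tb) for tb in model_b]
        j = max(range(len(scores)), key=scores.__getitem__)
        matches.append((i, j, scores[j]))
    return matches


if __name__ == "__main__":
    # Toy topics invented for illustration only; they are not data from the paper.
    ocr_paper = [
        {"elecciones": 0.5, "partido": 0.3, "votos": 0.2},
        {"huelga": 0.6, "sindicato": 0.4},
    ]
    manual_paper = [
        {"sindicato": 0.5, "huelga": 0.4, "obreros": 0.1},
        {"partido": 0.4, "elecciones": 0.4, "votos": 0.2},
    ]
    for i, j, s in match_topics(ocr_paper, manual_paper):
        print(f"OCR topic {i} <-> manual topic {j}: cosine = {s:.2f}")
```

Because this alignment is greedy, a one-to-one assignment (for instance with the Hungarian algorithm) may be preferable when both models have the same number of topics; the high similarity of matched pairs would then serve as one possible indicator of consistency between the corpora.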