Summary of the 4th workshop on analytics for noisy unstructured text data (AND)
Roberto Basili, D. Lopresti, Christoph Ringlstetter, Shourya Roy, K. Schulz, L. V. Subramaniam
Proceedings of the 19th ACM International Conference on Information and Knowledge Management, 2010-10-26. DOI: 10.1145/1871437.1871788
Noisy unstructured text data is ubiquitous in real-world communication. Natural language, and the creative ways in which humans use it, can create problems for computational techniques. Electronic text from the Internet (emails, message boards, newsgroups, blogs, microblogs, wikis, chat logs and web pages), contact centers (complaints, emails, call transcriptions, message summaries) and mobile phones (SMS) is often noisy: it contains spelling errors, abbreviations, non-standard words, false starts, repetitions, missing punctuation, missing case information and special characters. Informal communication is not the only source of noisy text. Text produced by automatically processing signals intended for human use, such as printed or handwritten documents (via optical character recognition), spontaneous speech (via speech recognition) and camera-captured scene images, is also noisy.
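To make a few of the noise categories listed above concrete, the following minimal Python sketch flags them in a single message using simple heuristics. The function name `noise_indicators`, the tiny abbreviation list and the repetition threshold are illustrative assumptions, not a method proposed by the workshop; real systems would need far richer lexicons and models.

```python
import re

# Hypothetical, tiny lexicon of SMS-style abbreviations (an assumption for
# illustration); a real system would need a large, domain-specific resource.
ABBREVIATIONS = {"u", "r", "pls", "thx", "gr8", "2morrow", "b4", "lol"}

# Three or more identical characters in a row, e.g. "sooo" or "!!!".
REPEATED_CHARS = re.compile(r"(.)\1{2,}")


def noise_indicators(text: str) -> set[str]:
    """Return heuristic labels for some of the noise types named in the abstract."""
    labels = set()
    tokens = re.findall(r"[A-Za-z0-9']+", text)
    lowered = [t.lower() for t in tokens]

    # Abbreviations / non-standard words found in the toy lexicon.
    if any(t in ABBREVIATIONS for t in lowered):
        labels.add("abbreviation")

    # Elongated words or runs of special characters.
    if REPEATED_CHARS.search(text):
        labels.add("character repetition")

    # All-lowercase text suggests missing case information.
    if any(ch.isalpha() for ch in text) and text == text.lower():
        labels.add("missing case information")

    # No sentence-final punctuation at the end of the message.
    stripped = text.strip()
    if stripped and stripped[-1] not in ".!?":
        labels.add("missing punctuation")

    # Adjacent duplicated words, a crude proxy for repetitions and false starts.
    if any(a == b for a, b in zip(lowered, lowered[1:])):
        labels.add("word repetition")

    return labels
```

For example, `noise_indicators("thx u r gr8 sooo")` reports an abbreviation, character repetition, missing case information and missing punctuation, while a clean sentence such as `"The meeting is at noon."` yields an empty set. Each heuristic is deliberately simple and will misfire on some legitimate text (for instance, all-lowercase product names), which is exactly the kind of brittleness that motivates dedicated analytics for noisy text.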