Challenges in Analyzing Software Documentation in Portuguese

Christoph Treude, C. Prolo, Fernando Marques Figueira Filho
Published in: 2015 29th Brazilian Symposium on Software Engineering (2015-09-01)
DOI: 10.1109/SBES.2015.27 (https://doi.org/10.1109/SBES.2015.27)
Citations: 8

Abstract

Many tools that automatically analyze, summarize, or transform software artifacts rely on natural language processing tooling for the interpretation of natural language text produced by software developers, such as documentation, code comments, commit messages, or bug reports. Processing natural language text produced by software developers is challenging because of unique characteristics not found in other texts, such as the presence of code terms and the systematic use of incomplete sentences. In addition, texts produced by Portuguese-speaking developers mix languages since many keywords and programming concepts are referred to by their English name. In this paper, we provide empirical insights into the challenges of analyzing software artifacts written in Portuguese. We analyzed 100 question titles from the Portuguese version of Stack Overflow with two Portuguese language tools and identified multiple problems which resulted in very few sentences being tagged completely correctly. Based on these results, we propose heuristics to improve the analysis of natural language text produced by software developers in Portuguese.
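The abstract does not spell out the proposed heuristics, so the following is only an illustrative sketch of the kind of pre-processing the problem suggests: flagging tokens in a Portuguese question title that look like code terms or English programming vocabulary, so that a Portuguese tagger could treat them specially. The function name `flag_tokens`, the regular expressions, and the tiny term list are all assumptions made for illustration and are not taken from the paper.

```python
import re

# Hypothetical, deliberately tiny list of English programming terms.
# A realistic list would be far larger and is not given in the abstract.
ENGLISH_TECH_TERMS = {"array", "string", "loop", "thread", "framework", "commit", "bug"}

# Assumed surface patterns that suggest a token is a code term.
CODE_PATTERNS = [
    re.compile(r"^[a-z]+(?:[A-Z][a-z0-9]*)+$"),  # camelCase, e.g. getElementById
    re.compile(r"^[A-Za-z_]\w*\(\)$"),           # call syntax, e.g. print()
    re.compile(r"^\w+(?:[._]\w+)+$"),            # dotted or snake_case names
]


def flag_tokens(title):
    """Label each whitespace-separated token as 'code', 'english_term', or 'other'."""
    labeled = []
    for raw in title.split():
        # Drop surrounding punctuation, but keep dots and parentheses,
        # which are meaningful inside code terms.
        token = raw.strip("?!,;:\"'")
        if not token:
            continue
        if any(pattern.match(token) for pattern in CODE_PATTERNS):
            label = "code"
        elif token.lower() in ENGLISH_TECH_TERMS:
            label = "english_term"
        else:
            label = "other"
        labeled.append((token, label))
    return labeled


if __name__ == "__main__":
    for token, label in flag_tokens("Como usar getElementById em um array?"):
        print(f"{token}\t{label}")
```

Tokens labeled `code` or `english_term` could then be masked or assigned a fixed tag before the remaining Portuguese words are passed to a tagger; whether that matches the heuristics the authors propose cannot be determined from the abstract alone.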