Jinqiu Yang, Erik Wittern, Annie T. T. Ying, Julian T Dolby, Lin Tan
{"title":"从文档中提取Web API规范","authors":"Jinqiu Yang, Erik Wittern, Annie T. T. Ying, Julian T Dolby, Lin Tan","doi":"10.1145/3196398.3196411","DOIUrl":null,"url":null,"abstract":"Web API specifications are machine-readable descriptions of APIs. These specifications, in combination with related tooling, simplify and support the consumption of APIs. However, despite the increased distribution of web APIs, specifications are rare and their creation and maintenance heavily rely on manual efforts by third parties. In this paper, we propose an automatic approach and an associated tool called D2Spec for extracting significant parts of such specifications from web API documentation pages. Given a seed online documentation page of an API, D2Spec first crawls all documentation pages on the API, and then uses a set of machine-learning techniques to extract the base URL, path templates, and HTTP methods – collectively describing the endpoints of the API. We evaluate whether D2Spec can accurately extract endpoints from documentation on 116 web APIs. The results show that D2Spec achieves a precision of 87.1% in identifying base URLs, a precision of 80.3% and a recall of 80.9% in generating path templates, and a precision of 83.8% and a recall of 77.2% in extracting HTTP methods. In addition, in an evaluation on 64 APIs with pre-existing API specifications, D2Spec revealed many inconsistencies between web API documentation and their corresponding publicly available specifications. API consumers would benefit from D2Spec pointing them to, and allowing them thus to fix, such inconsistencies.","PeriodicalId":6639,"journal":{"name":"2018 IEEE/ACM 15th International Conference on Mining Software Repositories (MSR)","volume":"20 7-8 1","pages":"454-464"},"PeriodicalIF":0.0000,"publicationDate":"2018-05-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"15","resultStr":"{\"title\":\"Towards Extracting Web API Specifications from Documentation\",\"authors\":\"Jinqiu Yang, Erik Wittern, Annie T. T. Ying, Julian T Dolby, Lin Tan\",\"doi\":\"10.1145/3196398.3196411\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Web API specifications are machine-readable descriptions of APIs. These specifications, in combination with related tooling, simplify and support the consumption of APIs. However, despite the increased distribution of web APIs, specifications are rare and their creation and maintenance heavily rely on manual efforts by third parties. In this paper, we propose an automatic approach and an associated tool called D2Spec for extracting significant parts of such specifications from web API documentation pages. Given a seed online documentation page of an API, D2Spec first crawls all documentation pages on the API, and then uses a set of machine-learning techniques to extract the base URL, path templates, and HTTP methods – collectively describing the endpoints of the API. We evaluate whether D2Spec can accurately extract endpoints from documentation on 116 web APIs. The results show that D2Spec achieves a precision of 87.1% in identifying base URLs, a precision of 80.3% and a recall of 80.9% in generating path templates, and a precision of 83.8% and a recall of 77.2% in extracting HTTP methods. In addition, in an evaluation on 64 APIs with pre-existing API specifications, D2Spec revealed many inconsistencies between web API documentation and their corresponding publicly available specifications. API consumers would benefit from D2Spec pointing them to, and allowing them thus to fix, such inconsistencies.\",\"PeriodicalId\":6639,\"journal\":{\"name\":\"2018 IEEE/ACM 15th International Conference on Mining Software Repositories (MSR)\",\"volume\":\"20 7-8 1\",\"pages\":\"454-464\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2018-05-28\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"15\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2018 IEEE/ACM 15th International Conference on Mining Software Repositories (MSR)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3196398.3196411\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2018 IEEE/ACM 15th International Conference on Mining Software Repositories (MSR)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3196398.3196411","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 15
摘要
Web API规范是机器可读的API描述。这些规范与相关工具相结合,简化并支持api的使用。然而,尽管web api的分布越来越多,规范却很少,它们的创建和维护严重依赖于第三方的手工工作。在本文中,我们提出了一种自动方法和一个名为D2Spec的相关工具,用于从web API文档页面中提取此类规范的重要部分。给定一个API的种子在线文档页面,D2Spec首先爬取API上的所有文档页面,然后使用一组机器学习技术提取基本URL、路径模板和HTTP方法——它们共同描述API的端点。我们评估了D2Spec是否能够准确地从116个web api的文档中提取端点。结果表明,D2Spec在识别基础url方面的准确率为87.1%,在生成路径模板方面的准确率为80.3%,召回率为80.9%,在提取HTTP方法方面的准确率为83.8%,召回率为77.2%。此外,在对已有API规范的64个API的评估中,D2Spec揭示了web API文档与其相应的公开可用规范之间的许多不一致之处。API消费者将受益于D2Spec向他们指出并允许他们修复这种不一致。
Towards Extracting Web API Specifications from Documentation
Web API specifications are machine-readable descriptions of APIs. These specifications, in combination with related tooling, simplify and support the consumption of APIs. However, despite the increased distribution of web APIs, specifications are rare and their creation and maintenance heavily rely on manual efforts by third parties. In this paper, we propose an automatic approach and an associated tool called D2Spec for extracting significant parts of such specifications from web API documentation pages. Given a seed online documentation page of an API, D2Spec first crawls all documentation pages on the API, and then uses a set of machine-learning techniques to extract the base URL, path templates, and HTTP methods – collectively describing the endpoints of the API. We evaluate whether D2Spec can accurately extract endpoints from documentation on 116 web APIs. The results show that D2Spec achieves a precision of 87.1% in identifying base URLs, a precision of 80.3% and a recall of 80.9% in generating path templates, and a precision of 83.8% and a recall of 77.2% in extracting HTTP methods. In addition, in an evaluation on 64 APIs with pre-existing API specifications, D2Spec revealed many inconsistencies between web API documentation and their corresponding publicly available specifications. API consumers would benefit from D2Spec pointing them to, and allowing them thus to fix, such inconsistencies.