Documents as Data, Data as Documents: What we learned about Semi-Structured Information for our Open World of Cloud & Devices

Proceedings of the 2015 ACM Symposium on Document Engineering. Published 2015-09-08. DOI: 10.1145/2682571.2797070

J. Paoli

Abstract

Many of us have long believed in a single vision that unifies documents and data through semantically rich, semi-structured information. This vision is even more critical today in our open, interconnected world of clouds and devices. The last 20 years represent a real-life, worldwide experiment in this area, one that fueled a massive set of market applications. In this talk, we review the history and trends behind much of what enables today's core interchanges on the Internet: from early research that added document user interfaces to data, to the specification of structured documents, to the generalization of document markup techniques, to the wide acceptance of document databases. We will also review our share of historical acronyms and names, such as 'Star', 'Grif', 'OpenDoc', 'WorldWideWeb/Nexus', 'Amaya', 'InfoPath', 'HTML', 'SGML', 'XML', 'JSON', 'YAML', 'Markdown', 'Schema', 'Semantics', 'MongoDB', 'Hadoop', 'DocumentDB' and many others. We will then turn, cautiously and humbly, to the future and try to guess: what will the world need? And what do we need to think about to make it happen? We truly believe in the potential of the open Internet. We see pieces of information (which we once called "Diamonds of the Internet") being created, shared, reshaped, rerouted and modified by users or tiny devices, understood through big data and machine learning, and processed by cloud services. We see the potential in fundamentally designing open platforms that are connected worldwide. By bridging technologies, we create higher-level abstractions and thus more complex organisms (software) that can help everyone. But at the core remains the need for open, semi-structured information that fundamentally unifies documents and data.
