ODF in the age of Big Data
On the 25th of March 2015, my son was born, and to me that is by far the most important event of the day. Yet on the same day, Document Freedom Day was gathering people across the world to celebrate open standards for documents and spread the word against vendor lock-in. This year, two distinct announcements were made that should pave the way for the continued adoption of ODF across various markets. The first was the start of the LibreOffice OnLine project, and the other was Microsoft announcing the availability of ODF export for Microsoft Office 365. If we add to these two news items Google's commitment to properly support ODF inside Google Drive, things are getting exciting for ODF. I have attempted to explain the advantages of ODF and open standards on this blog for many years, as several others have done and are still doing. But when it comes to the benefits of ODF in the cloud and with regard to big data, there has not been, as far as I can tell, any real attempt at articulating the relevance of ODF at this level.
Describing the benefits of ODF can at times be rather tricky, as many people are simply used to the omnipresent MS Office file formats. They do not ask themselves too many questions, and may not readily perceive some key advantages of ODF. In a nutshell, it is important to explain, as background, the difference between interoperability and compatibility, and then move on to issues such as vendor lock-in, public and transparent standardization processes, licensing and copyright of the norm, etc. One may even rely on the theory of network effects to explain how a single proprietary, de facto standard can quickly come to impose itself in any given market, and so justify the coming of age of ODF. How does all this work for the cloud? Not so well, actually.
Cloud computing, and more specifically software as a service, may not render all the points above moot, but it does change the paradigm. Consuming, editing and sharing documents in the cloud changes the basic premise that in order for me to exchange content with someone, we both need to have the same software and use the same file format. The basic premise does not go away, but becomes one possible use case among others. What happens instead is that I create a document that I will share inside a group of peers or inside an organization. The file itself is stored in the cloud (public or private) and I send a link, not a file with a specific file format, to the people I want to share the document with. They in turn can use the link to simply read, or edit, and share back the document. In this scenario there is no need for high quality implementations of a specific file format. The final output tends more and more to be a PDF, as all the editing work has already happened and the document logic and formatting itself resides more on the server side than inside a sophisticated file format. So much for document freedom? Not so fast.
ODF and other file format specifications that are open standards cannot do much for cloud architectures or for the data usage policies of organizations or consumers. For those, one may need other types of open standards, corporate and public policy, use cases, etc. Document file formats are expected to remain what they are. But as such they do convey real benefits, even in the age of the cloud and big data. After all, thousands of documents stored in a cloud and formatted as binary blobs no one has the key for remain exactly what they were without the cloud: useless docs.
With ODF, benefits exist even in the cloud:
- Portability: You not only maintain the ability to use, reuse and archive your documents, you also ensure the portability of your existing and future documents across the cloud services and providers you rely on today and will rely on tomorrow.
- Reusability/data mining and intelligence: Because the specification of ODF is published and its IPR terms allow any kind of modification and implementation, ODF documents are a sound base for what’s often referred to as big data operations, such as data mining and extracting sense and value out of metadata inside and outside organizations.
- Interoperability: This one may sound obvious, but it is not. A number of people out there will readily tell anyone who cares to listen that office file formats (any of them) are on the verge of extinction, since you can do anything you need through software or software as a service that has HTML and PDF as its output. This is of course true. Oddly enough, I have yet to see a cross-platform word processor, spreadsheet or presentation program, or even a text editor, that lets anyone edit complex HTML files, such as tables with pivot tables and working formulas, or complex-looking text documents. This may sound ludicrous, but to this day the only tools doing this are office suites that have you compose your document first and then export it to HTML, with more or less fidelity in the formatting. Nothing else exists (and yes, I’m aware of LaTeX, but this is about tools for everyone). My point here is that ODF is essentially a compressed archive of XML files and additional content (images, fonts), just like several other formats of its generation. Zipping several XML files together while keeping the integrity of both the content and its presentation is something that can be achieved on a server, and as such could enable ODF to become one of the pivot formats for documents and data in the cloud.
- Auditability/control over your data: Because the specifications of ODF are published and known, it is always possible to check the integrity of ODF documents and audit them, whether they are stored in the cloud or “offloaded” from it. This is useful if you want to ensure that no one has tinkered with your documents, and if you want to avoid building a walled garden out of data that you own and control but to which only one or a handful of vendors hold the real key.
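To make the "compressed archive of XML files" point a little more concrete, here is a minimal sketch in Python using only the standard library. It builds a tiny ODF-like package in memory and then reads its metadata back with generic ZIP and XML tools, which is roughly what a server-side data mining or audit job could do. The function names and the sample title and author are made up for illustration, and the package is deliberately incomplete (no manifest, no content.xml), so it should not be taken as a valid ODF document or as the way any particular product works.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# XML namespaces used by ODF's meta.xml (OpenDocument office namespace and Dublin Core).
NS = {
    "office": "urn:oasis:names:tc:opendocument:xmlns:office:1.0",
    "dc": "http://purl.org/dc/elements/1.1/",
}
ODT_MIMETYPE = "application/vnd.oasis.opendocument.text"


def build_sample_package() -> bytes:
    """Build a tiny, incomplete ODF-like package in memory (illustration only)."""
    meta_xml = (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        f'<office:document-meta xmlns:office="{NS["office"]}" xmlns:dc="{NS["dc"]}">\n'
        '  <office:meta>\n'
        '    <dc:title>Quarterly report</dc:title>\n'   # made-up sample value
        '    <dc:creator>Jane Doe</dc:creator>\n'       # made-up sample value
        '  </office:meta>\n'
        '</office:document-meta>\n'
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as package:
        # ODF packages store the "mimetype" entry first and uncompressed.
        package.writestr("mimetype", ODT_MIMETYPE, compress_type=zipfile.ZIP_STORED)
        package.writestr("meta.xml", meta_xml)
    return buffer.getvalue()


def list_members(package_bytes: bytes) -> list:
    """Return the names of the files inside the package, in archive order."""
    with zipfile.ZipFile(io.BytesIO(package_bytes)) as package:
        return package.namelist()


def read_metadata(package_bytes: bytes) -> dict:
    """Extract the title and creator from meta.xml with a plain XML parser."""
    with zipfile.ZipFile(io.BytesIO(package_bytes)) as package:
        root = ET.fromstring(package.read("meta.xml"))
    return {
        "title": root.findtext("office:meta/dc:title", namespaces=NS),
        "creator": root.findtext("office:meta/dc:creator", namespaces=NS),
    }


if __name__ == "__main__":
    sample = build_sample_package()
    print(list_members(sample))
    print(read_metadata(sample))
```

Nothing here depends on an office suite: because the format is documented, ordinary ZIP and XML libraries are enough to open a document and look inside it. With a real file, the bytes would simply come from disk or from cloud storage instead of from `build_sample_package()`.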
One may notice that the points listed above loosely match the main points usually mentioned when discussing the benefits of ODF in the more standard setting of the desktop. This is not surprising, but it was not necessarily intended; if anything, it is a testament to the value of a standard like ODF and its importance. The key point here is that when it comes to the cloud and big data, ODF is a factor in both transparency and innovation. This is something worth promoting, and a potential path to renewed success for ODF in the future.