In Defense Of The Open Document Format

Thursday, April 20, 2006

George Ou @ blogs.zdnet.com dropped another bomb yesterday, taking a big fat dump all over the Open Document Format. I'm going to disagree with a few points, and agree with a few others. I'll begin with the logic of being safer with an open standard.

Ou's contention that Microsoft's supposedly open XML format is safe to use is making many people in the flash memory industry scream right now. Even though Microsoft has pledged not to enforce any patents it may hold on the format (and don't for a minute think that it doesn't hold any), the truth is that in the corporate world, the value of a promise goes up and down with a company's stock price. One need only look to the FAT file system for proof of this. Though the file system is ancient, it is widely used because it's a known commodity. It just works. And the specifications are out there for anyone to see and use. But Microsoft owns several patents on the file system. For a decade people assumed it was safe to use because Microsoft did nothing, and then the company woke up to the fact that its intellectual property is sitting inside virtually every digital camera ever made. Having slept while so many companies strayed into its trap, it's now ready to eat them all whole.

A wink, a promise, and a smile from Microsoft mean absolutely nothing to me. In fact, they make me more than just a little bit paranoid, and rightly so after what it has done with FAT. It only makes sense to use a format that no one person or company has ownership over, and ODF is that format. Nothing more than a highly stylized grouping of XML files sitting inside a ZIP archive, it's as open as open gets when it comes to file formats. Rename any ODF file to have a .zip extension and unpack it, and the contents are laid bare for you to read or edit with any text editor you wish, never to be lost or threatened again.
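To make the "XML files inside a ZIP archive" point concrete, here is a minimal Python sketch using only the standard library's zipfile module. The function names are my own, and the stand-in package it builds is an assumption for demonstration only: it is not a valid ODF document, just a ZIP whose members carry the names a real package uses (mimetype and content.xml). Point the two reader functions at a real .odt file and they should work the same way.

```python
import io
import zipfile


def list_odf_members(source):
    """Return the names of the files stored inside an ODF package.

    `source` may be a path or a file-like object; zipfile accepts both.
    """
    with zipfile.ZipFile(source) as package:
        return package.namelist()


def read_odf_member(source, name="content.xml"):
    """Return one member of the package decoded as UTF-8 text."""
    with zipfile.ZipFile(source) as package:
        return package.read(name).decode("utf-8")


def make_stand_in_package():
    """Build a tiny ODF-like archive in memory so the sketch runs anywhere.

    Hypothetical helper: this is NOT a valid ODF document, only a ZIP
    with similarly named members.
    """
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as package:
        package.writestr("mimetype", "application/vnd.oasis.opendocument.text")
        package.writestr("content.xml", "<office:document-content/>")
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    package = make_stand_in_package()
    for member in list_odf_members(package):
        print(member)
    print(read_odf_member(package))
```

No office suite is involved at any point. Anything that can read a ZIP and a text file can get at the document.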

This gives me peace of mind, and given that I don't have Microsoft Office, the idea that ODF is gaining traction only makes me more at ease using the format for my work. In fact, this article was written in OpenOffice.org Writer 2.0, saved to disk in the ODF format, as were my past two articles. Only later will I paste the contents of this document into w.bloggar for a minor amount of HTML markup to make it weblog ready.

OpenOffice in general I find exciting, useful, and beautiful, but terribly engineered. There is little doubt in anyone's mind that OpenOffice is terribly bloated. It uses an unacceptable amount of memory and, even after a fresh system reboot, takes longer than 15 seconds to load on my 2.4 GHz P4. The time it takes to load an ODF document is not the problem, however, as documents seem to load instantly. Saving seems just as speedy, so I don't know where the complaint about the speed of operating on an ODF document comes from. I suspect it is a silly point to make in any case, since only OpenOffice was judged when it came to operating on ODF files.

When I learned how to write in C, I cut my teeth on helper programs for IRC servers, called services, that run as background daemons and provide functionality to the users and the network itself as a sort of collaborative management interface. The services we used at the time relied on techniques that were primitive even by the standards of the day: linear sort and search algorithms, run in real time, over singly linked lists holding between 1,000 and 7,000 records. Searching such lists is fast when they are small, but the longer they are, the more time it takes to search them.
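For anyone who hasn't written one, here is a rough Python sketch of that kind of lookup. It is not the code from those services (that was C, and I'm not reproducing it here); the Node class, the nick field, and the step counter are all made up for illustration. The point is only that finding a record means walking the list from the head, so the work grows with the length of the list.

```python
class Node:
    """One record in a singly linked list: a value and a pointer forward."""

    def __init__(self, nick, next_node=None):
        self.nick = nick
        self.next = next_node


def build_list(nicks):
    """Link the nicks together in the order given and return the head."""
    head = None
    for nick in reversed(nicks):
        head = Node(nick, head)
    return head


def linear_find(head, nick):
    """Walk from the head; return (node or None, number of nodes visited)."""
    steps = 0
    node = head
    while node is not None:
        steps += 1
        if node.nick == nick:
            return node, steps
        node = node.next
    return None, steps


if __name__ == "__main__":
    for size in (1000, 7000):
        head = build_list([f"user{i}" for i in range(size)])
        # Worst case: the record we want is the last one in the list.
        _, steps = linear_find(head, f"user{size - 1}")
        print(size, "records ->", steps, "nodes visited")
```

In the worst case every record gets visited, so a list seven times longer costs seven times as much to search.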

There are other ways to store records, like binary trees, which keep themselves sorted as you insert items, and hash lookups, which skip sorting altogether. But binary trees are harder to understand, and difficult and time consuming to implement from scratch, and in certain situations where the lists are fairly small, no noticeable speedup will result. Using the slower but simpler method is tailoring your design to your environment, something Microsoft has been able to do very efficiently for a long time. They know what they are doing.
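As a point of comparison, here is a bare-bones binary search tree in Python, again a sketch with names I made up rather than anything from a real IRC service. It is deliberately the unbalanced kind, so it assumes the keys arrive in roughly random order; feed it already-sorted keys and it degrades into the same linear walk as a linked list. Even this stripped-down version is noticeably more code than a list, which is the trade-off I'm describing.

```python
import random


class TreeNode:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None


def bst_insert(root, key):
    """Insert key into an unbalanced binary search tree; return the root."""
    if root is None:
        return TreeNode(key)
    node = root
    while True:
        if key < node.key:
            if node.left is None:
                node.left = TreeNode(key)
                return root
            node = node.left
        elif key > node.key:
            if node.right is None:
                node.right = TreeNode(key)
                return root
            node = node.right
        else:
            return root  # duplicate key: nothing to do


def bst_find(root, key):
    """Return (found, number of nodes visited)."""
    steps = 0
    node = root
    while node is not None:
        steps += 1
        if key == node.key:
            return True, steps
        node = node.left if key < node.key else node.right
    return False, steps


def in_order(root):
    """Yield the keys in sorted order, without recursion."""
    stack, node = [], root
    while stack or node is not None:
        while node is not None:
            stack.append(node)
            node = node.left
        node = stack.pop()
        yield node.key
        node = node.right


def build_tree(keys):
    root = None
    for key in keys:
        root = bst_insert(root, key)
    return root


if __name__ == "__main__":
    keys = list(range(7000))
    random.Random(0).shuffle(keys)  # random insertion order keeps the tree shallow
    root = build_tree(keys)
    found, steps = bst_find(root, 6999)
    print("found:", found, "after visiting", steps, "of", len(keys), "nodes")
```

A hash lookup (a plain dict in Python) gets you fast lookups with even less effort, but in C at the time you would have been writing that from scratch too.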

Microsoft has only its own platform to worry about when it writes software, and like the IRC service written to handle only a few hundred people, that's plenty fine when you know that's the only environment your software will be operating in. You can leverage every aspect of that environment to your advantage, and in the case of Microsoft Office, that means handing some things off to the operating system, or scaling down necessary components to do only what you need, rather than everything they are capable of.

It's not always wrong to do this, as it has certainly worked for Microsoft for some time. But OpenOffice didn't have this luxury, unless it was going to be a Linux-only application. It's an extraneous constraint, unfair, but one that must be dealt with nonetheless.

At the same time, the argument that OpenOffice must load a number of cross-platform libraries that competitor Microsoft Office does not feels like an excuse. There are any number of cross-platform applications in the same boat that don't take anywhere near that amount of time to load and operate. It seems like the group that wrote StarOffice, OO's predecessor, while writing a very functional replacement for Microsoft Office, went the smart but unpopular route.

I believe one of the problems with the speed at which OpenOffice works with the ODF format may be its XML parser. XML parsers are not easy to write, and the fully compliant ones tend to be large and slow. That's just the nature of XML: it has no static format, and it's designed with the purpose of never being locked in. It can always change to wrap around and describe the data when the data is changing. That results in many logic branches for a parser, which eat up memory and take time to sort through. Microsoft has been a well-known offender against standards, going for speed first. If you pulled the XML parsers out of Microsoft Office and OpenOffice and compared them side by side purely on parsing XML, I would not be surprised if you found that OpenOffice's parser is something you could use as a drop-in replacement in virtually any application that needs one. The Microsoft parser would probably choke hard and die on something it wasn't expecting. It's the deal where working only within your environment can help you, but it only helps you, and in the big picture, all you really did was cheat.
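To be clear, I have not seen either parser's source, so what follows is only a toy Python illustration of the trade-off, not a picture of how Microsoft Office or OpenOffice actually work. Both function names and both sample documents are invented. The first function takes a shortcut that works as long as the input looks exactly the way it expects; the second leans on a real XML parser (the standard library's xml.etree.ElementTree) and does more work in exchange for coping with input it wasn't written around.

```python
import xml.etree.ElementTree as ET


def brittle_title(xml_text):
    """Shortcut: only works when <title> appears exactly as expected."""
    start = xml_text.index("<title>") + len("<title>")
    end = xml_text.index("</title>", start)
    return xml_text[start:end]


def general_title(xml_text):
    """Use a real XML parser, which copes with attributes and reordering."""
    root = ET.fromstring(xml_text)
    element = root.find("title")
    return element.text if element is not None else None


# Two hypothetical documents carrying the same information.
EXPECTED = "<doc><title>Report</title><body>text</body></doc>"
SURPRISE = '<doc><body>text</body><title lang="en">Report</title></doc>'


if __name__ == "__main__":
    print(brittle_title(EXPECTED), general_title(EXPECTED))
    print(general_title(SURPRISE))
    try:
        brittle_title(SURPRISE)
    except ValueError:
        # str.index raises ValueError when the exact tag text is missing.
        print("the shortcut choked on an attribute it wasn't expecting")
```

The shortcut does less work on the input it was built for, but it only helps as long as nothing about the input changes.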

OpenOffice needs a serious audit and a release with the sole purpose of optimization, because every day that I open OOw and have to stare at its logo for 15+ seconds before I can get to work is another day I spend 15+ seconds wishing I had something faster. The memory usage is ridiculous, and fixing it probably means an entire rewrite of the memory management system, but it really needs to happen. It would be an investment in OpenOffice's future, and what a fine investment it would be. Maybe one day it'll be 15 seconds too many, and I'll just go somewhere else.

Technorati tags: OpenOffice, Programming, ODF, Open Document Format, OASIS
