Reasons not to document data formats
I tend to rant and rave about open data formats. Well -- some folks think it's ranting and raving. Others (including a growing number of governments around the world) feel it's actually an important subject.
Over the years, I've talked to a variety of top executives in the CAD world about their reasons for not documenting their data formats. Though I'm not going to embarrass anyone by attributing their comments, here are some of the reasons I've been given:
- We don't actually know what's in our format. (How's that for honesty?)
- It's all defined in our source code somewhere, but we never bothered to write it down anywhere.
- It would cost too much to document it.
- We're not investing anything in that file format anymore.
- If we documented it, it would prevent us from changing it in the future.
- We never thought about it before you mentioned it.
- That would take a lot of work, and we don't have the time.
- The programmers don't like doing documentation.
- Nobody cares.
Here are a few reasons that have been given to me by people other than top CAD executives (these reasons seem to come without the benefit of any actual background in the matter):
- The data format is the CAD company's work product.
- The data format is the CAD company's intellectual property.
- The data format is a trade secret.
- The data format is proprietary.
Here's what I think is interesting about all this: None of the reasons in the first group have any meaning when considered in the light of customer needs. And none of the reasons in the second group seem to have any basis in law. (The last one, with the word "proprietary", is really slippery: the word doesn't actually mean anything, other than "it's not something we want to tell anyone.")
The one reason that no CAD executive has ever uttered to me is this:
- By not publishing our native data format, we can lock in our customers and lock out our competitors.
That's actually a good reason. Yet, were a CAD vendor to say this, it would almost certainly upset their customers.
If you can think of any other good reasons not to document data formats, I'd be interested in hearing them. (By the way, if you're going to talk about patents or copyrights, you need to do some more research: patents are documented to start with, and data formats are not subject to copyright protection in any jurisdiction that I'm aware of.)


Reader Comments (2)
You can read a detailed explanation of my opinion at my blog at www.deelip.com (Should Autodesk keep the DWG format a secret?).
In short, if a file format is documented, you run the risk of having irresponsible people writing bad software which creates bad files, which are then bound to cause all sorts of problems for everyone.
I appreciate your work with the Open Design Alliance, and I believe that you are part of the group of people who understand the importance of writing good file I/O software (especially given the limitation that Autodesk has not documented the DWG file format). But there is another group of people who do not share your level of commitment. Believe me, I know they exist because I have had to deal with the output of their software.
Autodesk has documented the DXF file format, and in my time I have seen so many badly written DXF files. While I do not dispute your contention that companies use proprietary file formats to "lock-in customers and lock-out competitors", I believe that in doing so they may unknowingly be doing themselves a favor.
-- Randall Newton