MSWordView

A Word 8 converter for Unix

What is it

MSWordView is a program that understands the Microsoft Word 8 binary file format (Office 97). It currently converts Word documents into HTML, which can then be read with a browser.

MSWordView is being actively worked on and will be pretty bleeding edge for the next few weeks, so bear with me.

Current Features include

Currently Unsupported Features include
I will be working on the unsupported features, but as it's already fairly useful, I'm releasing it. It also only handles Word 8, not Word 6 or Word 7. I will be adding Word 6 capabilities as well and, if I get lucky, Word 7.

This is to be considered early beta software, as there's loads to be done and many bits and bobs to be fixed and supported.

What do you need

Just the source

Web Gateway

Demo mswordview here. Don't use this to convert information you wouldn't want me to see, because if the conversion doesn't work, I'll be using the file you convert to try and extend what mswordview can support, which will require me to read it. This script is broken for non-ASCII languages: mswordview supports them, but the UTF-8 is getting stripped somewhere in the web interface to it.

More Info

MSWordView used to use laola to break the Word file up into its OLE streams, but now uses custom C code that is included in the distribution. After that, the Word specification that Microsoft has made available is followed to extract the text and paragraph properties, for example whether we are in a table or not.
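
As a rough illustration of the very first step (recognising the OLE container that holds the Word streams), here is a minimal Python sketch. It is not mswordview's own code, which is written in C; the function names are made up for this example. It only checks the eight-byte compound file signature and reads the sector size from the header, and does not go on to walk the streams.

```python
import struct

# Eight-byte signature found at the start of an OLE2 compound file.
OLE2_MAGIC = bytes([0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1])


def looks_like_ole2(data: bytes) -> bool:
    """Return True if data starts with the OLE2 compound file signature.

    Hypothetical helper for illustration only.
    """
    return data[:8] == OLE2_MAGIC


def sector_size(data: bytes) -> int:
    """Return the sector size in bytes declared in the OLE2 header.

    The header stores a 16-bit little-endian "sector shift" at byte
    offset 30; the sector size is 2 raised to that power.
    """
    if len(data) < 32:
        raise ValueError("need at least 32 bytes of header")
    if not looks_like_ole2(data):
        raise ValueError("not an OLE2 compound file")
    (shift,) = struct.unpack_from("<H", data, 30)
    return 1 << shift
```

To try it on a real document, read the first 32 bytes of the file in binary mode and pass them to both functions. A file that fails the signature check is not an OLE container at all, so there are no streams to break it up into.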

How to Obtain Microsoft Office File Formats

The MS Office file formats (Word, Excel, PowerPoint, Office Binder and Office Drawing) are all freely available from the MS web site, provided you are a member of the MS Developer Network (MSDN). Joining MSDN to gain access to these specifications is free.

Simply go to the following address:
http://msdn.microsoft.com
From the list on the left of the screen, select MSDN Library Online.
If you are not a member of the MS Developer Network you will need to join; it's free.
Once you have subscribed to the MSDN, you can obtain online copies of the file formats. To do this, follow these steps:
1. On the MSDN World Wide Web site, click MSDN Library Online.
2. Under Member Area, click the Library Online tab.
3. Double-click Microsoft Office Development.
4. Double-click Office.
5. Double-click Microsoft Office 97 Binary File Formats.
6. Select the format you are interested in (Word, Excel, PowerPoint, etc.).

There is a definite need for converters for the other MS Office products. For this converter in particular, support for MS Office Draw is needed, so go out there and work on it.

Other Decoders and related projects

There already exist a few attempts at Word converters:
laola (originally used by mswordview) includes one called elser, which doesn't handle Word 8 but can do Word 6 and 7.
word2x, which is for Word 6 and doesn't do fastsaves.
catdoc, which doesn't do fastsaves or tables, also for Word 6.

All these converters are almost magical in how far they managed to go without access to the Microsoft format specification, and their code was terribly useful in figuring out some things.

Sun has something which displays Word files on screen, though it doesn't print.
Corel's word processor for Linux has a very good converter for Word 6/7/8 built in. It has made a few mistakes in conversion, but unlike the current mswordview it retains formatting very well.
Use Wine and the MS 16-bit Word viewer; here's a howto.
The filters project.
A Word macro investigation tool.

Download MSWordView

Warning: mswordview no longer outputs to standard output by default.

Remember, this is a work in progress; it's not finished yet and may show bugs.

Known Bugs

I reckon that there are loads of problems with more complex docs, and there are stacks of codes I haven't implemented yet. Unknown graphics are often spat out, and these are incorrect; if the graphic name says unknown, then it's an unsupported graphic type. Here's my CHANGELOG; keep track of it for news, updates, and what I'm working on.

Mailing List

An incredibly low-volume mailing list for announcements has been set up for mswordview (Aug 24th 1998).
To subscribe, send email to mswordview-subscribe@makelist.com
To unsubscribe, send email to mswordview-unsubscribe@makelist.com
The address of the list itself is mswordview@makelist.com
The list archive is at http://www.findmail.com/list/mswordview/

What would be nice to get

