HWPF is our pure Java port of the Microsoft Word 97 (through 2007) file format. It does not support the new Word 2007 .docx file format, which is not OLE2 based. HWPF is still in early development and lives in the scratchpad section of the SVN repository. To use it, you need either a recent SVN checkout or a recent SVN nightly build (including the scratchpad jar!).
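Because the OLE2 versus .docx distinction decides whether HWPF can open a file at all, it can help to check the container type before handing a file over. The following is a minimal sketch using only the JDK, not part of POI: the class name WordContainerSniffer and its methods are hypothetical, and it assumes that looking at the first bytes is enough (an OLE2 compound document starts with the signature D0 CF 11 E0 A1 B1 1A E1, while a .docx file is a ZIP package starting with "PK" 03 04).

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;

// Hypothetical helper (not a POI class): guesses the container type
// of a Word file from its leading bytes.
final class WordContainerSniffer {
    // Signature of an OLE2 compound document (Word 97 style files).
    private static final byte[] OLE2_MAGIC = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };
    // ZIP local file header signature (.docx files are ZIP packages).
    private static final byte[] ZIP_MAGIC = { 'P', 'K', 0x03, 0x04 };

    // Returns true if data begins with every byte of prefix.
    static boolean startsWith(byte[] data, byte[] prefix) {
        if (data.length < prefix.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            if (data[i] != prefix[i]) {
                return false;
            }
        }
        return true;
    }

    // Classifies the leading bytes of a file as "OLE2", "ZIP" or "UNKNOWN".
    static String classify(byte[] header) {
        if (startsWith(header, OLE2_MAGIC)) {
            return "OLE2";
        }
        if (startsWith(header, ZIP_MAGIC)) {
            return "ZIP";
        }
        return "UNKNOWN";
    }

    // Usage: java WordContainerSniffer some-file.doc
    public static void main(String[] args) throws IOException {
        try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
            byte[] header = new byte[OLE2_MAGIC.length];
            int filled = 0;
            // read() may return fewer bytes than requested, so loop until
            // the buffer is full or the stream ends.
            while (filled < header.length) {
                int n = in.read(header, filled, header.length - filled);
                if (n < 0) {
                    break;
                }
                filled += n;
            }
            System.out.println(classify(Arrays.copyOf(header, filled)));
        }
    }
}
```

A result of "OLE2" only says the container is the kind HWPF expects; it does not prove the file is a Word document, since other Office formats use the same container.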

The source code in the org.apache.poi.hdf tree is the old legacy code. The source code in the org.apache.poi.hwpf.model tree is that legacy code refactored into an object model. The source code in the org.apache.poi.hwpf.extractor tree is a wrapper around the object model that makes it easy to extract interesting things (e.g. the text).

At the moment we unfortunately do not have anyone taking care of HWPF and fostering its development. What we need is someone to step up, adopt the project, and push it forward. Ryan Ackley, who put a lot of effort into HWPF, is no longer on board, so HWPF is an orphan waiting to be adopted.

If you are interested in becoming the new HWPF point person, you should look into the Microsoft Word internals. A good starting point seems to be Ryan Ackley’s overview. That document contains a link to a detailed Word format description, which you can find at http://www.wotsit.org/. Please do not contact Ryan Ackley directly: he now works for a company that has signed an NDA with Microsoft, so he is no longer able to answer questions.
