Tutorial: Jacob & Microsoft Word

@for Developers
@author Kai Ruhl
@since 2003-06

"Am I the only person on this planet who wants to write MS Word files with Java?"
- Me, after researching Jacob, POI, WordBean and others, all to no informational avail.

Introduction

Jacob is a Java/COM bridge provided by Dan Adler under a semi-GPL license (it may not be used in a commercial product targeted at Java developers, e.g. virtual machines or debuggers; the chance that this restriction applies to you is very slim).

There is no documentation available concerning the practical use of any Microsoft application; according to Adler, Jacob is intended as a generic Java/COM bridge and not as an MS Office API. However, M. Bigatti made a FAQ, which IMHO is not too useful when it comes to MS Word; and there is a Jacob mailing list, where I got most of my information, even if it was tedious work.

Now, this is a tutorial entirely dedicated to handling Microsoft Word with Jacob. If you want Excel stuff, I would rather recommend POI, hosted at the Apache Software Foundation; it has good Excel support, but only experimental ("scratchpad") Word support. If you just need to insert some unformatted text, an easier solution is the WordBean by Müller & Stein.

A good alternative to using Jacob may be Jawin, which follows exactly the same goal, namely dispatching calls to COM objects.

Authors

This document is far from complete; I am always open to suggestions, tips and any enhancements. If you know something, please tell me. The absence of any other site like this, in contrast to all the questions on JDC Search and the Jacob mailing list, suggests that this page will be of some use. My mail address is kain at the above domain.

So far, the only author is Kai Ruhl.

Update 2006-04: I had a nice email exchange with a guy named Jean Helou; he summarised his experiences with Jacob in a wiki documentation page, which contains a section on macros and is based on MS Word XP. He also provided me with a link to the useful MS Office object model documentation.

Update 2006-08: A nice girl named Kathrin Eichler emailed me a section on hyperlinks; it is included below. She is using Office XP. Thanks Kathrin!

Update 2016-02: A friendly guy named Igor Kitsa emailed me sections on tables and enumerations, and also improved ways to work almost exclusively with ActiveXComponent objects instead of Dispatch calls. Thanks Igor!

Scenario

I want to be very specific here, so I will only describe one solution. This is the following:

I have a document file_in.doc which is my "template"; actually it is a complete document which needs to be enhanced by text, enumerations, and tables.

This file is processed by my Java program and written to file_out.doc. Both files are in c:\java\jacob.

1 Preparations

You need two files: jacob.jar and jacob.dll. Put the former in your classpath and the latter in c:\windows\system32 or your equivalent. I tested Jacob on Windows XP with MS Office 2010 (and, in the earlier version 1 of this tutorial, on Windows 98 and XP with MS Office 97, but without tables and enumerations).
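If the DLL cannot be found, the JVM typically reports an UnsatisfiedLinkError when Jacob tries to load it. The following is a small pure-Java sketch (not part of Jacob) that lists which directories of java.library.path contain a file named jacob.dll; it assumes the DLL name used in this tutorial. On Windows, c:\windows\system32 is normally among the directories searched.

```java
import java.io.File;
import java.nio.file.Files;
import java.nio.file.InvalidPathException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class NativeLibraryCheck {

    /**
     * Returns every file named fileName found in the directories of searchPath
     * (a list of directories separated by File.pathSeparator).
     */
    static List<Path> findLibrary(String searchPath, String fileName) {
        List<Path> hits = new ArrayList<>();
        for (String dir : searchPath.split(Pattern.quote(File.pathSeparator))) {
            if (dir.isEmpty()) {
                continue; // ignore empty entries, e.g. from ";;"
            }
            try {
                Path candidate = Paths.get(dir, fileName);
                if (Files.isRegularFile(candidate)) {
                    hits.add(candidate);
                }
            } catch (InvalidPathException e) {
                // malformed entry (e.g. stray quotes in PATH): skip it
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // The DLL name is the one used in this tutorial; adjust it if your
        // Jacob download ships a differently named DLL.
        String searchPath = System.getProperty("java.library.path", "");
        List<Path> hits = findLibrary(searchPath, "jacob.dll");
        System.out.println(hits.isEmpty()
                ? "jacob.dll not found on java.library.path"
                : "found: " + hits);
    }
}
```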

Then, I assume you create a new Java class, make a new main(String[] asArgs) method and are at its beginning. The snippets below need two imports: com.jacob.activeX.ActiveXComponent and com.jacob.com.Variant.

2 Let's Play

First, I will create some variables; you can change them almost arbitrarily. They are pretty self-explanatory.

    String sDir = "c:\\java\\jacob\\";
    String sInputDoc = sDir + "file_in.doc";
    String sOutputDoc = sDir + "file_out.doc";
    String sOldText = "[label:import:1]";
    String sNewText = "I am some horribly long sentence, so long that [insert bullshit here]";
    boolean tVisible = true;
    boolean tSaveOnExit = false;

sOldText holds the label that I will search for and replace. tVisible is only true for debugging purposes, to see what's going on. tSaveOnExit is false since I save explicitly.
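Since Word runs as a separate process, it is safer to hand it absolute paths rather than paths relative to your Java program's working directory. A minimal sketch using java.nio (the helper name absoluteDocPath is made up for illustration):

```java
import java.nio.file.Path;
import java.nio.file.Paths;

public class DocPaths {

    /** Joins baseDir and fileName and returns the result as an absolute, normalized path. */
    static String absoluteDocPath(String baseDir, String fileName) {
        Path path = Paths.get(baseDir, fileName).toAbsolutePath().normalize();
        return path.toString();
    }

    public static void main(String[] args) {
        // Same directory as in the tutorial; Paths.get takes care of the separator.
        String sDir = "c:\\java\\jacob";
        String sInputDoc = absoluteDocPath(sDir, "file_in.doc");
        String sOutputDoc = absoluteDocPath(sDir, "file_out.doc");
        System.out.println(sInputDoc);
        System.out.println(sOutputDoc);
    }
}
```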

Now, we open Word and load the document, as well as some base objects.

    ActiveXComponent oWord = new ActiveXComponent("Word.Application");
    oWord.setProperty("Visible", tVisible);
    ActiveXComponent oDocuments = oWord.getPropertyAsComponent("Documents");
    ActiveXComponent oDocument = oDocuments.invokeGetComponent("Open", new Variant(sInputDoc)); 
    ActiveXComponent oSelection = oWord.getPropertyAsComponent("Selection");
    ActiveXComponent oFind = oSelection.getPropertyAsComponent("Find");

Run this. It should open Word, but not do anything cool yet.

oDocuments holds the list of documents. oDocument holds our specific document file_in.doc. oSelection and oFind are objects we need for the next step, selecting and inserting.

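    oFind.setProperty("Text", sOldText);
    // Execute returns true only if sOldText was found (and is now selected)
    if (oFind.invoke("Execute").getBoolean()) {
        oSelection.setProperty("Text", sNewText);
    }

Now we search for sOldText and execute the search, which selects the label inside Word; then we replace that selection with the new text (which, in turn, is also selected). The check matters: if the label is not found, the selection stays where it was, and setting the text would simply insert sNewText at the current cursor position.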

Next, we leave that selection.

    oSelection.invoke("MoveDown");
    oSelection.setProperty("Text", "\nSo we got the next line including BR.\n");

We move the cursor down, effectively leaving the selection (yes, it works just like a VB macro inside Word; MoveUp, MoveLeft and MoveRight work as well). Then we insert more text.

Now we want to format text. Here, we operate on the selected text and apply the format afterwards (to the selected text, not to the text typed next).

    ActiveXComponent oFont = oSelection.getPropertyAsComponent("Font");
    oFont.setProperty("Bold", "1");
    oFont.setProperty("Italic", "1");
    oFont.setProperty("Underline", "0");

Now the selected text (the "\nSo we got ... BR.\n") is both bold and italic.
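If you want to read formatting back, note that Word reports Font.Bold and Font.Italic as -1 (True), 0 (False) or wdUndefined (9999999) when the selection has mixed formatting. A small helper sketch to interpret the integer you would get from, say, oFont.getPropertyAsInt("Bold"); the class and enum names are hypothetical:

```java
public class FontState {

    // Values Word uses for tri-state font properties such as Bold or Italic.
    static final int WD_TRUE = -1;
    static final int WD_FALSE = 0;
    static final int WD_UNDEFINED = 9999999; // wdUndefined: mixed formatting

    enum TriState { ON, OFF, MIXED }

    /** Interprets the integer value of a property such as Font.Bold. */
    static TriState interpret(int value) {
        switch (value) {
            case WD_FALSE:
                return TriState.OFF;
            case WD_UNDEFINED:
                return TriState.MIXED;
            default:
                return TriState.ON; // -1, but also other non-zero values such as 1
        }
    }

    public static void main(String[] args) {
        // In real code the value would come from oFont.getPropertyAsInt("Bold").
        int[] samples = { WD_TRUE, WD_FALSE, WD_UNDEFINED };
        for (int value : samples) {
            System.out.println(value + " -> " + interpret(value));
        }
    }
}
```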

    ActiveXComponent oAlign = oSelection.getPropertyAsComponent("ParagraphFormat");
    oAlign.setProperty("Alignment", "3");

And now the alignment is justified, i.e. "block" (0 = left, 1 = center, 2 = right, 3 = justify; these are Word's WdParagraphAlignment values). For now, this is the minimum that can be useful to you: using the MoveDown and Text directives, you can do the basics.
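Instead of magic numbers, you could keep the alignment codes in an enum. This sketch mirrors Word's WdParagraphAlignment values for the first five constants; the enum itself is hypothetical and not part of Jacob:

```java
public enum ParagraphAlignment {
    LEFT(0),       // wdAlignParagraphLeft
    CENTER(1),     // wdAlignParagraphCenter
    RIGHT(2),      // wdAlignParagraphRight
    JUSTIFY(3),    // wdAlignParagraphJustify ("block")
    DISTRIBUTE(4); // wdAlignParagraphDistribute

    private final int code;

    ParagraphAlignment(int code) {
        this.code = code;
    }

    /** The integer Word expects for ParagraphFormat.Alignment. */
    public int code() {
        return code;
    }

    /** Maps a value read back from Word to the enum constant. */
    public static ParagraphAlignment fromCode(int code) {
        for (ParagraphAlignment a : values()) {
            if (a.code == code) {
                return a;
            }
        }
        throw new IllegalArgumentException("unknown alignment code: " + code);
    }

    public static void main(String[] args) {
        System.out.println(JUSTIFY + " = " + JUSTIFY.code());
    }
}
```

With it, the call above could read oAlign.setProperty("Alignment", ParagraphAlignment.JUSTIFY.code()), assuming your Jacob version offers the setProperty(String, int) overload.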

3 Save and Close

Well, there were a lot of suggestions on the mailing list, but this one worked for me.

    ActiveXComponent oWordBasic = oWord.getPropertyAsComponent("WordBasic");
    oWordBasic.invoke("FileSaveAs", sOutputDoc);

Don't ask me why. It just works.

    oDocument.invoke("Close", tSaveOnExit);
    oWord.invoke("Quit", 0);

This is straightforward. No sweat.
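If your program throws an exception before Quit is reached, an (possibly invisible) Word process may keep running in the background. One way to guard against that is try-with-resources. The WordSession wrapper below is a hypothetical sketch; in real code its quit action would be something like () -> oWord.invoke("Quit", 0).

```java
public class WordSession implements AutoCloseable {

    private final Runnable quitAction;
    private boolean closed = false;

    /** quitAction would typically be () -> oWord.invoke("Quit", 0) in real code. */
    public WordSession(Runnable quitAction) {
        this.quitAction = quitAction;
    }

    /** Runs the quit action exactly once, even if close() is called repeatedly. */
    @Override
    public void close() {
        if (!closed) {
            closed = true;
            quitAction.run();
        }
    }

    public static void main(String[] args) {
        try (WordSession session = new WordSession(() -> System.out.println("Quit"))) {
            System.out.println("working with the document...");
            // an exception thrown here would still trigger close(), i.e. Quit
        }
    }
}
```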

4 Images

Yes, it's possible to embed images pretty easily.

    String sImgFile = sDir + "image.png";
    // InlineShapes is a collection; AddPicture inserts the image as an inline shape
    ActiveXComponent oInlineShapes = oSelection.getPropertyAsComponent("InlineShapes");
    oInlineShapes.invoke("AddPicture", sImgFile);

Well, it just works the way shown on the mailing list. Don't ask me about the image layout (text flow and such), though. Better yet, if you know how, mail me.

5 Hyperlinks

Hyperlinks are also pretty straightforward (courtesy Kathrin Eichler, under Office XP):

    String sHyperlink = "http://www.google.com";
    oSelection.setProperty("Text", "Text for the link to Google");
    ActiveXComponent oHyperlinks = oDocument.getPropertyAsComponent("Hyperlinks");
    Variant oSelectionRange = oSelection.getProperty("Range");
    oHyperlinks.invoke("Add", oSelectionRange, new Variant(sHyperlink));

The range object is new here: apparently, you cannot add the hyperlink to the selection itself, only to a range spanning the selection; I have no idea why this is.

6 Tables

Tables are a little more complex (courtesy Igor Kitsa, works under Office 2010):

    final int iRowCount = 3;
    final int iColCount = 4;
    ActiveXComponent oTables = oDocument.getPropertyAsComponent("Tables");
    // own name, so it does not clash with oSelectionRange from the hyperlink section
    Variant oTableAnchor = oSelection.getProperty("Range");
    ActiveXComponent oTable = oTables.invokeGetComponent("Add", oTableAnchor, new Variant(iRowCount), new Variant(iColCount));
    oTable.invoke("AutoFormat", 16);

This will create a table with 3 rows and 4 columns, formatted with autoformat number 16; you need to experiment to see which one fits your taste. Now you can write to each of the cells:

    ActiveXComponent oTableRange = oTable.getPropertyAsComponent("Range");
    ActiveXComponent oCells = oTableRange.getPropertyAsComponent("Cells");
    int iCellCount = oCells.getPropertyAsInt("Count");
    for (int i = 1; i <= iCellCount; i++) {
        ActiveXComponent oCellItem = oCells.invokeGetComponent("Item", new Variant(i));
        ActiveXComponent oCellRange = oCellItem.getPropertyAsComponent("Range");
        oCellRange.invoke("InsertAfter", String.format("Cell %d", i));
    }

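Note that we request a range in two different ways: once as a Variant (the selection range, which we only pass on to Tables.Add) and once as an ActiveXComponent (the table range, whose Cells property we need to access). This can be a bit confusing, but hey, it works. And now, if you want to write to a specific cell, say row 1, column 2 (Table.Cell takes the row first):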

    ActiveXComponent oCell = oTable.invokeGetComponent("Cell", new Variant(1), new Variant(2));
    ActiveXComponent oCellRange = oCell.getPropertyAsComponent("Range");
    oCellRange.invoke("Delete");
    oCellRange.invoke("InsertAfter", "Special (1,2)");
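The loop above addresses cells by a single running index, while Table.Cell takes a row and a column. Assuming a uniform table without merged cells, and assuming Word enumerates the cells row by row (left to right, top to bottom), the two can be converted as in this sketch; the helper names are made up:

```java
public class CellIndex {

    /** Converts a 1-based running cell index into {row, column} (both 1-based). */
    static int[] toRowColumn(int index, int columnCount) {
        if (index < 1 || columnCount < 1) {
            throw new IllegalArgumentException("index and columnCount must be >= 1");
        }
        int row = (index - 1) / columnCount + 1;
        int column = (index - 1) % columnCount + 1;
        return new int[] { row, column };
    }

    /** Converts a 1-based (row, column) pair back into the running cell index. */
    static int toIndex(int row, int column, int columnCount) {
        if (row < 1 || column < 1 || column > columnCount) {
            throw new IllegalArgumentException("row/column out of range");
        }
        return (row - 1) * columnCount + column;
    }

    public static void main(String[] args) {
        final int iRowCount = 3;
        final int iColCount = 4;
        for (int i = 1; i <= iRowCount * iColCount; i++) {
            int[] rc = toRowColumn(i, iColCount);
            System.out.printf("Cell %d -> Cell(row %d, column %d)%n", i, rc[0], rc[1]);
        }
    }
}
```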

A special case is row-wise access, for example row 2:

    ActiveXComponent oTableRows = oTable.getPropertyAsComponent("Rows");
    ActiveXComponent oRow2 = oTableRows.invokeGetComponent("Item", new Variant(2));
    ActiveXComponent oRow2Cells = oRow2.getPropertyAsComponent("Cells");
    final int iRow2CellCount = oRow2Cells.getPropertyAsInt("Count");
    for (int i = 1; i <= iRow2CellCount; i++) {
        ActiveXComponent oRow2Cell = oRow2Cells.invokeGetComponent("Item", new Variant(i));
        ActiveXComponent oRow2Range = oRow2Cell.getPropertyAsComponent("Range");           
        oRow2Range.invoke("MoveEnd", new Variant(1), new Variant(-1));
        String sText = oRow2Range.getPropertyAsString("Text");
        oRow2Range.invoke("InsertAfter", ", Touch:" + sText);
    }

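This is a mix of the "all cells" and "individual cell" approaches above. The MoveEnd call (unit 1 = wdCharacter, count -1) shrinks each cell range by one character, so that the end-of-cell marker is excluded from the text that is read back.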
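Alternatively, you can read the full cell text and strip the end-of-cell marker in Java. In a cell's Range.Text, that marker shows up as a carriage return followed by a bell character (Chr(13) & Chr(7) in VBA terms); the helper below is a hypothetical sketch built on that assumption:

```java
public class CellText {

    /** End-of-cell marker as it appears in a cell's Range.Text: CR followed by BEL. */
    static final String END_OF_CELL = "\r\u0007";

    /** Removes a trailing end-of-cell marker, leaving other text untouched. */
    static String stripEndOfCell(String rawText) {
        if (rawText.endsWith(END_OF_CELL)) {
            return rawText.substring(0, rawText.length() - END_OF_CELL.length());
        }
        return rawText;
    }

    public static void main(String[] args) {
        // In real code rawText would come from oCellRange.getPropertyAsString("Text").
        String rawText = "Cell 5" + END_OF_CELL;
        System.out.println("[" + stripEndOfCell(rawText) + "]");
    }
}
```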

7 List / Enumeration

Lists can be of the unordered ("bullet") and numbered variant (again, courtesy Igor Kitsa, tested with Office 2010). We start with the bullet list:

    oSelection.invoke("TypeText", "Here is a bullet list.");
    ActiveXComponent oItemRange = oSelection.getPropertyAsComponent("Range");
    ActiveXComponent oListFormat = oItemRange.getPropertyAsComponent("ListFormat");
    oListFormat.invoke("ApplyBulletDefault");
    oSelection.invoke("TypeParagraph");
    oSelection.invoke("TypeText", "This is item one in a bullet list.");
    oSelection.invoke("TypeParagraph");
    oSelection.invoke("TypeText", "This is item two in a bullet list.");
    oListFormat.invoke("RemoveNumbers");
    oSelection.invoke("MoveDown");
    oSelection.invoke("TypeText", "\n");

Again, we get the selection range, this time as an ActiveXComponent, and apply the list format to it. Then we simply add paragraphs, and the bullets are inserted automatically. At the end, we call RemoveNumbers and move down to leave the list.

Creating a numbered list works analogously:

    oSelection.invoke("TypeText", "And now comes a numbered list:\n");
    oItemRange = oSelection.getPropertyAsComponent("Range");
    oListFormat = oItemRange.getPropertyAsComponent("ListFormat");
    oListFormat.invoke("ApplyNumberDefault");
    oSelection.invoke("TypeParagraph");
    oSelection.invoke("TypeText", "This is item one in a numbered list.");
    oSelection.invoke("TypeParagraph");
    oSelection.invoke("TypeText", "This is item two in a numbered list.");
    oListFormat.invoke("RemoveNumbers");
    oSelection.invoke("MoveDown");

The only difference is ApplyNumberDefault instead of ApplyBulletDefault. And that concludes the lists.
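Both list examples repeat the same pattern of TypeParagraph followed by TypeText. To make that pattern reusable and testable without Word, you could hide the two Selection calls behind a small interface; SelectionOps and writeItems are hypothetical names for this sketch:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ListWriter {

    /** Hypothetical abstraction over the two Selection methods the list examples use. */
    interface SelectionOps {
        void typeText(String text);
        void typeParagraph();
    }

    /** Types each item into its own paragraph, as in the bullet and numbered list examples. */
    static void writeItems(SelectionOps selection, List<String> items) {
        for (String item : items) {
            selection.typeParagraph();
            selection.typeText(item);
        }
    }

    public static void main(String[] args) {
        // A recording stand-in for Word, so the call sequence can be inspected.
        List<String> calls = new ArrayList<>();
        SelectionOps recorder = new SelectionOps() {
            @Override
            public void typeText(String text) {
                calls.add("TypeText: " + text);
            }

            @Override
            public void typeParagraph() {
                calls.add("TypeParagraph");
            }
        };
        writeItems(recorder, Arrays.asList("This is item one.", "This is item two."));
        calls.forEach(System.out::println);
    }
}
```

A Jacob-backed implementation would simply forward typeText(t) to oSelection.invoke("TypeText", t) and typeParagraph() to oSelection.invoke("TypeParagraph"), with ApplyBulletDefault or ApplyNumberDefault called before writeItems as in the examples.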

Summary

While working with Word through Jacob is a bit complicated, it is still the most powerful option, short of linking some VB code via JNI or calling an executable via Runtime.exec. I hope I could give you an introduction to this topic.

Thanks for reading.

EOF (Feb:2016)