Apache POI Word - Create Document
Overview
In this article, I will explain how to create and save a blank Word document with Apache POI. I will also show how a DOCX document is actually stored, so you can see what the file really contains.
Why favor XWPF over HWPF
POI provides two sets of APIs for handling Word documents:
HWPF: It stands for "Horrible Word Processor Format," a tongue-in-cheek name. This set of APIs reads and writes the binary format used by Word 97 through Word 2003. The file extension for these documents is typically .doc.
XWPF: It stands for "XML Word Processor Format." This set of APIs reads and writes the XML-based format used by Word 2007 and later versions. The file extension for these documents is typically .docx.
Since .docx has been Word's default format since Word 2007, this article uses XWPF.
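The two formats are also easy to tell apart on disk, because each starts with a well-known signature: binary .doc files are OLE2 compound files, while .docx files are ZIP archives. The following is a minimal sketch using only the JDK, not POI. The class name WordFormatSniffer and the method suggestApi are hypothetical names chosen for illustration. Note that the signature only identifies the container type, so other OLE2 or ZIP files (for example .xls or .xlsx) would match as well.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;

// Hypothetical helper: guesses which POI Word API fits a file from its first bytes.
final class WordFormatSniffer {
    // OLE2 compound-file signature, used by binary .doc files.
    private static final byte[] OLE2 = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };
    // ZIP local-file-header signature ("PK" 03 04), used by .docx packages.
    private static final byte[] ZIP = {0x50, 0x4B, 0x03, 0x04};

    static String suggestApi(byte[] header) {
        if (startsWith(header, OLE2)) {
            return "HWPF";
        }
        if (startsWith(header, ZIP)) {
            return "XWPF";
        }
        return "unknown";
    }

    private static boolean startsWith(byte[] data, byte[] prefix) {
        if (data.length < prefix.length) {
            return false;
        }
        return Arrays.equals(Arrays.copyOf(data, prefix.length), prefix);
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.out.println("usage: WordFormatSniffer <file>");
            return;
        }
        try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
            byte[] header = new byte[8];
            int n = in.readNBytes(header, 0, header.length); // requires Java 9+
            System.out.println(suggestApi(Arrays.copyOf(header, n)));
        }
    }
}
```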
Create document
Creating a blank document is straightforward. Simply create an XWPFDocument object and then use the write method to save it to a local path.
XWPFDocument document = new XWPFDocument();
document.write(new FileOutputStream("D:\\tmp\\simpledoc.docx"));
document.close();
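To keep the focus on the main point, the code above skips best practices: the FileOutputStream is never closed, and the document is not closed if write throws an exception. Normally, it is recommended to use try-with-resources so that both resources are released properly.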
package net.verytools.tutorial;

import org.apache.poi.xwpf.usermodel.XWPFDocument;

import java.io.FileOutputStream;
import java.io.IOException;

public class CreateDoc {
    public static void main(String[] args) throws IOException {
        try (XWPFDocument doc = new XWPFDocument()) {
            try (FileOutputStream out = new FileOutputStream("D:\\tmp\\simpledoc.docx")) {
                doc.write(out);
            }
        }
    }
}
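To see what try-with-resources guarantees, here is a small standalone sketch that does not use POI. The Tracked class is a hypothetical stand-in for the document and the output stream; it only records when it is opened and closed. The sketch suggests that resources are closed in reverse order of creation, and that they are closed even when the body throws.

```java
import java.util.ArrayList;
import java.util.List;

final class CloseOrderDemo {
    // Hypothetical resource that logs its own lifecycle.
    static final class Tracked implements AutoCloseable {
        private final String name;
        private final List<String> log;

        Tracked(String name, List<String> log) {
            this.name = name;
            this.log = log;
            log.add("open " + name);
        }

        @Override
        public void close() {
            log.add("close " + name);
        }
    }

    // Runs a try-with-resources block and returns the recorded events.
    static List<String> run(boolean failWhileWriting) {
        List<String> log = new ArrayList<>();
        try (Tracked doc = new Tracked("doc", log);
             Tracked out = new Tracked("out", log)) {
            if (failWhileWriting) {
                throw new IllegalStateException("write failed");
            }
            log.add("write");
        } catch (IllegalStateException e) {
            // Both resources are already closed when this handler runs.
            log.add("caught " + e.getMessage());
        }
        return log;
    }

    public static void main(String[] args) {
        System.out.println(run(false)); // [open doc, open out, write, close out, close doc]
        System.out.println(run(true));  // [open doc, open out, close out, close doc, caught write failed]
    }
}
```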
A blank document is not really blank
The generated document is located at the path 'D:\tmp\simpledoc.docx'. When you open this document, you will see that it is completely empty, with no content. However, we know that a document can contain not only text but also images, videos, and more. So, how does a Word document store these types of content?
In reality, the .docx extension is just a disguise. Change the extension from .docx to .zip, unzip the document we just generated, and open the extracted folder in Visual Studio Code (VS Code) to explore its contents.
The extracted folder shows that a seemingly blank document is actually composed of several XML files. Among them, word/document.xml is the core part and holds the main content of the document. Because the document is currently blank, the <w:body/> element in this file is empty.
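The same inspection can be done in code, without renaming the file. The following is a minimal sketch that uses only java.util.zip from the JDK. The class name DocxPartReader and the method readPart are hypothetical, and the sketch assumes the part is UTF-8 encoded XML.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

final class DocxPartReader {
    // Returns the text of the named entry, or null if the archive has no such entry.
    static String readPart(InputStream docx, String partName) throws IOException {
        try (ZipInputStream zip = new ZipInputStream(docx)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (!entry.getName().equals(partName)) {
                    continue;
                }
                // Reads until the end of the current entry, not the whole archive.
                ByteArrayOutputStream buffer = new ByteArrayOutputStream();
                byte[] chunk = new byte[8192];
                int n;
                while ((n = zip.read(chunk)) != -1) {
                    buffer.write(chunk, 0, n);
                }
                // Assumption: the part is UTF-8 encoded.
                return new String(buffer.toByteArray(), StandardCharsets.UTF_8);
            }
        }
        return null;
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.out.println("usage: DocxPartReader <file.docx>");
            return;
        }
        try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
            System.out.println(readPart(in, "word/document.xml"));
        }
    }
}
```

Running it against D:\tmp\simpledoc.docx should print the XML of the main document part, including the empty body element.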
What is docx
A .docx file is essentially a ZIP archive that contains a set of XML files and, when the document includes them, media files such as images.
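Because the package is an ordinary ZIP archive, its table of contents can be listed with the JDK alone. This is a rough sketch; DocxEntryLister and listEntries are hypothetical names, and the exact set of entries depends on the tool that wrote the file.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

final class DocxEntryLister {
    // Collects the entry names in the order they appear in the archive.
    static List<String> listEntries(InputStream docx) throws IOException {
        List<String> names = new ArrayList<>();
        try (ZipInputStream zip = new ZipInputStream(docx)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                names.add(entry.getName());
            }
        }
        return names;
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.out.println("usage: DocxEntryLister <file.docx>");
            return;
        }
        try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
            for (String name : listEntries(in)) {
                System.out.println(name);
            }
        }
    }
}
```

For a .docx file, the listing should include entries such as [Content_Types].xml and word/document.xml.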
Conclusion
This article explained how to use the POI API to create and save a document, and showed how a docx file is stored. Next, we will gradually explore paragraphs and other content in the document. The next article in this series is Apache POI Word - Paragraphs.