OpenDocument package¶
General¶
OpenDocument defines a package file to store the XML content of a document as separate parts together with associated binary data as file entries in a single package file. These file entries may be compressed to further reduce the storage taken by the package. This package is a Zip file [ZIP], whose structure is described at the end of this document. OpenDocument Packages impose additional structure on the Zip file to accomplish the representation of OpenDocument Format documents.
A document within a package may consist of a set of files creating a unit, for instance the set of files specified by OpenDocument (settings.xml, meta.xml, content.xml, styles.xml). These files may be located in the root of the package, or within a directory. If they are contained in the root of the package, they are called the document. If they are located within a directory, the document they constitute is called a sub document. A package may contain multiple sub documents, but only a single document can be contained in the root of the package. Unless otherwise stated, the term document refers to the document contained in the root of the package. This may include sub documents.
When an OpenDocument document is represented as a package, there are four root elements, <office:document-content>, <office:document-styles>, <office:document-meta>, and <office:document-settings>, each stored as a separate file.
A package may also contain image files, embedded objects and implementation-dependent files.
OpenDocument package¶
An OpenDocument Package shall meet the following requirements:
A)It shall be a Zip file, as defined by [ZIP]. All files contained in the Zip file shall be uncompressed (STORED) or compressed using the “deflate” (DEFLATED) algorithm.
B)It shall contain a file “META-INF/manifest.xml”. This file shall meet the following requirements:
B.1)The file shall be a well formed XML document in accordance with the [XML1.0] specification.
B.2)The XML root element of the file shall be a <manifest:manifest> element.
B.3)The XML file shall be valid with respect to the manifest schema defined in http://docs.oasis-open.org/office/v1.2/cos01/OpenDocument-v1.2-cos01-manifest-schema.rng
C)It should contain a file “mimetype”.
D)It may contain files whose relative paths begin with “META-INF/” and whose names contain the string “signatures”. These files shall meet the following requirements:
D.1)The files shall be well-formed XML files in accordance with [XML1.0].
D.2)The XML root element of each file shall be a <dsig:document-signatures> element.
D.3)The files shall be valid with respect to the digital signature schema defined in appendix A.2 OpenDocument Digital Signature Schema.
E)It shall not contain files whose relative path begins with “META-INF/” other than those listed in B) and D).
F)The files listed in (B) and (D) meet the following requirements:
F.1)They shall be namespace-well-formed with regard to the XML Namespaces specification [xml-names].
F.2)They shall conform to the xml-id specification [XML-ID].
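Several of these requirements can be checked mechanically. The following is a minimal sketch, not a conformance checker: it covers only A, B.1, B.2, C and E, and skips schema validation (B.3) and signature files (D). The helper name `check_package` is hypothetical, and the manifest namespace URI is an assumption taken from the ODF 1.2 manifest schema rather than from the text above.

```python
import zipfile
import xml.etree.ElementTree as ET

# Assumption: namespace URI as used by the ODF 1.2 manifest schema.
MANIFEST_NS = "urn:oasis:names:tc:opendocument:xmlns:manifest:1.0"
MANIFEST_PATH = "META-INF/manifest.xml"
ALLOWED_METHODS = (zipfile.ZIP_STORED, zipfile.ZIP_DEFLATED)


def check_package(source):
    """Return messages for violated requirements (hypothetical helper)."""
    try:
        archive = zipfile.ZipFile(source)
    except zipfile.BadZipFile:
        return ["A: not a Zip file"]
    problems = []
    with archive:
        # Ignore directory entries; the requirements talk about files.
        infos = [i for i in archive.infolist() if not i.filename.endswith("/")]
        names = [i.filename for i in infos]
        # A) only STORED or DEFLATED entries are allowed.
        for info in infos:
            if info.compress_type not in ALLOWED_METHODS:
                problems.append(f"A: {info.filename} is neither STORED nor DEFLATED")
        # B) the manifest must exist, be well formed, and have the right root.
        if MANIFEST_PATH not in names:
            problems.append("B: META-INF/manifest.xml is missing")
        else:
            try:
                root = ET.fromstring(archive.read(MANIFEST_PATH))
            except ET.ParseError:
                problems.append("B.1: manifest is not well-formed XML")
            else:
                if root.tag != f"{{{MANIFEST_NS}}}manifest":
                    problems.append("B.2: root element is not <manifest:manifest>")
        # C) is a recommendation ("should"), reported here for information.
        if "mimetype" not in names:
            problems.append("C: mimetype file is missing (recommended)")
        # E) nothing else may live under META-INF/ besides B) and D).
        for name in names:
            base_name = name.rsplit("/", 1)[-1]
            if (name.startswith("META-INF/") and name != MANIFEST_PATH
                    and "signatures" not in base_name):
                problems.append(f"E: unexpected file {name}")
    return problems
```

An empty list means none of the covered checks failed; it does not imply that the package is conformant.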
Manifest¶
All OpenDocument packages shall contain a file named “META-INF/manifest.xml”. This file is the OpenDocument package manifest. The manifest provides:
- A list of all of the files in the package (except those specifically excluded from the manifest).
- The MIME media type of each file in the package.
- If a file is stored in the file data in encrypted form, the manifest provides information required to decrypt the file correctly when the encryption key is also supplied.
The format of the manifest file is specified in http://docs.oasis-open.org/office/v1.2/cos01/OpenDocument-v1.2-cos01-manifest-schema.rng.
For all files contained in a package, with exception of the “mimetype” file and files whose relative path starts with “META-INF/”, the “META-INF/manifest.xml” file shall contain exactly one <manifest:file-entry> element whose manifest:full-path attribute's value references the file.
The “META-INF/manifest.xml” file need not contain <manifest:file-entry> elements whose manifest:full-path attribute references files whose relative path starts with "META-INF/". The file shall not contain <manifest:file-entry> elements whose manifest:full-path attribute value references the “META-INF/manifest.xml” file itself or the “mimetype” file.
The “META-INF/manifest.xml” file should contain a <manifest:file-entry> element whose manifest:full-path attribute has the value "/". This element specifies information regarding the document stored in the root of the package. This entry shall exist if the package contains a file "mimetype".
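The file-entry rules above amount to a comparison between the Zip entries and the manifest entries. A minimal sketch of that comparison follows; the helper name `compare_manifest` is hypothetical, the namespace URI is an assumption taken from the ODF 1.2 manifest schema, and encrypted or malformed manifests are not handled.

```python
import zipfile
import xml.etree.ElementTree as ET
from collections import Counter

# Assumption: namespace URI as used by the ODF 1.2 manifest schema.
MANIFEST_NS = "urn:oasis:names:tc:opendocument:xmlns:manifest:1.0"


def compare_manifest(source):
    """Return (wrong_count, forbidden) for a package (hypothetical helper).

    wrong_count maps each file that needs exactly one <manifest:file-entry>
    to the number of entries actually found (0, 2, 3, ...).
    forbidden lists entries that shall not appear in the manifest.
    """
    with zipfile.ZipFile(source) as archive:
        files = [n for n in archive.namelist() if not n.endswith("/")]
        root = ET.fromstring(archive.read("META-INF/manifest.xml"))
    # Count how often each manifest:full-path value occurs.
    counts = Counter(
        entry.get(f"{{{MANIFEST_NS}}}full-path")
        for entry in root.iter(f"{{{MANIFEST_NS}}}file-entry")
    )
    # "mimetype" and everything under META-INF/ is exempt from the
    # exactly-one-entry rule.
    required = [n for n in files
                if n != "mimetype" and not n.startswith("META-INF/")]
    wrong_count = {n: counts[n] for n in required if counts[n] != 1}
    forbidden = [p for p in ("META-INF/manifest.xml", "mimetype") if counts[p]]
    return wrong_count, forbidden
```

Both return values are empty for a package whose manifest lists every required file once and omits the two forbidden entries.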
MimeType¶
If a MIME media type for a document exists, then an OpenDocument package should contain a file with name “mimetype”. The content of this file shall be the ASCII encoded MIME media type associated with the document. The “mimetype” file shall be the first file of the zip file. It shall not be compressed, and it shall not use an 'extra field' in its header.
If the file named “META-INF/manifest.xml” contains a <manifest:file-entry> element whose manifest:full-path attribute has the value "/", then a "mimetype" file shall exist, and the content of the “mimetype” file shall be equal to the value of the manifest:media-type attribute of that element.
Note: The purpose is to allow the type of document represented by the package to be discovered through 'magic number' mechanisms, such as Unix's file/magic utility. If a Zip file contains a file at the beginning of the file that is uncompressed, and has no extra data in the header, then its file name and data can be found at fixed positions from the beginning of the package. More specifically, one will find:
- the string 'PK' at position 0 of all zip files
- the string 'mimetype' beginning at position 30
- the media type itself beginning at position 38.
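The fixed positions listed above can be exercised with a short sketch. It assumes Python's standard `zipfile` module, which writes no extra field for a small entry added through a plain `ZipInfo`; the helper names `write_package` and `sniff_media_type` are hypothetical. Reading the length of the media type from the compressed-size field at offset 18 of the local file header is an addition based on the Zip format, not something the note above states.

```python
import struct
import zipfile


def write_package(target, media_type, members):
    """Write a package whose first entry is an uncompressed "mimetype" file.

    `members` maps the remaining file names (including
    META-INF/manifest.xml) to their content as bytes.
    """
    with zipfile.ZipFile(target, "w", zipfile.ZIP_DEFLATED) as archive:
        # First entry: stored (not compressed), default header fields only.
        archive.writestr(zipfile.ZipInfo("mimetype"),
                         media_type.encode("ascii"),
                         compress_type=zipfile.ZIP_STORED)
        for name, data in members.items():
            archive.writestr(name, data)


def sniff_media_type(data):
    """Return the media type found at the fixed offsets, or None."""
    # 'PK' at position 0 and 'mimetype' at position 30.
    if data[0:2] != b"PK" or data[30:38] != b"mimetype":
        return None
    # Compressed size: 4 bytes, little endian, at offset 18 of the header.
    (size,) = struct.unpack_from("<I", data, 18)
    # The media type itself starts at position 38.
    return data[38:38 + size].decode("ascii")
```

A consumer that only needs to identify the document type can therefore read the first few dozen bytes of the file instead of parsing the central directory.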
Metadata: RDF¶
Metadata for documents contained in an OpenDocument package may be expressed using the model of the W3C Resource Description Framework [RDF-CONCEPTS].
A document or sub document that is stored in a package may contain any number of metadata files. The content of a metadata file shall conform to the [RDF-XML] specification. Implementations that are consumers as well as producers should preserve all metadata files.
All metadata files of a document or sub document shall be listed in a separate metadata manifest file, which has the file name manifest.rdf. This file enumerates metadata files and their relationships to other files in an OpenDocument package.
In addition to metadata files, the "manifest.rdf" file may list other files which are contained in the document or sub document that contain RDF metadata, like files that contain RDFa metadata. The "manifest.rdf" file need not exist if a document or sub document does not contain any files that contain RDF metadata.
All references to a resource within the same package that occur within a metadata file shall be represented by relative IRIs to the resource. This includes values of rdf:about attributes occurring within metadata files or metadata manifest files.
RDF schema: http://docs.oasis-open.org/office/v1.2/cos01/OpenDocument-v1.2-cos01-package-metadata.owl
Preview image¶
Unless a document is encrypted, package producers should generate a preview image of the document that is contained in the package. It should be a representation of the first page, first sheet, etc. of the document. For maximum re-usability of the preview images they shall be generated without any effects, surrounding frames, or borders.
Note: Such effects might interfere with effects added to the preview images by the different file system explorers or may not be desired at all for certain use cases.
The preview image shall be contained in a file named “Thumbnails/thumbnail.png”.
Preview images shall be saved in [PNG] format.
Note: Current desktops display preview images within squares of up to 256 pixels in width and height, at 24 bits per pixel. While this specification does not define upper or lower limits for preview image sizes, producers should only use image sizes that are displayed with a reasonable quality if scaled to fit into a 256x256 pixel square.
Encrypted documents are intended to be unreadable for unauthorized users and package producers shall not generate preview images for such documents. They may include a preview image that is independent of the contents of the document. Such preview images should not be encrypted.
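A consumer can locate the preview image and read its dimensions without decoding it. The sketch below assumes the standard PNG layout (an 8-byte signature followed by the IHDR chunk, with width and height stored as big-endian 32-bit integers); the helper name `thumbnail_size` is hypothetical.

```python
import struct
import zipfile

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def thumbnail_size(source):
    """Return (width, height) of the preview image, or None if absent."""
    with zipfile.ZipFile(source) as archive:
        try:
            data = archive.read("Thumbnails/thumbnail.png")
        except KeyError:
            # Preview images are recommended, not required.
            return None
    # Bytes 0-7: signature; 8-11: chunk length; 12-15: chunk type.
    if data[:8] != PNG_SIGNATURE or data[12:16] != b"IHDR":
        raise ValueError("Thumbnails/thumbnail.png is not a PNG image")
    # Bytes 16-23: width and height, big endian.
    width, height = struct.unpack(">II", data[16:24])
    return width, height
```

The result can be compared against the 256x256 square mentioned in the note to decide whether scaling is needed.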
Zip File Structure¶
A Zip file starts with a sequence of files, each of which can be compressed or stored in raw format. Each file has a local header immediately before its data, which contains most of the information about the file, including time-stamps, compression method and file name. The compressed file contents immediately follow, and are terminated by an optional data descriptor. The data descriptor contains the CRC and compressed size of the file, which are frequently not available when writing the local file header. If these details are already present in the local file header, the data descriptor can be omitted.
Each file in the archive is laid down sequentially in this format, followed by a central directory at the end of the Zip archive. The central directory is a contiguous set of directory entries, each of which contains all the information in the local file header, plus extras such as file comments and attributes. Most importantly, the central directory contains pointers to the position of each file in the archive for navigation of the Zip file.
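The relationship between the central directory and the local headers can be illustrated with a short sketch. It uses Python's `zipfile` module to read the central directory, then follows each entry's pointer and decodes the 30-byte fixed part of the local file header with `struct`. The field layout is the one defined by the Zip format; the helper name `list_local_headers` and the returned dictionary keys are hypothetical.

```python
import io
import struct
import zipfile

# Fixed part of a local file header: signature, version, flags, method,
# time, date, CRC-32, compressed size, uncompressed size, file name
# length, extra field length (30 bytes, little endian).
LOCAL_HEADER = struct.Struct("<4sHHHHHIIIHH")


def list_local_headers(data):
    """Decode the local header of every entry in a Zip archive (bytes)."""
    entries = []
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        # infolist() is built from the central directory at the end of
        # the archive; header_offset points at the local header.
        for info in archive.infolist():
            offset = info.header_offset
            (signature, _version, flags, method, _time, _date, _crc,
             _csize, _usize, name_len, extra_len) = LOCAL_HEADER.unpack_from(data, offset)
            if signature != b"PK\x03\x04":
                raise ValueError(f"no local header at offset {offset}")
            name_start = offset + LOCAL_HEADER.size
            raw_name = data[name_start:name_start + name_len]
            # Flag bit 11 marks UTF-8 names; otherwise assume code page 437.
            encoding = "utf-8" if flags & 0x800 else "cp437"
            entries.append({
                "name": raw_name.decode(encoding),
                "method": method,
                "header_offset": offset,
                "data_offset": name_start + name_len + extra_len,
                # Flag bit 3: CRC and sizes follow the data in a descriptor.
                "has_data_descriptor": bool(flags & 0x08),
            })
    return entries
```

For a package that follows the “mimetype” rules, the first entry has `header_offset` 0 and `data_offset` 38, which matches the fixed positions used for 'magic number' detection.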