Copyright © 2007 by ____. All rights reserved.
The Generalized Container Format (GCF) is an open specification for encapsulating one or more Digital Renditions of one or more Publications into a single, compressed file (hereafter referred to as the “Container”). The Container is a convenient mechanism for the digital storage, transmission, and distribution of Publications of all kinds. It is generic rather than specific to any particular Digital Rendition format. The Container is also not limited to text-based Publications: it may be used for other types of content, such as audio and video.
The Container also permits the encryption of component files by Digital Rights Management (DRM) systems, which control user access to the content within a Container. This specification does not mandate particular DRM technologies or systems; rather, it specifies the encryption mechanism that such systems may use.
This specification is a generalization of the IDPF OEBPS Container Format 1.0 Specification (IDPF/OCF). The IDPF/OCF describes a ZIP-based format for encapsulating an OEBPS Publication plus optional renditions of that OEBPS Publication in other formats.
The primary difference between GCF and IDPF/OCF is that this specification does not require a Container to include an OEBPS Publication. The IDPF/OCF requirement for an OEBPS Publication is suitable for its designed application, that of a transport and archive format for OEBPS publications, but may limit the options available to publishers and content creators.
In addition, it is possible for a Container meeting this specification to include multiple, independent Publications, with one or more Digital Renditions for each Publication. This allows publishers and creators greater flexibility in digital distribution of their Publications to users.
One fundamental design goal for GCF, which has been achieved, is that every conforming IDPF/OCF Container also conforms to this specification. It is strongly recommended, however, that IDPF/OCF Containers include at least the Container-level identifier gcf:id, as specified by this specification. Using any of the GCF Namespace attributes in META-INF/container.xml does not break conformance to the IDPF/OCF Specification.
The normative edition of this specification is the XHTML 1.1 document located at (To be added).
Editions in other formats may also be offered, but they are not normative.
Following are the more important terms used in this Specification which require precise definition:
(More might need to be added)
When used unqualified in this specification, Container refers to a ZIP file conforming to the purpose and restrictions of this specification.
A Digital Rendition is the “Manifestation” (see FRBR) or embodiment of a Publication in some recognized digital format or framework, such as HTML web pages, OEBPS, OpenReader, PDF, Microsoft LIT, JPEG, MP3, or MOV. A Digital Rendition may even be another GCF Container.
IDPF/OCF refers to the IDPF OEBPS Container Format 1.0 Specification, which describes a ZIP-based format for encapsulating an OEBPS Publication plus optional renditions of that OEBPS Publication in other formats.
When used unqualified in this specification, Publication is equivalent to the FRBR “Expression”: “The specific intellectual or artistic form that a Work takes each time it is ‘realized.’”
To illustrate this, the following is a Publication according to this specification: “The Adventures of Tom Sawyer, by Mark Twain. Annotated and Edited Edition published by Acme Press, 2007.”
Thus, a Publication has a high degree of specificity, yet is still a non-embodied, abstract entity.
A Publication may be “materially” manifested (or embodied) in one or more ways (the FRBR “Manifestation”): physically (for example, as printed paperback books, chiseled on stone tablets, or recorded on vinyl long-play records), digitally as a Digital Rendition (for example, HTML web pages, OEBPS, OpenReader, PDF, Microsoft LIT, JPEG, MP3, or MOV), or both.
The following key words (“imperatives”) are used in this specification to denote requirement level consistent with RFC 2119:
To aid readability, special text highlighting conventions are used in this specification (in addition to ordinary text emphasis) to emphasize important items.
The requirement level imperatives described in Section 2.3 are highlighted based on three basic imperative levels: required, recommended, and optional.
The normative XHTML 1.1 edition of this specification includes special markup for every mention of elements, attributes, attribute values, and other related code. This allows special highlighting to be applied by CSS to these markup constructs during presentation so they may be more easily recognized.
Since the normative edition of this specification may be rendered with different CSS style sheets, converted into other formats, rendered on visually limited hardware, or presented with text-to-speech engines, some or all of this highlighting may become lost or unrecognizable. Care has been taken to ensure that, even without highlighting, every mention of these markup constructs is clear and unambiguous.
Highlighting appearing in this specification:
Element (required): container
Attribute (optional): gcf:id
Attribute Value (whole or fragment): urn:isbn:0-395-36341-1
Other Code: META-INF/container.xml
Ordinary Hypertext Link: OEBPS Container Format 1.0 Specification (OCF)
Word or Term with Definition Link: Digital Rendition
This specification is built upon a wide and stable base of compatible open specifications and standards. Following are the various specifications and standards referenced in some manner by this specification.
International Digital Publishing Forum (IDPF)
Internet Engineering Task Force (IETF)
Internet Assigned Numbers Authority (IANA)
Others
A GCF Container must conform to the IDPF OCF 1.0 Specification (IDPF/OCF), but with the following exceptions and optional additions:
A Container is not required to contain an OEBPS Publication.
The one-line ASCII text file mimetype is not required.
For the required Container document META-INF/container.xml, the following attributes from the GCF namespace may be applied to the root element container:
xmlns:gcf
This is the GCF namespace declaration, which is required whenever any of the GCF namespaced attributes described in this specification are present in META-INF/container.xml. It must be given the value [GCF Namespace URI to be assigned later].
gcf:id
This optional but strongly recommended attribute assigns the Container Identifier. Refer to Section 3.2 for further requirements, recommendations, and comments regarding this attribute and its value.
[Informative Commentary] The Container Identifier is intended to identify the Container file itself, not the Digital Rendition(s) contained inside. A Container Identifier could be the same as a particular Digital Rendition identifier (if assigned), but this is not advised.
Example:
<container version="1.0"
xmlns="urn:oasis:names:tc:opendocument:xmlns:container"
xmlns:gcf="[GCF Namespace URI to be assigned later]"
gcf:id="urn:isbn:978-1-56619-909-4">
For the required Container document META-INF/container.xml, the following attributes from the GCF namespace are optional (but recommended) for the element rootfile (this element is used to specify a Digital Rendition in the Container):
gcf:rendid
This optional (but recommended) attribute assigns the identifier of the associated Digital Rendition. Refer to Section 3.2 for the requirements, recommendations, and comments regarding this attribute and its value.
gcf:pubgroup
This optional (but recommended) attribute assigns the publication group of the associated Digital Rendition. Refer to Section 3.3 for the requirements, recommendations, and comments regarding this attribute and its value.
Example:
<rootfile full-path="tomsawyer.html"
media-type="application/xhtml+xml"
gcf:rendid="urn:uuid:5eda7560-a073-11db-b606-0800200c9a66"
gcf:pubgroup="The Adventures of Tom Sawyer"/>
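Markup like the two fragments above can be generated with Python's standard xml.etree.ElementTree module. This is a minimal sketch, not part of the specification. The GCF Namespace URI has not been assigned yet, so GCF_NS below is a hypothetical placeholder. The function build_container_xml and its dictionary keys are likewise illustrative names. Because every gcf: attribute is created in the GCF namespace, ElementTree emits the required xmlns:gcf declaration automatically.

```python
import xml.etree.ElementTree as ET

CONTAINER_NS = "urn:oasis:names:tc:opendocument:xmlns:container"
# Hypothetical placeholder: the real GCF Namespace URI is still to be assigned.
GCF_NS = "urn:x-example:gcf"

# Serialize the container namespace as the default and the GCF namespace as
# "gcf:" instead of ElementTree's generated ns0/ns1 prefixes.
ET.register_namespace("", CONTAINER_NS)
ET.register_namespace("gcf", GCF_NS)


def build_container_xml(container_id, renditions):
    """Build a container element with gcf:id and one rootfile per rendition.

    Each rendition is a dict with "full_path" and "media_type", plus optional
    "rendid" and "pubgroup" entries mapped to gcf:rendid and gcf:pubgroup.
    """
    root = ET.Element(f"{{{CONTAINER_NS}}}container",
                      {"version": "1.0", f"{{{GCF_NS}}}id": container_id})
    rootfiles = ET.SubElement(root, f"{{{CONTAINER_NS}}}rootfiles")
    for r in renditions:
        attrs = {"full-path": r["full_path"], "media-type": r["media_type"]}
        if "rendid" in r:
            attrs[f"{{{GCF_NS}}}rendid"] = r["rendid"]
        if "pubgroup" in r:
            attrs[f"{{{GCF_NS}}}pubgroup"] = r["pubgroup"]
        ET.SubElement(rootfiles, f"{{{CONTAINER_NS}}}rootfile", attrs)
    return ET.tostring(root, encoding="unicode")
```

Note that ET.register_namespace updates a module-wide prefix table. Here it is used only to keep the output readable.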
gcf:id and gcf:rendid Attributes
The attributes gcf:id and gcf:rendid assign the identifiers for the Container and a Digital Rendition, respectively. The value of both attributes is of datatype CDATA. Both attributes are recommended.
The value for each attribute must be drawn from an established, public identifier namespace scheme, and must follow the full syntax specified by that namespace.
Furthermore, if the identifier scheme to be used is either a registered Uniform Resource Name (URN) namespace, or a registered “info” URI namespace, then that namespace and associated syntax must be used.
For example, if the identifier is an ISBN or UUID, it must be expressed using the URN namespace syntax (see Using ISBN in URN). If the identifier is a Digital Object Identifier (DOI), it must be expressed according to the “info” URI Scheme requirements for DOI.
Examples:
gcf:id="urn:isbn:0-395-36341-1" (ISBN, 10 digit)
gcf:rendid="urn:isbn:978-1-56619-909-4" (ISBN, 13 digit)
gcf:id="urn:uuid:5eda7560-a073-11db-b606-0800200c9a66" (UUID)
gcf:rendid="info:doi/10.123/456" (DOI)
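The sketch below illustrates these syntax rules by classifying a candidate gcf:id or gcf:rendid value. It is deliberately incomplete:

- It recognizes only the three schemes used in the examples (URN ISBN, URN UUID, and info DOI).
- It verifies ISBN check digits but not hyphen placement.
- It accepts any DOI of the form 10.prefix/suffix.

The function names are hypothetical.

```python
import re

UUID_RE = re.compile(r"[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-"
                     r"[0-9a-fA-F]{4}-[0-9a-fA-F]{12}")


def isbn_checksum_ok(isbn):
    """Verify the check digit of an ISBN-10 or ISBN-13 (hyphens ignored)."""
    digits = isbn.replace("-", "").upper()
    if re.fullmatch(r"\d{9}[\dX]", digits):
        # ISBN-10: weights 10..1, "X" stands for 10, sum must be divisible by 11.
        total = sum((10 - i) * (10 if c == "X" else int(c))
                    for i, c in enumerate(digits))
        return total % 11 == 0
    if re.fullmatch(r"\d{13}", digits):
        # ISBN-13: alternating weights 1 and 3, sum must be divisible by 10.
        total = sum((1 if i % 2 == 0 else 3) * int(c)
                    for i, c in enumerate(digits))
        return total % 10 == 0
    return False


def classify_identifier(value):
    """Return "isbn", "uuid", or "doi" for a well-formed value, else None."""
    lower = value.lower()
    if lower.startswith("urn:isbn:"):
        return "isbn" if isbn_checksum_ok(value[len("urn:isbn:"):]) else None
    if lower.startswith("urn:uuid:"):
        return "uuid" if UUID_RE.fullmatch(value[len("urn:uuid:"):]) else None
    if lower.startswith("info:doi/"):
        # DOI: "10." directory prefix, a registrant code, "/", then a suffix.
        doi = value[len("info:doi/"):]
        return "doi" if re.fullmatch(r"10\.[^/]+/.+", doi) else None
    return None  # bare identifiers and other schemes are not recognized here
```

A bare ISBN such as 0-395-36341-1 is rejected, because this specification requires the full URN form.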
When the Container or a Digital Rendition does not require an identifier assigned by a formal registration authority (such as an ISBN), it is strongly recommended that it be assigned a UUID, preferably a time-based (version 1) UUID. UUIDs are freely usable and, practically speaking, globally unique. A number of free UUID generators are available, as well as UUID registration services.
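In Python, for example, the standard uuid module can generate a time-based (version 1) UUID and render it directly in URN syntax through the urn property. The name new_rendition_id is a hypothetical helper chosen for illustration.

```python
import uuid


def new_rendition_id():
    """Return a fresh version 1 (time-based) UUID in URN form."""
    # UUID.urn yields "urn:uuid:" followed by the lower-case canonical form.
    return uuid.uuid1().urn
```

By default, uuid1() fills the node field from the host's hardware address (obtained via uuid.getnode()). Where exposing that address is a concern, uuid1() also accepts an explicit node argument.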
Once a Digital Rendition has been assigned an identifier, that same identifier should be reused whenever the same Digital Rendition appears in a different Container. Among other benefits, this keeps existing links and references to that Digital Rendition from breaking.
When the public namespace scheme specification allows any characters in the full identifier to be case-insensitive, lower case is strongly recommended, as the above examples illustrate.
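Below is a minimal sketch of this recommendation. It assumes only two rules that hold independently of namespace-specific syntax:

- The urn scheme name and the namespace identifier (NID) are case-insensitive for every URN (RFC 8141).
- UUID hex digits are case-insensitive on input (RFC 4122).

Everything else, including info URIs, is returned unchanged, because further case rules depend on the particular scheme. The name normalize_identifier is hypothetical.

```python
def normalize_identifier(value):
    """Lower-case the parts of a URN identifier known to be case-insensitive."""
    parts = value.split(":", 2)
    if len(parts) != 3 or parts[0].lower() != "urn":
        return value  # not a URN: leave it to scheme-specific rules
    _, nid, nss = parts
    nid = nid.lower()  # the NID is case-insensitive for all URNs
    if nid == "uuid":
        nss = nss.lower()  # UUID hex digits are case-insensitive
    return f"urn:{nid}:{nss}"
```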
gcf:pubgroup Attribute
As stated in Section 1.2, this specification allows multiple, independent Publications in a Container, and one or more Digital Renditions for each Publication. When more than one Publication is contained in the Container, it is necessary to identify (or group together) the Digital Renditions which represent the same Publication.
This grouping is accomplished using the optional (but recommended) attribute gcf:pubgroup applied to the rootfile element in META-INF/container.xml. The value of this attribute (of datatype CDATA) assigns the associated Digital Rendition to a particular Publication. The Digital Renditions in a Container which have the identical attribute value for gcf:pubgroup are assumed to represent the same Publication.
When all Digital Renditions in a Container represent the same Publication (that is, the Container contains only one Publication), gcf:pubgroup is not necessary, although it is still recommended.
To avoid ambiguity, if gcf:pubgroup is used at all, it must be applied to all rootfile elements in META-INF/container.xml. If gcf:pubgroup is applied to some but not all rootfile elements, then processors must assume that all Digital Renditions in the Container represent the same Publication and ignore whatever values have been applied to the gcf:pubgroup attribute(s).
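A processor's grouping logic under these rules can be sketched as follows. The input format is an assumption made for illustration: one dict per rootfile, with "full_path" and a "pubgroup" value that is None when the attribute is absent. The function name is likewise hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Optional


def group_renditions(rootfiles: List[Dict[str, Optional[str]]]
                     ) -> Dict[Optional[str], List[str]]:
    """Group rendition paths by Publication according to the gcf:pubgroup rules."""
    if rootfiles and all(rf.get("pubgroup") is not None for rf in rootfiles):
        # gcf:pubgroup is on every rootfile: identical values form one Publication.
        groups = defaultdict(list)
        for rf in rootfiles:
            groups[rf["pubgroup"]].append(rf["full_path"])
        return dict(groups)
    # Absent everywhere, or present on only some rootfiles: treat all renditions
    # as a single Publication (keyed by None) and ignore any pubgroup values.
    return {None: [rf["full_path"] for rf in rootfiles]}
```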
The use of gcf:pubgroup is illustrated in the markup example of Section 3.4.
META-INF/container.xml Markup Example
Following is an example META-INF/container.xml XML document which conforms to this specification. Further commentary follows the example.
<?xml version="1.0"?>
<container version="1.0"
xmlns="urn:oasis:names:tc:opendocument:xmlns:container"
xmlns:gcf="[GCF Namespace URI to be assigned later]"
gcf:id="urn:isbn:978-1-56619-909-4">
<rootfiles>
<rootfile full-path="HF/huckfinn.pdf"
media-type="application/pdf"
gcf:rendid="urn:uuid:64fa3550-abd3-11db-abbd-0800200c9a66"
gcf:pubgroup="The Adventures of Huckleberry Finn"/>
<rootfile full-path="HF/huckfinn.lit"
media-type="application/x-ms-reader"
gcf:rendid="urn:uuid:7cfa28e0-abd3-11db-abbd-0800200c9a66"
gcf:pubgroup="The Adventures of Huckleberry Finn"/>
<rootfile full-path="HF/huckfinn.html"
media-type="application/xhtml+xml"
gcf:rendid="urn:uuid:99495250-abd3-11db-abbd-0800200c9a66"
gcf:pubgroup="The Adventures of Huckleberry Finn"/>
<rootfile full-path="TS/tomsawyer.pdf"
media-type="application/pdf"
gcf:rendid="urn:uuid:a4b2c4f0-abd3-11db-abbd-0800200c9a66"
gcf:pubgroup="The Adventures of Tom Sawyer"/>
<rootfile full-path="TS/tomsawyer.lit"
media-type="application/x-ms-reader"
gcf:rendid="urn:uuid:ae03dee0-abd3-11db-abbd-0800200c9a66"
gcf:pubgroup="The Adventures of Tom Sawyer"/>
<rootfile full-path="TS/tomsawyer.html"
media-type="application/xhtml+xml"
gcf:rendid="urn:uuid:b6c1e090-abd3-11db-abbd-0800200c9a66"
gcf:pubgroup="The Adventures of Tom Sawyer"/>
</rootfiles>
</container>
The above example illustrates the use of all the GCF Namespace attributes defined in this specification.
The Container itself is assigned an ISBN identifier (it could instead have been assigned a UUID or an identifier from some other scheme). Each Digital Rendition is assigned a version 1 UUID (from a free online UUID generator), following the recommendations in this specification.
In addition, the example illustrates the inclusion of two Publications, with three Digital Renditions for each Publication. The value of the attribute gcf:pubgroup provides a convenient descriptor for the associated Publication. Note from Section 3.3 that if gcf:pubgroup is used at all, it must be applied to all rootfile elements in META-INF/container.xml, which is the case in this example.
IDPF/OCF permits the addition of prefixed namespace attributes to META-INF/container.xml provided the prefixed namespace is declared. Thus, had the above example document specified one and only one OEBPS Publication Digital Rendition, the document would have fully conformed to the IDPF/OCF requirements for META-INF/container.xml.
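Reading a document like the example above back into a program is straightforward with ElementTree. This sketch is illustrative only. The name read_container_xml is a hypothetical helper, and GCF_NS is a placeholder for the not-yet-assigned GCF Namespace URI, which must match the value declared by xmlns:gcf in the document being read. Absent GCF attributes come back as None, which makes it easy to apply the gcf:pubgroup rules of Section 3.3.

```python
import xml.etree.ElementTree as ET

CONTAINER_NS = "urn:oasis:names:tc:opendocument:xmlns:container"
# Hypothetical placeholder; substitute the GCF Namespace URI once it is assigned.
GCF_NS = "urn:x-example:gcf"


def read_container_xml(text):
    """Extract gcf:id and per-rootfile attributes from a container.xml string."""
    root = ET.fromstring(text)
    if root.tag != f"{{{CONTAINER_NS}}}container":
        raise ValueError("root element is not container")
    rootfiles = []
    for rf in root.iter(f"{{{CONTAINER_NS}}}rootfile"):
        rootfiles.append({
            "full_path": rf.get("full-path"),
            "media_type": rf.get("media-type"),
            "rendid": rf.get(f"{{{GCF_NS}}}rendid"),      # None if absent
            "pubgroup": rf.get(f"{{{GCF_NS}}}pubgroup"),  # None if absent
        })
    return {"id": root.get(f"{{{GCF_NS}}}id"), "rootfiles": rootfiles}
```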
[To be added]
The IDPF OCF 1.0 Specification provides mechanisms to enable file encryption (for use by DRM systems), digital signatures, container-level metadata, etc. These mechanisms may be used for GCF Containers.
Note that a recent version of the ZIP specification provides its own encryption capability. That capability must not be used, since it is proprietary. IDPF/OCF instead provides an encryption mechanism based on the open XML Encryption Syntax and Processing specification.
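An authoring tool or processor may want to detect Containers that use ZIP-native encryption. Below is a minimal sketch using Python's zipfile module. It assumes the general purpose bit flag definitions from PKWARE's ZIP APPNOTE: bit 0 marks an encrypted entry, and bit 6 marks strong encryption. The helper name zip_encrypted_entries is hypothetical. The function reads only the central directory flags and never attempts decryption.

```python
import io
import zipfile

# General purpose bit flags (PKWARE APPNOTE): bit 0 = encrypted entry,
# bit 6 = strong encryption.
ZIP_FLAG_ENCRYPTED = 0x0001
ZIP_FLAG_STRONG_ENCRYPTION = 0x0040


def zip_encrypted_entries(data):
    """Return the names of entries flagged as using ZIP-native encryption."""
    mask = ZIP_FLAG_ENCRYPTED | ZIP_FLAG_STRONG_ENCRYPTION
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        return [info.filename for info in zf.infolist() if info.flag_bits & mask]
```

A non-empty result means the Container relies on the proprietary mechanism rather than on the XML Encryption-based mechanism of IDPF/OCF.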
A future version of this specification, or allied specifications, may address these topics in more detail.
(To be added; should include discussion of metadata support, both Container-level and for the Digital Renditions.)