Generalized Container Format Specification 1.0

Draft 2007-02-19

This draft version:
(link forthcoming)
Latest draft version:
(link forthcoming)
Previous draft version:
(none)
Contributors:
David Coté
Jon Noring (editor)
Lee Passey


1 Overview

1.1 Purpose

The Generalized Container Format (GCF) is an open specification for encapsulating one or more Digital Renditions of one or more Publications into a single, compressed file (hereafter referred to as the “Container”). The Container is a convenient mechanism for the digital storage, transmission, and distribution of Publications of all kinds. It is generic and not specific to any particular Digital Rendition format, nor is it limited to text-based publications: it may also be used for other types of content such as audio and video.

The Container also permits the encryption of component files by Digital Rights Management (DRM) systems which control user access to the content contained within a Container. This specification does not specify particular DRM technologies or systems which may be applied to the Container, but rather specifies the encryption mechanism which such systems may utilize.

1.2 Compatibility With the IDPF OEBPS Container Format

This specification is a generalization of the IDPF OEBPS Container Format 1.0 Specification (IDPF/OCF). The IDPF/OCF describes a ZIP-based format for encapsulating an OEBPS Publication plus optional renditions of that OEBPS Publication in other formats.

The primary difference between GCF and IDPF/OCF is that this specification does not require a Container to include an OEBPS Publication. The IDPF/OCF requirement for an OEBPS Publication is suitable for its designed application, that of a transport and archive format for OEBPS publications, but may limit the options available to publishers and content creators.

In addition, it is possible for a Container meeting this specification to include multiple, independent Publications, with one or more Digital Renditions for each Publication. This allows publishers and creators greater flexibility in digital distribution of their Publications to users.

One fundamental design goal for GCF, which has been achieved, is that any conforming IDPF/OCF Container also conforms to this specification. It is strongly recommended, however, that IDPF/OCF Containers include at least the container-level identifier gcf:id per the requirements of this specification. Using any of the GCF Namespace attributes in META-INF/container.xml does not break conformity to the IDPF/OCF Specification.

2 Conventions, Definitions and Resources

2.1 Normative Edition

The normative edition of this specification is the XHTML 1.1 document located at (To be added).

Other formatted editions may be offered besides the normative edition, but they will not be considered normative.

2.2 Definitions

Following are the more important terms used in this Specification which require precise definition:

(More might need to be added)

Container

When used unqualified in this specification, Container refers to a ZIP file conforming to the purpose and restrictions of this specification.

Digital Rendition

A Digital Rendition is the “Manifestation” (see FRBR) or embodiment of a Publication in some recognized digital format or framework, such as HTML web pages, OEBPS, OpenReader, PDF, Microsoft LIT, JPEG, MP3, or MOV. A Digital Rendition may even be another GCF Container.

IDPF/OCF

The IDPF OEBPS Container Format 1.0 Specification. IDPF/OCF describes a ZIP-based format for encapsulating an OEBPS Publication plus optional renditions of that OEBPS Publication in other formats.

Publication

When used unqualified in this specification, Publication is equivalent to the FRBR “Expression”: “The specific intellectual or artistic form that a Work takes each time it is ‘realized.’”

To illustrate this, the following is a Publication according to this specification: “The Adventures of Tom Sawyer, by Mark Twain. Annotated and Edited Edition published by Acme Press, 2007.”

Thus, a Publication has a high degree of specificity, yet is still a non-embodied, abstract entity.

A Publication may be “materially” manifested (or embodied) in one or more ways (the FRBR “Manifestation”): in physical form (such as printed paperback books, stone tablets, or vinyl long-play records) or in digital electronic form, that is, as a Digital Rendition (e.g., HTML web pages, OEBPS, OpenReader, PDF, Microsoft LIT, JPEG, MP3, or MOV).

2.3 Requirement Levels

The following key words (“imperatives”) are used in this specification to denote requirement level consistent with RFC 2119:

  • must
  • must not
  • required
  • should
  • recommended
  • may
  • not required
  • optional

2.4 Highlighting Conventions

To aid in readability and understandability, special text highlighting conventions are used in this specification (in addition to ordinary text emphasis) to emphasize important items.

2.4.1 Imperative Level

The requirement level imperatives described in Section 2.3 are highlighted based on three basic imperative levels: required, recommended, and optional.

2.4.2 Elements, Attributes and Attribute Values, and Other Code

The normative XHTML 1.1 edition of this specification includes special markup for every mention of elements, attributes, attribute values, and other related code. This allows special highlighting to be applied by CSS to these markup constructs during presentation so they may be more easily recognized.

Since the normative edition of this specification may be rendered with different CSS style sheets, converted into other formats, rendered on visually limited hardware, or presented with text-to-speech engines, some or all of this highlighting may become lost or unrecognizable. Care has been taken to assure that, in the absence of highlighting, every mention of these markup constructs will be clear and unambiguous.

Highlighting appearing in this specification:

  • Element (required): container

  • Attribute (optional): gcf:id

  • Attribute Value (whole or fragment): urn:isbn:0-395-36341-1

  • Other Code: META-INF/container.xml

2.4.3 Hypertext Links

3 Generalized Container Requirements, Recommendations and Options

3.1 General Conformance

A GCF Container must conform to the IDPF OCF 1.0 Specification (IDPF/OCF), but with the following exceptions and optional additions:

  1. A Container is not required to contain an OEBPS Publication.

  2. The one-line ASCII text file mimetype is not required.

  3. For the required Container document META-INF/container.xml, the following attributes from the GCF namespace may be applied to the root element container:

    • xmlns:gcf

      This is the GCF namespace declaration which is required whenever any of the GCF namespaced attributes described in this specification are present in META-INF/container.xml. It must be given the value of [GCF Namespace URI to be assigned later]

    • gcf:id

      This optional, but strongly recommended, attribute assigns the Container Identifier. Refer to Section 3.2 for further requirements, recommendations, and comments regarding this attribute and its value.

      [Informative Commentary] The Container Identifier is intended to identify the Container file itself, not the Digital Rendition(s) contained inside. A Container Identifier could be the same as a particular Digital Rendition identifier (if assigned), but this is not advised.

    Example:

    <container version="1.0"
               xmlns="urn:oasis:names:tc:opendocument:xmlns:container"
               xmlns:gcf="[GCF Namespace URI to be assigned later]"
               gcf:id="urn:isbn:978-1-56619-909-4">

  4. For the required Container document META-INF/container.xml, the following attributes from the GCF namespace are optional (but recommended) for the element rootfile (this element is used to specify a Digital Rendition in the Container):

    • gcf:rendid

      This optional (but recommended) attribute assigns the identifier of the associated Digital Rendition. Refer to Section 3.2 for the requirements, recommendations, and comments regarding this attribute and its value.

    • gcf:pubgroup

      This optional (but recommended) attribute assigns the publication group of the associated Digital Rendition. Refer to Section 3.3 for the requirements, recommendations, and comments regarding this attribute and its value.

    Example:

    <rootfile full-path="tomsawyer.html"
              media-type="application/xhtml+xml"
              gcf:rendid="urn:uuid:5eda7560-a073-11db-b606-0800200c9a66"
              gcf:pubgroup="The Adventures of Tom Sawyer"/>
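
As a minimal sketch of how a processor might read these attributes, the following Python fragment parses a container.xml string with the standard library. The namespace URI urn:example:gcf is a stand-in invented for this illustration, since the GCF Namespace URI has not yet been assigned; the function name read_gcf_attributes is likewise hypothetical.

```python
import xml.etree.ElementTree as ET

# Placeholder only: the real GCF Namespace URI has not been assigned yet.
GCF_NS = "urn:example:gcf"
OCF_NS = "urn:oasis:names:tc:opendocument:xmlns:container"

def read_gcf_attributes(container_xml):
    """Return (container id, list of per-rootfile attribute dicts)."""
    root = ET.fromstring(container_xml)
    container_id = root.get(f"{{{GCF_NS}}}id")  # None when gcf:id is absent
    renditions = []
    for rootfile in root.iter(f"{{{OCF_NS}}}rootfile"):
        renditions.append({
            "full-path": rootfile.get("full-path"),
            "rendid": rootfile.get(f"{{{GCF_NS}}}rendid"),
            "pubgroup": rootfile.get(f"{{{GCF_NS}}}pubgroup"),
        })
    return container_id, renditions

SAMPLE = """<container version="1.0"
    xmlns="urn:oasis:names:tc:opendocument:xmlns:container"
    xmlns:gcf="urn:example:gcf"
    gcf:id="urn:isbn:978-1-56619-909-4">
  <rootfiles>
    <rootfile full-path="tomsawyer.html"
              media-type="application/xhtml+xml"
              gcf:rendid="urn:uuid:5eda7560-a073-11db-b606-0800200c9a66"
              gcf:pubgroup="The Adventures of Tom Sawyer"/>
  </rootfiles>
</container>"""

container_id, renditions = read_gcf_attributes(SAMPLE)
print(container_id)
print(renditions[0]["rendid"])
```

Because all GCF attributes are optional, each lookup returns None when the attribute is absent rather than raising an error.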

3.2 Identifiers (gcf:id and gcf:rendid Attributes)

The attributes gcf:id and gcf:rendid assign the identifiers for the Container and a Digital Rendition, respectively. The attribute value for both attributes is of datatype CDATA. Both attributes are recommended.

The value for each attribute must be drawn from an established, public identifier namespace scheme, and must follow the full syntax specified by that namespace.

Furthermore, if the identifier scheme to be used is either a registered Uniform Resource Name (URN) namespace, or a registered “info” URI namespace, then that namespace and associated syntax must be used.

For example, if the identifier is an ISBN or UUID, it must be assigned per the URN Namespace syntax (see Using ISBN in URN). If the identifier is a Digital Object Identifier (DOI), it must be assigned per the “info” URI Scheme requirements for DOI.

Examples:

gcf:id="urn:isbn:0-395-36341-1"   (ISBN, 10 digit)

gcf:rendid="urn:isbn:978-1-56619-909-4"   (ISBN, 13 digit)

gcf:id="urn:uuid:5eda7560-a073-11db-b606-0800200c9a66"   (UUID)

gcf:rendid="info:doi/10.123/456"   (DOI)
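
The scheme rules above can be sketched as a loose syntactic check. The function name identifier_scheme is hypothetical, and the patterns are simplifying assumptions that look only at the prefix and general shape of the value; they do not verify ISBN check digits or DOI registration, and they cover only the three schemes shown in the examples.

```python
import re
import uuid

def identifier_scheme(value):
    """Return 'isbn', 'uuid', 'doi', or None for an identifier string."""
    lowered = value.lower()
    if lowered.startswith("urn:isbn:"):
        digits = lowered[len("urn:isbn:"):].replace("-", "")
        # 10-digit ISBNs may end in "x"; 13-digit ISBNs are all digits.
        if re.fullmatch(r"\d{9}[\dx]|\d{13}", digits):
            return "isbn"
        return None
    if lowered.startswith("urn:uuid:"):
        try:
            uuid.UUID(lowered[len("urn:uuid:"):])
        except ValueError:
            return None
        return "uuid"
    if lowered.startswith("info:doi/"):
        # A DOI has the general shape "10.<registrant>/<suffix>".
        if re.fullmatch(r"10\.[^/\s]+/\S+", value[len("info:doi/"):]):
            return "doi"
    return None

print(identifier_scheme("urn:isbn:0-395-36341-1"))
print(identifier_scheme("info:doi/10.123/456"))
```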

It is strongly recommended that a UUID (preferably the time-based, or version 1, type) be assigned as the Container Identifier, and to each Digital Rendition, whenever an identifier assigned by a formal registration authority (such as ISBN) is not required. UUIDs are freely usable and, practically speaking, globally unique; a number of free UUID generators and UUID registration services are available.
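
For illustration only, a version 1 UUID in URN form can be produced with the Python standard library; the helper name new_identifier is hypothetical, and any conforming UUID generator would serve equally well.

```python
import uuid

def new_identifier():
    """Return a fresh version 1 (time-based) UUID as a URN string."""
    return uuid.uuid1().urn

container_id = new_identifier()
print(container_id)
```

The resulting string begins with urn:uuid: and uses lower-case hexadecimal digits, so it can be used directly as a gcf:id or gcf:rendid value.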

Note that once a Digital Rendition is assigned an identifier, that same identifier should be re-used when the same Digital Rendition appears in a different Container. This keeps existing links and references to that Digital Rendition from breaking, among other benefits.

When the public namespace scheme specification allows any characters in the full identifier to be case-insensitive, lower case is strongly recommended, as the above examples illustrate.

3.3 Publication Group (gcf:pubgroup Attribute)

As stated in Section 1.2, this specification allows multiple, independent Publications in a Container, and one or more Digital Renditions for each Publication. When more than one Publication is contained in the Container, it is necessary to identify (or group together) the Digital Renditions which represent the same Publication.

This grouping is accomplished using the optional (but recommended) attribute gcf:pubgroup applied to the rootfile element in META-INF/container.xml. The value of this attribute (of datatype CDATA) assigns the associated Digital Rendition to a particular Publication. The Digital Renditions in a Container which have the identical attribute value for gcf:pubgroup are assumed to represent the same Publication.

When all Digital Renditions in a Container represent the same Publication (that is, the Container contains only one Publication), gcf:pubgroup is not required, but it is still recommended.

To avoid ambiguity, if gcf:pubgroup is used at all, it must be applied to all rootfile elements in META-INF/container.xml. If gcf:pubgroup is applied to some but not all rootfile elements, then processors must assume that all Digital Renditions in the Container represent the same Publication and ignore whatever values have been applied to the gcf:pubgroup attribute(s).

The use of gcf:pubgroup is illustrated in the markup example of Section 3.4.
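
The grouping rule, including the fallback for partially applied attributes, can be sketched as follows. The function name group_by_publication and the input shape (a list of path and pubgroup pairs, with None standing for an absent attribute) are assumptions made for this illustration, not part of the specification.

```python
def group_by_publication(renditions):
    """Group rendition paths by their gcf:pubgroup value.

    `renditions` is a list of (full_path, pubgroup) pairs, where pubgroup
    is None when the attribute is absent from that rootfile element.
    """
    paths = [path for path, _ in renditions]
    # Attribute missing on any rootfile: ignore all values and treat
    # every rendition as belonging to one Publication.
    if not all(group is not None for _, group in renditions):
        return {None: paths}
    grouped = {}
    for path, group in renditions:
        # Renditions with identical values represent the same Publication.
        grouped.setdefault(group, []).append(path)
    return grouped

complete = [
    ("HF/huckfinn.pdf", "The Adventures of Huckleberry Finn"),
    ("TS/tomsawyer.pdf", "The Adventures of Tom Sawyer"),
    ("TS/tomsawyer.html", "The Adventures of Tom Sawyer"),
]
partial = [("HF/huckfinn.pdf", "The Adventures of Huckleberry Finn"),
           ("TS/tomsawyer.pdf", None)]

print(group_by_publication(complete))
print(group_by_publication(partial))
```

In this sketch the key None marks the single implied Publication that results when gcf:pubgroup is absent from some or all rootfile elements.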

3.4 META-INF/container.xml Markup Example

Following is an example META-INF/container.xml XML document which conforms to this specification. Further commentary follows the example.

<?xml version="1.0"?>

<container version="1.0"
           xmlns="urn:oasis:names:tc:opendocument:xmlns:container"
           xmlns:gcf="[GCF Namespace URI to be assigned later]"
           gcf:id="urn:isbn:978-1-56619-909-4">

   <rootfiles>

      <rootfile full-path="HF/huckfinn.pdf"
                media-type="application/pdf"
                gcf:rendid="urn:uuid:64fa3550-abd3-11db-abbd-0800200c9a66"
                gcf:pubgroup="The Adventures of Huckleberry Finn"/>

      <rootfile full-path="HF/huckfinn.lit"
                media-type="application/x-ms-reader"
                gcf:rendid="urn:uuid:7cfa28e0-abd3-11db-abbd-0800200c9a66"
                gcf:pubgroup="The Adventures of Huckleberry Finn"/>

      <rootfile full-path="HF/huckfinn.html"
                media-type="application/xhtml+xml"
                gcf:rendid="urn:uuid:99495250-abd3-11db-abbd-0800200c9a66"
                gcf:pubgroup="The Adventures of Huckleberry Finn"/>

      <rootfile full-path="TS/tomsawyer.pdf"
                media-type="application/pdf"
                gcf:rendid="urn:uuid:a4b2c4f0-abd3-11db-abbd-0800200c9a66"
                gcf:pubgroup="The Adventures of Tom Sawyer"/>

      <rootfile full-path="TS/tomsawyer.lit"
                media-type="application/x-ms-reader"
                gcf:rendid="urn:uuid:ae03dee0-abd3-11db-abbd-0800200c9a66"
                gcf:pubgroup="The Adventures of Tom Sawyer"/>

      <rootfile full-path="TS/tomsawyer.html"
                media-type="application/xhtml+xml"
                gcf:rendid="urn:uuid:b6c1e090-abd3-11db-abbd-0800200c9a66"
                gcf:pubgroup="The Adventures of Tom Sawyer"/>

   </rootfiles>

</container>

The above example illustrates the use of all the GCF Namespace attributes defined in this specification.

The Container itself is assigned an ISBN identifier (it could instead have been assigned a UUID or an identifier from some other scheme). Each Digital Rendition is assigned a version 1 UUID (from a free online UUID generator), following the recommendations in this specification.

In addition, the example illustrates the inclusion of two Publications, with three Digital Renditions for each Publication. The value of the attribute gcf:pubgroup provides a convenient descriptor for the associated Publication. Note from Section 3.3 that if gcf:pubgroup is used at all, it must be applied to all rootfile elements in META-INF/container.xml, which is the case in this example.

IDPF/OCF permits the addition of prefixed namespace attributes to META-INF/container.xml provided the prefixed namespace is declared. Thus, had the above example document specified one and only one OEBPS Publication Digital Rendition, the document would have fully conformed to the IDPF/OCF requirements for META-INF/container.xml.
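
As a rough sketch of how a reading application might locate the Digital Renditions, the following Python code opens a Container as a ZIP file and lists the rootfile entries from META-INF/container.xml. The function name list_rootfiles is hypothetical, and the tiny in-memory Container exists only so that the sketch can be run as written.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

OCF_NS = "urn:oasis:names:tc:opendocument:xmlns:container"

def list_rootfiles(container_file):
    """Return (full-path, media-type) pairs from META-INF/container.xml."""
    with zipfile.ZipFile(container_file) as archive:
        xml_bytes = archive.read("META-INF/container.xml")
    root = ET.fromstring(xml_bytes)
    return [
        (rootfile.get("full-path"), rootfile.get("media-type"))
        for rootfile in root.iter(f"{{{OCF_NS}}}rootfile")
    ]

# Build a tiny in-memory Container for demonstration purposes.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
    archive.writestr("META-INF/container.xml", """<?xml version="1.0"?>
<container version="1.0"
           xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="TS/tomsawyer.pdf" media-type="application/pdf"/>
    <rootfile full-path="TS/tomsawyer.html" media-type="application/xhtml+xml"/>
  </rootfiles>
</container>""")
    archive.writestr("TS/tomsawyer.html", "<html/>")

rootfiles = list_rootfiles(buffer)
print(rootfiles)
```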

4 Digital Rendition MIME Media Types

[To be added]

5 Encryption, Digital Signatures, and Container-Level Metadata

The IDPF OCF 1.0 Specification provides mechanisms to enable file encryption (for use by DRM systems), digital signatures, container-level metadata, etc. These mechanisms may be used for GCF Containers.

Note that a recent version of the ZIP specification provides its own encryption capability. This ZIP-native encryption must not be used, since it is proprietary; IDPF/OCF instead provides an encryption mechanism based on the open XML Encryption Syntax and Processing specification.
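
One way a validating tool might check this restriction is to inspect the general purpose bit flag that ZIP records for each entry, where bit 0 marks the entry as encrypted by ZIP itself. The sketch below is an assumption about how such a check could look, not a procedure defined by this specification; the function name entries_using_zip_encryption is hypothetical.

```python
import io
import zipfile

def entries_using_zip_encryption(container_file):
    """Return names of entries whose ZIP 'encrypted' flag bit is set."""
    with zipfile.ZipFile(container_file) as archive:
        # Bit 0 of the general purpose bit flag marks an encrypted entry.
        return [info.filename for info in archive.infolist()
                if info.flag_bits & 0x1]

# An unencrypted in-memory Container, built only for demonstration.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("META-INF/container.xml", "<container/>")

flagged = entries_using_zip_encryption(buffer)
print(flagged)
```

A non-empty result would indicate entries that rely on ZIP-level encryption rather than the IDPF/OCF mechanism.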

A future version of this specification, or allied specifications, may address these topics in more detail.

6 Tentative Future Plans (Non-Normative)

(To be added, and should include discussion of metadata support, both Container-level and for the Digital Renditions.)