Collection.Doc+JSON - Document Format

Description

Collection.Doc is a read/write, recursive hypermedia type designed to facilitate flexible exchange of structured content through web APIs. The media type has been registered with IANA since January 31, 2014.

Specification Lead:
Irakli Nadareishvili, NPR
Authors:
See: §Acknowledgements
Dates:
2012-06-20 (Created)
2013-10-32 (Updated)
2014-01-03 (Updated)
2014-01-31 (IANA Approval)
Status:
Approved

Contents

    NOTE:
    The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC2119.

    Pedigree and Motivation

    Collection.Doc+JSON (application/vnd.collection.doc+json), documented in the following specification, is a JSON representation of the media type. Other representations (+XML, +XHTML, +HTML) can be created, but are out of scope for this document.

    Collection.Doc+JSON is heavily based on the Collection+JSON hypermedia type and leverages existing standards, such as URI Template [RFC6570], the Home Document Specification, and IANA-registered Link Relation Types, wherever possible.

    In contrast to some interchange formats before it, Collection.Doc is not a domain-specific content exchange format. Rather, Collection.Doc was designed to be a generic hypermedia type that can host more specific standards through semantic extensions: profiles. The primary motivation behind Collection.Doc is to standardize traits common to the vast majority of content APIs: aggregating content items into collections, supporting internationalization, paginating large sets, providing rights management, templated querying, and facilities for content update. Alas, every API currently implements these features in its own way, unnecessarily.

    In addition to standardizing common traits, we deemed it equally important to define standard extension mechanisms for the application-specific parts that an API may need to implement.

    Recursiveness

    Collection.Doc is a recursive hypermedia type. Every instance of a Collection.Doc is a document and a collection at the same time.

    As a document, it contains a set of attributes (key/value pairs) and exposes controls (links). As a collection, it contains other documents that, by virtue of also being Collection.Docs, can themselves contain other documents, which can contain further documents, and so on. The recursion can be of arbitrary depth, and this flexibility allows a wide variety of complex use cases to be described in a streamlined manner.

    General Structure

    Despite its flexible and extensible nature, Collection.Doc is a very simple media type: it has only five top-level elements:

    Collection.Doc diagram

    Version

    The version property indicates the version of the Collection.Doc specification that the message conforms to.

    A Collection.Doc document SHOULD have a top-level version field. Currently, there’s only one version of the specification: 1.0. As such, the version property, if present, MUST be set to 1.0 until there’s another version of the specification. If the version field is omitted, it should be assumed to be 1.0.
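
    The version rules above can be sketched as follows. This is a minimal illustration, not part of the specification: the function name `document_version` is hypothetical, and the document is assumed to be already parsed into a Python dict with the version stored as a string, as in the JSON example later in this document.

```python
SUPPORTED_VERSION = "1.0"  # the only version defined by the specification so far


def document_version(document):
    """Return the version of a parsed Collection.Doc+JSON document.

    Hypothetical helper: a missing "version" field is treated as "1.0",
    and any other value is rejected.
    """
    version = document.get("version", SUPPORTED_VERSION)
    if version != SUPPORTED_VERSION:
        raise ValueError(f"unsupported Collection.Doc version: {version!r}")
    return version


# Usage: both an explicit and an omitted version resolve to "1.0".
explicit = document_version({"version": "1.0", "attributes": {}})
implicit = document_version({"attributes": {}})
```

    Rejecting unknown versions outright is a design choice of this sketch; a client could equally decide to log a warning and continue.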

    The top-level href field is a unique identifier of the resource representation. It MUST be a valid Uniform Resource Identifier (URI).

    Also see: guid attribute vs. top-level href

    The top-level links field is possibly the most important component of the Collection.doc+JSON media type. It contains hyperlinks representing the various relationships that the current document has to other documents, as well as actions that can be performed in the context of the current document.

    Links are also where the biggest opportunity for standardization lies: most APIs implement the same controls in different ways. Standardizing those takes us a long way towards interoperable APIs and API clients.

    In the JSON representation of the media type, the links field is a JSON object. Each top-level key of the JSON object is a Link Relation Type (RFC5988) and the value is an array of Collection Document Link Objects. We call the link relation type used for a top-level key the Primary Relationship Type of the corresponding links, to distinguish it from the Secondary Relationship Type that is explained below.
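
    As a rough illustration of this layout, the sketch below collects the href values filed under one primary relation type. The helper name `hrefs_for` is hypothetical, and the document is assumed to be already parsed into a Python dict.

```python
def hrefs_for(document, relation):
    """Return the hrefs of all links under one primary relation type.

    Hypothetical helper. The links field maps each relation type to an
    array of link objects; missing relations yield an empty list.
    """
    links = document.get("links", {})
    return [link["href"] for link in links.get(relation, []) if "href" in link]


# Usage with a trimmed-down document:
doc = {
    "version": "1.0",
    "links": {
        "permission": [
            {"href": "http://api.pmp.io/docs/a54dd0e7-2e12-49aa-adf0-373e3873493a",
             "operation": "read"},
        ],
    },
}
permission_hrefs = hrefs_for(doc, "permission")
```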

    In Collection.doc+JSON, a Link Object is a JSON object that has the following fields:

    The Collection.doc+JSON media type inherits the semantic meaning of many standard Link Relations for the Primary Link Relation Types, including the ones defined by IANA and Microformats.

    For link relation types used in Collection.doc, the emphasis is on using popular, standard IETF and IANA link relation types, rather than inventing new ones. There are a handful of important link relation types that the media type defines and that a client MUST implement. Clients are free to implement additional standard link relations as long as they follow the Must Ignore strategy, as defined in the Architecture of the World Wide Web.

    A profile link defines additional semantics for a message body of a Collection Document and uniquely identifies the sub-type of a document.

    Collection.Doc is a generic media type intended to standardize solutions for common requirements in content web APIs. Most applications will, however, need to use various document sub-types that define additional document-type-specific semantics.

    The href attribute of a profile link MUST be a valid URI that uniquely identifies the sub-type of a document. The URI SHOULD be dereferenceable and SHOULD point to a document explaining additional semantics.

    Depending on the application's needs, the dereferenceable URI of the profile link may point to any web document that can serve as human-centric or machine-centric documentation. Examples include another Collection Document, an ALPS document, or even a PDF.

    Profiles MAY be made inheritable using the “extends” link relation allowing re-use of profile definitions and collaboration around profile definitions. Profiles are used solely via linking. There is no requirement for a central registry of profiles. An index of profiles can be created for discoverability, but innovation around profiles is intentionally decentralized.

    Creating lists of content items is one of the most important tasks in content management. Collection.Doc provides both of the two possible ways of defining a list: top-down and bottom-up.

    Based on: RFC 6573

    An item link is a way to define lists top-down: a document points to the other documents that it contains. It’s a ‘contains’ type of relationship, suitable for scenarios such as “blog contains blog posts” or “news story contains asset documents”.

    Based on: RFC 6573

    A collection link supports the bottom-up approach. In this case the child documents themselves point to the parent documents they are associated with. It’s a ‘belongs to’ type of relationship, suitable for cases such as a document indicating: “I belong to these certain topics”.

    Permission

    Read/write. optional. URL pointing to a Collection.doc document that describes rights management-related information.

    For more see: Content Rights Management

    Navigation is a primary link relation that defines a set of links, with the following secondary link relations:

    Alternate

    Based on: W3C Spec

    Read/write. optional. List of URIs that point to alternate representations of the same content item. Suggested usages include: clean (semantic) URL of the content (when “hreflang” of the alternate is the same as hreflang of the content) and/or translations in other languages (when “hreflang” differs).

    Extends

    The extends link relation type allows document profiles to extend each other’s semantic definitions. A profile definition that extends another one MUST inherit all semantic definitions in the parent profile and MUST NOT override them. A child profile is only allowed to define additional semantics; it is not allowed to change or invalidate the definitions in the parent. This is a crucial requirement to facilitate the Must Ignore strategy.
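
    The inherit-but-never-override constraint can be sketched as below. The specification does not prescribe how a profile is represented, so modelling a profile as a dict that maps a term to its definition is purely an assumption of this example, as are the function name `extend_profile` and the sample terms.

```python
def extend_profile(parent, additions):
    """Build a child profile from a parent plus additional definitions.

    Assumption for illustration: a profile is a dict of term -> definition.
    The child inherits every parent term and may only add new ones.
    """
    overridden = sorted(set(parent) & set(additions))
    if overridden:
        raise ValueError(f"child profile must not override: {overridden}")
    return {**parent, **additions}


# Usage with hypothetical terms:
story = {"title": "Title of the story", "byline": "Author credit"}
audio_story = extend_profile(story, {"duration": "Length of the audio in seconds"})
```

    Failing loudly on an attempted override keeps a client that only understands the parent profile safe: everything it knows about the parent still holds for the child.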

    Query

    Read-only. optional. List of templated URIs [RFC6570] that describe a range of Uniform Resource Identifiers that can be constructed through variable expansion to run searches for additional documents related to the context document.
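
    To make variable expansion concrete, here is a deliberately small sketch that handles only two RFC 6570 forms: simple expansion (`{var}`) and form-style query expansion (`{?a,b}`). It is not a complete RFC 6570 implementation, and the function name `expand`, the example.org URLs, and the variable names are all made up for the example.

```python
import re
from urllib.parse import quote

# Matches {var}, {a,b} and {?a,b}; other RFC 6570 operators are not handled.
_EXPRESSION = re.compile(r"\{(\??)([A-Za-z0-9_.,]+)\}")


def expand(template, variables):
    """Expand a small subset of URI Template syntax.

    Undefined variables (missing or None) are skipped.
    """
    def replace(match):
        is_query = match.group(1) == "?"
        names = match.group(2).split(",")
        defined = [(name, str(variables[name]))
                   for name in names if variables.get(name) is not None]
        if is_query:
            if not defined:
                return ""
            pairs = (f"{name}={quote(value, safe='')}" for name, value in defined)
            return "?" + "&".join(pairs)
        return ",".join(quote(value, safe="") for _, value in defined)

    return _EXPRESSION.sub(replace, template)


# Usage with a hypothetical query template:
url = expand("https://api.example.org/docs{?tag,limit}",
             {"tag": "jazz music", "limit": 10})
# url == "https://api.example.org/docs?tag=jazz%20music&limit=10"
```

    A production client would more likely rely on a dedicated URI Template library that covers all expansion operators.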

    Edit

    Read-only. optional. A URI pointing to a document that provides information required for updating the document in the current context.

    Creator

    Read-only. optional. A URI pointing to a document that describes the user or organization whose API Key originally created the current document.

    Distributor

    Read/write. optional. A URI pointing to a document that describes users or organizations that are allowed to fully administer the document. This is a mechanism for the original publisher of the content to share responsibility for content administration with other collaborators, or to delegate it to them. Distributors, alongside the creator, are the only parties that can administer read/write permissions of a document.

    Attributes

    The attributes field is a JSON object that represents static state of the document. If you view a Collection.doc+JSON as encapsulating controls (links) and state, then attributes is where the state resides. The meaning of each attribute depends on the profile of the document.

    The attributes and their semantic meaning defined for the base media type are as follows:

    guid

    Read-only, optional. May be present on output only. Unique internal identifier of a document. The UUIDv4 scheme SHOULD be used for generating guids. The only time an API client needs the GUID directly is when creating a new document. Documents that are merely collections of search results will not have a GUID, as a search result collection is a dynamic resource identified by the URL of the search query itself.

    The difference between the top-level “href” field of a document and the “guid” attribute is that “href” identifies the document representation as served by a specific API endpoint, whereas “guid” can identify the content item itself. If the same content is published as two Collection.Doc documents by two API servers, they SHOULD have the same guid, but they MAY have different hrefs.
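
    Generating a guid that follows the UUIDv4 recommendation takes one call with Python's standard `uuid` module. The helper name `new_guid` is hypothetical; how and when a client submits the guid is outside the scope of this sketch.

```python
import uuid


def new_guid():
    """Return a random (version 4) UUID in its canonical string form."""
    return str(uuid.uuid4())


# Usage: a fresh guid for a document that is about to be created.
guid = new_guid()
```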

    title

    Read/write, optional. Title of the document.

    hreflang

    Read/write, optional. ISO 639-1 code of the language of the document. Defaults to ‘en’.

    valid

    Read/write, optional. The date-times that content is valid from and to, in ISO 8601 format. Allows scheduling of document publishing and expiration of content. API Clients MUST respect valid from/to fields. This field has the following rules:

    created

    Read-only, auto-generated. Date the content item was first saved, in ISO 8601 format. This field is system-generated and read-only to the API clients.

    modified

    Read-only, auto-generated. Date the content item was last modified, in ISO 8601 format. This field is system-generated and read-only to the API clients.

    Content Rights

    Content rights determine who can view or modify a document.

    Effectively, a document can be either:

    Whereas the meaning of “available” MUST be contextualized by one of the two distinct operations:

    Additionally, Collection.Doc assumes that “write” access to content implies “read” access, but not vice versa.

    Role-Based Security

    Collection.Doc’s content rights management system design follows the common “role-based” security pattern. Users are enrolled into permission groups by listing user documents as item links of a permission group document. A permission group document MUST be a Collection.Doc document. It can be of any profile, as long as it is referenced via the permission link of a content document. Applications MAY decide whether or not to create a separate profile type for permission group documents.

    Both the content item as well as the permission group documents MUST be proper Collection.Doc documents and permissions MUST be granted by pointing “permission” link relation from a content item to the permission group document.

    Docs and PermGroups relationship graph

    The “permission” link relation can be used to establish rights management relationships. Presence of a “permission” link relation indicates that the current resource may have restricted access. By default “read” access is open to anybody and “write” access is restricted to the original publisher of the document (“creator”) and distributors. A “permission” link relation can alter those defaults and provides much flexibility in rights management.

    Links using the “permission” link relation MUST point to a dereferenceable resource that SHOULD provide a resource of a known media type. The “permission” link relation MAY contain multiple links.

    For example, Collection.doc+JSON media type would represent permissions relationship as:

    {
      "version" : "1.0"
    , "attributes"    : {OBJECT}
    , "links"   : {
        "profile"      : [ARRAY]
      , "permission"  : [ { "href"      : "http://api.pmp.io/docs/a54dd0e7-2e12-49aa-adf0-373e3873493a"
                           , "operation" : "read"
                           , "blacklist" : false
                           }
                         , { "href"      : "http://api.pmp.io/docs/a54dd0e7-2e12-49aa-adf0-373e3873493a"
                           , "operation" : "read"
                           , "blacklist" : true
                           }
                         , { "href"      : "http://api.pmp.io/docs/a54dd0e7-2e12-49aa-adf0-373e3873493a"
                           , "operation" : "write"
                           }]  
      }  
    }
    

    where:

    Please notice that the “permission” link relation in the example above contains multiple links. These links are additive: the resulting permissions are determined by adding up all of the rules defined by each link. The following rules apply to the addition algorithm:

    Additivity Rules

    1. If one link relation grants access, but another one denies it, the resulting access is: denied.

      This rule is important to guarantee publisher confidence in rights management and to avoid accidental “security holes”. It is typically more obvious, and hence more easily correctable, when somebody is accidentally denied access than when somebody accidentally gets rights they should not have.

    2. If a user/organization has “write” access, then it always implies read access, even if another relationship denies “read” access specifically.

    3. First, all “write” rules are processed separately, then the “read” rules are processed, and only at the end are the two combined and conflicts resolved.

    4. A creator and distributor user/organization can never be denied either “read” or “write” access to a document. This rule trumps all others.

    The resulting additivity rules matrix looks like the following:

    r(y)        = r(y) and w(undefined)
    w(y)        = w(y) and r(y) 
    w(y) + r(n) = w(y) and r(y)
    w(n) + r(y) = w(n) and r(y)
    w(y) + r(y) = w(y) and r(y)
    w(n) + r(n) = w(n) and r(n)
    

    Where w(y) means: group has write access (y = yes), and r(n) accordingly means: group is denied read access (n = no).

    w(undefined) defaults to w(n) if no other rule provides more specific value, but is a weak definition and is overridden by any specific definition whether w(y) or w(n).

    In case of multi-level addition, considering rule #3:

    w(y) + w(n) + r(y) = w(n) and r(y)
    

    Please note that the w(y) rule never got a chance to grant read access since it was trumped by w(n).

    Another example:

    w(y) + r(y) + r(n) = w(y) + r(n) = w(y) and r(y)
    

    In this case rule #1 governed that r(y) + r(n) = r(n), but it was overridden when added-up by the end-result of w() rules, since w(y) implies r(y) according to rule #2.
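
    The four additivity rules and the matrix above can be sketched in a few lines. This is an illustration under stated assumptions rather than a reference algorithm: the function name `combine` is hypothetical, each rule that applies to one user or group is modelled as an `(operation, allowed)` pair (so `("write", True)` stands for w(y)), and `None` stands for "undefined".

```python
def combine(rules, is_creator_or_distributor=False):
    """Combine the permission rules that apply to one user or group.

    Returns {"write": ..., "read": ...} with values True, False or
    None (undefined, i.e. no rule of that kind was given).
    """
    if is_creator_or_distributor:
        # Rule 4: creator and distributors can never be denied access.
        return {"write": True, "read": True}

    rules = list(rules)

    def resolve(operation):
        verdicts = [allowed for op, allowed in rules if op == operation]
        if not verdicts:
            return None
        # Rule 1: a single denial outweighs any number of grants.
        return all(verdicts)

    # Rule 3: resolve "write" rules first, then "read" rules, then combine.
    write = resolve("write")
    read = resolve("read")
    if write is True:
        # Rule 2: write access implies read access, even if read was denied.
        read = True
    return {"write": write, "read": read}


# Usage: the two multi-level examples from the text.
first = combine([("write", True), ("write", False), ("read", True)])
second = combine([("write", True), ("read", True), ("read", False)])
# first  == {"write": False, "read": True}
# second == {"write": True, "read": True}
```

    Leaving an undefined result as `None` lets the caller apply the weak default described above, where w(undefined) falls back to w(n) unless a more specific rule exists.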

    Defaults.

    1. If you omit the permission link relation altogether, it means:
      • Document is accessible for “read” to anybody
      • Document is accessible for “write” only to the creator and distributors.
    2. To indicate that a document is viewable only by the creator and distributors, you need to create a group that contains only the creator and specify it as a read whitelist:

       { "href"      : "https://api.pmp.io/docs/3709eda6-0c57-4f67-ab8f-efddb641297d"
       , "operation" : "read"
       } 
      

      where 3709eda6-0c57-4f67-ab8f-efddb641297d is the guid of the document that defines the creator. Creator and distributors always have read and write permissions on a document, so this just serves to prevent others from having access.
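
    The defaults can be sketched as two small helpers. Both names (`split_permission_links`, `default_access`) are hypothetical. The sketch assumes that a link without a “blacklist” field is a whitelist, which is how the read whitelist example above is written, and that every permission link carries an “operation” of either “read” or “write”.

```python
def split_permission_links(permission_links):
    """Group permission link hrefs by operation and whitelist/blacklist."""
    groups = {
        "read": {"whitelist": [], "blacklist": []},
        "write": {"whitelist": [], "blacklist": []},
    }
    for link in permission_links:
        # Assumption: an omitted "blacklist" field means "whitelist".
        kind = "blacklist" if link.get("blacklist", False) else "whitelist"
        groups[link["operation"]][kind].append(link["href"])
    return groups


def default_access(document, is_creator_or_distributor):
    """Access when no permission link is present.

    Returns None when the document carries explicit permission links,
    since the plain defaults no longer tell the whole story.
    """
    if document.get("links", {}).get("permission"):
        return None
    return {"read": True, "write": is_creator_or_distributor}


# Usage: the creator-only read whitelist from the example above.
creator_group = "https://api.pmp.io/docs/3709eda6-0c57-4f67-ab8f-efddb641297d"
groups = split_permission_links([{"href": creator_group, "operation": "read"}])
```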

    A Blacklist Without a Whitelist.

    This is an important use-case worth discussing separately.

    While technically valid, it rarely makes any sense to define a blacklist without defining a whitelist group regardless of an operation type.

    For “write” operations, not having a “whitelist” defined means that the default write behavior applies, where only the creator and distributors are allowed to modify content. Creator and distributors cannot be denied access, so why define an extraneous “blacklist”?

    For “read” operations, not having an explicit “whitelist” means that the default read behavior applies, where any API user with an API Key (or alternative valid authentication token) can “read” a document. By defining a blacklist you can restrict specific users/organizations, but since they can probably easily register as another user, are you really securing anything?

    Defining a “blacklist” permission relation without a “whitelist” permission relation also defined on a document is usually a “smell” of potential misconfiguration. It is not invalid, since it is not technically wrong, but publishers should be careful and treat such definitions as indications of a possible error.

    Acknowledgements

    Collection.Doc+JSON is based on the Collection+JSON hypermedia type and heavily leverages existing standards, such as URI Template [RFC6570], the Home Document Specification, and IANA-registered Link Relation Types, wherever possible.

    It was designed by a team at NPR that included:

    with valuable input and advice from Mike Amundsen, Ioseb Dzmanashvili and Andrew Kuklewicz, while working on various NPR APIs and most recently: building the Public Media Platform.