Schemas for the Data Object Service (DOS) API

The Global Alliance for Genomics and Health is an international coalition formed to enable the sharing of genomic and clinical data. Collaboration within the consortium takes place primarily via GitHub and public meetings.

Cloud Workstream

The Data Working Group concentrates on data representation, storage, and analysis, including working with platform development partners and industry leaders to develop standards that will facilitate interoperability. The Cloud Workstream is an informal, multi-vendor working group focused on standards for exchanging Docker-based tools and CWL/WDL workflows, execution of Docker-based tools and workflows on clouds, and abstract access to cloud object stores.

What is DOS?

This proposal for a DOS release is based on the schema work of Brian W. and others from OHSU, along with work by UCSC. It is also informed by existing object storage systems.

The goal of DOS is to create a generic API on top of these and other projects, so workflow systems can access data in the same way regardless of project.

Key features

Data object management

This section of the API focuses on how to read and write data objects in cloud environments and how to join them together as data bundles. A data bundle is simply a flat collection of one or more files. This section of the API lets clients:

  • create/update/delete a file
  • create/update/delete a data bundle
  • register UUIDs with these entities (and optionally track versions of each)
  • generate signed URLs and/or cloud-specific object storage paths and temporary credentials
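
The operations listed above can be illustrated with a small in-memory model. This is only a sketch under stated assumptions, not the DOS schema itself: the class names (`DataObject`, `DataBundle`, `InMemoryStore`), the field names, and the rule that every update increments the version are all hypothetical. Signed URLs and temporary credentials are left out.

```python
import uuid
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class DataObject:
    # Hypothetical fields; the real DOS schema may name these differently.
    id: str
    version: int
    urls: List[str]


@dataclass
class DataBundle:
    id: str
    version: int
    object_ids: List[str]  # flat collection: object ids only, no nesting


class InMemoryStore:
    """Toy stand-in for a DOS-like service, kept entirely in memory."""

    def __init__(self) -> None:
        self.objects: Dict[str, DataObject] = {}
        self.bundles: Dict[str, DataBundle] = {}

    def create_object(self, urls: List[str]) -> DataObject:
        # Register a fresh UUID for the new object, starting at version 1.
        obj = DataObject(id=str(uuid.uuid4()), version=1, urls=list(urls))
        self.objects[obj.id] = obj
        return obj

    def update_object(self, object_id: str, urls: List[str]) -> DataObject:
        # Keep the same UUID and bump the version (an assumed policy).
        old = self.objects[object_id]
        new = DataObject(id=old.id, version=old.version + 1, urls=list(urls))
        self.objects[object_id] = new
        return new

    def delete_object(self, object_id: str) -> None:
        del self.objects[object_id]

    def create_bundle(self, object_ids: List[str]) -> DataBundle:
        # A bundle may only reference objects that are already registered.
        missing = [oid for oid in object_ids if oid not in self.objects]
        if missing:
            raise KeyError(f"unknown data objects: {missing}")
        bundle = DataBundle(
            id=str(uuid.uuid4()), version=1, object_ids=list(object_ids)
        )
        self.bundles[bundle.id] = bundle
        return bundle


store = InMemoryStore()
reads = store.create_object(["s3://example-bucket/reads.bam"])
index = store.create_object(["s3://example-bucket/reads.bam.bai"])
bundle = store.create_bundle([reads.id, index.id])
reads = store.update_object(reads.id, ["gs://example-bucket/reads.bam"])
print(bundle.id, reads.version)
```

The bundle stores object ids rather than copies of the objects, which keeps it a flat collection and lets an object be updated without rewriting the bundles that reference it.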

Data object queries

Beyond creating, modifying, and deleting files, a key feature of this API is the ability to find data objects across cloud environments and DOS implementations. Users can query by data bundle or file UUID, and the response describes where those data objects are available. The response will typically be used to find the same file or data bundle located across multiple cloud environments.
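
As a rough illustration of such a query, the sketch below looks up one UUID across several catalogs and collects every location that knows about it. The `find_locations` helper, the catalog layout, the implementation names, and the example UUID and URLs are all assumptions made for illustration. A real client would call each DOS endpoint rather than read local dictionaries.

```python
from typing import Dict, List

# Hypothetical catalog: maps a data object UUID to the URLs where it is stored.
Catalog = Dict[str, List[str]]


def find_locations(
    object_id: str, catalogs: Dict[str, Catalog]
) -> Dict[str, List[str]]:
    """Return {implementation name: urls} for every catalog that knows object_id."""
    return {
        name: list(catalog[object_id])
        for name, catalog in catalogs.items()
        if object_id in catalog
    }


# Made-up identifier and locations, one catalog per DOS implementation.
OBJECT_ID = "11111111-2222-3333-4444-555555555555"
catalogs = {
    "cloud-a": {OBJECT_ID: ["s3://example-bucket/reads.bam"]},
    "cloud-b": {OBJECT_ID: ["gs://example-bucket/reads.bam"]},
    "cloud-c": {},  # this implementation does not hold the object
}

locations = find_locations(OBJECT_ID, catalogs)
for name, urls in locations.items():
    print(name, urls)
```

A workflow system could then pick the location closest to where it is running, which is the cross-cloud use case described above.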

Implementations

There are currently a few experimental implementations that use some version of these schemas.

  • DOS Connect observes cloud and local storage systems and broadcasts their changes to a service that presents DOS endpoints.
  • DOS Downloader is a mechanism for downloading Data Objects from DOS URLs.
  • dos-gdc-lambda presents data from the GDC public REST API using the Data Object Service.
  • dos-signpost-lambda presents data from a signpost instance using the Data Object Service.