Data Model

Each sequencing run produces log files, instrument health data, run metrics, base call information (*.bcl files), and other data. BaseSpace Sequence Hub demultiplexes base call information to create the samples used in secondary analysis.

Samples are analyzed automatically by the Illumina workflow apps specified in the run sample sheet or biosample workflow file, or manually by launching custom BaseSpace Sequence Hub apps. BaseSpace Sequence Hub apps are software and routines that interact with BaseSpace Sequence Hub data through the API. Every app that accesses BaseSpace Sequence Hub data requires user authentication and in-flight data encryption.

The new BaseSpace Sequence Hub data model begins with a biosample. A biosample is prepared into one or more libraries, and sequencing those libraries produces data sets of FASTQ files.

In the new model, projects do not contain biosamples. Instead, they contain the FASTQ data sets associated with the biosample.

Because BaseSpace Sequence Hub tracks data for the biosample, you can easily aggregate data from a biosample that has been sequenced as part of multiple libraries or pools.
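The relationships described above can be illustrated with a minimal Python sketch. It is not the BaseSpace Sequence Hub API. The class names (Dataset, Project), the function fastq_for_biosample, and all sample, library, and file names are hypothetical and chosen only for illustration. The sketch assumes that each FASTQ data set records the biosample and library it came from, which is what makes aggregation across libraries and projects possible.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Dataset:
    """A FASTQ data set produced by sequencing one library (hypothetical type)."""
    biosample_name: str
    library_name: str
    fastq_files: List[str]


@dataclass
class Project:
    """In the new model, a project holds FASTQ data sets, not biosamples."""
    name: str
    datasets: List[Dataset] = field(default_factory=list)


def fastq_for_biosample(projects, biosample_name):
    """Collect every FASTQ file for one biosample across all libraries and projects."""
    files = []
    for project in projects:
        for dataset in project.datasets:
            if dataset.biosample_name == biosample_name:
                files.extend(dataset.fastq_files)
    return sorted(files)


# Illustrative data: biosample NA12878 was sequenced as part of two libraries.
projects = [
    Project("ProjectA", [
        Dataset("NA12878", "Lib1", ["NA12878_L1_R1.fastq.gz", "NA12878_L1_R2.fastq.gz"]),
    ]),
    Project("ProjectB", [
        Dataset("NA12878", "Lib2", ["NA12878_L2_R1.fastq.gz"]),
        Dataset("NA12891", "Lib3", ["NA12891_L3_R1.fastq.gz"]),
    ]),
]

print(fastq_for_biosample(projects, "NA12878"))
```

Because the lookup is keyed on the biosample rather than on a project, files from both libraries are returned together, while data sets belonging to other biosamples are left out.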

Runs, app sessions, and projects have not changed. Run files continue to be stored in the run itself. App sessions continue to be launched by apps but use biosample resources as inputs. Projects continue to store the output files of apps.
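As a rough illustration of the app session flow, the following Python sketch models a session that takes biosamples as inputs and stores its output files in a project. The names AppSession, Project, complete, and ExampleAligner are hypothetical, as are the file names. None of them come from the BaseSpace Sequence Hub API.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Project:
    """A project stores the output files of apps (hypothetical type)."""
    name: str
    output_files: List[str] = field(default_factory=list)


@dataclass
class AppSession:
    """One launch of an app, with biosamples as inputs (hypothetical type)."""
    app_name: str
    input_biosamples: List[str]
    project: Project

    def complete(self, produced_files):
        """Record the files the app produced in the session's project."""
        self.project.output_files.extend(produced_files)


project = Project("Resequencing")
session = AppSession("ExampleAligner", ["NA12878"], project)
session.complete(["NA12878.bam", "NA12878.vcf"])

print(project.output_files)
```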

BaseSpace Sequence Hub New Data Model

Original Data Model

Files produced by an app session are stored in an analysis. An analysis is created each time an app is launched and stores the app results. For example, when a resequencing app executes alignment and variant calling, an analysis is created that contains the app results for each sample. App results typically contain BAM and VCF files, but they can contain other file types. App results can also be used as app inputs.

A project is a container that stores samples and analyses.

BaseSpace Sequence Hub Data Model
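For comparison, the original model described above can be sketched the same way. This is a minimal illustration under stated assumptions, not Illumina's implementation. The names OriginalProject, Analysis, launch_app, and ExampleResequencing are hypothetical, and the BAM and VCF file names are placeholders for typical app results.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Analysis:
    """Created each time an app is launched; stores the app results."""
    app_name: str
    # Maps a sample name to the result files for that sample.
    app_results: Dict[str, List[str]] = field(default_factory=dict)


@dataclass
class OriginalProject:
    """In the original model, a project contains samples and analyses."""
    name: str
    samples: List[str] = field(default_factory=list)
    analyses: List[Analysis] = field(default_factory=list)


def launch_app(project, app_name):
    """Create a new analysis with one app result per sample in the project."""
    analysis = Analysis(app_name)
    for sample in project.samples:
        analysis.app_results[sample] = [f"{sample}.bam", f"{sample}.vcf"]
    project.analyses.append(analysis)
    return analysis


project = OriginalProject("Demo", samples=["SampleA", "SampleB"])
first = launch_app(project, "ExampleResequencing")
second = launch_app(project, "ExampleResequencing")

print(len(project.analyses))
print(first.app_results["SampleA"])
```

Each launch adds a separate analysis to the project, so running the same app twice yields two analyses, each holding its own per-sample app results.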