Data Model

Each sequencing run produces log files, instrument health data, run metrics, base call information (*.bcl files), and other data. BaseSpace Sequence Hub demultiplexes base call information to create the samples used in secondary analysis.

Samples are analyzed automatically by the Illumina apps specified in the run sample sheet or biosample workflow file, or manually by launching custom BaseSpace Sequence Hub apps. BaseSpace Sequence Hub apps are software and routines that interact with BaseSpace Sequence Hub data through the API. Every app that accesses BaseSpace Sequence Hub data requires user authentication and in-flight data encryption.

The BaseSpace Sequence Hub data model begins with a biosample. Each biosample is prepared into one or more libraries, which are sequenced to produce data sets of FASTQ files. Projects contain the FASTQ data sets associated with biosamples but do not contain the biosamples themselves.

Because BaseSpace Sequence Hub tracks data for the biosample, you can easily aggregate data from a biosample that has been sequenced as part of multiple libraries or pools.
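
The relationships described above can be sketched as a small in-memory model. This is an illustration only, not the BaseSpace Sequence Hub API: the class names (Biosample, Library, FastqDataset), their fields, the helper aggregate_fastq_files, and all sample and file names are hypothetical and chosen for this example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class FastqDataset:
    """A data set of FASTQ files produced from one library (hypothetical type)."""
    name: str
    project: str  # the project that contains this data set
    files: List[str] = field(default_factory=list)


@dataclass
class Library:
    """A library prepared from a biosample (hypothetical type)."""
    name: str
    datasets: List[FastqDataset] = field(default_factory=list)


@dataclass
class Biosample:
    """Starting point of the data model; note it holds no project reference."""
    name: str
    libraries: List[Library] = field(default_factory=list)


def aggregate_fastq_files(biosample: Biosample) -> List[str]:
    """Collect every FASTQ file for a biosample across all of its libraries."""
    return [
        path
        for library in biosample.libraries
        for dataset in library.datasets
        for path in dataset.files
    ]


# One biosample that was sequenced as part of two libraries.
sample = Biosample(
    name="sample-1",
    libraries=[
        Library("lib-A", [FastqDataset("ds-1", "Project1", ["a_R1.fastq.gz", "a_R2.fastq.gz"])]),
        Library("lib-B", [FastqDataset("ds-2", "Project1", ["b_R1.fastq.gz"])]),
    ],
)
all_files = aggregate_fastq_files(sample)
```

Because each data set records its project while the biosample does not, the sketch mirrors the statement that projects contain FASTQ data sets but not biosamples.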

Runs, app sessions, and projects have not changed. Run files continue to be stored in the run itself. App sessions continue to be launched by apps but use biosample resources as inputs. Projects continue to store the output files of apps.
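
As a rough sketch of how these three resources relate, the following models a run that keeps its own files, an app session that takes biosamples as inputs, and a project that receives the app output. The types Run, Project, and AppSession, the method write_output, and every name and file in the example are hypothetical and do not correspond to actual BaseSpace Sequence Hub API objects.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Run:
    """A sequencing run; run files stay with the run (hypothetical type)."""
    name: str
    files: List[str] = field(default_factory=list)


@dataclass
class Project:
    """A project stores the output files of apps (hypothetical type)."""
    name: str
    output_files: List[str] = field(default_factory=list)


@dataclass
class AppSession:
    """One launch of an app, with biosample names as inputs (hypothetical type)."""
    app_name: str
    input_biosamples: List[str]
    project: Project

    def write_output(self, filename: str) -> None:
        # Outputs are stored in the project, not in the run or the biosample.
        self.project.output_files.append(filename)


run = Run("run-001", files=["run-metrics.bin", "run.log"])
project = Project("Project1")
session = AppSession("ExampleApp", input_biosamples=["sample-1"], project=project)
session.write_output("sample-1.result.txt")
```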

BaseSpace Sequence Hub Data Model
