Automatic Data Aggregation

BaseSpace Sequence Hub automatically aggregates QC Passed FASTQ data when launching apps that use biosamples as inputs. You can control which data are used in analyses by setting the QC status of a resource as QC Passed or QC Failed.


You do not need to merge biosamples to combine their data for analysis.

When BaseSpace Sequence Hub collects data for an app launch, it automatically excludes QC Failed lanes, libraries, and pools, along with any downstream data they produced. For example, if you fail a flow cell lane, all FASTQ datasets produced from that lane are excluded when aggregating data for the biosamples and libraries loaded on that lane. If you fail a single FASTQ dataset, only that FASTQ dataset is excluded.
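The exclusion rule above can be pictured as a simple filter. The following is a minimal sketch in plain Python, not BaseSpace Sequence Hub code: the `FastqDataset` record, its field names, and the `aggregate` function are hypothetical and exist only to illustrate how a failed upstream resource removes the datasets derived from it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FastqDataset:
    # Hypothetical record; these are illustrative fields, not BaseSpace API fields.
    name: str
    biosample: str
    lane: str
    library: str
    pool: str
    qc_failed: bool = False


def aggregate(datasets, biosample, failed_lanes=(), failed_libraries=(), failed_pools=()):
    """Return the datasets for one biosample that survive QC exclusion."""
    return [
        d for d in datasets
        if d.biosample == biosample
        and not d.qc_failed                    # the dataset itself was failed
        and d.lane not in failed_lanes         # produced from a failed lane
        and d.library not in failed_libraries  # produced from a failed library
        and d.pool not in failed_pools         # produced from a failed pool
    ]


datasets = [
    FastqDataset("ds1", "NA12878", lane="L1", library="lib1", pool="pool1"),
    FastqDataset("ds2", "NA12878", lane="L2", library="lib1", pool="pool1"),
    FastqDataset("ds3", "NA12878", lane="L3", library="lib2", pool="pool1", qc_failed=True),
]

# Failing lane L2 drops ds2; ds3 is dropped because it was failed directly.
kept = aggregate(datasets, "NA12878", failed_lanes={"L2"})
print([d.name for d in kept])
```

In this sketch, failing a lane removes every dataset that came from it, while failing one dataset removes only that dataset.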

If a FASTQ dataset has been copied, BaseSpace Sequence Hub uses the original FASTQ dataset, or the most recent copy if the original is not available. If you wish to use a different copy, mark the other copies as QC Failed before starting the analysis.
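The copy-selection rule can be sketched as follows. This is an illustration in plain Python under stated assumptions, not the actual implementation: the `DatasetCopy` record and `pick_copy` function are hypothetical, and the sketch assumes that a copy marked QC Failed is simply removed from consideration, so a QC Failed original is treated the same as an unavailable one.

```python
from dataclasses import dataclass, replace
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class DatasetCopy:
    # Hypothetical record used only for this illustration.
    name: str
    created: datetime
    is_original: bool = False
    qc_failed: bool = False


def pick_copy(copies) -> Optional[DatasetCopy]:
    """Prefer the original; otherwise fall back to the most recent usable copy."""
    usable = [c for c in copies if not c.qc_failed]  # assumption: failed copies are skipped
    for c in usable:
        if c.is_original:
            return c
    return max(usable, key=lambda c: c.created, default=None)


copies = [
    DatasetCopy("original", datetime(2023, 1, 5), is_original=True),
    DatasetCopy("copy_a", datetime(2023, 3, 1)),
    DatasetCopy("copy_b", datetime(2023, 6, 1)),
]

default_choice = pick_copy(copies)           # the original is available
without_original = pick_copy(copies[1:])     # original missing: newest copy wins

# To force copy_a, mark every other copy as QC Failed.
forced = pick_copy([
    replace(copies[0], qc_failed=True),
    copies[1],
    replace(copies[2], qc_failed=True),
])
print(default_choice.name, without_original.name, forced.name)
```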


The following resources can be excluded from data aggregation:

Lanes—Fail lanes using Automatic Lane QC, BaseSpace Sequence Hub API, or manually in BaseSpace Sequence Hub.
Libraries—Fail libraries using the BaseSpace Sequence Hub API.
Pools—Fail pools using the BaseSpace Sequence Hub API.
FASTQ Datasets—Fail FASTQ data sets using the BaseSpace Sequence Hub API, or manually in BaseSpace Sequence Hub.


BaseSpace Sequence Hub aggregates FASTQ data sets with different read lengths from the same biosample.


When using apps that have not been updated to use biosamples or data sets as inputs, BaseSpace Sequence Hub automatically converts the FASTQ data sets into samples before launching the app.

When you schedule an analysis, you can specify the Prep Kit with the input biosample. The analysis then launches using only FASTQ datasets from libraries prepared with the specified Prep Kit. In the following example, Prep Kit B is selected as input and the FASTQ files from Prep Kit A are excluded.

Prep Kit Data Included in App Launch
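The Prep Kit selection described above amounts to one more filter on the datasets of a biosample. The sketch below is plain Python with hypothetical names (`LibraryDataset`, `select_for_launch`); it also assumes that when no Prep Kit is specified, datasets from all Prep Kits are used, which the text implies but does not state.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LibraryDataset:
    # Hypothetical record used only for this illustration.
    name: str
    biosample: str
    prep_kit: str


def select_for_launch(datasets, biosample, prep_kit=None):
    """Keep the biosample's datasets, optionally restricted to one Prep Kit."""
    return [
        d for d in datasets
        if d.biosample == biosample
        and (prep_kit is None or d.prep_kit == prep_kit)  # assumption: None means no restriction
    ]


datasets = [
    LibraryDataset("ds_a1", "Biosample1", "Prep Kit A"),
    LibraryDataset("ds_b1", "Biosample1", "Prep Kit B"),
    LibraryDataset("ds_b2", "Biosample1", "Prep Kit B"),
]

selected = select_for_launch(datasets, "Biosample1", prep_kit="Prep Kit B")
print([d.name for d in selected])
```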

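In the first figure below (QC Passed Data Aggregated For App Launch), all resources are QC Passed and the Whole Genome Sequence App launches with all FASTQ datasets. In the second figure (QC Failed Lane Data Excluded From App Launch), Lane 2 was set to QC Failed and its FASTQ datasets were excluded from the Whole Genome Sequence App analysis.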

QC Passed Data Aggregated For App Launch

QC Failed Lane Data Excluded From App Launch

Correct Unintended Aggregations

All data associated with a biosample name are automatically aggregated to the biosample and will be used in automatic app launches. If you did not use unique biosample names or Sample IDs, your data may have unintended aggregations.
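One way to spot possible unintended aggregations before they happen is to look for biosample names that appear in more than one run. The sketch below is a hypothetical pre-check in plain Python, not a BaseSpace Sequence Hub feature: the `(run_id, biosample_name)` input format and the function name are assumptions. A shared name is not necessarily an error, since sequencing the same biosample across runs is a legitimate use of aggregation, so the result is only a list to review.

```python
from collections import defaultdict


def names_shared_across_runs(rows):
    """rows: iterable of (run_id, biosample_name) pairs, e.g. read from sample sheets.

    Returns a dict mapping each biosample name used in more than one run
    to the sorted list of runs that use it.
    """
    runs_by_name = defaultdict(set)
    for run_id, name in rows:
        runs_by_name[name].add(run_id)
    return {
        name: sorted(runs)
        for name, runs in runs_by_name.items()
        if len(runs) > 1
    }


rows = [
    ("run1", "Sample1"),
    ("run1", "Sample2"),
    ("run2", "Sample1"),  # generic name reused in a second run
    ("run3", "Tumor_A"),
]

shared = names_shared_across_runs(rows)
print(shared)
```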

If you are the owner of the biosample, you can requeue the run with a corrected biosample name.

1. Edit the biosample name.
   If you are using a sample sheet, use Fix Sample Sheet.
   If you are using Prep Tab, change the sample name in the Prep Tab.
2. Requeue the run.

The previous run data are marked as failed and can be deleted to reduce storage costs.