Input VCF Requirements

If you use a variant call format (VCF) file from a whole-genome sequencing analysis performed outside of BaseSpace, the file must conform to VCF version 4.1 and meet the following requirements.

The VCF file header and columns must have the following characteristics:
Columns appear in the following order and are separated from each other by tabs:

#CHROM POS ID REF ALT QUAL FILTER INFO

If these requirements are not met, the VCF Upload app does not upload the file. See also support.basespace.illumina.com.
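As an illustration of the header requirements, the following Python sketch checks that a file declares VCF version 4.1 on its first line and that the column header line lists the required columns in order, separated by tabs. The function name `header_is_valid` is hypothetical and is not part of BaseSpace or the VCF Upload app. Accepting extra columns after INFO (such as FORMAT and sample columns) is an assumption made here because the GT field is stored in those columns.

```python
# Required leading columns of the '#CHROM' header line, in order.
REQUIRED_COLUMNS = ["#CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO"]


def header_is_valid(lines):
    """Hypothetical check: return True if the header declares VCF 4.1 and the
    '#CHROM' line starts with the required tab-separated columns in order."""
    lines = iter(lines)
    first = next(lines, "")
    if first.rstrip("\n") != "##fileformat=VCFv4.1":
        return False
    for line in lines:
        if line.startswith("##"):
            continue  # skip remaining meta-information lines
        fields = line.rstrip("\n").split("\t")
        # Assumption: extra columns after INFO (FORMAT, samples) are allowed.
        return fields[: len(REQUIRED_COLUMNS)] == REQUIRED_COLUMNS
    return False  # no column header line found
```

A header line whose columns are separated by spaces instead of tabs splits into a single field and is reported as invalid.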

VCF file entries require the following characteristics:
The chromosome field (CHROM) begins with the string chr followed by the chromosome name: 1–22, X, Y, or M.
Reference bases in the REF column match the UCSC hg19 reference; see genome.ucsc.edu/cgi-bin/hgTables.
Multi-sample VCF files are not allowed for analysis. If you have a multi-sample VCF file, filter it to include only the single sample of interest with the VCFtools package (vcftools.sourceforge.net) or another tool.
The file contains 16,000 or more passing variants; otherwise, there are insufficient variants to perform analysis.
If a single chromosome contains fewer than 1,000 heterozygous SNVs, the TruSeq Phasing Analysis app exits.

If these requirements are not met, the TruSeq Phasing Analysis app does not proceed.
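Before uploading, you can estimate the sample count and the number of heterozygous SNVs per chromosome with a sketch like the following. It assumes an uncompressed VCF and treats a record as a heterozygous SNV when CHROM is chr1–chr22, chrX, chrY, or chrM, FILTER is PASS, REF and ALT are single bases, and the GT field holds two different, non-missing alleles. The helper names are hypothetical, and this sketch does not reproduce exactly how the app applies the 1,000-SNV threshold; compare the reported counts against it yourself.

```python
import re
from collections import Counter

# Hypothetical check: CHROM must be chr1-chr22, chrX, chrY, or chrM.
CHROM_PATTERN = re.compile(r"chr([1-9]|1[0-9]|2[0-2]|X|Y|M)")
BASES = {"A", "C", "G", "T"}
FIXED_COLUMNS = 9  # #CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO, FORMAT


def sample_count(header_line):
    """Return the number of sample columns in the '#CHROM' header line."""
    fields = header_line.rstrip("\n").split("\t")
    return max(len(fields) - FIXED_COLUMNS, 0)


def het_snv_counts(records):
    """Count passing heterozygous SNVs per chromosome in VCF data lines."""
    counts = Counter()
    for line in records:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < FIXED_COLUMNS + 1:
            continue  # no FORMAT/sample columns, so no genotype
        chrom, ref, alt, filt = fields[0], fields[3], fields[4], fields[6]
        if not CHROM_PATTERN.fullmatch(chrom) or filt != "PASS":
            continue
        if ref not in BASES or alt not in BASES:
            continue  # not a single-base substitution (or multi-allelic)
        # Map FORMAT keys to the first sample's values, then read GT.
        genotype = dict(zip(fields[8].split(":"), fields[9].split(":")))
        alleles = re.split(r"[/|]", genotype.get("GT", ""))
        if len(alleles) == 2 and "." not in alleles and alleles[0] != alleles[1]:
            counts[chrom] += 1
    return counts
```

A `sample_count` result greater than 1 means the file must first be reduced to a single sample, for example with VCFtools.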

Variants in the VCF file require the following characteristics. The TruSeq Phasing Analysis app skips any variant that does not meet them and proceeds with all variants that do.
The CHROM column contains only chr1–chr22, chrX, chrY, or chrM. For example, chr1 is valid, whereas chr6_cox_hap2 is not.
Alleles in the REF and ALT columns are 50 bases long or shorter.
Alleles in the REF and ALT columns contain only the characters A, C, G, and T.
Only biallelic variants are allowed. Haploid calls and multi-allelic entries are discarded.
Variants have a PASS label in the FILTER column.
Variants have a genotype in the GT field.
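The per-variant rules above can be expressed as a single predicate. This is a hypothetical sketch, not the app's implementation. It assumes uppercase bases and at least one sample column, and it also rejects missing genotypes such as ./., which is an interpretation of the GT rule rather than documented app behavior.

```python
import re

VALID_CHROMS = {f"chr{i}" for i in range(1, 23)} | {"chrX", "chrY", "chrM"}
BASES = set("ACGT")
MAX_ALLELE_LENGTH = 50


def is_phasable(line):
    """Hypothetical predicate: True if a VCF data line meets every per-variant rule."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 10:
        return False  # no FORMAT/sample columns, so no GT field
    chrom, ref, alt, filt = fields[0], fields[3], fields[4], fields[6]
    fmt, sample = fields[8], fields[9]
    if chrom not in VALID_CHROMS:
        return False
    if "," in alt:
        return False  # multi-allelic entry
    for allele in (ref, alt):
        # Each allele must be 1-50 bases drawn only from A, C, G, T.
        if not allele or len(allele) > MAX_ALLELE_LENGTH or not set(allele) <= BASES:
            return False
    if filt != "PASS":
        return False
    gt = dict(zip(fmt.split(":"), sample.split(":"))).get("GT")
    if gt is None:
        return False  # no GT field
    alleles = re.split(r"[/|]", gt)
    # Haploid calls have one allele; missing calls contain '.' (assumption).
    return len(alleles) == 2 and "." not in alleles
```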

When your VCF file has the proper format, you can upload it to BaseSpace. For more information, see the Import Analysis section in the BaseSpace User Guide.

Note

If the VCF file passes the variant filtering criteria but has fewer than 1 million SNPs and indels after filtering, the TruSeq Phasing Analysis app issues a warning. Fewer than 1 million passing variants is rare for a human whole-genome sequencing run with a depth of coverage of at least 30×. A low variant count indicates a problem with the sample, sequencing, or analysis.
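The two variant-count thresholds in this section combine into a simple classification. The sketch below assumes that both thresholds are compared against the same count of passing variants; the function name and its return labels are hypothetical.

```python
MIN_PASSING_VARIANTS = 16_000           # below this, analysis cannot proceed
EXPECTED_PASSING_VARIANTS = 1_000_000   # below this, the app issues a warning


def variant_count_status(passing_variants):
    """Hypothetical classification of a passing-variant count."""
    if passing_variants < MIN_PASSING_VARIANTS:
        return "insufficient"
    if passing_variants < EXPECTED_PASSING_VARIANTS:
        return "warning"
    return "ok"
```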


© 2015 Illumina, Inc. All rights reserved.

15055853 Rev. B