BAM File Format

A BAM file (*.bam) is the compressed binary version of a SAM file, a format used to represent aligned sequences. The SAM and BAM formats are described in detail at samtools.github.io/hts-specs/SAMv1.pdf.

BAM files use the file naming format SampleName.SampleGroup_S#.bam, where # is the sample number, determined by the order in which samples are listed for the run.
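As a rough illustration of the naming format, the following sketch splits a BAM file name into its parts. The function name parse_bam_name and the BamName type are hypothetical, and the pattern assumes that the sample name itself contains no period and that # is written as plain digits.

```python
import re
from typing import NamedTuple, Optional


class BamName(NamedTuple):
    sample_name: str
    sample_group: str
    sample_number: int


# Assumed pattern for SampleName.SampleGroup_S#.bam, where # is one or more digits.
# Assumption: the sample name contains no period; the group may contain any characters.
_BAM_NAME = re.compile(r"^(?P<name>[^.]+)\.(?P<group>.+)_S(?P<num>\d+)\.bam$")


def parse_bam_name(filename: str) -> Optional[BamName]:
    """Return the parts of a BAM file name, or None if it does not match."""
    match = _BAM_NAME.match(filename)
    if match is None:
        return None
    return BamName(match["name"], match["group"], int(match["num"]))
```

For example, parse_bam_name("Sample1.GroupA_S3.bam") would give the sample name Sample1, the sample group GroupA, and the sample number 3.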

BAM files contain a header section and an alignment section:

Header—Contains information about the entire file, such as sample name, sample length, and alignment method. Alignments in the alignments section are associated with specific information in the header section.
Alignments—Contains read name, read sequence, read quality, alignment information, and custom tags. The alignment information includes the chromosome, start coordinate, alignment quality, and match descriptor string.
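To make the header section more concrete, the sketch below reads it using only the Python standard library. It relies on two properties from the SAM/BAM specification: the BGZF compression used by BAM is gzip-compatible, and the decompressed stream starts with the magic bytes BAM\1, followed by the header text and the list of reference sequences. The function name read_bam_header is hypothetical, and a dedicated library is usually a better choice for real work.

```python
import gzip
import struct
from typing import BinaryIO, List, Tuple


def read_bam_header(stream: BinaryIO) -> Tuple[str, List[Tuple[str, int]]]:
    """Return (header text, [(reference name, reference length), ...])."""
    # BGZF blocks are concatenated gzip members, which GzipFile reads transparently.
    with gzip.GzipFile(fileobj=stream, mode="rb") as bam:
        if bam.read(4) != b"BAM\x01":
            raise ValueError("not a BAM file: bad magic bytes")

        # Length of the plain-text SAM header, then the header text itself.
        (l_text,) = struct.unpack("<I", bam.read(4))
        text = bam.read(l_text).rstrip(b"\x00").decode("utf-8")

        # Number of reference sequences, then one (name, length) record for each.
        (n_ref,) = struct.unpack("<I", bam.read(4))
        references = []
        for _ in range(n_ref):
            (l_name,) = struct.unpack("<I", bam.read(4))
            name = bam.read(l_name).rstrip(b"\x00").decode("ascii")
            (l_ref,) = struct.unpack("<I", bam.read(4))
            references.append((name, l_ref))
    return text, references
```

A file opened with open(path, "rb") can be passed directly as the stream. The alignment records that follow the header are not parsed here.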

The alignments section includes the following tags for each read or read pair:

RG—Read group, which identifies the read group, as defined in the header, that the read belongs to.
BC—Barcode tag, which indicates the demultiplexed sample ID associated with the read.
SM—Single-end alignment quality.
NM—Edit distance tag, which records the Levenshtein distance between the read and the reference.
MD—Mismatching positions/bases (BWA only).
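The tags above are easiest to inspect in the text (SAM) form of an alignment record, such as the lines printed by samtools view. In that form, each record has 11 mandatory tab-separated columns followed by optional fields written as TAG:TYPE:VALUE. The following sketch collects those optional fields into a dictionary. The function name parse_optional_tags is hypothetical, and only the integer (i) and float (f) types are converted; all other types are kept as text.

```python
from typing import Dict, Union

TagValue = Union[int, float, str]


def parse_optional_tags(sam_line: str) -> Dict[str, TagValue]:
    """Return the optional TAG:TYPE:VALUE fields of one SAM text record."""
    fields = sam_line.rstrip("\n").split("\t")
    tags: Dict[str, TagValue] = {}
    # The first 11 columns are mandatory; everything after them is a tag.
    for field in fields[11:]:
        tag, type_code, value = field.split(":", 2)
        if type_code == "i":
            tags[tag] = int(value)
        elif type_code == "f":
            tags[tag] = float(value)
        else:
            # Types A, Z, H, and B are left as text in this sketch.
            tags[tag] = value
    return tags


# Made-up example record, for illustration only.
example = (
    "read1\t0\tchr1\t100\t60\t10M\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII"
    "\tRG:Z:S1\tNM:i:1\tMD:Z:4A5\tSM:i:60"
)
tags = parse_optional_tags(example)
```

For the made-up record above, tags["RG"] is the string S1, while tags["NM"] and tags["SM"] are integers.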

BAM files are suitable for viewing with an external viewer such as IGV or the UCSC Genome Browser.

BAM index files (*.bam.bai) provide an index of the corresponding BAM file.
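Because viewers generally expect the index to sit next to the BAM file, a quick check before loading can save time. The sketch below looks for a file with the same name plus the .bai suffix and confirms that it starts with the BAI\1 magic bytes defined in the SAM/BAM specification. The function name has_bam_index is hypothetical, and the check does not verify that the index is up to date with the BAM file.

```python
from pathlib import Path
from typing import Union


def has_bam_index(bam_path: Union[str, Path]) -> bool:
    """Return True if <bam_path>.bai exists and starts with the BAI magic bytes."""
    index_path = Path(str(bam_path) + ".bai")
    if not index_path.is_file():
        return False
    with index_path.open("rb") as handle:
        return handle.read(4) == b"BAI\x01"
```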

NOTE

BAM files are not available for runs that performed fusion calling.