Alignment sections have 11 mandatory fields, as well as a variable number of optional fields. Headings begin with the symbol, which distinguishes them from the alignment section. The header section must be prior to the alignment section if it is present.
SAM files can be analysed and edited with the software SAMtools.
The binary equivalent of a SAM file is a Binary Alignment Map (BAM) file, which stores the same data in a compressed binary representation. The SAM format consists of a header and an alignment section. The format supports short and long reads (up to 128 Mbp) produced by different sequencing platforms and is used to hold mapped data within the Genome Analysis Toolkit (GATK) and across the Broad Institute, the Wellcome Sanger Institute, and throughout the 1000 Genomes Project. It is widely used for storing data, such as nucleotide sequences, generated by next generation sequencing technologies, and the standard has been broadened to include unmapped sequences.
The name of SAM came from Gabor Marth from University of Utah, who originally had a format under the same name but with a different syntax more similar to a BLAST output. The overall TAB-delimited flavour of the format came from an earlier format inspired by BLAT’s PSL. It was developed when the 1000 Genomes Project wanted to move away from the MAQ mapper format and decided to design a new format. Sequence Alignment Map (SAM) is a text-based format originally for storing biological sequences aligned to a reference sequence developed by Heng Li and Bob Handsaker et al.