GATK (Genome Analysis Toolkit) is a genomic analysis toolkit focused on variant discovery.

The GATK is the industry standard for identifying SNPs and indels in germline DNA and RNA-seq data. Its scope is now expanding to include somatic short variant calling and to tackle copy number variation (CNV) and structural variation (SV). In addition to the variant callers themselves, the GATK includes many utilities for related tasks such as processing and quality control of high-throughput sequencing data, and it bundles the popular Picard toolkit.

These tools were primarily designed to process exomes and whole genomes generated with Illumina sequencing technology, but they can be adapted to handle a variety of other technologies and experimental designs. Although the GATK was originally developed for human genetics, it has since evolved to handle genome data from any organism, with any level of ploidy.

Documentation

GATK home: https://gatk.broadinstitute.org/
GATK source: https://github.com/broadinstitute/gatk

Interactive use

$ module load gatk
$ wget https://resources.qiagenbioinformatics.com/testdata/paeruginosa-reads.zip
$ unzip paeruginosa-reads.zip
$ gatk FastqToSam --FASTQ paeruginosa-reads/SRR396636.sra_1.fastq --OUTPUT SRR396636.sra_1.sam --SAMPLE_NAME Pseudomonas_aeruginosa 
Using GATK jar /mnt/nasapps/production/gatk/4.6.1.0/gatk-package-4.6.1.0-local.jar
Running:
    java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -jar /mnt/nasapps/production/gatk/4.6.1.0/gatk-package-4.6.1.0-local.jar FastqToSam --FASTQ paeruginosa-reads/SRR396636.sra_1.fastq --OUTPUT SRR396636.sra_1.sam --SAMPLE_NAME Pseudomonas_aeruginosa
12:13:05.739 INFO  NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/mnt/nasapps/production/gatk/4.6.1.0/gatk-package-4.6.1.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
[Fri Mar 21 12:13:05 EDT 2025] FastqToSam --FASTQ paeruginosa-reads/SRR396636.sra_1.fastq --OUTPUT SRR396636.sra_1.sam --SAMPLE_NAME Pseudomonas_aeruginosa --USE_SEQUENTIAL_FASTQS false --READ_GROUP_NAME A --SORT_ORDER queryname --MIN_Q 0 --MAX_Q 93 --STRIP_UNPAIRED_MATE_NUMBER false --ALLOW_AND_IGNORE_EMPTY_LINES false --ALLOW_EMPTY_FASTQ false --VERBOSITY INFO --QUIET false --VALIDATION_STRINGENCY STRICT --COMPRESSION_LEVEL 2 --MAX_RECORDS_IN_RAM 500000 --CREATE_INDEX false --CREATE_MD5_FILE false --help false --version false --showHidden false --USE_JDK_DEFLATER false --USE_JDK_INFLATER false
[Fri Mar 21 12:13:05 EDT 2025] Executing as onealdw@fsitgl-head01p.ncifcrf.gov on Linux 4.18.0-553.40.1.el8_10.x86_64 amd64; OpenJDK 64-Bit Server VM 17.0.14+7-LTS; Deflater: Intel; Inflater: Intel; Provider GCS is available; Picard version: Version:4.6.1.0
INFO	2025-03-21 12:13:05	FastqToSam	Auto-detected quality format as: Standard.
INFO	2025-03-21 12:13:09	FastqToSam	Processed     1,000,000 records.  Elapsed time: 00:00:03s.  Time for last 1,000,000:    3s.  Last read position: */*
INFO	2025-03-21 12:13:12	FastqToSam	Processed 1909263 fastq reads
WARNING	2025-03-21 12:13:13	SortingCollection	There is not enough memory per file for buffering. Reading will be unbuffered.
[Fri Mar 21 12:13:17 EDT 2025] picard.sam.FastqToSam done. Elapsed time: 0.20 minutes.
Runtime.totalMemory()=612368384
Tool returned:
0
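
The session above converts a single FASTQ file. As a minimal sketch, the same call can be wrapped in a shell function so that it is easy to repeat for other files. The function name fastq_to_sam, the DRY_RUN variable, and the rule for deriving the output name are illustrative assumptions and not part of GATK; only the flags already shown in the session are used.

```shell
# Hypothetical helper (not part of GATK): wraps the FastqToSam call shown above.
# Usage: fastq_to_sam <reads.fastq> <sample_name>
# Set DRY_RUN=1 to print the command instead of running it.
fastq_to_sam() {
    fastq="$1"
    sample="$2"
    # Derive the output name from the input file name:
    # paeruginosa-reads/SRR396636.sra_1.fastq -> SRR396636.sra_1.sam
    base="${fastq##*/}"
    output="${base%.fastq}.sam"
    # In dry-run mode, prefix the command with echo so it is only printed.
    prefix=""
    if [ "${DRY_RUN:-0}" = "1" ]; then
        prefix="echo"
    fi
    $prefix gatk FastqToSam --FASTQ "$fastq" --OUTPUT "$output" --SAMPLE_NAME "$sample"
}
```

For example, running DRY_RUN=1 fastq_to_sam paeruginosa-reads/SRR396636.sra_1.fastq Pseudomonas_aeruginosa prints the same gatk command line used in the session above without executing it.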

Build instructions for those who are curious

wget https://github.com/broadinstitute/gatk/releases/download/4.6.1.0/gatk-4.6.1.0.zip
unzip gatk-4.6.1.0.zip
mv gatk-4.6.1.0 /mnt/nasapps/production/gatk/4.6.1.0
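
The three commands above hard-code the version. A minimal sketch of the same steps with the version factored out follows. The function names gatk_release_url and install_gatk are hypothetical, and the URL pattern is simply taken from the 4.6.1.0 example; it is assumed, not verified, to hold for other releases.

```shell
# Hypothetical helpers (not part of GATK).

# Print the release archive URL for a version, following the
# pattern of the 4.6.1.0 download URL used above (an assumption
# for any other version).
gatk_release_url() {
    version="$1"
    echo "https://github.com/broadinstitute/gatk/releases/download/${version}/gatk-${version}.zip"
}

# Download, unpack, and move a release into <install_root>/<version>.
# Usage: install_gatk <version> <install_root>
install_gatk() {
    version="$1"
    root="$2"
    # Each step runs only if the previous one succeeded.
    wget "$(gatk_release_url "$version")" &&
        unzip "gatk-${version}.zip" &&
        mv "gatk-${version}" "${root}/${version}"
}
```

With these definitions, install_gatk 4.6.1.0 /mnt/nasapps/production/gatk would run the same three commands listed above.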