GATK (Genome Analysis Toolkit) is a genomic analysis toolkit focused on variant discovery.
The GATK is the industry standard for identifying SNPs and indels in germline DNA and RNAseq data. Its scope is now expanding to include somatic short variant calling, and to tackle copy number (CNV) and structural variation (SV). In addition to the variant callers themselves, the GATK also includes many utilities to perform related tasks such as processing and quality control of high-throughput sequencing data, and bundles the popular Picard toolkit.
These tools were primarily designed to process exomes and whole genomes generated with Illumina sequencing technology, but they can be adapted to handle a variety of other technologies and experimental designs. And although it was originally developed for human genetics, the GATK has since evolved to handle genome data from any organism, with any level of ploidy.
Documentation
Interactive use
$ module load gatk
$ wget https://resources.qiagenbioinformatics.com/testdata/paeruginosa-reads.zip
$ unzip paeruginosa-reads.zip
$ gatk FastqToSam --FASTQ paeruginosa-reads/SRR396636.sra_1.fastq --OUTPUT SRR396636.sra_1.sam --SAMPLE_NAME Pseudomonas_aeruginosa
Using GATK jar /mnt/nasapps/production/gatk/4.6.1.0/gatk-package-4.6.1.0-local.jar
Running:
    java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -jar /mnt/nasapps/production/gatk/4.6.1.0/gatk-package-4.6.1.0-local.jar FastqToSam --FASTQ paeruginosa-reads/SRR396636.sra_1.fastq --OUTPUT SRR396636.sra_1.sam --SAMPLE_NAME Pseudomonas_aeruginosa
12:13:05.739 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/mnt/nasapps/production/gatk/4.6.1.0/gatk-package-4.6.1.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
[Fri Mar 21 12:13:05 EDT 2025] FastqToSam --FASTQ paeruginosa-reads/SRR396636.sra_1.fastq --OUTPUT SRR396636.sra_1.sam --SAMPLE_NAME Pseudomonas_aeruginosa --USE_SEQUENTIAL_FASTQS false --READ_GROUP_NAME A --SORT_ORDER queryname --MIN_Q 0 --MAX_Q 93 --STRIP_UNPAIRED_MATE_NUMBER false --ALLOW_AND_IGNORE_EMPTY_LINES false --ALLOW_EMPTY_FASTQ false --VERBOSITY INFO --QUIET false --VALIDATION_STRINGENCY STRICT --COMPRESSION_LEVEL 2 --MAX_RECORDS_IN_RAM 500000 --CREATE_INDEX false --CREATE_MD5_FILE false --help false --version false --showHidden false --USE_JDK_DEFLATER false --USE_JDK_INFLATER false
[Fri Mar 21 12:13:05 EDT 2025] Executing as onealdw@fsitgl-head01p.ncifcrf.gov on Linux 4.18.0-553.40.1.el8_10.x86_64 amd64; OpenJDK 64-Bit Server VM 17.0.14+7-LTS; Deflater: Intel; Inflater: Intel; Provider GCS is available; Picard version: Version:4.6.1.0
INFO 2025-03-21 12:13:05 FastqToSam Auto-detected quality format as: Standard.
INFO 2025-03-21 12:13:09 FastqToSam Processed 1,000,000 records. Elapsed time: 00:00:03s. Time for last 1,000,000: 3s. Last read position: */*
INFO 2025-03-21 12:13:12 FastqToSam Processed 1909263 fastq reads
WARNING 2025-03-21 12:13:13 SortingCollection There is not enough memory per file for buffering. Reading will be unbuffered.
[Fri Mar 21 12:13:17 EDT 2025] picard.sam.FastqToSam done. Elapsed time: 0.20 minutes.
Runtime.totalMemory()=612368384
Tool returned:
0
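As a quick sanity check on a run like the one above, the number of reads in the input FASTQ can be counted independently and compared with the "Processed ... fastq reads" line in the FastqToSam log. The following is a minimal sketch, not part of GATK: the helper name count_fastq_reads is hypothetical, and it assumes an uncompressed FASTQ with exactly four lines per record (no wrapped sequence lines).

```shell
# Hypothetical helper (not a GATK command): count reads in an uncompressed
# FASTQ file, assuming the common four-lines-per-record layout.
count_fastq_reads() {
    local fastq="$1"
    local lines
    # Fail with a non-zero status if the file cannot be read.
    lines=$(wc -l < "$fastq") || return 1
    # Integer division: each record is header, sequence, '+', qualities.
    echo $(( lines / 4 ))
}
```

For example, `count_fastq_reads paeruginosa-reads/SRR396636.sra_1.fastq` prints a single integer that should agree with the read count FastqToSam reports, provided the file follows the four-line layout.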
Build instructions for those who are curious
wget https://github.com/broadinstitute/gatk/releases/download/4.6.1.0/gatk-4.6.1.0.zip
unzip gatk-4.6.1.0.zip
mv gatk-4.6.1.0 /mnt/nasapps/production/gatk/4.6.1.0
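The three commands above hard-code the version. For installing another release, the same steps could be parameterized roughly as follows. This is a sketch under assumptions: the function names gatk_release_url and install_gatk are hypothetical, the URL pattern is simply generalized from the 4.6.1.0 download link above, and other releases may not follow it.

```shell
# Hypothetical helper: build the release download URL for a GATK version,
# assuming other releases follow the same pattern as the 4.6.1.0 link above.
gatk_release_url() {
    local version="$1"
    echo "https://github.com/broadinstitute/gatk/releases/download/${version}/gatk-${version}.zip"
}

# Hypothetical helper: download, unpack, and move a release into place.
# The default prefix mirrors the install path used on this system.
install_gatk() {
    local version="$1"
    local prefix="${2:-/mnt/nasapps/production/gatk}"
    # Stop at the first failing step so a partial download is never installed.
    wget "$(gatk_release_url "$version")" &&
        unzip "gatk-${version}.zip" &&
        mv "gatk-${version}" "${prefix}/${version}"
}
```

Calling `install_gatk 4.6.1.0` would then repeat the steps listed above; nothing is downloaded until the function is called.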