Conpair is a fast and robust method designed for human tumor-normal studies. It performs concordance verification (i.e. checking that both samples come from the same individual) and estimates cross-individual contamination levels in whole-genome and whole-exome sequencing experiments. Importantly, our method of estimating contamination in tumor samples is not affected by copy number changes and can detect contamination levels as low as 0.1%.

Documentation

GATK home
GATK source

Interactive use

$ module load conpair
$ wget -O NA12878_normal40x.gatk.pileup https://github.com/nygenome/Conpair/raw/refs/heads/master/data/example/pileup/NA12878_normal40x.gatk.pileup.txt
$ wget -O NA12878_tumor80x.gatk.pileup https://github.com/nygenome/Conpair/raw/refs/heads/master/data/example/pileup/NA12878_tumor80x.gatk.pileup.txt
$ estimate_tumor_normal_contamination.py -T NA12878_tumor80x.gatk.pileup -N NA12878_normal40x.gatk.pileup
Normal sample contamination level: 0.026%
Tumor sample contamination level: 0.016%
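
The reported levels can be checked automatically, for example to gate a pipeline step. The following is a minimal sketch that assumes the output format shown above; the 0.5% cutoff and the variable names are hypothetical and not part of Conpair.

```shell
# Sample output copied from the run above. In practice, capture it with:
#   output=$(estimate_tumor_normal_contamination.py -T TUMOR_PILEUP -N NORMAL_PILEUP)
output='Normal sample contamination level: 0.026%
Tumor sample contamination level: 0.016%'

# Hypothetical cutoff in percent; choose a value appropriate for your study.
threshold=0.5

# Extract the numeric tumor contamination level (strip the trailing % sign).
tumor_level=$(printf '%s\n' "$output" |
    awk '/^Tumor sample contamination level:/ { sub(/%$/, "", $NF); print $NF }')

# Compare as floating point in awk, since the shell only does integer arithmetic.
if awk -v l="$tumor_level" -v t="$threshold" 'BEGIN { exit !(l > t) }'; then
    verdict="FAIL"
else
    verdict="PASS"
fi
echo "tumor contamination ${tumor_level}% -> ${verdict}"
```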

Build instructions for those who are curious

git clone https://github.com/nygenome/Conpair.git
cd Conpair
ln -s scripts bin
for file in scripts/*py; do sed -i s/python2.7/python3/ $file ; done
for file in modules/*; do sed -i 's/env python/env python3/' $file; done
# Python 3.12 removed the imp module; the steps below replace imp.load_source with the
# importlib-based equivalent described at https://docs.python.org/3/whatsnew/3.12.html#imp
cat > herefile << 'EOF'
import importlib.machinery
import importlib.util
def load_source(modname, filename):
    loader = importlib.machinery.SourceFileLoader(modname, filename)
    spec = importlib.util.spec_from_file_location(modname, filename, loader=loader)
    module = importlib.util.module_from_spec(spec)
    # The module is always executed and not cached in sys.modules.
    # Uncomment the following line to cache the module.
    # sys.modules[module.__name__] = module
    loader.exec_module(module)
    return module
EOF
# -i edits the script in place; without it sed only prints the result to stdout
sed -i '/import imp/ {
r herefile
d
}' scripts/estimate_tumor_normal_contamination.py
sed -i s/imp.load_source/load_source/ scripts/estimate_tumor_normal_contamination.py
rm -f herefile
cd data/genomes
module load gatk
wget -O human_g1k_v37.fa.gz ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/human_g1k_v37.fasta.gz
gunzip human_g1k_v37.fa.gz
gatk CreateSequenceDictionary -R human_g1k_v37.fa
samtools faidx human_g1k_v37.fa
# return to the directory that contains the Conpair checkout
cd ../../..
mv Conpair /mnt/nasapps/production/conpair/0.2.0
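
The sed step that swaps out the "import imp" line can be tried on a throwaway file before touching the real script. The following is a minimal sketch assuming GNU sed (for -i); toy.py, shim.txt and their contents are hypothetical stand-ins, not files from the Conpair repository.

```shell
# Work in a scratch directory so nothing in the checkout is modified.
workdir=$(mktemp -d)
cd "$workdir"

# Toy stand-in for a script that still imports the removed imp module.
printf '%s\n' 'import sys' 'import imp' 'mod = imp.load_source("m", "m.py")' > toy.py

# Placeholder replacement text; the real shim is the load_source function above.
printf '%s\n' 'import importlib.util' '# load_source shim goes here' > shim.txt

# "r" queues shim.txt for output after the matching line, and "d" deletes the
# matching line itself, so the line is effectively replaced by the file.
sed -i '/^import imp$/ {
r shim.txt
d
}' toy.py

# Point the remaining call at the shim function instead of the imp module.
sed -i 's/imp\.load_source/load_source/' toy.py

cat toy.py
```

Anchoring the pattern as ^import imp$ keeps it from also matching lines such as "import importlib".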