Download human_g1k_v37_decoy.fasta
The standard human_g1k_v37.fasta (from the 1000 Genomes Project) contains the primary chromosomal sequences (1-22, X, Y, and MT, named without a "chr" prefix) and a set of unlocalized and unplaced contigs. However, it lacks decoy sequences.
Do not confuse this with hs37d5.fa, the decoy-containing GRCh37 reference released by the 1000 Genomes Project for its Phase 2 analyses. human_g1k_v37_decoy.fasta is the build distributed with the GATK Resource Bundle. They are compatible but not identical.
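One way to tell these references apart once a file is on disk is to look at its contig names. The following is a minimal sketch, not an official tool: it assumes the decoy contig is named "hs37d5" (the name used in the 1000 Genomes decoy set) and that the file is plain or gzip-compressed FASTA. The function names are illustrative.

```python
import gzip
from pathlib import Path

# Assumption: the decoy contig carries the name "hs37d5".
DECOY_CONTIG = "hs37d5"


def fasta_contig_names(path):
    """Return contig names (first word of each '>' header) in file order."""
    path = Path(path)
    opener = gzip.open if path.suffix == ".gz" else open
    names = []
    with opener(path, "rt") as handle:
        for line in handle:
            if line.startswith(">"):
                fields = line[1:].split()
                if fields:
                    names.append(fields[0])
    return names


def describe_reference(names):
    """Report whether the decoy contig is present and how contigs are named."""
    return {
        "has_decoy": DECOY_CONTIG in names,
        "uses_chr_prefix": any(name.startswith("chr") for name in names),
    }
```

A file that reports has_decoy as False is most likely the plain human_g1k_v37.fasta. Scanning the full FASTA reads every line, so on a multi-gigabyte reference this takes a little while.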
There are two primary ways to obtain this file: directly from the GATK Resource Bundle (the most common method) or via FTP from the Broad Institute archives.
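When you align against human_g1k_v37_decoy.fasta, reads that originate from sequence missing in the primary assembly map to the decoy instead of being forced onto similar chromosomal regions, which reduces false-positive SNP and indel calls.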
Whether you are setting up a pipeline for whole-genome sequencing, configuring a GATK Best Practices workflow, or trying to reproduce legacy data, you will likely need this specific file. This article provides a deep dive into what this file is, why the "decoy" sequences matter, and a step-by-step guide on how to download human_g1k_v37_decoy.fasta safely and efficiently.
As of 2025, the original 1000 Genomes FTP site is deprecated, but mirrors remain active. Here are the ways to download this file.
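Whichever mirror you use, a multi-gigabyte transfer is best streamed to disk and checked afterwards. The sketch below uses only the Python standard library. The URL in the usage comment is a hypothetical placeholder, not a real mirror address, and comparing against an MD5 assumes the mirror publishes one.

```python
import hashlib
import urllib.request
from pathlib import Path


def download_file(url, dest, chunk_size=1 << 20):
    """Stream url to dest in chunks and return the MD5 hex digest of the data."""
    dest = Path(dest)
    # Write to a ".part" file first so an interrupted transfer never
    # leaves a truncated file under the final name.
    tmp = dest.with_name(dest.name + ".part")
    md5 = hashlib.md5()
    with urllib.request.urlopen(url) as response, open(tmp, "wb") as out:
        while True:
            chunk = response.read(chunk_size)
            if not chunk:
                break
            out.write(chunk)
            md5.update(chunk)
    tmp.replace(dest)
    return md5.hexdigest()


# Usage (hypothetical URL; substitute the mirror you actually use):
# digest = download_file(
#     "https://example.org/b37/human_g1k_v37_decoy.fasta.gz",
#     "human_g1k_v37_decoy.fasta.gz",
# )
# print(digest)  # compare with the checksum published by the mirror, if any
```

Writing to a temporary name and renaming at the end is a design choice that keeps a failed download from being mistaken for a complete reference.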
Approximate size: ~3.2–3.3 Gb (slightly larger than GRCh37’s ~3.1 Gb because of the added decoy sequences).
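The size difference can be checked from the .fai index that samtools faidx writes next to a FASTA file, since its first two tab-separated columns are the contig name and its length in bases. This is a rough sketch under the assumption that the decoy contig is named "hs37d5"; the helper names are illustrative.

```python
def read_fai(path):
    """Parse a samtools .fai index into {contig_name: length_in_bases}."""
    lengths = {}
    with open(path) as handle:
        for line in handle:
            fields = line.rstrip("\n").split("\t")
            if len(fields) >= 2:
                lengths[fields[0]] = int(fields[1])
    return lengths


def decoy_share(lengths, decoy_name="hs37d5"):
    """Return (total_bases, decoy_bases, decoy_fraction) for an index.

    decoy_name is an assumption; pass a different name if your file
    labels its decoy contig differently.
    """
    total = sum(lengths.values())
    decoy = lengths.get(decoy_name, 0)
    fraction = decoy / total if total else 0.0
    return total, decoy, fraction
```

Calling decoy_share(read_fai("human_g1k_v37_decoy.fasta.fai")) then shows how many bases the decoy adds on top of the primary assembly.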