Whole genome sequencing refers to a wide variety of applications; which platform to choose depends very much on both the genome itself and the research question being asked. LMT and SF staff are available to help with effective study design.
Metagenomics may require higher-coverage platforms, depending on the expected complexity of the sample and the relative frequency of the microbes of interest. Other approaches for assessing diversity, such as 16S sequencing, can be performed using Ion Torrent. To date, the PacBio error profile may be prohibitive for these studies.
Plant and mammalian genomes are large and complex. These whole genomes should be sequenced on the Illumina HiSeq with the understanding that not all regions of the genome will sequence equally well, and de novo assembly will be difficult given the shorter read length. Human genomes, which have a good reference, will be less problematic than those of other species.
De novo assembly of new genomes requires coverage, accuracy, and completeness. Ideally, different data sets with different intrinsic biases, strengths, and weaknesses are combined to produce a good assembly. While most of any genome can be assembled even with short reads given the high coverage of Illumina or SOLiD data, finishing will need longer reads and gap filling provided by 454, PacBio, Sanger, or a combination of these methods.
Another approach to NGS is to sequence targeted regions. Capture of the target region can be hybridization-based (for example SureSelect) or PCR-based (using Fluidigm or other amplification capture methods). In order to get the most effective use of sequencing space, samples can be barcoded and multiplexed. In this case, choice of sequencing platform depends on the number and size of captured regions, the number of individuals that are being multiplexed, and the type of sequence being captured.
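As a rough planning aid, the trade-off between run yield, target size, and the number of multiplexed samples can be sketched in a few lines of Python. This is a minimal sketch, not a facility tool: the function names, the on-target fraction, and the example numbers are all hypothetical, and it assumes reads are split evenly across barcodes, which real runs only approximate.

```python
def per_sample_coverage(run_yield_bp, n_samples, target_size_bp, on_target_fraction=1.0):
    """Mean depth over the target for each sample.

    Assumes (hypothetically) that reads are divided evenly among barcoded
    samples and that a fixed fraction of sequenced bases lands on target.
    """
    if n_samples <= 0 or target_size_bp <= 0:
        raise ValueError("n_samples and target_size_bp must be positive")
    if not 0.0 <= on_target_fraction <= 1.0:
        raise ValueError("on_target_fraction must be between 0 and 1")
    usable_bp = run_yield_bp * on_target_fraction
    return usable_bp / (n_samples * target_size_bp)


def max_samples(run_yield_bp, target_size_bp, required_coverage, on_target_fraction=1.0):
    """Largest number of samples that still reaches the required mean depth."""
    if target_size_bp <= 0 or required_coverage <= 0:
        raise ValueError("target_size_bp and required_coverage must be positive")
    usable_bp = run_yield_bp * on_target_fraction
    return int(usable_bp // (required_coverage * target_size_bp))


if __name__ == "__main__":
    # Illustrative numbers only: 1 Gbp of yield, a 1 Mbp target, 50% on target.
    print(per_sample_coverage(1_000_000_000, 10, 1_000_000, 0.5))
    print(max_samples(1_000_000_000, 1_000_000, 100, 0.5))
```

Mean depth hides region-to-region variation, so a real design would normally leave headroom above the minimum depth required.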
Several commercial methods exist to capture the human exome. Whole exome sequencing is ideal to run on a lane of the Illumina HiSeq; however, several NGS platforms provide enough coverage to be suitable for this application.
Multiplexed amplicons are an efficient way to use available sequencing space. Projects that target many large (Mbp) regions, or that need to detect very low frequency alleles (<5%), should be run on high-throughput platforms. Otherwise Ion Torrent, PacBio, or 454 are appropriate platforms, depending on read length and accuracy requirements. With few amplicons for few individuals, Sanger may still be the most appropriate and cost-effective method of sequencing.
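To see why low frequency alleles push a project toward deeper sequencing, the depth needed to observe a rare allele can be estimated with a simple binomial model. The sketch below is an illustration under stated assumptions: the function names, the minimum number of supporting reads, and the 95% detection probability are hypothetical choices, and sequencing error is ignored, so real requirements will be higher.

```python
from math import comb


def prob_at_least_k(depth, allele_freq, min_alt_reads):
    """Probability of seeing at least min_alt_reads variant reads at a site.

    Models each read as an independent draw that carries the variant with
    probability allele_freq (no sequencing error, no sampling bias).
    """
    p_fewer = sum(
        comb(depth, i) * allele_freq**i * (1 - allele_freq) ** (depth - i)
        for i in range(min_alt_reads)
    )
    return 1.0 - p_fewer


def min_depth_for_detection(allele_freq, min_alt_reads, power=0.95, max_depth=100_000):
    """Smallest depth at which the detection probability reaches `power`.

    Returns None if no depth up to max_depth is sufficient.
    """
    for depth in range(min_alt_reads, max_depth + 1):
        if prob_at_least_k(depth, allele_freq, min_alt_reads) >= power:
            return depth
    return None


if __name__ == "__main__":
    # Hypothetical requirement: at least 5 supporting reads for a 5% allele.
    print(min_depth_for_detection(0.05, 5))
    # The same requirement for a 1% allele needs far more depth.
    print(min_depth_for_detection(0.01, 5))
```

Multiplying the resulting depth by the number of amplicons and individuals gives a first estimate of the total yield a run must provide.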
Similar to smaller genomes, BACs, plasmids and other constructs should be run on the platforms that have lower cost/run, since all NGS platforms will provide sufficient coverage for high accuracy. Sanger sequencing should also be considered.
ChIP-seq kits are available for Illumina and SOLiD.
Several specific types of sequence are difficult to sequence on most NGS platforms. These include regions with particularly high GC or AT content, highly repetitive regions, long stretches of homopolymers, palindromes, and other sequences that form secondary structure. PacBio is the only NGS method that works well for these types of sequences. Sanger sequencing can also be used to sequence through these regions.
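If a reference or draft sequence for the target is available, two of these features, extreme base composition and long homopolymer runs, can be screened for before choosing a platform. The following is a minimal sketch: the function names and the thresholds (30% and 70% GC, homopolymers longer than 8 bases) are hypothetical placeholders rather than platform specifications, and repeats and palindromes are not checked.

```python
def gc_content(seq):
    """Fraction of G and C bases in the sequence (0.0 for an empty sequence)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return sum(base in "GC" for base in seq) / len(seq)


def longest_homopolymer(seq):
    """Length of the longest run of a single repeated base."""
    best = run = 0
    prev = None
    for base in seq.upper():
        run = run + 1 if base == prev else 1
        best = max(best, run)
        prev = base
    return best


def flag_difficult(seq, gc_low=0.30, gc_high=0.70, max_homopolymer=8):
    """Return a list of reasons the region may sequence poorly.

    Thresholds are illustrative assumptions and should be tuned per platform.
    """
    reasons = []
    gc = gc_content(seq)
    if gc < gc_low:
        reasons.append("AT-rich")
    elif gc > gc_high:
        reasons.append("GC-rich")
    if longest_homopolymer(seq) > max_homopolymer:
        reasons.append("long homopolymer")
    return reasons


if __name__ == "__main__":
    print(flag_difficult("ATGCGTACGTTAGC"))
    print(flag_difficult("GGGCGCGGCCGCGGGGGGGGGGCC"))
```

In practice the same checks would be applied in sliding windows across each target region rather than to the region as a whole.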
RNA sequencing with conversion to cDNA prior to sequencing can be performed on all platforms. Expression data for whole transcriptomes requires sufficient coverage to detect low abundance messages and should be generated on Illumina or SOLiD, while sequencing of smaller numbers of targeted transcripts will require less coverage. As with targeted DNA sequencing, the appropriate platform depends on the number and size of the transcripts and on the read length and coverage required. Both Illumina and SOLiD have kits available for miRNA sequencing and are appropriate platforms given the short read length. Ion Torrent may have potential for this application in the future. In addition, PacBio plans to develop direct RNA-seq methods, though these are not yet available.
Methylation of cytosines can be detected on any sequencing platform using bisulfite conversion prior to sequencing. The appropriate platform depends on the same considerations as for regular DNA sequencing: the total number of base pairs and the coverage required. Since mapping will need to be reference-based to detect methylation events, read length should not be a consideration in this type of profiling. A method for direct detection of epigenetic modifications concurrent with primary sequence acquisition is in development at PacBio.
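The reason bisulfite data must be compared against a reference can be shown with a small in silico example. Bisulfite treatment converts unmethylated cytosines so that they read as T, while methylated cytosines still read as C; a methylation call is therefore made only at positions where the reference has a C. The sketch below is a toy illustration: the function names are hypothetical, and it assumes a perfectly aligned, same-strand read with complete conversion and no sequencing errors.

```python
def bisulfite_convert(seq, methylated_positions=frozenset()):
    """Simulate bisulfite conversion of one strand.

    Unmethylated C reads as T after conversion; C at any 0-based index
    listed in methylated_positions is protected and stays C.
    """
    return "".join(
        "T" if base == "C" and i not in methylated_positions else base
        for i, base in enumerate(seq)
    )


def call_methylation(reference, read):
    """Call methylation at each reference C covered by an aligned read.

    Returns {position: True/False}. Positions where the read shows neither
    C nor T are skipped, since they cannot be interpreted under this model.
    """
    calls = {}
    for i, (ref_base, read_base) in enumerate(zip(reference, read)):
        if ref_base != "C":
            continue
        if read_base == "C":
            calls[i] = True   # protected from conversion: methylated
        elif read_base == "T":
            calls[i] = False  # converted: unmethylated
    return calls


if __name__ == "__main__":
    reference = "ACGTCCGA"
    read = bisulfite_convert(reference, methylated_positions={1})
    print(read)
    print(call_methylation(reference, read))
```

Without the reference, a T in the read at one of these positions could not be distinguished from a genuine T in the genome.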