Frequently Asked Questions – PacBio
What PacBio services does the Sequencing Facility provide?
Who can order PacBio services through the Sequencing Facility?
How do I submit a sequencing proposal?
How do I submit samples?
What are the quantity and quality requirements for submitted samples?
What happens after my sample is submitted?
What is required to assure timely processing and delivery of the data?
What kinds of analyses are performed?
How will my data be delivered and what are the file formats?
How large are the files?
How do I analyze the data?
How long is the data available?
What PacBio services does the Sequencing Facility provide?
Please see the Services page for a suggested list of the projects we support. The PacBio RS is still a novel platform, so if your project design is not listed, please contact Dr. David Munroe, the Project Manager, or Bao Tran, the Sequencing Facility Director, to discuss the feasibility of a custom project.
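Who can order PacBio services through the Sequencing Facility?
Contact Bao Tran, the Sequencing Facility Director, to find out whether your research lab is eligible.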
How do I submit a sequencing proposal?
Please complete a sequencing proposal form (available under Protocols and Resources) and submit it to email@example.com. You may also contact Kristie Jones, the PacBio Lab Team Leader, to discuss the available options and the best choices for your sequencing project.
How do I submit samples?
Before submitting samples, make sure your sequencing proposal has been approved. You can then deliver your samples to Castle Raley, the PacBio QA Specialist, at the Advanced Technology Center, room 133. Pacific Biosciences recommends resuspending samples in either molecular biology grade water or 10 mM Tris-HCl. Please include a Sample Manifest with your submission, ship all samples on dry ice, and label them clearly. Also include any available Quality Control documentation, such as gel images or electropherograms. Ship all samples by FedEx same-day or next-day delivery (http://fedex.com/us/services/us/index.html), Monday through Thursday. Alternatively, you are always welcome to drop off your samples between 9:00 am and 4:00 pm, Monday through Friday, at the Sequencing Facility in Gaithersburg, MD.
What are the quantity and quality requirements for submitted samples?
- Circular Consensus Sequencing (500 bp fragment): minimum 1 μg of DNA.
- Standard Sequencing (3 kb fragment): minimum 2 μg of DNA.
- Strobe Sequencing (6–10 kb fragment): minimum 10–20 μg of DNA.
Quality requirements. The submitted DNA:
- Has an OD260/280 ratio of 1.8–2.0.
- Is double-stranded. Single-stranded DNA cannot be converted into a SMRTbell template during template preparation and can interfere with quantitation and polymerase binding.
- Has undergone as few freeze-thaw cycles as possible.
- Has not been exposed to high temperatures (exposure to > 65 °C for more than one hour can cause a detectable decrease in sequence yield and quality).
- Has not been exposed to pH extremes (< 6 or > 9).
- Does not contain insoluble material.
- Does not contain RNA.
- Has not been exposed to intercalating fluorescent dyes or ultraviolet radiation.
- Does not contain chelating agents (e.g., EDTA), divalent metal cations (e.g., Mg²⁺), denaturants (e.g., guanidinium salts, phenol), or detergents (e.g., SDS).
- Does not contain contaminants carried over from the source organism or tissue (e.g., heme, humic acid, polyphenols).
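If you track samples in a spreadsheet or script before filling in the Sample Manifest, you can check the numeric criteria above automatically. The following is a minimal Python sketch, not a facility tool. The function name `check_sample` and the sequencing-type keys are hypothetical. The sketch assumes the lower end of the 10–20 μg Strobe range is the minimum, and it covers only the quantity and OD260/280 criteria, so you still need to check the other requirements by hand.

```python
# Minimum DNA input per sequencing type, in micrograms, from the list above.
# Assumption: for Strobe Sequencing the lower bound of the 10-20 ug range is used.
MIN_DNA_UG = {
    "circular_consensus": 1.0,  # 500 bp fragment
    "standard": 2.0,            # 3 kb fragment
    "strobe": 10.0,             # 6-10 kb fragment
}


def check_sample(seq_type, dna_ug, od_260_280):
    """Return a list of problems; an empty list means these checks pass.

    Hypothetical pre-submission helper: it checks only DNA quantity and the
    OD260/280 ratio. Raises KeyError for an unknown sequencing type.
    """
    problems = []
    minimum = MIN_DNA_UG[seq_type]
    if dna_ug < minimum:
        problems.append(f"{dna_ug} ug is below the {minimum} ug minimum for {seq_type}")
    if not 1.8 <= od_260_280 <= 2.0:
        problems.append(f"OD260/280 of {od_260_280} is outside 1.8-2.0")
    return problems
```

For example, `check_sample("strobe", 8.0, 1.85)` returns a single message reporting that 8.0 ug is below the 10.0 ug minimum. The OD260/280 value of 1.85 is within range, so it produces no message.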
What happens after my sample is submitted?
Before sequencing, we perform an initial sample acceptance QC check to confirm that the information in the Sample Manifest is correct and that your sample meets the minimum quantity and quality requirements. We will send you the QC results and tell you whether your sample meets the minimum sequencing requirements. If a sample fails QC, you can either resubmit it or proceed with library construction, in which case you bear the cost if library construction fails. You will be notified again when the analysis of each sample is complete and available for download.
What is required to assure timely processing and delivery of the data?
An initial consultation with the ABCC's PacBio Bioinformatics team is required to discuss the data analysis and set expectations. For projects that require reference mapping, it is also important to provide the reference sequence(s), because mapping is performed automatically based on the settings configured during sequencing run setup. Submit the reference to the PacBio Bioinformatics team as a text file containing the sequence in FASTA format, and avoid degenerate nucleotide codes unless they are specifically desired. Please contact the PacBio Bioinformatics team with any questions about reference sequences or your preferred data processing options.
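Degenerate codes are easy to miss in a long reference, so it may be worth scanning the file before sending it. Below is a minimal Python sketch. It assumes a plain-text FASTA file in which only A, C, G, and T (upper or lower case) are wanted. `find_degenerate_bases` is a hypothetical helper, and it reports any other character per record, including IUPAC ambiguity codes such as N, R, or Y.

```python
# Unambiguous nucleotide codes; any other character (e.g. IUPAC ambiguity
# codes such as N, R, Y) is reported as degenerate or unexpected.
UNAMBIGUOUS = set("ACGT")


def find_degenerate_bases(fasta_text):
    """Return {record_id: sorted list of non-ACGT characters} for FASTA text.

    The record ID is the first word after '>'. Lower-case (soft-masked)
    bases are accepted. Records without problems are omitted.
    """
    found = {}
    record_id = None
    for line in fasta_text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Header line: remember which record the following sequence belongs to.
            record_id = line[1:].split()[0] if len(line) > 1 else ""
            continue
        odd = set(line.upper()) - UNAMBIGUOUS
        if odd:
            found.setdefault(record_id, set()).update(odd)
    return {rid: sorted(chars) for rid, chars in found.items()}
```

To use it, read the reference file's text (for example, a hypothetical `reference.fasta`) and pass it in. An empty dictionary means the sketch found no degenerate codes.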
What kinds of analyses are performed?
We perform primary and secondary analysis. Primary analysis runs on the computer attached to the instrument (see figure below) and includes trace recording, trace-to-pulse conversion, raw base calling, and consensus base calling with the circular consensus algorithm.
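The idea behind circular consensus is that the circular SMRTbell template lets the polymerase read the same insert several times, so the passes can outvote random errors in any single pass. The Python sketch below illustrates only that idea with a per-position majority vote, assuming the passes are already aligned to equal length. It is not the algorithm that runs on the instrument, which must also handle insertions and deletions between passes and is considerably more sophisticated. `majority_consensus` is a hypothetical name.

```python
from collections import Counter


def majority_consensus(passes):
    """Toy per-position majority vote over pre-aligned passes of one insert.

    Illustration only: assumes all passes have the same length and are
    already aligned. Ties go to the base seen first in that column
    (Counter.most_common orders equal counts by first occurrence).
    """
    if not passes or len({len(p) for p in passes}) != 1:
        raise ValueError("expected one or more passes of equal length")
    consensus = []
    for column in zip(*passes):
        base, _count = Counter(column).most_common(1)[0]
        consensus.append(base)
    return "".join(consensus)
```

With three passes such as `"ACGTAC"`, `"ACCTAC"`, and `"ACGTTC"`, each pass has one error at a different position, and the vote recovers `"ACGTAC"`.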
Secondary analysis runs on the compute cluster at NCI's Advanced Biomedical Computing Center (ABCC). Depending on the project, it can include mapping reads to the reference followed by consensus building, variant calling, and/or first-pass de novo assembly. Sequencing metrics are included in the standard sequencing report.
How will my data be delivered and what are the file formats?
For projects that do not require secondary analysis, we will either post a zip file of the primary data on the web for download or copy the data to a location of your choice for further analysis. Please contact the PacBio Bioinformatics team to arrange primary data delivery.
The data folder structure is shown in the figure below. The run folder, whose name usually includes the run date, contains a folder for each SMRT Cell. Each of those contains an Analysis_Results folder holding two sets of files, s1 and s2, one for each of the two movie sets.
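Once you have copied a run folder to your own storage, you can list what each SMRT Cell's Analysis_Results folder contains. The following Python sketch walks the layout described above. It assumes (hypothetically) that each SMRT Cell folder sits directly inside the run folder and that file names carry the movie set label as `_s1_` or `_s2_`; this may not match every software version. `summarize_run_folder` is a hypothetical name.

```python
from pathlib import Path


def summarize_run_folder(run_dir):
    """Map each SMRT Cell folder name to {set_label: [(file name, size in bytes), ...]}.

    Assumptions (hypothetical): SMRT Cell folders sit directly under run_dir,
    each contains an Analysis_Results folder, and file names include the
    movie set label as "_s1_" or "_s2_". Other files are grouped as "other".
    """
    summary = {}
    for cell_dir in sorted(p for p in Path(run_dir).iterdir() if p.is_dir()):
        results = cell_dir / "Analysis_Results"
        if not results.is_dir():
            continue  # not a SMRT Cell folder; skip it
        sets = {}
        for f in sorted(results.iterdir()):
            if not f.is_file():
                continue
            if "_s1_" in f.name:
                label = "s1"
            elif "_s2_" in f.name:
                label = "s2"
            else:
                label = "other"
            sets.setdefault(label, []).append((f.name, f.stat().st_size))
        summary[cell_dir.name] = sets
    return summary
```

For example, `summarize_run_folder("/path/to/run_folder")` (path hypothetical) returns a dictionary keyed by SMRT Cell folder name, listing each file's size in bytes for the s1 and s2 sets.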
Structure of a PacBio run folder; a run with four SMRT Cells is shown as an example:
Files (raw primary data) and their sizes in bytes in the Analysis_Results folder:
If you requested that reads be mapped to the provided reference sequence, we will perform the mapping after pooling the data from all SMRT Cells in your project. The aggregated data will be delivered via the web unless you request otherwise. The web report contains a standard report with links to specific reports (such as Filtering, Diagnostic, Coverage Mapping, and Variants). It also has links to download aligned reads in BAM and HDF5 formats, coverage files in GFF and BED formats, variants in VCF, BED, and GFF formats, and other files. The figure below shows a sample report page:
How large are the files?
PacBio sequencing data are incremental; the smallest unit is a dataset equal to one SMRT Cell's worth of data. As of fall 2011, each SMRT Cell produces approximately 3.5 GB of compressed data. However, data volumes change as new software and chemistry are introduced, so please contact the PacBio team for the most up-to-date information.
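Because data accumulate one SMRT Cell at a time, storage needs scale roughly with the number of SMRT Cells. Below is a minimal Python sketch for a back-of-the-envelope estimate. It assumes the fall 2011 figure of about 3.5 GB of compressed data per SMRT Cell, which will drift as chemistry and software change. `estimate_download_gb` is a hypothetical helper.

```python
# Assumption: approximate compressed data per SMRT Cell, fall 2011 figure.
GB_PER_SMRT_CELL = 3.5


def estimate_download_gb(n_smrt_cells, gb_per_cell=GB_PER_SMRT_CELL):
    """Rough estimate: total size grows linearly with the number of SMRT Cells."""
    if n_smrt_cells < 0:
        raise ValueError("number of SMRT Cells cannot be negative")
    return n_smrt_cells * gb_per_cell
```

For example, four SMRT Cells come to about 4 × 3.5 = 14 GB. Pass a different `gb_per_cell` once the PacBio team gives you a current figure.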
How do I analyze the data?
The CCRIFX Bioinformatics core is a new on-site core resource that provides bioinformatics analysis assistance to investigators of the Center for Cancer Research at NCI. If you need help with data analysis, please check the CCRIFX core website for its services and office hours.
How long is the data available?
The original sequence and analysis files are available for download for one month after you receive the email confirming data delivery. Investigators are responsible for retrieving their data within this period. To keep enough storage available for upcoming runs, the analysis files are then archived and stored for six months. Please contact the PacBio Bioinformatics team if you need us to store the data longer.
If your data is no longer available for download, please contact the bioinformatics group, and we can rerun the data processing and alignment as necessary. Note, however, that re-analyzed data may take longer to arrive because of resource conflicts with current production runs. Whenever possible, download your data promptly after receiving the data delivery notice.