FAQs
Sequencing services
Does Blueprint Genetics provide sequencing data for further research purposes?
Yes, we provide raw sequencing data for Whole Genome Sequencing (WGS) and Whole Exome Sequencing (WES) projects. For more information about the assays, including sequencing parameters, please visit https://blueprintgenetics.com/sequencing-services/
What is raw sequencing data used for?
Raw sequencing data are typically a resource for scientists who are skilled in bioinformatics and who conduct human research under an approved IRB protocol. Specialized bioinformatic tools are required to open the files and to further analyze or visualize them. The results are not considered validated or appropriate for informing medical management. Any discoveries that may have medical implications for individuals should be confirmed using a validated test before any changes to medical management are made.
What file formats does Blueprint Genetics provide?
Blueprint Genetics delivers a variety of raw sequencing data file types, depending on your research purposes. For standard Whole Genome Sequencing (WGS) projects, we provide one of the following options:
- FASTQ
- CRAM
- VCF
- All of the above
For standard Whole Exome Sequencing (WES) projects, we provide one of the following options:
- FASTQ
- BAM
- VCF (unannotated SNVs and indels)
- All of the above
We also provide an md5sum file for confirming the integrity of the downloaded file(s).
FASTQ is a flat text file format that contains DNA sequencing output in the form of base calls (A, T, G, C, or N), each with a corresponding Phred quality score describing the reliability of the call. It is the input file for quality control and sequence alignment processes. Typically, a WES run produces 8 FASTQ files per sample, each ~1 GB in size, and a WGS run produces 16 FASTQ files per sample, each ~6 GB in size.
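To illustrate how base calls and Phred quality scores are stored, here is a minimal Python sketch that reads 4-line FASTQ records and decodes the quality string. It assumes Phred+33 encoding (Sanger / Illumina 1.8+), where each quality character's ASCII code minus 33 gives the score; the function names are hypothetical and this is not a replacement for dedicated QC tools.

```python
from typing import Iterable, Iterator, List, Tuple

# Assumption: qualities use Phred+33 encoding (Sanger / Illumina 1.8+),
# i.e. quality score = ASCII code of the character minus 33.
PHRED_OFFSET = 33


def parse_fastq(lines: Iterable[str]) -> Iterator[Tuple[str, str, List[int]]]:
    """Yield (read_id, sequence, phred_scores) for each 4-line FASTQ record."""
    it = iter(lines)
    for header in it:
        header = header.rstrip("\n")
        if not header:
            continue  # tolerate blank lines, e.g. at the end of a file
        if not header.startswith("@"):
            raise ValueError(f"expected '@' header line, got {header!r}")
        try:
            seq = next(it).rstrip("\n")   # base calls (A, C, G, T, N)
            plus = next(it).rstrip("\n")  # '+' separator line
            qual = next(it).rstrip("\n")  # one quality character per base
        except StopIteration:
            raise ValueError(f"truncated FASTQ record: {header!r}") from None
        if not plus.startswith("+"):
            raise ValueError(f"expected '+' separator line, got {plus!r}")
        if len(seq) != len(qual):
            raise ValueError(f"sequence/quality length mismatch in {header!r}")
        yield header[1:], seq, [ord(c) - PHRED_OFFSET for c in qual]


def error_probability(q: int) -> float:
    """Phred definition: Q = -10 * log10(P_error), so P_error = 10^(-Q/10)."""
    return 10 ** (-q / 10)
```

For example, the record "@read1 / ACGTN / + / II?#!" decodes to scores [40, 40, 30, 2, 0], and a score of 30 corresponds to a 1 in 1,000 chance that the call is wrong. For gzip-compressed FASTQ files, the same function can be given a text-mode handle from gzip.open(path, "rt"), where path is a placeholder for your file name.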
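BAM (Binary Alignment Map) is a standard file format that contains sequence reads mapped to the human reference genome, including base quality and mapping quality scores. BAM files are ~6-10 GB in size for WES analysis and ~30-50 GB for WGS analysis. CRAM, the alignment format provided for WGS projects, stores the same kind of information as BAM but uses reference-based compression, so the files are smaller; reading a CRAM file typically requires access to the same reference genome sequence that was used for the alignment.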
VCF (Variant Call Format) is a standard file format storing a list of sequence variants and their genomic positions. The VCF file does not contain detailed annotations of the variants, such as the gene name or the population frequency of the variant. This VCF has not been filtered to meet any specific quality standards. It is ~2 MB in size for WES analysis and ~400 MB for WGS analysis.
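As a rough illustration of what the unannotated VCF contains, the following minimal Python sketch reads a plain or gzip-compressed VCF and yields the fixed columns of each variant line (chromosome, position, reference and alternate alleles, quality, filter status). The function and class names are hypothetical, the sketch ignores INFO and genotype columns, and dedicated libraries (for example pysam) are more robust for real analyses.

```python
import gzip
from dataclasses import dataclass
from typing import IO, Iterable, Iterator, List

BASES = set("ACGT")


@dataclass
class Variant:
    chrom: str
    pos: int            # 1-based position, as defined by the VCF specification
    ref: str
    alt: List[str]      # ALT can hold several comma-separated alleles
    qual: str           # kept as text because "." means "missing"
    filter_status: str  # e.g. "PASS", "." or a filter name


def open_vcf(path: str) -> IO[str]:
    """Open a plain or gzip-compressed (.gz) VCF in text mode."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "r")


def iter_variants(lines: Iterable[str]) -> Iterator[Variant]:
    """Yield one Variant per data line; meta and header lines start with '#'."""
    for line in lines:
        if line.startswith("#"):
            continue  # skips '##' meta-information lines and the '#CHROM' header
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 8:
            continue  # skip blank or malformed lines (VCF has 8 fixed columns)
        chrom, pos, _vid, ref, alt, qual, filter_status = fields[:7]
        yield Variant(chrom, int(pos), ref, alt.split(","), qual, filter_status)


def is_snv(v: Variant) -> bool:
    """True when REF and every ALT allele are single A/C/G/T bases."""
    return v.ref in BASES and all(a in BASES for a in v.alt)
```

A typical use is counting SNVs versus indels: with open_vcf("sample.vcf.gz") as fh, sum(is_snv(v) for v in iter_variants(fh)) gives the SNV count (the file name is a placeholder). Because no quality filtering has been applied to the delivered VCF, you may want to apply your own thresholds on the QUAL or FILTER columns.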
An MD5 checksum (md5sum) is used to verify the integrity of the downloaded file(s) by confirming that a file has not been altered by a faulty file transfer, a disk error, or unintentional corruption.
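On Linux, the coreutils command md5sum -c followed by the name of the md5sum file performs this check directly. As a platform-independent alternative, the minimal Python sketch below recomputes each file's MD5 hash in chunks (so large FASTQ, BAM, or CRAM files need not fit in memory) and compares it with the expected value. It assumes the md5sum file uses the standard coreutils line format "<hash>  <filename>" and that the listed files sit in the same directory as the md5sum file; the function names are hypothetical.

```python
import hashlib
from pathlib import Path
from typing import Dict


def md5_of_file(path, chunk_size: int = 1024 * 1024) -> str:
    """Compute the MD5 hex digest of a file, reading it in 1 MiB chunks."""
    h = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_md5sum_file(md5sum_path) -> Dict[str, bool]:
    """Check every '<hash>  <filename>' line; True means the file matches.

    Assumption: file names are relative to the md5sum file's directory.
    A missing file is reported as False rather than raising an error.
    """
    base = Path(md5sum_path).parent
    results: Dict[str, bool] = {}
    with open(md5sum_path, "r", encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue
            expected, name = line.split(maxsplit=1)
            name = name.lstrip("*")  # '*' marks binary mode in coreutils output
            target = base / name
            results[name] = target.is_file() and md5_of_file(target) == expected.lower()
    return results
```

Any file reported as False should be downloaded again before it is used.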
Please note that the files, especially BAM, CRAM, and FASTQ files, are large. The data require further processing with bioinformatic tools to produce meaningful results. Links to commonly used and freely available tools are provided under “What software do I need to read the raw sequencing data?”
Who can collaborate with us on research projects?
The service is available to those working under an approved IRB for human research and with appropriate participant consent, including:
- Scientists
- Research-minded clinical geneticists
- Researchers working in a clinical setting who want high-quality sequencing data.
How do I start a research collaboration with Blueprint Genetics?
To start the process, please complete the Sequencing Service Request Form, available at https://blueprintgenetics.com/sequencing-services/, with as much information as possible. Our Sequencing Service Project Coordinator will contact you once we have received the details of your research project's scope and expectations.
After we have reviewed and agreed on the details, one of our sales representatives will provide you with the agreement that includes the service description and pricing. Our Sequencing Service Project Coordinator will provide you with detailed instructions on how to place your orders, how to package and ship the samples, as well as how the data will be provided to you.
What is the price of the service per sample?
Because research projects often differ in scope, we evaluate each project separately. The Sales Representative responsible for your territory will send you a pricing quote, with a copy to our Sequencing Service Project Coordinator. The Project Coordinator is available to assist you with the practical aspects of your project; for pricing discussions, however, please contact your dedicated Sales Representative.
What should I expect when I download the file(s)?
The files can be large, so we recommend reserving several hours for the download. Files can be downloaded with a web browser, but we encourage you to use a command-line utility and to download the files to a computer that is capable of storing and processing large quantities of data.
- VCF files (~2 MB for WES and ~400 MB for WGS) are not typically a problem for a regular computer. Processing VCF files requires bioinformatic tools; however, after decompressing the gzip-compressed (.gz) VCF file, it can be opened in any text editor or in Excel. Note that Excel displays at most 1,048,576 rows per worksheet, which a WGS VCF will typically exceed.
- BAM and CRAM files (~6-10 GB for WES and ~30-50 GB for WGS) require a computer that can store and process large quantities of data. Bioinformatic tools can be used to access these files and call variants (which generates a VCF) or to visualize sequence reads and variants.
- FASTQ files (8 files per sample, each ~1 GB, for WES; 16 files per sample, each ~6 GB, for WGS) raise the same storage and processing considerations, compounded by the large number of files per sample.
- The md5sum file for confirming the integrity of the downloaded file(s) will be included.
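As a back-of-the-envelope check on the "several hours" recommendation, the small Python sketch below converts the approximate per-sample sizes quoted in this FAQ into download times. The bandwidth and efficiency values are assumptions you should replace with your own; decimal units are used (1 GB = 8,000 megabits) and protocol overhead is folded into the efficiency factor.

```python
def download_hours(size_gb: float, bandwidth_mbit_s: float, efficiency: float = 1.0) -> float:
    """Estimate download time in hours for size_gb gigabytes.

    bandwidth_mbit_s: assumed sustained link speed in megabits per second.
    efficiency: assumed fraction of that bandwidth actually achieved (0-1].
    """
    if bandwidth_mbit_s <= 0 or not 0 < efficiency <= 1:
        raise ValueError("bandwidth must be > 0 and efficiency in (0, 1]")
    megabits = size_gb * 8 * 1000  # decimal units: 1 GB = 8,000 megabits
    seconds = megabits / (bandwidth_mbit_s * efficiency)
    return seconds / 3600


# Approximate per-sample totals derived from the sizes quoted in this FAQ.
SIZES_GB = {
    "WES FASTQ (8 x ~1 GB)": 8 * 1,
    "WGS FASTQ (16 x ~6 GB)": 16 * 6,
    "WGS alignments (upper estimate, ~50 GB)": 50,
}

if __name__ == "__main__":
    for label, gb in SIZES_GB.items():
        # 100 Mbit/s is a hypothetical example bandwidth, not a service figure.
        print(f"{label}: ~{download_hours(gb, 100):.1f} h at 100 Mbit/s")
```

At an assumed 100 Mbit/s, the ~96 GB of WGS FASTQ files for a single sample alone take a little over 2 hours before any overhead, which is why multi-sample projects benefit from a robust command-line download on a well-connected machine.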
What software do I need to read the raw sequencing data?
Analysis of the raw sequence data requires bioinformatics expertise and software.
For annotation and analysis of sequence variants in the VCF file, there are several commercial and non-commercial tools available. Commonly used and freely available command line tools include:
- VEP (https://www.ensembl.org/info/docs/tools/vep/index.html)
- ANNOVAR (http://annovar.openbioinformatics.org/en/latest/)
- SnpEff (http://snpeff.sourceforge.net)
There are also several non-commercial and commercial web browser-based and stand-alone software tools for variant annotation and analysis. For sequence read analysis, GATK is one of the most widely used toolkits [https://software.broadinstitute.org/gatk/].
To visualize the contents of VCF, BAM, or CRAM files in genomic context, the Integrative Genomics Viewer (IGV) can be used, either as a stand-alone application or through a web-browser interface [http://software.broadinstitute.org/software/igv/].
How long does Blueprint Genetics store data from research projects?
We do not store research data for an extended period; your data will be permanently deleted after you download the files. The ordering customer is both the data owner and data controller, and Blueprint Genetics does not store or use the generated sequencing information for any purpose. If you experience problems with downloading the files or require an extension for completing the download, please contact our Sequencing Service Project Coordinator at sequencing@blueprintgenetics.com as soon as possible.