Policy for handling pseudogene and other duplicated genomic regions in the Blueprint Genetics sequencing panels

Pseudogenes are characterized by a combination of homology to a known gene and non-functionality. Every pseudogene has a DNA sequence that is similar to that of a functional gene (usually 40% to 100% of the sequence is identical); nonetheless, pseudogenes cannot produce functional protein products. Duplicated pseudogenes usually have all the characteristics of genes, including an intact exon-intron structure and promoter sequences. Gene duplication is one of the evolutionary processes that give rise to pseudogenes. Mutations that disrupt the structure or function of one copy of a duplicated gene are not necessarily deleterious and may not be removed by natural selection. As a result, the mutated copy may gradually become a pseudogene that is either unexpressed or non-functional.

The existence of pseudogenes and other duplicated regions in the genome causes three types of problems in sequence analysis. First, segmental duplications (genomic regions where sequence similarity is ≥90% over a length of ≥1 kb), where pseudogenes are often located, can be indistinguishable with short-read sequencing methods. Next-generation sequencing reads are usually a few hundred bases long and cannot always be aligned unambiguously to either the pseudogene or its parent gene. However, the Blueprint Genetics laboratory assay outperforms many other short-read NGS assays because it uses 2 × 150 bp paired-end sequencing, which yields reads with high mapping quality from most of the duplicated regions. Reads with ambiguous alignments (mapping to several genomic positions) are discarded during analysis, which causes gaps in sequence coverage. Second, because non-functional pseudogenes are not under selective pressure, they accumulate more variation than their parent genes. Sequencing errors may cause reads from these variable pseudogene sequences to be mismapped, interfering with the results obtained for the parent gene. Third, the high degree of sequence similarity makes it difficult to design Sanger sequencing primers that do not cross-react with pseudogene sequences. Therefore, direct Sanger sequencing of PCR products may not be suitable for confirming findings in genomic regions affected by pseudogenes.
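The effect of discarding ambiguously mapped reads can be illustrated with a toy model. The sketch below is hypothetical and is not the Blueprint Genetics pipeline: it assumes each read comes with a list of candidate alignments given as (start position, score) pairs, whereas real aligners summarize this ambiguity as a mapping quality (MAPQ) value. A read whose best score is shared by several positions is dropped, so part of a duplicated region can end up with no coverage.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical toy representation: (reference start position, alignment score)
Alignment = Tuple[int, int]


def unique_best_hit(alignments: List[Alignment]) -> Optional[int]:
    """Return the start of the single best-scoring alignment, or None when the
    best score is shared by two or more positions (an ambiguous mapping)."""
    if not alignments:
        return None
    best = max(score for _, score in alignments)
    best_positions = [pos for pos, score in alignments if score == best]
    return best_positions[0] if len(best_positions) == 1 else None


def coverage(reads: Dict[str, List[Alignment]], read_len: int, ref_len: int) -> List[int]:
    """Per-base depth built only from uniquely placed reads."""
    depth = [0] * ref_len
    for alignments in reads.values():
        start = unique_best_hit(alignments)
        if start is None:
            continue  # ambiguous read: discarded, leaving a coverage gap
        for i in range(start, min(start + read_len, ref_len)):
            depth[i] += 1
    return depth


# Segments starting at 5 and 10 represent two copies of a duplicated sequence.
reads = {
    "r1": [(0, 5)],            # unique sequence
    "r2": [(5, 5), (10, 5)],   # matches both copies equally: discarded
    "r3": [(5, 5), (10, 3)],   # mismatches the second copy: placed at 5
}
depth = coverage(reads, read_len=5, ref_len=15)
print(depth)  # → [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
```

Read r2 is discarded because it fits both copies equally well, while r3 spans a base that differs between the copies and can be placed uniquely. This is one reason longer reads and paired-end information can improve mapping in duplicated regions.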

Of the DNA sequence in the target regions of the Blueprint Genetics panels, 3% has been annotated by UCSC as segmental duplications. Variant calling in these regions may be unreliable; at minimum, sensitivity is expected to be lower than in non-duplicated regions. The genomic coordinates of the segmental duplication regions and the affected protein-coding exons of genes included in the Blueprint Genetics sequence analysis panels are listed in the table below. Additionally, genes in the panels that are entirely affected by segmental duplications have been listed separately.
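A figure such as the 3% above can be computed as an interval overlap: the number of target bases that fall inside segmental duplications divided by the total number of target bases. The sketch below illustrates that calculation, assuming both sets are half-open (chrom, start, end) intervals as in BED files. The intervals are made up for illustration; in practice the segmental duplication set would come from the UCSC segmental duplication annotation. `merge` and `overlap_bp` are hypothetical helper names.

```python
from typing import List, Tuple

Interval = Tuple[str, int, int]  # (chrom, start, end), half-open as in BED


def merge(intervals: List[Interval]) -> List[Interval]:
    """Sort intervals and merge those that overlap or touch on the same chromosome."""
    merged: List[Interval] = []
    for chrom, start, end in sorted(intervals):
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            _, prev_start, prev_end = merged[-1]
            merged[-1] = (chrom, prev_start, max(prev_end, end))
        else:
            merged.append((chrom, start, end))
    return merged


def overlap_bp(a: List[Interval], b: List[Interval]) -> int:
    """Count bases shared by two interval sets using a two-pointer sweep."""
    a, b = merge(a), merge(b)
    total, i, j = 0, 0, 0
    while i < len(a) and j < len(b):
        chrom_a, start_a, end_a = a[i]
        chrom_b, start_b, end_b = b[j]
        if chrom_a != chrom_b:
            # Both lists are sorted by chromosome name; advance the one that is behind.
            if chrom_a < chrom_b:
                i += 1
            else:
                j += 1
            continue
        total += max(0, min(end_a, end_b) - max(start_a, start_b))
        # Advance whichever interval ends first; the other may still overlap more.
        if end_a < end_b:
            i += 1
        else:
            j += 1
    return total


# Made-up example intervals (not real panel or UCSC data).
targets = [("chr1", 100, 200), ("chr1", 150, 300), ("chr2", 0, 100)]
segdups = [("chr1", 250, 400), ("chr2", 90, 95)]

target_bp = sum(end - start for _, start, end in merge(targets))
fraction = overlap_bp(targets, segdups) / target_bp
print(f"{fraction:.1%}")  # → 18.3%
```

Merging first ensures that overlapping target intervals are not double-counted in either the numerator or the denominator.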

Pseudogene and other duplicated genomic regions partly overlap with difficult-to-validate regions. For validation and in-process quality control, we use gold-standard DNA samples together with high-confidence single-nucleotide variant and insertion/deletion call sets provided by Genome in a Bottle (GIAB) (1). Even in this best available variant call data, 12% of the genomic regions covered by the Blueprint Genetics panels are masked as unreliable and are therefore unavailable for validation studies or quality control. Regions are masked in the reference data set because of low coverage, discrepant genotype calls between alternative sequencing technologies, evidence of systematic bias (sequencing errors, local alignment problems, mapping problems, or abnormal allele balance), or co-localization with known genomic complexities (deletions, segmental duplications, and structural variants). Therefore, any estimate of the assay's analytical accuracy reflects only its accuracy in genomic regions that are not masked. We have used orthogonal confirmatory testing to demonstrate that masked regions outside segmental duplications can be accurately analyzed with our assays. These difficult-to-validate regions (excluding segmental duplications) are therefore included in the analysis of patient samples, and we have successfully identified pathogenic mutations in them. Difficult-to-validate regions in the Blueprint Genetics panels are taken into account when interpreting results, and confirmatory testing is performed to mitigate the risk of errors arising from the masked regions.
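Restricting accuracy estimates to unmasked regions means that both the truth set and the assay's calls are filtered to the reference's high-confidence intervals before they are compared. The sketch below shows this under simplifying assumptions: variants are (chrom, pos, ref, alt) tuples compared by exact match, positions and regions share one coordinate convention, and the confident regions do not overlap. Dedicated benchmarking tools such as hap.py or RTG vcfeval additionally reconcile different representations of the same variant. The function names and data here are hypothetical.

```python
from bisect import bisect_right
from typing import Dict, List, Set, Tuple

Variant = Tuple[str, int, str, str]   # (chrom, pos, ref, alt)
Region = Tuple[str, int, int]         # (chrom, start, end), half-open


def build_index(regions: List[Region]) -> Dict[str, List[Tuple[int, int]]]:
    """Group non-overlapping confident regions by chromosome, sorted by start."""
    index: Dict[str, List[Tuple[int, int]]] = {}
    for chrom, start, end in sorted(regions):
        index.setdefault(chrom, []).append((start, end))
    return index


def in_regions(index: Dict[str, List[Tuple[int, int]]], chrom: str, pos: int) -> bool:
    """True if pos lies inside one of the confident regions on chrom."""
    intervals = index.get(chrom, [])
    # Binary search for the last interval whose start is <= pos.
    k = bisect_right(intervals, (pos, float("inf"))) - 1
    return k >= 0 and intervals[k][0] <= pos < intervals[k][1]


def benchmark(truth: Set[Variant], calls: Set[Variant],
              confident: List[Region]) -> Tuple[float, float]:
    """Sensitivity and precision computed only inside confident regions."""
    index = build_index(confident)
    t = {v for v in truth if in_regions(index, v[0], v[1])}
    c = {v for v in calls if in_regions(index, v[0], v[1])}
    tp = len(t & c)
    sensitivity = tp / len(t) if t else float("nan")
    precision = tp / len(c) if c else float("nan")
    return sensitivity, precision


# Made-up example: chr1 positions 1000-1999 are masked (outside confident regions).
confident = [("chr1", 0, 1000), ("chr1", 2000, 3000)]
truth = {("chr1", 100, "A", "G"), ("chr1", 500, "C", "T"), ("chr1", 1500, "G", "A")}
calls = {("chr1", 100, "A", "G"), ("chr1", 1500, "G", "A"), ("chr1", 2500, "T", "C")}
print(benchmark(truth, calls, confident))  # → (0.5, 0.5)
```

The correctly called variant at position 1500 contributes to neither metric because that position is masked, which is why accuracy figures of this kind describe only the unmasked part of the target.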

  1. Zook JM, Chapman B, Wang J, Mittelman D, Hofmann O, Hide W, Salit M. Integrating human sequence data sets provides a resource of benchmark SNP and indel genotype calls. Nat Biotechnol 2014;32:246-51.
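Table. Segmental duplication regions overlapping protein-coding exons of genes in the Blueprint Genetics panels (columns: gene; genomic coordinates of the duplicated region; exon number(s) in the protein-coding consensus sequence)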
ABCC6 chr16:16295857-16318046 1-10
ABCD1 chrX:153006027-153009189 1-4
ACTB chr7:5567378-5569288 1-6
ACTG1 chr17:79477715-79479380 1-5
ACTN4 chr19:39219635-39220072 4-5
ADAMTSL2 chr9:136419478-136419815 1
ADIPOR1 chr1:202910700-202911346 6-7
AFG3L2 chr18:12344130-12344246 14
AGK chr7:141352586-141352724 1
ALG1 chr16:5127907-5134882 1-8
ALG11 chr13:52604280-52605241 1
ALMS1 chr2:73826527-73830431 3-7
ANKRD11 chr16:89334885-89335071 14
ANOS1 chrX:8501035-8507799 10-14
AP1S2 chrX:15848184-15851132 7-8
AP4S1 chr14:31562112-31562241 1
ARMC4 chr10:28250496-28284109 1-8
ARSE chrX:2852872-2856298 11-13
ASNS chr7:97481570-97498468 1-11
ATAD3A chr1:1447648-1469452 1-17
B3GAT3 chr11:62383172-62384819 3-5
BCAP31 chrX:152966391-152969549 5-8
BDP1 chr5:70860608-70860712 1
BMPR1A chr10:88683132-88683476 1-2
BRAF chr7:140433811-140434570 19
BRCA1 chr17:41276033-41276113 1
C2 chr6:31867702-31869082 20
CACNA1C chr12:2791115-2795435 3-6
CALM1 chr14:90870212-90871061 1-3
CD46 chr1:207930358-207934791 10-13
CEP290 chr12:88442959-88443191 55
CFH chr1:196658549-196659369 17-18
CFH chr1:196682864-196683047 13
CFH chr1:196712581-196716443 1-3
CHEK2 chr22:29083884-29091861 11-15
CISD2 chr4:103808497-103808587 1
CLCNKA chr1:16349114-16360153 1-19
CLCNKB chr1:16370987-16383411 1-19
CORO1A chr16:30199853-30199897 1
COX10 chr17:14082520-14095538 2-4
CP chr3:148891500-148891517 19
CRYBB2 chr22:25623819-25627739 1-3
CSF2RA chrX:1401596-1428482 1-13
CUBN chr10:16866973-16883046 61-67
CUBN chr10:16948201-16970302 41-50
CYCS chr7:25163319-25163738 1-2
CYP11B1 chr8:143955788-143961229 1-11
CYP21A2 chr6:32006199-32009227 1-11
DCLRE1C chr10:14974852-14981868 5-10
DHFR chr5:79924905-79924984 7
DICER1 chr14:95556834-95557000 26
DIS3L2 chr2:233194522-233201908 2-7
DNAH11 chr7:21923908-21924028 7
DNAH11 chr7:21940624-21940872 1
DNM1 chr9:131015379-131016993 1-2
DSE chr6:116756749-116758508 1
DUOX2 chr15:45402847-45404153 4-7
EGLN1 chr1:231502156-231502221 5
ELK1 chrX:47496227-47498737 2-5
ELMO2 chr20:45008891-45023121 1-10
ERCC6 chr10:50723243-50725167 5
ESPN chr1:6488285-6517432 2-13
EYS chr6:66005755-66040367 10-11
F8 chrX:154114408-154114577 25
FANCD2 chr3:10084733-10091189 28-33
FANCD2 chr3:10101977-10115046 17-26
FAR1 chr11:13750173-13750321 1
FHL1 chrX:135292029-135292184 1
FLG chr1:152275831-152286484 2
FLNC chr7:128496571-128498577 1-5
FOXD4 chr9:116799-118119 1
FXN chr9:71687527-71689806 7-8
GBA chr1:155204785-155210903 1-11
GH1 chr17:61994668-61996136 1-5
GJA1 chr6:121767993-121769142 1
GK chrX:30746848-30746859 1
GLUD1 chr10:88811507-88811627 17
GLUD1 chr10:88834307-88836413 6-8
GOSR2 chr17:45008464-45009565 4-5
GUSB chr7:65429309-65429445 11
HARS chr5:140052855-140052973 18
HBA1 chr16:226715-227410 1-3
HBA2 chr16:222911-223599 1-3
HNRNPA1 chr12:54677603-54678097 1-2
HPS1 chr10:100193696-100195529 2-4
HPS3 chr3:148891500-148891517 1
HSPD1 chr2:198351769-198353971 8-11
HYDIN chr16:70852244-71186686 6-61
IDS chrX:148584841-148585745 2-3
IFT122 chr3:129200372-129218911 12-17
IGLL1 chr22:23915452-23917272 2-3
KANSL1 chr17:44171925-44172067 2
KCTD1 chr18:24035706-24039889 8-9
KIF1C chr17:4925522-4927446 1-2
KRAS chr12:25362444-25362845 5
KRT14 chr17:39738686-39743086 1-8
KRT16 chr17:39766186-39768940 1-8
KRT17 chr17:39775845-39780761 1-7
KRT6A chr12:52881503-52886972 1-9
KRT6B chr12:52840973-52845862 1-9
KRT6C chr12:52862845-52867521 1-9
LAS1L chrX:64738719-64738873 13
LEFTY2 chr1:226125140-226128840 1-4
LRP5 chr11:68080182-68080273 24
LRP5 chr11:68125117-68174281 16-22
LYST chr1:235901308-235901894 34
MAT2A chr2:85770097-85770895 1-2
MID1 chrX:10417407-10417479 11
MOCS1 chr6:39874132-39874889 11
MSN chrX:64958386-64959755 1-3
MSX2 chr5:174156167-174156586 1
MYO5B chr18:47352840-47352993 40
NCF1 chr7:74191612-74203048 1-6
NEB chr2:152435850-152465190 80-103
NECAP1 chr12:8248196-8248686 1-2
NEFH chr22:29884837-29886692 1
NF1 chr17:29527439-29528503 51-53
NF1 chr17:29541468-29563039 33-49
NF1 chr17:29585361-29592357 26-30
NOTCH2 chr1:120539619-120612206 1-5
NXF5 chrX:101087239-101097764 1-17
OCLN chr5:68840730-68849498 1-2
OTOA chr16:21742157-21771861 1-3
PARN chr16:14530572-14530629 24
PBX1 chr1:164818407-164818639 1
PIGA chrX:15339627-15343274 4-6
PIGN chr18:59763080-59763183 20
PIK3CA chr3:178935997-178938945 9-13
PIK3CA chr3:178957783-178957852 1
PIK3CD chr1:9787004-9787104 1
PKD1 chr16:2147417-2185690 1-36
PKP2 chr12:32945357-32945665 13-14
PMS2 chr7:6017218-6027251 11-14
PMS2 chr7:6031603-6031688 9
PMS2 chr7:6042083-6048650 1-5
PNPT1 chr2:55863371-55863527 28
POLH chr6:43555008-43555226 8
PPT1 chr1:40537121-40537205 10
PRODH chr22:18900687-18910692 5-14
PRODH chr22:18923527-18923800 1
PROS1 chr3:93593088-93647641 2-16
PRPS1 chrX:106893169-106893262 1
PRSS1 chr7:142457335-142460871 1-5
PTEN chr10:89725043-89725229 1
RAD21 chr8:117859738-117859927 13
RBM8A chr1:145507666-145509211 1-5
RBPJ chr4:26431519-26432629 1-3
RDX chr11:110102559-110102758 13
RMND1 chr6:151766442-151766946 1
RNF216 chr7:5764954-5770448 5-7
RNF216 chr7:5800632-5800700 1
ROBO2 chr3:75986644-75986753 29
RPL15 chr3:23959350-23962334 1-3
SALL1 chr16:51171022-51176056 3-4
SBDS chr7:66453357-66460404 1-5
SDHA chr5:218470-256535 1-15
SHOX chrX:591590-619564 1-6
SLC25A15 chr13:41367362-41367417 7
SLC25A15 chr13:41382573-41383803 1-2
SLC33A1 chr3:155545998-155546166 6
SLC6A8 chrX:152954029-152960669 1-14
SMN1 chr5:70220930-70248259 1-8
SMN2 chr5:69345512-69372860 1-8
SOX2 chr3:181430572-181431102 1
SPTLC1 chr9:94871021-94871116 3
SRD5A3 chr4:56233760-56236258 1-2
SRP72 chr4:57367949-57368027 1
STAT5B chr17:40370168-40371860 5-8
STRC chr15:43891869-44010382 1-17
SYT14 chr1:210334073-210334387 1
TARDBP chr1:11082180-11083305 1
TBL1XR1 chr3:176743285-176743312 16
TBX20 chr7:35242041-35280649 5-8
TIMM8A chrX:100600648-100601648 3
TPM3 chr1:154130114-154130197 17
TPMT chr6:18130898-18131011 8
TRAPPC2 chrX:13732525-13732624 5
TRIP11 chr14:92436016-92436237 21
TSPEAR chr21:45993635-45994841 3
TTN chr2:179519171-179527539 175-192
TUBA1A chr12:49578792-49580616 3-5
TUBB2A chr6:3154100-3156386 2-4
TUBB2B chr6:3224988-3226045 4
TUBB3 chr16:90001136-90002212 1
TUBB4A chr19:6495174-6496232 4
TUBG1 chr17:40765664-40766675 2-5
TYR chr11:89017940-89028534 1-2
UBA5 chr3:132394091-132395370 1-4
UBE3A chr15:25615712-25616959 6
UNC93B1 chr11:67759016-67763355 9-11
USP18 chr22:18642938-18656609 1-8
VPS35 chr16:46705616-46717518 3-13
VWF chr12:6120782-6135212 22-33
WRN chr8:30941214-30942762 25-26
XIAP chrX:123040837-123041031 1
ZEB2 chr2:145147017-145147590 10
ZNF341 chr20:32378793-32379323 1
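Rows of the table above can be parsed to check whether a given variant position falls in a listed duplicated region. The sketch below uses a few rows copied verbatim from the table and assumes the coordinates are 1-based and inclusive, as in UCSC browser position strings; the document does not state the coordinate convention or reference build, so treat both as assumptions. `DupRegion`, `parse_rows`, and `lookup` are hypothetical names.

```python
import re
from dataclasses import dataclass
from typing import List

# One table row: gene, chrom:start-end, exon number or exon range.
ROW = re.compile(r"^(\S+) (chr[0-9XY]+):(\d+)-(\d+) (\d+(?:-\d+)?)$")


@dataclass
class DupRegion:
    gene: str
    chrom: str
    start: int  # assumed 1-based, inclusive
    end: int    # assumed inclusive
    exons: str


def parse_rows(lines: List[str]) -> List[DupRegion]:
    """Parse table rows; headers and malformed lines are skipped."""
    regions = []
    for line in lines:
        m = ROW.match(line.strip())
        if m:
            gene, chrom, start, end, exons = m.groups()
            regions.append(DupRegion(gene, chrom, int(start), int(end), exons))
    return regions


def lookup(regions: List[DupRegion], chrom: str, pos: int) -> List[DupRegion]:
    """Return every listed duplicated region that contains chrom:pos."""
    return [r for r in regions if r.chrom == chrom and r.start <= pos <= r.end]


TABLE_EXCERPT = """\
PMS2 chr7:6017218-6027251 11-14
PMS2 chr7:6031603-6031688 9
PMS2 chr7:6042083-6048650 1-5
SMN1 chr5:70220930-70248259 1-8
SMN2 chr5:69345512-69372860 1-8
CYP21A2 chr6:32006199-32009227 1-11
"""
regions = parse_rows(TABLE_EXCERPT.splitlines())
print([(r.gene, r.exons) for r in lookup(regions, "chr7", 6026000)])  # → [('PMS2', '11-14')]
print(lookup(regions, "chr7", 6030000))  # → []
```

A position that returns no match lies outside the listed duplicated regions, although it may still fall in other difficult-to-validate regions.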
Last modified: 04.20.2018