Blueprint Genetics’ approach to pseudogenes and other duplicated genomic regions

At Blueprint Genetics, we are transparent about the limitations of our technology and describe them in our comprehensive clinical statement so that you are aware of them. We are committed to resolving difficult-to-sequence regions that are hard to validate, interpret, and confirm.

Such regions include highly homologous and repetitive areas that are very challenging to map with any next-generation sequencing technology. This page lists these clinically relevant regions throughout the genome and Blueprint Genetics’ approach to handling them.

What is a pseudogene?

A pseudogene is a genomic region that has high sequence similarity (homology) to a known gene but is nonfunctional (ie, does not produce a functional final protein product). Usually, the DNA sequences of a pseudogene and of its functional parent gene are about 65% to 100% identical.

Pseudogenes tend to accumulate more variants than their parent genes because they are usually not under selective pressure.

What is a segmental duplication?

A segmental duplication is a region in the genome where the sequence is duplicated and the similarity between the parent region and the duplicated region is ≥90% over a length of ≥1 kilobase (≥1000 base pairs).

Pseudogenes are often located in regions of segmental duplication.

Why is it important to be aware of pseudogenes when ordering genetic testing?

Pseudogenes can complicate the analysis of sequence data generated from NGS because:

  • Segmental duplications can be indistinguishable from their parent region if a laboratory is using short-read NGS methods (75-300 bp reads depending on the chemistry and sequencing platform used).
  • High levels of sequence similarity complicate accurate read alignment (mapping), as shown in Figure 1. Sequence reads that map to several genomic positions are discarded in the analysis, which causes gaps in the sequence coverage.
  • If sequence reads containing a pseudogene-derived variant are mis-mapped to the parent gene, it may result in a false positive variant call.
  • If sequence reads containing a parent gene-derived variant are mis-mapped to the pseudogene, it may result in a false negative result.
  • Due to the high degree of sequence similarity, it can be difficult to design parent gene-specific Sanger sequencing primers. For this reason, we manually design all Sanger primers when confirming variants in regions with high homology and develop custom confirmation methods utilizing long-range PCR when necessary.

Figure 1. Mapping next-generation sequencing reads.

Confidence in read alignment decreases when sequence homology between the regions increases. Sequence reads are discarded when they align equally well to several genomic positions. The use of longer read length and paired-end sequencing improves read mapping.
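The behavior described in Figure 1 can be illustrated with a toy example. The sketch below is not our production aligner, and real aligners use far more sophisticated scoring. It only assumes that each read is compared with candidate loci of the same length, scored by mismatch count, and discarded when the best score is shared by more than one locus. The function names and sequences are hypothetical.

```python
from typing import Dict, Optional


def mismatches(read: str, reference: str) -> int:
    """Count positions where the read differs from a same-length reference."""
    return sum(a != b for a, b in zip(read, reference))


def place_read(read: str, loci: Dict[str, str]) -> Optional[str]:
    """Return the single best-matching locus, or None if the best score is tied."""
    scores = {name: mismatches(read, seq) for name, seq in loci.items()}
    best = min(scores.values())
    winners = [name for name, score in scores.items() if score == best]
    # A read that fits several loci equally well cannot be placed and is discarded.
    return winners[0] if len(winners) == 1 else None


# Hypothetical 12-bp windows (illustration only).
identical_loci = {"parent_gene": "ACGTTGCAAGCT", "pseudogene": "ACGTTGCAAGCT"}
diverged_loci = {"parent_gene": "ACGTTGCAAGCT", "pseudogene": "ACGTTGCAGGCT"}
read = "ACGTTGCAAGCT"

placed_when_identical = place_read(read, identical_loci)  # tie, so the read is discarded
placed_when_diverged = place_read(read, diverged_loci)    # one distinguishing base is enough
```

In this toy case a single distinguishing base lets the read be placed, which is why longer and paired-end reads, which are more likely to span such a base, improve mapping.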

How many clinically relevant genes are affected by regions of segmental duplication?

It is estimated that humans have >10,000 pseudogenes (GENCODE project). Some of the genes on Blueprint Genetics’ panels and whole exome sequencing tests have pseudogenes or other homologous regions in the genome. Variant calling from NGS data in these regions may be unreliable due to the issues listed above. The sensitivity to detect variants in genes with pseudogenes is expected to be lower than the sensitivity achieved in regions without pseudogenes or segmental duplications.

For transparency’s sake, the genomic coordinates of the affected regions are listed below in a table. This table shows all genes and exons on our panels that are affected by >90% homology based on segmental duplications data extracted from the UCSC Genome Browser Database. In addition, we highlight affected genes with an asterisk (*) on our website and clinical statements to ensure health care providers are aware of this important limitation.

Are all genes with associated pseudogenes difficult to accurately analyze?

The degree to which a pseudogene impacts the ability to accurately detect and map variants in its parent gene depends on the degree of similarity (homology) between the duplicated region and the parent gene. Generally, variants in genes sharing 90%-98% homology with a pseudogene are still accurately detected and mapped. When homology is greater than 98%, accurate detection and mapping of variants is still possible; however, it becomes more difficult and may require specialized methods. These more severely affected exons (>98% homology) are also highlighted in the table, along with exons that are completely removed from the analysis. See the question ‘What has Blueprint Genetics done to improve our ability to accurately detect variants in clinically relevant genes with pseudogenes?’
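The homology thresholds used in the table can be expressed as a small classifier. This is a minimal sketch that only encodes the >90% and >98% cut-offs stated on this page. The function name and the tier labels are hypothetical, and the sketch does not cover exons excluded from analysis, which are decided separately.

```python
def homology_tier(identity_percent: float) -> str:
    """Assign an exon to a tier based on its percent identity to a duplicated region."""
    if identity_percent > 98:
        return "severely affected"  # may require specialized methods
    if identity_percent > 90:
        return "affected"           # generally still accurately detected and mapped
    return "not listed"             # below the >90% threshold used for the table


tiers = [homology_tier(p) for p in (85.0, 95.0, 98.0, 99.5)]
```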

Analytic validation of genes with pseudogenes is difficult.

There are regions of the genome, including segmentally duplicated regions, that are masked in reference sample data sets. This makes analytic validation of these masked regions extremely difficult.

For both our analytic validation and in-test quality control, we use gold-standard reference DNA samples with high-quality datasets of single nucleotide variants, insertions, and deletions provided by the Genome in a Bottle Consortium (GIAB).

We recognize the difficult-to-validate regions in our panels. Our bioinformatics data analysis pipeline flags variants located in these regions, and confirmatory testing is performed for all variants that do not fulfill specific quality control criteria. This mitigates the risks associated with potential errors arising from the masked regions.

What has Blueprint Genetics done to improve our ability to accurately detect variants in clinically relevant genes with pseudogenes?

  • Customized target capture kit and chemistry increase specificity and our ability to discriminate between homologous regions
  • Paired-end 2 × 150 bp sequencing yields reads with high mapping quality, including in the majority of genomic regions with segmental duplication
  • Customized bioinformatics pipeline increases our ability to map reads accurately
  • Only sequence reads with a minimum mapping quality (MQ) of 20 are used in variant calling (ie, an estimated mapping accuracy of ≥99%)
  • Manual design of long-range PCR and Sanger sequencing primers to confirm variants in regions with high homology
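
The mapping quality threshold in the list above can be related to an error probability. The sketch below assumes that MQ follows the Phred-scaled convention of the SAM format, where MQ = -10 × log10(probability that the mapping position is wrong). How an individual aligner estimates that probability varies, and the read names and values are hypothetical.

```python
import math

MIN_MAPQ = 20  # threshold stated in the list above


def mapq_to_error_probability(mapq: float) -> float:
    """Convert a Phred-scaled mapping quality to the probability of a wrong placement."""
    return 10 ** (-mapq / 10)


def error_probability_to_mapq(p_wrong: float) -> float:
    """Convert a probability of wrong placement to a Phred-scaled mapping quality."""
    return -10 * math.log10(p_wrong)


def passes_filter(mapq: int, min_mapq: int = MIN_MAPQ) -> bool:
    """Keep only reads whose mapping quality meets the minimum."""
    return mapq >= min_mapq


# Hypothetical reads as (name, mapping quality) pairs.
reads = [("read_a", 60), ("read_b", 20), ("read_c", 3), ("read_d", 0)]
kept = [name for name, mapq in reads if passes_filter(mapq)]

p_wrong_at_threshold = mapq_to_error_probability(MIN_MAPQ)  # about 0.01, ie, 1%
```

Under this convention, a read at exactly MQ 20 has an estimated 1% chance of being placed at the wrong position, and every read kept by the filter is at or below that error level.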

What questions should you ask your lab of choice about segmentally duplicated regions?

  • What steps have you taken to resolve genes affected by segmental duplication and pseudogenes?
  • Do you use customized solutions to analyze affected regions?
  • Is information about pseudogenes and test limitations publicly available? If yes, where?
  • Do you highlight genes affected by pseudogenes when presenting panel content or whole exome sequencing?
  • Does the clinical report mention the possibility of pseudogene interference?
  • What minimum mapping quality do you require for reads used in variant calling?

Genes affected by segmental duplication

Gene | Genomic coordinates of duplicated region | Transcript | Affected exons (>90% homology) | Severely affected exons (>98% homology) | Exons excluded from analysis
ABCC6 | chr16:16295857-16318046 | NM_001171.5 | 1-9 | 1-9 |
ABCD1 | chrX:153006027-153009189 | NM_000033.3 | 7-10 | |
ACTB | chr7:5567378-5569288 | NM_001101.4 | 2-6 | |
ACTG1 | chr17:79477715-79479380 | NM_001614.3 | 2-6 | |
ACTN4 | chr19:39219635-39220072 | NM_004924.5 | 20-21 | |
ADAMTSL2 | chr9:136419478-136419815 | NM_014694.3 | 10 | 10 | 11-19
ADIPOR1 | chr1:202910700-202911346 | NM_015999.5 | 7-8 | |
AFG3L2 | chr18:12344130-12344246 | NM_006796.2 | 14 | |
AGK | chr7:141352586-141352724 | NM_018238.3 | 16 | |
ALG1 | chr16:5127907-5134882 | NM_019109.4 | 6-12 | |
ALMS1 | chr2:73826527-73830431 | NM_015120.4 | 17-21 | |
ANKRD11 | chr16:89334885-89335071 | NM_013275.5 | 13 | 13 |
ANOS1 | chrX:8501035-8507799 | NM_000216.3 | 10-14 | |
AP4S1 | chr14:31562112-31562241 | NM_001254729.1 | 6 | |
ARMC4 | chr10:28250496-28284109 | NM_018076.4 | 2-8, 10 | | 9
ARSE | chrX:2852872-2856298 | NM_000047.2 | 9-11 | |
ASNS | chr7:97481570-97498468 | NM_133436.3 | 3-13 | |
ATAD3A | chr1:1447648-1469452 | NM_001170535.2 | 1-16 | |
B3GAT3 | chr11:62383172-62384819 | NM_012200.3 | 3-5 | |
BCAP31 | chrX:152966391-152969549 | NM_005745.7 | 5-8 | |
BDP1 | chr5:70860608-70860712 | NM_018429.2 | 39 | |
BMPR1A | chr10:88683132-88683476 | NM_004329.2 | 12-13 | 12-13 |
BRAF | chr7:140433811-140434570 | NM_004333.4 | 18 | |
BRCA1 | chr17:41276033-41276113 | NM_007294.3 | 2 | |
C2 | chr6:31867702-31869082 | NM_001178063.1 | 1 | |
CACNA1C | chr12:2791115-2795435 | NM_001167625.1 | 43-45 | |
CALM1 | chr14:90870212-90871061 | NM_006888.4 | 4-6 | |
CD46 | chr1:207930358-207934791 | NM_002389.4 | 2-5 | |
CEP290 | chr12:88442959-88443191 | NM_025114.3 | 54 | |
CFH | chr1:196658549-196659369 | NM_000186.3 | 8-9 | |
CFH | chr1:196682864-196683047 | NM_000186.3 | 10 | |
CFH | chr1:196712581-196716443 | NM_000186.3 | 20-22 | |
CHEK2 | chr22:29083884-29091861 | NM_007194.3 | 11-15 | |
CISD2 | chr4:103808497-103808587 | NM_001008388.4 | 3 | |
CLCNKA | chr1:16349114-16360153 | NM_004070.3 | 2-20 | |
CLCNKB | chr1:16370987-16383411 | NM_000085.4 | 2-20 | |
CORO1A | chr16:30199853-30199897 | NM_007074.3 | 10 | 10 | 11
COX10 | chr17:14082520-14095538 | NM_001303.3 | 6 | 6 |
CP | chr3:148891500-148891517 | NM_000096.3 | 19 | |
CRYBB2 | chr22:25623819-25627739 | NM_000496.2 | 4-6 | |
CSF2RA | chrX:1401596-1428482 | NM_006140.4 | 3-13 | 3-13 |
CUBN | chr10:16866973-16883046 | NM_001081.3 | 61-67 | |
CUBN | chr10:16948201-16970302 | NM_001081.3 | 41-50 | |
CYCS | chr7:25163319-25163738 | NM_018947.5 | 2-3 | |
CYP11B1 | chr8:143955788-143961229 | NM_000497.3 | 1-9 | |
CYP21A2 | chr6:32006199-32009227 | NM_000500.7 | 1-10 | 1-10 |
DCLRE1C | chr10:14974852-14981868 | NM_001033855.2 | 4-9 | |
DHFR | chr5:79924905-79924984 | NM_000791.3 | 6 | 6 |
DICER1 | chr14:95556834-95557000 | NM_177438.2 | 27 | |
DIS3L2 | chr2:233194522-233201908 | NM_152383.4 | 15-21 | |
DNAH11 | chr7:21923908-21924028 | NM_001277115.1 | 76 | |
DNAH11 | chr7:21940624-21940872 | NM_001277115.1 | 82 | |
DNM1 | chr9:131015379-131016993 | NM_004408.3 | 21 | |
DSE | chr6:116756749-116758508 | NM_013352.3 | 6 | |
DUOX2 | chr15:45402847-45404153 | NM_014080.4 | 5-8 | |
EGLN1 | chr1:231502156-231502221 | NM_022051.2 | 5 | |
ELK1 | chrX:47496227-47498737 | NM_005229.4 | 3-6 | |
ELMO2 | chr20:45008891-45023121 | NM_133171.4 | 3-11 | |
ERCC6 | chr10:50723243-50725167 | NM_001277059.1 | 6 | |
ESPN | chr1:6488285-6517432 | NM_031475.2 | 2-12 | |
EYS | chr6:66005755-66040367 | NM_001142800.1 | 12 | |
F8 | chrX:154114408-154114577 | NM_019863.2 | 1 | 1 |
FANCD2 | chr3:10084733-10091189 | NM_033084.3 | 12-17 | |
FANCD2 | chr3:10101977-10115046 | NM_033084.3 | 19-28 | |
FAR1 | chr11:13750173-13750321 | NM_032228.5 | 12 | |
FHL1 | chrX:135292029-135292184 | NM_001449.4 | 7 | |
FLG | chr1:152275831-152286484 | NM_002016.1 | 3 | |
FLNC | chr7:128496571-128498577 | NM_001458.4 | 44-48 | |
FOXD4 | chr9:116799-118119 | NM_207305.4 | 1 | 1 |
FXN | chr9:71687527-71689806 | NM_000144.4 | 5 | |
GBA | chr1:155204785-155210903 | NM_000157.3 | 1-11 | |
GH1 | chr17:61994668-61996136 | NM_000515.4 | 1-5 | |
GJA1 | chr6:121767993-121769142 | NM_000165.4 | 2 | |
GK | chrX:30746848-30746859 | NM_000167.5 | 19 | 19 |
GLUD1 | chr10:88811507-88811627 | NM_005271.4 | 13 | |
GLUD1 | chr10:88834307-88836413 | NM_005271.4 | 2-4 | |
GOSR2 | chr17:45008464-45009565 | NM_004287.4 | 3-4 | |
GUSB | chr7:65429309-65429445 | NM_000181.3 | 11 | |
HBA1 | chr16:226715-227410 | NM_000558.4 | 1-3 | |
HBA2 | chr16:222911-223599 | NM_000517.4 | 1-3 | |
HNRNPA1 | chr12:54677603-54678097 | NM_031157.3 | 9-10 | |
HPS1 | chr10:100193696-100195529 | NM_000195.3 | 4-6 | |
HSPD1 | chr2:198351769-198353971 | NM_002156.4 | 9-12 | |
HYDIN | chr16:70852244-71186686 | NM_001270974.2 | 7, 9-11, 13-17, 19, 22, 24-25, 28-30, 32-34, 36, 38-44, 46, 48-49, 51, 53-56, 59-63, 65-69, 71-74, 76-77, 79-81, 84 | 7, 9-11, 13-17, 19, 22, 24-25, 28-30, 32-34, 36, 38-44, 46, 48-49, 51, 53-56, 59-63, 65-69, 71-74, 76-77, 79-81, 84 | 6, 8, 12, 18, 20-21, 23, 26-27, 31, 35, 37, 45, 47, 50, 52, 57-58, 64, 70, 75, 78, 82-83
IDS | chrX:148584841-148585745 | NM_000202.7 | 2-3 | 2-3 |
IFT122 | chr3:129200372-129218911 | NM_052985.3 | 15-20 | |
IGLL1 | chr22:23915452-23917272 | NM_020070.3 | 2-3 | |
KANSL1 | chr17:44171925-44172067 | NM_001193466.1 | 3 | |
KCTD1 | chr18:24035706-24039889 | NM_001258221.1 | 4-5 | |
KIF1C | chr17:4925522-4927446 | NM_006612.5 | 22-23 | |
KRAS | chr12:25362444-25362845 | NM_033360.2 | 6 | |
KRT14 | chr17:39738686-39743086 | NM_000526.4 | 1-8 | |
KRT16 | chr17:39766186-39768940 | NM_005557.3 | 1-8 | |
KRT17 | chr17:39775845-39780761 | NM_000422.2 | 1-8 | |
KRT6A | chr12:52881503-52886972 | NM_005554.3 | 1-9 | |
KRT6B | chr12:52840973-52845862 | NM_005555.3 | 1-9 | |
KRT6C | chr12:52862845-52867521 | NM_173086.4 | 1-9 | |
LEFTY2 | chr1:226125140-226128840 | NM_003240.4 | 1-4 | |
LRP5 | chr11:68080182-68080273 | NM_002335.2 | 1 | |
LRP5 | chr11:68125117-68174281 | NM_002335.2 | 3-9 | |
MAT2A | chr2:85770097-85770895 | NM_005911.5 | 8-9 | |
MID1 | chrX:10417407-10417479 | NM_000381.3 | 10 | |
MOCS1 | chr6:39874132-39874889 | NM_005943.5 | 10 | |
MSN | chrX:64958386-64959755 | NM_002444.2 | 11-13 | |
MSX2 | chr5:174156167-174156586 | NM_002449.4 | 2 | |
MYO5B | chr18:47352840-47352993 | NM_001080467.2 | 40 | |
NCF1 | chr7:74191612-74203048 | NM_000265.5 | 2-4, 6-7, 10 | 2-4, 6-7, 10 | 1, 5, 8, 9, 11
NEB | chr2:152435850-152465190 | NM_001271208.1 | 82-105 | 82-105 |
NECAP1 | chr12:8248196-8248686 | NM_015509.3 | 7-8 | |
NEFH | chr22:29884837-29886692 | NM_021076.3 | 4 | |
NF1 | chr17:29527439-29528503 | NM_000267.3 | 9-11 | |
NF1 | chr17:29541468-29563039 | NM_000267.3 | 13-29 | |
NF1 | chr17:29585361-29592357 | NM_000267.3 | 31-35 | |
NOTCH2 | chr1:120539619-120612206 | NM_024408.3 | 1-4 | 1-4 |
NXF5 | chrX:101087239-101097764 | NM_032946.2 | 3-16 | |
OCLN | chr5:68840730-68849498 | NM_002538.3 | 6, 9 | 6, 9 | 5, 7, 8
OTOA | chr16:21742157-21771861 | NM_144672.3 | 20-21, 28 | 20-21, 28 | 22-27
PARN | chr16:14530572-14530629 | NM_002582.3 | 24 | |
PBX1 | chr1:164818407-164818639 | NM_002585.3 | 9 | |
PIGA | chrX:15339627-15343274 | NM_002641.3 | 4-6 | |
PIGN | chr18:59763080-59763183 | NM_012327.5 | 22 | |
PIK3CA | chr3:178935997-178938945 | NM_006218.2 | 10-14 | 10-14 |
PIK3CD | chr1:9787004-9787104 | NM_005026.3 | 24 | |
PKD1 | chr16:2147417-2185690 | NM_001009944.2 | 1-33 | 1 |
PKP2 | chr12:32945357-32945665 | NM_004572.3 | 13-14 | |
PMS2 | chr7:6017218-6027251 | NM_000535.6 | 11-14 | 11-14 | 15
PMS2 | chr7:6031603-6031688 | NM_000535.6 | 9 | |
PMS2 | chr7:6042083-6048650 | NM_000535.6 | 1-5 | |
PNPT1 | chr2:55863371-55863527 | NM_033109.4 | 28 | |
POLH | chr6:43555008-43555226 | NM_006502.2 | 4 | |
PRODH | chr22:18900687-18910692 | NM_016335.4 | 6-15 | |
PRODH | chr22:18923527-18923800 | NM_016335.4 | 2 | |
PROS1 | chr3:93593088-93647641 | NM_000313.3 | 2-15 | |
PRPS1 | chrX:106893169-106893262 | NM_002764.3 | 7 | |
PRSS1 | chr7:142457335-142460871 | NM_002769.4 | 1-5 | |
PTEN | chr10:89725043-89725229 | NM_000314.4 | 9 | 9 |
RAD21 | chr8:117859738-117859927 | NM_006265.2 | 14 | |
RBM8A | chr1:145507666-145509211 | NM_005105.4 | 1-2, 4-6 | 1-2, 4-6 | 3
RBPJ | chr4:26431519-26432629 | NM_005349.3 | 10-12 | |
RDX | chr11:110102559-110102758 | NM_002906.3 | 14 | |
RMND1 | chr6:151766442-151766946 | NM_017909.3 | 2 | |
RNF216 | chr7:5764954-5770448 | NM_207111.3 | 6-8 | |
RNF216 | chr7:5800632-5800700 | NM_207111.3 | 2 | |
RPL15 | chr3:23959350-23962334 | NM_002948.4 | 2-4 | |
SALL1 | chr16:51171022-51176056 | NM_002968.2 | 2-3 | |
SBDS | chr7:66453357-66460404 | NM_016038.2 | 1-5 | |
SDHA | chr5:218470-256535 | NM_004168.3 | 1-15 | |
SHOX | chrX:591590-619564 | NM_000451.3 | 2-6 | 2-6 |
SLC25A15 | chr13:41367362-41367417 | NM_014252.3 | 2 | |
SLC25A15 | chr13:41382573-41383803 | NM_014252.3 | 6-7 | |
SLC33A1 | chr3:155545998-155546166 | NM_004733.3 | 6 | |
SLC6A8 | chrX:152954029-152960669 | NM_005629.3 | 1-13 | |
SMN1 | chr5:70220930-70248259 | NM_000344.3 | 1-8 | 1-8 |
SMN2 | chr5:69345512-69372860 | NM_022875.2 | 1-8 | 1-8 |
SOX2 | chr3:181430572-181431102 | NM_003106.3 | 1 | |
SPTLC1 | chr9:94871021-94871116 | NM_006415.3 | 3 | |
SRD5A3 | chr4:56233760-56236258 | NM_024592.4 | 4-5 | |
SRP72 | chr4:57367949-57368027 | NM_006947.3 | 19 | |
STAT5B | chr17:40370168-40371860 | NM_012448.3 | 6-9 | |
STRC | chr15:43891869-44010382 | NM_153700.2 | 19-29 | 19-29 | 1-18
SYT14 | chr1:210334073-210334387 | NM_001146261.2 | 10 | |
TARDBP | chr1:11082180-11083305 | NM_007375.3 | 6 | |
TBL1XR1 | chr3:176743285-176743312 | NM_024665.4 | 16 | |
TBX20 | chr7:35242041-35280649 | NM_001077653.2 | 5-8 | |
TIMM8A | chrX:100600648-100601648 | NM_004085.3 | 2 | |
TPM3 | chr1:154130114-154130197 | NM_153649.3 | 8 | |
TPMT | chr6:18130898-18131011 | NM_000367.4 | 9 | |
TRAPPC2 | chrX:13732525-13732624 | NM_001011658.3 | 6 | |
TRIP11 | chr14:92436016-92436237 | NM_004239.4 | 21 | |
TTN | chr2:179519171-179527539 | NM_001267550.1 | 175-192 | 175-192 |
TUBA1A | chr12:49578792-49580616 | NM_006009.3 | 2-4 | |
TUBB2A | chr6:3154100-3156386 | NM_001069.2 | 2-4 | 4 |
TUBB2B | chr6:3224988-3226045 | NM_178012.4 | 4 | 4 |
TUBB3 | chr16:90001136-90002212 | NM_006086.3 | 4 | |
TUBB4A | chr19:6495174-6496232 | NM_006087.3 | 4 | |
TUBG1 | chr17:40765664-40766675 | NM_001070.4 | 7-10 | |
TYR | chr11:89017940-89028534 | NM_000372.4 | 4-5 | |
UBA5 | chr3:132394091-132395370 | NM_024818.3 | 9-12 | |
UBE3A | chr15:25615712-25616959 | NM_130838.2 | 3 | |
UNC93B1 | chr11:67759016-67763355 | NM_030930.3 | 9-11 | |
USP18 | chr22:18642938-18656609 | NM_017414.3 | 3-10 | 3-10 | 11
VPS35 | chr16:46705616-46717518 | NM_018206.5 | 2-12 | |
VWF | chr12:6120782-6135212 | NM_000552.3 | 23-34 | |
WRN | chr8:30941214-30942762 | NM_000553.4 | 10-11 | |
XIAP | chrX:123040837-123041031 | NM_001167.3 | 7 | |
ZEB2 | chr2:145147017-145147590 | NM_014795.3 | 10 | |
ZNF341 | chr20:32378793-32379323 | NM_032819.4 | 15 | |
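
A table like the one above lends itself to a simple lookup that tells whether a variant position falls inside a listed duplicated region. The sketch below is illustrative only. It copies three gene and coordinate pairs from the table, assumes the coordinates are 1-based and inclusive (this page does not state the convention), and uses hypothetical names such as `Region` and `genes_at`.

```python
from typing import List, NamedTuple


class Region(NamedTuple):
    gene: str
    chrom: str
    start: int
    end: int


def parse_region(gene: str, coords: str) -> Region:
    """Parse a coordinate string such as 'chr17:41276033-41276113'."""
    chrom, span = coords.split(":")
    start, end = span.split("-")
    return Region(gene, chrom, int(start), int(end))


# A few rows copied from the table above; a full lookup would load every row.
REGIONS = [
    parse_region("BRCA1", "chr17:41276033-41276113"),
    parse_region("PTEN", "chr10:89725043-89725229"),
    parse_region("SMN1", "chr5:70220930-70248259"),
]


def genes_at(chrom: str, pos: int, regions: List[Region] = REGIONS) -> List[str]:
    """Return genes whose listed duplicated region contains the position.

    Assumes 1-based, inclusive coordinates (an assumption, not stated in the table).
    """
    return [r.gene for r in regions if r.chrom == chrom and r.start <= pos <= r.end]


hits = genes_at("chr17", 41276050)    # inside the listed BRCA1 region
no_hits = genes_at("chr17", 1000000)  # outside every region in this small list
```

A variant that falls inside a listed region is one that a laboratory might route to additional confirmation.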

Genes affected by segmental duplication (prior to August 2019 update)

Last modified: August 19, 2020