At Blueprint Genetics, we are transparent about the limitations of our technology and make sure you are aware of them by describing them in our comprehensive clinical statement. We are committed to resolving difficult-to-sequence regions that are hard to validate, interpret, and confirm.
Such regions include highly homologous and repetitive areas that are very challenging to map with any next-generation sequencing technology. This page lists these clinically relevant regions throughout the genome and Blueprint Genetics’ approach to handling them.
What is a pseudogene?
A pseudogene is a genomic region that has high sequence similarity (homology) to a known gene but is nonfunctional (ie, does not produce a functional final protein product). Usually, the DNA sequences of a pseudogene and of its functional parent gene are about 65% to 100% identical.
Pseudogenes tend to accumulate more variants than their parent genes because they are usually not under selective pressure.
What is a segmental duplication?
A segmental duplication is a region in the genome where the sequence is duplicated and the similarity between the parent region and the duplicated region is ≥90% over a length of ≥1 kilobase (≥1000 base pairs).
Pseudogenes are often located in regions of segmental duplication.
Why is it important to be aware of pseudogenes when ordering genetic testing?
Pseudogenes can complicate the analysis of sequence data generated from NGS because:
- Segmental duplications can be indistinguishable from their parent region if a laboratory is using short-read NGS methods (75-300 bp reads depending on the chemistry and sequencing platform used).
- High levels of sequence similarity complicate accurate read alignment (mapping), as shown in Figure 1. Sequence reads that map to several genomic positions are discarded from the analysis, which causes gaps in the sequence coverage.
- Sequence reads containing a pseudogene-derived variant that are mis-mapped to the parent gene may produce a false-positive variant call.
- Sequence reads containing a parent gene-derived variant that are mis-mapped to the pseudogene may produce a false-negative result.
- Due to the high degree of sequence similarity, it can be difficult to design parent gene-specific Sanger sequencing primers.
- To address this, we manually design all Sanger primers when confirming variants in regions with high homology, and we develop custom confirmation methods using long-range PCR when necessary.
Figure 1. Mapping next-generation sequencing reads.
Confidence in read alignment decreases as sequence homology between regions increases. Sequence reads are discarded when they align equally well to several genomic positions. Longer reads and paired-end sequencing improve read mapping.
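The effect of read length on mapping ambiguity can be illustrated with a toy example. The sketch below is not how a production aligner works: it uses exact string matching on invented sequences, whereas real aligners score mismatches and gaps. The sequences and the function name `find_exact_hits` are hypothetical and chosen only for illustration.

```python
def find_exact_hits(read: str, reference: str) -> list[int]:
    """Return every 0-based position where the read matches the reference exactly."""
    hits = []
    start = reference.find(read)
    while start != -1:
        hits.append(start)
        start = reference.find(read, start + 1)
    return hits


# Invented toy sequences: a "parent gene" and a "pseudogene" copy
# that differ at a single base (index 8: A in the parent, T in the copy).
parent = "ACGTTGCAAGGCTA"
pseudo = "ACGTTGCATGGCTA"
reference = parent + "NNNNNN" + pseudo

short_read = "ACGTTGCA"      # lies entirely within the identical stretch
long_read = "ACGTTGCAAGGC"   # extends across the distinguishing base

short_hits = find_exact_hits(short_read, reference)
long_hits = find_exact_hits(long_read, reference)

# A read with more than one equally good placement is ambiguous.
print("short read placements:", short_hits)
print("long read placements:", long_hits)
```

In this toy case the short read fits both copies equally well, while the longer read covers the one base that tells the copies apart and therefore has a single placement.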
How many clinically relevant genes are affected by regions of segmental duplication?
It is estimated that humans have >10,000 pseudogenes (GENCODE project). Some of the genes on Blueprint Genetics’ panels and whole exome sequencing tests have pseudogenes or other homologous regions in the genome. Variant calling from NGS data from these regions may be unreliable due to the issues listed above. The sensitivity to detect variants in genes with pseudogenes is expected to be lower than the sensitivity achieved from regions without pseudogenes or segmental duplications.
For transparency’s sake, the genomic coordinates of the affected regions are listed in the table below. The table shows all genes and exons on our panels that are affected by >90% homology, based on segmental duplication data extracted from the UCSC Genome Browser Database. In addition, we highlight affected genes with an asterisk (*) on our website and clinical statements to ensure health care providers are aware of this important limitation.
Are all genes with associated pseudogenes difficult to accurately analyze?
The degree to which a pseudogene impacts the ability to accurately detect and map variants in its parent gene depends on the degree of similarity (homology) between the duplicated region and the parent gene. Generally, variants in genes sharing 90%-98% homology with a pseudogene are still accurately detected and mapped. When homology is greater than 98%, accurate detection and mapping of variants is still possible, but it becomes more difficult and may require specialized methods. These more severely affected exons (>98% homology) are also highlighted in the table, along with exons that are completely removed from the analysis. See question: ‘What has Blueprint Genetics done to improve our ability to accurately detect variants in clinically relevant genes with pseudogenes?’
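The homology tiers described in this answer can be written as a small helper. This is a minimal sketch: the function name and tier labels are hypothetical, and the handling of values at exactly 90% and 98% is an assumption, since the page gives the tiers as ">90%" and ">98%" without defining the boundaries further.

```python
def homology_tier(percent_identity: float) -> str:
    """Classify an exon by its sequence identity to a duplicated region.

    Thresholds follow the table columns on this page (>90% and >98%).
    Boundary handling is an assumption made for this sketch.
    """
    if not 0 <= percent_identity <= 100:
        raise ValueError("percent_identity must be between 0 and 100")
    if percent_identity > 98:
        return "severely affected"   # may require specialized methods
    if percent_identity > 90:
        return "affected"            # generally still detected and mapped
    return "not listed"              # below the table's inclusion threshold


for identity in (80.0, 95.0, 99.2):
    print(identity, "->", homology_tier(identity))
```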
Analytic validation of genes with pseudogenes is difficult.
There are regions of the genome, including segmentally duplicated regions, that are masked in reference sample data sets. This makes analytic validation of these masked regions extremely difficult.
For both our analytic validation and in-test quality control, we use gold-standard reference DNA samples with high-quality single nucleotide variant and insertion/deletion datasets provided by the Genome in a Bottle Consortium (GIAB).
We recognize the difficult-to-validate regions in our panels. Our bioinformatics data analysis pipeline flags variants located in these regions, and confirmatory testing is performed for all variants that do not fulfill specific quality control criteria. This mitigates the risks associated with potential errors arising from the masked regions.
What has Blueprint Genetics done to improve our ability to accurately detect variants in clinically relevant genes with pseudogenes?
- Customized target capture kit and chemistry increase specificity and our ability to discriminate between homologous regions
- Paired-end 2 × 150 bp sequencing yields reads with high mapping quality, including in the majority of genomic regions with segmental duplication
- Customized bioinformatics pipeline increases our ability to map reads accurately
- Only sequence reads with a minimum mapping quality (MQ) of 20 are used in variant calling (ie, an estimated probability of at least 99% that the read is mapped to the correct position)
- Manual design of long-range PCR and Sanger sequencing primers to confirm variants in regions with high homology
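To make the mapping quality threshold concrete, the sketch below converts a mapping quality value into an estimated probability of misalignment and applies a minimum-MQ filter. It assumes the Phred-scaled convention used in the SAM format, where MQ = -10 · log10(probability that the mapping position is wrong); how any particular aligner or pipeline estimates that probability is not described on this page. The function names and the example reads are hypothetical.

```python
def mapq_to_error_probability(mapq: float) -> float:
    """Estimated probability that a read is mapped to the wrong position.

    Assumes the Phred-scaled SAM convention: MQ = -10 * log10(P_wrong).
    """
    return 10 ** (-mapq / 10)


def passes_mapq_filter(mapq: int, threshold: int = 20) -> bool:
    """Keep a read only if its mapping quality meets the minimum threshold."""
    return mapq >= threshold


# Invented example reads: (read name, mapping quality).
reads = [("read_a", 60), ("read_b", 20), ("read_c", 3), ("read_d", 0)]

kept = [name for name, mapq in reads if passes_mapq_filter(mapq)]

print("P(wrong) at MQ 20:", mapq_to_error_probability(20))
print("reads kept:", kept)
```

Under this convention, a threshold of 20 corresponds to an estimated 1% chance of misalignment per read, and reads placed ambiguously between a gene and its pseudogene typically receive very low values and are filtered out.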
What questions should you ask your lab of choice about segmentally duplicated regions?
- What steps have you taken to resolve genes affected by segmental duplication and pseudogenes?
- Do you use customized solutions to analyze affected regions?
- Is information about pseudogenes and test limitations publicly available? If yes, where?
- Do you highlight genes affected by pseudogenes when presenting panel content or whole exome sequencing?
- Does the clinical report mention the possibility of pseudogene interference?
- What minimum mapping quality do you require for reads used in variant calling?
Genes affected by segmental duplication
Gene | Genomic coordinates of duplicated region | Transcript | Affected exons (>90% homology) | Severely affected exons (>98% homology) | Exons excluded from analysis |
ABCC6 | chr16:16295857-16318046 | NM_001171.5 | 1-9 | 1-9 | |
ABCD1 | chrX:153006027-153009189 | NM_000033.3 | 7-10 | ||
ACTB | chr7:5567378-5569288 | NM_001101.4 | 2-6 | ||
ACTG1 | chr17:79477715-79479380 | NM_001614.3 | 2-6 | ||
ACTN4 | chr19:39219635-39220072 | NM_004924.5 | 20-21 | ||
ADAMTSL2 | chr9:136419478-136419815 | NM_014694.3 | 10 | 10 | 11-19 |
ADIPOR1 | chr1:202910700-202911346 | NM_015999.5 | 7-8 | ||
AFG3L2 | chr18:12344130-12344246 | NM_006796.2 | 14 | ||
AGK | chr7:141352586-141352724 | NM_018238.3 | 16 | ||
ALG1 | chr16:5127907-5134882 | NM_019109.4 | 6-12 | ||
ALMS1 | chr2:73826527-73830431 | NM_015120.4 | 17-21 | ||
ANKRD11 | chr16:89334885-89335071 | NM_013275.5 | 13 | 13 | |
ANOS1 | chrX:8501035-8507799 | NM_000216.3 | 10-14 | ||
AP4S1 | chr14:31562112-31562241 | NM_001254729.1 | 6 | ||
ARMC4 | chr10:28250496-28284109 | NM_018076.4 | 2-8, 10 | 9 | |
ARSE | chrX:2852872-2856298 | NM_000047.2 | 9-11 | ||
ASNS | chr7:97481570-97498468 | NM_133436.3 | 3-13 | ||
ATAD3A | chr1:1447648-1469452 | NM_001170535.2 | 1-16 | ||
B3GAT3 | chr11:62383172-62384819 | NM_012200.3 | 3-5 | ||
BCAP31 | chrX:152966391-152969549 | NM_005745.7 | 5-8 | ||
BDP1 | chr5:70860608-70860712 | NM_018429.2 | 39 | ||
BMPR1A | chr10:88683132-88683476 | NM_004329.2 | 12-13 | 12-13 | |
BRAF | chr7:140433811-140434570 | NM_004333.4 | 18 | ||
BRCA1 | chr17:41276033-41276113 | NM_007294.3 | 2 | ||
C2 | chr6:31867702-31869082 | NM_001178063.1 | 1 | ||
CACNA1C | chr12:2791115-2795435 | NM_001167625.1 | 43-45 | ||
CALM1 | chr14:90870212-90871061 | NM_006888.4 | 4-6 | ||
CD46 | chr1:207930358-207934791 | NM_002389.4 | 2-5 | ||
CEP290 | chr12:88442959-88443191 | NM_025114.3 | 54 | ||
CFH | chr1:196658549-196659369 | NM_000186.3 | 8-9 | ||
CFH | chr1:196682864-196683047 | NM_000186.3 | 10 | ||
CFH | chr1:196712581-196716443 | NM_000186.3 | 20-22 | ||
CHEK2 | chr22:29083884-29091861 | NM_007194.3 | 11-15 | ||
CISD2 | chr4:103808497-103808587 | NM_001008388.4 | 3 | ||
CLCNKA | chr1:16349114-16360153 | NM_004070.3 | 2-20 | ||
CLCNKB | chr1:16370987-16383411 | NM_000085.4 | 2-20 | ||
CORO1A | chr16:30199853-30199897 | NM_007074.3 | 10 | 10 | 11 |
COX10 | chr17:14082520-14095538 | NM_001303.3 | 6 | 6 | |
CP | chr3:148891500-148891517 | NM_000096.3 | 19 | ||
CRYBB2 | chr22:25623819-25627739 | NM_000496.2 | 4-6 | ||
CSF2RA | chrX:1401596-1428482 | NM_006140.4 | 3-13 | 3-13 | |
CUBN | chr10:16866973-16883046 | NM_001081.3 | 61-67 | ||
CUBN | chr10:16948201-16970302 | NM_001081.3 | 41-50 | ||
CYCS | chr7:25163319-25163738 | NM_018947.5 | 2-3 | ||
CYP11B1 | chr8:143955788-143961229 | NM_000497.3 | 1-9 | ||
CYP21A2 | chr6:32006199-32009227 | NM_000500.7 | 1-10 | 1-10 | |
DCLRE1C | chr10:14974852-14981868 | NM_001033855.2 | 4-9 | ||
DHFR | chr5:79924905-79924984 | NM_000791.3 | 6 | 6 | |
DICER1 | chr14:95556834-95557000 | NM_177438.2 | 27 | ||
DIS3L2 | chr2:233194522-233201908 | NM_152383.4 | 15-21 | ||
DNAH11 | chr7:21923908-21924028 | NM_001277115.1 | 76 | ||
DNAH11 | chr7:21940624-21940872 | NM_001277115.1 | 82 | ||
DNM1 | chr9:131015379-131016993 | NM_004408.3 | 21 | ||
DSE | chr6:116756749-116758508 | NM_013352.3 | 6 | ||
DUOX2 | chr15:45402847-45404153 | NM_014080.4 | 5-8 | ||
EGLN1 | chr1:231502156-231502221 | NM_022051.2 | 5 | ||
ELK1 | chrX:47496227-47498737 | NM_005229.4 | 3-6 | ||
ELMO2 | chr20:45008891-45023121 | NM_133171.4 | 3-11 | ||
ERCC6 | chr10:50723243-50725167 | NM_001277059.1 | 6 | ||
ESPN | chr1:6488285-6517432 | NM_031475.2 | 2-12 | ||
EYS | chr6:66005755-66040367 | NM_001142800.1 | 12 | ||
F8 | chrX:154114408-154114577 | NM_019863.2 | 1 | 1 | |
FANCD2 | chr3:10084733-10091189 | NM_033084.3 | 12-17 | ||
FANCD2 | chr3:10101977-10115046 | NM_033084.3 | 19-28 | ||
FAR1 | chr11:13750173-13750321 | NM_032228.5 | 12 | ||
FHL1 | chrX:135292029-135292184 | NM_001449.4 | 7 | ||
FLG | chr1:152275831-152286484 | NM_002016.1 | 3 | ||
FLNC | chr7:128496571-128498577 | NM_001458.4 | 44-48 | ||
FOXD4 | chr9:116799-118119 | NM_207305.4 | 1 | 1 | |
FXN | chr9:71687527-71689806 | NM_000144.4 | 5 | ||
GBA | chr1:155204785-155210903 | NM_000157.3 | 1-11 | ||
GH1 | chr17:61994668-61996136 | NM_000515.4 | 1-5 | ||
GJA1 | chr6:121767993-121769142 | NM_000165.4 | 2 | ||
GK | chrX:30746848-30746859 | NM_000167.5 | 19 | 19 | |
GLUD1 | chr10:88811507-88811627 | NM_005271.4 | 13 | ||
GLUD1 | chr10:88834307-88836413 | NM_005271.4 | 2-4 | ||
GOSR2 | chr17:45008464-45009565 | NM_004287.4 | 3-4 | ||
GUSB | chr7:65429309-65429445 | NM_000181.3 | 11 | ||
HBA1 | chr16:226715-227410 | NM_000558.4 | 1-3 | ||
HBA2 | chr16:222911-223599 | NM_000517.4 | 1-3 | ||
HNRNPA1 | chr12:54677603-54678097 | NM_031157.3 | 9-10 | ||
HPS1 | chr10:100193696-100195529 | NM_000195.3 | 4-6 | ||
HSPD1 | chr2:198351769-198353971 | NM_002156.4 | 9-12 | ||
HYDIN | chr16:70852244-71186686 | NM_001270974.2 | 7, 9-11, 13-17, 19, 22, 24-25, 28-30, 32-34, 36, 38-44, 46, 48-49, 51, 53-56, 59-63, 65-69, 71-74, 76-77, 79-81, 84 | 7, 9-11, 13-17, 19, 22, 24-25, 28-30, 32-34, 36, 38-44, 46, 48-49, 51, 53-56, 59-63, 65-69, 71-74, 76-77, 79-81, 84 | 6, 8, 12, 18, 20-21, 23, 26-27, 31, 35, 37, 45, 47, 50, 52, 57-58, 64, 70, 75, 78, 82-83 |
IDS | chrX:148584841-148585745 | NM_000202.7 | 2-3 | 2-3 | |
IFT122 | chr3:129200372-129218911 | NM_052985.3 | 15-20 | ||
IGLL1 | chr22:23915452-23917272 | NM_020070.3 | 2-3 | ||
KANSL1 | chr17:44171925-44172067 | NM_001193466.1 | 3 | ||
KCTD1 | chr18:24035706-24039889 | NM_001258221.1 | 4-5 | ||
KIF1C | chr17:4925522-4927446 | NM_006612.5 | 22-23 | ||
KRAS | chr12:25362444-25362845 | NM_033360.2 | 6 | ||
KRT14 | chr17:39738686-39743086 | NM_000526.4 | 1-8 | ||
KRT16 | chr17:39766186-39768940 | NM_005557.3 | 1-8 | ||
KRT17 | chr17:39775845-39780761 | NM_000422.2 | 1-8 | ||
KRT6A | chr12:52881503-52886972 | NM_005554.3 | 1-9 | ||
KRT6B | chr12:52840973-52845862 | NM_005555.3 | 1-9 | ||
KRT6C | chr12:52862845-52867521 | NM_173086.4 | 1-9 | ||
LEFTY2 | chr1:226125140-226128840 | NM_003240.4 | 1-4 | ||
LRP5 | chr11:68080182-68080273 | NM_002335.2 | 1 | ||
LRP5 | chr11:68125117-68174281 | NM_002335.2 | 3-9 | ||
MAT2A | chr2:85770097-85770895 | NM_005911.5 | 8-9 | ||
MID1 | chrX:10417407-10417479 | NM_000381.3 | 10 | ||
MOCS1 | chr6:39874132-39874889 | NM_005943.5 | 10 | ||
MSN | chrX:64958386-64959755 | NM_002444.2 | 11-13 | ||
MSX2 | chr5:174156167-174156586 | NM_002449.4 | 2 | ||
MYO5B | chr18:47352840-47352993 | NM_001080467.2 | 40 | ||
NCF1 | chr7:74191612-74203048 | NM_000265.5 | 2-4, 6-7, 10 | 2-4, 6-7, 10 | 1, 5, 8, 9, 11 |
NEB | chr2:152435850-152465190 | NM_001271208.1 | 82-105 | 82-105 | |
NECAP1 | chr12:8248196-8248686 | NM_015509.3 | 7-8 | ||
NEFH | chr22:29884837-29886692 | NM_021076.3 | 4 | ||
NF1 | chr17:29527439-29528503 | NM_000267.3 | 9-11 | ||
NF1 | chr17:29541468-29563039 | NM_000267.3 | 13-29 | ||
NF1 | chr17:29585361-29592357 | NM_000267.3 | 31-35 | ||
NOTCH2 | chr1:120539619-120612206 | NM_024408.3 | 1-4 | 1-4 | |
NXF5 | chrX:101087239-101097764 | NM_032946.2 | 3-16 | ||
OCLN | chr5:68840730-68849498 | NM_002538.3 | 6, 9 | 6, 9 | 5, 7, 8 |
OTOA | chr16:21742157-21771861 | NM_144672.3 | 20-21, 28 | 20-21, 28 | 22-27 |
PARN | chr16:14530572-14530629 | NM_002582.3 | 24 | ||
PBX1 | chr1:164818407-164818639 | NM_002585.3 | 9 | ||
PIGA | chrX:15339627-15343274 | NM_002641.3 | 4-6 | ||
PIGN | chr18:59763080-59763183 | NM_012327.5 | 22 | ||
PIK3CA | chr3:178935997-178938945 | NM_006218.2 | 10-14 | 10-14 | |
PIK3CD | chr1:9787004-9787104 | NM_005026.3 | 24 | ||
PKD1 | chr16:2147417-2185690 | NM_001009944.2 | 1-33 | 1 | |
PKP2 | chr12:32945357-32945665 | NM_004572.3 | 13-14 | ||
PMS2 | chr7:6017218-6027251 | NM_000535.6 | 11-14 | 11-14 | 15 |
PMS2 | chr7:6031603-6031688 | NM_000535.6 | 9 | ||
PMS2 | chr7:6042083-6048650 | NM_000535.6 | 1-5 | ||
PNPT1 | chr2:55863371-55863527 | NM_033109.4 | 28 | ||
POLH | chr6:43555008-43555226 | NM_006502.2 | 4 | ||
PRODH | chr22:18900687-18910692 | NM_016335.4 | 6-15 | ||
PRODH | chr22:18923527-18923800 | NM_016335.4 | 2 | ||
PROS1 | chr3:93593088-93647641 | NM_000313.3 | 2-15 | ||
PRPS1 | chrX:106893169-106893262 | NM_002764.3 | 7 | ||
PRSS1 | chr7:142457335-142460871 | NM_002769.4 | 1-5 | ||
PTEN | chr10:89725043-89725229 | NM_000314.4 | 9 | 9 | |
RAD21 | chr8:117859738-117859927 | NM_006265.2 | 14 | ||
RBM8A | chr1:145507666-145509211 | NM_005105.4 | 1-2, 4-6 | 1-2, 4-6 | 3 |
RBPJ | chr4:26431519-26432629 | NM_005349.3 | 10-12 | ||
RDX | chr11:110102559-110102758 | NM_002906.3 | 14 | ||
RMND1 | chr6:151766442-151766946 | NM_017909.3 | 2 | ||
RNF216 | chr7:5764954-5770448 | NM_207111.3 | 6-8 | ||
RNF216 | chr7:5800632-5800700 | NM_207111.3 | 2 | ||
RPL15 | chr3:23959350-23962334 | NM_002948.4 | 2-4 | ||
SALL1 | chr16:51171022-51176056 | NM_002968.2 | 2-3 | ||
SBDS | chr7:66453357-66460404 | NM_016038.2 | 1-5 | ||
SDHA | chr5:218470-256535 | NM_004168.3 | 1-15 | ||
SHOX | chrX:591590-619564 | NM_000451.3 | 2-6 | 2-6 | |
SLC25A15 | chr13:41367362-41367417 | NM_014252.3 | 2 | ||
SLC25A15 | chr13:41382573-41383803 | NM_014252.3 | 6-7 | ||
SLC33A1 | chr3:155545998-155546166 | NM_004733.3 | 6 | ||
SLC6A8 | chrX:152954029-152960669 | NM_005629.3 | 1-13 | ||
SMN1 | chr5:70220930-70248259 | NM_000344.3 | 1-8 | 1-8 | |
SMN2 | chr5:69345512-69372860 | NM_022875.2 | 1-8 | 1-8 | |
SOX2 | chr3:181430572-181431102 | NM_003106.3 | 1 | ||
SPTLC1 | chr9:94871021-94871116 | NM_006415.3 | 3 | ||
SRD5A3 | chr4:56233760-56236258 | NM_024592.4 | 4-5 | ||
SRP72 | chr4:57367949-57368027 | NM_006947.3 | 19 | ||
STAT5B | chr17:40370168-40371860 | NM_012448.3 | 6-9 | ||
STRC | chr15:43891869-44010382 | NM_153700.2 | 19-29 | 19-29 | 1-18 |
SYT14 | chr1:210334073-210334387 | NM_001146261.2 | 10 | ||
TARDBP | chr1:11082180-11083305 | NM_007375.3 | 6 | ||
TBL1XR1 | chr3:176743285-176743312 | NM_024665.4 | 16 | ||
TBX20 | chr7:35242041-35280649 | NM_001077653.2 | 5-8 | ||
TIMM8A | chrX:100600648-100601648 | NM_004085.3 | 2 | ||
TPM3 | chr1:154130114-154130197 | NM_153649.3 | 8 | ||
TPMT | chr6:18130898-18131011 | NM_000367.4 | 9 | ||
TRAPPC2 | chrX:13732525-13732624 | NM_001011658.3 | 6 | ||
TRIP11 | chr14:92436016-92436237 | NM_004239.4 | 21 | ||
TTN | chr2:179519171-179527539 | NM_001267550.1 | 175-192 | 175-192 | |
TUBA1A | chr12:49578792-49580616 | NM_006009.3 | 2-4 | ||
TUBB2A | chr6:3154100-3156386 | NM_001069.2 | 2-4 | 4 | |
TUBB2B | chr6:3224988-3226045 | NM_178012.4 | 4 | 4 | |
TUBB3 | chr16:90001136-90002212 | NM_006086.3 | 4 | ||
TUBB4A | chr19:6495174-6496232 | NM_006087.3 | 4 | ||
TUBG1 | chr17:40765664-40766675 | NM_001070.4 | 7-10 | ||
TYR | chr11:89017940-89028534 | NM_000372.4 | 4-5 | ||
UBA5 | chr3:132394091-132395370 | NM_024818.3 | 9-12 | ||
UBE3A | chr15:25615712-25616959 | NM_130838.2 | 3 | ||
UNC93B1 | chr11:67759016-67763355 | NM_030930.3 | 9-11 | ||
USP18 | chr22:18642938-18656609 | NM_017414.3 | 3-10 | 3-10 | 11 |
VPS35 | chr16:46705616-46717518 | NM_018206.5 | 2-12 | ||
VWF | chr12:6120782-6135212 | NM_000552.3 | 23-34 | ||
WRN | chr8:30941214-30942762 | NM_000553.4 | 10-11 | ||
XIAP | chrX:123040837-123041031 | NM_001167.3 | 7 | ||
ZEB2 | chr2:145147017-145147590 | NM_014795.3 | 10 | ||
ZNF341 | chr20:32378793-32379323 | NM_032819.4 | 15 |
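A table like this can be used to check whether a variant position falls inside a listed duplicated region. The following is a minimal sketch using three rows copied from the table. It assumes the coordinates are closed (inclusive) intervals, and it does not check the reference genome build, which this page does not state. The names `Region`, `parse_row`, and `genes_at` are hypothetical.

```python
import re
from typing import NamedTuple


class Region(NamedTuple):
    gene: str
    chrom: str
    start: int
    end: int


# Matches coordinate strings such as "chr16:16295857-16318046".
COORD = re.compile(r"^(chr\w+):(\d+)-(\d+)$")


def parse_row(row: str) -> Region:
    """Parse the gene name and coordinates from one pipe-delimited table row."""
    cells = [cell.strip() for cell in row.split("|")]
    match = COORD.match(cells[1])
    if match is None:
        raise ValueError(f"unrecognized coordinates: {cells[1]!r}")
    return Region(cells[0], match.group(1), int(match.group(2)), int(match.group(3)))


def genes_at(chrom: str, pos: int, regions: list[Region]) -> list[str]:
    """Return genes whose listed region contains the position.

    Assumption: intervals are treated as inclusive at both ends.
    """
    return [r.gene for r in regions if r.chrom == chrom and r.start <= pos <= r.end]


rows = [
    "ABCC6 | chr16:16295857-16318046 | NM_001171.5 | 1-9 | 1-9 | |",
    "PMS2 | chr7:6017218-6027251 | NM_000535.6 | 11-14 | 11-14 | 15 |",
    "SMN1 | chr5:70220930-70248259 | NM_000344.3 | 1-8 | 1-8 | |",
]
regions = [parse_row(row) for row in rows]

print(genes_at("chr7", 6020000, regions))   # position inside the PMS2 region
print(genes_at("chr7", 1, regions))         # position outside every listed region
```

A lookup of this kind only flags that a position lies in a listed region; whether a variant there is reportable still depends on the confirmation steps described above.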