Blueprint Genetics’ approach to pseudogenes and other duplicated genomic regions

At Blueprint Genetics, we are transparent about the limitations of our technology, and we make sure you are aware of them by describing them in our comprehensive clinical statement. We are committed to resolving difficult-to-sequence regions that are hard to validate, interpret, and confirm.

Such regions include highly homologous and repetitive areas that are very challenging to map with any next-generation sequencing technology. This page lists these clinically relevant regions throughout the genome and Blueprint Genetics’ approach to handling them.

What is a pseudogene?

A pseudogene is a genomic region that has high sequence similarity (homology) to a known gene but is nonfunctional (ie, does not produce a functional final protein product). Usually, the DNA sequences of a pseudogene and of its functional parent gene are about 65% to 100% identical.

Pseudogenes tend to accumulate more variants than their parent genes because they are usually not under selective pressure.

What is a segmental duplication?

A segmental duplication is a region in the genome where the sequence is duplicated and the similarity between the parent region and the duplicated region is ≥90% over a length of ≥1 kilobase (≥1000 base pairs).

Pseudogenes are often located in regions of segmental duplication.
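
The two thresholds in the definition above (≥90% similarity over ≥1000 base pairs) can be expressed as a simple check. The following is a minimal Python sketch, not Blueprint Genetics' implementation. It assumes the two sequences have already been aligned to equal length without gaps, and the function names are hypothetical.

```python
def percent_identity(a: str, b: str) -> float:
    """Percentage of matching positions between two aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    if not a:
        raise ValueError("sequences must not be empty")
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return 100.0 * matches / len(a)


def is_segmental_duplication(a: str, b: str,
                             min_identity: float = 90.0,
                             min_length: int = 1000) -> bool:
    """Apply the >=90% identity over >=1 kb definition to an aligned pair."""
    return len(a) >= min_length and percent_identity(a, b) >= min_identity


# Toy data (assumption): a 1200 bp parent region and a copy that differs
# at every 20th base, which gives 60 mismatches in total.
parent = "ACGT" * 300
copy_bases = list(parent)
for i in range(0, len(parent), 20):
    copy_bases[i] = "C" if parent[i] == "A" else "A"
duplicate = "".join(copy_bases)

print(percent_identity(parent, duplicate))
print(is_segmental_duplication(parent, duplicate))
```

Real duplications contain insertions and deletions, so a production tool would compute identity from a gapped alignment rather than a position-by-position comparison.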

Why is it important to be aware of pseudogenes when ordering genetic testing?

Pseudogenes can complicate the analysis of sequence data generated from NGS because:

Segmental duplications can be indistinguishable from their parent region if a laboratory is using short-read NGS methods (75-300 bp reads depending on the chemistry and sequencing platform used).
High levels of sequence similarity complicate accurate read alignment (mapping), as shown in Figure 1. Sequence reads that map equally well to several genomic positions are discarded in the analysis, which causes gaps in the sequence coverage.
If sequence reads containing a pseudogene-derived variant are mis-mapped to the parent gene, the result may be a false-positive variant call.
If sequence reads containing a parent gene-derived variant are mis-mapped to the pseudogene, the result may be a false-negative result.
Due to the high degree of sequence similarity, it can be difficult to design parent gene-specific Sanger sequencing primers.
We manually design all Sanger primers when confirming variants in regions with high homology and develop custom confirmation methods utilizing long-range PCR when necessary.

Figure 1. Mapping next-generation sequencing reads.

Confidence in read alignment decreases when sequence homology between the regions increases. Sequence reads are discarded when they align equally well to several genomic positions. The use of longer read length and paired-end sequencing improves read mapping.
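
To illustrate why short reads from homologous regions are ambiguous while longer reads can be placed uniquely, here is a toy Python sketch. It uses brute-force exact matching on made-up sequences; real aligners use indexed, quality-aware scoring, and every name and sequence below is an assumption for illustration only.

```python
def find_placements(read: str, reference: str, max_mismatches: int = 0) -> list:
    """Return every 0-based start where the read fits with <= max_mismatches."""
    hits = []
    for start in range(len(reference) - len(read) + 1):
        window = reference[start:start + len(read)]
        mismatches = sum(1 for a, b in zip(read, window) if a != b)
        if mismatches <= max_mismatches:
            hits.append(start)
    return hits


def classify_read(read: str, reference: str) -> str:
    """Label a read as unmapped, unique, or ambiguous (multi-mapped)."""
    hits = find_placements(read, reference)
    if not hits:
        return "unmapped"
    return "unique" if len(hits) == 1 else "ambiguous"


# Toy reference (assumption): a "gene" and a "pseudogene" that differ
# only in their final base, separated by a spacer of N bases.
gene = "ACGTTGCAAGGCTA"
pseudogene = "ACGTTGCAAGGCTT"
reference = gene + "N" * 8 + pseudogene

short_read = "GTTGCA"      # lies entirely within the shared sequence
longer_read = "GCAAGGCTA"  # extends across the gene-specific base

print(classify_read(short_read, reference))   # ambiguous
print(classify_read(longer_read, reference))  # unique
```

The short read fits both copies and would be discarded, whereas the longer read covers the one position that distinguishes the gene from the pseudogene.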

How many clinically relevant genes are affected by regions of segmental duplication?

It is estimated that humans have >10,000 pseudogenes (GENCODE project). Some of the genes on Blueprint Genetics’ panels and whole exome sequencing tests have pseudogenes or other homologous regions in the genome. Variant calling from NGS data from these regions may be unreliable due to the issues listed above. The sensitivity to detect variants in genes with pseudogenes is expected to be lower than the sensitivity achieved from regions without pseudogenes or segmental duplications.

For transparency’s sake, the genomic coordinates of the affected regions are listed below in a table. This table shows all genes and exons on our panels that are affected by >90% homology based on segmental duplications data extracted from the UCSC Genome Browser Database. In addition, we highlight affected genes with an asterisk (*) on our website and clinical statements to ensure health care providers are aware of this important limitation.

Are all genes with associated pseudogenes difficult to accurately analyze?

The degree to which a pseudogene impacts the ability to accurately detect and map variants in its parent gene depends on the degree of similarity (homology) between the duplicated region and the parent gene. Generally, variants in genes sharing 90%-98% homology with a pseudogene are still accurately detected and mapped. When homology is greater than 98%, accurate detection and mapping of variants is still possible; however, it becomes more difficult and may require specialized methods. These more severely affected exons (>98% homology) are also highlighted in the table, along with exons that are completely removed from the analysis. See question: ‘What has Blueprint Genetics done to improve our ability to accurately detect variants in clinically relevant genes with pseudogenes?’
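
The homology tiers described above can be sketched as a small Python function. The cut-offs of 90% and 98% come from the text; the function name, the labels, and the treatment of values exactly at a boundary are assumptions.

```python
def homology_tier(percent_homology: float) -> str:
    """Map a percent homology value to the tiers used on this page."""
    if not 0.0 <= percent_homology <= 100.0:
        raise ValueError("percent homology must be between 0 and 100")
    if percent_homology > 98.0:
        # Detection is still possible but may need specialized methods.
        return "severely affected"
    if percent_homology > 90.0:
        # Variants are generally still detected and mapped accurately.
        return "affected"
    return "below listing threshold"


for value in (85.0, 95.0, 99.1):
    print(value, homology_tier(value))
```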

Analytic validation of genes with pseudogenes is difficult.

There are regions of the genome, including segmentally duplicated regions, that are masked in reference sample data sets. This makes analytic validation of these masked regions extremely difficult.

For both our analytic validation and in-test quality control, we use gold-standard reference DNA samples with high-quality single nucleotide variant and insertion/deletion datasets provided by the Genome in a Bottle Consortium (GIAB).

We recognize the difficult-to-validate regions in our panels. Our bioinformatics data analysis pipeline flags variants located in these regions, and confirmatory testing is performed for all variants that do not fulfill specific quality control criteria. This mitigates the risk of errors arising from the masked regions.

What has Blueprint Genetics done to improve our ability to accurately detect variants in clinically relevant genes with pseudogenes?

Customized target capture kit and chemistry increase specificity and our ability to discriminate between homologous regions
Paired-end 2 × 150 bp sequencing results in reads with high mapping quality, including in the majority of genomic regions with segmental duplication
Customized bioinformatics pipeline increases our ability to map reads accurately
Only sequence reads with a minimum mapping quality (MQ) of 20 are used in variant calling (ie, an estimated probability of at least 99% that the read is mapped to the correct position)
Manual design of long-range PCR and Sanger sequencing primers to confirm variants in regions with high homology
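
Mapping quality is conventionally reported on the Phred scale, as defined in the SAM format specification: MQ = -10 × log10(probability that the mapping position is wrong). The Python sketch below converts an MQ value to that probability and applies a minimum-MQ filter of 20. The read names and MQ values are made up, and the helper names are hypothetical rather than part of any Blueprint Genetics pipeline.

```python
def mapq_to_error_probability(mapq: float) -> float:
    """Phred-scaled mapping quality -> probability the read is misplaced."""
    return 10 ** (-mapq / 10)


def passes_mapq_filter(mapq: int, minimum: int = 20) -> bool:
    """Keep a read only if its mapping quality meets the minimum."""
    return mapq >= minimum


# Toy reads (assumption): (read name, mapping quality) pairs.
reads = [("read_1", 60), ("read_2", 20), ("read_3", 3), ("read_4", 0)]

kept = [name for name, mapq in reads if passes_mapq_filter(mapq)]
print(kept)

# MQ 20 corresponds to roughly a 1-in-100 chance of a wrong placement.
print(mapq_to_error_probability(20))
```

Aligners typically assign a very low MQ to reads that fit several positions equally well, which is why such reads drop out under a filter like this one.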

What questions should you ask your lab of choice about segmentally duplicated regions?

What steps have you taken to resolve genes affected by segmental duplication and pseudogenes?
Do you use customized solutions to analyze affected regions?
Is information about pseudogenes and test limitations publicly available? If yes, where?
Do you highlight genes affected by pseudogenes when presenting panel content or whole exome sequencing?
Does the clinical report mention the possibility of pseudogene interference?
What minimum mapping quality do you require for reads used in variant calling?

Genes affected by segmental duplication

Gene	Genomic coordinates of duplicated region	Transcript	Affected exons (>90% homology)	Severely affected exons (>98% homology)
ABCC6	chr16:16295857-16318046	NM_001171.5	1-9	1-9
ABCD1	chrX:153006027-153009189	NM_000033.3	7-10
ACTB	chr7:5567378-5569288	NM_001101.4	2-6
ACTG1	chr17:79477715-79479380	NM_001614.3	2-6
ACTN4	chr19:39219635-39220072	NM_004924.5	20-21
ADAMTSL2	chr9:136419478-136419815	NM_014694.3	10-19	10-19
ADIPOR1	chr1:202910700-202911346	NM_015999.5	7-8
AFG3L2	chr18:12344130-12344246	NM_006796.2	14
AGK	chr7:141352586-141352724	NM_018238.3	16
ALG1	chr16:5127907-5134882	NM_019109.4	6-12
ALMS1	chr2:73826527-73830431	NM_015120.4	17-21
ANKRD11	chr16:89334885-89335071	NM_013275.5	13	13
ANOS1	chrX:8501035-8507799	NM_000216.3	10-14
AP4S1	chr14:31562112-31562241	NM_001254729.1	6
ARMC4	chr10:28250496-28284109	NM_018076.4	2-10
ARSE	chrX:2852872-2856298	NM_000047.2	9-11
ASNS	chr7:97481570-97498468	NM_133436.3	3-13
ATAD3A	chr1:1447648-1469452	NM_001170535.2	1-16
B3GAT3	chr11:62383172-62384819	NM_012200.3	3-5
BCAP31	chrX:152966391-152969549	NM_005745.7	5-8
BDP1	chr5:70860608-70860712	NM_018429.2	39
BMPR1A	chr10:88683132-88683476	NM_004329.2	12-13	12-13
BRAF	chr7:140433811-140434570	NM_004333.4	18
BRCA1	chr17:41276033-41276113	NM_007294.3	2
C2	chr6:31867702-31869082	NM_001178063.1	1
CACNA1C	chr12:2791115-2795435	NM_001167625.1	43-45
CALM1	chr14:90870212-90871061	NM_006888.4	4-6
CD46	chr1:207930358-207934791	NM_002389.4	2-5
CEP290	chr12:88442959-88443191	NM_025114.3	54
CFH	chr1:196658549-196659369	NM_000186.3	8-9
CFH	chr1:196682864-196683047	NM_000186.3	10
CFH	chr1:196712581-196716443	NM_000186.3	20-22
CHEK2	chr22:29083884-29091861	NM_007194.3	11-15
CISD2	chr4:103808497-103808587	NM_001008388.4	3
CLCNKA	chr1:16349114-16360153	NM_004070.3	2-20
CLCNKB	chr1:16370987-16383411	NM_000085.4	2-20
CORO1A	chr16:30199853-30199897	NM_007074.3	10-11	10-11
COX10	chr17:14082520-14095538	NM_001303.3	6	6
CP	chr3:148891500-148891517	NM_000096.3	19
CRYBB2	chr22:25623819-25627739	NM_000496.2	4-6
CSF2RA	chrX:1401596-1428482	NM_006140.4	3-13	3-13
CUBN	chr10:16866973-16883046	NM_001081.3	61-67
CUBN	chr10:16948201-16970302	NM_001081.3	41-50
CYCS	chr7:25163319-25163738	NM_018947.5	2-3
CYP11B1	chr8:143955788-143961229	NM_000497.3	1-9
CYP21A2	chr6:32006199-32009227	NM_000500.7	1-10	1-10
DCLRE1C	chr10:14974852-14981868	NM_001033855.2	4-9
DHFR	chr5:79924905-79924984	NM_000791.3	6	6
DICER1	chr14:95556834-95557000	NM_177438.2	27
DIS3L2	chr2:233194522-233201908	NM_152383.4	15-21
DNAH11	chr7:21923908-21924028	NM_001277115.1	76
DNAH11	chr7:21940624-21940872	NM_001277115.1	82
DNM1	chr9:131015379-131016993	NM_004408.3	21
DSE	chr6:116756749-116758508	NM_013352.3	6
DUOX2	chr15:45402847-45404153	NM_014080.4	5-8
EGLN1	chr1:231502156-231502221	NM_022051.2	5
ELK1	chrX:47496227-47498737	NM_005229.4	3-6
ELMO2	chr20:45008891-45023121	NM_133171.4	3-11
ERCC6	chr10:50723243-50725167	NM_001277059.1	6
ESPN	chr1:6488285-6517432	NM_031475.2	2-12
EYS	chr6:66005755-66040367	NM_001142800.1	12
F8	chrX:154114408-154114577	NM_019863.2	1	1
FANCD2	chr3:10084733-10091189	NM_033084.3	12-17
FANCD2	chr3:10101977-10115046	NM_033084.3	19-28
FAR1	chr11:13750173-13750321	NM_032228.5	12
FHL1	chrX:135292029-135292184	NM_001449.4	7
FLG	chr1:152275831-152286484	NM_002016.1	3
FLNC	chr7:128496571-128498577	NM_001458.4	44-48
FOXD4	chr9:116799-118119	NM_207305.4	1	1
FXN	chr9:71687527-71689806	NM_000144.4	5
GBA	chr1:155204785-155210903	NM_000157.3	1-11
GH1	chr17:61994668-61996136	NM_000515.4	1-5
GJA1	chr6:121767993-121769142	NM_000165.4	2
GK	chrX:30746848-30746859	NM_000167.5	19	19
GLUD1	chr10:88811507-88811627	NM_005271.4	13
GLUD1	chr10:88834307-88836413	NM_005271.4	2-4
GOSR2	chr17:45008464-45009565	NM_004287.4	3-4
GUSB	chr7:65429309-65429445	NM_000181.3	11
HBA1	chr16:226715-227410	NM_000558.4	1-3
HBA2	chr16:222911-223599	NM_000517.4	1-3
HNRNPA1	chr12:54677603-54678097	NM_031157.3	9-10
HPS1	chr10:100193696-100195529	NM_000195.3	4-6
HSPD1	chr2:198351769-198353971	NM_002156.4	9-12
HYDIN	chr16:70852244-71186686	NM_001270974.2	6-84	6-84
IDS	chrX:148584841-148585745	NM_000202.7	2-3	2-3
IFT122	chr3:129200372-129218911	NM_052985.3	15-20
IGLL1	chr22:23915452-23917272	NM_020070.3	2-3
KANSL1	chr17:44171925-44172067	NM_001193466.1	3
KCTD1	chr18:24035706-24039889	NM_001258221.1	4-5
KIF1C	chr17:4925522-4927446	NM_006612.5	22-23
KRAS	chr12:25362444-25362845	NM_033360.2	6
KRT14	chr17:39738686-39743086	NM_000526.4	1-8
KRT16	chr17:39766186-39768940	NM_005557.3	1-8
KRT17	chr17:39775845-39780761	NM_000422.2	1-8
KRT6A	chr12:52881503-52886972	NM_005554.3	1-9
KRT6B	chr12:52840973-52845862	NM_005555.3	1-9
KRT6C	chr12:52862845-52867521	NM_173086.4	1-9
LEFTY2	chr1:226125140-226128840	NM_003240.4	1-4
LRP5	chr11:68080182-68080273	NM_002335.2	1
LRP5	chr11:68125117-68174281	NM_002335.2	3-9
MAT2A	chr2:85770097-85770895	NM_005911.5	8-9
MID1	chrX:10417407-10417479	NM_000381.3	10
MOCS1	chr6:39874132-39874889	NM_005943.5	10
MSN	chrX:64958386-64959755	NM_002444.2	11-13
MSX2	chr5:174156167-174156586	NM_002449.4	2
MYO5B	chr18:47352840-47352993	NM_001080467.2	40
NCF1	chr7:74191612-74203048	NM_000265.5	1-11	1-11
NEB	chr2:152435850-152465190	NM_001271208.1	82-105	82-105
NECAP1	chr12:8248196-8248686	NM_015509.3	7-8
NEFH	chr22:29884837-29886692	NM_021076.3	4
NF1	chr17:29527439-29528503	NM_000267.3	9-11
NF1	chr17:29541468-29563039	NM_000267.3	13-29
NF1	chr17:29585361-29592357	NM_000267.3	31-35
NOTCH2	chr1:120539619-120612206	NM_024408.3	1-4	1-4
NXF5	chrX:101087239-101097764	NM_032946.2	3-16
OCLN	chr5:68840730-68849498	NM_002538.3	5-9	5-9
OTOA	chr16:21742157-21771861	NM_144672.3	21-29	21-29
PARN	chr16:14530572-14530629	NM_002582.3	24
PBX1	chr1:164818407-164818639	NM_002585.3	9
PIGA	chrX:15339627-15343274	NM_002641.3	4-6
PIGN	chr18:59763080-59763183	NM_012327.5	22
PIK3CA	chr3:178935997-178938945	NM_006218.2	10-14	10-14
PIK3CD	chr1:9787004-9787104	NM_005026.3	24
PKD1	chr16:2147417-2185690	NM_001009944.2	1-33	1
PKP2	chr12:32945357-32945665	NM_004572.3	13-14
PMS2	chr7:6017218-6027251	NM_000535.6	11-15	11-15
PMS2	chr7:6031603-6031688	NM_000535.6	9
PMS2	chr7:6042083-6048650	NM_000535.6	1-5
PNPT1	chr2:55863371-55863527	NM_033109.4	28
POLH	chr6:43555008-43555226	NM_006502.2	4
PRODH	chr22:18900687-18910692	NM_016335.4	6-15
PRODH	chr22:18923527-18923800	NM_016335.4	2
PROS1	chr3:93593088-93647641	NM_000313.3	2-15
PRPS1	chrX:106893169-106893262	NM_002764.3	7
PRSS1	chr7:142457335-142460871	NM_002769.4	1-5
PTEN	chr10:89725043-89725229	NM_000314.4	9	9
RAD21	chr8:117859738-117859927	NM_006265.2	14
RBM8A	chr1:145507666-145509211	NM_005105.4	1-6	1-6
RBPJ	chr4:26431519-26432629	NM_005349.3	10-12
RDX	chr11:110102559-110102758	NM_002906.3	14
RMND1	chr6:151766442-151766946	NM_017909.3	2
RNF216	chr7:5764954-5770448	NM_207111.3	6-8
RNF216	chr7:5800632-5800700	NM_207111.3	2
RPL15	chr3:23959350-23962334	NM_002948.4	2-4
SALL1	chr16:51171022-51176056	NM_002968.2	2-3
SBDS	chr7:66453357-66460404	NM_016038.2	1-5
SDHA	chr5:218470-256535	NM_004168.3	1-15
SHOX	chrX:591590-619564	NM_000451.3	2-6	2-6
SLC25A15	chr13:41367362-41367417	NM_014252.3	2
SLC25A15	chr13:41382573-41383803	NM_014252.3	6-7
SLC33A1	chr3:155545998-155546166	NM_004733.3	6
SLC6A8	chrX:152954029-152960669	NM_005629.3	1-13
SMN1	chr5:70220930-70248259	NM_000344.3	1-8	1-8
SMN2	chr5:69345512-69372860	NM_022875.2	1-8	1-8
SOX2	chr3:181430572-181431102	NM_003106.3	1
SPTLC1	chr9:94871021-94871116	NM_006415.3	3
SRD5A3	chr4:56233760-56236258	NM_024592.4	4-5
SRP72	chr4:57367949-57368027	NM_006947.3	19
STAT5B	chr17:40370168-40371860	NM_012448.3	6-9
STRC	chr15:43891869-44010382	NM_153700.2	1-29	1-29
SYT14	chr1:210334073-210334387	NM_001146261.2	10
TARDBP	chr1:11082180-11083305	NM_007375.3	6
TBL1XR1	chr3:176743285-176743312	NM_024665.4	16
TBX20	chr7:35242041-35280649	NM_001077653.2	5-8
TIMM8A	chrX:100600648-100601648	NM_004085.3	2
TPM3	chr1:154130114-154130197	NM_153649.3	8
TPMT	chr6:18130898-18131011	NM_000367.4	9
TRAPPC2	chrX:13732525-13732624	NM_001011658.3	6
TRIP11	chr14:92436016-92436237	NM_004239.4	21
TTN	chr2:179519171-179527539	NM_001267550.1	175-192	175-192
TUBA1A	chr12:49578792-49580616	NM_006009.3	2-4
TUBB2A	chr6:3154100-3156386	NM_001069.2	2-4	4
TUBB2B	chr6:3224988-3226045	NM_178012.4	4	4
TUBB3	chr16:90001136-90002212	NM_006086.3	4
TUBB4A	chr19:6495174-6496232	NM_006087.3	4
TUBG1	chr17:40765664-40766675	NM_001070.4	7-10
TYR	chr11:89017940-89028534	NM_000372.4	4-5
UBA5	chr3:132394091-132395370	NM_024818.3	9-12
UBE3A	chr15:25615712-25616959	NM_130838.2	3
UNC93B1	chr11:67759016-67763355	NM_030930.3	9-11
USP18	chr22:18642938-18656609	NM_017414.3	3-11	3-11
VPS35	chr16:46705616-46717518	NM_018206.5	2-12
VWF	chr12:6120782-6135212	NM_000552.3	23-34
WRN	chr8:30941214-30942762	NM_000553.4	10-11
XIAP	chrX:123040837-123041031	NM_001167.3	7
ZEB2	chr2:145147017-145147590	NM_014795.3	10
ZNF341	chr20:32378793-32379323	NM_032819.4	15
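
As a rough illustration of how the table above could be used programmatically, the Python sketch below parses a few rows copied from it and reports whether a genomic position falls inside a listed duplicated region. It assumes the rows are tab-separated, that coordinates are 1-based and inclusive, and that the query position uses the same reference genome build as the table. None of these details are stated on this page, and all function names are hypothetical.

```python
import csv
import io

# Excerpt copied from the table above; columns are tab-separated.
TABLE_EXCERPT = "\n".join([
    "Gene\tGenomic coordinates of duplicated region\tTranscript\t"
    "Affected exons (>90% homology)\tSeverely affected exons (>98% homology)",
    "BRCA1\tchr17:41276033-41276113\tNM_007294.3\t2",
    "PMS2\tchr7:6017218-6027251\tNM_000535.6\t11-15\t11-15",
    "PMS2\tchr7:6031603-6031688\tNM_000535.6\t9",
    "PMS2\tchr7:6042083-6048650\tNM_000535.6\t1-5",
])


def parse_region(text: str):
    """Split 'chr7:6017218-6027251' into ('chr7', 6017218, 6027251)."""
    chrom, span = text.split(":")
    start, end = span.split("-")
    return chrom, int(start), int(end)


def load_regions(tsv_text: str) -> list:
    """Parse table rows into dictionaries; the last column may be absent."""
    rows = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    next(rows)  # skip the header row
    regions = []
    for row in rows:
        if not row:
            continue
        gene, coords, transcript, affected = row[:4]
        chrom, start, end = parse_region(coords)
        regions.append({
            "gene": gene, "chrom": chrom, "start": start, "end": end,
            "transcript": transcript, "affected_exons": affected,
            "severe_exons": row[4] if len(row) > 4 else "",
        })
    return regions


def find_overlaps(regions: list, chrom: str, position: int) -> list:
    """Return the listed regions that contain the given position."""
    return [r for r in regions
            if r["chrom"] == chrom and r["start"] <= position <= r["end"]]


regions = load_regions(TABLE_EXCERPT)
for hit in find_overlaps(regions, "chr7", 6020000):
    print(hit["gene"], hit["affected_exons"], hit["severe_exons"])
```

A linear scan is enough for a table of this size; an interval index would only be worth adding for much larger region sets.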
