The ability of human nuclear DNA to cause false positive low-abundance heteroplasmy calls varies across the mitochondrial genome

Levent Albayrak, Kamil Khanipov, Maria Pimenova, George Golovko, Mark Rojas, Ioannis Pavlidis, Sergei Chumakov, Gerardo Aguilar, Arturo Chávez, William R. Widger, Yuriy Fofanov

Research output: Contribution to journal (Article)

3 Citations (Scopus)

Abstract

Background: Low-abundance mutations in mitochondrial populations (mutations with minor allele frequency ≤ 1%) are associated with cancer, aging, and neurodegenerative disorders. While recent progress in high-throughput sequencing technology has significantly improved heteroplasmy identification, the ability of this technology to detect low-abundance mutations can be affected by the presence of similar sequences originating from nuclear DNA (nDNA). To determine to what extent nDNA can cause false positive low-abundance heteroplasmy calls, we identified the mitochondrial locations of all subsequences that are common or similar (one mismatch allowed) between nDNA and mitochondrial DNA (mtDNA).

Results: The analysis revealed up to a 25-fold variation in the lengths of the longest common and longest similar (one mismatch allowed) subsequences across the mitochondrial genome. In several regions of the mitochondrial genome, the longest subsequences shared between nDNA and mtDNA were found to be as short as 11 bases, which not only allows these regions to be used to design new, highly specific PCR primers, but also supports the hypothesis of non-random introduction of mtDNA into human nuclear DNA.

Conclusion: Analysis of the mitochondrial locations of the subsequences shared between nDNA and mtDNA suggested that even very short (36 bases) single-end sequencing reads can be used to identify low-abundance variation in 20.4% of the mitochondrial genome. For longer reads (76 and 150 bases), the proportion of the mitochondrial genome where the presence of nDNA will not interfere was found to be 44.5% and 67.9%, respectively, while low-abundance mutations at 100% of locations can be identified using single reads 417 bases long. This observation suggests that the analysis of low-abundance variation in mitochondrial populations can be extended to a variety of large data collections such as the NCBI Sequence Read Archive, the European Nucleotide Archive, The Cancer Genome Atlas, and the International Cancer Genome Consortium.
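The idea behind the abstract's numbers can be illustrated with a small sketch. It is not the authors' implementation: it is a brute-force toy that assumes exact matches only (no one-mismatch case), ignores reverse complements, and treats the mitochondrial sequence as linear rather than circular. The function names and the toy sequences are hypothetical. For every start position in the mtDNA string it finds the longest substring that also occurs in the nDNA string, then counts a position as interference-free for a given read length if no read of that length covering it occurs in nDNA.

```python
def longest_shared_prefix_lengths(mt: str, nuc: str) -> list[int]:
    """For each start i in mt, length of the longest mt[i:i+k] found in nuc."""
    lengths = []
    for i in range(len(mt)):
        k = 0
        # If a substring occurs in nuc, so does each of its prefixes,
        # so extending one base at a time finds the maximum length.
        while i + k < len(mt) and mt[i:i + k + 1] in nuc:
            k += 1
        lengths.append(k)
    return lengths


def interference_free_fraction(mt: str, nuc: str, read_len: int) -> float:
    """Fraction of mt positions not covered by any read of length
    read_len that also occurs verbatim in nuc (linear-genome assumption)."""
    n = len(mt)
    n_starts = n - read_len + 1
    if n_starts <= 0:
        return 0.0
    shared = longest_shared_prefix_lengths(mt, nuc)
    # A read starting at i is ambiguous if all read_len bases occur in nuc.
    ambiguous = [shared[i] >= read_len for i in range(n_starts)]
    free = 0
    for p in range(n):
        lo = max(0, p - read_len + 1)   # first read start that covers p
        hi = min(p, n_starts - 1)       # last read start that covers p
        if not any(ambiguous[lo:hi + 1]):
            free += 1
    return free / n


if __name__ == "__main__":
    # Toy sequences, made up for illustration only.
    mt_toy = "ACGTACGGTTCA"
    nuc_toy = "TTACGTAGGCCGGTTA"
    print(longest_shared_prefix_lengths(mt_toy, nuc_toy))
    for read_len in (5, 6):
        print(read_len, interference_free_fraction(mt_toy, nuc_toy, read_len))
```

On real genomes the substring test would need an index (for example a suffix array or k-mer hash) rather than a linear scan, and the one-mismatch case would need an approximate-matching step; both are left out here to keep the sketch short.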

Original language: English (US)
Article number: 1017
Journal: BMC Genomics
Volume: 17
Issue number: 1
DOI: https://doi.org/10.1186/s12864-016-3375-x
State: Published - Dec 12 2016

Fingerprint

Mitochondrial Genome
Mitochondrial DNA
DNA
Mutation
Genome
Technology
Neoplasms
Atlases
Neurodegenerative Diseases
Population
Mitochondria
Nucleotides
Alleles
Polymerase Chain Reaction

Keywords

  • Heteroplasmy
  • High throughput sequencing
  • Low-abundance mutation
  • Minor allele
  • Mitochondria
  • NUMT
  • Rare variant

ASJC Scopus subject areas

  • Biotechnology
  • Genetics

Cite this

The ability of human nuclear DNA to cause false positive low-abundance heteroplasmy calls varies across the mitochondrial genome. / Albayrak, Levent; Khanipov, Kamil; Pimenova, Maria; Golovko, George; Rojas, Mark; Pavlidis, Ioannis; Chumakov, Sergei; Aguilar, Gerardo; Chávez, Arturo; Widger, William R.; Fofanov, Yuriy.

In: BMC Genomics, Vol. 17, No. 1, 1017, 12.12.2016.

Albayrak, L, Khanipov, K, Pimenova, M, Golovko, G, Rojas, M, Pavlidis, I, Chumakov, S, Aguilar, G, Chávez, A, Widger, WR & Fofanov, Y 2016, 'The ability of human nuclear DNA to cause false positive low-abundance heteroplasmy calls varies across the mitochondrial genome', BMC Genomics, vol. 17, no. 1, 1017. https://doi.org/10.1186/s12864-016-3375-x
@article{3441f55951dc4cd89b2ac86f11b7ac50,
title = "The ability of human nuclear DNA to cause false positive low-abundance heteroplasmy calls varies across the mitochondrial genome",
keywords = "Heteroplasmy, High throughput sequencing, Low-abundance mutation, Minor allele, Mitochondria, NUMT, Rare variant",
author = "Levent Albayrak and Kamil Khanipov and Maria Pimenova and George Golovko and Mark Rojas and Ioannis Pavlidis and Sergei Chumakov and Gerardo Aguilar and Arturo Ch{\'a}vez and Widger, {William R.} and Yuriy Fofanov",
year = "2016",
month = "12",
day = "12",
doi = "10.1186/s12864-016-3375-x",
language = "English (US)",
volume = "17",
journal = "BMC Genomics",
issn = "1471-2164",
publisher = "BioMed Central",
number = "1",

}

TY - JOUR

T1 - The ability of human nuclear DNA to cause false positive low-abundance heteroplasmy calls varies across the mitochondrial genome

AU - Albayrak, Levent

AU - Khanipov, Kamil

AU - Pimenova, Maria

AU - Golovko, George

AU - Rojas, Mark

AU - Pavlidis, Ioannis

AU - Chumakov, Sergei

AU - Aguilar, Gerardo

AU - Chávez, Arturo

AU - Widger, William R.

AU - Fofanov, Yuriy

PY - 2016/12/12

Y1 - 2016/12/12

KW - Heteroplasmy

KW - High throughput sequencing

KW - Low-abundance mutation

KW - Minor allele

KW - Mitochondria

KW - NUMT

KW - Rare variant

UR - http://www.scopus.com/inward/record.url?scp=85003554360&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85003554360&partnerID=8YFLogxK

U2 - 10.1186/s12864-016-3375-x

DO - 10.1186/s12864-016-3375-x

M3 - Article

VL - 17

JO - BMC Genomics

JF - BMC Genomics

SN - 1471-2164

IS - 1

M1 - 1017

ER -