Applications & Advances in Forensic Mitochondrial DNA Analysis
Written by Teresa Snyder-Leiby, PhD   

DNA analysis of crime scene evidence is central to a large proportion of cases. Results from DNA short tandem repeat (STR) analysis, often called “DNA fingerprints,” were introduced to courts in the mid-1980s and were widely accepted by the late 1990s, once standardized protocols and statistical probabilities had been developed. STR analysis has influenced countless investigations and court cases. Its strengths include its resolving power for excluding an individual and its ability to establish potential relationships between evidence and suspects, because nuclear DNA follows Mendelian inheritance. However, nuclear DNA is present at only two copies per cell, carried on linear chromosomes. If the DNA extracted from the source material is degraded or very low in concentration, it may be unsuitable for STR analysis.

Originally published: Fall 2017 Issue (Volume 15, Number 3)

In contrast to nuclear DNA, each cell contains many copies of mitochondrial DNA (mtDNA). There may be several copies per mitochondrion, and each cell contains dozens to hundreds of mitochondria. Additionally, mtDNA is less prone to degradation because it is a circular chromosome. Together, these properties make mtDNA analysis an essential forensic DNA tool. mtDNA is a preferred approach for damaged or degraded samples (typical of mass graves or bodies discovered years after death) and for very small biological samples, where only a small portion of the body is available for analysis. Mitochondrial DNA is inherited through the maternal lineage, providing data for anthropological and genealogical studies and for evaluating historical claims, such as those involving Tsar Nicholas II and Princess Anastasia. The first case in the United States to use mtDNA evidence (State of Tennessee v. Paul Ware) was in 1996.

Restriction fragment length polymorphism (RFLP) was an early method for mtDNA analysis. RFLP requires using 6 to 12 enzymes to fragment replicates of the sample, separating fragments by slab gel, and comparing patterns to determine differences between samples. RFLP was replaced by Sanger sequence analysis, which became the gold standard for mtDNA analysis.

The mtDNA sequence is highly conserved, but two regions have a higher mutation rate: hypervariable regions 1 and 2 (HV1 and HV2). Mutations are more likely to occur in these regions from one generation to the next, which sometimes makes it possible to differentiate individuals who share the same maternal lineage. Because the size is only 0.6 kb, the HV1 and HV2 regions are readily sequenced by the Sanger method and capillary electrophoresis, although the workflow is time consuming. The workflow includes PCR and sequencing of HV1 and HV2 for each sample (forward and reverse strands), comparison of each sample to the revised Cambridge Reference Sequence (rCRS) to call variants, alignment of the variants (following the recommendations of the DNA Commission of the International Society for Forensic Genetics), and comparison of the variants of each sample to those of the other samples. Figure 1 provides an example of a single nucleotide polymorphism (SNP) variation in the sample when compared to the rCRS. For a large portion of the population, the non-coding hypervariable regions are well suited to differentiating between individuals, because they are more variable than the coding portion of the mitochondrial genome and do not contain any personal health information (PHI). Sample processing of mtDNA for Sanger sequence analysis is routinely done in batches, using samples of approximately 2 cm of hair or pulverized bone. With specialized isolation and amplification, it is also possible to obtain valuable mtDNA sequence data from as little as 2 mm of hair (Melton et al.).
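The step of comparing a sample to the reference to call variants can be illustrated with a minimal Python sketch. It assumes the two sequences have already been aligned to equal length with no insertions or deletions, and it uses a short invented fragment rather than actual rCRS sequence. The function name `call_snps` and the `start` parameter are hypothetical and are not part of any forensic software package.

```python
def call_snps(reference, sample, start=1):
    """Return substitutions as strings such as 'T195C' (reference base, position, sample base).

    Assumes both sequences are already aligned and of equal length (no indels).
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    variants = []
    for offset, (ref_base, sample_base) in enumerate(zip(reference, sample)):
        if ref_base != sample_base:
            variants.append(f"{ref_base}{start + offset}{sample_base}")
    return variants


# Invented 10-base fragment (not real rCRS sequence), numbered from position 190.
reference_fragment = "CACCCTATGT"
sample_fragment = "CACCCCATGT"
snps = call_snps(reference_fragment, sample_fragment, start=190)
print(snps)
```

A real workflow would also need to handle indels and apply the forensic alignment conventions referred to above, which this sketch deliberately leaves out.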


Figure 1: A variation in the sample compared to the reference sequence. The first and last electropherograms are the forward and reverse sequence for the reference. The second and fifth electropherograms are the forward and reverse sequence for the sample. The two center electropherograms indicate a C -> T variant (SNP) in the sample.


Figure 2


Figure 3F


Figure 3R


Figure 4F


Figure 4R

Figures 2, 3, and 4: Heteroplasmy detection with Sanger sequence analysis. According to the Mut_Percent column of the Mutation Quantifier Report, the variant 195C occurs at approximately 85% in sample HV2-CBI-53: the variant allele is present at 86.39% in the forward trace file and 82.52% in the reverse trace file (lines No. 9 and 19 of the table in Figure 2).

Because both the reference T allele and the variant C allele were detected in this sample, the variant would be classified as heteroplasmic, provided there is evidence that it is a true variant. The report supplies such evidence by showing that the same variant allele is present at the same position in both the forward and reverse trace files, at a similar relative allele percentage. Peak characteristics can provide further support. They can be viewed by opening the electropherograms in the report for this variant in the paired trace files for this sample (HV2-53-F_195T-C_85.PNG and HV2-53-R_195T-C_85.PNG). The position of interest is indicated by the red dot (Figures 3F and 3R).

Low-level heteroplasmic variants may look more like the peak data at the same position in sample HV2-CBI-28 (Figures 4F and 4R; HV2-28-F_195T-C_5.PNG and HV2-28-R_195T-C_5.PNG). Again, true variants are more likely to be detected in both the forward and reverse directions of the bidirectional data, at a similar allele ratio. With low-level heteroplasmy, the secondary allele occurs at a much lower frequency, so the secondary peak in the trace data will likely have a much lower intensity. An example is position 195 in sample HV2-CBI-28, where the variant allele percentage is approximately 5%.

In other words, sample HV2-CBI-53 is an example of a high-level heteroplasmic variant, and sample HV2-CBI-28 is an example of a low-level one. The Mutation Quantifier Report shows the positions at which secondary peaks exist in the trace data and calculates the allele ratio. A secondary peak alone does not establish that a variant is real, so supporting evidence is needed: the variant should be present at a similar allele ratio in both the forward and reverse trace files for the same sample, and the minor peak should have a “normal” (Gaussian) shape.
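The forward and reverse concordance check described above can be sketched in a few lines of Python. This is only an illustration of the reasoning, not the logic used by the Mutation Quantifier Report. The function name `heteroplasmy_support` and the 10-percentage-point tolerance are assumptions chosen for the example, and peak shape is not evaluated.

```python
def heteroplasmy_support(forward_pct, reverse_pct, max_difference=10.0):
    """Summarize whether forward and reverse traces agree on a variant allele.

    max_difference is a hypothetical tolerance, in percentage points.
    """
    difference = abs(forward_pct - reverse_pct)
    seen_in_both = forward_pct > 0 and reverse_pct > 0
    return {
        "mean_pct": round((forward_pct + reverse_pct) / 2, 1),
        "difference": round(difference, 1),
        "concordant": seen_in_both and difference <= max_difference,
    }


# Percentages reported for variant 195C in sample HV2-CBI-53.
result = heteroplasmy_support(86.39, 82.52)
print(result)
```

A variant seen in only one direction, or at very different percentages in the two directions, would be flagged as not concordant and would call for closer review of the electropherograms.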

However, for many samples the sequence of the hypervariable regions alone cannot differentiate between some individuals. These samples benefit from the detection of sequence variation in other regions using whole mitochondrial genome analysis. Mutations may occur anywhere in the mitochondrial genome’s 16,569 base pairs, although they occur at a lower rate outside the hypervariable regions. As these mutations accumulate in a population, the resulting haplogroups can be used to define genetic populations and are often used to trace the potential origin of an individual. Databases such as EMPOP are instrumental for these studies. In addition to mutations accumulating in a population, the mitochondria within an individual are also subject to mutations, which can result in heteroplasmy: slightly different mtDNA sequences within an individual. Figures 2, 3, and 4 illustrate heteroplasmy detected in Sanger mtDNA sequence analysis. However, Sanger sequence analysis is not an optimal technology for detecting low-percentage heteroplasmy or performing whole mitochondrial genome analysis.

High Throughput Sequencing (HTS)—also referred to as Next Generation Sequencing (NGS) technology—overcomes the drawbacks of Sanger sequence analysis, which is time consuming and expensive for whole mtDNA sequencing and not well suited for low percent heteroplasmy detection.

HTS samples can be batched and barcoded for efficient sequencing of many samples in the same run. NGS extends the sequencing process across millions of DNA fragments in parallel instead of a single fragment at a time, producing data with greater depth of coverage that allows detection of low-frequency heteroplasmy. Although the technology diverges from Sanger sequencing, it fills the gaps left by that method by lowering the cost per sample and improving heteroplasmy detection. Many commercial platforms and chemistries are available for NGS. These advances in sequencing technology have increased the amount of information available while decreasing the overall cost of sequencing DNA.

An HTS workflow varies with the commercial kit and sequencer being used. A typical workflow can be broken into four steps: library preparation, template preparation, sequencing, and data analysis. The first step, library preparation, involves DNA fragmentation, which can be performed by several different methods, followed by barcoding or adapter ligation of the fragments. In template preparation, the DNA fragments in the library are (depending on the technology) amplified in clusters on a surface or on beads in individual wells. Sequencing is then performed by detecting fluorescent signals from dNTPs linked to reversible terminators in the former case, or by detecting pH changes caused by the release of hydrogen ions during nucleotide incorporation in the latter case. The resulting data are then analyzed in a software program. HTS is an extremely useful technology that is advancing quickly.
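To make the role of barcoding concrete, the following Python sketch sorts pooled reads back into their samples. It assumes each read begins with an exact copy of its sample barcode. Commercial platforms typically handle this step in their own software and in more tolerant ways, so this is an illustration of the idea only. The barcodes, read sequences, and the function name `demultiplex` are invented for the example.

```python
from collections import defaultdict


def demultiplex(reads, barcodes):
    """Group reads by a leading barcode and trim the barcode off.

    barcodes maps barcode sequence -> sample name. Reads with no
    matching barcode are collected under 'unassigned'.
    """
    by_sample = defaultdict(list)
    for read in reads:
        for barcode, sample in barcodes.items():
            if read.startswith(barcode):
                by_sample[sample].append(read[len(barcode):])
                break
        else:
            # No barcode matched this read.
            by_sample["unassigned"].append(read)
    return dict(by_sample)


# Invented barcodes and reads for two hypothetical samples.
barcodes = {"ACGT": "SID001", "TGCA": "SID002"}
reads = ["ACGTGATCACAGG", "TGCAGATCACAGG", "ACGTTCTATCACC", "GGGGGATCACAGG"]
pooled = demultiplex(reads, barcodes)
print(pooled)
```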

Analysis of the large data files generated by the new technology can be a challenge. Tens of thousands to millions of reads are generated for each sample. Computer programs are needed to perform several steps:

1) Align the reads to the rCRS.
2) Report the depth of coverage.
3) Call variants, including single nucleotide polymorphisms (SNPs), insertion/deletions (indels), heteroplasmic SNPs (point heteroplasmy, or PHP) and heteroplasmic indels (length heteroplasmy, or LHP).
4) Export the minor haplotype to determine the haplogroup of a given sample.
5) Compare the sequences of test samples to known samples for evidence evaluation and missing persons identification.
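Steps 2 and 3 can be illustrated for a single position with a short Python sketch. It assumes the reads have already been aligned, so that the base calls covering one position are available as a string. Indels, base quality, and strand are ignored. The 2% reporting threshold and the function name `summarize_position` are assumptions for the example, not values taken from any platform or guideline.

```python
from collections import Counter


def summarize_position(bases, reference_base, min_minor_fraction=0.02):
    """Summarize one pileup column: depth, major allele, and any minor allele.

    bases is the string of base calls from reads covering the position.
    min_minor_fraction is a hypothetical reporting threshold.
    """
    counts = Counter(bases)
    depth = sum(counts.values())
    if depth == 0:
        return {"depth": 0, "major": None, "minor": None,
                "minor_fraction": 0.0, "call": "no coverage"}
    ranked = counts.most_common()
    major = ranked[0][0]
    minor, minor_fraction = None, 0.0
    # Report the second most frequent base only if it passes the threshold.
    if len(ranked) > 1 and ranked[1][1] / depth >= min_minor_fraction:
        minor = ranked[1][0]
        minor_fraction = ranked[1][1] / depth
    if minor is not None:
        call = "point heteroplasmy"
    elif major != reference_base:
        call = "SNP"
    else:
        call = "reference"
    return {"depth": depth, "major": major, "minor": minor,
            "minor_fraction": minor_fraction, "call": call}


# Invented pileup: 950 reads with C and 50 reads with T at a position where the reference is T.
column = summarize_position("C" * 950 + "T" * 50, reference_base="T")
print(column)
```

The deeper the coverage, the lower the minor-allele fraction that can be distinguished from sequencing noise, which is why depth is reported alongside the variant calls.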

Each sequencing platform includes software for basic analysis. Public and commercial free-standing software programs are also available for analysis of NGS sequence data. Riman et al. (2017) evaluated NGS data for the Standard Reference Materials (SRM 2392 and 2392-I), which NIST supplies to provide quality control when sequencing human mitochondrial genomes for forensic human identification, molecular diagnosis of mitochondrial diseases, mutation detection, evolutionary anthropology, and genetic genealogy studies. The study was performed with two benchtop NGS platforms (Ion Torrent PGM and Illumina MiSeq) and three software programs: two commercial (CLC Bio Genomics Workbench, Qiagen; and GeneMarker HTS, SoftGenetics) and one open source (Galaxy). The analyses revealed low-level heteroplasmy variants that will be added as information values in the updated SRM certificates.


Figure 5: An example of data analysis in GeneMarker HTS software, showing a global image of forward and reverse coverage, the reference sequence, the consensus sequence of aligned sample reads, the pile-up of the aligned data, the depth of coverage, and called variants (SNPs or indels). Forensic alignment can be achieved using a motif file to apply the rules established by the International Society for Forensic Genetics: Revised and extended guidelines for mitochondrial DNA typing.


Figure 6: Samples most similar to each other are SID003/4 with SID001/2 (upper table). Reviewing the comparison by variant provides a rapid visual of the distribution of variants by sample. For the portion of the table displayed in this figure, SID002 has several low-level variants and a major variant at A73G. The other samples have a major variant at one location.

Figure 5 provides an example of data analysis in GeneMarker HTS software, in which forensic alignment can be achieved using a motif file to apply the rules established by the 2014 “DNA Commission of the International Society for Forensic Genetics: Revised and extended guidelines for mitochondrial DNA typing” (Parson et al.). Comparison of large numbers of samples to each other is an essential tool for a variety of applications, such as haplotype comparison, evaluation of maternal relationships, and matching of remains to known samples. Figure 6 illustrates a section of a comparison report for NGS mitochondrial genome samples. The depth of coverage of NGS data makes it a powerful tool for detection of low-level heteroplasmy (Holland et al.) and mixture analysis (Calloway et al.). Figure 7 provides an example of heteroplasmy detection in NGS mitochondrial genome data analysis.
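The idea behind a sample comparison report can be sketched in Python by treating each sample's haplotype as a set of variants and counting the variants that two samples do not share. This is a simplified illustration, not how any particular software builds its report. The sample names reuse the SID labels for readability, but the variant lists are invented and the function name `pairwise_differences` is hypothetical.

```python
from itertools import combinations


def pairwise_differences(haplotypes):
    """Count variants not shared by each pair of samples (symmetric difference)."""
    return {
        (a, b): len(haplotypes[a] ^ haplotypes[b])
        for a, b in combinations(sorted(haplotypes), 2)
    }


# Invented variant lists for illustration only.
haplotypes = {
    "SID001": {"A73G", "A263G"},
    "SID002": {"A73G", "A263G", "T195C"},
    "SID003": {"A263G"},
}
differences = pairwise_differences(haplotypes)
for pair, count in differences.items():
    print(pair, count)
```

Pairs with the fewest differences are the most similar. A fuller comparison would also have to decide how heteroplasmic positions and length variants are counted.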

While Sanger sequence analysis remains the gold standard for mtDNA analysis, validation of High Throughput Sequencing will provide a valuable tool for forensic analysis of mtDNA.


About the Author

Teresa Snyder-Leiby is the product manager for Forensic and DNA fragment analysis software at SoftGenetics. Prior to joining SoftGenetics as a technical forensic specialist, she was a member of the biology faculty at the State University of New York (SUNY New Paltz).


References

Holland, M.M., E.D. Pack, J.A. McElhoe. “Evaluation of GeneMarker HTS for improved alignment of mtDNA MPS data, haplotype determination, and heteroplasmy assessment”, Forensic Science International: Genetics (2017, Volume 28, pp. 90-98).

Parson, W., L. Gusmão, D.R. Hares, J.A. Irwin, W.R. Mayr, N. Morling, E. Pokorak, M. Prinz, A. Salas, P.M. Schneider, T.J. Parsons. “DNA Commission of the International Society for Forensic Genetics: Revised and extended guidelines for mitochondrial DNA typing”, Forensic Science International: Genetics (2014, Volume 13, pp. 134-142).

Riman, S., K.M. Kiesler, L.A. Borsuk, P.M. Vallone. “Characterization of NIST human mitochondrial DNA SRM-2392 and SRM-2392-I standard reference materials by next generation sequencing”, Forensic Science International: Genetics (2017, Volume 29, pp. 181-192).

Vohr, S.H., R. Gordon, J.M. Eizenga, H.A. Erlich, C.D. Calloway, R.E. Green. “A phylogenetic approach for haplotype analysis of sequence data from complex mitochondrial mixtures”, Forensic Science International: Genetics (in press). Retrieved online 19 July 2017: http://dx.doi.org/10.1016/j.fsigen.2017.05.007

 





