DNA Profile Interpretation & Probabilistic Genotyping Software
Written by Ian Webber Evett CBE   

ONE OF THE MOST CHALLENGING TASKS for today’s forensic scientists is the interpretation of low-level, degraded, or mixed DNA profiles from evidentiary material.

This article appeared in the November-December 2020 issue of Evidence Technology Magazine.

The theory for assigning probabilities to DNA genotypes was established to a broad consensus in the 1990s. The task of deciding on the combinations of genotypes and the relative weightings to be included in a calculation of evidential weight, though, was much slower to evolve because of the complexities and consequent computing power demands.
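
A rough count shows why the computing demands were so heavy. The sketch below counts the genotype sets that would have to be weighed at a single locus. With n candidate alleles there are n(n + 1)/2 possible genotypes, so k contributors give (n(n + 1)/2)^k ordered genotype sets. The function names are hypothetical and do not come from any PG package. The count also assumes that every contributor’s alleles are among the observed candidates. Real software must allow for allelic drop-out, which makes the space larger still.

```python
from math import comb


def genotype_count(n_alleles: int) -> int:
    """Unordered genotypes that can be built from n candidate alleles at one locus."""
    # n homozygotes plus C(n, 2) heterozygotes, i.e. n(n + 1) / 2
    return n_alleles + comb(n_alleles, 2)


def combination_count(n_alleles: int, n_contributors: int) -> int:
    """Ordered genotype sets for k contributors at one locus.

    Each contributor can independently carry any of the possible genotypes,
    so the count is genotype_count(n) raised to the power k.
    """
    return genotype_count(n_alleles) ** n_contributors


if __name__ == "__main__":
    # Hypothetical locus showing six candidate alleles
    for k in range(1, 5):
        print(f"{k} contributor(s): {combination_count(6, k):,} genotype sets")
```

For a locus with six candidate alleles, this gives 21 sets for one contributor, 441 for two, 9,261 for three, and 194,481 for four. Those figures cover one locus only; a profile has many more.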

During this time, the forensic interpretation of profiles from DNA mixtures was by and large a manual, time-consuming process that analysts tackled by using heuristics to determine those genotype combinations that could reasonably explain a recovered profile. Now, all of that has been changed radically by the development of what has come to be known as probabilistic genotyping (PG) software.

The heuristic methodology, which is still used in many forensic laboratories, meant that analysts relied on simplified interpretation strategies to deal with mixed DNA profiles. They applied various fixed thresholds and other biological parameters, such as heterozygote balance, mixture component ratios, and stutter ratios. On that basis, they interpreted genetic data lying predominantly above a given threshold, against which a prospective contributor to the mixture could be included or excluded.
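
The fixed-threshold logic can be sketched for a single locus as shown below. This is a simplified illustration, not any laboratory’s protocol. The two thresholds, their values (in relative fluorescence units, RFU), and the function names are all hypothetical. Checks such as heterozygote balance and stutter filtering are left out.

```python
# Hypothetical thresholds in relative fluorescence units (RFU); real values
# are set by each laboratory's own validation.
ANALYTICAL_THRESHOLD = 50.0
STOCHASTIC_THRESHOLD = 200.0


def called_alleles(peaks: dict[str, float]) -> set[str]:
    """Alleles whose peak height reaches the analytical threshold."""
    return {allele for allele, rfu in peaks.items() if rfu >= ANALYTICAL_THRESHOLD}


def binary_call(peaks: dict[str, float], poi: tuple[str, str]) -> str:
    """Simplified include/exclude decision for one locus."""
    alleles = called_alleles(peaks)
    if not alleles:
        return "inconclusive"          # nothing usable above threshold
    if set(poi) <= alleles:
        return "cannot exclude"        # every POI allele is present
    # A POI allele is missing. Treat that as an exclusion only if every called
    # peak is tall enough that allelic drop-out is considered unlikely.
    if all(peaks[a] >= STOCHASTIC_THRESHOLD for a in alleles):
        return "excluded"
    return "inconclusive"


if __name__ == "__main__":
    locus = {"12": 850.0, "14": 790.0, "15": 120.0, "16": 30.0}
    print(binary_call(locus, ("12", "14")))   # cannot exclude
    print(binary_call(locus, ("12", "17")))   # inconclusive
```

At the example locus, a POI typed 12,14 “cannot be excluded.” A POI typed 12,17 is “inconclusive,” because the 15 peak falls below the stochastic threshold and drop-out cannot be ruled out. Each rule is a hard cut-off, so a peak either counts in full or not at all.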

While this manual approach to mixture interpretation worked fairly well for most two-person mixtures, it was unwieldy at best and questionable at worst for more-complex mixtures. Good data was often classed as inconclusive and ultimately discarded, even though it could potentially have provided reliable weight on whether DNA from a person of interest (POI) was present. The situation was further complicated because an increasing proportion of crime samples yielded lower-quality and more-complex mixtures, which made DNA interpretation itself more demanding.

Faced with a growing volume of less-than-ideal profiles, forensic labs recognized that the traditional interpretation process would no longer suffice. Fortunately, the rise in more complicated DNA mixtures was accompanied by increased availability and subsequent use of PG software as the method of choice for interpreting DNA profiling evidence.

PG software allows forensic labs to assess literally thousands of proposed genotype combinations according to how well each one explains an observed DNA mixture profile. Analysts can then calculate the probability of the observed DNA evidence under each of two propositions, which might represent the prosecution and defense positions at a future trial. The ratio of these two probabilities is the likelihood ratio (LR). The LR expresses the evidential weight of the findings and the strength of support for one proposition over the other.
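
In simplified form, the LR reported by continuous PG systems is often written as LR = Σ w_j Pr(S_j | Hp) / Σ w_j Pr(S_j | Hd). In this expression, each S_j is a proposed set of contributor genotypes, w_j is a weight reflecting how well S_j explains the observed peaks, and Pr(S_j | H) is the probability of that genotype set under proposition H. The sketch below applies this formula to one locus of a hypothetical two-person mixture. The weights and allele frequencies are invented for illustration, and Hp is fixed as “the POI is contributor 1.” Genotype probabilities use Hardy-Weinberg proportions without the subpopulation (θ) correction used in casework.

```python
Genotype = tuple[str, str]


def genotype_prob(g: Genotype, freqs: dict[str, float]) -> float:
    """Hardy-Weinberg genotype probability (no subpopulation correction)."""
    a, b = g
    return freqs[a] ** 2 if a == b else 2 * freqs[a] * freqs[b]


def likelihood_ratio(
    weighted_sets: list[tuple[Genotype, Genotype, float]],
    poi: Genotype,
    freqs: dict[str, float],
) -> float:
    """LR for a two-person mixture at one locus.

    Hp: contributor 1 is the POI and contributor 2 is an unknown person.
    Hd: both contributors are unknown, unrelated people.
    Each entry is (contributor 1 genotype, contributor 2 genotype, weight).
    """
    numerator = 0.0
    denominator = 0.0
    for g1, g2, weight in weighted_sets:
        # Under Hd both genotypes must be explained by population frequencies
        denominator += weight * genotype_prob(g1, freqs) * genotype_prob(g2, freqs)
        # Under Hp only sets where contributor 1 matches the POI contribute
        if sorted(g1) == sorted(poi):
            numerator += weight * genotype_prob(g2, freqs)
    return numerator / denominator


if __name__ == "__main__":
    freqs = {"12": 0.10, "14": 0.20, "15": 0.05}   # hypothetical allele frequencies
    # Hypothetical weights, as if assigned by a peak-height model
    sets = [
        (("12", "14"), ("15", "15"), 0.9),
        (("12", "14"), ("12", "15"), 0.1),
    ]
    print(likelihood_ratio(sets, ("12", "14"), freqs))
```

In this example, every weighted set assigns 12,14 to contributor 1. The LR therefore reduces to 1 / (2 × 0.10 × 0.20) = 25, which is the inverse of the POI’s genotype probability. A set that gave contributor 1 a different genotype would add to the denominator only, which would lower the LR.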

This approach has proven to be highly effective in allowing PG software to interpret possible components of highly complex DNA mixtures, far better than what was ever possible using the traditional manual process alone. This, in turn, has enabled PG software to produce usable, interpretable, and reliable DNA results that have contributed to the successful resolution of both criminal and civil investigations and stood up to scrutiny in subsequent judicial proceedings.

PG software has been particularly productive in contributing to the resolution of violent crime and sexual assault cases. It has been instrumental in excluding individuals who have been wrongly associated as the source of crime scene evidence and in exonerating persons who were wrongly convicted via post-conviction cases. It has also been useful in cracking cold cases in which low-grade or mixture evidence that originally had to be dismissed as inconclusive could be examined again and used to develop other investigative leads.

Because of the scientific underpinnings, validation studies, and peer-reviewed publications supporting its use, PG software has garnered wide scientific support for its reliability. Although PG software has been in use for less than a decade, it is based on standard mathematics. The probability models and Markov chain Monte Carlo (MCMC) methods used by PG software originated at Los Alamos, New Mexico, during World War II. A number of researchers then brought these methods closer to statistical practicality in the 1970s. MCMC is widely employed outside forensic science and is at the heart of a huge range of applications, from computational biology and weather prediction to physics, engineering, and the stock market.
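
The core idea of MCMC is compact. The sketch below applies the basic random-walk Metropolis algorithm to a single, hypothetical mixture-proportion parameter x under a toy posterior. The target is chosen so that the exact answer is known and the sampler can be checked against it. Actual MCMC-based PG software samples many parameters jointly under much richer peak-height models, such as genotype sets, mixture proportions, and degradation. Nothing here reproduces any particular package.

```python
import math
import random
from typing import Callable


def log_target(x: float) -> float:
    """Toy unnormalized log-posterior for a mixture proportion x in (0, 1).

    Hypothetical: flat prior and likelihood x**30 * (1 - x)**10, as if 30 of
    40 units of signal were attributed to contributor 1. The posterior is
    Beta(31, 11), so its exact mean, 31/42, is available for checking.
    """
    if not 0.0 < x < 1.0:
        return -math.inf
    return 30 * math.log(x) + 10 * math.log(1.0 - x)


def metropolis(
    log_p: Callable[[float], float],
    start: float,
    steps: int,
    step_size: float,
    seed: int = 0,
) -> list[float]:
    """Random-walk Metropolis sampler for a one-dimensional target."""
    rng = random.Random(seed)
    x, lp = start, log_p(start)
    samples = []
    for _ in range(steps):
        proposal = x + rng.gauss(0.0, step_size)   # symmetric proposal
        lp_new = log_p(proposal)
        # Accept with probability min(1, p(proposal) / p(x)).
        if lp_new >= lp or rng.random() < math.exp(lp_new - lp):
            x, lp = proposal, lp_new
        samples.append(x)                           # a rejected move repeats x
    return samples


if __name__ == "__main__":
    draws = metropolis(log_target, start=0.5, steps=20_000, step_size=0.1, seed=42)
    kept = draws[1_000:]                            # discard burn-in
    print(f"estimated mean {sum(kept) / len(kept):.3f} vs exact {31 / 42:.3f}")
```

The chain visits each region of x in proportion to the target density. The average of the retained draws should therefore land close to the exact posterior mean of 31/42 ≈ 0.738. MCMC-based PG software applies the same accept/reject principle to explore combinations of genotypes and model parameters that are far too numerous to enumerate.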

The scientific literature contains numerous peer-reviewed papers that support the validity of PG software. While some have pointed out that the developers of PG software are the authors of many of these papers, it is important to recognize that publication of a paper represents only the initial step in the peer-review process. Such published papers are meant not simply to inform, but to provoke discussion, promote improvements, and ultimately advance the science. To date, the overall peer-review process supports a consensus that PG software generates reliable results when it is used properly.

To that end, it is incumbent on forensic organizations to ensure that the analysts who regularly use PG software are properly trained in:

  • the interpretation of DNA evidence;
  • the formulation of meaningful propositions;
  • the principles and practices of the PG software being used; and
  • interpretation of the data generated by the software.

In addition, labs must regularly review validation studies that define the limitations of PG software and properly validate their own PG software with in-house studies in order to have a better understanding of the data being produced. They should also regularly review the peer-reviewed literature to know how others are working with PG software and any issues they are experiencing. With this knowledge in hand, labs are well-placed to put effective protocols in place that do not overstate the weight of evidence assigned from calculations carried out by PG software, and analysts will be better prepared to recognize when data provided by PG software cannot be supported.

All of this has led to PG software being recognized as the de facto “go to” method for interpreting DNA profiling evidence. It enables users to interpret DNA results faster, compare profiles against a POI, calculate an LR, use more of the information in a DNA profile, and ultimately resolve previously unresolvable and highly complex DNA mixtures. As a result, PG software is being used in hundreds of thousands of cases worldwide. For example, one of the currently available software packages, STRmix, has been used to interpret DNA evidence in more than 220,000 cases in the past eight years.

The use of PG software internationally has risen to a level that practices are now being codified. The International Society for Forensic Genetics and the UK Forensic Science Regulator, for example, have published guidelines for validating software. The Scientific Working Group on DNA Analysis Methods (SWGDAM) has produced guidelines for validating PG tools. The Organization of Scientific Area Committees’ DNA Analysis 2 Sub-Committee is developing standards for assessing PG software tools.

While PG software use is now widespread, it is not without its critics and, like virtually all other technology, it has limitations. It should not come as a surprise that no matter how good it is, PG software cannot interpret every DNA profile. There must be sufficient DNA signal in the profile to move forward with an analysis using PG software. Some profiles are simply too degraded, too complex, or have too little information to be meaningfully interpreted.

With that in mind, it is essential for forensic analysts to be trained on the proper use of the PG software their lab employs. They must also recognize that, no matter how good the software is, it will not always be able to provide a useful result. For its part, the lab must conduct proper validation studies so that everyone is aware of the limitations of the PG tools in use. That step reduces the chances of interpreting software output that is not supportable, especially when it is combined with protocols designed to represent the strength of the PG results reliably and robustly.

Limitations aside, critics have argued that DNA analysis results generated by PG tools are unreliable because miscodes have been discovered in the software, casting doubt on whether the output realistically can be trusted to produce error-free results. Attorneys, in particular, have been quick to argue that PG software represents a flawed “black box” approach to DNA analysis in which data is fed into a computer and a result is generated, while little is known about how that result has been derived or the computer algorithms that produced it. This has typically resulted in attorneys demanding to have access to the source code—and any potential miscodes it might contain—in cases where PG software has played a key role in conviction or exoneration.

Developers of PG software have countered such charges by pointing out that miscodes are present in virtually every software package. Moreover, they claim those miscodes which have been identified to date have tended to be on the fringe of DNA typing results. As such, they are difficult to define, not typically encountered, and have negligible impact on the reliability of the output.

There is a view among some developers that PG software can be effectively scrutinized by examining its extended output, which embodies the intermediate steps of the interpretation process. They also suggest that proper testing and comprehensive training can help in identifying miscodes.

Requests from prosecutors and defense attorneys for access to source code appear to have met with a range of responses, from permission to refusal (citing intellectual property rights). One developer has adopted policies that grant attorneys, scientists, expert witnesses, and others access to its source code. These policies also cover its developmental validation records, user’s manuals, and extended output. It seems that each case needs to be considered on its merits.

It appears that U.S. courts have, in the main, denied motions to exclude evidence produced by PG software, citing its general acceptance in the scientific community and its scientific validity. It is worth noting, however, that judges, attorneys, and juries can find the intricacies of DNA evidence extremely challenging to fathom. For that reason, those who regularly use forensic DNA typing methodologies need to accept the responsibility to understand and validate PG software and its applications. Doing so will provide confidence in its use and ensure the results are given proper significance in casework, while also helping to meet challenges in the legal arena.

Despite these and other challenges that are likely to continue in the foreseeable future, even some of the most ardent critics of PG software have been forced to admit that it represents a major advance in DNA profile interpretation when used properly. By employing sound science, PG software enables the scientist to provide reliable and robust assignments of evidential value from a significantly broader range of DNA profiles than ever before possible. And while many other developments will occur in the future, PG software will continue to have a profound impact on the evaluation of DNA profiling evidence in both criminal and civil investigations.

About The Author
Ian Evett CBE, DSc, Hon FCSFS, is a statistician with Principal Forensic Services Ltd whose career-long specialty has been the evaluation of evidence. Evett has worked with many different evidence types, from fingerprints to handwriting to DNA profiling. He worked in the Home Office Forensic Science Service until its closure in 2012. He then joined a group of senior colleagues from a range of forensic disciplines to form Principal Forensic Services Ltd, which offers casework, consultancy, and training services internationally. He has doctorates from the universities of Strathclyde and Lausanne and is an honorary life fellow of the Chartered Society of Forensic Sciences. In 2016, he was invested as a Commander of the Order of the British Empire in recognition of his services to forensic science.
