The mathematics behind DNA mismatch detection assays
Learn how to derive the equations behind mismatch detection assays
We recently wrote a featured article on assessing gene editing with DNA mismatch detection assays, in which we explain the relationship between percent editing and percent cleavage. For many, this relationship is nonintuitive, but it is easy to understand after examining the derivation of Eq. 1. The underlying mathematics also reveals several fundamental assumptions that are not often discussed.
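$$ \%\ \text{editing} = 100 \times \left(1 - \sqrt{1 - \frac{b + c}{a + b + c}}\right) \tag{1} $$

In Eq. 1, a, b, and c are the densities of the three bands in the agarose gel from a T7EI mismatch detection assay (Fig. 1c): a is the upper, uncut band, and b and c are the two lower, cleaved bands.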
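Eq. 1 is easy to check with a short calculation. Below is a minimal Python sketch. It assumes that a, b, and c are background-subtracted band densities on the same (arbitrary) scale. The function name `percent_editing_standard` is our own hypothetical label and is not part of any published tool.

```python
import math


def percent_editing_standard(a: float, b: float, c: float) -> float:
    """Eq. 1: percent editing from the uncut band (a) and the two cut bands (b, c).

    Assumes background-subtracted densities measured on the same scale.
    """
    if min(a, b, c) < 0 or a + b + c <= 0:
        raise ValueError("densities must be non-negative with a positive total")
    f_c = (b + c) / (a + b + c)  # fraction of reannealed amplicons that were cut
    return 100.0 * (1.0 - math.sqrt(1.0 - f_c))


# Hypothetical densities: 20% of the total signal sits in the two cut bands.
print(round(percent_editing_standard(80.0, 12.0, 8.0), 1))  # prints 10.6
```

This example shows why percent editing and percent cleavage differ. When 20% of the signal sits in the cut bands, Eq. 1 gives only about 10.6% editing.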
Mismatch detection assays
Our previous article describes the T7EI assay in detail, but it can be summarized in three broad steps.
 Step 1: After a gene editing experiment, DNA is collected from the experimental population. Double-stranded DNA regions straddling the intended edit site are PCR-amplified. Each template yields either a wild-type (WT) amplicon (if it was not edited) or an edited amplicon (if it carries an insertion or deletion; Fig. 1a).
 Step 2: The resulting pool of amplicons is melted and allowed to reanneal before T7EI is added. The strands reanneal randomly provided the WT and edited sequences are similar (Fig. 1b). We will assume that any reannealed amplicon with a mismatch is cleaved upon the addition of T7EI. However, mismatch detection enzymes are known not to be 100% efficient at identifying and cutting all types of mismatches or bulges/distortions in DNA.^{1,2}
 Step 3: Running the resulting mixture on an agarose gel gives the fraction of reannealed amplicons that were cleaved. This fraction is the density of the two lower, cut bands relative to the total density, which also includes the upper, uncut band (Fig. 1c).
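These three steps can be mimicked with a toy Monte Carlo simulation, which is a convenient way to sanity-check the reannealing probabilities in Fig. 2. This is a sketch under idealized assumptions (ours, not the authors' method). Each amplicon is either WT or one of a few distinct edits. Every sense strand reanneals with a randomly chosen antisense strand. T7EI cuts every mismatched duplex and nothing else. The function name and its parameters are hypothetical.

```python
import random


def simulate_cut_fraction(p_e, edit_freqs, n_amplicons=200_000, seed=0):
    """Toy Monte Carlo of Steps 1-3 under idealized assumptions (illustrative only).

    p_e         -- fraction of alleles carrying an edit
    edit_freqs  -- relative frequencies of the distinct edits (normalized internally)
    Returns the fraction of reannealed duplexes with a mismatch, i.e. the cut
    fraction if T7EI cleaves every mismatch and nothing else.
    """
    rng = random.Random(seed)
    total = sum(edit_freqs)
    # Step 1: label each amplicon 0 (WT) or i >= 1 (carrying distinct edit i).
    labels = list(range(len(edit_freqs) + 1))
    weights = [1.0 - p_e] + [p_e * w / total for w in edit_freqs]
    amplicons = rng.choices(labels, weights=weights, k=n_amplicons)
    # Step 2: melt every duplex, then let each sense strand reanneal with a
    # randomly chosen antisense strand from the whole pool.
    sense = amplicons
    antisense = amplicons[:]
    rng.shuffle(antisense)
    # Step 3: a duplex is cut when its two strands carry different alleles.
    cut = sum(1 for s, t in zip(sense, antisense) if s != t)
    return cut / n_amplicons


# Hypothetical outcome: 30% editing split across three distinct edits.
print(simulate_cut_fraction(0.3, [0.5, 0.3, 0.2]))
```

Under the probabilities in Fig. 2, this hypothetical case should give a value near 2(0.3)(0.7) + 0.62(0.3)^{2} ≈ 0.476. Here 0.62 = 1 - (0.5^{2} + 0.3^{2} + 0.2^{2}) is the chance that two edited strands differ. The simulated value should land close to this.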
Figure 1: Illustration of key steps in mismatch detection assays.
The Mathematics
Calculating the fraction of alleles that carry an edit from the fraction of amplicons cut by T7EI in the gel (Eq. 1) essentially boils down to calculating the probability that two reannealed strands (from Step 2) will have a mismatch. If strands reanneal randomly, we can express the probability of each possible reannealing combination (WT-WT, WT-edited, and edited-edited) in terms of the fraction editing, p_{e}. These probabilities are (1 - p_{e})^{2}, 2p_{e}(1 - p_{e}), and p_{e}^{2}, respectively (Fig. 2). If we also make the simplifying assumption that T7EI cleaves every mismatch, we can assign a cutting probability to each scenario (Fig. 2).
Figure 2: Illustration of reannealing probabilities. These probabilities are a good approximation when there are many cells and the amplicons are a result of many PCR cycles.
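Once we have defined these probabilities, we can calculate the fraction of reannealed amplicons that we expect to be cleaved (expected f_{c}). This is simply the reannealing probability for each state i multiplied by the probability that T7EI will cleave state i, summed over all possible states (here, the three combinations in Fig. 2):

$$ f_c^{\text{expected}} = \sum_{i} P_i^{\text{reanneal}}\, P_i^{\text{cleave}} = (1 - p_e)^2 \cdot 0 + 2p_e(1 - p_e) \cdot 1 + p_e^2\, p_m \tag{2} $$

Here, p_{m} is the probability that a reannealed edited-edited duplex carries a mismatch (and is therefore cut).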
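Eq. 2 is a small weighted sum, so it translates directly into code. Below is a minimal Python sketch. It assumes the three states and their probabilities from Fig. 2, as described above. The function name is hypothetical, and the example numbers are only illustrative.

```python
def expected_cut_fraction(p_e: float, p_m: float) -> float:
    """Eq. 2: sum over reannealing states of P(state) * P(T7EI cuts that state)."""
    states = {
        # state: (reannealing probability, cleavage probability)
        "WT-WT": ((1 - p_e) ** 2, 0.0),           # perfect match, never cut
        "WT-edited": (2 * p_e * (1 - p_e), 1.0),  # always mismatched, always cut
        "edited-edited": (p_e ** 2, p_m),         # cut only when the two edits differ
    }
    return sum(p_reanneal * p_cut for p_reanneal, p_cut in states.values())


# Hypothetical example: 30% of alleles edited, every edit unique (p_m = 1).
print(expected_cut_fraction(0.3, 1.0))
```

Keeping the states in a table makes the structure of Eq. 2 explicit. Only the edited-edited entry depends on p_{m}, which is why the rest of the derivation focuses on that one number.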
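Equating the fraction of reannealed amplicons cleaved in the experiment (f_{c}) to Eq. 2 gives us

$$ f_c = 2p_e(1 - p_e) + p_m p_e^2 \tag{3} $$

where p_{e} is the editing probability and f_{c} is the following experimentally measured ratio (see Fig. 1c):

$$ f_c = \frac{b + c}{a + b + c} $$

Rearranging Eq. 3 into the quadratic (2 - p_{m})p_{e}^{2} - 2p_{e} + f_{c} = 0 and solving for p_{e} gives us the general formula for fraction editing in terms of f_{c}:

$$ p_e = \frac{1 \pm \sqrt{1 - (2 - p_m) f_c}}{2 - p_m} \tag{4} $$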
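Eq. 4 has two roots, so a practical calculator has to decide which ones are physical. The Python sketch below is a hypothetical helper, not the T7EI Calculator's code. It returns every real root of Eq. 4 that lies between 0 and 1. It evaluates the smaller root in an equivalent form, f_{c}/(1 + √(1 - (2 - p_{m})f_{c})), obtained by multiplying the numerator and denominator by that conjugate. This form avoids floating-point cancellation when f_{c} is small.

```python
import math


def fraction_editing_roots(f_c: float, p_m: float) -> list:
    """Real roots of Eq. 4 that lie in [0, 1], smallest first.

    Eq. 4 solves (2 - p_m) p_e^2 - 2 p_e + f_c = 0, the rearranged form of Eq. 3.
    """
    if not (0.0 <= f_c <= 1.0 and 0.0 <= p_m <= 1.0):
        raise ValueError("f_c and p_m must both lie in [0, 1]")
    k = 2.0 - p_m               # coefficient of p_e^2
    disc = 1.0 - k * f_c        # the quadratic's discriminant divided by 4
    if disc < 0.0:
        return []               # this f_c cannot occur for this p_m
    s = math.sqrt(disc)
    candidates = (
        f_c / (1.0 + s),        # (1 - s) / k, rewritten to avoid cancellation
        (1.0 + s) / k,
    )
    tol = 1e-12                 # absorb rounding right at the 0 and 1 boundaries
    return sorted({min(max(r, 0.0), 1.0) for r in candidates if -tol <= r <= 1.0 + tol})
```

For example, `fraction_editing_roots(0.0, 0.0)` returns both 0.0 and 1.0, which is the ambiguity that appears when all edits are identical. With p_{m} = 1, only the smaller root survives.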
Now all that remains is to calculate (or approximate) p_{m}, the probability that two reannealed edited strands carry a mismatch, for a given experiment. We outline several ways to do this below.
Approximation 1: All Edits Are Unique
If we assume that no two edited strands are identical (i.e., edits are random), reannealed edited-edited strands (from Fig. 2) will always have a mismatch. If there is always a mismatch, we can approximate p_{m} = 1, since T7EI should always cleave the amplicon. Substituting p_{m} = 1 into Eq. 4 gives us

$$ p_e = 1 - \sqrt{1 - f_c} \tag{5} $$
which is identical to Eq. 1 after multiplying by 100! This is the approximation most often used in the literature. As a mathematical side note, Eq. 4 gives two solutions, but in this case we can discard the second one, p_{e} = 1 + √(1 - f_{c}). It is nonphysical because it exceeds 1 for any f_{c} < 1, which violates the requirement that p_{e} lie between 0 and 1.
Figure 3: The fraction editing (p_{e}) as a function of fraction T7EI cutting measured from the gel (f_{c}) for unique edits (approximation 1; Eq. 5).
Key Assumptions
Since this is the approximation most often taken in literature, it is worth noting several key assumptions in the derivation of Eq. 5. These assumptions are likely reasonable in many experiments, but are useful to consider.
 Reannealing is equally likely to occur between any two amplicon strands. If this were not the case, it would be difficult to draw any conclusions from this assay.
 No two edits are the same. If edits are decidedly nonrandom, you can no longer assume that edited-edited reannealing will result in T7EI cleavage. See the next two sections for examples (Approximations 2 and 3).
 There are many cells in each gene editing experiment. Were this not the case, we could not approximate the probabilities as we have in Fig. 2.
There are, of course, other concerns common to many assays that we will not discuss in this article: PCR amplification bias, gel image saturation, and mismatch/deformation bias of the given nuclease (e.g., T7EI versus CEL1^{2}).
Approximation 2: All Edits Are Identical
As we saw in the previous section, one key assumption behind Eq. 1 is that no two edited strands are identical. What if the opposite were true and all edits were the same? This situation can also arise when genotyping plants (the original application of mismatch detection) at a naturally occurring heterozygous locus.^{4} In this situation, we can approximate p_{m} = 0, since reannealed edited-edited strands (Fig. 2) are identical and will therefore not be cleaved by T7EI. Substituting p_{m} = 0 into Eq. 4 gives us

$$ p_e = \frac{1 \pm \sqrt{1 - 2 f_c}}{2} \tag{6} $$
which has two valid solutions. In this situation (as shown in Fig. 4), there is no way to determine from a gel image alone whether an f_{c} of 0 is the result of 0% editing or 100% editing! Note also that it is no longer possible to measure an f_{c} > 0.5, which makes intuitive sense. Under these assumptions, only WT-edited duplexes are cut, so f_{c} equals the probability of a mismatch [2p_{e}(1 - p_{e}) in Fig. 2]. This probability is maximized at p_{e} = 0.5, where f_{c} = 0.5.
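A short worked derivation makes both observations precise. Setting p_{m} = 0 in Eq. 3, f_{c} depends on p_{e} only through 2p_{e}(1 - p_{e}). This expression is symmetric under p_{e} → 1 - p_{e} and peaks at p_{e} = 1/2:

```latex
f_c(p_e) = 2p_e(1 - p_e) = 2(1 - p_e)\,\bigl[1 - (1 - p_e)\bigr] = f_c(1 - p_e)

\frac{df_c}{dp_e} = 2 - 4p_e = 0 \;\Rightarrow\; p_e = \tfrac{1}{2},
\qquad \frac{d^2 f_c}{dp_e^2} = -4 < 0,
\qquad f_c^{\max} = f_c\!\left(\tfrac{1}{2}\right) = \tfrac{1}{2}
```

Every measurable f_{c} below 0.5 therefore corresponds to two editing levels placed symmetrically about 50%. For example, f_{c} = 0.42 arises from both p_{e} = 0.3 and p_{e} = 0.7. These are exactly the two roots of Eq. 6: (1 ± √(1 - 0.84))/2 = (1 ± 0.4)/2.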
Figure 4: The fraction editing (p_{e}) as a function of fraction T7EI cutting measured from the gel (f_{c}) for identical edits (approximation 2; Eq. 6).
Approximation 3: N Equally Likely Edits
In the first approximation, we assumed that no two edits were the same, and in the second approximation we took the opposite assumption (all edits are identical), which resulted in two different estimates for the fraction editing (Eqs. 5 and 6). It is reasonable to guess that the truth for any given experiment lies somewhere in between. For this approximation, we take the more general case where there are N distinct edits that are equally likely to occur.
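The probability of reannealed edited-edited strands having a mismatch is one minus the probability of the two edited strands matching. Two edited strands match only if both carry the same edit i. When strands reanneal randomly, this happens with probability ν_{i}^{2}, so

$$ p_m = 1 - \sum_{i=1}^{N} \nu_i^2 \tag{7} $$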
Here, ν_{i} is the probability that a randomly selected edited strand carries edit i, one of the N distinct edits. In this approximation, ν_{i} = 1/N since all N edits are equally likely, which reduces Eq. 7 to

$$ p_m = 1 - \frac{1}{N} \tag{8} $$
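Eq. 7 is also how p_{m} can be estimated from sequencing data such as the editing-event distribution in Fig. 6. The Python sketch below takes read counts per distinct edited sequence, normalizes them to the frequencies ν_{i}, and applies Eq. 7. The read counts are hypothetical, not the CDKN1A data. WT reads are assumed to have been removed beforehand, because ν_{i} is defined over edited strands only. The function name is our own.

```python
def mismatch_probability(edit_counts):
    """Eq. 7: p_m = 1 - sum_i nu_i^2 over the distinct edited sequences.

    edit_counts -- read counts (or other non-negative weights), one entry per
                   distinct edited sequence; WT reads excluded.
    """
    total = sum(edit_counts)
    if total <= 0:
        raise ValueError("need at least one edited read")
    # nu_i = count_i / total is the chance that a random edited strand carries edit i;
    # nu_i**2 is the chance that two independently chosen edited strands both do.
    return 1.0 - sum((n / total) ** 2 for n in edit_counts)


print(round(mismatch_probability([500, 300, 200]), 2))  # hypothetical counts; prints 0.62
print(mismatch_probability([1, 1, 1, 1]))               # N = 4 equal edits (Eq. 8); prints 0.75
```

The second call is a quick consistency check: four equally frequent edits give 1 - 1/4 = 0.75, as Eq. 8 predicts.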
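Plugging this p_{m} into Eq. 4 (note that 2 - p_{m} = 1 + 1/N) gives the fraction editing for N equally likely edits:

$$ p_e = \frac{1 \pm \sqrt{1 - \left(1 + \frac{1}{N}\right) f_c}}{1 + \frac{1}{N}} \tag{9} $$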
Both solutions of Eq. 9 are valid for values of f_{c} that give a real p_{e} between 0 and 1. Notice that the limiting values of N in Eq. 9 recover our first two approximations. If all edits are unique, there is an (effectively) infinite number of possible edits (N → ∞), which gives us Eq. 5 (Approximation 1). If all edits are identical, there is a single possible edit (N = 1), which reduces to Eq. 6 (Approximation 2).
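These limits can be checked numerically. The Python sketch below evaluates the smaller root of Eq. 9, which is the one that reduces to Eq. 5 as N → ∞. It uses a fixed, hypothetical f_{c} and increasing values of N, and prints Eq. 1 alongside for comparison. The function name is ours. The root is written in the cancellation-free form f_{c}/(1 + √(1 - (1 + 1/N)f_{c})), which is algebraically equal to (1 - √(1 - (1 + 1/N)f_{c}))/(1 + 1/N).

```python
import math


def fraction_editing_n_edits(f_c: float, n: int) -> float:
    """Smaller root of Eq. 9 for N equally likely edits (p_m = 1 - 1/N)."""
    k = 1.0 + 1.0 / n  # equals 2 - p_m
    disc = 1.0 - k * f_c
    if disc < 0.0:
        raise ValueError("this f_c cannot occur with N equally likely edits")
    # (1 - sqrt(disc)) / k, rewritten to avoid cancellation when k * f_c is small
    return f_c / (1.0 + math.sqrt(disc))


f_c = 0.4  # hypothetical measured cut fraction
for n in (1, 2, 4, 16, 64, 1_000_000):
    print(f"N = {n:>7}: {100 * fraction_editing_n_edits(f_c, n):.2f}% editing")
print(f"Eq. 1:       {100 * (1 - math.sqrt(1 - f_c)):.2f}% editing")
```

The estimates shrink steadily toward the Eq. 1 value as N grows, consistent with the trend in Fig. 5. At N = 1, the printed value is the smaller root of Eq. 6.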
Figure 5: The fraction editing (p_{e}) as a function of fraction T7EI cutting measured from the gel (f_{c}) for N equally likely edits (approximation 3; Eq. 9).
As you can see from Fig. 5, as the number of distinct edits increases beyond N = 16, the estimated fraction editing closely approximates the published formula (Eq. 1). Although Eq. 1 is likely a good approximation for most experiments, there is evidence that certain CRISPR editing events produce highly nonrandom results with only a few prevalent edits.^{3} We have also observed this in some experiments (Fig. 6), so we treat our estimate of the percent editing as a lower limit and take care not to overinterpret small differences between T7EI assay results.
Figure 6: The distribution of different target site editing events for a crRNA targeting CDKN1A, as obtained from Next-Generation Sequencing. Editing events are ordered from most to least prevalent, and each “event” is defined as resulting in a unique sequence (e.g., editing event 1 is a two-base deletion at the cut site; editing event 2 is an insertion of a thymine at the cut site; etc.). It is possible to calculate p_{m} experimentally from these data using Eq. 7, which gives a value of 0.84. This value can be used to calculate the percent editing (Eq. 4) in a mismatch detection assay for this (or a similar) target.
Summary
When analyzing mismatch detection assays, we often do not know the underlying distribution of editing events (e.g., Fig. 6), which can differ between target sites, target sequences, and cell lines. This translates to uncertainty in the probability of two edited strands reannealing with a mismatch (p_{m} in Eq. 4). Because of this uncertainty, it is often best to use the standard equation (Eq. 1), which provides a lower limit on the percent editing. Keeping in mind that this can be a (sometimes substantial) underestimate of the true percent editing, we take care not to overinterpret minor differences between mismatch detection assay results.
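The lower-limit statement follows directly from Eq. 4. For the smaller root, multiplying the numerator and denominator by 1 + √(1 - (2 - p_{m})f_{c}) gives an equivalent form. Because p_{m} ≤ 1, we have (2 - p_{m})f_{c} ≥ f_{c}, so the square root in the denominator is never larger than its value at p_{m} = 1. The estimate can therefore only grow as p_{m} drops below 1:

```latex
p_e = \frac{1 - \sqrt{1 - (2 - p_m) f_c}}{2 - p_m}
    = \frac{f_c}{1 + \sqrt{1 - (2 - p_m) f_c}}
    \;\ge\; \frac{f_c}{1 + \sqrt{1 - f_c}}
    = 1 - \sqrt{1 - f_c}
    \qquad (0 \le p_m \le 1)
```

The right-hand side is Eq. 5, i.e., Eq. 1 divided by 100. Any other physical root of Eq. 4 is larger still, so under this model Eq. 1 never overestimates the fraction editing.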
Using these calculations as a starting point, it is straightforward to begin relaxing other assumptions and testing for their consequences. We hope that taking a closer look at the mathematics has helped make mismatch detection assay analysis easier to understand!
T7EI Web Tool
Check out our T7EI Calculator available as part of the bioinformatics group’s beta tools offering (freely available to the public). If you find this tool useful or would like to see additional features added, contact us.
Authors: Matthew R. Perkett, Bioinformatics Developer; Emily M. Anderson, Senior Scientist; Jesse Stombaugh, Bioinformatics Developer
References
1. R. D. Mashal, J. Koontz, J. Sklar, Detection of mutations by cleavage of DNA heteroduplexes with bacteriophage resolvases. Nat Genet 9, 177–183 (1995).
2. L. Vouillot, A. Thelie, N. Pollet, Comparison of T7E1 and Surveyor mismatch cleavage assays to detect mutations triggered by engineered nucleases. G3 (Bethesda) 5, 407–415 (2015).
3. M. van Overbeek et al., DNA repair profiling reveals nonrandom outcomes at Cas9-mediated breaks. Mol Cell 63(4), 633–646 (2016).
4. N. Paniego, C. Fusari, V. Lia, A. Puebla, SNP genotyping by heteroduplex analysis. Methods Mol Biol 1245, 141–150 (2015).