Aligned read length calculation
The aligned length of a read at a given accuracy threshold is defined as the greatest position in the read at which the accuracy of the bases up to and including that position meets the accuracy threshold. Accuracy is specified on the Phred scale, Q = -10 x log10(error rate). As a result, 20 refers to an error rate of 1%, 17 refers to an error rate of approximately 2%, and so on.
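The Phred relationship described above can be checked with a minimal sketch. The function names `phred_to_error_rate` and `error_rate_to_phred` are hypothetical and chosen only for illustration.

```python
import math

def phred_to_error_rate(q):
    """Convert a Phred-scaled accuracy value to an error rate (a fraction)."""
    return 10 ** (-q / 10)

def error_rate_to_phred(p):
    """Convert an error rate (a fraction, 0 < p <= 1) to the Phred scale."""
    return -10 * math.log10(p)

# 20 corresponds to 1%; 17 corresponds to just under 2%.
for q in (20, 17):
    print(q, round(phred_to_error_rate(q) * 100, 3), "%")
```

Note that 17 maps to about 1.995%, which this document rounds to 2%.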
For example, the AQ20 length is the greatest length at which the error rate is 1% or less, and the AQ17 length is the greatest length at which the error rate is 2% or less. The "perfect" length is the longest perfectly aligned segment.
For all these calculations, the alignment is constrained to start from position 1 in the read; that is, no 5' clipping is allowed. The underlying assumption is that the reference to which the read is aligned represents the true sequence of the sample that was sequenced.
Appropriate caution must be taken when interpreting AQ20 values in situations where the sample sequenced has substantial differences relative to the reference used, such as working with alignments to a rough draft genome or with samples that are expected to have high mutation rates relative to the reference used. In these situations, the AQ20 lengths might be short even when sequencing quality is excellent.
Specifically, the AQ20 length is calculated as follows:
- Every base in the read is classified as being correct or not correct according to the alignment to the reference.
- At every position in the read, the total error rate is calculated up to and including that position.
- The greatest position at which the error rate is one percent or less is identified and that position defines the AQ20 length.
For example, if a 100-bp read consists of 80 perfect bases followed by 2 errors followed by 18 more perfect bases, the total error rate at position 80 is zero percent. At position 81 the total error rate is 1.2% (1/81), and at position 82 it is 2.4% (2/82). From there the error rate falls with each additional perfect base, reaching two percent (2/100) at position 100. The greatest length at which the error rate is one percent or less is 80 and the greatest length at which the error rate is two percent or less is 100, so the AQ20 and AQ17 lengths are 80 and 100 bases, respectively.
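The three calculation steps and the worked example above can be sketched as follows. This is an illustration under stated assumptions, not the implementation used by any particular software: the function name `aligned_length` is hypothetical, the per-base correct/incorrect classification is assumed to be already available as a list of booleans, and the threshold is passed as an error rate (1% for AQ20, 2% for AQ17) as this document states it.

```python
from fractions import Fraction

def aligned_length(is_correct, max_error_rate):
    """Return the greatest 1-based position at which the cumulative error
    rate is at most max_error_rate, or 0 if no position qualifies.

    is_correct     -- one boolean per read base, starting at position 1
                      (assumed input; True means the base matches the reference)
    max_error_rate -- threshold as a decimal string or number, e.g. "0.01"
    """
    # Exact rational arithmetic avoids floating-point trouble when the
    # error rate lands exactly on the threshold (e.g. 2/100 vs. 2%).
    limit = Fraction(str(max_error_rate))
    errors = 0
    best = 0
    for position, ok in enumerate(is_correct, start=1):
        if not ok:
            errors += 1
        # Cumulative error rate up to and including this position.
        if Fraction(errors, position) <= limit:
            best = position
    return best

# The 100-bp read from the example: 80 perfect bases, 2 errors, 18 perfect bases.
read = [True] * 80 + [False] * 2 + [True] * 18
aq20 = aligned_length(read, "0.01")
aq17 = aligned_length(read, "0.02")
print(aq20, aq17)  # prints "80 100"
```

The scan visits every position rather than stopping at the first failure, because the cumulative error rate can drop back under the threshold later in the read, as it does at position 100 here. The AQ17 threshold is given as exactly 2% to match the rounding used in this document; with the unrounded Phred value for 17 (about 1.995%), position 100 would fall just outside the limit.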