Example for output produced by EvalSec
(evaluation of secondary structure prediction accuracy)
OUTPUT
SYM ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
SYM Abbreviations for accuracy of secondary structure prediction
SYM ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
SYM
SYM For an explanation of the scores, please see:
SYM per-residue accuracy: Rost & Sander, JMB, 1993, 232, 584-599
SYM per-segment accuracy: Rost et al., JMB, 1994, 235, 13-26
SYM
SYM H, E, L: helix (H), extended strand (E), all others (L = loop)
SYM
SYM obs, prd: observed, predicted
SYM
SYM ~~~~~~~~~~~~~~~~~~
SYM Per-residue scores
SYM ~~~~~~~~~~~~~~~~~~
SYM
SYM A(i,j): number of residues observed in secondary structure state i and
SYM predicted in secondary structure state j, where i and j can each
SYM be one of the following: helix (H), strand (E), or other (L)
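As a minimal sketch, assuming the observed and predicted assignments of one protein are available as two aligned one-letter strings over H, E and L, the matrix A(i,j) could be tallied as follows. The function name `confusion_matrix` and the string input format are illustrative assumptions, not EvalSec's interface.

```python
STATES = "HEL"  # index 0 = helix, 1 = strand, 2 = other (loop)


def confusion_matrix(obs, prd):
    """Return A with A[i][j] = residues observed in STATES[i] and predicted in STATES[j]."""
    if len(obs) != len(prd):
        raise ValueError("observed and predicted strings must have the same length")
    A = [[0] * len(STATES) for _ in STATES]
    for o, p in zip(obs, prd):
        A[STATES.index(o)][STATES.index(p)] += 1
    return A


print(confusion_matrix("HHHL", "HHLL"))  # [[2, 0, 1], [0, 0, 0], [0, 0, 1]]
```

Rows are indexed by the observed state and columns by the predicted state, which is the layout of the NUMBERS tables in the output.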
SYM
SYM number of residues correctly predicted in state i
SYM Q(i)obs = ------------------------------------------------- * 100
SYM number of residues observed in state i
SYM
SYM number of residues correctly predicted in state i
SYM Q(i)prd = ------------------------------------------------- * 100
SYM number of residues predicted in state i
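The two per-state scores differ only in the denominator: the row sum of A (residues observed in i) for Q(i)obs, the column sum (residues predicted in i) for Q(i)prd. A minimal sketch, assuming A is a nested list with rows = observed and columns = predicted in the order H, E, L; returning 0 when a state is absent is an assumed convention, chosen because the E rows of the example tables print 0.

```python
def q_obs(A, i):
    """Q(i)obs in percent: A(i,i) divided by the residues observed in state i (row sum)."""
    observed = sum(A[i])
    return 100.0 * A[i][i] / observed if observed else 0.0  # 0 when state i is absent


def q_prd(A, i):
    """Q(i)prd in percent: A(i,i) divided by the residues predicted in state i (column sum)."""
    predicted = sum(row[i] for row in A)
    return 100.0 * A[i][i] / predicted if predicted else 0.0


# Matrix of protein FIRST from the example output (rows: obs H, E, L; columns: prd H, E, L)
A_FIRST = [[8, 0, 2], [0, 0, 0], [0, 0, 5]]
print(q_obs(A_FIRST, 0), q_prd(A_FIRST, 0))  # 80.0 100.0
```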
SYM
SYM Q3 overall three-state per-residue accuracy (three states: H,E,L)
SYM defined by:
SYM number of residues correctly predicted
SYM = --------------------------------------- * 100
SYM number of all residues
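In matrix terms, the correctly predicted residues are the diagonal (trace) of A, so Q3 is the trace divided by the sum of all entries. A short sketch, assuming the same nested-list layout (rows = observed, columns = predicted):

```python
def q3(A):
    """Overall three-state accuracy in percent: trace of A divided by all residues."""
    total = sum(map(sum, A))
    correct = sum(A[i][i] for i in range(len(A)))
    return 100.0 * correct / total


A_FIRST = [[8, 0, 2], [0, 0, 0], [0, 0, 5]]
print(round(q3(A_FIRST), 1))  # 86.7
```

For protein FIRST this is 13 of 15 residues, the Q3 of 86.7 shown in its overall table.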
SYM
SYM BAD percentage of residues predicted in helix and observed in strand,
SYM or predicted in strand and observed in helix
SYM
SYM OVER percentage of residues predicted in helix or strand, and ob-
SYM served in loop
SYM
SYM UNDER percentage of residues observed in helix or strand, and
SYM predicted in loop
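The three error rates are sums of off-diagonal cells of A divided by the total number of residues. A minimal sketch, assuming rows = observed and columns = predicted; UNDER is read here as "observed in helix or strand, predicted in loop", which is the reading that gives the OVER and UNDER values printed for protein SECOND.

```python
H, E, L = 0, 1, 2  # row/column indices of A (rows: observed, columns: predicted)


def error_rates(A):
    """Return (OVER, UNDER, BAD) as percentages of all residues."""
    total = sum(map(sum, A))
    over = A[L][H] + A[L][E]   # predicted helix/strand, observed loop
    under = A[H][L] + A[E][L]  # observed helix/strand, predicted loop
    bad = A[H][E] + A[E][H]    # helix/strand confusions in either direction
    return tuple(100.0 * x / total for x in (over, under, bad))


A_SECOND = [[0, 2, 2], [0, 0, 0], [0, 1, 2]]
print([round(x, 1) for x in error_rates(A_SECOND)])  # [14.3, 28.6, 28.6]
```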
SYM
SYM Iobs information entropy contained in matrix A(i,j), defined by:
SYM
SYM        SUM_i a(i) * ln a(i)  -  SUM_ij A(i,j) * ln A(i,j)
SYM      = ---------------------------------------------------
SYM              N * ln N  -  SUM_i b(i) * ln b(i)
SYM
SYM where N is the number of residues, a(i) the number of residues
SYM predicted to be in secondary structure state i, b(i) the number of
SYM residues observed to be in i, and A(i,j) the number of residues
SYM observed to be in i and predicted to be in j.
SYM
SYM Iprd information entropy as for Iobs, but weighted by the predicted
SYM counts, i.e., the same formula as Iobs with a(i) and b(i) exchanged.
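A direct transcription of the Iobs formula as printed, as a minimal sketch: a(i) are the predicted counts (column sums of A), b(i) the observed counts (row sums), and 0 * ln 0 is taken as 0 (an assumed, standard convention). Iprd exchanges a(i) and b(i), which is the same as applying Iobs to the transposed matrix. The function names are hypothetical, and the sketch is not claimed to reproduce the I obs / I prd values printed in the example tables.

```python
import math


def _xlnx(x):
    """x * ln x with the usual convention 0 * ln 0 = 0."""
    return x * math.log(x) if x > 0 else 0.0


def info_obs(A):
    """Iobs transcribed from the formula above; rows of A = observed, columns = predicted."""
    n = len(A)
    N = sum(map(sum, A))
    a = [sum(A[r][c] for r in range(n)) for c in range(n)]  # a(i): predicted counts
    b = [sum(row) for row in A]                             # b(i): observed counts
    numerator = sum(_xlnx(x) for x in a) - sum(_xlnx(x) for row in A for x in row)
    denominator = _xlnx(N) - sum(_xlnx(x) for x in b)
    # The denominator vanishes when all residues are observed in a single state.
    return numerator / denominator if denominator else float("nan")


def info_prd(A):
    """Iprd: a(i) and b(i) exchanged, i.e. Iobs of the transposed matrix."""
    return info_obs([list(col) for col in zip(*A)])
```

For a perfect prediction (A diagonal), a(i) = b(i) and the sum over A(i,j) ln A(i,j) equals the sum over a(i) ln a(i), so the numerator and therefore the printed formula evaluate to 0.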
SYM
SYM COR(i) Matthews correlation coefficient for structure i
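The Matthews correlation coefficient treats state i as a two-class problem (i versus not i) and combines the four resulting counts. A sketch using the standard definition, assuming rows = observed and columns = predicted, and returning 0 when the coefficient is undefined (an assumed convention matching the E rows of the tables). With these conventions it gives 0.76 for helix in FIRST and 0.17 for loop in SECOND, as in the example output.

```python
import math


def matthews(A, i):
    """Matthews correlation coefficient for state i; rows of A = observed, columns = predicted."""
    total = sum(map(sum, A))
    tp = A[i][i]
    fn = sum(A[i]) - tp                  # observed in i, predicted elsewhere
    fp = sum(row[i] for row in A) - tp   # predicted in i, observed elsewhere
    tn = total - tp - fn - fp            # neither observed nor predicted in i
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0  # 0 when undefined


A_FIRST = [[8, 0, 2], [0, 0, 0], [0, 0, 5]]
A_SECOND = [[0, 2, 2], [0, 0, 0], [0, 1, 2]]
print(round(matthews(A_FIRST, 0), 2), round(matthews(A_SECOND, 2), 2))  # 0.76 0.17
```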
SYM
SYM /prot
SYM overall three-state accuracy averaged over proteins (as opposed
SYM to residues).
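Averaging over proteins means computing Q3 separately for each protein and taking the mean, so a short protein weighs as much as a long one. A minimal sketch, assuming one nested-list matrix per protein (rows = observed, columns = predicted); the function names are illustrative.

```python
def q3(A):
    """Three-state per-residue accuracy of one protein, in percent."""
    return 100.0 * sum(A[i][i] for i in range(len(A))) / sum(map(sum, A))


def q3_per_protein(matrices):
    """Mean of the per-protein Q3 values: every protein counts once, whatever its length."""
    scores = [q3(A) for A in matrices]
    return sum(scores) / len(scores)


A_FIRST = [[8, 0, 2], [0, 0, 0], [0, 0, 5]]
A_SECOND = [[0, 2, 2], [0, 0, 0], [0, 1, 2]]
print(round(q3_per_protein([A_FIRST, A_SECOND]), 2))  # 57.62
```

Pooling the two matrices instead gives 15 correct of 22 residues, i.e. the Q3 of 68.2 shown for the average over all residues.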
SYM
SYM Dcontent(i)
SYM absolute difference between the observed and predicted content
SYM (in percent) of secondary structure type i, averaged over proteins
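A sketch of the content comparison, assuming content is the percentage of a protein's residues in state i (row sums of A for observed, column sums for predicted), that the difference is taken as an absolute value, and that it is averaged over proteins. These are assumptions, but they are consistent with the Dcontent H = 35.24 and Dcontent E = 21.43 values in the example output.

```python
def content(A, i, observed=True):
    """Percentage of residues in state i, from row sums (observed) or column sums (predicted)."""
    count = sum(A[i]) if observed else sum(row[i] for row in A)
    return 100.0 * count / sum(map(sum, A))


def d_content(matrices, i):
    """Absolute observed-minus-predicted content of state i, averaged over proteins."""
    diffs = [abs(content(A, i, True) - content(A, i, False)) for A in matrices]
    return sum(diffs) / len(diffs)


A_FIRST = [[8, 0, 2], [0, 0, 0], [0, 0, 5]]
A_SECOND = [[0, 2, 2], [0, 0, 0], [0, 1, 2]]
print(round(d_content([A_FIRST, A_SECOND], 0), 2))  # 35.24
```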
SYM
SYM
SYM ~~~~~~~~~~~~~~~~~~
SYM Per-segment scores
SYM ~~~~~~~~~~~~~~~~~~
SYM
SYM avL(i)obs average length for the structure type i as observed
SYM e.g., average length of an observed helix
SYM
SYM avL(i)prd average length for the structure type i as predicted
SYM e.g., average length of a predicted helix
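A segment is taken here to be a maximal run of consecutive residues in the same state (an assumption; the text does not spell out a minimum length or other filtering). Under that assumption, the average length of one state in a one-letter string can be sketched with `itertools.groupby`:

```python
from itertools import groupby


def segment_lengths(ss, state):
    """Lengths of all maximal runs of `state` in a one-letter secondary structure string."""
    return [len(list(run)) for s, run in groupby(ss) if s == state]


def average_length(ss, state):
    lengths = segment_lengths(ss, state)
    return sum(lengths) / len(lengths) if lengths else 0.0  # 0 when no segment exists


print(average_length("HHHHLLHHHL", "H"))  # 3.5
```

Applied to the observed string this gives avL(i)obs; applied to the predicted string, avL(i)prd.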
SYM
SYM SOV(i)obs
SYM SOV(i)prd
SYM fractional overlap (in percent) between segments predicted and
SYM observed in structure type i, defined by:
SYM
SYM                    1     MINOV(S1;S2) + DELTA
SYM    SOV(i) = SUM_S  -  *  -------------------- * LEN(S1)
SYM                    N         MAXOV(S1;S2)
SYM
SYM where N is the total number of residues, S1 and S2 are the
SYM observed and predicted secondary structure segments (in state i),
SYM and LEN(S1) is the number of residues in segment S1.
SYM The sum is taken over all segment pairs S = {S1,S2}.
SYM The actual overlap between the two segments is MINOV, i.e., the
SYM number of residues for which both segments have, e.g., an H
SYM (helix) in common; MAXOV is the total extent of both segments,
SYM i.e., the number of residues for which either of the two has,
SYM say, the assigned state H. The accepted variation DELTA ensures
SYM a ratio of 1.0 when there are only minor deviations at segment
SYM ends; it is chosen to be smaller than MINOV and smaller than
SYM half the length of segment S1. The ratio (MINOV + DELTA)/MAXOV is
SYM capped at 1.0, i.e., the allowance cannot lead to a "more than
SYM perfect" fractional overlap for any segment comparison. The
SYM suffix 'obs' (SOV(i)obs) indicates that the lengths of the observed
SYM segments were used for weighting (likelihood that an observed
SYM segment is correctly predicted), i.e., S1 is the observed segment.
SYM In contrast, 'prd' denotes weighting by the lengths of the
SYM predicted segments (likelihood that a predicted segment is correct).
SYM
SYM SOV3 fractional segment overlap for all three states H, E, L
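A minimal sketch of the segment overlap described above, under several stated assumptions: segments are maximal runs of one state, only overlapping pairs contribute, DELTA = min(MINOV, LEN(S1)/2) (the text only says DELTA is chosen smaller than both), and the ratio is capped at 1.0. The per-state score is normalized here by the residues of state i in the reference string, so that a perfect prediction scores 100 (an assumption); the three-state SOV3 divides by the total number of residues N. Pass the observed string as `ref` for SOV(i)obs and the predicted string as `ref` for SOV(i)prd. All function names are hypothetical, and the sketch is not claimed to reproduce EvalSec's numbers exactly.

```python
from itertools import groupby


def segments(ss, state):
    """(start, end) of every maximal run of `state` in ss; end is exclusive."""
    out, pos = [], 0
    for s, run in groupby(ss):
        n = len(list(run))
        if s == state:
            out.append((pos, pos + n))
        pos += n
    return out


def sov_numerator(ref, other, state):
    """SUM over overlapping pairs (S1 from ref, S2 from other) of capped ratio * LEN(S1)."""
    total = 0.0
    for b1, e1 in segments(ref, state):
        len1 = e1 - b1
        for b2, e2 in segments(other, state):
            minov = min(e1, e2) - max(b1, b2)   # residues shared by both segments
            if minov <= 0:
                continue                        # the segments do not overlap
            maxov = max(e1, e2) - min(b1, b2)   # total extent of both segments
            delta = min(minov, len1 / 2)        # assumed choice of the allowance
            total += min(1.0, (minov + delta) / maxov) * len1
    return total


def sov(ref, other, state):
    """Per-state SOV in percent, normalized by residues of `state` in ref (assumption)."""
    n_state = ref.count(state)
    return 100.0 * sov_numerator(ref, other, state) / n_state if n_state else 0.0


def sov3(ref, other, states="HEL"):
    """Three-state SOV in percent, normalized by the total number of residues N."""
    return 100.0 * sum(sov_numerator(ref, other, s) for s in states) / len(ref)


print(sov3("HHHHHHLLLL", "HHHHHLLLLL"))  # 100.0
```

The example shows the role of DELTA: a helix end shifted by one residue still yields a ratio of 1.0, whereas a shift by two residues in a four-residue helix (`sov("HHHHLLLL", "LLHHHHLL", "H")`) is penalized.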
SYM
#
# *****************************
# Prediction accuracy for FIRST
# *****************************
#
# A(i,j): number of residues observed in state i, predicted in j:
#
DAT +---------+---------+---------+---------+---------+
DAT | NUMBERS | prd H | prd E | prd L | obs Sum |
DAT +---------+---------+---------+---------+---------+
DAT | obs H | 8 | 0 | 2 | 10 |
DAT | obs E | 0 | 0 | 0 | 0 |
DAT | obs L | 0 | 0 | 5 | 5 |
DAT +---------+---------+---------+---------+---------+
DAT | prd Sum | 8 | 0 | 7 | 15 |
DAT +---------+---------+---------+---------+---------+
#
# Per-residue and Per-segment scores:
#
DAT +---------------------------------+ +---------------------------------------+
DAT | Per-residue scores | | Per-segment scores |
DAT +---------+-------+-------+-------+ +---------+---------+---------+---------+
DAT | SCORES |Q(i)obs|Q(i)prd| COR(i)| |SOV(i)obs|SOV(i)prd|avL(i)obs|avL(i)prd|
DAT +---------+-------+-------+-------+ +---------+---------+---------+---------+
DAT | i = H | 80 | 100 | 0.76 | | 100.0 | 100.0 | 5.0 | 4.0 |
DAT | i = E | 0 | 0 | 0.00 | | 0.0 | 0.0 | 0.0 | 0.0 |
DAT | i = L | 100 | 71 | 0.76 | | 100.0 | 100.0 | 2.5 | 3.5 |
DAT +---------+-------+-------+-------+ +---------+---------+---------+---------+
#
# Overall scores:
#
DAT +---------------------------------+ +---------------------------------------+
DAT | Overall per-residue scores | | Overall per-segment scores |
DAT +-------+--------+-------+--------+ +---------+---------+---------+---------+
DAT | OVER | 0.0 | UNDER | 13.3 | | |
DAT | I obs | 0.48 | I prd | 0.48 | | |
DAT | Q3 | 86.7 | BAD | 0.0 | | SOV3obs | 100.0 | SOV3prd | 100.0 |
DAT +-------+========+-------+--------+ +---------+=========+---------+---------+
#
# ******************************
# Prediction accuracy for SECOND
# ******************************
#
# A(i,j): number of residues observed in state i, predicted in j:
#
DAT +---------+---------+---------+---------+---------+
DAT | NUMBERS | prd H | prd E | prd L | obs Sum |
DAT +---------+---------+---------+---------+---------+
DAT | obs H | 0 | 2 | 2 | 4 |
DAT | obs E | 0 | 0 | 0 | 0 |
DAT | obs L | 0 | 1 | 2 | 3 |
DAT +---------+---------+---------+---------+---------+
DAT | prd Sum | 0 | 3 | 4 | 7 |
DAT +---------+---------+---------+---------+---------+
#
# Per-residue and Per-segment scores:
#
DAT +---------------------------------+ +---------------------------------------+
DAT | Per-residue scores | | Per-segment scores |
DAT +---------+-------+-------+-------+ +---------+---------+---------+---------+
DAT | SCORES |Q(i)obs|Q(i)prd| COR(i)| |SOV(i)obs|SOV(i)prd|avL(i)obs|avL(i)prd|
DAT +---------+-------+-------+-------+ +---------+---------+---------+---------+
DAT | i = H | 0 | 0 | 0.00 | | 0.0 | 0.0 | 4.0 | 0.0 |
DAT | i = E | 0 | 0 | 0.00 | | 0.0 | 0.0 | 0.0 | 3.0 |
DAT | i = L | 66 | 50 | 0.17 | | 100.0 | 50.0 | 3.0 | 2.0 |
DAT +---------+-------+-------+-------+ +---------+---------+---------+---------+
#
# Overall scores:
#
DAT +---------------------------------+ +---------------------------------------+
DAT | Overall per-residue scores | | Overall per-segment scores |
DAT +-------+--------+-------+--------+ +---------+---------+---------+---------+
DAT | OVER | 14.3 | UNDER | 28.6 | | |
DAT | I obs | -0.38 | I prd | -0.38 | | |
DAT | Q3 | 28.6 | BAD | 28.6 | | SOV3obs | 42.9 | SOV3prd | 28.6 |
DAT +-------+========+-------+--------+ +---------+=========+---------+---------+
#
# *************************************************
# Prediction accuracy for Average over all residues
# *************************************************
#
# A(i,j): number of residues observed in state i, predicted in j:
#
DAT +---------+---------+---------+---------+---------+
DAT | NUMBERS | prd H | prd E | prd L | obs Sum |
DAT +---------+---------+---------+---------+---------+
DAT | obs H | 8 | 2 | 4 | 14 |
DAT | obs E | 0 | 0 | 0 | 0 |
DAT | obs L | 0 | 1 | 7 | 8 |
DAT +---------+---------+---------+---------+---------+
DAT | prd Sum | 8 | 3 | 11 | 22 |
DAT +---------+---------+---------+---------+---------+
#
# Per-residue and Per-segment scores:
#
DAT +---------------------------------+ +---------------------------------------+
DAT | Per-residue scores | | Per-segment scores |
DAT +---------+-------+-------+-------+ +---------+---------+---------+---------+
DAT | SCORES |Q(i)obs|Q(i)prd| COR(i)| |SOV(i)obs|SOV(i)prd|avL(i)obs|avL(i)prd|
DAT +---------+-------+-------+-------+ +---------+---------+---------+---------+
DAT | i = H | 57 | 100 | 0.57 | | 71.4 | 100.0 | 4.7 | 4.0 |
DAT | i = E | 0 | 0 | 0.00 | | 0.0 | 0.0 | 0.0 | 3.0 |
DAT | i = L | 87 | 63 | 0.57 | | 100.0 | 81.8 | 2.7 | 2.8 |
DAT +---------+-------+-------+-------+ +---------+---------+---------+---------+
#
# Overall scores:
#
DAT +---------------------------------+ +---------------------------------------+
DAT | Overall per-residue scores | | Overall per-segment scores |
DAT +-------+--------+-------+--------+ +---------+---------+---------+---------+
DAT | OVER | 4.5 | UNDER | 18.2 | | |
DAT | I obs | 42.00 | I prd | 42.00 | | |
DAT | Q3 | 68.2 | BAD | 9.1 | | SOV3obs | 81.8 | SOV3prd | 77.3 |
DAT +-------+========+-------+--------+ +---------+=========+---------+---------+
#
# Per-residue accuracy averaged over all 2 proteins:
#
+---------------------+---------------------------------+
| /prot = 57.62 | one standard deviation = 57.62 |
+---------------------+---------------------------------+
#
# Accuracy of predicting secondary structural content:
#
DAT +---------------------+---------------------------------+
DAT | Dcontent H = 35.24 | one standard deviation = 35.24 |
DAT | Dcontent E = 21.43 | one standard deviation = 21.43 |
DAT +---------------------+---------------------------------+
#
# Accuracy of predicting secondary structural class:
#
# Sorting into structure class according to
# Zhang, C.-T. and Chou, K.-C., Prot. Sci. 1:401-408, 1992:
# all-H: percentage of H >= 45% , percentage of E < 5%
# all-E: percentage of H < 5% , percentage of E >=45%
# mix : percentage of H >= 30% , percentage of E >=20%
#
DAT +-------+-------+-------+-------+-------+-------+
DAT | | sum | sum | sum | Q | Q |
DAT | class | obs | prd |correct| %obs | %prd |
DAT +-------+-------+-------+-------+-------+-------+
DAT | all-H | 2 | 1 | 1 | 50.0 | 100.0 |
DAT | all-E | 0 | 0 | 0 | 0.0 | 0.0 |
DAT | mix | 0 | 0 | 0 | 0.0 | 0.0 |
DAT | other | 0 | 1 | 0 | 0.0 | 0.0 |
DAT +-------+-------+-------+-------+-------+-------+
DAT | SUM | 4 | 4 | 2 | 50.0 | 50.0 |
DAT +-------+-------+-------+-------+-------+-------+
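The class rules quoted above (Zhang and Chou, 1992) can be sketched as a small classifier on the helix and strand content in percent. The three rules are mutually exclusive, so their order does not matter; assigning "other" to proteins that match none of them is an assumption based on the "other" row of the table.

```python
def structure_class(h, e):
    """Classify a protein from its helix (h) and strand (e) content in percent."""
    if h >= 45 and e < 5:
        return "all-H"
    if h < 5 and e >= 45:
        return "all-E"
    if h >= 30 and e >= 20:
        return "mix"
    return "other"  # assumed fallback for proteins matching none of the rules


print(structure_class(100 * 10 / 15, 0.0), structure_class(0.0, 100 * 3 / 7))  # all-H other
```

The two calls use the observed content of FIRST (10 of 15 residues in helix) and the predicted content of SECOND (3 of 7 residues in strand, below the 45% threshold), which fall into the all-H and other classes respectively.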
END