Benchmarking IUPred3

IUPred3 has been tested using the latest CAID dataset, as well as a custom dataset consisting of proteins that are fully annotated at the residue level (more than 95% of the sequence has an order or disorder annotation). This dataset can be downloaded from here.

Dataset                  AUC
CAID                     0.743
Fully annotated dataset  0.761
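
For readers who want to reproduce an AUC figure of this kind, the following is a minimal sketch of a residue-level ROC AUC calculation, together with the 95% annotation-coverage filter described above. It is not the evaluation code used for this benchmark: the function names, the per-residue score and label lists, and the use of '-' to mark unannotated residues are all assumptions made for illustration.

```python
def is_fully_annotated(annotation, min_coverage=0.95):
    """Check whether more than `min_coverage` of the residues are annotated.

    `annotation` is a per-residue string; the encoding ('0' = order,
    '1' = disorder, '-' = no annotation) is an assumption of this sketch.
    """
    if not annotation:
        return False
    annotated = sum(1 for c in annotation if c != "-")
    return annotated / len(annotation) > min_coverage


def roc_auc(scores, labels):
    """Rank-based ROC AUC (Mann-Whitney U statistic); ties share an average rank.

    `scores` are per-residue disorder scores, `labels` are 1 (disordered)
    or 0 (ordered), pooled over all annotated residues.
    """
    pairs = sorted(zip(scores, labels), key=lambda p: p[0])
    n_pos = sum(1 for _, y in pairs if y == 1)
    n_neg = len(pairs) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one residue of each class")

    rank_sum_pos = 0.0
    i = 0
    while i < len(pairs):
        # Find the run of tied scores starting at position i.
        j = i
        while j < len(pairs) and pairs[j][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2.0  # mean of the 1-based ranks i+1 .. j
        n_pos_in_run = sum(1 for k in range(i, j) if pairs[k][1] == 1)
        rank_sum_pos += avg_rank * n_pos_in_run
        i = j

    u = rank_sum_pos - n_pos * (n_pos + 1) / 2.0
    return u / (n_pos * n_neg)
```

Unannotated residues would be dropped before calling `roc_auc`, so that only positions with an order or disorder label contribute to the curve.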

Benchmarking IUPred2

The performance of the second version of IUPred has been tested using customized datasets. Intrinsically disordered protein regions (IDRs) were taken from the DisProt database as the positive testing dataset. Only IDRs with a length of at least 9 residues were included.

The negative dataset comprises protein regions that are known to represent independently folding, stable monomeric units encompassing a single domain according to CATH definitions. These structures were collected from the PDB and were filtered to include only one protein region from each UniRef90 sequence cluster. Structures with flexible residues, as evidenced by highly dissimilar NMR models or missing X-ray coordinates, were removed.

The positive and negative datasets used for benchmarking can be downloaded from here and here, respectively.

Dataset                          Number of residues  Number of proteins
Positive (DisProt)               84,479              1,195
Negative (monomeric structures)  178,957             1,095

Using the above benchmark sets, IUPred2 can be characterized with the following binary classifier measures:

Sensitivity (True Positive Rate)  61.85%
Specificity (True Negative Rate)  94.03%
Precision*                        91.20%

*As the value of precision depends heavily on the relative sizes of the positive and negative datasets, the dataset sizes were scaled to be equal to obtain an unbiased measure.
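
The measures above can be sketched as follows. This is an illustrative reading of the footnote, not necessarily the exact procedure used: once the positive and negative sets are weighted equally, precision reduces to TPR / (TPR + FPR). The function names and the 0.5 score cutoff are assumptions of this sketch.

```python
def sensitivity_specificity(pos_scores, neg_scores, threshold=0.5):
    """Residue-level sensitivity and specificity.

    `pos_scores` are scores for residues of the positive (disordered) set,
    `neg_scores` for residues of the negative (ordered) set. A residue is
    called disordered when its score is >= `threshold` (assumed cutoff).
    """
    if not pos_scores or not neg_scores:
        raise ValueError("both datasets must contain at least one residue")
    tp = sum(1 for s in pos_scores if s >= threshold)
    tn = sum(1 for s in neg_scores if s < threshold)
    return tp / len(pos_scores), tn / len(neg_scores)


def balanced_precision(sensitivity, specificity):
    """Precision after scaling both datasets to the same size.

    With equal class weights, TP is proportional to the sensitivity and
    FP to the false positive rate, so the dataset sizes cancel out.
    """
    false_positive_rate = 1.0 - specificity
    return sensitivity / (sensitivity + false_positive_rate)
```

Under this reading, `balanced_precision(0.6185, 0.9403)` evaluates to roughly 0.912, which is consistent with the precision reported in the table.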

Benchmarking ANCHOR2

The performance of ANCHOR2 was tested using the recently published DIBS database as the positive testing dataset. DIBS represents the largest currently available set of experimentally verified IDRs capable of forming ordered structures upon binding to protein domains. Only entries not used in the training of ANCHOR2 were used in testing.

For negative testing, the same monomeric single-domain protein dataset was used as for testing IUPred2, but structures with up to 20% flexible residues were allowed. Only entries not used in the training of ANCHOR2 were used in testing. Furthermore, ANCHOR2 was also evaluated on a set of flexible linkers that are disordered but are known to lack a primary binding function.

To get a fuller picture of the efficiency of ANCHOR2 on sequence sets with different compositions, two auxiliary datasets were also considered. The first is composed of disordered regions from DisProt; the second is a collection of random (decoy) segments from the human proteome, excluding transmembrane regions, structured Pfam domains and extracellular proteins. Both datasets are expected to contain disordered binding regions, albeit to a significantly lower extent than DIBS.

The positive, negative, and auxiliary datasets used for benchmarking can be downloaded from here.

Dataset                          Number of residues  Number of proteins
Positive (DIBS)                  2,135               140
Negative (monomeric structures)  583,033             3,320
Negative (flexible linkers)      5,425               389
Auxiliary (DisProt)              79,049              1,042
Auxiliary (decoy)                76,860              5,040

Using the above benchmark sets, ANCHOR2 can be characterized with the following binary classifier measures. As ANCHOR2 often identifies only the strongly binding sub-regions inside larger binding regions, segment-based sensitivity was also calculated. In this case, a binding region was considered found if it incorporates at least one ANCHOR2-identified region, regardless of any difference in length:

Measure                                                              Residue-based metrics  Segment-based metrics
Sensitivity (True Positive Rate)                                     62.67%                 69.29%
Specificity (True Negative Rate on ordered monomers)                 98.26%                 -
Specificity (True Negative Rate on flexible linkers)                 94.58%                 -
Fraction of predicted binding residues in auxiliary DisProt dataset  50.00%                 -
Fraction of predicted binding residues in auxiliary decoy dataset    10.93%                 -
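
The segment-based sensitivity described above can be sketched as follows. This is a hedged illustration rather than the original evaluation script: the function names, the half-open (start, end) interval convention and the 0.5 score cutoff are assumptions, and "incorporates" is interpreted here as an overlap of at least one residue (a stricter reading would require the predicted region to lie entirely inside the annotated one).

```python
def predicted_segments(scores, threshold=0.5):
    """Return half-open (start, end) intervals of consecutive residues
    whose score is >= `threshold` (assumed cutoff)."""
    segments = []
    start = None
    for i, score in enumerate(scores):
        if score >= threshold:
            if start is None:
                start = i  # a new predicted region begins here
        elif start is not None:
            segments.append((start, i))
            start = None
    if start is not None:
        segments.append((start, len(scores)))  # region runs to the C-terminus
    return segments


def segment_sensitivity(true_regions, pred_regions):
    """Fraction of annotated binding regions that overlap at least one
    predicted region, regardless of how long the overlap is."""
    if not true_regions:
        raise ValueError("at least one annotated binding region is required")
    found = sum(
        1
        for t_start, t_end in true_regions
        if any(p_start < t_end and t_start < p_end for p_start, p_end in pred_regions)
    )
    return found / len(true_regions)
```

For a whole dataset, the found and total region counts would be pooled over all proteins before taking the ratio, rather than averaging per-protein values.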

Benchmarking other context-dependent features

As there are currently no comprehensive datasets collecting a large number of experimentally verified examples for the other types of context-dependent IDRs targeted by IUPred3, rigorous testing of these features is not yet possible. Accordingly, these features are marked as ‘Experimental’. However, a number of select examples are available in the How to use section.

Primary citations

Bálint Mészáros, Gábor Erdős, Zsuzsanna Dosztányi
IUPred2A: context-dependent prediction of protein disorder as a function of redox state and protein binding
Nucleic Acids Research 2018;46(W1):W329-W337.

Gábor Erdős, Zsuzsanna Dosztányi
Analyzing Protein Disorder with IUPred2A
Current Protocols in Bioinformatics 2020;70(1):e99

Additional citations

Zsuzsanna Dosztányi
Prediction of protein disorder based on IUPred
Protein Science 2017;27:331-340.

Dosztányi Z, Csizmók V, Tompa P, Simon I.
The pairwise energy content estimated from amino acid composition discriminates between folded and intrinsically unstructured proteins
J Mol Biol. 2005;347:827-39.

Mészáros B, Simon I, Dosztányi Z.
Prediction of protein binding regions in disordered proteins
PLoS Comput Biol. 2009;5:e1000376.