N data. In addition, our system should function well in making condition-specific predictions if appropriate expression data are acquired. Here, a control experiment was run to test the performance of SVM classifiers against randomized datasets. Three regulators were chosen according to the number of available targets (WT1: 5 targets; MYC: 7 targets; OCT4: 18 targets). For each regulator, the index of positives and negatives was shuffled during training (in all 100 classifiers representing the TF) to create randomized classifiers. These classifiers were then applied to the human genome and compared to the classifiers built from non-shuffled data. As expected, the shuffled classifiers make very few predictions in the genome that pass the 0.95 threshold. The shuffled WT1 classifier makes no predictions, while the shuffled OCT4 and MYC classifiers make 1 and 2 predictions, respectively. In a genome of 18660 genes, this suggests that a random classifier will make fewer than 1 false positive per 10000 predictions when the threshold is set to 0.95 or greater.

The performance of the randomized classifiers was tested using cross-validation (the classification threshold used in cross-validation is 0.5). The real classifiers had performance measures that were significantly better than random in the cases that were tested (p-values for PPV and accuracy less than 2.59e-28). This is shown in Figure 2 for PPV, where box-plots are used to compare the performance of actual and random classifiers.

Figure 2. Actual vs. Label-Shuffled Classifier Box-plots. 100 classifiers represent each TF, meaning that cross-validation produces a population of PPV measurements to represent a TF classifier. These populations are used to compare the significance of the actual vs. the label-shuffled classifiers (denoted with the prefix "Rand"). Here the comparison is shown for WT1, MYC, and OCT4. Each box-and-whisker plot has a top line (the upper quartile value, not the whisker line), a central red line (the median), and a bottom line (the lower quartile value). If the notches on two different boxes do not overlap, then one may conclude that the two population medians are significantly different (at the 5% level). Each box also has whiskers which look like standard error bars. The length of a whisker equals 1.5 times the interquartile range, which is the default value in Matlab [214]. Plus (+) signs represent potential outlier points existing beyond that default range.

Since cross-validation is performed at the 0.5 decision threshold, an immediate question is how to evaluate the significance of the measured accuracy of a given classifier (e.g., at P = 0.5, is 68% accuracy significantly better than random?). We have therefore constructed a hypothesis test to determine whether any measured accuracy is different from random. This test shows that the 68% accuracy measured for WT1 (averaged over 100 classifiers) is significant at p = 1.36e-4, making it unlikely that our results would have been obtained at random. Classifiers with larger numbers of known targets will show even stronger significance at the same accuracy. The full details of the hypothesis test, as well as a brief discussion of its application to other TFs, can be found in our Additional File 1.
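For illustration, the null model behind such a test can be sketched as a binomial calculation: if a classifier operating at the 0.5 threshold were guessing labels at random, the number of correct calls among n held-out examples would follow Binomial(n, 0.5), and the one-sided tail probability of the observed count gives a p-value. The short Python sketch below implements this idea; the example size in the call is hypothetical and for illustration only, and the exact formulation used here (including how the 100 classifiers per TF are aggregated) is given in Additional File 1.

from math import comb

def accuracy_pvalue(n_examples, accuracy, p_null=0.5):
    # One-sided p-value: probability that a random-guessing classifier
    # (each call correct with probability p_null) gets at least as many
    # correct calls as observed among n_examples held-out examples.
    k_observed = round(accuracy * n_examples)
    return sum(comb(n_examples, i) * p_null**i * (1.0 - p_null)**(n_examples - i)
               for i in range(k_observed, n_examples + 1))

# Illustrative call only; the number of held-out examples is hypothetical.
print(accuracy_pvalue(n_examples=200, accuracy=0.68))

As the paper notes, the same measured accuracy becomes more significant as the number of known targets (and hence held-out examples) grows, since the binomial tail shrinks with larger n.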
Our method begins with 8817 known TF-gene interactions for 1.