fit are likely to disappear first, until a trade-off between model complexity and goodness of fit, as measured by the likelihood function, begins as iterations continue. For models in the exponential family class, for example generalised linear models, such an initial value can easily be obtained by performing a ridge regression of a transformed, and possibly slightly perturbed, version of the response vector y; see the supplementary information for more details.

The E step
The components of the conditional expectation required in (7) are given by the following expression:

E\{\tau_i^{-2} \mid \beta_i^{(n)}, k, \gamma\} = \frac{\gamma}{|\beta_i^{(n)}|} \, \frac{K_{3/2-k}(\gamma|\beta_i^{(n)}|)}{K_{1/2-k}(\gamma|\beta_i^{(n)}|)}   (9)

for i = 1, ..., p, where K denotes the modified Bessel function of the third kind and \gamma = \sqrt{2/b}. The function K is a standard function in the R package [11]; see also Zhang and Jin [12] for stand-alone code. A sketch of the derivation of the above result is given in Appendix 2 in the supplementary information. Useful special cases of (9) include

E\{\tau_i^{-2} \mid \beta_i^{(n)}, k = 1, \gamma\} = \gamma / |\beta_i^{(n)}|.   (10)

A numerical sketch of this E step is given at the end of this section.

The M step
For p^{(n)} \le N use

\gamma_{r+1} = \gamma_r + [Y_n^T B_r Y_n + I]^{-1} (Y_n^T \dot{L}_r - \gamma_r), \qquad \beta_{r+1} = \Delta(d^{(n)}) \gamma_{r+1},   (12)

and for p^{(n)} > N use

\gamma_{r+1} = \gamma_r + [I - Y_n^T (Y_n Y_n^T + B_r^{-1})^{-1} Y_n] (Y_n^T \dot{L}_r - \gamma_r),   (13)

which gives the same update by the Woodbury identity. Note that (12) appears to require the inversion of a p by p matrix; however, the calculation can be done by inverting a p^{(n)} by p^{(n)} matrix, since p - p^{(n)} columns of Y_n are identically zero (see the definition of Y_n in Equation (8)). By partitioning Y_n, \gamma_r and d^{(n)} into conformable zero and nonzero components, (12) and (13) can be calculated efficiently. In fact it is only necessary to calculate \gamma_r for parameters which are currently non-zero. When the number of parameters p^{(n)} in the model becomes less than N, the size of the matrices being inverted becomes p^{(n)} by p^{(n)} and continues to decrease as more parameters are eliminated from the model. Note that the algorithm can be implemented to be O(min(N^3, p^3)); a sketch of the two equivalent solves is given at the end of this section.

Convergence
In practice the algorithm converges rapidly. To see the reason for this, differentiate (7) with respect to \beta to obtain

\partial Q / \partial \beta = \dot{L}(\beta) - \Delta(d^{(n)})^{-2} \beta.   (14)

By the definition of the algorithm in Section 2, \beta^{(n+1)} is defined so that the left-hand side of (14) is zero. Hence if the sequence (\beta^{(n)}, d^{(n)}) converges, then in the limit

\Delta(d^{(n)})^2 \, \dot{L}(\beta^{(n+1)}) = \beta^{(n+1)}.

For the NG prior, using Abramowitz and Stegun [13], section 9.6.9, it can be shown that for small \beta and 0 \le k < 0.5 we have

E\{\tau_i^{-2} \mid \beta_i^{(n)}\} \approx b(k) / |\beta_i^{(n)}|^2,   (15)

and for 0.5 < k \le 1 we have

E\{\tau_i^{-2} \mid \beta_i^{(n)}\} \approx c(k, \gamma) / |\beta_i^{(n)}|^{3-2k},   (16)

where b(k) and c(k, \gamma) do not depend on \beta. In practice a small set of values of k works well. Note that any process for assessing the quality of the predictions from a model chosen in this way should explicitly include this selection process to avoid selection bias. We will expand on this below.

Implementing multiclass logistic regression
To implement the algorithm for a particular model simply requires expressions for the first two derivatives of the likelihood function. See the supplementary information for details for multiclass logistic regression.

Enlarged sets of predictors
As mentioned earlier, enlarged sets of predictor variables for biological interpretation can be identified by running the algorithm multiple times, removing previously selected variables from consideration at each run; a sketch of this strategy is given below. An alternative strategy, which can identify sets of important highly correlated variables, is to define a new X matrix by clustering the columns of the original matrix.
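The E-step expectation (9) is straightforward to evaluate numerically. The sketch below is an illustration, not the authors' implementation: the function and argument names are ours, besselK is base R's modified Bessel function K (the function referred to after (9)), and the final comment assumes, as in the EM formulation sketched above, that the M-step weights are d_i^{(n)} = E\{\tau_i^{-2} \mid \beta_i^{(n)}\}^{-1/2}.

```r
## Hypothetical sketch of the E step: evaluate (9), using the closed-form
## special case (10) when k = 1.  besselK(x, nu) is base R's modified
## Bessel function K_nu(x); gamma = sqrt(2/b).  beta should contain only
## the currently non-zero coefficients, so no division by zero occurs.
e_step_expectation <- function(beta, k, gamma) {
  x <- gamma * abs(beta)
  if (k == 1) {
    gamma / abs(beta)                                  # special case (10)
  } else {
    (gamma / abs(beta)) * besselK(x, 3/2 - k) / besselK(x, 1/2 - k)
  }
}

## Assumed relation to the M-step weights (our notation, not the paper's):
## d <- e_step_expectation(beta, k, gamma)^(-1/2)
```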
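The O(min(N^3, p^3)) remark follows because (12) and (13) are two routes to the same linear solve. The following sketch, with hypothetical names and a dense solve() purely for illustration, applies [Y_n^T B_r Y_n + I]^{-1} to a vector either directly (for p^{(n)} \le N) or through the Woodbury identity (for p^{(n)} > N); here Y holds only the non-zero columns of Y_n.

```r
## Two equivalent ways of applying [Y'BY + I]^(-1) to a vector v, as in
## (12) and (13).  For p <= N, factor the p x p matrix directly; for
## p > N, the Woodbury identity
##   (Y'BY + I)^(-1) = I - Y'(YY' + B^(-1))^(-1) Y
## reduces the work to an N x N solve.  Hypothetical sketch, not the
## paper's code.  Y is N x p, B is N x N, v has length p.
apply_inverse <- function(Y, B, v) {
  N <- nrow(Y); p <- ncol(Y)
  if (p <= N) {
    solve(t(Y) %*% B %*% Y + diag(p), v)                    # route of (12)
  } else {
    v - t(Y) %*% solve(Y %*% t(Y) + solve(B), Y %*% v)      # route of (13)
  }
}
```

Both branches return the same vector; the factorisation cost is cubic in p^{(n)} in the first branch and cubic in N in the second, whichever is smaller.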
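Finally, the repeated-runs strategy for building enlarged predictor sets amounts to a simple outer loop. In the sketch below, fit_and_select is a hypothetical placeholder for one complete run of the algorithm, assumed to return the indices of the variables it retains.

```r
## Sketch of building an enlarged predictor set: rerun the algorithm,
## removing previously selected columns from consideration each time.
## fit_and_select() is a hypothetical stand-in for one complete run,
## returning indices (into the matrix it is given) of retained variables.
enlarged_set <- function(X, y, n_runs, fit_and_select) {
  selected  <- integer(0)
  remaining <- seq_len(ncol(X))
  for (run in seq_len(n_runs)) {
    keep      <- fit_and_select(X[, remaining, drop = FALSE], y)
    selected  <- c(selected, remaining[keep])  # map back to original columns
    remaining <- setdiff(remaining, remaining[keep])
  }
  selected
}
```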