N are described. At the end of the section, the overall performance of the two combined estimation strategies is presented. The results are compared with the femur configuration obtained from manually marked keypoints.

3.1. PS Estimation

As a result of training over 200 networks with different architectures, the one guaranteeing the minimum loss function value (7) was selected. The network architecture is presented in Figure 8. The optimal CNN architecture [26] consists of 15 layers, ten of which are convolutional. The size of the last layer corresponds to the number of network outputs, i.e., the coordinates of keypoints k1, k2, k3.

Figure 8. The optimal CNN architecture. Each rectangle represents one layer of the CNN. The following colors are used to distinguish the essential components of the network: blue (fully connected layer), green (activation functions, where HS stands for hard sigmoid and LR denotes leaky ReLU), pink (convolution), purple (pooling), white (batch normalization), and yellow (dropout).

After 94 epochs of training, the early stopping rule was met and the learning process was terminated. The loss function value for the development set was equal to 8.4507 px². The results for all learning sets are gathered in Table 2.

Table 2. CNN loss function (7) values for different learning sets.

Learning Set      Proposed Solution      U-Net [23] (with Heatmaps)
Train             7.92 px²               9.04 px²
Development       8.45 px²               10.31 px²
Test              6.57 px²               6.43 px²

The loss function values for all learning sets are within an acceptable range, given the overall complexity of the task. The performance was slightly better for the train set than for the development set, a feature that usually indicates overfitting to the train data. Fortunately, the low test set loss confirmed that the network performs accurately on previously unseen data. Interestingly, the test set achieved the lowest loss function value, which is not common for CNNs. There may be several reasons for this. First, the X-ray images used during training had a slightly different distribution than those in the test set: the train set consisted of images of children varying in age and, consequently, in knee joint ossification level, whereas the test set included adult X-rays. Second, the train and development sets were augmented using standard image transformations to constitute a valid CNN learning set (as described in Table 1), and the corresponding loss function values in Table 2 are calculated for the augmented sets. Some of the (randomly selected) image transformations produced high-contrast, nearly binary images. Consequently, these images were validated with a high loss function value, degrading the overall result for the set. The test set, in contrast, was not augmented, i.e., the X-ray images were not transformed before validation.
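Criterion (7) is reported in px², which is consistent with a mean squared error over the predicted keypoint coordinates. The minimal TensorFlow/Keras sketch below illustrates this kind of training setup with early stopping on the development-set loss; the input size, layer sizes, activation choices, and patience value are illustrative assumptions, not the 15-layer optimum found by the architecture search.

```python
import tensorflow as tf
from tensorflow.keras import layers, models


def build_keypoint_cnn(input_shape=(256, 256, 1), n_keypoints=3):
    """Toy convolutional regressor: X-ray image in, flat (x, y) coordinates out."""
    return models.Sequential([
        tf.keras.Input(shape=input_shape),
        layers.Conv2D(32, 5, padding="same"),
        layers.LeakyReLU(),                 # LR activation, as in Figure 8
        layers.MaxPooling2D(2),
        layers.BatchNormalization(),
        layers.Conv2D(64, 3, padding="same"),
        layers.LeakyReLU(),
        layers.MaxPooling2D(2),
        layers.Dropout(0.25),
        layers.Flatten(),
        layers.Dense(2 * n_keypoints),      # (x, y) for k1, k2, k3
    ])


model = build_keypoint_cnn()
# MSE over pixel coordinates, so the loss is reported in px² (assumed form of (7)).
model.compile(optimizer="adam", loss="mse")

# Early stopping on the development-set loss, mirroring the 94-epoch run above;
# patience=10 is an assumption, as the paper does not state the rule's parameters.
early_stop = tf.keras.callbacks.EarlyStopping(
    monitor="val_loss", patience=10, restore_best_weights=True)
# model.fit(x_train, y_train, validation_data=(x_dev, y_dev),
#           epochs=200, callbacks=[early_stop])
```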
The optimization of the CNN hyperparameters, as described in Appendix A, enhanced the process of network architecture tuning, both in terms of processing time and of reaching a low loss function value (7). The optimal network architecture (optimal in the sense of minimizing the assumed criterion (7)) consists of layers with varied window sizes, both for the convolution and for the pooling layers. This is not consistent with the widely popular heuristic of small window sizes [33]. In this particular scenario, smaller window sizes in the CNN resulted in a larger loss function value or exceeded the maximum network size limit.
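As a rough illustration of such a size-constrained architecture search, the sketch below samples per-layer convolution and pooling window sizes at random and rejects candidates whose estimated parameter count exceeds a size limit. This is a hypothetical random-search stand-in, not the paper's procedure (which is given in its Appendix A); MAX_PARAMS, rough_param_count, and train_and_eval are illustrative placeholders.

```python
import random

random.seed(0)

MAX_PARAMS = 5_000_000   # hypothetical size limit; the paper does not state the bound


def sample_candidate():
    """Draw one candidate: per-layer conv/pool window sizes and filter counts."""
    n_conv = random.randint(6, 12)
    return {
        "conv_windows": [random.choice([3, 5, 7, 9]) for _ in range(n_conv)],
        "pool_windows": [random.choice([2, 3]) for _ in range(n_conv)],
        "filters": [random.choice([16, 32, 64]) for _ in range(n_conv)],
    }


def rough_param_count(cand):
    """Crude estimate: convolution weights only, channels chained layer to layer."""
    total, in_ch = 0, 1  # grayscale X-ray input
    for w, f in zip(cand["conv_windows"], cand["filters"]):
        total += w * w * in_ch * f + f  # kernel weights + biases
        in_ch = f
    return total


# Keep only candidates within the size limit; of those, the one minimizing the
# development-set loss (7) would be selected (train_and_eval is a placeholder).
candidates = [c for c in (sample_candidate() for _ in range(200))
              if rough_param_count(c) <= MAX_PARAMS]
# best = min(candidates, key=train_and_eval)
```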