Sensitivity analysis showed that three levels of graph convolutions with 12 nearest neighbors were optimal for spatiotemporal neighborhood modeling of PM. Reducing the number of graph convolutions and/or the number of nearest neighbors reduced the generalization of the trained model. While a further increase in graph convolutions can further boost the generalization capability of the trained model, the improvement is trivial for PM modeling and requires additional intensive computing resources. This showed that, compared with neighbors closer to the target geo-features, remote neighbors beyond a certain range of spatial or spatiotemporal distance had limited influence on spatial or spatiotemporal neighborhood modeling. As the results showed, although the full residual deep network had a capability equivalent to the proposed geographic graph method, it performed worse than the proposed approach in regular testing and site-based independent testing. Furthermore, there were considerable differences in performance between the regular test and the site-based independent test (R2 improved by about 4% vs. 15%; RMSE decreased by about 60% vs. 180%). This showed that the site-based independent test measured the generalization and extrapolation capability of the trained model better than the regular validation test. Sensitivity analysis also showed that the geographic graph model performed better than the non-geographic model in which all the features were used to derive the nearest neighbors and their distances. This showed that for geo-features such as PM2.5 and PM10 with strong spatial or spatiotemporal correlation, it was reasonable to use Tobler's First Law of Geography to construct a geographic graph hybrid network, and its generalization was better than that of general graph networks.
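As a minimal sketch of the neighborhood construction discussed above, the following builds a geographic k-nearest-neighbor edge list from sample coordinates, with k = 12 as the reported optimum. The helper name, the plain Euclidean distance, and the edge-list layout are illustrative assumptions; the paper's exact graph construction (e.g., its spatiotemporal distance metric) may differ.

```python
import numpy as np

def geo_knn_edges(coords, k=12):
    """Build the edge list of a geographic k-nearest-neighbor graph.

    coords: (n, d) array of spatial or spatiotemporal coordinates.
    Returns (edges, dists): edges is an (n*k, 2) array of
    [target, neighbor] index pairs; dists holds the matching
    Euclidean distances. Hypothetical helper for illustration only.
    """
    n = coords.shape[0]
    # Pairwise Euclidean distances between all samples.
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)           # exclude self-loops
    nbr = np.argsort(dist, axis=1)[:, :k]    # k closest neighbors per node
    edges = np.stack([np.repeat(np.arange(n), k), nbr.ravel()], axis=1)
    dists = dist[edges[:, 0], edges[:, 1]]
    return edges, dists
```

Restricting each node to its k closest neighbors reflects the finding above: neighbors beyond a certain spatial or spatiotemporal range contribute little, so a larger k mainly adds computational cost.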
Compared with decision tree-based learners such as random forest and XGBoost, the proposed geographic graph approach did not require discretization of input covariates [55] and maintained the full range of values of the input data, thereby avoiding the data loss and bias caused by discretization. In addition, tree-based learners lacked the neighborhood modeling provided by graph convolution. Although the training performance of random forest was quite similar to that of the proposed approach, its generalization was worse, as shown in the site-based independent test. Compared with the pure graph network, the connection with the full residual deep layers is critical to reduce over-smoothing in graph neighborhood modeling. The residual connections with the output of the geographic graph convolutions allow the error information to back-propagate directly and effectively to the graph convolutions to optimize the parameters of the trained model. The hybrid approach also makes up for the lack of spatial or spatiotemporal neighborhood modeling in the full residual deep network. Moreover, the introduction of geographic graph convolutions makes it possible to extract important spatial neighborhood features from the nearest unlabeled samples in a semi-supervised manner. This is especially helpful when a large volume of remotely sensed or simulated data (e.g., land-use, AOD, reanalysis and geographic environment) is accessible but only limited measured or labeled data (e.g., PM2.5 and PM10 measurements) are available. For PM modeling, the physical relationship (PM2.5 ≤ PM10) between PM2.5 and PM10 was encoded in the loss by way of ReLU activation.
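The ReLU-encoded constraint can be sketched as a penalty term added to an ordinary regression loss: relu(pred_pm25 − pred_pm10) is zero whenever the prediction respects PM2.5 ≤ PM10 and grows linearly with any violation. The function name, the MSE base loss, and the weighting term `lam` are assumptions for illustration, not the paper's exact formulation.

```python
import numpy as np

def constrained_loss(pred25, pred10, obs25, obs10, lam=1.0):
    """MSE plus a ReLU penalty enforcing the physical constraint PM2.5 <= PM10.

    pred25/pred10: model predictions; obs25/obs10: observations.
    lam is a hypothetical penalty weight, not taken from the paper.
    """
    mse = np.mean((pred25 - obs25) ** 2) + np.mean((pred10 - obs10) ** 2)
    # ReLU of the constraint violation: zero when pred25 <= pred10.
    violation = np.maximum(pred25 - pred10, 0.0)
    return mse + lam * np.mean(violation)
```

Because the penalty vanishes on physically consistent predictions, it steers training toward the constraint without biasing samples that already satisfy it.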