Abstract:
Accurate monitoring of soil salt content is critical for optimizing land management and maintaining ecological balance. In this study, Pingluo County, Ningxia, was selected as the study area, and 16 spectral indices were extracted from Sentinel-2 remote sensing imagery and combined with ground-measured soil sampling data. Three feature selection methods were integrated to identify key variables, and Bayesian optimization was used to tune the hyperparameters of four machine learning algorithms, yielding a high-accuracy soil salinity prediction model. Inverse distance weighting (IDW) was used to compare the spatial distributions of the measured and predicted points. The results indicate that sequential forward selection with cross-validation (SFSCV) improves the reliability of feature selection through a stable cross-validation mechanism, effectively identifies the spectral indices most sensitive to soil salinity, and enables the construction of superior prediction models. The XGBoost model built on SFSCV-selected features and tuned with Bayesian optimization achieved high prediction accuracy, with a coefficient of determination (R²) of 0.625 and a root mean square error (RMSE) of 1.120 on the validation set. When the model was applied to predict soil salt content across Pingluo County, the predictions were highly consistent with the validation data: the residuals between measured and predicted soil salinity were mostly within −0.98 to 0.82 g·kg⁻¹. These findings show that combining feature selection with Bayesian optimization significantly improves the generalizability and prediction accuracy of the model in complex environments, making the approach suitable for soil salinization monitoring.
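To illustrate how a sequential forward selection with cross-validation (SFSCV) wrapper picks features, the following is a minimal pure-Python sketch. It is not the authors' implementation. It assumes an ordinary least-squares regressor in place of XGBoost or the other learners, pooled k-fold RMSE as the selection score, and a "stop when CV error no longer drops" rule. All function names (`fit_ols`, `cv_rmse`, `sfs_cv`) are hypothetical.

```python
import random
from typing import List, Tuple

def fit_ols(X: List[List[float]], y: List[float]) -> List[float]:
    """Least squares with intercept via normal equations (stand-in for XGBoost)."""
    rows = [[1.0] + list(r) for r in X]  # prepend intercept column
    p = len(rows[0])
    # Augmented matrix [X^T X | X^T y]
    A = [[sum(r[i] * r[j] for r in rows) for j in range(p)]
         + [sum(r[i] * t for r, t in zip(rows, y))] for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting
    for c in range(p):
        piv = max(range(c, p), key=lambda k: abs(A[k][c]))
        A[c], A[piv] = A[piv], A[c]
        for k in range(p):
            if k != c:
                f = A[k][c] / A[c][c]
                A[k] = [a - f * b for a, b in zip(A[k], A[c])]
    return [A[i][p] / A[i][i] for i in range(p)]

def predict(b: List[float], X: List[List[float]]) -> List[float]:
    return [b[0] + sum(w * v for w, v in zip(b[1:], r)) for r in X]

def cv_rmse(X: List[List[float]], y: List[float], feats: List[int], k: int = 5) -> float:
    """Pooled RMSE over k interleaved folds (assumes rows are already shuffled)."""
    n = len(y)
    sq_err = 0.0
    for f in range(k):
        test = list(range(f, n, k))
        test_set = set(test)
        train = [i for i in range(n) if i not in test_set]
        sub = lambda ids: [[X[i][j] for j in feats] for i in ids]
        b = fit_ols(sub(train), [y[i] for i in train])
        sq_err += sum((p - y[i]) ** 2 for p, i in zip(predict(b, sub(test)), test))
    return (sq_err / n) ** 0.5

def sfs_cv(X: List[List[float]], y: List[float], k: int = 5,
           tol: float = 1e-9) -> Tuple[List[int], float]:
    """Greedily add the feature that most lowers CV RMSE; stop when none helps."""
    selected: List[int] = []
    remaining = list(range(len(X[0])))
    best = cv_rmse(X, y, selected, k)  # intercept-only baseline
    while remaining:
        score, feat = min((cv_rmse(X, y, selected + [j], k), j) for j in remaining)
        if best - score <= tol:  # no meaningful improvement: stop
            break
        best = score
        selected.append(feat)
        remaining.remove(feat)
    return selected, best

if __name__ == "__main__":
    rng = random.Random(42)
    # Toy data: the target depends on columns 0 and 2 only; column 1 is noise.
    X = [[rng.uniform(0, 1) for _ in range(3)] for _ in range(40)]
    y = [2 * r[0] + 3 * r[2] + 1 for r in X]
    print(sfs_cv(X, y))
```

On this toy data the selector should keep columns 0 and 2 and reject the noise column, because adding it no longer lowers the cross-validated error. Scoring each candidate subset on held-out folds rather than on the training fit is what makes the selection less sensitive to a single lucky split.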