Soil moisture is a critical environmental variable, but retrieving it in high-altitude regions is challenging because of limited ground-truth data, frequent freeze-thaw cycles that alter soil dielectric properties, and complex surface-vegetation interactions characteristic of alpine ecosystems. This study addresses these challenges by developing an integrated framework for soil moisture retrieval at 10, 20, and 40 cm depths in the Naqu region of the Tibetan Plateau. To isolate the soil backscatter signal from vegetation interference, we first apply a modified Water Cloud Model (WCM) that represents vegetation structure through fractional vegetation cover and vegetation height. We then compare five machine learning models (polynomial regression, a one-dimensional convolutional neural network (1D CNN), a multilayer perceptron (MLP), a Kolmogorov-Arnold network (KAN), and an attention-based model) trained on the vegetation-corrected backscatter and optical vegetation indices. The training dataset was augmented to improve model generalization. The modified WCM proved highly effective, improving the correlation between backscatter and soil moisture by up to 51% (at 20 cm depth). The comparison showed that model performance is depth-dependent: the attention-based model performed best for near-surface moisture at 10 cm (R² = 0.713), the MLP at 20 cm (R² = 0.786), and the KAN at 40 cm (R² = 0.780). These findings highlight the importance of physically based vegetation correction and offer guidance for selecting depth-specific retrieval models in complex high-altitude environments, with implications for hydrological modeling and ecological monitoring.
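The abstract does not state the equations of the modified WCM. As a minimal sketch, assuming the classic Water Cloud Model form extended with a fractional vegetation cover term, the total backscatter can be written as a cover-weighted mix of bare soil and vegetated soil, σ°_total = (1 − f_v)·σ°_soil + f_v·(σ°_veg + γ²·σ°_soil), with σ°_veg = A·V1·cosθ·(1 − γ²) and two-way transmissivity γ² = exp(−2·B·V2 / cosθ). Inverting this expression for σ°_soil is what "isolating the soil backscatter signal" amounts to. The parameter names `a`, `b`, `v1`, `v2`, and `fv`, and the idea of passing vegetation height as `v2`, are illustrative assumptions rather than the authors' exact parameterization; all backscatter values are in linear power units, not dB.

```python
import math


def db_to_linear(db):
    """Convert backscatter from dB to linear power units."""
    return 10.0 ** (db / 10.0)


def linear_to_db(lin):
    """Convert backscatter from linear power units to dB."""
    return 10.0 * math.log10(lin)


def _vegetation_terms(a, b, v1, v2, theta_rad):
    # Two-way canopy transmissivity: gamma^2 = exp(-2 * B * V2 / cos(theta)).
    gamma2 = math.exp(-2.0 * b * v2 / math.cos(theta_rad))
    # Direct volume scattering from the canopy: A * V1 * cos(theta) * (1 - gamma^2).
    sigma_veg = a * v1 * math.cos(theta_rad) * (1.0 - gamma2)
    return gamma2, sigma_veg


def forward_wcm(sigma_soil, a, b, v1, v2, theta_rad, fv):
    """Hypothetical fractional-cover WCM: predict total backscatter from soil backscatter.

    fv is the fractional vegetation cover in [0, 1]; v2 could carry a height-based
    descriptor (an assumption, not the paper's stated choice).
    """
    gamma2, sigma_veg = _vegetation_terms(a, b, v1, v2, theta_rad)
    return (1.0 - fv) * sigma_soil + fv * (sigma_veg + gamma2 * sigma_soil)


def invert_wcm(sigma_total, a, b, v1, v2, theta_rad, fv):
    """Solve the forward model above for the bare-soil backscatter contribution.

    Rearranging: total = fv * veg + soil * (1 - fv + fv * gamma^2).
    With noisy observations the result can become non-positive, which a real
    pipeline would need to screen before converting to dB.
    """
    gamma2, sigma_veg = _vegetation_terms(a, b, v1, v2, theta_rad)
    return (sigma_total - fv * sigma_veg) / (1.0 - fv + fv * gamma2)
```

Usage: with illustrative (not fitted) parameters, `invert_wcm(forward_wcm(db_to_linear(-12.0), 0.0012, 0.09, 0.6, 0.3, math.radians(39.0), 0.45), 0.0012, 0.09, 0.6, 0.3, math.radians(39.0), 0.45)` recovers the original soil backscatter. In practice A and B would be calibrated against in situ measurements, and the corrected σ°_soil would then serve as an input feature for the depth-specific models.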