Substance Use Disorder (SUD) is a devastating disease that leads to significant mental and behavioral impairments. Its negative effects damage the homeless youth population more severely (as compared to stably housed counterparts) because of their high-risk behaviors. To assist policymakers in devising effective and accurate long-term strategies to mitigate SUD, it is necessary to critically analyze environmental, psychological, and other factors associated with SUD among homeless youth. Unfortunately, there is no definitive data-driven study on analyzing factors associated with SUD among homeless youth. While there have been a few prior studies in the past, they (i) do not analyze variation in the associated factors for SUD with geographical heterogeneity in their studies; and (ii) only consider a few contributing factors to SUD in relatively small samples. This work aims to fill this gap by making the following three contributions: (i) we use a real-world dataset collected from ∼1,400 homeless youth (across six American states) to build accurate Machine Learning (ML) models for predicting the susceptibility of homeless youth to SUD; (ii) we find a representative set of factors associated with SUD among this population by analyzing feature importance values associated with our ML models; and (iii) we investigate the effect of geographical heterogeneity on the factors associated with SUD. Our results show that our system using adaptively boosted decision trees achieves the best predictive accuracy out of several algorithms on the SUD prediction task, achieving an Area Under the ROC Curve of 0.85. Further, among other things, we also find that both Post-Traumatic Stress Disorder (PTSD) and depression are very strongly associated with SUD among homeless youth because of their propensity to self-medicate to alleviate stress. This work is done in collaboration with social work scientists, who are currently evaluating the results for potential future deployment.