Correctly classifying sleep stages is a prerequisite for diagnosing sleep-related issues. In practice, clinical experts must manually review polysomnography (PSG) recordings to classify sleep stages, a procedure that is time-consuming, laborious, and prone to subjective human error. In recent years, deep learning-based methods have been successfully adopted for automatic sleep stage classification. Despite their good performance, however, these models cannot simply say 'I do not know' when they are uncertain about a prediction, which poses a significant risk in clinical applications. To address this issue, we propose TrustSleepNet, a deep model that combines evidential learning with a cross-modality attention module. Evidential learning predicts a probability density over the classes, from which the model learns an uncertainty score that makes its predictions more trustworthy in real-world clinical applications. Cross-modality attention adaptively fuses multimodal PSG data by enhancing the significant modalities and suppressing the irrelevant ones. Experimental results demonstrate that TrustSleepNet outperforms state-of-the-art benchmark methods and that the uncertainty score makes its predictions more trustworthy and reliable.
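
To illustrate how an evidential output can yield an uncertainty score and an 'I do not know' option, the sketch below uses the Dirichlet-based formulation commonly used in evidential deep learning. It is not taken from TrustSleepNet itself: the abstract does not specify the exact formulation, so the softplus evidence function, the five stage labels, the abstention threshold of 0.5, and the function name `evidential_prediction` are all assumptions made for illustration.

```python
import math

# Assumed five-stage labelling; the abstract does not list the classes.
STAGES = ["W", "N1", "N2", "N3", "REM"]


def softplus(x):
    """Numerically stable softplus, used here to keep evidence non-negative."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def evidential_prediction(logits, threshold=0.5):
    """Turn raw network outputs into (label, probabilities, uncertainty).

    Assumed formulation: evidence e_k >= 0, Dirichlet parameters
    alpha_k = e_k + 1, Dirichlet strength S = sum(alpha), expected class
    probability alpha_k / S, and uncertainty u = K / S for K classes.
    """
    evidence = [softplus(z) for z in logits]
    alpha = [e + 1.0 for e in evidence]
    strength = sum(alpha)
    num_classes = len(alpha)

    probs = [a / strength for a in alpha]
    uncertainty = num_classes / strength  # 1.0 when there is no evidence at all

    if uncertainty > threshold:
        label = None  # abstain: the model says 'I do not know'
    else:
        best = max(range(num_classes), key=probs.__getitem__)
        label = STAGES[best]
    return label, probs, uncertainty


# Strong evidence for one class: low uncertainty, a label is returned.
print(evidential_prediction([-10.0, -10.0, 20.0, -10.0, -10.0]))

# Almost no evidence for any class: high uncertainty, the model abstains.
print(evidential_prediction([-10.0] * 5))
```

In this formulation the uncertainty shrinks as total evidence grows, so an epoch with little evidence for any stage can be flagged for manual review instead of receiving a forced label. The threshold would need to be chosen on validation data for any real use.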