The phenomenon of influence diffusion on social networks has received tremendous research interest in the past decade. While most prior work focuses on predicting the total influence spread on a single network, a marketing campaign that exploits influence diffusion often involves multiple channels, with different information disseminated through different media. In this paper, we introduce a new influence estimation problem, namely Content-aware Multi-channel Influence Diffusion (CMID), and accordingly propose CMINet to predict newly influenced users, given a set of seed users associated with different multimedia content. In CMINet, we first introduce DiffGNN to encode the influencing power of users (nodes), and Influence-aware Optimal Transport (IOT) to align the embeddings and thereby address the distribution shift across diffusion channels. Then, we transform CMID into a node classification problem and propose the Social-based Multimedia Feature Extractor (SMFE) and Content-aware Multi-channel Influence Propagation (CMIP) to jointly learn user preferences for multimedia content and predict the susceptibility of users. Furthermore, we prove that CMINet preserves monotonicity and submodularity, thus enabling (1 - 1/e)-approximate solutions for influence maximization. Experimental results show that CMINet outperforms eleven baselines on three public datasets.
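The (1 - 1/e) guarantee mentioned above is the classical result for greedily maximizing a monotone submodular set function under a cardinality constraint. The following is a minimal sketch of that greedy seed selection, assuming a hypothetical `spread` callable that returns the estimated influence of a seed set. Here a toy coverage function (`coverage_spread` over a hypothetical `reach` map) stands in for the influence predicted by CMINet; all names are illustrative and are not the paper's implementation.

```python
from typing import Callable, Dict, FrozenSet, Iterable, List, Set


def greedy_seed_selection(candidates: Iterable[str], k: int,
                          spread: Callable[[FrozenSet[str]], float]) -> List[str]:
    """Pick up to k seeds, each time adding the node with the largest marginal gain.

    If `spread` is monotone and submodular, the chosen set achieves at least
    (1 - 1/e) of the optimal spread for sets of the same size.
    """
    pool = list(candidates)
    seeds: List[str] = []
    current = spread(frozenset())
    for _ in range(min(k, len(pool))):
        best, best_gain = None, float("-inf")
        for v in pool:
            if v in seeds:
                continue
            # Marginal gain of adding v to the current seed set.
            gain = spread(frozenset(seeds) | {v}) - current
            if gain > best_gain:  # strict '>' keeps the first node on ties
                best, best_gain = v, gain
        seeds.append(best)
        current += best_gain
    return seeds


# Hypothetical stand-in for a learned influence estimator: each user "reaches"
# a fixed set of users, and spread is the size of the union (monotone, submodular).
reach: Dict[str, Set[str]] = {
    "a": {"a", "b", "c"},
    "b": {"b", "d"},
    "c": {"c", "d", "e", "f"},
    "d": {"d"},
}


def coverage_spread(seeds: FrozenSet[str]) -> float:
    covered: Set[str] = set()
    for s in seeds:
        covered |= reach[s]
    return float(len(covered))


print(greedy_seed_selection(reach, 2, coverage_spread))  # prints ['c', 'a']
```

In the example, `c` is picked first (it covers four users), and `a` second, because it adds two users that `c` does not already cover. Any estimator that the abstract's monotonicity and submodularity claim applies to could, under that assumption, be plugged in as `spread`.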