Feature engineering is an important pre-processing step in applying machine learning algorithms to knowledge discovery across scientific research and business applications. In these applications it is crucial to obtain features that best describe the observed phenomena. Traditionally, researchers have selected features of interest manually, based on the knowledge and experience of domain experts, which is costly and labor-intensive. Recently, a new line of research, called representation learning, has used neural networks to learn features automatically for use in various scientific research projects and business applications. The PI plans to develop new representation learning methods that capture rich, meaningful and discriminative features in heterogeneous information networks (HINs), which have been used to model heterogeneous types of network entities and their relationships in support of network data analysis and mining. The work planned in this project addresses model design, scalability, sample data extraction, network variety and data heterogeneity issues in the implementation of the learning frameworks. This research will be integrated into graduate and undergraduate courses on data mining and machine learning, enabling students to develop analytics and big data skills.
The specific research objectives of this project are threefold: 1) The PI aims to leverage information in HINs to learn representations of latent features for nodes and for relationships specified by meta-paths in the network. Novel techniques will be developed to address scalability issues in learning. 2) The PI seeks to address model design and learning issues that arise in HINs that grow over time, e.g., citation networks. New neural network architectures and new sample data extraction schemes will be devised. 3) The PI plans to integrate both content and network structure in representation learning on HINs. New neural network architectures will be devised. To evaluate research prototypes, the PI will develop a testbed consisting of new neural network frameworks for representation learning on HINs. Techniques and software will be made available as research resources to the data mining and representation learning communities.
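For readers unfamiliar with meta-paths, the following is a minimal illustrative sketch and not the project's method. It assumes a meta-path is a sequence of node types (for example author, paper, author) and shows a meta-path-guided random walk, one common way to draw node sequences from a HIN as training samples. The toy bibliographic graph, the node names, and the function names `build_adjacency` and `metapath_walk` are all hypothetical.

```python
import random
from collections import defaultdict

# Hypothetical toy HIN (assumption, not project data): each node has a type.
NODE_TYPE = {
    "a1": "author", "a2": "author", "a3": "author",
    "p1": "paper", "p2": "paper",
    "v1": "venue",
}
# Undirected typed edges: author-paper (writes) and paper-venue (published in).
EDGES = [
    ("a1", "p1"), ("a2", "p1"), ("a2", "p2"), ("a3", "p2"),
    ("p1", "v1"), ("p2", "v1"),
]


def build_adjacency(edges):
    """Build an undirected adjacency list from (u, v) pairs."""
    adj = defaultdict(list)
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    return adj


def metapath_walk(adj, node_type, start, metapath, length, rng):
    """Random walk whose node types follow `metapath`, repeated cyclically.

    `metapath` must start and end with the same type so it can be repeated,
    e.g. ["author", "paper", "author"].
    """
    if metapath[0] != metapath[-1]:
        raise ValueError("meta-path must start and end with the same type")
    if node_type[start] != metapath[0]:
        raise ValueError("start node type must match the meta-path's first type")

    period = len(metapath) - 1
    walk = [start]
    step = 0
    while len(walk) < length:
        wanted = metapath[(step % period) + 1]
        # Only neighbors of the required type are eligible at this step.
        candidates = [n for n in adj[walk[-1]] if node_type[n] == wanted]
        if not candidates:
            break  # dead end: no neighbor of the required type
        walk.append(rng.choice(candidates))
        step += 1
    return walk


adj = build_adjacency(EDGES)
rng = random.Random(0)  # fixed seed so repeated runs give the same walk
apa_walk = metapath_walk(adj, NODE_TYPE, "a1", ["author", "paper", "author"], 5, rng)
print(apa_walk, [NODE_TYPE[n] for n in apa_walk])
```

Walks sampled this way are typically passed to a skip-gram style embedding model. Changing the meta-path (for example author, paper, venue, paper, author) changes which kind of relationship between authors the sampled sequences reflect.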
Effective start/end date: 8/1/17 → 7/31/21
- National Science Foundation: $499,635.00