Statistical Inference for Networks with Complex Topological Structures

Project: Research project

Project Details

Description

Understanding the structure of networks is critical to understanding and predicting real-world phenomena that involve networks, such as the resilience of insurgent and terrorist networks and the impact of disease-transmission networks on epidemics. In the past decade, tremendous progress has been made on statistical inference for networks with simple topological features, such as the number of connections in networks and the propensities of network members to form connections. However, statistical inference for networks with complex topological features, such as various forms of network closure that are believed to be crucial, remains underdeveloped, because complex topological features raise challenging conceptual, theoretical, and computational issues. This project addresses fundamental questions underpinning statistical inference for such networks. The statistical models and methods that are developed will be made publicly available in the form of R packages.

This research project is concerned with the foundations of statistical inference for networks with complex topological features. It starts with a question that is at the heart of statistical inference: What does it mean to observe more data from the same source? The conventional answer is that more data are observed by observing a larger and larger network. Some interesting insights have emerged from studying growing networks. Among them is that models with complex topological features may depend on the size of the network and that using the same model, with the same parameters, for small and large networks may not be meaningful. Despite such insights, the question of how to model a wide range of complex topological features and how to conduct sound statistical inference remains unanswered. This project attempts to provide answers and is based on the following idea: If models with complex topological features depend on the size of the network, then statistical inference should be based on networks of the same order of magnitude. In other words, statistical inference should be based on replication. At least two forms of replication are possible: replication based on a single network consisting of many subnetworks, or replication based on many networks of the same order of magnitude, i.e., networks for which the size of the largest network is a constant multiple of the size of the smallest network. Such network data, called multilevel network data, have important applications; examples include networks of armed forces consisting of units and subunits and school networks consisting of schools and school classes. Multilevel network data offer outstanding opportunities for modeling, methods, and theory. This project will take advantage of these opportunities to elaborate novel multilevel network models capturing complex topological features in networks, overlapping subsets of nodes, and temporal networks. Statistical theory will take advantage of the replicative nature of multilevel network data, which provides a path to the first general statistical theory for networks with complex topological features. Statistical computing will take advantage of the conditional independence structure of multilevel network models and will develop large-scale parallel computing procedures.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

Status: Finished
Effective start/end date: 8/1/18 to 7/31/21

Funding

  • National Science Foundation: $150,000.00
