TY - JOUR
T1 - Fast inference for the latent space network model using a case-control approximate likelihood
AU - Raftery, Adrian E.
AU - Niu, Xiaoyue
AU - Hoff, Peter D.
AU - Yeung, Ka Yee
N1 - Funding Information:
This research was supported by NIH grants R01 GM-84163, R01 HD-54511, and R01 HD-70936. The authors are grateful to Pavel Krivitsky, the editor, the associate editor, and an anonymous referee for very helpful comments and discussions.
PY - 2012
Y1 - 2012
N2 - Network models are widely used in social sciences and genome sciences. The latent space model proposed by Hoff et al. (2002), and extended by Handcock et al. (2007) to incorporate clustering, provides a visually interpretable model-based spatial representation of relational data and takes account of several intrinsic network properties. Due to the structure of the likelihood function of the latent space model, the computational cost is of order O(N^2), where N is the number of nodes. This makes it infeasible for large networks. In this article, we propose an approximation of the log-likelihood function. We adapt the case-control idea from epidemiology and construct a case-control log-likelihood, which is an unbiased estimator of the full log-likelihood. Replacing the full likelihood by the case-control likelihood in the Markov chain Monte Carlo estimation of the latent space model reduces the computational time from O(N^2) to O(N), making it feasible for large networks. We evaluate its performance using simulated and real data. We fit the model to a large protein-protein interaction dataset using the case-control likelihood and use the model-fitted link probabilities to identify false positive links. Supplemental materials are available online.
AB - Network models are widely used in social sciences and genome sciences. The latent space model proposed by Hoff et al. (2002), and extended by Handcock et al. (2007) to incorporate clustering, provides a visually interpretable model-based spatial representation of relational data and takes account of several intrinsic network properties. Due to the structure of the likelihood function of the latent space model, the computational cost is of order O(N^2), where N is the number of nodes. This makes it infeasible for large networks. In this article, we propose an approximation of the log-likelihood function. We adapt the case-control idea from epidemiology and construct a case-control log-likelihood, which is an unbiased estimator of the full log-likelihood. Replacing the full likelihood by the case-control likelihood in the Markov chain Monte Carlo estimation of the latent space model reduces the computational time from O(N^2) to O(N), making it feasible for large networks. We evaluate its performance using simulated and real data. We fit the model to a large protein-protein interaction dataset using the case-control likelihood and use the model-fitted link probabilities to identify false positive links. Supplemental materials are available online.
UR - http://www.scopus.com/inward/record.url?scp=84866401750&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84866401750&partnerID=8YFLogxK
U2 - 10.1080/10618600.2012.679240
DO - 10.1080/10618600.2012.679240
M3 - Article
C2 - 27570438
AN - SCOPUS:84866401750
SN - 1061-8600
VL - 21
SP - 901
EP - 919
JO - Journal of Computational and Graphical Statistics
JF - Journal of Computational and Graphical Statistics
IS - 4
ER -