We study the following structure learning problem for H-colorings. For a fixed (and known) constraint graph H with q colors, given access to uniformly random H-colorings of an unknown graph G=(V,E), how many samples are required to learn the edges of G? We give a characterization of the constraint graphs H for which the problem is identifiable for every G and show that there are identifiable constraint graphs for which one cannot hope to learn every graph G efficiently. We provide refined results for the case of proper vertex q-colorings of graphs of maximum degree d. In particular, we prove that in the tree uniqueness region (i.e., when q > d), the problem is identifiable and we can learn G in poly(d,q) × O(n² log n) time. In the tree non-uniqueness region (i.e., when q ≤ d), we show that the problem is not identifiable and thus G cannot be learned. Moreover, when q ≤ d − √d + Θ(1), we establish that even learning an equivalent graph (any graph with the same set of H-colorings) is computationally hard: the sample complexity is exponential in n in the worst case. We further explore the connection between the efficiency/hardness of the structure learning problem and the uniqueness/non-uniqueness phase transition for general H-colorings and prove that under a well-known uniqueness condition in statistical physics, we can learn G in poly(d,q) × O(n² log n) time.
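To make the problem setup concrete, the following minimal sketch covers the special case of proper vertex q-colorings with q > d. It is an illustration only and is not claimed to be the algorithm analyzed in this work. It uses a naive rule: a pair {u, v} is ruled out as an edge as soon as one sample gives u and v the same color. When q > d, every non-adjacent pair can share a color in some proper coloring (color both alike, then extend greedily), so the rule is at least consistent in that regime. The helper names `proper_colorings` and `learn_edges`, the 5-cycle instance, and the sample count are all assumptions made for this example. Uniform samples are drawn by brute-force enumeration, which is feasible only for tiny graphs.

```python
import itertools
import random


def proper_colorings(n, edges, q):
    """Enumerate all proper q-colorings of a graph on vertices 0..n-1 (brute force)."""
    return [c for c in itertools.product(range(q), repeat=n)
            if all(c[u] != c[v] for u, v in edges)]


def learn_edges(samples, n):
    """Keep a pair (u, v) as a candidate edge unless some sample colors u and v alike."""
    candidates = set(itertools.combinations(range(n), 2))
    for c in samples:
        candidates = {(u, v) for (u, v) in candidates if c[u] != c[v]}
    return candidates


# Hypothetical toy instance: a 5-cycle (maximum degree d = 2) with q = 3 > d colors.
n, q = 5, 3
true_edges = {(0, 1), (1, 2), (2, 3), (3, 4), (0, 4)}

rng = random.Random(0)  # fixed seed so the run is reproducible
colorings = proper_colorings(n, true_edges, q)
samples = [rng.choice(colorings) for _ in range(200)]  # uniform samples

learned = learn_edges(samples, n)
print(sorted(learned))
```

True edges are never discarded, because adjacent vertices always receive different colors. The only possible error is a non-edge that survives because no sample happened to give its endpoints the same color.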