Inferring statistically significant hidden Markov models

Lu Yu, Jason M. Schwier, Ryan M. Craven, Richard R. Brooks, Christopher Griffin

Research output: Contribution to journal › Article › peer-review

21 Scopus citations


Hidden Markov models (HMMs) are used to analyze real-world problems. We consider an approach that constructs minimum entropy HMMs directly from a sequence of observations. If an insufficient amount of observation data is used to generate the HMM, the model will not represent the underlying process. Current methods assume that observations completely represent the underlying process. It is often the case that the training data size is not large enough to adequately capture all statistical dependencies in the system. It is, therefore, important to know the statistical significance level at which the constructed model represents the underlying process, not only the training set. In this paper, we present a method to determine if the observation data and constructed model fully express the underlying process with a given level of statistical significance. We use the statistics of the process to calculate an upper bound on the number of samples required to guarantee that the model has a given level of significance. We provide theoretical and experimental results that confirm the utility of this approach. The experiment is conducted on a real private Tor network.
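To make the idea of a sample-size upper bound concrete, here is a minimal sketch. It is not the bound derived in the paper. It assumes a much simpler setting: estimating a single probability (for example, one transition probability) from independent samples, using the standard two-sided Hoeffding inequality. The function name and the parameters `epsilon` (tolerated estimation error) and `alpha` (significance level) are hypothetical and chosen only for illustration.

```python
import math


def hoeffding_sample_bound(epsilon: float, alpha: float) -> int:
    """Return the smallest n satisfying 2 * exp(-2 * n * epsilon**2) <= alpha.

    With at least n independent samples, the empirical frequency of an event
    deviates from its true probability by more than epsilon with probability
    at most alpha (two-sided Hoeffding inequality).
    """
    if not 0.0 < epsilon < 1.0:
        raise ValueError("epsilon must lie in (0, 1)")
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie in (0, 1)")
    # Solve 2 * exp(-2 * n * epsilon^2) <= alpha for n and round up.
    return math.ceil(math.log(2.0 / alpha) / (2.0 * epsilon ** 2))


if __name__ == "__main__":
    # Number of samples sufficient for error 0.05 at significance level 0.05.
    print(hoeffding_sample_bound(epsilon=0.05, alpha=0.05))
```

The bound grows as the tolerated error or the significance level shrinks, which matches the abstract's point that a small training set may not support a model at the desired significance level. Observations from an HMM are generally dependent, so this independence-based calculation is only a rough analogy.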

Original language: English (US)
Article number: 6193099
Pages (from-to): 1548-1558
Number of pages: 11
Journal: IEEE Transactions on Knowledge and Data Engineering
Issue number: 7
State: Published - 2013

All Science Journal Classification (ASJC) codes

  • Information Systems
  • Computer Science Applications
  • Computational Theory and Mathematics

