## Abstract

Hidden Markov models (HMMs) are used to analyze real-world problems. We consider an approach that constructs minimum entropy HMMs directly from a sequence of observations. If an insufficient amount of observation data is used to generate the HMM, the model will not represent the underlying process. Current methods assume that the observations completely represent the underlying process, yet the training data set is often not large enough to adequately capture all statistical dependencies in the system. It is, therefore, important to know the level of statistical significance at which the constructed model represents the underlying process, not only the training set. In this paper, we present a method to determine whether the observation data and constructed model fully express the underlying process with a given level of statistical significance. We use the statistics of the process to calculate an upper bound on the number of samples required to guarantee that the model has a given level of significance. We provide theoretical and experimental results that confirm the utility of this approach. The experiment is conducted on a real private Tor network.

Original language | English (US) |
---|---|
Article number | 6193099 |
Pages (from-to) | 1548-1558 |
Number of pages | 11 |
Journal | IEEE Transactions on Knowledge and Data Engineering |
Volume | 25 |
Issue number | 7 |
DOIs | |
State | Published - Jun 3 2013 |

## All Science Journal Classification (ASJC) codes

- Information Systems
- Computer Science Applications
- Computational Theory and Mathematics