Approximate maximum entropy joint feature inference for discrete space classification

David J. Miller, Lian Yan

Research output: Contribution to conference › Paper › peer-review

1 Scopus citations


We propose a new method for learning discrete space statistical classifiers. Similar to [4] and [2], we cast classification/inference within the more general framework of estimating the joint pmf for the (feature vector, class label) pair. Cheeseman's proposal to construct the maximum entropy (ME) joint pmf consistent with general lower-order probability constraints is in principle powerful, allowing for general dependencies between features. However, this approach has been severely limited by its huge learning complexity. Alternatives such as Bayesian networks (BNs) require explicit specification of conditional independencies, which may be difficult to assess given finite data. Moreover, BN learning approaches are typically greedy and must also address a difficult model-order selection problem to avoid overfitting. Here we reconsider the ME problem, proposing an approximate method which encodes arbitrary low-order constraints while retaining quite tractable learning. The new method approximates the joint feature pmf during learning on a subgrid of the full feature space. Results on the UC Irvine repository reveal performance gains over [4], over the BN Kutato, and also over MLPs. Extensions to more general inference problems are indicated.
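To make the ME-with-low-order-constraints idea concrete, the sketch below fits a maximum entropy joint pmf over two binary features and a binary class label, constrained to match the empirical pairwise (feature, class) marginals, via iterative proportional fitting, then classifies by the resulting posterior. This is a generic illustration under assumed toy data, not the paper's subgrid approximation; the variable names and the 200-sample synthetic dataset are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical training data: each row is (x1, x2, c), all binary.
data = rng.integers(0, 2, size=(200, 3))

def empirical_marginal(i, j):
    """Empirical joint pmf of columns i and j of the data."""
    counts = np.zeros((2, 2))
    for row in data:
        counts[row[i], row[j]] += 1
    return counts / counts.sum()

# Pairwise (feature, class) constraints to enforce.
target_x1c = empirical_marginal(0, 2)
target_x2c = empirical_marginal(1, 2)

# Start from the uniform pmf (the unconstrained ME solution) and
# rescale marginals in turn (iterative proportional fitting).
joint = np.full((2, 2, 2), 1.0 / 8)
for _ in range(100):
    # Match the (x1, c) marginal: sum out x2 (axis 1).
    cur = joint.sum(axis=1)
    joint *= (target_x1c / np.maximum(cur, 1e-12))[:, None, :]
    # Match the (x2, c) marginal: sum out x1 (axis 0).
    cur = joint.sum(axis=0)
    joint *= (target_x2c / np.maximum(cur, 1e-12))[None, :, :]

def classify(x1, x2):
    """Classify by argmax_c P(c | x1, x2) under the fitted ME joint pmf."""
    return int(np.argmax(joint[x1, x2, :]))
```

The fitted `joint` remains a proper pmf, and its pairwise (feature, class) marginals converge to the empirical ones; classification then needs only a slice and an argmax. The paper's contribution addresses the case where the full feature grid is too large to store, which this toy sketch sidesteps.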

Original language: English (US)
Number of pages: 6
State: Published - 1999
Event: International Joint Conference on Neural Networks (IJCNN'99) - Washington, DC, USA
Duration: Jul 10 1999 - Jul 16 1999


Other: International Joint Conference on Neural Networks (IJCNN'99)
City: Washington, DC, USA

All Science Journal Classification (ASJC) codes

  • Software
  • Artificial Intelligence


