Markov random fields on graphs for natural languages
O'Sullivan, J.A.; Mark, K.; Miller, M.I.
Information Theory and Statistics, 1994. Proceedings., 1994 IEEE-IMS Workshop on
27-29 Oct 1994 Page(s): 47 -
Digital Object Identifier 10.1109/WITS.1994.513880
Summary: The use of model-based methods for data compression for English
dates back at least to Shannon's Markov chain (n-gram) models, where the
probability of the next word given all previous words equals the
probability of the next word given the previous n-1 words. A second
approach seeks to model the hierarchical nature of language via tree
graph structures arising from a context-free language (CFL). Neither the
n-gram nor the CFL models approach the data compression predicted by the
entropy of English as estimated by Shannon and by Cover and King. This
paper presents two models that incorporate the benefits of both the
n-gram model and the tree-based models. In either case the neighborhood
structure on the syntactic variables is determined by the tree while the
neighborhood structure of the words is determined by the n-gram and the
parent syntactic variable (preterminal) in the tree. Having both types
of neighbors for the words should yield decreased entropy of the model
and hence fewer bits per word in data compression. To motivate
estimation of model parameters, some results on estimating parameters
for random branching processes are reviewed.
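
As a rough illustration of the n-gram baseline mentioned in the summary (not of the paper's Markov random field models), the sketch below fits a bigram model (n = 2) and measures its cost in bits per word. The function names, the toy sentence, and the add-one smoothing are assumptions made for this example and do not come from the paper.

```python
import math
from collections import Counter


def train_bigram(tokens):
    """Count contexts and bigrams in a list of word tokens."""
    context_counts = Counter(tokens[:-1])                  # words that have a successor
    bigram_counts = Counter(zip(tokens[:-1], tokens[1:]))  # (previous, next) pairs
    vocab = set(tokens)
    return context_counts, bigram_counts, vocab


def bigram_prob(prev, word, context_counts, bigram_counts, vocab_size):
    """P(word | prev) with add-one smoothing (an assumption of this sketch)."""
    return (bigram_counts[(prev, word)] + 1) / (context_counts[prev] + vocab_size)


def bits_per_word(tokens, context_counts, bigram_counts, vocab_size):
    """Average of -log2 P(word | prev) over the sequence."""
    total = 0.0
    for prev, word in zip(tokens[:-1], tokens[1:]):
        p = bigram_prob(prev, word, context_counts, bigram_counts, vocab_size)
        total -= math.log2(p)
    return total / (len(tokens) - 1)


# Toy data, invented for illustration only.
tokens = "the dog saw the cat".split()
context_counts, bigram_counts, vocab = train_bigram(tokens)
rate = bits_per_word(tokens, context_counts, bigram_counts, len(vocab))
print(f"{rate:.3f} bits per word")
```

A model that also conditions each word on its parent preterminal, as the summary proposes, would replace `bigram_prob` with a conditional distribution over a larger neighborhood; the bits-per-word measure would stay the same.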