A convergent gambling estimate of the entropy of English

2 Author(s): T. M. Cover and R. C. King

In his original paper on the subject, Shannon found upper and lower bounds for the entropy of printed English based on the number of trials required for a subject to guess subsequent symbols in a given text. The guessing approach precludes asymptotic consistency of either the upper or lower bounds except for degenerate ergodic processes. Shannon's technique of guessing the next symbol is altered by having the subject place sequential bets on the next symbol of text. If $S_n$ denotes the subject's capital after $n$ bets at 27-for-1 odds, and if it is assumed that the subject knows the underlying probability distribution for the process $X$, then the entropy estimate is $\hat{H}_n(X) = (1 - (1/n)\log_{27} S_n)\log_2 27$ bits/symbol. If the subject does not know the true probability distribution for the stochastic process, then $\hat{H}_n(X)$ is an asymptotic upper bound for the true entropy. If $X$ is stationary, $E\hat{H}_n(X) \rightarrow H(X)$, $H(X)$ being the true entropy of the process. Moreover, if $X$ is ergodic, then by the Shannon-McMillan-Breiman theorem $\hat{H}_n(X) \rightarrow H(X)$ with probability one. Preliminary indications are that English text has an entropy of approximately 1.3 bits/symbol, which agrees well with Shannon's estimate.
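To see how the estimate behaves, note that if the subject wagers fraction $b(x \mid \text{past})$ of current capital on each candidate symbol $x$, then $S_n = \prod_{i=1}^{n} 27\, b(X_i \mid \text{past})$, and the formula above reduces to the empirical log-loss $\hat{H}_n(X) = -(1/n)\sum_{i=1}^{n} \log_2 b(X_i \mid \text{past})$. The Python sketch below illustrates the computation with a hypothetical Laplace-smoothed unigram bettor standing in for the paper's human subjects; the function gambling_entropy_estimate and the unigram model are assumptions for illustration, not the authors' experimental procedure.

    import math
    from collections import Counter

    ALPHABET = "abcdefghijklmnopqrstuvwxyz "   # the 27-symbol alphabet used in the paper

    def gambling_entropy_estimate(text, bets):
        # hat{H}_n(X) = (1 - (1/n) log_27 S_n) log_2 27 bits/symbol, where the
        # capital compounds at 27-for-1 odds: S_n = prod_i 27 * b(x_i | past).
        log2_capital = 0.0                    # log_2 S_n, starting from S_0 = 1
        for i, symbol in enumerate(text):
            fractions = bets(text[:i])        # wagered fractions over ALPHABET, summing to 1
            log2_capital += math.log2(27 * fractions[symbol])
        n = len(text)
        log27_capital = log2_capital / math.log2(27)
        return (1 - log27_capital / n) * math.log2(27)

    # Hypothetical bettor: fixed Laplace-smoothed unigram letter frequencies
    # estimated from the sample itself (illustration only, not a human subject).
    sample = ("the quick brown fox jumps over the lazy dog " * 50).rstrip()
    counts = Counter(sample)
    unigram = {c: (counts[c] + 1) / (len(sample) + 27) for c in ALPHABET}

    estimate = gambling_entropy_estimate(sample, lambda prefix: unigram)
    print(f"{estimate:.3f} bits/symbol")      # roughly 4 bits: the sample's unigram log-loss

Because this bettor knows only letter frequencies rather than the true distribution of English, its estimate of roughly 4 bits/symbol sits well above the approximately 1.3 bits/symbol obtained from subjects who exploit context, consistent with the abstract's claim that $\hat{H}_n(X)$ upper-bounds the true entropy when the betting distribution is imperfect.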

Published in:

IEEE Transactions on Information Theory (Volume: 24, Issue: 4)