Randomness, Structure and Causality - Agenda
From Santa Fe Institute Events Wiki
Revision as of 18:41, 16 December 2010
Abstracts
The Vocabulary of Grammar-Based Codes and the Logical Consistency of Texts
Debowski, Lukasz (ldebowsk@ipipan.waw.pl)
Polish Academy of Sciences
We will present a new explanation for the distribution of words in
natural language which is grounded in information theory and inspired
by recent research in excess entropy. Namely, we will demonstrate a
theorem with the following informal statement: If a text of length <math>n</math>
describes <math>n^\beta</math> independent facts in a repetitive way then the
text contains at least <math>n^\beta/\log n</math> different words. In the
formal statement, two modeling postulates are adopted. Firstly, the
words are understood as nonterminal symbols of the shortest
grammar-based encoding of the text. Secondly, the text is assumed to
be emitted by a finite-energy strongly nonergodic source whereas the
facts are binary IID variables predictable in a shift-invariant
way. Besides the theorem, we will exhibit a few stochastic processes
to which this and similar statements can be related.
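To get a feel for the size of the bound in the informal statement, the short sketch below evaluates <math>n^\beta/\log n</math> for a few text lengths. The informal statement fixes neither constants nor the base of the logarithm, so this is only an order-of-magnitude illustration; the natural logarithm, the restriction of β to (0, 1), and the function name `vocabulary_lower_bound` are assumptions made here.

```python
import math


def vocabulary_lower_bound(n: int, beta: float) -> float:
    """Evaluate n**beta / log(n), the informal lower bound on the number of
    distinct words in a text of length n that repetitively describes
    n**beta independent facts.

    Assumptions (not fixed by the abstract): natural logarithm, no constant
    factors, and beta in the open interval (0, 1).
    """
    if n < 2:
        raise ValueError("n must be at least 2 so that log(n) > 0")
    if not 0 < beta < 1:
        raise ValueError("beta is assumed here to lie in (0, 1)")
    return n ** beta / math.log(n)


if __name__ == "__main__":
    # The bound grows without limit, but more slowly than the text length n.
    for n in (10**3, 10**6, 10**9):
        bound = vocabulary_lower_bound(n, beta=0.5)
        print(f"n = {n:>13,}  bound ~ {bound:10.1f}  bound/n = {bound / n:.2e}")
```

With β = 0.5 and n = 10^6, for example, the expression is roughly 72, while the ratio bound/n shrinks as n grows: the predicted vocabulary keeps increasing, but sublinearly in the length of the text.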
Links: http://arxiv.org/abs/0810.3125 and http://arxiv.org/abs/0911.5318
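The first modeling postulate identifies words with the nonterminal symbols of the shortest grammar-based encoding of the text. Finding the smallest grammar for a string is known to be NP-hard, so the sketch below substitutes the Re-Pair heuristic: it repeatedly replaces the most frequent non-overlapping pair of adjacent symbols with a fresh nonterminal, and the number of rules serves as a rough stand-in for the vocabulary. This is an illustration under that substitution, not the encoding used in the talk; the function names `count_pairs`, `repair` and `expand` are hypothetical.

```python
from collections import Counter


def count_pairs(seq):
    """Count adjacent symbol pairs left to right, skipping overlaps
    (in 'aaa' the pair ('a', 'a') is counted once, not twice)."""
    counts = Counter()
    last_start = {}
    for i in range(len(seq) - 1):
        pair = (seq[i], seq[i + 1])
        if last_start.get(pair, -2) != i - 1:
            counts[pair] += 1
            last_start[pair] = i
    return counts


def repair(text):
    """Re-Pair-style grammar compression (a heuristic, not the shortest grammar).

    Terminals are the characters of `text`; nonterminals are tuples ("N", k).
    Returns (start_sequence, rules), where rules maps each nonterminal to the
    pair of symbols it rewrites to.
    """
    seq = list(text)
    rules = {}
    while True:
        counts = count_pairs(seq)
        if not counts:
            break
        pair, freq = counts.most_common(1)[0]
        if freq < 2:  # a rule used only once would not shorten the encoding
            break
        nonterminal = ("N", len(rules))
        rules[nonterminal] = pair
        # Replace non-overlapping occurrences of the pair, scanning left to right.
        out, i = [], 0
        while i < len(seq):
            if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
                out.append(nonterminal)
                i += 2
            else:
                out.append(seq[i])
                i += 1
        seq = out
    return seq, rules


def expand(symbol, rules):
    """Rewrite a symbol back into the terminal string it derives."""
    if symbol in rules:
        left, right = rules[symbol]
        return expand(left, rules) + expand(right, rules)
    return symbol
```

For the toy text "abcabcabcabc" this yields three rules (ab, then (ab)c, then two copies of that) and a start sequence of two symbols; joining `expand(s, rules)` over the start sequence recovers the original text. Binary rules keep the bookkeeping simple, at the cost of producing longer derivations than a truly minimal grammar would.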