Randomness, Structure and Causality - Agenda
Abstracts


The Vocabulary of Grammar-Based Codes and the Logical Consistency of Texts

Debowski, Lukasz (ldebowsk@ipipan.waw.pl)
Polish Academy of Sciences

We will present a new explanation for the distribution of words in natural language which is grounded in information theory and inspired by recent research in excess entropy. Namely, we will demonstrate a theorem with the following informal statement: If a text of length <math>n</math> describes <math>n^\beta</math> independent facts in a repetitive way, then the text contains at least <math>n^\beta/\log n</math> different words. In the formal statement, two modeling postulates are adopted. Firstly, the words are understood as nonterminal symbols of the shortest grammar-based encoding of the text. Secondly, the text is assumed to be emitted by a finite-energy strongly nonergodic source, whereas the facts are binary IID variables predictable in a shift-invariant way. Besides the theorem, we will exhibit a few stochastic processes to which this and similar statements can be related.

Links: [http://arxiv.org/abs/0810.3125] and [http://arxiv.org/abs/0911.5318]
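
To make the first modeling postulate concrete, the sketch below (not from the paper; the function name and the Re-Pair-style heuristic are our own illustrative choices) builds a small grammar-based encoding of a string by repeatedly replacing the most frequent adjacent pair of symbols with a fresh nonterminal, then counts the nonterminals, which play the role of the "words" in the abstract. Computing the truly shortest grammar is NP-hard, so such a heuristic only approximates it.

<syntaxhighlight lang="python">
# Toy Re-Pair-style grammar-based code (illustrative sketch only).
# Nonterminals of the resulting grammar stand in for the "words"
# of a grammar-based encoding of the text.
from collections import Counter

def repair_grammar(text):
    """Return (compressed sequence, grammar) for a toy Re-Pair-like code."""
    seq = list(text)
    grammar = {}          # nonterminal -> (left symbol, right symbol)
    next_id = 0
    while True:
        pairs = Counter(zip(seq, seq[1:]))
        if not pairs:
            break
        pair, count = pairs.most_common(1)[0]
        if count < 2:     # no pair repeats: stop introducing nonterminals
            break
        nt = "N%d" % next_id
        next_id += 1
        grammar[nt] = pair
        # Replace non-overlapping occurrences of the chosen pair.
        out, i = [], 0
        while i < len(seq):
            if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
                out.append(nt)
                i += 2
            else:
                out.append(seq[i])
                i += 1
        seq = out
    return seq, grammar

if __name__ == "__main__":
    # A highly repetitive text needs only a few nonterminals.
    seq, grammar = repair_grammar("abcabcabcabcabcabc")
    print(len(grammar), "nonterminals:", grammar)
    print("compressed sequence:", seq)
</syntaxhighlight>

Running this on a very repetitive string yields only a handful of nonterminals, in line with the intuition that a text describing few independent facts in a repetitive way needs only a small vocabulary.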