
Difference between revisions of "Randomness, Structure and Causality - Agenda"

From Santa Fe Institute Events Wiki


Revision as of 18:39, 16 December 2010



Abstracts


The Vocabulary of Grammar-Based Codes and the Logical Consistency of Texts

Debowski, Lukasz (ldebowsk@ipipan.waw.pl)
Polish Academy of Sciences

We will present a new explanation for the distribution of words in natural language, grounded in information theory and inspired by recent research on excess entropy. Specifically, we will demonstrate a theorem with the following informal statement: if a text of length $n$ describes $n^\beta$ independent facts in a repetitive way, then the text contains at least $n^\beta/\log n$ different words. The formal statement adopts two modeling postulates. First, words are understood as the nonterminal symbols of the shortest grammar-based encoding of the text. Second, the text is assumed to be emitted by a finite-energy, strongly nonergodic source, whereas the facts are binary IID variables predictable in a shift-invariant way. Besides the theorem, we will exhibit a few stochastic processes to which this and similar statements can be related.

http://arxiv.org/abs/0810.3125 and http://arxiv.org/abs/0911.5318
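
The idea of treating the nonterminal symbols of a grammar-based encoding as "words" can be illustrated with a small sketch. The abstract refers to the shortest grammar-based encoding; the code below does not compute that. It assumes a greedy Re-Pair-style heuristic as a stand-in, and the function names `repair_vocabulary` and `expand` are hypothetical, chosen only for this illustration.

```python
from collections import Counter


def repair_vocabulary(text):
    """Greedy Re-Pair-style compression (an assumed stand-in, not the
    shortest grammar): repeatedly replace the most frequent adjacent pair
    with a fresh nonterminal until no pair is counted at least twice."""
    seq = list(text)
    rules = {}  # nonterminal -> (left symbol, right symbol)
    next_id = 0
    while True:
        # Count adjacent pairs (overlapping occurrences are counted too).
        pairs = Counter(zip(seq, seq[1:]))
        if not pairs:
            break
        pair, count = pairs.most_common(1)[0]
        if count < 2:
            break
        # Nonterminals are tuples, so they never collide with characters.
        nonterminal = ("N", next_id)
        next_id += 1
        rules[nonterminal] = pair
        # Rewrite the sequence left to right without overlapping matches.
        out = []
        i = 0
        while i < len(seq):
            if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
                out.append(nonterminal)
                i += 2
            else:
                out.append(seq[i])
                i += 1
        seq = out
    return seq, rules


def expand(symbol, rules):
    """Expand a symbol back into the string of characters it derives."""
    if symbol not in rules:
        return symbol
    left, right = rules[symbol]
    return expand(left, rules) + expand(right, rules)


if __name__ == "__main__":
    sample = "the cat sat on the mat and the cat sat on the hat"
    start, grammar = repair_vocabulary(sample)
    # The grammar must reproduce the input exactly.
    assert "".join(expand(s, grammar) for s in start) == sample
    print("text length:", len(sample))
    print("vocabulary size (number of nonterminals):", len(grammar))
```

Under this reading, the vocabulary size of a text is the number of rules in the grammar, and the theorem's bound concerns how that count grows with the text length $n$. A heuristic grammar such as this one can contain more or fewer nonterminals than a shortest one, so the sketch only shows how the quantity is defined, not the bound itself.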