Randomness, Structure and Causality - Agenda

From Santa Fe Institute Events Wiki

Revision as of 18:41, 16 December 2010


Abstracts


The Vocabulary of Grammar-Based Codes and the Logical Consistency of Texts

Debowski, Lukasz (ldebowsk@ipipan.waw.pl)
Polish Academy of Sciences

We will present a new explanation for the distribution of words in natural language which is grounded in information theory and inspired by recent research in excess entropy. Namely, we will demonstrate a theorem with the following informal statement: If a text of length $n$ describes $n^\beta$ independent facts in a repetitive way, then the text contains at least $n^\beta/\log n$ different words. In the formal statement, two modeling postulates are adopted. Firstly, the words are understood as nonterminal symbols of the shortest grammar-based encoding of the text. Secondly, the text is assumed to be emitted by a finite-energy strongly nonergodic source, whereas the facts are binary IID variables predictable in a shift-invariant way. Besides the theorem, we will exhibit a few stochastic processes to which this and similar statements can be related.
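As a rough illustration of the first postulate, the sketch below counts the nonterminal symbols produced by a greedy pair-replacement heuristic in the style of Re-Pair and evaluates the informal expression $n^\beta/\log n$. This is an assumption-laden stand-in, not the construction used in the talk: the theorem refers to the shortest grammar-based encoding, which this heuristic only approximates, and the function names `greedy_grammar_vocabulary` and `informal_bound` are hypothetical.

```python
import math
from collections import Counter


def greedy_grammar_vocabulary(text: str) -> int:
    """Count nonterminals created by a greedy Re-Pair-style heuristic.

    Assumption: this only approximates the shortest grammar-based
    encoding mentioned in the abstract; it is not the same object.
    """
    seq = list(text)  # terminals are single characters
    n_rules = 0
    while True:
        # Count adjacent pairs (overlapping occurrences are counted naively).
        pairs = Counter(zip(seq, seq[1:]))
        if not pairs:
            break
        pair, count = pairs.most_common(1)[0]
        if count < 2:
            break  # no pair repeats, so no further rule is worthwhile
        nonterminal = ("N", n_rules)  # a tuple cannot collide with a str terminal
        n_rules += 1
        out, i = [], 0
        while i < len(seq):
            # Replace non-overlapping occurrences of the pair, left to right.
            if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
                out.append(nonterminal)
                i += 2
            else:
                out.append(seq[i])
                i += 1
        seq = out
    return n_rules


def informal_bound(n: int, beta: float) -> float:
    """Evaluate n**beta / log(n), the expression in the informal statement."""
    return n ** beta / math.log(n)
```

For a given text, one could compare `greedy_grammar_vocabulary(text)` with `informal_bound(len(text), beta)` for an assumed exponent `beta`; such a comparison is only suggestive, since the theorem concerns the behavior of a stochastic source as $n$ grows rather than any single finite sample.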

Links: http://arxiv.org/abs/0810.3125 and http://arxiv.org/abs/0911.5318