 Randomness, Structure and Causality 

 

 The Vocabulary of Grammar-Based Codes and the Logical Consistency of Texts
 
  
 Debowski, Lukasz (ldebowsk@ipipan.waw.pl)
 
 Polish Academy of Sciences
 
 We will present a new explanation for the distribution of words in natural language, grounded in information theory and inspired by recent research on excess entropy. Namely, we will present a theorem with the following informal statement: if a text of length $n$ describes $n^\beta$ independent facts in a repetitive way, then the text contains at least $n^\beta/\log n$ different words. The formal statement adopts two modeling postulates. Firstly, words are understood as the nonterminal symbols of the shortest grammar-based encoding of the text. Secondly, the text is assumed to be emitted by a finite-energy, strongly nonergodic source, whereas the facts are binary IID variables predictable in a shift-invariant way. Besides the theorem, we will exhibit a few stochastic processes to which this and similar statements can be related.
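The first postulate treats the nonterminals of a text's shortest grammar-based encoding as its "words". Computing a truly shortest grammar is NP-hard in general, so the Python sketch below swaps in a naive Re-Pair-style heuristic (repeatedly replace the most frequent adjacent pair of symbols with a fresh nonterminal) and counts the resulting nonterminals as a rough stand-in for vocabulary size. The function names `repair` and `expand`, the binary-rule format, and the sample text are illustrative assumptions, not the construction used in the theorem.

```python
from collections import Counter


def repair(text):
    """Naive Re-Pair-style grammar inference (a heuristic, NOT the shortest grammar).

    Returns (start, rules): `start` is the compressed start-symbol sequence and
    `rules` maps each nonterminal ("N", i) to the pair of symbols it rewrites to.
    """
    seq = list(text)
    rules = {}
    while True:
        # Count adjacent pairs; overlapping runs such as "aaa" are over-counted,
        # a simplification accepted in this sketch.
        pairs = Counter(zip(seq, seq[1:]))
        if not pairs:
            break
        pair, freq = pairs.most_common(1)[0]
        if freq < 2:
            break  # no pair repeats, so a new rule would not shorten the encoding
        nonterminal = ("N", len(rules))
        rules[nonterminal] = pair
        # Replace non-overlapping occurrences of the pair, scanning left to right.
        out, i = [], 0
        while i < len(seq):
            if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
                out.append(nonterminal)
                i += 2
            else:
                out.append(seq[i])
                i += 1
        seq = out
    return seq, rules


def expand(symbol, rules):
    """Expand a symbol back into the terminal string it derives."""
    if symbol in rules:
        left, right = rules[symbol]
        return expand(left, rules) + expand(right, rules)
    return symbol  # terminals are single characters


text = "the cat sat. the cat sat. the dog sat. the dog sat."
start, rules = repair(text)

# The grammar must be lossless: expanding the start sequence gives back the text.
assert "".join(expand(s, rules) for s in start) == text

# Under the first postulate, each nonterminal counts as one "word".
vocabulary = {nt: expand(nt, rules) for nt in rules}
print(len(vocabulary), "nonterminals")
for nt, word in vocabulary.items():
    print(nt[1], repr(word))
```

Because this heuristic grammar is generally not the shortest one, its nonterminal count only illustrates the idea of a grammar-based vocabulary; it carries none of the theorem's guarantees.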
 
  
 See http://arxiv.org/abs/0810.3125 and http://arxiv.org/abs/0911.5318