Randomness, Structure and Causality - Agenda
From Santa Fe Institute Events Wiki
Abstracts
The Vocabulary of Grammar-Based Codes and the Logical Consistency of Texts
Debowski, Lukasz (ldebowsk@ipipan.waw.pl)
Polish Academy of Sciences
We will present a new explanation for the distribution of words in
natural language, grounded in information theory and inspired by
recent research on excess entropy. Namely, we will prove a theorem
with the following informal statement: if a text of length $n$
describes $n^\beta$ independent facts in a repetitive way, then the
text contains at least $n^\beta/\log n$ different words. In the
formal statement, two modeling postulates are adopted. First, words
are understood as the nonterminal symbols of the shortest
grammar-based encoding of the text. Second, the text is assumed to
be emitted by a finite-energy, strongly nonergodic source, and the
facts are assumed to be binary IID variables predictable in a
shift-invariant way. Besides the theorem, we will exhibit a few
stochastic processes to which this and similar statements can be related.
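To make the first postulate concrete, the sketch below builds a grammar-based encoding of a string and treats each nonterminal symbol as a "word", so the vocabulary size is the number of rules. Finding the shortest grammar for a text is NP-hard, so this illustration swaps in the greedy Re-Pair heuristic, which repeatedly replaces the most frequent pair of adjacent symbols with a new nonterminal. Re-Pair generally does not produce the shortest grammar, so the theorem's bound is not claimed to hold for its output. The helper names (`repair`, `count_pairs`, `replace_pair`, `expand`) are hypothetical and are not taken from the talk.

```python
from collections import Counter


def count_pairs(seq):
    """Count non-overlapping occurrences of each adjacent symbol pair."""
    counts = Counter()
    last_start = {}  # start index of the last counted occurrence of each pair
    for i in range(len(seq) - 1):
        pair = (seq[i], seq[i + 1])
        # Skip an occurrence that overlaps the previous one, as in "aaa".
        if i - last_start.get(pair, -2) >= 2:
            counts[pair] += 1
            last_start[pair] = i
    return counts


def replace_pair(seq, pair, symbol):
    """Replace occurrences of `pair` by `symbol`, scanning left to right."""
    out = []
    i = 0
    while i < len(seq):
        if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
            out.append(symbol)
            i += 2
        else:
            out.append(seq[i])
            i += 1
    return out


def repair(text, min_count=2):
    """Greedy Re-Pair grammar: terminals are characters, nonterminals are ints.

    Returns the final (start) sequence and a dict mapping each nonterminal
    to the pair of symbols it rewrites to.
    """
    seq = list(text)
    rules = {}
    while True:
        counts = count_pairs(seq)
        if not counts:
            break
        pair, freq = counts.most_common(1)[0]
        if freq < min_count:
            break
        symbol = len(rules)  # next unused nonterminal id
        rules[symbol] = pair
        seq = replace_pair(seq, pair, symbol)
    return seq, rules


def expand(symbol, rules):
    """Expand a symbol back into the substring of the text it derives."""
    if symbol not in rules:  # a terminal character
        return symbol
    left, right = rules[symbol]
    return expand(left, rules) + expand(right, rules)


if __name__ == "__main__":
    seq, rules = repair("abcabcabcabc")
    # Each nonterminal is a "word"; the vocabulary size is len(rules).
    print(sorted(expand(s, rules) for s in rules))  # → ['ab', 'abc', 'abcabc']
```

On a text built from a repeated phrase, the nonterminals expand to progressively longer chunks of that phrase, which is the intuition behind reading nonterminals as words. Integer nonterminals are used only to keep them distinct from the single-character terminals.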