Randomness, Structure and Causality - Agenda

From Santa Fe Institute Events Wiki

Revision as of 18:41, 16 December 2010 by Chaos


Abstracts


The Vocabulary of Grammar-Based Codes and the Logical Consistency of Texts

Debowski, Lukasz (ldebowsk@ipipan.waw.pl)
Polish Academy of Sciences

We will present a new explanation for the distribution of words in natural language which is grounded in information theory and inspired by recent research on excess entropy. Namely, we will demonstrate a theorem with the following informal statement: If a text of length $n$ describes $n^\beta$ independent facts in a repetitive way, then the text contains at least $n^\beta/\log n$ different words. In the formal statement, two modeling postulates are adopted. Firstly, the words are understood as nonterminal symbols of the shortest grammar-based encoding of the text. Secondly, the text is assumed to be emitted by a finite-energy strongly nonergodic source, whereas the facts are binary IID variables predictable in a shift-invariant way. Besides the theorem, we will exhibit a few stochastic processes to which this and similar statements can be related.
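To make the first postulate concrete, the following minimal sketch builds a grammar for a short string and counts its nonterminals as the "vocabulary". It is only an illustration under stated assumptions: it uses a greedy pair-replacement heuristic in the style of Re-Pair, not the shortest grammar-based encoding that the theorem refers to (finding a smallest grammar is NP-hard in general), and the names `greedy_grammar`, `expand`, and `vocabulary_size` are hypothetical and do not come from the abstract.

```python
from collections import Counter


def greedy_grammar(text):
    """Greedy pair-replacement heuristic (Re-Pair style), NOT the shortest grammar.

    Returns (start_sequence, rules), where rules maps each nonterminal
    to the pair of symbols it stands for.
    """
    seq = list(text)          # terminals are single characters
    rules = {}                # nonterminal -> (left symbol, right symbol)
    next_id = 0
    while True:
        pairs = Counter(zip(seq, seq[1:]))
        if not pairs:
            break
        pair, count = pairs.most_common(1)[0]
        if count < 2:         # no adjacent pair repeats: stop
            break
        nonterminal = ("N", next_id)   # tuples cannot collide with characters
        next_id += 1
        rules[nonterminal] = pair
        # Replace non-overlapping occurrences of the pair, left to right.
        out = []
        i = 0
        while i < len(seq):
            if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
                out.append(nonterminal)
                i += 2
            else:
                out.append(seq[i])
                i += 1
        seq = out
    return seq, rules


def expand(symbol, rules):
    """Recursively expand a symbol back into the string it encodes."""
    if symbol not in rules:
        return symbol
    left, right = rules[symbol]
    return expand(left, rules) + expand(right, rules)


def vocabulary_size(text):
    """Number of nonterminals in the heuristic grammar (the 'words' of the text)."""
    _, rules = greedy_grammar(text)
    return len(rules)


if __name__ == "__main__":
    sample = "abababab"
    start, rules = greedy_grammar(sample)
    # The grammar must reproduce the original text exactly.
    assert "".join(expand(s, rules) for s in start) == sample
    print(vocabulary_size(sample))
```

In this toy setting the vocabulary is simply `len(rules)`; the theorem concerns how the analogous count for the shortest grammar must grow with the text length $n$, which this heuristic does not establish.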

Links: [[1]] and [[2]]