Randomness, Structure and Causality - Agenda
From Santa Fe Institute Events Wiki
Revision as of 18:41, 16 December 2010
Abstracts
The Vocabulary of Grammar-Based Codes and the Logical Consistency of Texts
Debowski, Lukasz (ldebowsk@ipipan.waw.pl)
Polish Academy of Sciences
We will present a new explanation for the distribution of words in
natural language which is grounded in information theory and inspired
by recent research in excess entropy. Namely, we will demonstrate a
theorem with the following informal statement: If a text of length <math>n</math>
describes <math>n^\beta</math> independent facts in a repetitive way, then the
text contains at least <math>n^\beta/\log n</math> different words. In the
formal statement, two modeling postulates are adopted. Firstly, the
words are understood as nonterminal symbols of the shortest
grammar-based encoding of the text. Secondly, the text is assumed to
be emitted by a finite-energy strongly nonergodic source whereas the
facts are binary IID variables predictable in a shift-invariant
way. Besides the theorem, we will exhibit a few stochastic processes
to which this and similar statements can be related.
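
To make the first modeling postulate concrete, the following is a minimal sketch of a grammar-based encoding in which the "words" of a text are the nonterminal symbols of the grammar. It is only an illustration under stated assumptions: it uses a greedy pair-replacement heuristic (in the style of Re-Pair), not the shortest grammar-based encoding that the abstract refers to, and the function names `repair_grammar` and `expand` are hypothetical.

```python
from collections import Counter


def repair_grammar(text):
    """Build a grammar for `text` with a greedy pair-replacement heuristic.

    Assumption: this is a Re-Pair-style approximation, NOT the shortest
    grammar-based encoding mentioned in the abstract.
    Returns (start_sequence, rules), where rules maps each nonterminal
    to the pair of symbols it stands for.
    """
    seq = list(text)          # terminals are single characters
    rules = {}                # nonterminal -> (left symbol, right symbol)
    next_id = 0
    while True:
        pairs = Counter(zip(seq, seq[1:]))
        if not pairs:
            break
        pair, count = pairs.most_common(1)[0]
        if count < 2:
            break             # no adjacent pair repeats; stop
        name = ("N", next_id)  # nonterminals are tuples, never plain strings
        next_id += 1
        rules[name] = pair
        # Replace non-overlapping occurrences of the pair, left to right.
        out = []
        i = 0
        while i < len(seq):
            if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
                out.append(name)
                i += 2
            else:
                out.append(seq[i])
                i += 1
        seq = out
    return seq, rules


def expand(symbol, rules):
    """Expand a symbol back into the string of terminals it derives."""
    if symbol in rules:
        left, right = rules[symbol]
        return expand(left, rules) + expand(right, rules)
    return symbol


if __name__ == "__main__":
    sample = "the cat sat on the mat and the cat sat on the hat"
    start, rules = repair_grammar(sample)
    # The grammar must reproduce the text exactly.
    assert "".join(expand(s, rules) for s in start) == sample
    print("vocabulary size (number of nonterminals):", len(rules))
```

In this sketch the vocabulary size of a text is `len(rules)`. The informal statement above concerns how this kind of quantity grows with the text length <math>n</math>, although a greedy heuristic gives no guarantee of matching the count for the shortest grammar.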
Links: http://arxiv.org/abs/0810.3125 and http://arxiv.org/abs/0911.5318
