Future Challenges in Theoretical Biology

From Santa Fe Institute Events Wiki

Revision as of 18:09, 6 August 2007

This will be a rather informal Work Group in the literal sense of the word. There is therefore no detailed schedule yet; it will be organized in the first session on Monday morning after breakfast.

Below is a list of discussion topics that have been contributed by the invited participants. The material is loosely structured into a set of very general topics and a few relatively concrete questions addressing specific topics, models, or phenomena.

Theoretical Biology - Not even wrong?

A collection of questions addressing the relationship of Theoretical Biology and (minimal) mathematical models. The PDF is here: http://www.santafe.edu/events/workshops/index.php/Image:Not_even_wrong.pdf

A definition of scientific theory

  • One point of departure could be Ernst Mach's 1883 statement that "Science itself should therefore be regarded as a minimal problem consisting of the completest possible presentation of the facts with the least expenditure of thought". This found its equivalent in Walter Elsasser's statement: "we reject the sole function of theory as 'explanation'; we consider it the main function of theory to furnish us with a concise, efficient and economical description of the phenomena of nature." Theory is to be distinguished from modeling, which consists of taking data points and finding a pattern for their description. An exemplary representation of biological theory is constituted by Elsasser's three basic principles: ordered heterogeneity, creative selection, and holistic memory (memory without storage). These principles conform to his definition of science as that activity wherein bodies of information are represented by pure abstractions. His authority in defining theory rests on his three fundamental contributions to physical science: the interpretation of the Davisson-Kunsman experiment as an effect of electron waves, the shell structure of the atomic nucleus, and the dynamo theory of terrestrial magnetism.

There will likely be different views on the nature of theory, and perhaps we can forge a pluralistic definition to accommodate them. We could then move on to methodological approaches that are important for future advances in theoretical biology. Elsasser's basic assumption, after half a century of critical thought about a holistic approach to biology, is that "an organism is a source of causal chains which cannot be traced beyond a terminal point because they are lost in the unfathomable complexity of the organism." Methodologically, he felt that biological theory must work hand in glove with experimentation and that "verification of the holistic properties of organisms requires experimentation of a different type from that conventionally used by physicists and chemists". He was, however, unaware that the era of monolayer cell culture had introduced a methodology that would confirm his first principle, according to which "there can be regularity in the large where there is heterogeneity in the small". This principle is related to the definition of life as "the repetitive production of ordered heterogeneity", proposed by the pioneer of molecular genetics Rollin Hotchkiss, and to "the invariance of systems over the variant fluxes of their constituents", as stated by the prominent neuroembryologist Paul Weiss.

Some experiments from cell culture could be described that verify the principle of ordered heterogeneity. They are basically operational in character, using only observable quantities obtained from the interactions of and between normal and neoplastic cells, rather than reducing those interactions to their subcellular or molecular constituents, as is the current convention. Only a beginning has been made in addressing the principle in the operational mode, and much more can be done. This brings out what I consider the number one challenge to theoretical biology: how best to approach the problem of organismic complexity, by reductionist systems analysis or by a more holistic operational analysis. Currently, the former is the predominant focus of activity, but the latter needs the most attention. For maximum insight, however, the two approaches should be considered complementary rather than competitive.

  • Theory in physics, on the other hand, is the 'mathematical modelling' of the phenomena. Can we really distinguish Theory from Modelling 'in practice', or do we rather need to emphasize concepts 'in' our modelling? If so, how?

What kind of Mathematics Do we Need?

  • It seems obvious that the usual toolkit of mathematical techniques taught in the Physical Sciences (e.g. Differential Equations) does not fit all too well with the "structure" of biological questions. So what would fit better?

  • What kind of Simulations Do we Need?
 This is in essence the complement to the topic above.
  • What is the relationship of biology to physics (complex systems and emergence)?
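As one possible illustration of the question about differential equations raised above, the sketch below compares a rate equation with a stochastic simulation of the same process. It is an assumption made for illustration, not something proposed by the participants: the birth-death model of molecule production and decay, the parameter names `k` and `g`, and the function names are all hypothetical. The rate equation gives a single number, while the stochastic version also exposes the variability between individual runs.

```python
import math
import random


def ode_mean(k, g, t, x0=0.0):
    """Closed-form solution of the rate equation dx/dt = k - g*x."""
    return k / g + (x0 - k / g) * math.exp(-g * t)


def gillespie(k, g, t_end, x0=0, rng=None):
    """One stochastic trajectory of the same birth-death process.

    Production happens at constant rate k, each molecule decays at rate g.
    Returns the integer copy number at time t_end.
    """
    rng = rng or random.Random()
    t, x = 0.0, x0
    while True:
        birth, death = k, g * x
        total = birth + death
        # Waiting time to the next event is exponentially distributed.
        t += rng.expovariate(total)
        if t > t_end:
            return x
        # Choose which event fires, in proportion to its rate.
        if rng.random() * total < birth:
            x += 1
        else:
            x -= 1


# Hypothetical parameters chosen only for illustration.
rng = random.Random(0)
samples = [gillespie(k=10.0, g=1.0, t_end=10.0, rng=rng) for _ in range(2000)]
mean = sum(samples) / len(samples)
var = sum((s - mean) ** 2 for s in samples) / len(samples)

print("rate equation:", round(ode_mean(10.0, 1.0, 10.0), 3))
print("stochastic mean and variance:", round(mean, 2), round(var, 2))
```

In this toy case the two descriptions agree on the average, but only the stochastic one says anything about the spread around it, which is one concrete sense in which the choice of mathematical tool shapes the questions that can be asked.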

Theory integration

How can we integrate theories across domains with different traditions, experimental systems and concepts? How does that relate to the questions about tools above?

The Data Crisis in Biology

  • What have we learned from the immense stream of molecular data? How do we develop new concepts/insights from these data?
 (some more from Manfred L?)

Regulation and Evolution of Regulation

  • A wide variety of large-scale experiments are currently accumulating data that are supposed to enhance our understanding of gene regulation and the organization of regulatory regions. Nevertheless, the conclusions drawn from these data are mostly vague. Computational models for the prediction of regulatory regions and their regulatory function are usually based on assumptions drawn from a small number of well-studied examples, for which large-scale studies commonly fail to prove generality. Regulatory networks, often decoupled from time and space, are built, but the nature of the links is largely unspecified. Synergistic effects are often suggested where actual context/interaction information is missing.

Are we still missing important facts or concepts of regulation to fully describe a gene's expression pattern?

We already know about many levels of regulation. How complicated can it get? (regulatory catastrophe)

Which information do we really need to understand the mechanisms of gene regulation?

  • Regulation is currently viewed as a process that follows a genetically determined regulatory program which, once initialized with the same input, results in the same output, e.g. the phenotype of identical twins. This suggests tight control, coded somehow in the genome. Nevertheless, expression profiles and temporal and spatial expression patterns seem to vary considerably among individuals. Furthermore, major changes in the phylogeny are now frequently attributed to changes in regulatory regions and therefore in gene expression patterns and profiles.

How can tight control allow so much flexibility?

How do innovations arise from changes in regulatory regions?

  • Most of the assumptions about the organization of regulatory elements seem reasonable from an engineering point of view (only). Among those are, e.g., the absence of "unnecessary" elements, modularity of regulatory regions, usage of comparable regulatory elements by co-expressed genes, and organization of joined genes in gene batteries. Did evolution really structure regulatory regions in this way?

How do gene expression patterns/profiles evolve?

How does gene regulation evolve?

What can we learn about gene regulation from the evolutionary history of the gene and its regulatory region?

What would an appropriate evolutionary model for regulatory regions look like?

How can "old" genes follow new regulatory trends (emergence of CpG island promoters, miRNAs, binding site turnover, etc.)?

How does a gene acquire a "second function" in terms of a new and additional spatial and temporal expression pattern?

Do we expect to see co-evolution of gene function and gene regulation?


The Gene Concept


Recent high-throughput transcriptomic projects, in particular the ENCODE Pilot Project, have demonstrated beyond reasonable doubt that the transcriptional output of a mammalian genome is more complex than previously thought. Classical protein-coding genes cover only about 2% of the non-repetitive genome, while more than 80% of the genome is transcribed even in a limited number of cell types and circumstances. The transcriptome appears to have a "hierarchical structure": regions containing protein-coding genes also produce alternative transcripts and anti-sense transcripts, most of which are devoid of protein-coding capacity. The majority of these transcripts are processed into different mature products, which typically belong to many different classes of RNAs, from protein-coding mRNAs to mRNA-like ncRNAs, microRNAs and snoRNAs. It seems impossible to assign a single coherent function to the plethora of products that are eventually processed from a single DNA locus. As a consequence, each locus of the genomic DNA is typically associated with a multitude of functions. In addition, the discovery of distant enhancer sites that act across 100 kb or even several Mb of DNA shows that functional units need not be local DNA regions.

The "classical" notion of the gene as unit of inheritance has been identified more or less with a genomic locus, i.e., an interval of DNA sequence. On the other hand, genetics thinks of a "gene" as a transcript (usually a protein-coding one) together with its regulatory elements on the DNA, i.e., as a functional unit. At the end of the last millennium it seemed that the two notions more or less matched: both referred to genomic sequence intervals. Now we know that functional units are non-local, interleaved, and overlapping. The recent results therefore call for a rethinking of the very notion of the gene.

The real question, however, is whether the changes in the gene concept have any real impact on models and/or experiments.
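The contrast drawn above between a gene as a single sequence interval and a gene as a functional unit can be made concrete with a small data-structure sketch. Everything in it is an illustrative assumption: the class names, the `max_gap` threshold used to decide what counts as "local", and the coordinates are hypothetical and do not describe any real locus.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    start: int  # inclusive, in base pairs
    end: int    # exclusive

    def overlaps(self, other):
        return self.start < other.end and other.start < self.end


@dataclass(frozen=True)
class FunctionalUnit:
    name: str
    parts: tuple  # Intervals: a transcript plus its regulatory elements

    def span(self):
        """Smallest single interval that contains every part."""
        return Interval(min(p.start for p in self.parts),
                        max(p.end for p in self.parts))

    def is_local(self, max_gap):
        """True if no two neighbouring parts are further apart than max_gap."""
        ordered = sorted(self.parts, key=lambda p: p.start)
        return all(b.start - a.end <= max_gap
                   for a, b in zip(ordered, ordered[1:]))

    def shares_sequence_with(self, other):
        return any(p.overlaps(q) for p in self.parts for q in other.parts)


# Hypothetical coordinates, for illustration only.
unit_a = FunctionalUnit("A", (Interval(0, 500),               # distant enhancer
                              Interval(150_000, 152_000)))    # transcript
unit_b = FunctionalUnit("B", (Interval(151_000, 153_000),))   # overlaps A's transcript
unit_c = FunctionalUnit("C", (Interval(60_000, 61_000),))     # lies between A's parts

print(unit_a.is_local(max_gap=10_000))                # non-local
print(unit_a.shares_sequence_with(unit_b))            # overlapping
print(unit_a.span().overlaps(unit_c.span()),
      unit_a.shares_sequence_with(unit_c))            # interleaved
```

Under the interval view each unit would be one `Interval`; under the functional view a unit is a set of intervals, and any model that indexes genes by position has to decide which of the two it means.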

Biological Function

What exactly do we mean by a 'biological function' (as opposed to process)? Usually, or at least often, we use a 'teleological' language (the purpose of the heart is to pump blood), while we keep reassuring ourselves that of course we don't mean it that way. So: is there a meaningful way of speaking about function in an evolutionary context? There should be, I think, since otherwise it seems hard to speak about phenotypes and functional modules. And is this just a philosophical question (or one of language purism), or would a less 'teleological' notion of biological function also have a practical impact, e.g. on designing experiments?