Future Challenges in Theoretical Biology

This will be a rather informal Work Group in the literal sense of the word. There is therefore no detailed schedule yet; it will be organized in the first session on Monday morning after breakfast.

List of Participants

Collins, James ASU
Flack, Jessica SFI
Forst, Christian LANL
Gorelick, Root Carleton
Griesemer, Jim UC Davis
Kauffman, Stuart
Krakauer, David SFI
Laubichler, Manfred ASU
Prohaska, Sonja J. ASU
Rubin, Dorothy UC Berkeley
Rubin, Harry UC Berkeley

The topics below are grouped vaguely into a set of very general topics and a few relatively concrete questions addressing specific topics/models/phenomena.

Theoretical Biology - Not even wrong?

A collection of questions addressing the relationship of Theoretical Biology and (minimal) mathematical models. The PDF is here: http://www.santafe.edu/events/workshops/index.php/Image:Not_even_wrong.pdf

 A major point of the Work Group should be to think about how 
 Theoretical Biology can have a larger impact on [molecular] biology

A definition of scientific theory

One point of departure could be Ernst Mach's 1883 statement that, "Science itself should therefore be regarded as a minimal problem consisting of the completest possible presentation of the facts with the least expenditure of thought". This found its equivalent in Walter Elsasser's statement: we reject the sole function of theory as 'explanation'; we consider it the main function of theory to furnish us with a concise, efficient and economical description of the phenomena of nature. Theory is to be distinguished from modeling, which consists of taking data points, and finding a pattern for their description. An exemplary representation of biological theory is constituted by Elsasser's three basic principles: ordered heterogeneity, creative selection and holistic memory, or memory without storage. These principles conform to his definition of science as that activity wherein bodies of information are represented by pure abstractions. His authority in defining theory rests on his three fundamental contributions in physical science: the interpretation of the Davisson-Kunsman experiment as an effect of electron waves, the shell structure of the atomic nucleus, and the dynamo theory of terrestrial magnetism.

There will likely be different views on the nature of theory, and perhaps we can forge a pluralistic definition to accommodate them. We could then move on to methodological approaches that are important for future advances in theoretical biology. Elsasser's basic assumption, after half a century of critical thought about a holistic approach to biology, is that "an organism is a source of causal chains which cannot be traced beyond a terminal point because they are lost in the unfathomable complexity of the organism." Methodologically, he felt that biological theory must work hand in glove with experimentation and that verification of the holistic properties of organisms requires experimentation of a different type from that conventionally used by physicists and chemists. He was, however, unaware that the era of monolayer cell culture had introduced a methodology that would confirm his first principle, according to which there can be regularity in the large where there is heterogeneity in the small. This principle was related to the definition of life as the repetitive production of ordered heterogeneity proposed by the pioneer of molecular genetics Rollin Hotchkiss, and to "the invariance of systems over the variant fluxes of their constituents" as stated by the prominent neuroembryologist Paul Weiss.

Some experiments from cell culture could be described that verify the principle of ordered heterogeneity. They are basically operational in character, using only observable quantities obtained from the interactions among and between normal and neoplastic cells, rather than reducing those interactions to their subcellular or molecular constituents, as is the current convention. Only a beginning has been made in addressing the principle in the operational mode, and much more can be done. This brings out what I consider the number one challenge to theoretical biology, how best to approach the problem of organismic complexity: by reductionist systems analysis or by a more holistic operational analysis. Currently, the former is the predominant focus of activity, but the latter needs the most attention. For maximum insight, however, the two approaches should be considered complementary rather than competitive.

Theory in physics, on the other hand, is the 'mathematical modelling' of the phenomena. Can we really distinguish Theory from Modelling in practice, or do we rather need to emphasize concepts in our modelling? If so, how?

What kind of Mathematics Do we Need?

It seems obvious that the usual toolkit of mathematical techniques taught in the Physical Sciences (e.g. differential equations) does not fit all too well with the "structure" of biological questions. So what would fit better?

What kind of Simulations Do we Need?

This is in essence the complement to the topic above.

What is the relationship of biology to physics: complex systems and emergence?

Theory integration

How can we do that across domains with different traditions, experimental systems and concepts? How does that relate to the questions about tools above?

The Data Crisis in Biology

What have we learned from the immense stream of molecular data? How do we develop new concepts/insights from these data?

 some more from Manfred L?

Regulation and Evolution of Regulation

A wide variety of large-scale experiments are currently accumulating data which are supposed to enhance our understanding of gene regulation and the organization of regulatory regions. Nevertheless, the conclusions drawn from these data are mostly vague. Computational models for the prediction of regulatory regions and their regulatory function are usually based on assumptions drawn from a small number of well-studied examples, for which large-scale studies commonly fail to prove generality. Regulatory networks, often decoupled from time and space, are built, but the nature of the links is largely unspecified. Synergistic effects are often suggested where actual context/interaction information is missing.

Are we still missing important facts or concepts of regulation to fully describe a gene's expression pattern?

We already know about many levels of regulation. How complicated can it get? (regulatory catastrophe)

Which information do we really need to understand the mechanisms of gene regulation?

Regulation is currently viewed as a process that follows a genetically determined regulatory program which, once initialized with the same input, results in the same output, e.g. the phenotype of identical twins. This suggests a tight control coded somehow in the genome. Nevertheless, expression profiles and temporal and spatial expression patterns seem to vary considerably among individuals. Furthermore, major changes in the phylogeny are now frequently attributed to changes in regulatory regions and therefore in gene expression patterns and profiles.

How can tight control allow so much flexibility?

How do innovations arise from changes in regulatory regions?

Most of the assumptions about the organization of regulatory elements seem reasonable from an engineering point of view (only). Among those are, e.g., absence of "unnecessary" elements, modularity of regulatory regions, usage of comparable regulatory elements by co-expressed genes, organization of joined genes in gene batteries etc. Did evolution really structure regulatory regions in this way?

How do gene expression patterns/profiles evolve?

How does gene regulation evolve?

What can we learn about gene regulation from the evolutionary history of the gene and its regulatory region?

What would an appropriate evolutionary model for regulatory regions look like?

How can "old" genes follow new regulatory trends (emergence of CpG island promoters, miRNAs, binding site turnover, etc.)?

How does a gene acquire a "second function" in terms of a new and additional spatial and temporal expression pattern?

Do we expect to see co-evolution of gene function and gene regulation?

The Gene Concept

Recent high-throughput transcriptomic projects, in particular the ENCODE Pilot Project, have demonstrated beyond reasonable doubt that the transcriptional output of a mammalian genome is more complex than previously thought. Classical protein-coding genes cover only about 2% of the non-repetitive genome, while more than 80% of the genome is transcribed even in a limited number of cell types and circumstances. The transcriptome appears to have a "hierarchical structure": regions containing protein-coding genes also produce alternative transcripts and anti-sense transcripts, most of which are devoid of protein-coding capacity. The majority of these transcripts are processed into different mature products, which typically belong to many different classes of RNAs, from protein-coding mRNAs to mRNA-like ncRNAs, microRNAs and snoRNAs. It seems impossible to assign a single coherent function to the plethora of products that are eventually processed from a single DNA locus. As a consequence, each locus of the genomic DNA is typically associated with a multitude of functions. In addition, the discovery of distant enhancer sites that act across 100 kb or even several Mb of DNA shows that functional units need not be local DNA regions. The "classical" notion of the gene as a unit of inheritance has been identified more or less with a genomic locus, i.e., an interval of DNA sequence. On the other hand, genetics thinks of a "gene" as a transcript (usually a protein-coding one) together with its regulatory elements on the DNA, i.e., as a functional unit. At the end of the last millennium it seemed that the two notions more or less matched: both refer to genomic sequence intervals. Now we know that functional units are non-local, interleaved, and overlapping. The recent results therefore call for a rethinking of the very notion of the gene.

The real question, however, is whether the changes in the gene concept have any real impact on models and/or experiments.

Biological Function

What exactly do we mean by a 'biological function' (as opposed to process)? Usually, or at least often, we use 'teleological' language (the purpose of the heart is to pump blood), while we keep reassuring ourselves that of course we don't mean it that way. So: is there a meaningful way of speaking about function in an evolutionary context? There should be, I think, since otherwise it seems hard to speak about phenotypes and functional modules. And is this just a philosophical question (or one of language purism), or would a less 'teleological' notion of biological function also have a practical impact, e.g. on designing experiments?

Concurrent Processes

Life cycles are coherent collections of concurrent processes, not merely sequences of (simple) events. Many concurrent events at the molecular level are required for cells to live, many concurrent cellular events are required for tissues and organs to function, and many concurrent organ system events are required for coordinated organism function. Can formal models and methods of concurrent processing in computer science be applied to understand biological processes better than models of sequential processes borrowed from physics (PDEs) for the transformation of quantities? Are there more and less instructive approaches available for this kind of problem: actor model, Petri nets, process calculi?
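To make one of the candidate formalisms concrete, here is a minimal sketch of a place/transition Petri net in Python. The toy net (places "gene", "mRNA", "protein" and the three transitions) is a hypothetical illustration invented for this page, not a model proposed by any participant. It is only meant to show how such a formalism separates events that can happen concurrently from events that compete for the same resource.

```python
from typing import Dict, NamedTuple

Marking = Dict[str, int]  # number of tokens currently held by each place


class Transition(NamedTuple):
    name: str
    consumes: Dict[str, int]  # tokens removed from each input place
    produces: Dict[str, int]  # tokens added to each output place


def enabled(marking: Marking, t: Transition) -> bool:
    """A transition is enabled if every input place holds enough tokens."""
    return all(marking.get(p, 0) >= n for p, n in t.consumes.items())


def fire(marking: Marking, t: Transition) -> Marking:
    """Return the marking reached by firing t; the input marking is not modified."""
    if not enabled(marking, t):
        raise ValueError(f"transition {t.name} is not enabled")
    new = dict(marking)
    for p, n in t.consumes.items():
        new[p] = new.get(p, 0) - n
    for p, n in t.produces.items():
        new[p] = new.get(p, 0) + n
    return new


# Hypothetical toy net: the gene and the mRNA act as catalysts (consumed and
# returned), while degradation removes the mRNA token for good.
transcribe = Transition("transcribe", {"gene": 1}, {"gene": 1, "mRNA": 1})
translate = Transition("translate", {"mRNA": 1}, {"mRNA": 1, "protein": 1})
degrade = Transition("degrade_mRNA", {"mRNA": 1}, {})

marking = {"gene": 1, "mRNA": 1, "protein": 0}

# Several events are possible at once; no sequence is imposed by the model.
enabled_now = [t.name for t in (transcribe, translate, degrade) if enabled(marking, t)]

# Concurrency: transcription and translation commute (same result in either order).
order_1 = fire(fire(marking, transcribe), translate)
order_2 = fire(fire(marking, translate), transcribe)

# Conflict: once the only mRNA token is degraded, translation is no longer possible.
after_degradation = fire(marking, degrade)
```

In this sketch the distinction between concurrency (order does not matter) and conflict (one event disables another) is read off the structure of the net rather than from a differential equation, which is the kind of question the Petri net, actor and process-calculus approaches are designed to express.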

Units as invariants to biological processes

Although evolutionary biology and philosophies of its units focus on change and transformation of biological entities, evolutionary theory requires unchanging units as a conceptual basis for the varying populations that undergo evolutionary (and other sorts of biological) change. The invariance of these units has been conceived in a variety of ways, but primarily as units of structure (e.g. unchanging gene sequences or phenotypic states) or as units of function (e.g. unchanging status as replicators or interactors in the determination of fitness value). But life is a process, bound to matter and since it is a dissipative process, with matter turning over in the course of developing and sustaining stability of form, it may make sense to consider the relevant invariants for theoretical biology to be process invariants rather than (or in addition to) structural or functional invariants. Consider the conceptual problem of heritability. Transmission of structure unchanged is not a sufficient condition for heritability. Not only must A give rise to more A, but when A mutates to A’, it must also be the case that A’ gives rise to more A’, not more A. Structural stability may capture a feature of “inheritance” in the traditional meaning of property inheritance, but it fails to capture the conditions necessary for a process to transmit the relevant capacity of heritability.
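The heritability condition above can be stated as a small check. The following Python sketch is a deliberately crude illustration under assumed names: the two reproduction rules, copy_parent and always_a, are hypothetical and stand for "a process that transmits variation" and "a process that merely regenerates a stable structure".

```python
from typing import Callable, Iterable

Reproduce = Callable[[str], str]  # maps a parent type to its offspring type


def copy_parent(parent: str) -> str:
    """Hypothetical rule: the offspring takes whatever type the parent has."""
    return parent


def always_a(parent: str) -> str:
    """Hypothetical rule: the process always regenerates the same structure A."""
    return "A"


def transmits_variation(reproduce: Reproduce, variants: Iterable[str]) -> bool:
    """True if every variant gives rise to more of itself (A -> A and A' -> A')."""
    return all(reproduce(v) == v for v in variants)


variants = ("A", "A'")

# Both rules pass A on unchanged, so both show "inheritance" of structure.
structure_stable = (copy_parent("A") == "A", always_a("A") == "A")

# Only the copying rule makes the mutant A' give rise to more A'.
heritable_under_copying = transmits_variation(copy_parent, variants)
heritable_under_regeneration = transmits_variation(always_a, variants)
```

Both rules reproduce A faithfully, but only the first satisfies the condition that A' gives rise to more A'. The difference lies in a property of the process, not of the transmitted structure, which is the point of treating invariants as process invariants.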

Relations among theories of quantity and quality

Population biology borrows methods and ideas from physics, which is a science of quantity and transformation of quantities. Most of the rest of biology, like parts of chemistry, concerns qualities and the interaction of objects. The relation between physical theories of quantity, chemical theories of objects, and biological theories of whatever-it-is-that-distinguishes-biology-from-chemistry, seems to me a pressing problem for putting biological theory on a sound footing, though I cannot give an argument for this.

The nature of theory in biology

The previous three items raise questions about the nature of theory in biology, whether it is like or unlike theory in other scientific domains, and whether new kinds of theory are needed to put biology on a proper theoretical basis of its own. These are old and philosophical questions. I think that these are important questions for a theoretical biology to answer, but they must be answered in a way that helps empirical biologists do their work better or the exercise will be pointless.


From Santa Fe Institute Events Wiki