Complex Systems Summer School 2015-Projects & Working Groups

From Santa Fe Institute Events Wiki



The 2014-15 Ebola virus disease (EVD) outbreak in West Africa presented both unique opportunities and unique challenges to the epidemiological modeling community. For the first time during an emerging infectious disease outbreak, high-resolution data from a variety of sources were made available to the academic community, and many public health decision makers genuinely engaged with mathematical and computational modelers. However, the popular and scientific press were highly critical of most models' ability to project the outbreak's course. The following key open questions seem ripe for investigation through a complex adaptive systems lens:
1) Which features of EVD transmission are most problematic for reliable, robust forecasting: changing behavior, interventions, viral evolution, complex social networks, etc.?
2) Can we use digital data to improve forecasts or inform model selection, and if so, how?
3) Can one quantify the value of additional information in real-time?
Contact: Samuel Scarpino, SFI Omidyar Fellow, Santa Fe Institute -

Marie-Pierre Hasne
Chris Verzijl
Junming Huang
'Sola Omoju
Christine Harvey

Homeostatic Dynamics and the Optimality of Behavior

The survival of all organisms is predicated on occupying a small subspace of internal states, the long-run regulation of which is contingent on behaviour. Currently, most models of reinforcement learning and decision-making assume that behaviour is optimal insofar as it maximises reward acquisition, i.e. maximises the expectation value of reward. An often unchallenged assumption of this approach is that the target variable to be maximised is an ergodic observable, that is, one whose time average converges to its expectation value. Recent work by Peters and co-workers on dynamics in decision making [1] [2] shows that the underlying dynamics of a process should govern the objective function that is optimised: the expectation value for purely additive dynamics and the time average for purely multiplicative dynamics.
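To make the additive/multiplicative distinction concrete, the sketch below uses an illustrative multiplicative coin-toss gamble (the factors 1.5 and 0.6 and the 50/50 odds are assumptions chosen for illustration, not data from this project). The expectation value grows by 5% per round, yet the time-average growth rate E[ln r] is about -0.053 per round, so almost every individual trajectory decays.

```python
import math
import random

# Illustrative multiplicative gamble (assumed values): each round, wealth is
# multiplied by UP with probability P_UP and by DOWN otherwise.
UP, DOWN, P_UP = 1.5, 0.6, 0.5


def expected_growth_factor():
    """Ensemble (expectation-value) growth factor per round: E[r]."""
    return P_UP * UP + (1 - P_UP) * DOWN


def time_average_growth_rate():
    """Time-average exponential growth rate per round: E[ln r]."""
    return P_UP * math.log(UP) + (1 - P_UP) * math.log(DOWN)


def simulate_growth_rate(rounds, seed=0):
    """Realised growth rate of one long trajectory, (1/T) ln(x_T / x_0)."""
    rng = random.Random(seed)
    log_wealth = 0.0
    for _ in range(rounds):
        log_wealth += math.log(UP if rng.random() < P_UP else DOWN)
    return log_wealth / rounds


if __name__ == "__main__":
    print(expected_growth_factor())       # > 1: the expectation value grows
    print(time_average_growth_rate())     # < 0: a typical trajectory shrinks
    print(simulate_growth_rate(100_000))  # close to the time-average rate
```

Maximising the expectation value would accept this gamble; maximising the time-average growth rate would reject it, which is the divergence the project proposes to look for in homeostatic variables.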

In this project I will ask two questions: First, what are the characteristic dynamics of homeostatic variables? Second, how do these dynamics constrain the objective function that biological agents must maximise? I will investigate the degree to which such dynamics are ergodic. Non-ergodic processes are likely common in homeostatic systems. For instance, reaction rates of biochemical networks typically change by a constant multiplicative factor for every stepwise change in core temperature. Any biological agent engaging in behavioural thermoregulation of such processes therefore faces multiplicative dynamics and, according to this framework, should maximise the time-average growth rate rather than the expectation value. I will survey the extant literature on homeostatic systems, looking for cases in which the underlying dynamics are clearly characterised and for which there is a plausible and unambiguous path by which such a system can be behaviourally regulated.
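The "constant multiplicative factor per temperature step" is conventionally summarised by the Q10 temperature coefficient, R(T) = R_ref * Q10^((T - T_ref)/10). A minimal sketch, with illustrative values (Q10 = 2 and a 37 °C reference are assumptions, not measurements):

```python
def rate_at(temp_c, ref_rate, ref_temp_c, q10):
    """Reaction rate under the Q10 temperature-coefficient rule.

    R(T) = R_ref * q10 ** ((T - T_ref) / 10): every 10-degree step
    multiplies the rate by the same factor q10, so additive changes in
    temperature become multiplicative changes in rate."""
    return ref_rate * q10 ** ((temp_c - ref_temp_c) / 10.0)


if __name__ == "__main__":
    # Illustrative values only: reference rate 1.0 at 37 C, Q10 = 2.
    for t in (27, 37, 47, 57):
        print(t, rate_at(t, 1.0, 37, 2.0))
```

Because equal temperature steps produce equal rate ratios, fluctuations in temperature translate into multiplicative fluctuations in rate.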

As a trained economist and neuroscientist working with computational models of decision making under evolutionary constraints, I am especially interested in the dynamics of homeostatic processes that are optimised via overt regulatory behaviour (such as temperature, hydration and energy regulation), so that experimentally testable predictions can be specified.

Interested:

[1] O. Peters and M. Gell-Mann, “Evaluating gambles using dynamics,” 2014.
[2] O. Peters, “The time resolution of the St Petersburg paradox,” Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences, vol. 369, no. 1956, pp. 4913–4931, Oct. 2011.

Decision support/network analysis of a complex socio-ecosystem in rural Zimbabwe

Many communities in Africa have been surprisingly resilient in the face of a host of devastating challenges. The people of Mazvihwa Communal Area in Zimbabwe have lived through more than a century of rapid change through the colonial, liberation war, and post-colonial periods. There have been dramatic changes in public health (ranging from better control of communicable diseases after World War II, to child vaccination programs after independence, to the AIDS pandemic, especially from the mid-1990s to the end of the 2000s) and in land access and use (with repeated removals, resistance, and returns of communities to land designated for white settlement). These shifts in population distribution have interacted with rapid natural population increase (especially in the period 1950-1990) driven by high fertility and declining mortality, followed in recent decades by declining fertility and high AIDS-related mortality. Differences in religious beliefs mean that these changes are uneven across households and areas. Meanwhile, the country's economy has gone through a series of long boom-and-bust cycles, and during the 2000s it experienced inflation reaching a billion billion billion per cent.

The Muonde Trust is a Zimbabwean non-governmental organization established to support the community in Mazvihwa in continuing to develop and deploy bottom-up solutions to these challenges. Mazvihwa has a semi-arid subtropical climate with remnant woodlands and a combination of largely subsistence agriculture and livestock production. From the point of view of most of the people in Mazvihwa, and as taken up by the community network of the Muonde Trust, the “sustainability” of their area now requires a series of linked changes in land use and investments in natural capital.

Data and Questions
The data we have on this community and ecosystem come from an ongoing community-based participatory research project begun in the 1980s and since continued by the Muonde Trust. They include robust quantitative data on human demography, health, nutrition, agricultural practices, rainfall, land use choices, woodland dynamics, household assets, and land tenure. Our goal at SFI is to develop theoretical or simulation studies that would help us better understand the resilience and sustainability of this system, and that could eventually be informed by the data. Questions we might address using complex systems methods include:

1) How do individuals and resources flow through households and communities? (Empirical data show that the composition of households changes rapidly, even though most analyses of these societies tend to assume that households are static and natural units of analysis.) Individuals clearly strategize in various ways through households as well as within other kin, religious and clan groups; at the same time, households also have emergent properties. In contexts of rapidly shifting demography and changing resource access, can network analysis illuminate these complexities?

2) How can the community as a whole best allocate its land to agriculture, pasture, and woodland when these components interact and feed back on each other? One of the main land-use decisions facing the community is the trade-off between agricultural cultivation (which requires fencing to keep out livestock as well as water harvesting techniques) and retaining woodland areas that have cultural value as well as providing grazing space and forage for livestock (and many other economic benefits). This relationship is complex, with livestock providing benefits to agriculture (manure for fertilizer and draft power for cultivation) and vice versa (well-tended fields provide considerable feed for livestock). The community derives benefits from all these land uses, including food for subsistence from agriculture, meat and milk from livestock, and cultural values and a wide variety of benefits from woodland (including fuelwood, construction materials, a variety of foods and medicines, and improved soil characteristics). In addition, community members may sell livestock, and also use them for bridewealth and for compensation in the case of some deaths. How can this system be represented and manipulated in a model to identify optimal strategies for the well-being of the system?
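For question 1, one starting point is to treat households as nodes and people moving between them as weighted, directed edges. A minimal sketch, assuming membership records of the form (person, household, survey year); this schema and the toy records are hypothetical, not the Muonde Trust data format:

```python
from collections import Counter

# Hypothetical membership records: (person_id, household_id, survey_year).
# The layout is an illustrative assumption, not the actual data schema.
records = [
    ("p1", "hA", 2000), ("p1", "hA", 2005), ("p1", "hB", 2010),
    ("p2", "hA", 2000), ("p2", "hC", 2005), ("p2", "hC", 2010),
    ("p3", "hB", 2000), ("p3", "hB", 2005), ("p3", "hA", 2010),
]


def household_flows(records):
    """Count moves between households in consecutive surveys.

    Returns a Counter mapping (from_household, to_household) to the number
    of people who moved: a weighted, directed edge list that network tools
    can analyse further (centrality, communities, change over time)."""
    by_person = {}
    for person, household, year in records:
        by_person.setdefault(person, []).append((year, household))
    flows = Counter()
    for history in by_person.values():
        history.sort()  # chronological order of survey observations
        for (_, src), (_, dst) in zip(history, history[1:]):
            if src != dst:
                flows[(src, dst)] += 1
    return flows


if __name__ == "__main__":
    for (src, dst), n in sorted(household_flows(records).items()):
        print(f"{src} -> {dst}: {n}")
```

Resource flows (e.g. livestock transfers for bridewealth) could be added as a second edge type on the same nodes, giving a multilayer network.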

Possible methods
Our methodology is open to what we learn during the summer school, but some ideas include: network analysis to study how people and resources connect and flow through households and other components of the system; an analytical mathematical model of the interacting components of the system, e.g. coupled differential equations; and cellular automata that represent the land-use category of each part of a farmer's land and could underlie a decision support tool.
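As a hedged illustration of the coupled-differential-equations option, here is a toy woodland (W) and livestock (L) consumer-resource model integrated with forward Euler. The functional forms (logistic woodland regrowth, linear browsing) and all parameter values are assumptions for illustration only; a decision-support version would add crops, manure and land-allocation choices as further coupled variables.

```python
def simulate(w0=0.9, l0=0.1, r=0.5, k=1.0, a=1.0, b=0.5, m=0.2,
             dt=0.01, t_end=500.0):
    """Toy woodland/livestock model integrated with forward Euler.

    dW/dt = r W (1 - W/K) - a W L   (logistic regrowth minus browsing)
    dL/dt = b a W L - m L           (herd growth from forage minus offtake)
    All parameter values are illustrative assumptions, not calibrated."""
    w, l = w0, l0
    for _ in range(int(t_end / dt)):
        dw = r * w * (1 - w / k) - a * w * l
        dl = b * a * w * l - m * l
        w, l = w + dt * dw, l + dt * dl
    return w, l


def equilibrium(r=0.5, k=1.0, a=1.0, b=0.5, m=0.2):
    """Interior fixed point: W* = m / (a b), L* = r (1 - W*/K) / a."""
    w_star = m / (a * b)
    return w_star, r * (1 - w_star / k) / a


if __name__ == "__main__":
    print(simulate())      # long-run state of the toy system
    print(equilibrium())   # analytical fixed point for comparison
```

Comparing simulated trajectories with the analytical fixed point is a quick check that the numerical integration behaves before the model is extended.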


Interested: 'Sola Omoju

Mapping Complexity/Human Knowledge as a Complex Adaptive System

Ants leave pheromone trail patterns of which they are aware only locally. They lack the cognitive faculties to step back, look at the trails, and grasp the ant-trail network as a whole. The artifacts they leave behind are physical entities that provide aggregate feedback to the colony as a whole, which in turn drives the colony's evolution as a complex adaptive system (CAS). Humans, in contrast, do have the requisite cognitive abilities. The "pheromone trails" we leave behind are knowledge trails encoded in symbolic knowledge artifacts. Unlike the physical artifacts that ants leave behind, our knowledge artifacts are far more flexible and potent, both at the aggregate and at the individual level. But like the ants, until recently we did not have the means to step back and map these knowledge "pheromone trails" to obtain the big picture and its global and local dynamics. The burgeoning field of scientometrics is now making available visualization tools that help us map and study the evolutionary dynamics of knowledge network structures.
Data and Questions
The goals of this project include:

  1. Extract the terms from approximately 1600 working papers published by SFI
  2. Map the intra/inter conceptual network structures
  3. Study the evolution of these structures across time
  4. Highlight the closing of gaps at knowledge reverse salients (if any)
  5. Capture any of the network patterns that repeat
  6. Study the diffusion of concepts across the network
  7. Provide visualization tools for navigating the complexity corpus, etc.
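Goals 2 and 3 (mapping the conceptual network and following its evolution) could start from a simple term co-occurrence network built per time period. A minimal sketch, assuming term extraction (goal 1) has already produced a set of terms per paper; the input format, the 10-year period and the toy papers are hypothetical:

```python
from collections import Counter
from itertools import combinations

# Hypothetical input: (publication year, set of extracted terms) per paper.
papers = [
    (1995, {"fitness landscape", "evolution", "networks"}),
    (1995, {"evolution", "scaling"}),
    (2005, {"networks", "scaling", "evolution"}),
    (2005, {"networks", "scaling"}),
]


def cooccurrence_by_period(papers, period=10):
    """Map each period start year to a Counter of term-pair co-occurrences.

    Two terms are linked when they appear in the same paper; the edge
    weight is the number of such papers. Comparing the networks of
    successive periods gives a first view of how the concept network
    evolves and how concepts diffuse."""
    networks = {}
    for year, terms in papers:
        start = year - year % period
        edges = networks.setdefault(start, Counter())
        for pair in combinations(sorted(terms), 2):
            edges[pair] += 1
    return networks


if __name__ == "__main__":
    for start, edges in sorted(cooccurrence_by_period(papers).items()):
        print(start, edges.most_common(3))
```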

Possible methods
Latent Semantic Analysis (LSA) and Latent Dirichlet Allocation (LDA)
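As a minimal sketch of the LSA idea: build a term-document matrix, keep its top-k singular directions, and compare documents by cosine similarity in that reduced space. The toy documents and k = 2 are assumptions; in practice TF-IDF weighting is usually applied before the SVD, and LDA (a different, probabilistic topic model) would typically come from an existing implementation such as scikit-learn's `LatentDirichletAllocation`.

```python
import numpy as np


def lsa_similarity(docs, k=2):
    """Cosine similarity between documents in a k-dimensional LSA space.

    Builds a raw term-document count matrix, takes its SVD and compares
    documents by their coordinates S_k V_k^T (truncated to k dimensions)."""
    vocab = sorted({w for d in docs for w in d.split()})
    row = {w: i for i, w in enumerate(vocab)}
    counts = np.zeros((len(vocab), len(docs)))
    for j, doc in enumerate(docs):
        for w in doc.split():
            counts[row[w], j] += 1
    _, s, vt = np.linalg.svd(counts, full_matrices=False)
    doc_vecs = (s[:k, None] * vt[:k]).T  # one row per document
    unit = doc_vecs / np.linalg.norm(doc_vecs, axis=1, keepdims=True)
    return unit @ unit.T


if __name__ == "__main__":
    docs = [
        "network degree distribution network",
        "network clustering degree",
        "ergodic time average growth",
        "time average ergodic dynamics",
    ]
    print(np.round(lsa_similarity(docs), 2))
```

Documents sharing vocabulary end up close together in the reduced space; on the working-paper corpus, the same similarity matrix could feed the concept-network and visualization goals above.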

  1. SFI Working Papers:
  2. Atlas-Science-Visualizing:
  3. Atlas-Science-Visualizing WebSite:
  4. Mapping-Scientific-Frontiers-Knowledge-Visualization:
  5. Katy Börner presents at Science of Science:
  6. Scholarly Data, Network Science, and (Google) Maps:
  7. LSA Video Lect:
  8. What is LSA:
  9. LSA Wiki:
  10. LDA:


  • John Thomas (
  • Laura Condon (
  • Haitao Shang (
  • Sharon Greenblum (
  • Christopher Verzijl (
  • Nilton Cardoso (
  • Glenn Magerman (
  • Emilia Wysocka (
  • Laurence Brandenberger (
  • Matthew Histen (
  • María Pereda (

Navigating Music, Brain and The Edge of Chaos

Brain science suggests that the brain lives "on the edge of chaos", exhibiting "self-organized criticality" that is precariously balanced between normalcy and madness. Over the course of history, humans have used various agents and activities to shape, influence and control this living chaos, ranging from substances such as caffeine, sugar and drugs to activities such as the arts (including music), social discourse and therapy, and meditation. Of these, music plays a distinct role in shaping our moods, helping us transition between different mental states and maintain them for extended periods of time. Clearly we have been using music to help us control and shape this internal chaos. But until recently, quantitative instrumentation of this massively complex system, comprising close to 100 billion neurons networked into a 1,000-trillion-synapse edifice, has not been available to the general public. Affordable wearable EEG headsets are now on the market, making a large-scale, crowdsourced quantitative study of the influence of music on brain dynamics feasible. To come to terms with the complexity of our 1,000-trillion-synapse edifice, we need to gather data on a vast scale. The proposed research is a proof-of-concept, exploratory foray into making this happen.

Data and Questions
The goals of this project include:

  1. Evaluate and purchase a wearable EEG
  2. Set up the instrumentation for data capture
  3. Set up the experimentation/data-capture plan
  4. Recruit Subjects
  5. Perform Data Capture
  6. Analyze Results
  7. Propose pathways to take this to the market by embedding it as an app

Possible methods and references:

  1. Hacking Your Brain Waves: Wearable Meditation Headsets:
  2. Measure your brainwaves and modify your mind:
  3. This wearable device reads your brain waves. Is there a market for it?:
  4. Your-brain-is-on-the-brink-of-chaos:
  5. Two Decades of Search for Chaos in Brain:
  6. How You Are Who You Are--in Chaos Theory:
  7. Diana Dabby Links:
  8. Diana Dabby Links:
  9. Liz Bradley/Diana Dabby Links:
  10. Music and the Brain:
  11. Why Music Moves Us:
  12. Measuring musical expressivity:
  13. The World In Six Songs:
  14. This-Your-Brain-Music-Obsession:
  15. World-Six-Songs:
  16. Computer Music and the Importance of Fractals, Chaos, and Complexity Theory:
  17. The_Complexity_of_Songs:
  18. Stefan-Koelsch Papers:
  19. Grammar Based Music Composition:


  • Sara Lumbrera (
  • Ilaria b (
  • Braun, Urs (
  • Emilia Wysocka (
  • J Bruineberg (
  • William Kurtis Chang (
  • John Thomas (
  • Christopher Verzijl (
  • Glenn Magerman (
  • Sahil Garg (
  • Daniel Hedblom (
  • Christine Harvey (

Powerlaw fitting and alternative distributions - Theory/statistics

Clauset, Shalizi and Newman (2009) propose a maximum likelihood method to estimate the powerlaw exponent of a variable of interest. This is a great improvement on earlier methods, such as OLS regression on a log-log plot, that had dominated the literature until then. However, one can fit a powerlaw to any dataset, and the most we can say is that our observations are consistent with the hypothesis that x is drawn from a powerlaw distribution. One easily implementable way to compare the powerlaw fit with alternative fits is a likelihood ratio test between the two models.
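A minimal sketch of the Clauset, Shalizi and Newman (2009) recipe for continuous data: for a candidate x_min the exponent has the closed-form MLE alpha_hat = 1 + n / sum(ln(x_i / x_min)), and x_min is chosen to minimise the Kolmogorov-Smirnov distance between the empirical and fitted tails. It omits their bootstrap goodness-of-fit test, the discrete-data case and the likelihood ratio comparison (the third-party `powerlaw` Python package implements a fuller procedure). Function names and the `min_tail` guard are illustrative choices.

```python
import numpy as np


def fit_alpha(x, xmin):
    """Continuous powerlaw MLE for x >= xmin: 1 + n / sum(ln(x_i / xmin))."""
    tail = x[x >= xmin]
    return 1.0 + tail.size / np.sum(np.log(tail / xmin))


def ks_distance(x, xmin, alpha):
    """Max gap between the tail's empirical CDF and the fitted powerlaw CDF."""
    tail = np.sort(x[x >= xmin])
    n = tail.size
    model_cdf = 1.0 - (tail / xmin) ** (1.0 - alpha)
    emp_cdf = np.arange(1, n + 1) / n
    return np.max(np.abs(emp_cdf - model_cdf))


def fit_powerlaw(x, min_tail=50):
    """Choose xmin among observed values by minimising the KS distance.

    Returns (xmin, alpha, ks_distance). Candidates leaving fewer than
    min_tail observations in the tail are skipped."""
    x = np.asarray(x, dtype=float)
    best = None
    for xmin in np.unique(x):  # ascending, so the tail only shrinks
        if np.count_nonzero(x >= xmin) < min_tail:
            break
        alpha = fit_alpha(x, xmin)
        d = ks_distance(x, xmin, alpha)
        if best is None or d < best[2]:
            best = (xmin, alpha, d)
    return best


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # Synthetic powerlaw sample with alpha = 2.5, xmin = 1 (inverse-CDF method).
    x = (1.0 - rng.random(2000)) ** (-1.0 / 1.5)
    print(fit_powerlaw(x, min_tail=200))
```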

One particular point of discussion is the distinction between a powerlaw (PL) and a log-normal (LN) fit. For a lively debate between the two fits for city sizes and Zipf's law, see Eeckhout (2004, 2009) and Levy (2009); the discussion has now settled on city sizes following a log-normal distribution rather than a powerlaw. The firm size distribution has seen a similar debate: Simon and Bonini, 1958; Ijiri and Simon, 1977; Stanley et al., 1995; Sutton, 1997; Axtell, 2001; Okuyama et al., 1999; Cabral and Mata, 2003; Gaffeo et al., 2003; Aoyama et al., 2004; Fujiwara et al., 2004a,b; Kaizoji et al., 2006; Takayasu et al., 2008; Duchin and Levy, 2008; Schwarzkopf and Farmer, 2008, ...

This distinction matters for several reasons:

  • PL and LN arise from very similar generative models, and small differences in initial conditions can lead to very different outcomes. A PL requires a lower cutoff x_min, below which units of observation cannot exist (e.g. minimum city size, firm size, word length, ...); an LN has no minimum size.
  • A PL and an LN that look similar in the data can still differ fundamentally: a PL has infinite variance (for a density exponent α ≤ 3), whereas an LN has large but finite variance.
  • Shock propagation: if the distribution is a powerlaw with infinite variance, unit-level shocks can be large enough to generate aggregate fluctuations, whereas such shocks wash out quickly when the distribution is log-normal (Gabaix, 1999; Gabaix, 2009; Acemoglu et al., 2012, ...).
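For reference, the two candidate densities in the standard continuous parameterisation (as used by Clauset, Shalizi and Newman, 2009) and their second moments are given below. The PL second moment diverges whenever α ≤ 3, while the LN variance is always finite, however large.

```latex
p_{\mathrm{PL}}(x) = \frac{\alpha - 1}{x_{\min}} \left( \frac{x}{x_{\min}} \right)^{-\alpha},
  \qquad x \ge x_{\min} > 0, \quad \alpha > 1

p_{\mathrm{LN}}(x) = \frac{1}{x \sigma \sqrt{2\pi}}
  \exp\!\left( -\frac{(\ln x - \mu)^2}{2\sigma^2} \right), \qquad x > 0

\mathbb{E}_{\mathrm{PL}}\left[x^2\right] = \infty \ \text{ for } \alpha \le 3, \qquad
\operatorname{Var}_{\mathrm{LN}}[x] = \left( e^{\sigma^2} - 1 \right) e^{2\mu + \sigma^2} < \infty
```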

I have encountered some issues that I would like to explore further:

  1. The distinction between log-normal and powerlaw in the data is very sensitive to data truncation: in the discussions above, researchers use slightly different datasets, covering more or less of the population at hand. Left-truncation (i.e. observations missing from the dataset because they are too small to be reported) can strongly drive the outcome of the fit, even when the x_min cutoff is endogenized. I have data on the universe of Belgian firms, much more complete than e.g. US Census data, on which I have run some preliminary tests. The question is then how to formalise this distinction, and what theoretical and practical caveats to look out for when applying this method.
  2. MLE fitting seems to be sensitive to the choice of units as well: rescaling a variable by a factor of 1,000, 1,000,000, etc. seems to influence the endogenous x_min choice and hence the estimated parameter. This reminds me of some work on scale invariance in negative binomial estimators. What is going on here?
  3. Can we set up a model that generalises both? I have been looking at Lévy stable distributions, but have not done anything with them yet.
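One way to probe issue 2: the continuous MLE depends on the data only through the ratios x_i / x_min, so it should be exactly invariant to a change of units provided x_min moves with the data. The sketch below checks this numerically on synthetic data (alpha = 2.5 and x_min = 2 are illustrative assumptions). If a real fit does change with units, a plausible suspect, to be checked rather than assumed, is a step that does not rescale: a fixed grid or absolute bound for candidate x_min values, rounding or reporting thresholds in the raw data, or a discrete estimator applied to integer-valued data.

```python
import numpy as np


def fit_alpha(x, xmin):
    """Continuous powerlaw MLE: alpha_hat = 1 + n / sum(ln(x_i / xmin))."""
    tail = x[x >= xmin]
    return 1.0 + tail.size / np.sum(np.log(tail / xmin))


def alpha_after_rescaling(x, xmin, factor):
    """Refit after a change of units (x -> factor * x), moving xmin too."""
    return fit_alpha(factor * x, factor * xmin)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic powerlaw sample, alpha = 2.5 (illustrative).
    x = (1.0 - rng.random(5000)) ** (-1.0 / 1.5)
    for factor in (1.0, 1e3, 1e6):
        print(factor, alpha_after_rescaling(x, 2.0, factor))
```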


App design for interaction registration

I would like to see a simple smartphone app that can track connections being made between people at events. This would make it possible to map the evolution of a network at, e.g., a networking meeting, SFI 2015, or social events. I know some people at MIT have been working on ID badges for nurses and doctors to track interactions in a hospital, and there are some business apps that offer a plethora of features for enjoying a networking event (e.g. the Yapp app). However, it would be nice to have a simple app that just registers a link between two people when their phones are close enough for a certain interval of time. It might also record part of the conversation to provide edge information. Unfortunately, I'm not a whiz kid and would need help from an app programmer to work this out. If it is feasible, I think SFI CSSS 2015 would be a great test case!
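The core rule described above (register a link when two phones stay close for a minimum interval) could be sketched as follows, assuming the app already logs periodic proximity scans. The log format, the 30-second scan interval and the 2-minute threshold are hypothetical choices; signal-strength filtering, battery use and privacy handling are left out.

```python
from collections import defaultdict

# Hypothetical proximity log: (timestamp_seconds, phone_a, phone_b), one entry
# per scan in which two phones detected each other. Thresholds are assumptions.
SCAN_INTERVAL = 30  # seconds between scans
MIN_CONTACT = 120   # seconds of continuous proximity needed to register a link


def contacts(log, scan_interval=SCAN_INTERVAL, min_contact=MIN_CONTACT):
    """Return {(a, b): number of qualifying contact episodes}.

    Sightings of a pair no more than one scan interval apart are merged
    into one episode; an episode lasting at least min_contact seconds
    registers (or reinforces) an edge between the two people."""
    times = defaultdict(list)
    for t, a, b in log:
        times[tuple(sorted((a, b)))].append(t)  # undirected pair key
    edges = {}
    for pair, ts in times.items():
        ts.sort()
        start = prev = ts[0]
        episodes = 0
        for t in ts[1:] + [None]:  # None closes the final episode
            if t is not None and t - prev <= scan_interval:
                prev = t
                continue
            if prev - start >= min_contact:
                episodes += 1
            if t is not None:
                start = prev = t
        if episodes:
            edges[pair] = episodes
    return edges


if __name__ == "__main__":
    log = [(0, "ann", "bob"), (30, "bob", "ann"), (60, "ann", "bob"),
           (90, "ann", "bob"), (120, "ann", "bob"),
           (1000, "ann", "cy"), (1030, "ann", "cy")]
    print(contacts(log))
```

Counting repeated episodes, rather than a single yes/no link, gives edge weights that could show how the network at an event evolves over time.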



  • Laurence Brandenberger (

    Organ Transplant Analysis

    Over 120,000 people in the United States are currently on the waiting list for an organ transplant. This waiting list is associated with over 6,000 deaths a year among people waiting for a transplant, and with tens of billions of dollars in government spending. I have access to the following data sets:

    • All transplants performed in the US from October 1987 to June 2014 (including follow-up data)
    • All living and deceased organ donors in the same time period (with follow-up data on living donors)
    • Waiting list data for everyone who signed up for the list
    • 2012 National Survey on Attitudes and Behaviors on Organ Donation
    • Social media data relating to organ donation since 2008

    I am open to ideas and suggestions for the topic; there are many interesting questions to investigate, including cultural, racial and gender differences in organ donation. I have several preliminary reports and exploratory analyses of differences among donors.

    If interested, please list your name below.


  • Matthew Histen (

    Preliminary Discussion Meetings

    Exchange-Company Networks (

    11:30am in the coffee shop

    Decision support/network analysis of a complex socio-ecosystem in rural Zimbabwe (Melissa -

    10:45am in the senior common room (the room behind our lecture hall)

    Network Analysis of Arxiv (Daniel -

    10:45am in the lecture hall

    Also interested in combining multiplex/multilayer networks

    Ebola virus disease spread (Junming -

    11:00am in the coffee shop

    Scaling effects in bodies, communities, ecosystems (Cobain -

    11:30am in the coffee shop

    Also tying in prehistoric hunting populations

    Dynamics of homicide (Matthew Ingram -

    Integrating temporal, spatial, and multi-level concepts

    1:30pm in the coffee shop

    Interested in the discussion on this project: Sola Omoju

    Organ Transplant (Christine -

    2pm in the coffee shop

    Interested: Sahil Garg

    Multi-dimensional social networks in the evolution, development and resilience of informal economies

    2:00pm in the lecture hall

    Eloy & Carolina -,

    City resilience // Evolutionary stable states in trees (Richard -

    3pm in the coffee shop

    Modeling brain diseases (or cancerous bio pathways) (Sahil -

    Interested people can add a meeting time that suits them here.
    No meeting time indicated
    One potential meeting time: 3pm in the coffee shop?
    Information theoretic algorithms can also be explored for the problem.
    Discovering Structure in High-Dimensional Data Through Correlation Explanation.
    Maximally Informative Hierarchical Representations of High-Dimensional Data.


    • Emilia Wysocka (
    • Maggie Simon (

    Analysis of UK parliament speeches 1935-2014 (Stefano -

    No meeting time indicated

    Mapping Complexity/Human Knowledge as a Complex Adaptive System (

    2pm/Wed 6/10 in Conference Room (Tentative)

    Resource allocation trade-offs (Andre -

    Using evolutionary algorithms to investigate trade-offs in allocating goods among agents. This could potentially be done on a real dataset (if you have one) or on parametrized synthetic data. It could also be framed as strategic bargaining between agents, which would introduce some dynamics into the process.
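A minimal (1+1) evolutionary algorithm for allocating indivisible goods, as one hedged starting point. The objective (log Nash social welfare, i.e. the sum of log utilities, which trades off efficiency against equality), the additive utilities and the toy values are all assumptions; other welfare functions, or a multi-objective formulation, would expose the trade-offs differently.

```python
import math
import random


def nash_welfare(allocation, utilities):
    """Sum of log bundle values (log Nash social welfare).

    allocation[g] is the agent holding good g; utilities[i][g] is agent i's
    additive value for good g. An agent with zero bundle value gives -inf."""
    totals = [0.0] * len(utilities)
    for good, agent in enumerate(allocation):
        totals[agent] += utilities[agent][good]
    return sum(math.log(v) if v > 0 else -math.inf for v in totals)


def evolve(utilities, generations=2000, seed=0):
    """(1+1)-EA: reassign one random good, keep the child if not worse."""
    rng = random.Random(seed)
    n_agents, n_goods = len(utilities), len(utilities[0])
    best = [rng.randrange(n_agents) for _ in range(n_goods)]
    best_fit = nash_welfare(best, utilities)
    for _ in range(generations):
        child = best[:]
        child[rng.randrange(n_goods)] = rng.randrange(n_agents)
        fit = nash_welfare(child, utilities)
        if fit >= best_fit:  # accepting ties lets the search cross plateaus
            best, best_fit = child, fit
    return best, best_fit


if __name__ == "__main__":
    # Hypothetical utilities: 3 agents (rows) valuing 4 goods (columns).
    utilities = [[5, 1, 1, 1], [1, 5, 1, 1], [1, 1, 5, 5]]
    print(evolve(utilities))
```

Replacing the fixed utilities with strategic bids from the agents would be one way to add the bargaining dynamics mentioned above.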

    Email me or sign up if interested and we'll set up a meeting time.

    Interested: Sola Omoju, Christine Harvey (

    Navigating Music, Brain and The Edge of Chaos (

    10:45am/Wed 6/10 in Conference Room

    Interested: Sahil Garg

    Rule-based modeling for brain diseases (molecular level) (

    Rule-based modeling features:

    • biological systems as concurrent processes
    • dynamics of post-translational modifications
    • domain availability
    • competitive binding
    • causality and intrinsic structure
    • binding sites
    • interaction rules replace reaction equations
    • infinite number of reactions with a small and finite number of rules
    • reduction of parameter space
    • “don't care don't write” - adjustable rule contextualization
    • single reaction rule and parameters generalize classes of multiple rules
    • modular and extensible language
    • specification language & simulation/integration environment
    • static and causal analysis

    Kappa/KaSim & BioNetGen/NFsim -- specification language/network-free simulator
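To see why "interaction rules replace reaction equations" shrinks a model, the sketch below expands a single context-free rule ("phosphorylate site i regardless of the other sites") into the explicit species and reactions a conventional reaction-network model would need, for a hypothetical protein with n independent sites. This is plain Python enumeration for illustration, not Kappa or BioNetGen syntax: n rules stand in for 2^n species and n * 2^(n-1) reactions.

```python
from itertools import product


def expand_rule(n_sites):
    """Expand 'phosphorylate any free site' into explicit reactions.

    Each of the n_sites independent sites is unphosphorylated (0) or
    phosphorylated (1). Every site-state combination is a distinct species,
    and each reaction flips one site from 0 to 1 in one species.
    Returns (species, reactions)."""
    species = list(product((0, 1), repeat=n_sites))
    reactions = []
    for state in species:
        for site in range(n_sites):
            if state[site] == 0:
                product_state = state[:site] + (1,) + state[site + 1:]
                reactions.append((state, product_state))
    return species, reactions


if __name__ == "__main__":
    for n in (3, 10, 20):
        species, reactions = expand_rule(n)
        print(f"{n} rules -> {len(species)} species, {len(reactions)} reactions")
```

The rule-based simulators listed above avoid this enumeration altogether, which is what the "network-free" in NFsim refers to.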

    Meeting: if anybody is interested, contact me.

    Some refs:

    • Danos, V., & Laneve, C. (2004). Formal Molecular Biology. Theoretical Computer Science, 325.
    • Sorokina, O., Sorokin, A., & Armstrong, J. D. (2011). Towards a quantitative model of the post-synaptic proteome. Molecular BioSystems, 7(10), 2813–2823. doi:10.1039/c1mb05152k
    • Suderman, R., & Deeds, E. J. (2013). Machines vs. Ensembles: Effective MAPK Signaling through Heterogeneous Sets of Protein Complexes. PLoS Computational Biology, 9(10), e1003278. doi:10.1371/journal.pcbi.1003278
    • Chylek, L. A., Harris, L. A., Tung, C.-S., Faeder, J. R., Lopez, C. F., & Hlavacek, W. S. (2014). Rule-based modeling: a computational approach for studying biomolecular site dynamics in cell signaling systems. Wiley Interdisciplinary Reviews: Systems Biology and Medicine, 6(1), 13–36. doi:10.1002/wsbm.1245
    • Chylek, L. A., Stites, E. C., Posner, R. G., & Hlavacek, W. S. (2013). Innovations of the rule-based modeling approach. Systems Biology: Integrative Biology and Simulation Tools. Retrieved March 11, 2014, from
    • Danos, V. (2007). Rule-Based Modelling of Cellular Signalling. Lecture Notes in Computer Science (Vol. 4703). Berlin, Heidelberg: Springer Berlin Heidelberg. Retrieved from