Complex Systems Summer School 2015-Projects & Working Groups

From Santa Fe Institute Events Wiki

Complex Systems Summer School 2015


Reservations for Evans Science rm. 215

Click on the title link to reserve Evans Science rm. 215 for group usage.

Californian Drought Model

Problem definition: Water scarcity is an ongoing problem in California. Although corrective measures are taken in some years, periods of drought are depleting groundwater to unsustainable levels. Aims: A NetLogo simulation of how Californian agents are likely to evolve in response to drought. The ABM could be used to explore the potential impacts of different forms of regulation. Outcomes: Create an exploratory agent-based model based on real data, enabling the simulation of different methods of regulation, including feedbacks between the economic and hydrologic systems.

Extension 1: We will conduct a network analysis of Twitter data (hashtags #drought #California) to explore the social dynamics and information flows between stakeholders.

Extension 2: Phase-space reconstruction of groundwater level time series data, comparing the dynamics in different parts of the Central Valley.
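A minimal sketch of the reconstruction step in Extension 2, assuming a synthetic stand-in for the groundwater series (real Central Valley well records would replace it): a Takens-style time-delay embedding turns the scalar series into phase-space vectors.

```python
import numpy as np

def delay_embed(x, dim, tau):
    """Time-delay embedding: rows are (x[t], x[t+tau], ..., x[t+(dim-1)*tau])."""
    x = np.asarray(x)
    n = len(x) - (dim - 1) * tau
    return np.column_stack([x[i * tau : i * tau + n] for i in range(dim)])

# Synthetic stand-in for a groundwater level series: a slow decline
# plus an annual cycle plus noise (daily samples over four years).
t = np.arange(365 * 4)
levels = (-0.002 * t + np.sin(2 * np.pi * t / 365)
          + 0.05 * np.random.default_rng(0).standard_normal(len(t)))

embedded = delay_embed(levels, dim=3, tau=30)
print(embedded.shape)  # one phase-space point per row
```

Embedding dimension and lag would in practice be chosen by standard diagnostics (false nearest neighbours, mutual information); the values here are placeholders.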

City Resilience

Summary: This group aims to develop metrics of cities' resilience to various types of disaster, empirically verify this method using information from recent disasters, and compare resilience between a large number of global cities.
Contact: Richard Barnes (
Participants: Alex Tejedor, Laurence Brandenberger, Masa Haraguchi, Matthew Histen, Will Chang, Martina Steffen, Juan Carlos Castilla, Brent Schneeman
Wiki page: Resilience of Cities

Complex Adaptive Systems and the Narrative approach: transdisciplinary methodologies for Complexity Science

Causality in the narrative approach differs from causality in logico-scientific approaches. Building bridges among methodologies transfers knowledge, encourages important questions and shifts philosophical paradigms. Formal frameworks of C(A)S can, and arguably should, be complemented by the narrative. Narrative approaches in Complexity Science are an emerging trend in grant funding, and are well suited to processing documentary and speech-based data, revealing hidden meanings, and distinguishing causes from effects in overlapping systems of realms "at the edge" of the technological, social, economic, scientific, legal and other domains. Open themes include the role of time in coding, intuitively generated conditional rules, and computational paths. Combining C(A)S with the narrative combines quantitative and qualitative data and thinking in their original forms, without conversion and information loss. In short, it is about a synthesis of the sciences.

Questions: How can CS approaches (in particular CAS) be combined with narrative ones in a rule-based way? Where to start? What visualizations can be designed and used? Other questions at your discretion.

Some references: - Non-Equilibrium Social Science in ICT and Economics, CORDIS, EU:

- Combining Complexity Theory with Narrative Research:

- Haridimos Tsoukas and Mary Jo Hatch (2001), "Complex thinking, complex practice: The case for a narrative approach to organizational complexity":

- David Christian, "Big History", Astrophysics, Chemistry, Biology, Information, emergence of life, technology:

Contact: Anna (

Meet on Monday over lunch

Interested: Marie Pierre, Jeroen, Jim, Melissa, William (maybe)

Improving the design of the power grid using our knowledge about network structure

Context: Given all the renewable energy generation that is being installed and the increasing levels of uncertainty about the future power system, power transmission expansion planning is becoming more and more challenging. There is a lot of literature being published in the field, but it always applies "blind" techniques to the design, such as optimization where the possible lines to add to the system are represented as binary variables. This leads to optimization problems that are too large for real networks. As part of the European Commission FP7 project e-Highway, my team and I have developed methods to reduce the complexity of the network and work with a smaller system.

I would like to explore a different avenue. Maybe it is possible to describe the structure of good network designs in terms of global parameters. For instance, what does the degree distribution look like for efficient power networks? Then, we could feed that information into the optimization problem, reducing the search space.
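As a toy illustration of the kind of global parameter in question, a sketch using networkx; a random geometric graph stands in here for real transmission data (which this example does not use), since grids are spatially embedded.

```python
import collections

import networkx as nx

# A random geometric graph as a spatial stand-in for a transmission
# network (real grid data, e.g. IEEE test systems, would replace this).
G = nx.random_geometric_graph(100, 0.18, seed=42)

# The degree distribution: how many buses have 1, 2, 3, ... lines.
degree_counts = collections.Counter(d for _, d in G.degree())
for degree in sorted(degree_counts):
    print(degree, degree_counts[degree])
```

Comparing such distributions between "good" and "bad" designs could suggest constraints that shrink the binary search space of the expansion problem.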

I have data originating in a European project from the FP7 programme.

There is lots of interest in this field, a lot being published, lots of money going into projects and many research grants. Nobody seems to be looking at it from a structure perspective though.

To know more...
I uploaded a ppt with some initial ideas to my page on the wiki.

Interested: Sara, Jean Gab, Daniel T, Sahil Garg


The 2014-15 Ebola virus disease (EVD) outbreak in West Africa presented both unique opportunities and unique challenges to the epidemiological modeling community. For the first time during an emerging infectious disease outbreak, high resolution data, from a variety of sources, were made available to the academic community, and many public health decision makers genuinely engaged with mathematical and computational modelers. However, the popular and scientific press were highly critical of most models' ability to project the outbreak's course. The following key open questions seem ripe for investigation using a complex adaptive systems lens:
1) What features of EVD transmission are most problematic for reliable, robust forecasting: changing behavior, intervention, viral evolution, complex social networks, etc.?
2) How/can we use digital data to either improve forecasts or inform model selection?
3) Can one quantify the value of additional information in real-time?
Contact: Samuel Scarpino, SFI Omidyar Fellow, Santa Fe Institute -

Marie-Pierre Hasne
Chris Verzijl
Junming Huang
Sola Omoju
Christine Harvey
Daniel Citron (
William Chang (williamkurtischang at gee-mail)

Effect of landscape topography on vegetation connectivity and navigability

Landscape topography influences the dynamics of the processes that take place on it. The evolution of ecosystem networks, river networks, vegetation cover types and microclimate cycles is deeply interlinked with the local landscape topography.

In addition to the landscape, these networks also influence each other, conditioning the emergence and stability of ecosystems, and subsequently the behaviour of agents in the ecosystem, such as the migration pathways of animal herds, human settlement patterns, etc.

Connectivity patterns emerge from the interaction of these processes, and thus a better understanding and quantification of those patterns is critical to understanding the dynamics of the system.

How do we want to approach the problem
In this project, we will randomly generate landscape topography and subsequent vegetation cover from a set of parameters from known geological and biological processes. The generated data set will then be used to investigate the following questions:
(1) What is the connectivity of the landscape patterns that emerge at different scales, and can we quantify it using techniques such as clustering analysis, percolation theory or network theory?
(2) What is the navigability of biological agents (e.g. animals, humans, robots!) on such landscape patterns? We can compare mobility trails in existing landscapes to validate our hypotheses. We propose to use tools like ABMs to simulate and characterise mobility success.
(3) We further aim to compare the measures of navigability with the metrics of connectivity, establishing a framework of comparison.
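A rough sketch of step (1), under loudly simplifying assumptions: Gaussian-filtered noise stands in for geologically generated topography, and a height threshold stands in for the vegetation model; connectivity is then a percolation-style cluster statistic.

```python
import numpy as np
from scipy import ndimage

rng = np.random.default_rng(0)

# "Topography": white noise smoothed with a Gaussian filter, a crude
# stand-in for terrain produced by geological process models.
terrain = ndimage.gaussian_filter(rng.standard_normal((200, 200)), sigma=8)

# Vegetation occupies the lower 45% of cells (valleys), a placeholder
# for a real vegetation-cover model.
vegetated = terrain < np.quantile(terrain, 0.45)

# Connectivity: label 4-connected vegetation clusters and report the
# largest cluster's share of the grid, a percolation-style metric.
labels, n_clusters = ndimage.label(vegetated)
sizes = np.bincount(labels.ravel())[1:]  # cluster sizes, background dropped
print(n_clusters, sizes.max() / vegetated.size)
```

Sweeping the vegetation threshold would trace out a percolation curve; network-theoretic measures could be computed on the cluster adjacency structure.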


Meetings: (1) Wednesday 1pm, Coffee shop

People Interested

Decision support/network analysis of a complex socio-ecosystem in rural Zimbabwe

Click on the link to go to the project page for meetings, project details, and progress.

Mapping Complexity/Human Knowledge as a Complex Adaptive System

Ants leave pheromone trail patterns which they are aware of only in a local sense. They do not have the cognitive faculties to step back, look at the trails, and grasp the ant-trail network as a totality. Moreover, the artifacts they leave behind are physical entities, which then provide aggregate feedback to the colony as a whole, feeding its evolution as a CAS. In contrast, humans do have the requisite cognitive abilities. The "pheromone trails" we leave behind are knowledge trails coded in symbolic knowledge artifacts. Unlike the physical artifacts that ants leave behind, the knowledge artifacts that we leave behind are far more flexible and potent, at both the aggregate and the individual levels. But like the ants, until recently, we did not have the means to step back and map the knowledge "pheromone trails" to obtain the big picture and its global/local dynamics. The burgeoning field of scientometrics is making available visualization tools to help us map and study the evolutionary dynamics of knowledge network structures.
Data and Questions
The goals of this project include

  1. Extract the terms from approximately 1600 working papers published by SFI
  2. Map the intra/inter conceptual network structures
  3. Study the evolution of these structures across time
  4. Highlight the gap-closure of knowledge reverse-salients (if any)
  5. Capture any of the network patterns that repeat
  6. Study the diffusion of concepts across the network
  7. Provide visualization tools for navigating the complexity corpus, etc

Possible methods
Latent Semantic Analysis (LSA) and Latent Dirichlet Allocation (LDA)
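As a hedged illustration of what the LSA step might look like, a sketch with scikit-learn: a four-document toy corpus stands in for the ~1600 SFI working papers, and TruncatedSVD plays the role of the SVD at the heart of LSA.

```python
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer

# Toy stand-in for the SFI working-paper corpus.
docs = [
    "agent based models of market dynamics",
    "market dynamics and price formation",
    "spin glasses and energy landscapes",
    "energy landscapes in protein folding",
]

tfidf = TfidfVectorizer().fit_transform(docs)       # term-document matrix
lsa = TruncatedSVD(n_components=2, random_state=0)  # LSA = SVD of tf-idf
coords = lsa.fit_transform(tfidf)                   # papers in concept space
print(coords.shape)  # one 2-d concept-space position per paper
```

Tracking such concept-space positions across paper submission dates would give a first cut at goals 3 and 6 (evolution and diffusion of concepts).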

  1. SFI Working Papers:
  2. Atlas-Science-Visualizing:
  3. Atlas-Science-Visualizing WebSite:
  4. Mapping-Scientific-Frontiers-Knowledge-Visualization:
  5. Katy Börner presents at Science of Science:
  6. Scholarly Data, Network Science, and (Google) Maps:
  7. LSA Video Lect:
  8. What is LSA:
  9. LSA Wiki:
  10. LDA:
  11. Fusari, A. (2014). Methodological Misconceptions in the Social Sciences: Rethinking Social Thought and Social Processes:
  12. Kirsh, D. (2013). Thinking with external representations. Cognition Beyond the Brain:
  13. Holland, J. H. (1992). Adaptation in natural and artificial systems:
  14. Where to start with text mining.
  15. Singular Value Decomposition Tutorial
  16. Latent Semantic Analysis (LSA) Tutorial:
  17. LSA in Detail:
  18. Web Based LSA:
  19. Another Technique: t-Distributed Stochastic Neighbor Embedding (t-SNE):
  20. tf–idf:


  • John Thomas (
  • Haitao Shang (
  • Christopher Verzijl (
  • Anna Zaytseva (
  • Penny Mealy (
  • William Leibzon (

Navigating Music, Brain and The Edge of Chaos

Brain science has revealed that the brain lives "on the edge of chaos", exhibiting "self-organized criticality" delicately balanced between normalcy and madness. Over the course of history, humans have used various agents and activities to shape, influence and control this living chaos, ranging from substances such as caffeine, sugar and drugs to activities such as the arts (including music), social discourse/therapy and meditation. Of these, music has a distinct role in shaping our moods and helping us transition between different mental states, as well as maintain them for extended periods of time. Clearly we have been using music to help us control and shape the internal chaos. But until recently, quantitative instrumentation of this massively complex system, comprising close to 100 billion neurons networked into a 1000-trillion-synapse edifice, has not been available to the common man. Of late, however, affordable wearable EEGs have come on the market, making a quantitative study of the influence of music on brain dynamics feasible in a large-scale, crowd-sourced sense. To come to terms with the complexity of our 1000-trillion-synapse edifice, we need to gather data on a vast scale. The proposed research is a proof-of-concept, exploratory foray into making this happen.

History of music in 5 min


  • Sara Lumbreras (
  • Ilaria b (
  • Braun, Urs (
  • John Thomas (
  • Glenn Magerman (
  • Daniel Friedman (

Powerlaw fitting and alternative distributions - Theory/statistics

Clauset, Shalizi and Newman (2009) propose a maximum likelihood method to estimate the powerlaw exponent of a variable of interest. This is a great improvement on earlier methods such as OLS that dominated the literature up to then. However, one can fit a powerlaw to any dataset, and the most we can say is that our observations are consistent with the hypothesis that x is drawn from a powerlaw distribution. One easily implementable method to compare the powerlaw fit to other fits is then a likelihood ratio test between the candidate models.
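For the continuous case, the Clauset-Shalizi-Newman MLE for the exponent has a closed form, alpha = 1 + n / sum(ln(x_i / x_min)). A minimal sketch on synthetic Pareto data (the `powerlaw` Python package implements the full method, including endogenous x_min selection and likelihood ratio tests; this sketch only does the exponent step at a fixed x_min):

```python
import numpy as np

def fit_powerlaw_alpha(x, xmin):
    """Continuous-case Clauset-Shalizi-Newman MLE:
    alpha = 1 + n / sum(log(x_i / xmin)) over the tail x_i >= xmin."""
    tail = x[x >= xmin]
    return 1.0 + len(tail) / np.sum(np.log(tail / xmin))

# Synthetic sample with a known exponent: numpy's pareto(a) + 1 is a
# Pareto variable with xmin = 1 and tail exponent alpha = a + 1.
rng = np.random.default_rng(1)
sample = rng.pareto(1.5, size=50_000) + 1.0

print(round(fit_powerlaw_alpha(sample, xmin=1.0), 2))  # close to 2.5
```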

One particular discussion is the distinction between a powerlaw (PL) and a log-normal (LN) fit. For a lively discussion of both fits on city size and Zipf's law, see Eeckhout (2004, 2009) and Levy (2009), where the debate has now settled on city sizes following a log-normal distribution instead of a powerlaw. Similarly for the discussion on the firm size distribution: Simon and Bonini, 1958; Ijiri and Simon, 1977; Stanley et al., 1995; Sutton, 1997; Axtell, 2001; Okuyama et al., 1999; Cabral and Mata, 2003; Gaffeo et al., 2003; Aoyama et al., 2004; Fujiwara et al., 2004a,b; Kaizoji et al., 2006; Takayasu et al., 2008; Duchin and Levy, 2008; Schwarzkopf and Farmer, 2008, ...

This distinction matters for several reasons:
- PL and LN arise from very similar models, and differences in initial conditions can lead to very different outcomes. The PL involves a choice of x_min, below which the unit of observation cannot exist (e.g. minimum city size, firm size, word length, ...); the LN has no minimum size.
- A PL and an LN that look similar are the difference between infinite (PL) and large but finite (LN) variance.
- Shock propagation: unit-level shocks are large enough to show aggregate perturbations when the distribution is a powerlaw with infinite variance, while these shocks wash out fast when the distribution is log-normal (Gabaix, 1999; Gabaix, 2009; Acemoglu et al., 2012, ...).

I have encountered some issues which I would like to explore further:
1. The distinction between lognormal and powerlaw in the data is very sensitive to data truncation: in the above discussions, researchers have slightly different datasets, covering more or less of the population at hand. Left-truncation (i.e. observations not in the dataset because they are too small to be reported) can strongly drive the outcome of the fit, even when endogenizing the x_min cutoff. I have data on the universe of Belgian firms, much more complete than e.g. US Census data, on which I have done some preliminary tests. The question is then: how to formalise this distinction, and what are the theoretical and practical caveats to look out for when applying this method?
2. MLE fitting seems to be sensitive to the choice of units as well: rescaling a variable by a factor of 1,000 or 1,000,000 seems to influence the endogenous x_min choice and hence the estimated parameter. This reminds me of some work on scaling invariance in negative binomial estimators. What is going on here?
3. Can we set up a model that generalises both? I've been looking at Levy stable distributions, but have not done anything with them yet.
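On the units question, a quick numerical check on synthetic data: the closed-form MLE alpha = 1 + n / sum(ln(x_i / x_min)) is exactly invariant when the data and x_min are rescaled together, which suggests any unit-sensitivity must enter through the endogenous choice of x_min (e.g. KS-distance minimisation over a discretised grid) rather than through the likelihood itself.

```python
import numpy as np

def alpha_mle(x, xmin):
    """Closed-form MLE for the powerlaw exponent at a fixed xmin."""
    tail = x[x >= xmin]
    return 1.0 + len(tail) / np.sum(np.log(tail / xmin))

rng = np.random.default_rng(2)
x = rng.pareto(1.5, size=10_000) + 1.0  # synthetic heavy-tailed sample

a1 = alpha_mle(x, xmin=2.0)
a2 = alpha_mle(1000.0 * x, xmin=2000.0)  # rescale data and xmin together
print(np.isclose(a1, a2))  # True: the likelihood step is scale-invariant
```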

Interested people:

  • Corbain
  • Laura
  • Binyang
  • Sahil
  • Tirtha
  • Glenn
  • Junming
  • ---
  • William Leibzon Don't count me as interested (I would probably join another project but could potentially check in with what you do), but I have read almost all the papers you posted before, because I needed a way to check whether what I had was scale-free or not. I ended up using a Matlab script I found, written by one of the previous SFI postdocs (will try to find a link). I'd be interested in what you come up with, but I'm myself fairly bad with pure statistics theory. I may or may not have a dataset with me (I will check) that I collected and analyzed for scale-free properties and found to be super-linear with alpha>1. Any power-law fit you do should similarly show whether something is sublinear, superlinear, or scale-free. Another issue is being able to get statistics on the likelihood of data having small-world properties: while theoretically scale-free networks in the limit approach small-world networks, in practice this may or may not be true. If statistics on this could be done together, it could be helpful.

Literature: Powerlaw fitting in empirical data - Clauset, Shalizi and Newman (2009):

Organ Transplant Analysis

Over 120,000 people in the United States are currently on the waiting list for an organ transplant. The size of this waiting list translates into over 6,000 deaths a year while waiting for a transplant and tens of billions of dollars in government spending. I have access to the following data sets:

  • All transplants performed in the US from October 1987 to June 2014 (including follow-up data)
  • All living and deceased organ donors in the same time period (with follow-up data on living donors)
  • Waiting list data for everyone who signed up for the list
  • 2012 National Survey on Attitudes and Behaviors on Organ Donation
  • Social media data relating to organ donation since 2008

I am open to ideas and suggestions for the topic; there are a lot of interesting questions to investigate, including cultural/racial/gender differences in organ donation. I have several preliminary reports and exploratory analyses done on differences in donors.

Second Meeting Thursday, June 11th 4:15 in the coffee shop!

Matching Game

Rules for the matching game:

Objective: Create the largest chain possible using blood types in the room.

Follow the blood type guidelines for donation to form a donor chain.

  • Assume that self-matches are not possible; for example, someone with a donor of type AB and a recipient of type AB cannot match themselves and needs to find someone else to complete their chain.
  • Create a donor chain to help as many people as possible.
  • Post results below for the prize
  • There will be several "Good Samaritan" donors who will donate and need nothing in return; this is a great way to start your chain. Using a Good Samaritan chain also means you don't need a donor for the end point.
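A small sketch of a chain checker, assuming standard ABO donation rules and reading a chain as "person i's donor gives to person i+1's recipient" (the interpretation consistent with the example chains posted below); Rh factors and the self-match rule are not checked here.

```python
# Standard ABO compatibility: which recipient types each donor type can serve.
CAN_DONATE_TO = {
    "O": {"O", "A", "B", "AB"},
    "A": {"A", "AB"},
    "B": {"B", "AB"},
    "AB": {"AB"},
}

def valid_chain(pairs):
    """pairs: list of (donor_type, recipient_type), one per person.
    Each person's donor must be able to give to the next person's recipient."""
    for (donor, _), (_, recipient) in zip(pairs, pairs[1:]):
        if recipient not in CAN_DONATE_TO[donor]:
            return False
    return True

# Example Chain 1 from below: Christine (O -> A), Laura (AB -> B), Matt (A -> AB)
print(valid_chain([("O", "A"), ("AB", "B"), ("A", "AB")]))  # True
```

Wrapping the check around to the first person would validate a closed cycle instead of an open (Good Samaritan-started) chain.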

Game Workspace

Use this workspace to post for and find matches!

Donor Chains

Post your chain here. Example is given below:

Example Chain 1
Person Donor Type Recipient Type
Christine O A
Laura AB B
Matt A AB

Example Chain 2
Person Donor Type Recipient Type
Christine O None
Laura A B
Matt None A
Richard's Current Table

Iterations: 3,610 Length: 37 (everyone that needed an organ got one)

Person Donor Type Recipient Type
LauraCondon O X
Jeroen O O
Keith AB O
Brent O AB
DanielF AB O
Jelle O AB
AndySchauf A A
Chris O A
MatthewO AB O
Sahil O AB
Maria O A
MatthIn AB O
Susanne A AB
Juan AB A
Alejandro O AB
Urs B A
Sam O B
Jakub O A
Michael O A
WillChang O A
Laurence O B
James AB O
Vanessa A AB
Richard A A
Jae B A
Melissa A B
JamesCuton O A
Masa A O
Cobain B A
Charon B B
Nilton O B
Martina A O
Carolina O A
Glenn O A
MariePierre A O
Kleber O A
Alex AB O


Person Donor Type Recipient Type
Alice AB O
Christine O X
Havier AB O
JGab O X
Maggie O X
Matthew O X
Sander A X

Multi-dimensional social networks in the evolution, development and resilience of informal economies.

Informal economies are defined as economic activities that occur outside the purview of corporate public and private institutions. These types of economies proliferate where traditional economic actors are unable to productively exercise their activities, especially due to costly constraints (De Soto 2003). Firms may face adverse incentives to expand production by hiring more workers or incorporating more capital, due to the low productivity of their workers or the predatory practices of rapacious elites or corrupt governments. Labor itself, i.e. people, may face hurdles in trying to make the jump towards entrepreneurial activities or jobs in the formal economy, which allow the accumulation of experience, retirement savings, access to insurance or precautionary savings to face unexpected events (like disease, disability, etc.), due to lack of capital access (whether human or financial).

For this project, we propose a different interpretation: We seek to understand and model how multi-faceted social networks provide robust alternatives to formal economies, which far from being seen as degenerate forms of social organization, in many instances co-exist, challenge and compete with formal employment and economic activities. While some types of informal economies operate under adverse contexts, their resiliency may be understood as a type of adaptive fitness and not the mere result of stubborn cultural path-dependency.

For the most part, informal economies have been analysed as a counterpoint to an ideal type of formal economy, i.e. as settings with low levels of trust, respect for property rights, access to credit and public institutions like the court system (see Losby et al. 2002 for a review). However, informal economies have been coupled with their formal counterparts since the development of capitalism in the 19th century. To name a famous historical example, the informal sector of old London's East End was described harshly by Engels (1844) in his famous tract, "The Condition of the Working Class in England". But almost fifty years later, in a new preface to the book, Engels recognized the progress that had taken place there under the aegis of working class organizations in the area.

Research has begun to catch on to this idea – developing a literature in the areas of informal risk sharing and remittances, which are in some sense informal counterparts to insurance and banking. Consumption patterns in poor, rural villages are remarkably smooth suggesting that risk-sharing measures are prevalent, although imperfect (Townsend). Field studies of networks underlying village risk sharing systems have found that households primarily receive help from existing social connections, such as friends and relatives, in the form of informal loans or transfers. Theoretical work in network economics, by authors such as Matt Jackson, found that certain empirically prevalent network structures may directly benefit the stability of favor exchange systems. Another line of inquiry focuses on remittance income, generally informal transfers across kinship ties. World Bank researchers have found that remittances from overseas migrants respond dramatically to regional income shocks, replacing upwards of 60% of lost income in households with international migrants. Further studies, cited below, have generalized those results. Furthermore, reductions in the cost of sending remittances, which occurs with the advent of mobile money, further mitigates exposure to risk in receiving households.

Also recently, authors have written on how communities come together in situations where formal economies are made unavailable by external shocks (like disasters) or are out of reach (due to conditions of abject poverty). In the case of the latter, Venkatesh (2008) provided gripping testimonies of how informal economies arose and developed in the South Side of Chicago in low-trust environments, via cash transactions and informal service contracts enforced by (and for the benefit of) local criminal gangs. These gangs were seen, surprisingly, as alternative coordinating mechanisms to settle claims between neighbors. In the case of the former, Storr and Grube (2013) have argued in a series of papers that "shared histories and perspectives, and the stability of social networks within the community" allow communities who suffered disasters to cope and endogenously resolve immediate and complex problems. Providing an example of these social networks in action, research has found that remittances flow quickly into areas affected by natural disasters when there is a technology in place for it to happen.

Taking these lines of inquiry together, the findings suggest that informal handling of risk takes place on a large scale and that social networks informally connect communities in ways that impact their economies. Recent trends suggest that the interaction of formal and informal economic processes is growing on a nationwide scale. First, the aggregate value of remittance flows is large and growing, nearing the value of foreign direct investment. Second, advancements that formalize previously informal transactions are expanding dramatically in emerging economies through various forms of branchless banking. National economies may be embedded in profoundly influential informal systems that have never been holistically studied.

On a more general note, this project touches on lingering questions in economics. For example, some might argue that more stringent labor regulations should have increased informal employment, but in fact the opposite happened. And while the economic rationale for the former statement remains valid, we suggest that unions, like other types of social organization in which people interact via organized networks, provide richer dimensions than those suggested by their role as mere economic agents. Hence unions, as well as other faith-based, ethnic, community and interest groups, may provide ways of interaction that escape narrow economic outcomes.


  • De Soto, Hernando (2000/2003) The Mystery of Capital. Basic Books.
  • Losby, Jan; Else, John; Kingslow, Marcia; Edgcomb, Elaine; Malm, Erika and Vivian Kao (2002) The Informal Economy: A Literature Review. ISED Consulting and Research and the Aspen Institute Working Paper.
  • Townsend, Robert M. "Risk and Insurance in Village India." (1994)
  • Fafchamps, Marcel, and Susan Lund. "Risk-sharing Networks in Rural Philippines." (2003)
  • Weerdt, Joachim De, and Stefan Dercon. "Risk-sharing Networks and Insurance against Illness." (2006)
  • Jackson, Matthew O., Tomas Rodriguez-Barraquer and Xu Tan. "Social Capital and Social Quilts: Network Patterns of Favor Exchange." (2012)
  • Yang, D. and H. Choi. "Are Remittances Insurance? Evidence from Rainfall Shocks in the Philippines" (2007)
  • Kurosaki, Takashi. "Consumption vulnerability to risk in rural Pakistan." (2006)
  • Jack, W., and T. Suri. "Risk Sharing and Transactions Costs: Evidence from Kenya's Mobile Money Revolution" (2014)
  • Engels, Friedrich (1844/1892) The Condition of the Working Class in England.
  • Venkatesh, Sudhir (2008) Gang Leader for a Day: A Rogue Sociologist Takes to the Streets. Penguin Books.
  • Storr, Virgil and Laura Grube (2013) The Capacity for Self-Governance and Post-Disaster Resiliency. George Mason University, Department of Economics Working Paper 13-37.
  • Blumenstock, Joshua Evan, Marcel Fafchamps and Nathan Eagle. "Risk and Reciprocity Over the Mobile Phone Network: Evidence from Rwanda" (2011).
  • Ratha, Dilip. "Workers' remittances: an important and stable source of external development finance." (2005).
  • Pénicaud, Claire, and Arunjay Katakam. State of the Industry 2013: Mobile Financial Services for the Unbanked. GSMA MMU, 2013.

If interested, please list your name below.


  • Eloy Fisher (
  • Carolina Mattsson (
  • Sharon Greenblum (
  • Jakub Rojcek (
Exploring Community Formation through Analysis of Scholarly Corpus (ArXiv)

    The arXiv is a free online repository of scientific preprint articles, mostly from physics and mathematics. It currently contains over 800,000 articles, dating back to 1991. Much of the data associated with this repository is freely available: paper submission dates; full texts of papers; author names and coauthorship information. Additionally, there is some citation data available. (If necessary, I can also find papers' subdiscipline labels, submitting authors' email address domain names, and other things.)

    In the past I have been using these data to try and explore how communities of authors who have shared interests grow over time. For example, we can pinpoint the first ever paper about topological insulators, and search for all subsequent papers on that topic. As more papers are written, authors begin to join the field of research and to form strong ties by collaborating with one another. We can use the ArXiv data to visualize and analyze exactly how these communities form and grow.
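One way to make "analyze how these communities form" concrete: build the coauthorship graph paper by paper and watch components merge. A sketch with networkx on invented toy metadata (real arXiv author lists for a topic like topological insulators would replace it):

```python
import networkx as nx

# Invented toy metadata: (year, author list) for papers on one topic.
papers = [
    (2005, ["Ada", "Ben"]),
    (2006, ["Cyd", "Dee"]),
    (2007, ["Ben", "Cyd"]),  # a collaboration bridging the two groups
    (2008, ["Ada", "Dee", "Eli"]),
]

G = nx.Graph()
for year, authors in sorted(papers):
    for i, a in enumerate(authors):  # link every coauthor pair on a paper
        for b in authors[i + 1:]:
            G.add_edge(a, b, year=year)
    # Watch the community coalesce as papers accumulate.
    print(year, G.number_of_nodes(), nx.number_connected_components(G))
```

The drop in the component count marks the moment two research groups merge, the "tipping point" question in the brainstorming notes below.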

    Brainstorming notes

    Isolation between topics, crossing interdisciplinary boundaries, finding (topological) separation distance between disciplines or research groups

    Tipping points - can we look for separation of or mergers of two groups or disciplines?

    Can we identify key papers (or groups of papers) that initiate ties between fields?

    Sentiment analysis: can we identify when a paper's citation is an endorsement or a refutation?

    Tools for Analysis: Change point detection; relational event models

    Can we find more comprehensive/better citation data?

    Lucene and indexing tools: - Can we re-index our text database after removing stop words? - Can we index the titles and abstracts? - Elastic Search - tool for interacting with Lucene

    This set of text-processing tutorials is pretty handy for background info and inspiration. The software package doesn't simply plug into a Lucene/Solr/Elastic Search index, but it could be done. This package from Stanford seems very capable.

    Next meeting

    Thursday @ 9AM in Coffee Shop

    PolComplex - Epistemic communities and policy dynamics in the UK Parliament

    Over the last twenty years, there has been extensive debate about what the core topics in public policy are, and if they change according to type and number. Furthermore, it is still not clear how policy topics mutate and diffuse over time. Human-based text analysis of governmental documents has shed only some light on these research questions: 21 general topics have been proposed, while other accounts tend to restrict it to about five main clusters. Yet, this approach has not been able to devise a reliable and systemic method to capture topic dynamics, diffusion, and their relation to the human actors that talk about them.

    This project is an extension of exploratory research that intends to overcome such limitations through the application of natural language processing (more specifically, dynamic topic modeling and sentiment analysis), together with topological and network analysis, to a dataset containing UK House of Commons debates ranging from 1975 to 2014. The aims are manifold. We hope to identify the structure and dynamics of epistemic policy communities - i.e. of political actors revolving around similar policy interests - and how these relate to factors such as party membership, seniority, and relevant historical events.
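A hedged starting-point sketch: plain (static) LDA via scikit-learn on invented stand-in speeches. The dynamic topic models in the Blei and Lafferty line of work extend this by chaining topics across time slices, which the real 1975-2014 corpus would need.

```python
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

# Invented stand-ins for Commons speeches (the real corpus is Hansard).
speeches = [
    "the national health service needs more nurses and hospital funding",
    "hospital waiting lists and health spending must come down",
    "defence spending and the armed forces require investment",
    "our armed forces and defence procurement need reform",
]

counts = CountVectorizer(stop_words="english").fit_transform(speeches)
lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(counts)
mixtures = lda.transform(counts)  # one topic mixture per speech
print(mixtures.shape)
```

Linking each speech's topic mixture to its speaker would give the bipartite actor-topic structure from which the epistemic communities could be projected.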

    Some references:

    Taddy (2013), "Multinomial Inverse Regression for Text Analysis"

    Blei and Lafferty (2007), "A Correlated Topic Model of Science"

    Genomic variations from a chaotic mapping

    Following Dabby, "Musical variations from a chaotic mapping", CHAOS 6(2), 1996, this project will explore genomic variations in key metabolic genes that are known to be widely spread across the bacterial phylogeny (e.g. the Nif cluster, which is used in nitrogen fixation). There are several ways that genomic data can be treated as "music":

    1) Nucleotide: 4 notes - A, G, C, T
    2) Amino Acids: 20 notes
    3) codon triplets: 64 notes
    4) tetra nucleotides: 256 notes.
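A toy sketch of option 1, with a logistic map standing in (loudly, as a simplification) for Dabby's Lorenz-trajectory indexing: a chaotic index sequence picks notes out of the seed sequence, producing a variation drawn from the seed's own notes.

```python
# Scheme 1: one "note" per nucleotide.
NOTE_OF = {"A": 0, "G": 1, "C": 2, "T": 3}

def chaotic_variation(seq, x0=0.4, r=3.99):
    """Generate a variation of `seq` by indexing it with a logistic-map
    orbit (a simplified stand-in for Dabby's Lorenz-based technique)."""
    notes = [NOTE_OF[base] for base in seq]
    x = x0
    out = []
    for _ in notes:
        x = r * x * (1 - x)                     # logistic map step
        out.append(notes[int(x * len(notes))])  # chaotic index into the seed
    return out

seed = "ATGGCGAATCTT"  # hypothetical seed sequence
print(chaotic_variation(seed))
```

Different initial conditions x0 yield different variations of the same seed, mirroring Dabby's scheme of perturbing the chaotic trajectory; the resulting sequences could then be fed to BLAST as described above.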

    One possible goal is to take a seed sequence from some random organism and generate new variations. Then, using homology search (e.g. BLAST) or phylogenetics, determine whether that variation exists in nature. If not, perhaps model the potential protein folding (if possible?) to determine if that variation could exist in nature. This is VERY preliminary and can go in many directions.

    If interested please list name below or contact Jarrod at


    Free Energy Theory

    The Free Energy (FE) minimisation framework tries to explain how biological systems (such as a cell or a brain) self-organise so as to occupy the (often very limited number of) non-equilibrium states that minimise free energy. This is also known as active inference. A simple corollary of active inference is that agents behave so as to minimise their prediction error, i.e. the difference between prediction and reality. Thermodynamic free energy is a measure of the energy available in a system to do useful work. This can be framed in an information-theoretic setting as the difference between how the world is being represented and how it actually is. A better fit means a lower information-theoretic free energy, as more resources are being put to ‘good use’ in representing the world. The overarching logic of FE theory is that a better model of the world helps maintain structure and organisation, which ultimately helps the system resist increases in entropy.
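    A toy illustration of the prediction-error corollary (not the full variational treatment): an agent holding a single internal estimate performs gradient descent on the squared prediction error F = (o - mu)**2 / 2 and thereby comes to model a hidden cause. All numbers are arbitrary:

```python
import random

random.seed(0)

hidden = 3.0   # true state of the world
mu = 0.0       # agent's internal estimate ("model of the world")
lr = 0.1       # learning rate

errors = []
for _ in range(200):
    o = hidden + random.gauss(0, 0.1)  # noisy observation
    err = o - mu                       # prediction error
    mu += lr * err                     # gradient step on F = err**2 / 2
    errors.append(err * err)

print(round(mu, 2))  # estimate converges close to the hidden cause, 3.0
```

The squared error shrinks as the estimate improves, which is the sense in which a better model "puts resources to good use" in representing the world.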

    This is not a set-in-stone project with any concrete aim (yet). We are a few people interested in exploring the theoretical and practical implications of these ideas, and you're more than welcome to join in!


    • Jelle Bruineberg ( )
    • Tobias Morville (
    • Susanne Pettersson (
    • Maggie Simon (
    • Martina Steffen (
    • Sahil (
    • Tirtha (
    • Anna (
    • Matt O (
    • Will Chang (williamkurtischang at gmail dot com)
    • Cobain (
    • Thomas Fajergen (


    Improving the design of the power grid using our knowledge about network structure

    Context: Given all the renewable energy generation being installed and the increasing levels of uncertainty about the future power system, power transmission expansion planning is becoming more and more challenging. There is a lot of literature being published in the field, but it always applies "blind" techniques to the design, such as optimization in which the possible lines to add to the system are represented as binary variables. This leads to optimization problems that are too large for real networks. As part of the European Commission FP7 project e-Highway, my team and I have developed methods to reduce the complexity of the network and work with a smaller system.

    I would like to explore a different avenue. Maybe it is possible to describe the structure of good network designs in terms of global parameters. For instance, what does the degree distribution of efficient power networks look like? Then we could feed that information into the optimization problem, reducing the search space.
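    Such global parameters can be computed directly from a candidate design's edge list; a hypothetical six-node toy network serves as illustration:

```python
from collections import Counter

# Hypothetical toy transmission network as an edge list.
edges = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D"),
         ("D", "E"), ("C", "E"), ("E", "F")]

degree = Counter()
for u, v in edges:
    degree[u] += 1
    degree[v] += 1

# Degree distribution: fraction of nodes having each degree.
n = len(degree)
dist = {k: c / n for k, c in Counter(degree.values()).items()}
print(dict(sorted(dist.items())))
```

Comparing such distributions (and other global statistics) across known efficient designs could supply the structural priors to feed back into the expansion-planning optimization.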

    I have data originating in a European project from the FP7 programme.

    There is a lot of interest in this field: much is being published, and a lot of money is going into projects and research grants. Nobody seems to be looking at it from a structural perspective, though.

    To know more...
    I uploaded a ppt with some initial ideas to my page on the wiki.

    Interested: 'Sara ' 'Carolina' 'Alice' 'Federico' 'Jean Gab' 'Sola' 'Ilaria' 'Daniel T'

    Comparison of Network vs Scaling Theory based Models in Ecology (Cobain -

    One of the biggest difficulties ecologists face is trying to understand ecosystem dynamics based on very little biological information (compared to the size of the system): observational data are logistically difficult and very expensive to acquire, as well as often being very time-consuming to collect. In terms of modelling, two approaches have been utilised to overcome this problem.

    The first is a network-based approach whereby nodes represent the biomass density of a particular species (or a higher-level taxonomic group and/or resources) and edges represent the trophic interactions (who eats whom). This method depends on what we know about the trophic behaviour of the species involved (which is often very limited, and there can be many species to parameterise), and represents only the central tendency of what could potentially be very diverse behaviour within and between populations.

    The second approach is to use scaling theory to describe average trophic interactions and other biological processes based on individual body size along the size continuum, viewing the community as a strongly size-structured dynamical system. This requires less specific knowledge about the organisms in the community per se; however, it again represents only the central tendency of a given size class, which may mask very diverse behaviour. Functional differences can potentially be introduced (coupled benthic-pelagic systems, for example), but at the cost of breaking down the predictability of the well-known allometric scaling laws as specificity increases.
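    A minimal sketch of the network-based approach (invented parameters, not fitted to any real community): a three-node Lotka-Volterra chain, dx_i/dt = x_i (r_i + sum_j A[i][j] x_j), integrated to steady state with a simple Euler scheme:

```python
# Nodes: resource R, consumer C, predator P (assumed parameters).
r = [0.7, -0.15, -0.2]       # intrinsic growth / mortality rates
A = [[-1.0, -0.5,  0.0],     # resource: self-limiting, eaten by C
     [ 0.5,  0.0, -0.5],     # consumer: eats R, eaten by P
     [ 0.0,  0.5,  0.0]]     # predator: eats C

x = [1.0, 0.5, 0.2]          # initial biomass densities
dt = 0.01
for _ in range(50_000):      # run toward steady state
    dx = [x[i] * (r[i] + sum(A[i][j] * x[j] for j in range(3)))
          for i in range(3)]
    x = [max(xi + dt * dxi, 0.0) for xi, dxi in zip(x, dx)]

print([round(xi, 3) for xi in x])  # settles near (0.5, 0.4, 0.2)
```

The scaling-theory counterpart would replace the per-species matrix A with interaction rates derived from body-size allometries, allowing the same comparison-at-steady-state protocol described below.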

    There has been little to no comparison of these two methods of modelling ecological systems (at least to my knowledge), and the question arises: given the same starting information, how well does each approach model the dynamics of an ecosystem? Is one approach better than the other at capturing energy flow through the system, community structure, stability, etc.? If the models differ, such information will better inform ecologists about which to use, depending on the type of questions they are trying to answer.

    Therefore, the project proposed here aims to investigate these two methods. The community used to seed an experimental mesocosm setup (mesocosms are large tanks designed to reflect the complexity of natural ecosystems while still being artificially controllable and relatively simple) will be input into two models based on the separate approaches and then run to steady state. The model outputs will then be compared and contrasted with each other, as well as with the mesocosm community at steady state. Further work may include examining perturbation dynamics, depending on what data we have available.

    Interested: Name / Email
    Martina Steffen (

    Dynamics of homicide (Matthew Ingram -

    Integrating temporal, spatial, and multi-level concepts

    1:30pm in the coffee shop

    Am interested in the discussion on this project...Sola Omoju

    Interested - Nilton Cardoso

    Multiplex Adaptive Networks (Daniel -

    7 pm in the coffee shop

    Modeling brain diseases (or cancerous bio pathways) (Sahil -

    Interested people can put a meeting time as per their convenience here.
    No meeting time indicated
    One potential meeting time can be 3pm in coffee shop ?
    Information theoretic algorithms can also be explored for the problem.
    Discovering Structure in High-Dimensional Data Through Correlation Explanation.
    Maximally Informative Hierarchical Representations of High-Dimensional Data.


    • Emilia Wysocka (
    • Laura Condon (

    Analysis of UK parliament speeches 1935-2014 (Stefano -

    No meeting time indicated

    Mapping Complexity/Human Knowledge as a Complex Adaptive System (

    2pm/Wed 6/10 in Conference Room (Tentative)

    Navigating Music, Brain and The Edge of Chaos (

    10:45am/Wed 6/10 in Conference Room 9:00am/Thu 7/10 at the Coffee Shop

    Interested: Sahil Garg

    Analysis of rule-based modeling for dopamine-dependent synaptic plasticity (

    Modeling and analysis of the phenomena and evolution of stochastic, combinatorially complex signalling systems in both a qualitative (directed acyclic graphs, networks) and a quantitative way (time series generated for all agents/species formed/destroyed in the system).

    Rule-based modeling features:

    • biological systems as concurrent processes
    • dynamics of post-translational modifications
    • domain availability
    • competitive binding
    • causality and intrinsic structure
    • binding sites
    • interaction rules replace reaction equations
    • infinite number of reactions with a small and finite number of rules
    • reduction of parameter space
    • “don't care don't write” - adjustable rule contextualization
    • single reaction rule and parameters generalize classes of multiple rules
    • modular and extensible language
    • specification language & simulation/integration environment
    • static and causal analysis
    • Kappa/KaSim & BioNetGen/NFsim -- specification language/network-free simulator
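    A toy, highly simplified sketch of the network-free idea behind KaSim/NFsim (invented rate constant and counts): a single binding rule, roughly "A(x), B(y) -> A(x!1), B(y!1)", is applied to agent counts with Gillespie waiting times, so the full reaction network is never enumerated:

```python
import random

random.seed(42)

k = 0.01                     # binding rate constant (assumed)
free_A, free_B, bonds = 100, 80, 0
t, t_end = 0.0, 1_000.0

while free_A and free_B and t < t_end:
    propensity = k * free_A * free_B     # rule applications currently possible
    t += random.expovariate(propensity)  # Gillespie waiting time
    free_A, free_B, bonds = free_A - 1, free_B - 1, bonds + 1

print(free_A, free_B, bonds)
```

Because the rule is irreversible here, all B agents eventually bind; real rule-based models add unbinding, modification, and contextual rules, but the simulation loop keeps this same rule-plus-propensity shape.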


    • WEDNESDAY 11/06/2015

    Addressing the project in terms of aspects other than the biological questions and verification (matching results to published experiments or known behaviour). So my plan is as follows:

    • divide the group (roughly) into people who look into the biological meaning and validation, and those who try to analyse the phenomena and evolution of stochastic, combinatorially complex signalling systems in both a qualitative (directed acyclic graphs, networks) and a quantitative way (time series generated for all agents/species formed/destroyed in the system) - all possible states and scenarios of the system, abstracted from the biological meaning.

    Things that could be modelled (look in dropbox folder:

    • spontaneous flipping of interactions (phosphorylations and others) between proteins, described by the god of non-linear dynamics (S. Strogatz)

    Some refs:

    Ecology non-working group (williamkurtischang at gee-mail dot com)

    Informal ecology interest group.

    Making Supply Chains Resilient to Disasters (mh2905 att columbia dott edu)

    Key words: resilience, natural hazards, supply chains, interdependency, interconnected risks, cascading failures

    Rationale: I am interested in examining what kinds of network structures and features contribute to increasing the resilience of supply chains to natural disasters. I believe this area of work is important because regional disasters negatively impact the global economy through disruptions in supply chain networks. A pioneering study published in Nature urges the need to make supply chains climate-smart (Levermann 2014). In industry, the World Economic Forum published a report addressing this issue in 2013. Few studies, however, assess and model the impacts of adverse weather on supply chains. I would like to evaluate these impacts through modeling.
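    A minimal sketch of the cascading-failure idea on a supply network (invented four-firm topology; the "fails only when all suppliers fail" criterion is one assumption among several plausible ones, and single-sourcing would use "any"):

```python
from collections import defaultdict

# Toy supply network (assumed): supplier -> firms it feeds.
supplies = {
    "raw":       ["plant1", "plant2"],
    "plant1":    ["assembler"],
    "plant2":    ["assembler"],
    "assembler": ["retailer"],
}

suppliers = defaultdict(list)
for s, customers in supplies.items():
    for c in customers:
        suppliers[c].append(s)

def cascade(initial_failures):
    """Propagate failures: a firm fails when ALL of its suppliers have failed."""
    failed = set(initial_failures)
    changed = True
    while changed:
        changed = False
        for firm, deps in suppliers.items():
            if firm not in failed and deps and all(d in failed for d in deps):
                failed.add(firm)
                changed = True
    return failed

print(cascade({"plant1"}))            # redundant sourcing absorbs the shock
print(cascade({"plant1", "plant2"}))  # both plants down -> cascade to retailer
```

Varying the redundancy (how many suppliers each firm has) and rerunning the cascade is one simple way to probe which network structures make supply chains resilient.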

    Data set: Supply chain data of a multinational manufacturing company. Global climate data, disaster data, etc.

    Possible techniques: Agent-Based Model, Complexity science, network theory and evolution, complex adaptive systems, GIS, operations research, manufacturing engineering, and I am open to any techniques.

    Myself: As I joined the program yesterday, please find my background here.

    Please put your names below if you want to be informed about this project by email

    Interested: Masa Haraguchi

    From Ethnic Diversity to Religious Zeal: Retrospective/Predictive Construction of the World Ethnic Map (

    There is a new dataset on ethnic groups (2014) which claims to include all ethnic groups of the world. There are also several cross-national datasets on religion and other variables of interest, normally socio-economic. Coming from the social sciences, I thought it would be really interesting to apply methods and perspectives from other disciplines to social science data.

    One idea that I have in mind is to explore how ethnicity and religion interact.

    Quick empirical checks indicate that Subsaharan Africa is home to about 1,500 ethnic and subethnic groups, in comparison to about 90 ethnic groups in the Middle East and North Africa. India alone accounts for nearly 2,000 ethnic and subethnic groups, while Europe, including all the countries of the former USSR and large immigrant groups from other parts of the world, has only about 260 ethnic groups. The picture that emerges from this simple comparison is that the spread of Abrahamic religions appears to be associated with a high depletion rate of ethnic groups. The exception of China, which has a little more than 50 ethnic groups, can be explained by the country’s long history of a centralized state.

    The idea then is to assume that we have had the control group of nations that did not experience the influx of Abrahamic religions (or more precisely received limited exposure to them) and the experimental group that was exposed to the spread of Abrahamic religions. Since we know what the distribution of ethnic groups in the control group is we can project what the distribution of ethnic groups in the experimental group would be, if the group were not exposed to Abrahamic religions.
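    At its simplest, the projection amounts to transferring a rate from the control group to the experimental group; a sketch with purely hypothetical numbers (not taken from the 2014 dataset):

```python
# Hypothetical illustrative numbers, not from the actual dataset.
control = {"population_m": 1_300, "ethnic_groups": 2_000}  # unexposed region
experimental = {"population_m": 750}                       # exposed region

# Ethnic groups per million inhabitants in the control group...
rate = control["ethnic_groups"] / control["population_m"]

# ...projected onto the experimental group's population gives a
# counterfactual count absent the hypothesised depletion effect.
projected = rate * experimental["population_m"]
print(round(projected))
```

A serious version would condition on more than population (state history, geography, isolation), but the logic of the counterfactual projection stays the same in both directions, past and future.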

    Of course, we could go in the other direction too and make a projection about future – what would happen to ethnic groups if all of them adopted an Abrahamic religion.

    One of the challenges in this project is the absence of dynamical (time-series) data, i.e. we don’t have data on how these things changed over time, but I still think that projections about the future and the past would be kinda cool :)


    • William Leibzon ( - the anthropologist in me is interested :) but unfortunately the mathematician in me wants to join another group at CSSS. If this project goes forward somewhere else in the future, let me know, as the data scientist in me wouldn't mind digesting the data for patterns once I have more time.

    Nationalist vs religious rebel violence (

    I would like to propose yet another possible project. This time it’s on rebel violence. I have a large dataset on rebel (and government) violence (~1 million observations). The paper that used this data was published a couple of months ago (Islamists and Nationalists: Rebel Motivation and Counterinsurgency in Russia’s North Caucasus). The dataset contains event counts, GIS data, demographic and economic characteristics, and some other stuff. The primary aim of the paper was to discover the differences between nationalist and Islamist rebels.

    Given all the cool methods we are learning here, my first instinct is to use them (TISEAN?) to explore whether nationalist violence is substantively different from religious violence, and to try to answer why.

    One problem that I can see is that the event coding was done by a machine, so this may have “contaminated” the data. But if both the religious and nationalist data were contaminated equally, then the difference should still be evident.


    Archived Projects