Complex Systems Summer School 2015-Projects & Working Groups

From Santa Fe Institute Events Wiki

City Resilience

Summary: This group aims to develop metrics of cities' resilience to various types of disaster, validate these metrics empirically using information from recent disasters, and compare resilience across a large number of global cities.
Contact: Richard Barnes
Participants: Alex Tejedor, Laurence Brandenberger, Masa Haraguchi, Matthew Histen, Will Chang, Juan Carlos Castilla, Brent Schneeman
Wiki page: Resilience of Cities

Complex Adaptive Systems and the Narrative approach: transdisciplinary methodologies for Complexity Science

The narrative approach, and its notion of causality, differ from causality in logico-scientific approaches. Building bridges between methodologies transfers knowledge, encourages important questions and shifts philosophical paradigms. The formal frameworks of C(A)S can really be complemented by the narrative approach, and indeed should be. Narrative approaches in Complexity Science are an emerging trend in research funding, and they are well suited to processing documentary and speech-based data, revealing hidden meanings, and distinguishing causes from effects in overlapping systems of realms "at the edge" of the technological, social, economic, scientific, legal and other domains. The role of time in coding, in intuitively generated conditional rules and in computational paths is another theme. Combining C(A)S and the narrative means combining quantitative and qualitative data, and thinking with each in its original form, without conversion and the information loss it brings. Synthesis of the sciences is what it is about.

Questions: How can CS approaches (in particular CAS) be combined with narrative ones in a rule-based way? Where should we start? What visualizations can be designed and used? Other questions at your discretion.

Some references:

- Non-Equilibrium Social Science in ICT and Economics, CORDIS, EU

- Combining Complexity Theory with Narrative Research

- Haridimos Tsoukas and Mary Jo Hatch (2001), "Complex thinking, complex practice: The case for a narrative approach to organizational complexity"

- David Christian, "Big History": astrophysics, chemistry, biology, information, the emergence of life, technology

Contact: Anna

Meet on Monday over lunch

Interested: Marie Pierre, Jeroen, Jim, Melissa

Improving the design of the power grid using our knowledge about network structure

Context
Given all the renewable energy generation that is being installed and the increasing uncertainty about the future power system, power transmission expansion planning is becoming more and more challenging. A lot of literature is being published in the field, but it generally applies "blind" techniques to the design, such as optimization in which the candidate lines to add to the system are represented as binary variables. This leads to optimization problems that are too large for real networks. As part of the European Commission FP7 project e-Highway, my team and I have developed methods to reduce the complexity of the network and work with a smaller system.

I would like to explore a different avenue. Maybe it is possible to describe the structure of good network designs in terms of global parameters. For instance, what does the degree distribution of efficient power networks look like? We could then feed that information into the optimization problem, reducing the search space.
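As a first step towards describing good designs by global parameters, the degree distribution of a candidate grid can be tabulated directly from its list of lines. This is a minimal sketch, assuming the network is given as a list of (bus, bus) pairs; the six-bus example is hypothetical and not taken from the e-Highway data.

```python
from collections import Counter


def degree_distribution(lines):
    """Return {degree: fraction of buses with that degree} for an undirected grid.

    `lines` is an iterable of (bus_a, bus_b) pairs, one per transmission line.
    Parallel lines between the same two buses are counted once, self-loops are
    ignored, and buses that appear in no line are not counted.
    """
    edges = {frozenset(pair) for pair in lines if pair[0] != pair[1]}
    degree = Counter()
    for edge in edges:
        for bus in edge:
            degree[bus] += 1
    n_buses = len(degree)
    counts = Counter(degree.values())
    return {k: counts[k] / n_buses for k in sorted(counts)}


# Hypothetical 6-bus system: a ring, one cross-link (1-4) and a duplicated line (2-3).
grid = [(1, 2), (2, 3), (3, 4), (4, 5), (5, 6), (6, 1), (1, 4), (2, 3)]
print(degree_distribution(grid))
```

Comparing such distributions across known efficient and inefficient designs is one way the "global parameters" idea could be tested before constraining the optimization.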

I have data originating in a European project from the FP7 programme.

There is a lot of interest in this field: much is being published, and a lot of money is going into projects and research grants. Nobody seems to be looking at it from a structural perspective, though.

To learn more
I uploaded a PowerPoint presentation with some initial ideas to my page on the wiki.

Interested: Sara, Jean Gab, Daniel T, Sahil Garg


The 2014-15 Ebola virus disease (EVD) outbreak in West Africa presented both unique opportunities and unique challenges to the epidemiological modeling community. For the first time during an emerging infectious disease outbreak, high-resolution data from a variety of sources were made available to the academic community, and many public health decision makers genuinely engaged with mathematical and computational modelers. However, the popular and scientific press were highly critical of most models' ability to project the outbreak's course. The following key open questions seem ripe for investigation using a complex adaptive systems lens:
1) What features of EVD transmission are most problematic for reliable, robust forecasting: changing behavior, interventions, viral evolution, complex social networks, etc.?
2) Can we use digital data to improve forecasts or inform model selection, and if so, how?
3) Can one quantify the value of additional information in real time?
Contact: Samuel Scarpino, SFI Omidyar Fellow, Santa Fe Institute

Marie-Pierre Hasne
Chris Verzijl
Junming Huang
Sola Omoju
Christine Harvey
Daniel Citron
William Chang (williamkurtischang at gee-mail)

Effect of landscape topography on vegetation connectivity and navigability

Landscape topography influences the dynamics of the processes that take place on it. The evolution of ecosystem networks, river networks, vegetation cover type and microclimate cycles are all deeply linked to the local landscape topography.

Beyond the landscape, these networks also influence each other, conditioning the emergence and stability of ecosystems and, in turn, the behaviour of agents within them, such as the migration pathways of animal herds and human settlement patterns.

Connectivity patterns emerge from the interaction of these processes, and thus a better understanding and quantification of those patterns is critical to understanding the dynamics of the system.

How we want to approach the problem
In this project, we will randomly generate landscape topography and the resulting vegetation cover from a set of parameters drawn from known geological and biological processes. The generated data set will then be used to investigate the following questions:
(1) What is the connectivity of the landscape patterns that emerge at different scales, and can we quantify it using techniques such as cluster analysis, percolation theory or network theory?
(2) What is the navigability of such landscape patterns for biological agents (e.g. animals, humans, robots!)? We can compare mobility trails in existing landscapes to validate our hypotheses. We propose to use tools such as agent-based models (ABMs) to simulate and characterise mobility success.
(3) We further aim to compare the measures of navigability with the metrics of connectivity, establishing a framework of comparison.
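Question (1) can be prototyped with classic site percolation. This is a minimal sketch under strong simplifying assumptions: each cell is vegetated independently with probability p (a stand-in for cover that the project would actually derive from topography), and cells connect to their four nearest neighbours. For this independent case on a square lattice, a top-to-bottom spanning cluster becomes likely once p exceeds the site-percolation threshold of about 0.593.

```python
import random
from collections import deque


def random_landscape(n, p, seed=None):
    """n x n grid; each cell is vegetated (True) independently with probability p."""
    rng = random.Random(seed)
    return [[rng.random() < p for _ in range(n)] for _ in range(n)]


def clusters(grid):
    """Return the 4-connected clusters of vegetated cells, each as a set of (row, col)."""
    n_rows, n_cols = len(grid), len(grid[0])
    seen, found = set(), []
    for r in range(n_rows):
        for c in range(n_cols):
            if not grid[r][c] or (r, c) in seen:
                continue
            cluster, queue = set(), deque([(r, c)])
            seen.add((r, c))
            while queue:  # breadth-first flood fill
                y, x = queue.popleft()
                cluster.add((y, x))
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < n_rows and 0 <= nx < n_cols
                            and grid[ny][nx] and (ny, nx) not in seen):
                        seen.add((ny, nx))
                        queue.append((ny, nx))
            found.append(cluster)
    return found


def spans(grid):
    """True if a single cluster touches both the top and the bottom row."""
    last = len(grid) - 1
    return any(any(r == 0 for r, _ in cl) and any(r == last for r, _ in cl)
               for cl in clusters(grid))


for p in (0.4, 0.6, 0.8):
    land = random_landscape(50, p, seed=1)
    sizes = sorted((len(cl) for cl in clusters(land)), reverse=True)
    print(p, "largest cluster:", sizes[0] if sizes else 0, "spans:", spans(land))
```

The same cluster labelling could later be applied to topography-driven cover, and the cluster-size distribution compared with the mobility success measured by the ABMs in question (3).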


Meetings: (1) Wednesday, 1 pm, coffee shop

People Interested

Decision support/network analysis of a complex socio-ecosystem in rural Zimbabwe

Many communities in Africa have been surprisingly resilient in the face of a host of devastating challenges. The people of Mazvihwa Communal Area in Zimbabwe have lived through more than a century of rapid change through the colonial, liberation war, and post-colonial periods. There have been dramatic changes in public health (ranging from better control of communicable diseases after World War II, to child vaccination programs after independence, to the AIDS pandemic, especially from the mid-1990s to the end of the 2000s) and in land access and use (with repeated removals, resistance, and returns of communities to land designated for white settlement). These shifts in population distribution have interacted with rapid natural population increase (especially in the period 1950-1990), driven by high fertility and declining mortality, followed by recent decades of declining fertility and high AIDS-related mortality. Differences in religious beliefs mean that these changes are uneven across households and areas. Meanwhile, the country's economy has gone through a series of long boom-and-bust cycles, and during the 2000s it experienced inflation reaching a billion billion billion per cent.

The Muonde Trust is a Zimbabwean non-governmental organization established to help support the community in Mazvihwa to continue developing and deploying bottom-up solutions in response to these challenges. Mazvihwa has a semi-arid subtropical climate with remnant woodlands and a combination of largely subsistence agriculture and livestock production. From the point of view of most of the people in Mazvihwa, and as taken up by the community network of the Muonde Trust, the “sustainability” of their area now requires a series of linked changes in land use and investments in natural capital.

Data and Questions
The data we have on this community and ecosystem originate from an ongoing community-based participatory research project begun in the 1980s and since continued by the Muonde Trust. They include robust quantitative data on human demography, health, nutrition, agricultural practices, rainfall, land use choices, woodland dynamics, household assets, and land tenure. Our goal at SFI is to develop theoretical or simulation studies that would help us better understand the resilience and sustainability of this system and that could eventually be informed by the data. Questions we might address using complex systems methods include:

1) How do individuals and resources flow through households and communities? (Empirical data show that the composition of households changes rapidly, even though most analyses of these societies tend to assume that households are static and natural units of analysis.) It is clear that individuals strategize in various ways through households as well as within other kin, religious and clan groups. At the same time, households also have emergent properties. In contexts of rapidly shifting demography and changing resource access, are there ways that we can use network analysis to illuminate these complexities?

2) How can the community as a whole best allocate its land to agriculture, pasture, and woodland when these components interact and feed back on each other? One of the main land-use decisions facing the community is the trade-off between agricultural cultivation (which requires fencing to keep out livestock, as well as water harvesting techniques) and retaining woodland areas that have cultural value and also provide grazing space and forage for livestock (and many other economic benefits). This relationship is complex, with livestock providing benefits to agriculture (manure for fertilizer and draft power for cultivation), and vice versa (well-tended fields provide considerable feed for livestock). The community derives benefits from all these land uses, including food for subsistence from agriculture, meat and milk from livestock, and cultural values and a wide variety of benefits from woodland (including fuelwood, construction materials, a variety of foods and medicines, and improved soil characteristics). In addition, community members may sell livestock, as well as use them for bridewealth and compensation in the case of some deaths. How can this system be represented and manipulated in a model to find optimal strategies for the well-being of the system?

Possible methods
Our methodology is open to what we learn during the summer school, but some ideas include: network analysis to study the way people and resources connect and flow through the households and other components of the system; an analytical mathematical model of the interacting components of the system, e.g. coupled differential equations; cellular automata which can represent the land use category of each part of a farmer's land and underlie a decision support tool.
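To make the "coupled differential equations" idea concrete, here is a toy model in which every variable, functional form and parameter value is a hypothetical placeholder, not an estimate from the Muonde data. A cropland share A is treated as the community's policy choice; woodland biomass B on the remaining land regrows logistically and is browsed by livestock L; livestock grow on woodland forage plus crop residues; and crop output per unit of cropland rises with livestock manure. The system dB/dt = rB(1 - B/(K(1 - A))) - cLB, dL/dt = e(cB + sA)L - mL is integrated with a simple Euler scheme.

```python
def simulate(A, years=200.0, dt=0.01, r=0.5, K=1.0, c=0.8, e=0.5, s=0.3, m=0.2,
             y0=1.0, y1=0.5):
    """Euler-integrate a toy woodland/livestock system for a fixed cropland share 0 < A < 1.

    B: woodland biomass on the (1 - A) share of land left as woodland/pasture.
    L: livestock density.
    All parameters are hypothetical placeholders chosen only for illustration.
    Returns (B, L, crop_output) at the end of the run.
    """
    if not 0.0 < A < 1.0:
        raise ValueError("cropland share A must lie strictly between 0 and 1")
    B, L = 0.5 * K * (1.0 - A), 0.1
    for _ in range(int(years / dt)):
        dB = r * B * (1.0 - B / (K * (1.0 - A))) - c * L * B  # logistic regrowth minus browsing
        dL = e * (c * B + s * A) * L - m * L                   # forage + crop residues, minus offtake
        B, L = max(B + dB * dt, 0.0), max(L + dL * dt, 0.0)
    crop_output = A * (y0 + y1 * L)                            # manure raises yield per unit cropland
    return B, L, crop_output


for share in (0.2, 0.5, 0.8):
    B, L, Y = simulate(share)
    print(f"cropland {share:.1f}: woodland {B:.3f}, livestock {L:.3f}, crop output {Y:.3f}")
```

Sweeping A exposes the trade-off between crop output and the woodland and livestock stocks; how to weigh these (and woodland's cultural value) into a single well-being score is deliberately left open, since that is a question for the community rather than the model.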


1) Wednesday June 10th at 10:45 am in the Senior Commons Room
2) Thursday June 11th at 4:15 in the Senior Commons Room
3) Friday June 12th at 10:45 at SFI (specific location to be determined; Skype with collaborators from community)
4) Friday June 12th after lunch (1:30 pm) meet to discuss results of phone call earlier that morning

Interested: Sola Omoju

Mapping Complexity/Human Knowledge as a Complex Adaptive System

Ants leave pheromone trail patterns of which they are aware only in a local sense. They do not have the cognitive faculties to step back, look at the trails and grasp the ant-trail network as a totality. Also, the artifacts they leave behind are physical entities, which then provide aggregate feedback to the colony as a whole and so feed its evolution as a complex adaptive system (CAS). Humans, in contrast, do have the requisite cognitive abilities. The "pheromone trails" we leave behind are knowledge trails encoded in symbolic knowledge artifacts. Compared with the physical artifacts that ants leave behind, our knowledge artifacts are far more flexible and potent, both at the aggregate and at the individual level. But like the ants, until recently we did not have the means to step back and map our knowledge "pheromone trails" to obtain the big picture and its global and local dynamics. The burgeoning field of scientometrics is making visualization tools available to help us map and study the evolutionary dynamics of knowledge network structures.
Data and Questions
The goals of this project include

  1. Extract the terms from approximately 1600 working papers published by SFI
  2. Map the intra/inter conceptual network structures
  3. Study the evolution of these structures across time
  4. Highlight the gap-closure of knowledge reverse salients (if any)
  5. Capture any of the network patterns that repeat
  6. Study the diffusion of concepts across the network
  7. Provide visualization tools for navigating the complexity corpus, etc.

Possible methods
Latent Semantic Analysis (LSA) and Latent Dirichlet Allocation (LDA)
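Goal 1 (term extraction) usually starts with a weighted term-document representation; tf–idf (item 20 in the list) is the usual weighting fed into LSA's singular value decomposition, while LDA typically works on raw counts. This is a minimal pure-Python sketch of tf–idf term ranking, assuming a tiny illustrative stop-word list and hypothetical abstracts in place of the actual SFI working papers.

```python
import math
import re
from collections import Counter

# Illustrative only; a real pipeline would use a fuller stop-word list.
STOP_WORDS = {"the", "a", "an", "of", "and", "in", "to", "is", "for", "on", "with", "we"}


def tokenize(text):
    """Lower-case alphabetic tokens with stop words removed."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]


def tf_idf(docs):
    """One {term: weight} dict per document: raw term frequency times log(N / df).

    Terms occurring in every document get weight 0.
    """
    counts = [Counter(tokenize(d)) for d in docs]
    n_docs = len(docs)
    df = Counter(term for c in counts for term in c)
    return [{t: tf * math.log(n_docs / df[t]) for t, tf in c.items()} for c in counts]


def top_terms(docs, k=3):
    """The k highest-weighted terms per document (ties broken alphabetically)."""
    return [sorted(w, key=lambda t: (-w[t], t))[:k] for w in tf_idf(docs)]


abstracts = [  # hypothetical stand-ins for working-paper abstracts
    "scaling laws in cities and urban growth",
    "network models of cities and infrastructure",
    "evolution of cooperation in network games",
]
print(top_terms(abstracts, k=2))
```

Shared terms such as "cities" and "network" are down-weighted, which is exactly what makes them candidates for the inter-document links of goal 2 rather than per-paper descriptors.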

  1. SFI Working Papers
  2. Atlas-Science-Visualizing
  3. Atlas-Science-Visualizing WebSite
  4. Mapping-Scientific-Frontiers-Knowledge-Visualization
  5. Katy Börner presents at Science of Science
  6. Scholarly Data, Network Science, and (Google) Maps
  7. LSA Video Lecture
  8. What is LSA
  9. LSA Wiki
  10. LDA
  11. Fusari, A. (2014). Methodological Misconceptions in the Social Sciences: Rethinking Social Thought and Social Processes
  12. Kirsh, D. (2013). Thinking with external representations. In Cognition Beyond the Brain
  13. Holland, J. H. (1992). Adaptation in natural and artificial systems
  14. Where to start with text mining
  15. Singular Value Decomposition Tutorial
  16. Latent Semantic Analysis (LSA) Tutorial
  17. LSA in Detail
  18. Web Based LSA
  19. Another Technique: t-Distributed Stochastic Neighbor Embedding (t-SNE)
  20. tf–idf


  • John Thomas
  • Haitao Shang
  • Christopher Verzijl
  • Anna Zaytseva
  • Penny Mealy

Navigating Music, Brain and The Edge of Chaos

Brain science suggests that the brain lives "on the edge of chaos", exhibiting "self-organized criticality" that is delicately balanced between normalcy and madness. Over the course of history, humans have used various agents and activities to shape, influence and control this living chaos, ranging from substances such as caffeine, sugar and drugs to activities such as the arts (including music), social discourse and therapy, and meditation. Of these, music has a distinct role in shaping our moods and helping us transition between different mental states, as well as maintain them for extended periods of time. Clearly, we have been using music to help us control and shape this internal chaos. But until recently, quantitative instrumentation of this massively complex system, comprising close to 100 billion neurons networked into a 1,000-trillion-synapse edifice, has not been available to the general public. Affordable, wearable EEGs are now on the market, making the quantitative study of the influence of music on brain dynamics feasible at large scale through crowd-sourcing. To come to terms with the complexity of our 1,000-trillion-synapse edifice, we need to gather data on a vast scale. The proposed research is a proof-of-concept, exploratory foray into making this happen.

Data and Questions
The goals of this project include

  1. Evaluate and purchase a wearable EEG
  2. Set up the instrumentation for data capture
  3. Set up the experimentation/data-capture plan
  4. Recruit Subjects
  5. Perform Data Capture
  6. Analyze Results
  7. Propose pathways to take this to the market by embedding it as an app

Possible methods

References:

  1. Hacking Your Brain Waves: Wearable Meditation Headsets
  2. Measure your brainwaves and modify your mind
  3. This wearable device reads your brain waves. Is there a market for it?
  4. Your-brain-is-on-the-brink-of-chaos
  5. Two Decades of Search for Chaos in Brain
  6. How You Are Who You Are--in Chaos Theory
  7. Diana Dabby Links
  8. Diana Dabby Links
  9. Liz Bradley/Diana Dabby Links
  10. Music and the Brain
  11. Why Music Moves Us
  12. Measuring musical expressivity
  13. The World In Six Songs
  14. This-Your-Brain-Music-Obsession
  15. World-Six-Songs
  16. Computer Music and the Importance of Fractals, Chaos, and Complexity Theory
  17. The_Complexity_of_Songs
  18. Stefan-Koelsch Papers
  19. Grammar Based Music Composition

Four principles of bio-musicology, W. Tecumseh Fitch 2015
This paper is a prospectus for biomusical research. Main points:
i) music is complex (yes!),
ii) questions must be asked from a Tinbergian perspective (i.e. mechanism, ontogeny, phylogeny and function),
iii) research should be comparative across species (not relevant for us), and
iv) research should be "ecologically motivated," i.e. not just about skilled Western musicians.

Neurological implications and neuropsychological considerations on folk music and dance, Sironi V.A., Riva M.A., 2015
Calls for "Interdisciplinary research on these subjects (ethnomusicology and cultural anthropology, clinical neurology and dynamic psychology, neuroradiology and neurophysiology, and socioneurology and neuromusicology)"

History of music in 5 min


  • Sara Lumbreras
  • Ilaria b
  • Braun, Urs
  • John Thomas
  • Christopher Verzijl
  • Glenn Magerman
  • Daniel Hedblom
  • Christine Harvey
  • Vanessa Chioffi
  • Daniel Friedman
  • Sharon Greenblum

Powerlaw fitting and alternative distributions - Theory/statistics

Clauset, Shalizi and Newman (2009) propose a maximum likelihood method to estimate the powerlaw exponent of a variable of interest. This is a great improvement on earlier methods, such as ordinary least squares (OLS) regression on log-log plots, that dominated the literature up to then. However, a powerlaw can be fitted to any dataset, and the most we can say is that our observations are consistent with the hypothesis that x is drawn from a powerlaw distribution. One easily implemented way to compare the powerlaw fit with alternative fits is then a likelihood ratio test between the competing models.
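A minimal pure-Python sketch of the Clauset-Shalizi-Newman procedure for continuous data: the closed-form maximum likelihood estimate alpha_hat = 1 + n / sum(ln(x_i / x_min)) for a given x_min, and the choice of x_min that minimises the Kolmogorov-Smirnov (KS) distance between the data and the fitted model. The `min_tail` cut-off and the synthetic sample are assumptions for illustration, and the likelihood ratio comparison against a log-normal is not included; for real analyses an established implementation would be preferable.

```python
import math
import random


def fit_alpha(tail, xmin):
    """Continuous power-law MLE for alpha, given the observations x >= xmin."""
    log_sum = sum(math.log(x / xmin) for x in tail)
    return 1.0 + len(tail) / log_sum


def ks_distance(tail, xmin, alpha):
    """Largest gap between the empirical CDF of the sorted tail and the power-law CDF."""
    n = len(tail)
    d = 0.0
    for i, x in enumerate(tail):
        model = 1.0 - (x / xmin) ** (1.0 - alpha)
        d = max(d, abs((i + 1) / n - model), abs(i / n - model))
    return d


def fit_powerlaw(data, min_tail=50):
    """Try each observed value as xmin; return (xmin, alpha, D) with the smallest KS distance D.

    Returns None if fewer than `min_tail` observations are available.
    """
    xs = sorted(data)
    best = None
    for i in range(len(xs) - min_tail + 1):
        if i > 0 and xs[i] == xs[i - 1]:
            continue  # duplicate value: this xmin was already tried with the full tail
        xmin, tail = xs[i], xs[i:]
        if tail[-1] == xmin:
            break  # all remaining values equal xmin, so the MLE is undefined
        alpha = fit_alpha(tail, xmin)
        d = ks_distance(tail, xmin, alpha)
        if best is None or d < best[2]:
            best = (xmin, alpha, d)
    return best


# Synthetic check: 1,000 draws from a pure power law, alpha = 2.5, xmin = 1 (inverse-CDF sampling).
rng = random.Random(42)
sample = [(1.0 - rng.random()) ** (-1.0 / 1.5) for _ in range(1000)]
result = fit_powerlaw(sample, min_tail=100)
print(result)
```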

A particular point of discussion is the distinction between a powerlaw (PL) and a log-normal (LN) fit. For a lively debate between the two fits for city sizes and Zipf's law, see Eeckhout (2004, 2009) and Levy (2009); the discussion has since settled on city sizes following a log-normal rather than a powerlaw distribution. There is a similar discussion on the firm size distribution: Simon and Bonini, 1958; Ijiri and Simon, 1977; Stanley et al., 1995; Sutton, 1997; Axtell, 2001; Okuyama et al., 1999; Cabral and Mata, 2003; Gaffeo et al., 2003; Aoyama et al., 2004; Fujiwara et al., 2004a,b; Kaizoji et al., 2006; Takayasu et al., 2008; Duchin and Levy, 2008; Schwarzkopf and Farmer, 2008, ...

This distinction matters for several reasons:
- PL and LN arise from very similar generative models, and small differences in initial conditions can lead to very different outcomes. PL requires a lower cut-off x_min, below which units of observation are not feasible (e.g. minimum city size, firm size, word length, ...). LN has no minimum size.
- Even when PL and LN look similar, they differ in variance: a PL with exponent α ≤ 3 has infinite variance, while LN has a large but finite variance.
- Shock propagation: if the distribution is a powerlaw with infinite variance, unit-level shocks can be large enough to produce aggregate fluctuations, while such shocks wash out quickly when the distribution is log-normal (Gabaix, 1999; Gabaix 2009; Acemoglu et al. 2012, ...).
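The variance point can be made precise with the moments of a continuous power law with lower cut-off x_min. The m-th moment is finite only when α > m + 1, so the mean diverges for α ≤ 2 and the variance for α ≤ 3, whereas every moment of a log-normal (with μ and σ the mean and standard deviation of ln x) is finite:

```latex
p(x) = \frac{\alpha - 1}{x_{\min}} \left( \frac{x}{x_{\min}} \right)^{-\alpha},
  \qquad x \ge x_{\min},\; \alpha > 1

\langle x^{m} \rangle = \int_{x_{\min}}^{\infty} x^{m}\, p(x)\, dx
  = \frac{\alpha - 1}{\alpha - 1 - m}\, x_{\min}^{m},
  \qquad \alpha > m + 1

\operatorname{Var}_{\mathrm{LN}}(x) = \left( e^{\sigma^{2}} - 1 \right) e^{2\mu + \sigma^{2}} < \infty
```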

I have encountered some issues that I would like to explore further:
1. The distinction between log-normal and powerlaw in the data is very sensitive to data truncation: in the above discussions, researchers use slightly different datasets, covering more or less of the population at hand. Left-truncation (i.e. observations missing from the dataset because they are too small to be reported) can strongly drive the outcome of the fit, even when the x_min cut-off is endogenized. I have data on the universe of Belgian firms, much more complete than e.g. US Census data, on which I have done some preliminary tests. The question is then: how can this distinction be formalised, and what are the theoretical and practical caveats to look out for when applying this method?
2. MLE fitting seems to be sensitive to the choice of units as well: rescaling a variable by a factor of 1,000, 1,000,000 etc. seems to influence the endogenous x_min choice and hence the estimated parameter. This reminds me of some work on scale invariance in negative binomial estimators. What is going on here?
3. Can we set up a model that generalises both? I have been looking at Lévy stable distributions, but have not done anything with them yet.
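For issue 2, a quick diagnostic: the continuous MLE depends on the data only through the ratios x / x_min, so rescaling x and x_min together leaves the estimate unchanged. If rescaling moves the result, one candidate explanation worth checking (an assumption, not a diagnosis of the actual Belgian data or software) is that the data are rounded or treated as discrete, since the amount of rounding relative to x_min then depends on the unit. The sketch below uses synthetic power-law "firm sizes" with α = 2.2.

```python
import math
import random


def mle_alpha(data, xmin):
    """Continuous power-law MLE for alpha, using only observations >= xmin."""
    tail = [x for x in data if x >= xmin]
    return 1.0 + len(tail) / sum(math.log(x / xmin) for x in tail)


rng = random.Random(7)
# Synthetic "firm sizes": pure power law with alpha = 2.2 and xmin = 1 (inverse-CDF sampling).
firms = [(1.0 - rng.random()) ** (-1.0 / 1.2) for _ in range(5000)]

# 1) Continuous data: rescale x and xmin together; only x / xmin enters, so alpha is unchanged.
for scale in (1, 1_000, 1_000_000):
    print(scale, round(mle_alpha([scale * x for x in firms], scale * 2.0), 6))

# 2) Rounded data (sizes recorded in whole units): the same cut-off, expressed in two
#    different units, now gives different estimates because the rounding differs.
coarse = mle_alpha([round(x) for x in firms], 2)
fine = mle_alpha([round(1_000 * x) for x in firms], 2_000)
print(round(coarse, 3), round(fine, 3))
```

If the units effect in the real data disappears once the variable is treated consistently as continuous (or fitted with a proper discrete estimator), rounding is the likely culprit; if not, the x_min search itself deserves a closer look.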

Interested people:

  • Corbain
  • Laura
  • Binyang
  • Sahil
  • Tirtha
  • Glenn
  • Junming

Literature: Power-law distributions in empirical data - Clauset, Shalizi and Newman (2009)

Organ Transplant Analysis

Over 120,000 people in the United States are currently on the waiting list for an organ transplant. The waiting list is associated with over 6,000 deaths a year among people waiting for a transplant and with tens of billions of dollars in government spending. I have access to the following data sets:

  • All transplants performed in the US from October 1987 to June 2014 (including follow-up data)
  • All living and deceased organ donors in the same time period (with follow-up data on living donors)
  • Waiting list data for everyone who signed up for the list
  • 2012 National Survey on Attitudes and Behaviors on Organ Donation
  • Social media data relating to organ donation since 2008

I am open to ideas and suggestions for the topic; there are many interesting questions to investigate, including cultural, racial and gender differences in organ donation. I have several preliminary reports and exploratory analyses on differences among donors.

Second Meeting Thursday, June 11th 4:15 in the coffee shop!

First Meeting Notes


  • Look into the differences in waiting time geographically and map how it changes over time.
  • Differences in cities compared to suburban areas.
  • What are the critical links that allow matches to be made for multiple people?
  • Look at scaling laws in the data: as the number of donors grows, how does the number of successful transplants grow? This includes the correlation between wait time and organ donation or the size of the waiting list.
  • Investigate trends and dynamics; in particular, review the following as functions of time:
    • Entry rate to the waiting list system
    • Exit rate from the waiting list (transplant, death, other)
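For the scaling-law bullet, a common first pass is to fit Y ≈ a·N^β by least squares on log-transformed counts and read off the exponent β (β > 1 superlinear, β < 1 sublinear). This sketch assumes counts aggregated per region (for example per state or transplant center over a fixed period); the numbers are hypothetical, constructed to follow an exact power law so the fit can be checked.

```python
import math


def scaling_exponent(n_donors, n_transplants):
    """Least-squares fit of log(Y) = log(a) + beta * log(N); returns (beta, a).

    beta > 1: transplants grow more than proportionally with donors;
    beta < 1: less than proportionally; beta = 1: proportional.
    """
    xs = [math.log(n) for n in n_donors]
    ys = [math.log(y) for y in n_transplants]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    beta = sxy / sxx
    a = math.exp(my - beta * mx)
    return beta, a


# Hypothetical per-region counts, built to follow Y = 2 * N**0.9 exactly.
donors = [50, 120, 300, 800, 2000]
transplants = [2 * n ** 0.9 for n in donors]
print(scaling_exponent(donors, transplants))
```

With real, noisy data the residuals around the fitted line matter as much as β itself, since regions far above or below the line are natural candidates for the geographic comparisons in the first bullets.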

There is a Google Doc with space for notes. Please email Christine Harvey for access.

Multi-dimensional social networks in the evolution, development and resilience of informal economies.

Informal economies are defined as economic activities that occur outside the purview of corporate public and private institutions. These types of economies proliferate where traditional economic actors are unable to exercise their activities productively, especially because of costly constraints (De Soto 2003). Firms may face adverse incentives to expand production by hiring more workers or incorporating more capital, because of the low productivity of their workers or the predatory practices of rapacious elites or corrupt governments. Labor itself, i.e. people, may face hurdles, owing to lack of access to capital (whether human or financial), in making the jump to entrepreneurial activities or to jobs in the formal economy that allow the accumulation of experience, retirement savings, and access to insurance or precautionary savings to face unexpected events (such as disease or disability).

For this project, we propose a different interpretation: we seek to understand and model how multi-faceted social networks make informal economies robust alternatives to formal ones. Far from being degenerate forms of social organization, informal economies in many instances co-exist, challenge and compete with formal employment and economic activities. While some types of informal economies operate under adverse contexts, their resiliency may be understood as a type of adaptive fitness and not the mere result of stubborn cultural path-dependency.

For the most part, informal economies, characterized by low levels of trust, of respect for property rights, and of access to credit and to public institutions such as the court system, have been analysed as a counterpoint to an ideal type of formal economy (see Losby et al. 2002 for a review). However, informal economies have been coupled with their formal counterparts since the development of capitalism in the 19th century. To name a famous historical example, the informal sector of Old London's East End was described harshly by Engels (1844) in his tract "The Condition of the Working Class in England". But almost fifty years later, in a new preface to the book, Engels recognized the progress that had taken place there under the aegis of working-class organizations in the area.

Research has begun to catch on to this idea, developing a literature on informal risk sharing and remittances, which are in some sense informal counterparts to insurance and banking. Consumption patterns in poor, rural villages are remarkably smooth, suggesting that risk-sharing arrangements are prevalent, although imperfect (Townsend 1994). Field studies of the networks underlying village risk-sharing systems have found that households primarily receive help from existing social connections, such as friends and relatives, in the form of informal loans or transfers. Theoretical work in network economics, by authors such as Matt Jackson, has found that certain empirically prevalent network structures may directly benefit the stability of favor exchange systems. Another line of inquiry focuses on remittance income, generally informal transfers across kinship ties. World Bank researchers have found that remittances from overseas migrants respond dramatically to regional income shocks, replacing upwards of 60% of lost income in households with international migrants. Further studies, cited below, have generalized those results. Moreover, reductions in the cost of sending remittances, such as those brought about by mobile money, further mitigate exposure to risk in receiving households.

Authors have also recently written on how communities come together in situations where formal economies are disrupted by external shocks (such as disasters) or are inaccessible (due to conditions of abject poverty). In the case of the latter, Venkatesh (2008) provided gripping testimonies of how informal economies arose and developed in the South Side of Chicago in low-trust environments, via cash transactions and informal service contracts enforced by (and for the benefit of) local criminal gangs. These gangs were seen, surprisingly, as alternative coordinating mechanisms to settle claims between neighbors. In the case of the former, Storr and Grube (2013), in a series of papers, have argued that "shared histories and perspectives, and the stability of social networks within the community" allow communities that have suffered disasters to cope with and endogenously resolve immediate and complex problems. As an example of these social networks in action, research has found that remittances flow quickly into areas affected by natural disasters when there is a technology in place to make this happen.

Taking these lines of inquiry together, the findings suggest that informal handling of risk takes place on a large scale and that social networks informally connect communities in ways that impact their economies. Recent trends suggest that the interaction of formal and informal economic processes is growing on a nationwide scale. First, the aggregate value of remittance flows is large and growing, nearing the value of foreign direct investment. Second, advancements that formalize previously informal transactions are expanding dramatically in emerging economies through various forms of branchless banking. National economies may be embedded in profoundly influential informal systems that have never been holistically studied.

On a more general note, this project touches on lingering questions in economics. For example, some might argue that more stringent labor regulations should have increased informal employment, but in fact the opposite happened. While the economic rationale for that argument remains valid, we suggest that unions, like other social organizations through which people interact via organized networks, provide richer dimensions of interaction than those suggested by their role as mere economic agents. Hence unions, as well as faith-based, ethnic, community and other interest groups, may provide modes of interaction that escape narrow economic outcomes.


  • De Soto, Hernando (2000/2003) The Mystery of Capital. Basic Books.
  • Losby, Jan; Else, John; Kingslow, Marcia; Edgcomb, Elaine; Malm, Erika and Kao, Vivian (2002) The Informal Economy: A Literature Review. ISED Consulting and Research and the Aspen Institute Working Paper.
  • Townsend, Robert M. (1994) "Risk and Insurance in Village India."
  • Fafchamps, Marcel and Lund, Susan (2003) "Risk-sharing Networks in Rural Philippines."
  • De Weerdt, Joachim and Dercon, Stefan (2006) "Risk-sharing Networks and Insurance against Illness."
  • Jackson, Matthew O.; Rodriguez-Barraquer, Tomas and Tan, Xu (2012) "Social Capital and Social Quilts: Network Patterns of Favor Exchange."
  • Yang, D. and Choi, H. (2007) "Are Remittances Insurance? Evidence from Rainfall Shocks in the Philippines."
  • Kurosaki, Takashi (2006) "Consumption Vulnerability to Risk in Rural Pakistan."
  • Jack, W. and Suri, T. (2014) "Risk Sharing and Transactions Costs: Evidence from Kenya's Mobile Money Revolution."
  • Engels, Friedrich (1844/1892) The Condition of the Working Class in England.
  • Venkatesh, Sudhir (2008) Gang Leader for a Day: A Rogue Sociologist Takes to the Streets. Penguin Books.
  • Storr, Virgil and Grube, Laura (2013) The Capacity for Self-Governance and Post-Disaster Resiliency. George Mason University, Department of Economics Working Paper 13-37.
  • Blumenstock, Joshua Evan; Fafchamps, Marcel and Eagle, Nathan (2011) "Risk and Reciprocity Over the Mobile Phone Network: Evidence from Rwanda."
  • Ratha, Dilip (2005) "Workers' Remittances: An Important and Stable Source of External Development Finance."
  • Pénicaud, Claire and Katakam, Arunjay (2013) State of the Industry 2013: Mobile Financial Services for the Unbanked. GSMA MMU.

If interested, please list your name below.


  • Eloy Fisher
  • Carolina Mattsson
  • Sharon Greenblum
  • Jakub Rojcek

Exploring Community Formation through Analysis of Scholarly Corpus (ArXiv)

    The Arxiv [1] is a free online repository of scientific preprint articles, mostly from physics and mathematics. It currently contains over 800,000 articles, dating back to 1991. Currently, a lot of the data associated with this repository is completely available: paper submission dates; full texts of papers; author names and coauthorship information. Additionally, there is some citations data available. (If necessary, I can also find papers' subdiscipline labels; submitting authors' email address domain names; other things.)

    In the past I have used these data to explore how communities of authors with shared interests grow over time. For example, we can pinpoint the first-ever paper on topological insulators and search for all subsequent papers on that topic. As more papers are written, authors join the field and form strong ties by collaborating with one another. We can use the arXiv data to visualize and analyze how these communities form and grow.
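One simple way to quantify "how these communities form and grow" is to add coauthorship ties year by year and track the connected components of the resulting graph. The sketch below does this with a union-find structure. It assumes the papers on a topic have already been collected as (year, author list) pairs; treating connected components as "communities" is our simplifying assumption (proper community-detection methods would split large components further), and the author names in the example are hypothetical.

```python
from collections import defaultdict


class UnionFind:
    """Disjoint-set structure tracking connected components of the coauthorship graph."""

    def __init__(self):
        self.parent = {}
        self.size = {}

    def add(self, x):
        if x not in self.parent:
            self.parent[x] = x
            self.size[x] = 1

    def find(self, x):
        # Path halving keeps the trees shallow.
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return
        if self.size[ra] < self.size[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra
        self.size[ra] += self.size[rb]


def community_growth(papers):
    """papers: iterable of (year, [author, ...]) for one topic.
    Returns {year: (n_authors, n_components, largest_component_size)} after each year."""
    by_year = defaultdict(list)
    for year, authors in papers:
        by_year[year].append(authors)

    uf = UnionFind()
    history = {}
    for year in sorted(by_year):
        for authors in by_year[year]:
            for a in authors:
                uf.add(a)
            # Coauthoring a paper links all of its authors into one component.
            for a in authors[1:]:
                uf.union(authors[0], a)
        authors_so_far = list(uf.parent)
        roots = {uf.find(a) for a in authors_so_far}
        history[year] = (len(authors_so_far), len(roots), max(uf.size[r] for r in roots))
    return history
```

For example, with the hypothetical papers `[(2007, ["A", "B"]), (2008, ["C", "D"]), (2008, ["A", "E"]), (2009, ["B", "C"])]`, two separate groups exist after 2008 and merge into one in 2009, which is the kind of merger the brainstorming notes ask about.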

    Brainstorming notes

    Isolation between topics, crossing interdisciplinary boundaries, finding (topological) separation distance between disciplines or research groups

    Tipping points: can we detect the separation or merger of two groups or disciplines?

    Can we identify key papers (or groups of papers) that initiate ties between fields?

    Sentiment analysis: can we identify when a paper's citation is an endorsement or a refutation?

    Tools for Analysis: Change point detection; relational event models
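As a starting point for change point detection, the simplest offline method looks for a single shift in the mean of a series, for instance the monthly number of papers on a topic, by choosing the split that minimizes the total within-segment squared error. This is a minimal sketch of that one technique only; detecting several change points (e.g. binary segmentation or PELT) or fitting relational event models would need more machinery. The example counts are made up.

```python
def single_change_point(series):
    """Return the index k (1 <= k < len(series)) that best splits `series` into two
    segments with different means, i.e. minimizes the total within-segment squared error."""
    n = len(series)
    if n < 2:
        raise ValueError("need at least two observations")

    def sse(segment):
        # Sum of squared deviations from the segment mean.
        mean = sum(segment) / len(segment)
        return sum((x - mean) ** 2 for x in segment)

    best_k, best_cost = None, float("inf")
    for k in range(1, n):
        cost = sse(series[:k]) + sse(series[k:])
        if cost < best_cost:
            best_k, best_cost = k, cost
    return best_k


# Hypothetical monthly paper counts for one topic: activity jumps after month 4.
print(single_change_point([2, 3, 2, 3, 9, 11, 10, 12]))
```

The returned index is the first month of the new regime; the brute-force scan is O(n²), which is fine for monthly series spanning a few decades.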

    Can we find more comprehensive/better citation data?

    Lucene and indexing tools:
    • Can we re-index our text database after removing stop words?
    • Can we index the titles and abstracts?
    • Elasticsearch: a tool for interacting with Lucene

    This set of text-processing tutorials is pretty handy for background info and inspiration. The software package doesn't simply plug into a Lucene/Solr/Elasticsearch index, but it could be done. This package from Stanford seems very capable.
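To make the stop-word and indexing questions concrete, here is a toy inverted index in plain Python: tokens are lower-cased, stop words are dropped before indexing, and a query returns the documents containing all of its remaining terms. Lucene and Elasticsearch do this internally in a far more sophisticated way (analyzers, scoring, on-disk segments); this is not their API. The stop list and document IDs are hypothetical.

```python
import re
from collections import defaultdict

# A tiny illustrative stop list (assumption); real pipelines use a fuller list,
# e.g. the one configured in the indexing tool's analyzer.
STOP_WORDS = {"a", "an", "and", "for", "in", "of", "on", "the", "to", "with"}


def tokenize(text):
    """Lower-case, split on non-alphanumeric characters, and drop stop words."""
    return [t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOP_WORDS]


def build_inverted_index(docs):
    """docs: {doc_id: text}. Returns {term: sorted list of doc_ids containing it}."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


def search_all(index, query):
    """Return doc_ids containing every non-stop-word term of the query (boolean AND)."""
    terms = tokenize(query)
    if not terms:
        return []
    result = set(index.get(terms[0], []))
    for term in terms[1:]:
        result &= set(index.get(term, []))
    return sorted(result)


docs = {
    "p1": "Topological insulators in the presence of disorder",
    "p2": "A survey of topological phases",
    "p3": "Disorder and localization",
}
index = build_inverted_index(docs)
print(search_all(index, "the topological insulators"))
```

Indexing titles and abstracts separately would simply mean building one such index per field.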

    Next meeting

    Thursday @ 9AM in Coffee Shop

    Political Speech

    Some references:

    Taddy (2013), "Multinomial Inverse Regression for Text Analysis"

    Blei and Lafferty (2007), "A Correlated Topic Model of Science"

    Genomic variations from a chaotic mapping

    Following Dabby's "Musical variations from a chaotic mapping" (Chaos 6(2), 1996), this project will explore genomic variations in key metabolic genes that are known to be widespread across bacterial phylogeny (e.g. the Nif cluster, which is used in nitrogen fixation). There are several ways that genomic data can be treated as "music":

    1) Nucleotide: 4 notes - A, G, C, T
    2) Amino Acids: 20 notes
    3) Codon triplets: 64 notes
    4) Tetranucleotides: 256 notes.
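The nucleotide-based alphabets above all come from reading the sequence in words of length k, giving 4^k possible "notes" (4, 64 and 256 for k = 1, 3, 4). A minimal sketch of that encoding follows; how note indices would map to pitches is left open, and amino-acid notes would additionally need a codon translation table, which is not shown.

```python
from itertools import product

NUCLEOTIDES = "ACGT"


def kmer_alphabet(k):
    """All 4**k DNA words of length k in lexicographic order: 4 (k=1), 64 (k=3), 256 (k=4)."""
    return ["".join(p) for p in product(NUCLEOTIDES, repeat=k)]


def sequence_to_notes(seq, k, step=None):
    """Map a DNA sequence to a list of note indices in range(4**k).

    step=k (the default) reads non-overlapping words, e.g. in-frame codons for k=3;
    step=1 slides a window one base at a time (overlapping k-mers).
    Characters other than A, C, G, T (e.g. N) raise a KeyError in this sketch."""
    step = k if step is None else step
    alphabet = {word: i for i, word in enumerate(kmer_alphabet(k))}
    seq = seq.upper()
    return [alphabet[seq[i:i + k]] for i in range(0, len(seq) - k + 1, step)]


print(sequence_to_notes("ATGGCC", 3))  # codons ATG, GCC
```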

    One possible goal is to take a seed sequence from some arbitrary organism and generate new variations, then use homology search (e.g. BLAST) or phylogenetics to determine whether each variation exists in nature. If it does not, we could perhaps model the potential protein folding (if that is feasible) to determine whether the variation could exist in nature. This is VERY preliminary and can go in many directions.
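Generating variations from a seed sequence can be sketched in the spirit of Dabby's mapping, as we read that paper: pair each symbol of the seed with the x-coordinate of a reference Lorenz trajectory, then run a second trajectory from slightly different initial conditions and, for each of its x-values, emit the symbol whose reference x is the smallest value at or above it. The Lorenz parameters, step size, initial conditions and the fallback rule (when no reference x is large enough) are our assumptions, not taken from the project. The seed below is a hypothetical sequence.

```python
def lorenz_rhs(state, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """Right-hand side of the Lorenz equations (standard chaotic parameter values assumed)."""
    x, y, z = state
    return (sigma * (y - x), x * (rho - z) - y, x * y - beta * z)


def lorenz_x_series(initial, n, dt=0.01):
    """Integrate with fixed-step 4th-order Runge-Kutta; return the x-coordinate after each of n steps."""
    state = tuple(initial)
    xs = []
    for _ in range(n):
        k1 = lorenz_rhs(state)
        k2 = lorenz_rhs(tuple(s + 0.5 * dt * k for s, k in zip(state, k1)))
        k3 = lorenz_rhs(tuple(s + 0.5 * dt * k for s, k in zip(state, k2)))
        k4 = lorenz_rhs(tuple(s + dt * k for s, k in zip(state, k3)))
        state = tuple(s + dt / 6.0 * (a + 2 * b + 2 * c + d)
                      for s, a, b, c, d in zip(state, k1, k2, k3, k4))
        xs.append(state[0])
    return xs


def chaotic_variation(seed, ref_initial=(1.0, 1.0, 1.0), new_initial=(0.999, 1.0, 1.0), dt=0.01):
    """Return a variation of `seed` (a sequence of symbols) as a list of symbols.

    Symbol i of the seed is paired with x_i of the reference trajectory. For every x'_j of the
    trajectory started from `new_initial`, the symbol paired with the smallest x_i >= x'_j is
    emitted; if no x_i is that large, the symbol paired with the largest x_i is used (assumption)."""
    n = len(seed)
    ref_x = lorenz_x_series(ref_initial, n, dt)
    new_x = lorenz_x_series(new_initial, n, dt)
    variation = []
    for xp in new_x:
        above = [i for i in range(n) if ref_x[i] >= xp]
        if above:
            i = min(above, key=lambda j: ref_x[j])
        else:
            i = max(range(n), key=lambda j: ref_x[j])
        variation.append(seed[i])
    return variation


seed = "ATGGCCATTGTAATGGGCCGCTGA"  # hypothetical seed sequence
print("".join(chaotic_variation(seed)))
```

With identical initial conditions the mapping reproduces the seed exactly; moving `new_initial` further from `ref_initial` is the knob that controls how far a variation departs from it. The resulting sequences would then go to BLAST or a phylogenetic comparison.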

    If interested, please list your name below or contact Jarrod at



    Preliminary Discussion Meetings

    Free Energy Theory

    The Free Energy (FE) minimisation framework tries to explain how biological systems (such as a cell or a brain) self-organise in order to occupy the (often very limited number of) non-equilibrium states that minimise free energy. This is also known as active inference. A simple corollary of active inference is that agents behave so as to minimise their prediction error, i.e. the difference between prediction and reality. Thermodynamic free energy is a measure of the energy available in a system to do useful work. This can be framed in an information-theoretic setting, as the difference between how the world is represented and how it actually is. A better fit means a lower information-theoretic free energy, as more resources are being put to 'good use' in representing the world. The overarching logic of FE theory is that a better model of the world helps maintain structure and organisation, which ultimately helps the system resist increases in entropy.
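The information-theoretic free energy mentioned above is usually written, for beliefs q(s) over hidden states s and an observation o, as F = Σ_s q(s) [ln q(s) − ln p(o, s)] = KL[q(s) ‖ p(s | o)] − ln p(o). The KL term measures the mismatch between the system's representation q and the true posterior, so F is an upper bound on surprise −ln p(o) and touches it when the representation is exact. The toy two-state example below checks this numerically; the prior and likelihood values are made up for illustration.

```python
import math


def free_energy(q, prior, likelihood):
    """Variational free energy F = sum_s q(s) * (ln q(s) - ln p(o, s)) for one observation o.

    q          -- the system's beliefs over hidden states s (list of probabilities)
    prior      -- p(s)
    likelihood -- p(o | s) for the observation actually received
    Assumes p(o, s) > 0 wherever q(s) > 0 (otherwise F is infinite and log raises)."""
    return sum(qs * (math.log(qs) - math.log(ps * ls))
               for qs, ps, ls in zip(q, prior, likelihood) if qs > 0)


def surprise(prior, likelihood):
    """-ln p(o), where p(o) = sum_s p(o | s) p(s)."""
    return -math.log(sum(ps * ls for ps, ls in zip(prior, likelihood)))


def posterior(prior, likelihood):
    """Exact Bayesian posterior p(s | o)."""
    joint = [ps * ls for ps, ls in zip(prior, likelihood)]
    total = sum(joint)
    return [j / total for j in joint]


prior = [0.5, 0.5]        # hypothetical p(s)
likelihood = [0.9, 0.2]   # hypothetical p(o | s)
print(surprise(prior, likelihood))
print(free_energy([0.5, 0.5], prior, likelihood))                  # poor model: larger F
print(free_energy(posterior(prior, likelihood), prior, likelihood))  # exact model: F = surprise
```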

    This is not a set-in-stone project with any concrete aim (yet). We are a few people interested in exploring the theoretical and practical implications of these ideas, and you're more than welcome to join in!


    • Jelle Bruineberg ( )
    • Tobias Morville (
    • Susanne Pettersson (
    • Maggie Simon (
    • Sahil (
    • Tirtha (
    • Anna (
    • Matt O (
    • Will Chang (williamkurtischang at gmail dot com)


    Improving the design of the power grid using our knowledge about network structure

    Context: Given all the renewable energy generation being installed and the increasing uncertainty about the future power system, power transmission expansion planning is becoming more and more challenging. A lot of literature is being published in the field, but it always applies "blind" techniques to the design, such as optimization in which the candidate lines to add to the system are represented as binary variables. This leads to optimization problems that are too large for real networks. As part of the European Commission FP7 project e-Highway, my team and I have developed methods to reduce the complexity of the network and work with a smaller system.

    I would like to explore a different avenue. Maybe it is possible to describe the structure of good network designs in terms of global parameters. For instance, what does the degree distribution look like for efficient power networks? We could then feed that information into the optimization problem, reducing the search space.
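A first step toward characterizing "good" designs by a global parameter is to compute the degree distribution of each candidate grid from its list of lines, and then compare efficient and inefficient designs. The sketch below treats the grid as a simple undirected graph (parallel circuits merged, self-loops ignored), which is our simplifying assumption; the bus numbers are hypothetical.

```python
from collections import Counter


def degree_distribution(edges):
    """Return {k: fraction of buses with degree k} for an undirected list of lines (bus_a, bus_b).

    Parallel lines between the same pair of buses are counted once; self-loops are ignored."""
    neighbours = {}
    for a, b in edges:
        if a == b:
            continue
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    counts = Counter(len(nbrs) for nbrs in neighbours.values())
    n = len(neighbours)
    return {k: counts[k] / n for k in sorted(counts)}


# Hypothetical 5-bus grid: a ring 1-2-3-4 with a chord 2-4 and a radial bus 5 on bus 4.
lines = [(1, 2), (2, 3), (3, 4), (4, 1), (2, 4), (4, 5)]
print(degree_distribution(lines))
```

If efficient designs turned out to share a characteristic distribution, it could enter the expansion-planning problem as extra constraints (for example, a bound on how many buses may exceed a given degree), shrinking the space of binary line variables.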

    I have data originating in a European project from the FP7 programme.

    There is lots of interest in this field: a lot is being published, lots of money is going into projects, and there are many research grants. Nobody seems to be looking at it from a structural perspective, though.

    To learn more:
    I uploaded a ppt with some initial ideas to my page on the wiki.

    Interested: 'Sara ' 'Carolina' 'Alice' 'Federico' 'Jean Gab' 'Sola' 'Ilaria' 'Daniel T'

    Decision support/network analysis of a complex socio-ecosystem in rural Zimbabwe (Melissa -

    10:45am in the senior common room (the room behind our lecture hall)

    Ebola virus disease spread (Junming -

    11:00am in the coffee shop

    Comparison of Network vs Scaling Theory based Models in Ecology (Cobain -

    One of the biggest difficulties ecologists face is understanding ecosystem dynamics from very little biological information relative to the size of the system: observational data are logistically difficult to collect, and often very expensive and time-consuming to acquire. Two modelling approaches have been used to overcome this problem.

    The first is a network-based approach, in which nodes represent the biomass density of a particular species (or of a higher-level taxonomic group and/or resource) and edges represent trophic interactions (who eats whom). This method depends on what we know about the trophic behaviour of the species involved (which is often very limited, and there can be many species to parameterise), and it represents only the central tendency of what could be very diverse behaviour within and between populations.

    The second approach uses scaling theory to describe average trophic interactions and other biological processes from individual body size along the size continuum, viewing the community as a strongly size-structured dynamical system. This requires less specific knowledge about the organisms in the community per se; however, it again represents only the central tendency, within a given size class, of what could be very diverse behaviour. Functional differences can potentially be introduced (coupled benthic-pelagic systems, for example), but as specificity increases, the predictability of the well-known allometric scaling laws with size breaks down.
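The contrast in parameter needs can be made concrete: in the scaling approach a rate for every size class follows from body mass M through a power law r = a·M^b with only two shared parameters, whereas the network approach needs a separately specified value per species. The sketch below assumes the exponent b = −1/4 commonly used for mass-specific metabolic rate and a placeholder normalization a = 1; both would have to be fitted (e.g. per taxon and temperature), and the species masses in the example are hypothetical.

```python
def allometric_rate(mass, a=1.0, b=-0.25):
    """Mass-specific rate r = a * M**b.

    b = -1/4 is the commonly used exponent for mass-specific metabolic rate (assumption here);
    `a` is a placeholder normalization constant."""
    if mass <= 0:
        raise ValueError("body mass must be positive")
    return a * mass ** b


def scaled_rates(body_masses, a=1.0, b=-0.25):
    """Scaling approach: each size class's rate follows from its body mass plus two shared
    parameters (a, b), instead of one hand-specified parameter per species."""
    return {name: allometric_rate(m, a, b) for name, m in body_masses.items()}


# Hypothetical size classes (arbitrary mass units): a 16-fold larger body
# has half the mass-specific rate when b = -1/4.
print(scaled_rates({"small": 1.0, "large": 16.0}))
```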

    There has been little to no comparison of these two ways of modelling ecological systems (at least to my knowledge). The question therefore arises: given the same limited starting information, how well does each approach model the dynamics of an ecosystem? Is one approach better than the other at capturing energy flow through the system, community structure, stability, etc.? If the models differ, this will help ecologists choose which to use depending on the questions they are trying to answer.

    The project proposed here therefore aims to compare these two methods. The community used to seed an experimental mesocosm setup (mesocosms are large tank set-ups designed to reflect the complexity of natural ecosystems while remaining relatively simple and artificially controllable) will be input into two models, one for each approach, which will then be run to steady state. The model outputs will be compared with each other and with the mesocosm community at steady state. Further work may examine perturbation dynamics, depending on what data we have available.

    Interested: Name / Email

    Dynamics of homicide (Matthew Ingram -

    Integrating temporal, spatial, and multi-level concepts

    1:30pm in the coffee shop

    Interested in the discussion on this project: Sola Omoju

    Interested - Nilton Cardoso

    Organ Transplant (Christine -

    2pm in the coffee shop

    Interested: Sahil Garg

    Multi-dimensional social networks in the evolution, development and resilience of informal economies

    9:00am in the lecture hall!!

    Eloy & Carolina -,

    Multiplex Adaptive Networks (Daniel -

    7 pm in the coffee shop

    Modeling brain diseases (or cancerous bio pathways) (Sahil -

    Interested people can add a convenient meeting time here.
    No meeting time indicated
    One potential meeting time: 3pm in the coffee shop?
    Information theoretic algorithms can also be explored for the problem.
    Discovering Structure in High-Dimensional Data Through Correlation Explanation.
    Maximally Informative Hierarchical Representations of High-Dimensional Data.
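Both papers listed above (Correlation Explanation, "CorEx") are built around total correlation, TC(X) = Σ_i H(X_i) − H(X_1, …, X_n), which measures how much a set of variables shares beyond what the individual variables carry. The sketch below only computes TC from discrete samples with a plug-in estimator (biased for small samples); it is not the CorEx algorithm itself, which searches for latent factors that explain as much of this quantity as possible. The example data are toy values.

```python
import math
from collections import Counter


def entropy(samples):
    """Plug-in Shannon entropy (in bits) of a list of hashable observations."""
    n = len(samples)
    return -sum((c / n) * math.log2(c / n) for c in Counter(samples).values())


def total_correlation(rows):
    """TC(X) = sum_i H(X_i) - H(X_1, ..., X_n), estimated from rows of discrete variables."""
    columns = list(zip(*rows))
    marginal = sum(entropy(list(col)) for col in columns)
    joint = entropy([tuple(r) for r in rows])
    return marginal - joint


print(total_correlation([(0, 0), (1, 1), (0, 0), (1, 1)]))  # X2 copies X1: 1 bit shared
print(total_correlation([(0, 0), (0, 1), (1, 0), (1, 1)]))  # independent: 0 bits
```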


    • Emilia Wysocka (
    • Laura Condon (

    Analysis of UK parliament speeches 1935-2014 (Stefano -

    No meeting time indicated

    Mapping Complexity/Human Knowledge as a Complex Adaptive System (

    2pm/Wed 6/10 in Conference Room (Tentative)

    Resource allocation trade-offs (Andre -

    Using evolutionary algorithms to investigate trade-offs in allocating goods among agents. This could potentially be done on a real dataset (if you have one) or on suitably parametrized synthetic data. It could also be framed as strategic bargaining between agents, which would introduce some dynamics into the process.
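A minimal version of the idea: encode an allocation as a list giving, for each good, the agent who receives it, and evolve a population of allocations with truncation selection, one-point crossover and random reassignment mutations. The objective here, the sum of log(1 + utility), is our illustrative choice of a trade-off between total value and leaving no agent with little; other welfare functions can be swapped in. The valuation matrix is synthetic, and the strategic-bargaining variant is not modelled.

```python
import math
import random


def utilities(allocation, values):
    """allocation[g] = index of the agent receiving good g; values[a][g] = agent a's value for good g."""
    totals = [0.0] * len(values)
    for good, agent in enumerate(allocation):
        totals[agent] += values[agent][good]
    return totals


def welfare(allocation, values):
    """Illustrative objective (assumption): sum of log(1 + utility), a Nash-welfare-like
    trade-off that rewards total value but penalises leaving any agent with little."""
    return sum(math.log1p(u) for u in utilities(allocation, values))


def evolve_allocation(values, pop_size=30, generations=200, mutation_rate=0.1, seed=0):
    """Simple genetic algorithm; requires pop_size >= 4 so that two parents can be sampled."""
    rng = random.Random(seed)
    n_agents, n_goods = len(values), len(values[0])
    population = [[rng.randrange(n_agents) for _ in range(n_goods)] for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=lambda alloc: welfare(alloc, values), reverse=True)
        parents = population[: pop_size // 2]  # truncation selection keeps the better half
        children = []
        while len(parents) + len(children) < pop_size:
            mother, father = rng.sample(parents, 2)
            cut = rng.randrange(1, n_goods) if n_goods > 1 else 0
            child = mother[:cut] + father[cut:]  # one-point crossover (new list)
            for g in range(n_goods):
                if rng.random() < mutation_rate:
                    child[g] = rng.randrange(n_agents)  # reassign good g at random
            children.append(child)
        population = parents + children
    return max(population, key=lambda alloc: welfare(alloc, values))


# Synthetic valuations: agent 0 values goods 0-1, agent 1 values goods 2-3.
values = [[10, 10, 0, 0], [0, 0, 10, 10]]
best = evolve_allocation(values)
print(best, utilities(best, values))
```

Because the better half of each generation is carried over unchanged, the best allocation found so far is never lost.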

    Email me or sign up if interested and we'll set up a meeting time.

    Meeting Thursday 9:30, coffee shop

    Interested: Sola Omoju, Christine Harvey (

    Navigating Music, Brain and The Edge of Chaos (

    10:45am/Wed 6/10 in Conference Room
    9:00am/Thu 7/10 at the Coffee Shop

    Interested: Sahil Garg

    Rule-based modeling for brain diseases (molecular level) (

    Rule-based modeling features:

    • biological systems as concurrent processes
    • dynamics of post-translational modifications
    • domain availability
    • competitive binding
    • causality and intrinsic structure
    • binding sites
    • interaction rules replace reaction equations
    • infinite number of reactions with a small and finite number of rules
    • reduction of parameter space
    • “don't care don't write” - adjustable rule contextualization
    • a single reaction rule and its parameters generalize a class of multiple reactions
    • modular and extensible language
    • specification language & simulation/integration environment
    • static and causal analysis
    • Kappa/KaSim & BioNetGen/NFsim -- specification language/network-free simulator
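To illustrate "don't care, don't write" and why a few rules stand for many reactions, the sketch below expands one context-free phosphorylation rule on a protein with several sites into the explicit reactions it implies. This is plain Python for illustration, not Kappa or BioNetGen syntax; the protein and its site count are hypothetical. Network-free simulators such as NFsim and KaSim avoid this expansion by applying rules directly to individual agents.

```python
from itertools import product


def expand_rule(n_sites, target_site):
    """Expand the rule 'site `target_site`: u -> p' into the explicit reactions it stands for.

    A species is a tuple of site states ('u' unphosphorylated, 'p' phosphorylated).
    The rule says nothing about the other sites ("don't care, don't write"), so every
    combination of their states yields a separate reaction in the expanded network."""
    reactions = []
    for states in product("up", repeat=n_sites):
        if states[target_site] == "u":
            product_state = states[:target_site] + ("p",) + states[target_site + 1:]
            reactions.append((states, product_state))
    return reactions


# One rule per site on a hypothetical 10-site protein: 10 rules, 2**10 species,
# and 10 * 2**9 = 5120 explicit reactions.
print(sum(len(expand_rule(10, s)) for s in range(10)))
```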


    • WEDNESDAY 11/06/2015

    Addressing aspects of the project other than the biological questions and verification (matching results to published experiments or known behaviour). My plan is as follows:

    • divide the group (roughly) into people who look into biological meaning and validation, and people who analyse system phenomena and the evolution of stochastic, combinatorially complex signalling systems, both qualitatively (directed acyclic graphs, networks) and quantitatively (time series generated for every agent/species formed or destroyed in the system), covering all possible states and scenarios of the system abstracted from their biological meaning.

    Things that could be modelled (look in dropbox folder:

    • spontaneous flipping of interactions (phosphorylations and others) between proteins, described by the god of non-linear dynamics (S. Strogatz)

    Some refs:

    Ecology non-working group (williamkurtischang at gee-mail dot com)

    Informal ecology interest group.

    Making Supply Chains Resilient to Disasters (mh2905 att columbia dott edu)

    Keywords: resilience, natural hazards, supply chains, interdependency, interconnected risks, cascading failures

    Rationale: I am interested in examining which network structures and features increase the resilience of supply chains to natural disasters. I believe this area of work is important because regional disasters harm the global economy through disruptions in supply chain networks. A pioneering study published in Nature urges making supply chains climate-smart (Levermann 2014). In industry, the World Economic Forum published a report addressing this issue in 2013. Few studies, however, assess and model the impacts of adverse weather on supply chains. I would like to evaluate these impacts through modeling.

    Data set: Supply chain data from a multinational manufacturing company; global climate data, disaster data, etc.

    Possible techniques: Agent-Based Model, Complexity science, network theory and evolution, complex adaptive systems, GIS, operations research, manufacturing engineering, and I am open to any techniques.
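As one network-theory starting point, a disruption can be propagated downstream with a simple threshold cascade: a firm fails once a given fraction of its suppliers has failed. This rule is our assumption; it ignores inventories, supplier substitution and recovery times, which an agent-based model would add. In practice the initially failed firms would come from overlaying the hazard footprint (GIS, climate and disaster data) on supplier locations. The network below is hypothetical.

```python
from collections import deque


def cascade(suppliers, initially_failed, threshold=0.5):
    """Propagate disruptions downstream through a supply network.

    suppliers[f] -- list of firms that supply firm f
    A firm fails once the fraction of its failed suppliers reaches `threshold`
    (threshold=1.0 means it fails only when all suppliers are down).
    Returns the set of failed firms."""
    customers = {}
    for firm, sups in suppliers.items():
        for s in sups:
            customers.setdefault(s, []).append(firm)

    failed = set(initially_failed)
    queue = deque(failed)
    while queue:
        down = queue.popleft()
        for firm in customers.get(down, []):
            if firm in failed:
                continue
            sups = suppliers[firm]  # non-empty: `down` is one of them
            frac = sum(s in failed for s in sups) / len(sups)
            if frac >= threshold:
                failed.add(firm)
                queue.append(firm)
    return failed


# Hypothetical network: a flood knocks out one of two wafer suppliers.
suppliers = {
    "assembler": ["engine_plant", "chip_fab"],
    "engine_plant": ["steel_mill"],
    "chip_fab": ["wafer_supplier_A", "wafer_supplier_B"],
    "retailer": ["assembler"],
}
print(sorted(cascade(suppliers, {"wafer_supplier_A"})))
print(sorted(cascade(suppliers, {"wafer_supplier_A"}, threshold=1.0)))
```

Comparing the size of the failed set across network structures (e.g. single vs. dual sourcing) is one way to ask which features make a supply chain more resilient.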

    Myself: As I joined the program yesterday, please find my background here.

    Please put your names below if you want to be informed about this project by email

    Interested: Masa Haraguchi

    From Ethnic Diversity to Religious Zeal: Retrospective/Predictive Construction of the World Ethnic Map (

    There is a new dataset on ethnic groups (2014) that claims to include all the ethnic groups of the world. There are also several cross-national datasets on religion and other variables of interest, usually socio-economic. Coming from the social sciences, I thought it would be really interesting to apply methods and perspectives from other disciplines to social science data.

    One idea that I have in mind is to explore how ethnicity and religion interact.

    Quick empirical checks indicate that Sub-Saharan Africa is home to about 1,500 ethnic and subethnic groups, compared with about 90 ethnic groups in the Middle East and North Africa. India alone accounts for nearly 2,000 ethnic and subethnic groups, while Europe, including all the countries of the former USSR and large immigrant groups from other parts of the world, has only about 260 ethnic groups. The picture that emerges from this simple comparison is that the spread of Abrahamic religions appears to be associated with a high depletion rate of ethnic groups. China, which has little more than 50 ethnic groups, is an exception that can be explained by the country's long history of a centralized state.

    The idea, then, is to treat the nations that did not experience the influx of Abrahamic religions (or, more precisely, had only limited exposure to them) as a control group, and those that were exposed to the spread of Abrahamic religions as an experimental group. Since we know the distribution of ethnic groups in the control group, we can project what the distribution of ethnic groups in the experimental group would have been had it not been exposed to Abrahamic religions.

    Of course, we could also go in the other direction and make a projection about the future: what would happen to ethnic groups if all of them adopted an Abrahamic religion?

    One of the challenges in this project is the absence of a dynamical system, i.e. we don't have data on how these things changed over time, but I still think that projections about the future and the past would be kinda cool :)