Complex Systems Summer School 2015-Projects & Working Groups

From Santa Fe Institute Events Wiki

Complex Systems Summer School 2015

City Resilience

Summary: This group aims to develop metrics of cities' resilience to various types of disaster, verify these metrics empirically using information from recent disasters, and compare resilience across a large number of global cities.
Contact: Richard Barnes
Participants: Alex Tejedor, Laurence Brandenberger, Masa Haraguchi, Matthew Histen, Will Chang, Juan Carlos Castilla
Wiki page: Resilience of Cities

Improving the design of the power grid using our knowledge about network structure

Context: Given all the renewable energy generation that is being installed and the increasing levels of uncertainty about the future power system, power transmission expansion planning is becoming more and more challenging. There is a lot of literature being published in the field, but it always applies "blind" techniques to the design, such as optimization where the possible lines to add to the system are represented as binary variables. This leads to optimization problems that are too large for real networks. As part of the European Commission FP7 project e-Highway, my team and I have developed methods to reduce the complexity of the network and work with a smaller system.

I would like to explore a different avenue. Maybe it is possible to describe the structure of good network designs in terms of global parameters. For instance, what does the degree distribution look like for efficient power networks? We could then feed that information into the optimization problem, reducing the search space.
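As a minimal sketch of what such a global parameter could look like in practice, the snippet below computes the degree distribution of a network from an edge list. The toy edge list and the function name are illustrative assumptions, not data or code from the e-Highway project.

```python
from collections import Counter

def degree_distribution(edges):
    """Return {degree: fraction of nodes with that degree} for an undirected graph."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    n_nodes = len(degree)
    counts = Counter(degree.values())
    return {k: c / n_nodes for k, c in sorted(counts.items())}

# Hypothetical 5-bus toy grid: a hub (bus 0) linked to four buses, plus one tie line.
toy_edges = [(0, 1), (0, 2), (0, 3), (0, 4), (1, 2)]
dist = degree_distribution(toy_edges)
print(dist)
```

A distribution like this, estimated from networks known to perform well, could in principle be turned into constraints or priors on candidate lines, although how to do that is exactly the open question.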

I have data originating in a European project from the FP7 programme.

There is a lot of interest in this field: much is being published, and a lot of money is going into projects and research grants. Nobody seems to be looking at it from a structural perspective, though.

To know more...
I uploaded a ppt with some initial ideas to my page on the wiki.

Interested: Sara, Jean Gab, Daniel T, Sahil Garg


The 2014-15 Ebola virus disease (EVD) outbreak in West Africa presented both unique opportunities and unique challenges to the epidemiological modeling community. For the first time during an emerging infectious disease outbreak, high resolution data--from a variety of sources--were made available to the academic community and many public health decision makers genuinely engaged with mathematical and computational modelers. However, the popular and scientific press were highly critical of most models' ability to project the outbreak's course. The following key and open questions seem ripe for investigation using a complex adaptive systems lens:
1) What features of EVD transmission are most problematic for reliable, robust forecasting: changing behavior, intervention, viral evolution, complex social networks, etc.?
2) How, if at all, can we use digital data to either improve forecasts or inform model selection?
3) Can one quantify the value of additional information in real time?
Contact: Samuel Scarpino, SFI Omidyar Fellow, Santa Fe Institute

Marie-Pierre Hasne
Chris Verzijl
Junming Huang
Sola Omoju
Christine Harvey
Daniel Citron
William Chang (williamkurtischang at gee-mail)

Decision support/network analysis of a complex socio-ecosystem in rural Zimbabwe

Many communities in Africa have been surprisingly resilient in the face of a host of devastating challenges. The people of Mazvihwa Communal Area in Zimbabwe have lived through more than a century of rapid change through the colonial, liberation war, and post-colonial periods. There have been dramatic changes in public health (ranging from better control of communicable diseases after World War II, to child vaccination programs after independence, to the AIDS pandemic, especially from the mid-1990s to the end of the 2000s) and in land access and use (with repeated removals, resistance, and returns of communities to land designated for white settlement). These shifts in population distribution have interacted with rapid natural increase in population (especially in the period 1950-1990) driven by high fertility and declining mortality, followed by recent decades of declining fertility and high AIDS-related mortality. Differences in religious beliefs mean that these changes are uneven across households and areas. The country's economy has meanwhile gone through a series of long cycles of booms and busts, and during the 2000s experienced inflation reaching a billion billion billion per cent.

The Muonde Trust is a Zimbabwean non-governmental organization established to help support the community in Mazvihwa to continue developing and deploying bottom-up solutions in response to these challenges. Mazvihwa has a semi-arid subtropical climate with remnant woodlands and a combination of largely subsistence agriculture and livestock production. From the point of view of most of the people in Mazvihwa, and as taken up by the community network of the Muonde Trust, the “sustainability” of their area now requires a series of linked changes in land use and investments in natural capital.

Data and Questions
The data we have on this community and ecosystem originates from an ongoing community-based participatory research project originally begun in the 1980s and since continued by the Muonde Trust. It includes robust quantitative data on human demography, health, nutrition, agricultural practices, rainfall, land use choices, woodland dynamics, household assets, and land tenure. Our goal at SFI is to develop theoretical or simulation studies which would help us to better understand the resilience and sustainability of this system, which would eventually be informed by the data. Questions we might address using complex systems methods include:

1) How do individuals and resources flow through households and communities? (Empirical data show that the composition of households changes rapidly, even though most analyses of these societies tend to assume they are static and natural units of analysis.) It is clear that individuals are variously strategizing through households as well as within other kin, religious and clan groups. At the same time, households also have emergent properties. In contexts of rapidly shifting demography and changing resource access, are there ways that we can use network analysis to illuminate these complexities?

2) How best can the community as a whole allocate its land to agriculture, pasture, and woodland when these components interact and feed back on each other? One of the main land-use decisions facing the community is the trade-off between agricultural cultivation (which requires fencing to keep out livestock as well as water harvesting techniques) and retaining woodland areas that have cultural value as well as providing grazing space and forage for livestock (and many other economic benefits). This relationship is complex, with livestock providing benefits to agriculture (manure for fertilizer and draft power for cultivation), and vice versa (well-tended fields provide considerable feed for livestock). The community derives benefits from all these land uses, including food for subsistence from agriculture, meat and milk from livestock, and cultural values and a wide variety of benefits from woodland (including fuelwood, construction materials, a variety of foods and medicines, and improved soil characteristics). In addition, community members may sell livestock, as well as using them for bridewealth and compensation in the case of some deaths. How can this system be represented and manipulated in a model to create optimal strategies for the well-being of the system?

Possible methods
Our methodology is open to what we learn during the summer school, but some ideas include: network analysis to study the way people and resources connect and flow through the households and other components of the system; an analytical mathematical model of the interacting components of the system, e.g. coupled differential equations; cellular automata which can represent the land use category of each part of a farmer's land and underlie a decision support tool.
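To make the "coupled differential equations" idea concrete, here is a minimal sketch of two interacting components, woodland biomass and livestock, integrated with a simple forward Euler scheme. The model form (logistic woodland growth with grazing losses) and every parameter value are purely hypothetical placeholders chosen for illustration; they are not derived from the Mazvihwa data.

```python
def simulate(w0, l0, r=0.5, k=100.0, g=0.01, e=0.2, m=0.1, dt=0.01, steps=10000):
    """Forward-Euler integration of a toy woodland (w) / livestock (l) model.

    Hypothetical model, for illustration only:
        dw/dt = r*w*(1 - w/k) - g*l*w   (logistic regrowth minus grazing)
        dl/dt = e*g*l*w - m*l           (herd growth from forage minus losses)
    """
    w, l = w0, l0
    trajectory = [(w, l)]
    for _ in range(steps):
        dw = r * w * (1 - w / k) - g * l * w
        dl = e * g * l * w - m * l
        w, l = w + dt * dw, l + dt * dl
        trajectory.append((w, l))
    return trajectory

# Without livestock, woodland should recover towards its carrying capacity k.
no_grazing = simulate(10.0, 0.0)
# Starting at the interior equilibrium (w* = m/(e*g), l* = r*(1 - w*/k)/g) should stay put.
at_equilibrium = simulate(50.0, 25.0)
print(no_grazing[-1], at_equilibrium[-1])
```

A third equation for cultivated land, or a spatial cellular automaton built on the same update rule, would be natural next steps once the real couplings are better understood.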


1) Wednesday June 10th at 10:45 am in the Senior Commons Room
2) Thursday June 11th at 4:15 in the Senior Commons Room
3) Friday June 12th at 10:45 at SFI (specific location to be determined; Skype with collaborators from community)
4) Friday June 12th after lunch (1:30 pm) meet to discuss results of phone call earlier that morning

Interested: Sola Omoju

Mapping Complexity/Human Knowledge as a Complex Adaptive System

Ants leave pheromone trail patterns which they are aware of only in a local sense. They do not have the cognitive faculties to step back, look at the trails, and grasp the ant-trail network as a totality. The artifacts they leave behind are physical entities which provide aggregate feedback to the ant body as a whole, feeding its evolution as a complex adaptive system (CAS). In contrast, humans do have the requisite cognitive abilities. The "pheromone trails" we leave behind are the knowledge trails coded in symbolic knowledge artifacts. Compared with the physical artifacts that ants leave behind, our knowledge artifacts are far more flexible and potent, both at the aggregate and at the individual level. But like the ants, until recently we did not have the means to step back and map the knowledge "pheromone trails" to obtain the big picture and its global/local dynamics. The burgeoning field of scientometrics is making available visualization tools to help us map and study the evolutionary dynamics of knowledge network structures.
Data and Questions
The goals of this project include

  1. Extract the terms from approximately 1600 working papers published by SFI
  2. Map the intra/inter conceptual network structures
  3. Study the evolution of these structures across time
  4. Highlight the gap-closure of knowledge reverse-salients (if any)
  5. Capture any of the network patterns that repeat
  6. Study the diffusion of concepts across the network
  7. Provide visualization tools for navigating the complexity corpus, etc
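The first two goals (term extraction and mapping a conceptual network) can be sketched roughly as follows. This is only an illustration under simplifying assumptions: terms are single lowercase words, the stop-word list is a tiny hypothetical one, and two terms are linked whenever they appear in the same document. The three toy "abstracts" are invented and are not SFI working papers.

```python
import re
from collections import Counter
from itertools import combinations

# Tiny illustrative stop-word list; a real analysis would use a fuller one.
STOP_WORDS = {"the", "of", "and", "a", "in", "to", "is", "for", "on", "with"}

def extract_terms(text):
    """Return the set of lowercase word terms in a text, minus stop words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return {t for t in tokens if t not in STOP_WORDS and len(t) > 2}

def cooccurrence_network(documents):
    """Count, for each pair of terms, the number of documents containing both."""
    edges = Counter()
    for doc in documents:
        terms = sorted(extract_terms(doc))
        for a, b in combinations(terms, 2):
            edges[(a, b)] += 1
    return edges

toy_abstracts = [
    "Scaling laws in cities and networks",
    "Networks of innovation in cities",
    "Scaling of metabolic networks",
]
network = cooccurrence_network(toy_abstracts)
print(network.most_common(3))
```

Running the same construction on papers grouped by year would give a sequence of networks whose changes over time could then be studied (goal 3).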

Possible methods
Latent Semantic Analysis (LSA) and Latent Dirichlet Allocation (LDA)

  1. SFI Working Papers
  2. Atlas-Science-Visualizing
  3. Atlas-Science-Visualizing WebSite
  4. Mapping-Scientific-Frontiers-Knowledge-Visualization
  5. Katy Börner presents at Science of Science
  6. Scholarly Data, Network Science, and (Google) Maps
  7. LSA Video Lect
  8. What is LSA
  9. LSA Wiki
  10. LDA
  11. Fusari, A. (2014). Methodological Misconceptions in the Social Sciences: Rethinking Social Thought and Social Processes
  12. Kirsh, D. (2013). Thinking with external representations. Cognition Beyond the Brain
  13. Holland, J. H. (1992). Adaptation in natural and artificial systems
  14. Where to start with text mining
  15. Singular Value Decomposition Tutorial
  16. Latent Semantic Analysis (LSA) Tutorial


  • John Thomas
  • Haitao Shang
  • Sharon Greenblum
  • Christopher Verzijl
  • Nilton Cardoso
  • Glenn Magerman
  • Emilia Wysocka
  • Laurence Brandenberger
  • Matthew Histen
  • Anna Zaytseva
  • Song Binyang
  • Penny Mealy

Navigating Music, Brain and The Edge of Chaos

Brain science has revealed that the brain lives "on the edge of chaos," exhibiting "self-organized criticality" that is tentatively balanced between normalcy and madness. Over the course of history, humans have used various agents and activities to shape, influence and control this living chaos, ranging from substances such as caffeine, sugar and drugs to activities such as the arts (including music), social discourse and therapy, and meditation. Of these, music has a distinct role in shaping our moods, helping us transition between different mental states and maintain them for extended periods of time. Clearly we have been using music to help us control and shape the internal chaos. Until recently, however, quantitative instrumentation of this massively complex system, which comprises close to 100 billion neurons networked into a 1000-trillion-synapse edifice, has not been available to the general public. Affordable, wearable EEGs are now available on the market, making the quantitative study of the influence of music on brain dynamics feasible on a large, crowd-sourced scale. To come to terms with the complexity of our 1000-trillion-synapse edifice, we need to gather data on a vast scale. The proposed research is a proof-of-concept, exploratory foray into making this happen.

Data and Questions
The goals of this project include

  1. Evaluate and purchase a wearable EEG
  2. Set up the instrumentation for data capture
  3. Set up the experimentation/data-capture plan
  4. Recruit Subjects
  5. Perform Data Capture
  6. Analyze Results
  7. Propose pathways to take this to the market by embedding it as an app

Possible methods

References:

  1. Hacking Your Brain Waves: Wearable Meditation Headsets
  2. Measure your brainwaves and modify your mind
  3. This wearable device reads your brain waves. Is there a market for it?
  4. Your-brain-is-on-the-brink-of-chaos
  5. Two Decades of Search for Chaos in Brain
  6. How You Are Who You Are--in Chaos Theory
  7. Diana Dabby Links
  8. Diana Dabby Links
  9. Liz Bradley/Diana Dabby Links
  10. Music and the Brain
  11. Why Music Moves Us
  12. Measuring musical expressivity
  13. The World In Six Songs
  14. This-Your-Brain-Music-Obsession
  15. World-Six-Songs
  16. Computer Music and the Importance of Fractals, Chaos, and Complexity Theory
  17. The_Complexity_of_Songs
  18. Stefan-Koelsch Papers
  19. Grammar Based Music Composition

Four principles of bio-musicology, W. Tecumseh Fitch 2015
This paper is a prospectus for biomusical research. Main points:
i) music is complex (yes!),
ii) questions must be asked from a Tinbergian perspective (i.e. mechanism, ontogeny, phylogeny and function),
iii) comparisons between animals (not relevant for us), and
iv) "ecologically motivated," i.e. not just Western skilled musicians.

Neurological implications and neuropsychological considerations on folk music and dance, Sironi V.A., Riva M.A., 2015
Calls for "Interdisciplinary research on these subjects (ethnomusicology and cultural anthropology, clinical neurology and dynamic psychology, neuroradiology and neurophysiology, and socioneurology and neuromusicology)"

History of music in 5 min


  • Sara Lumbreras
  • Ilaria b
  • Braun, Urs
  • Emilia Wysocka
  • J Bruineberg
  • William Kurtis Chang
  • John Thomas
  • Christopher Verzijl
  • Glenn Magerman
  • Sahil Garg
  • Daniel Hedblom
  • Christine Harvey
  • Vanessa Chioffi
  • Daniel Friedman
  • Sharon Greenblum

Powerlaw fitting and alternative distributions - Theory/statistics

Clauset, Shalizi and Newman (2009) propose a maximum likelihood method to estimate the powerlaw exponent of a variable of interest. This is a great improvement on earlier methods such as OLS that dominated the literature up to then. However, one can fit a powerlaw to any dataset and the most we can say is that our observations are consistent with the hypothesis that x is drawn from a powerlaw distribution. One easily implementable method to compare the powerlaw fit to other fits is then a likelihood ratio test for both models.
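The estimator and the comparison step can be sketched as follows. This is a simplified illustration, not the full procedure: it uses the continuous-data MLE for the exponent with x_min taken as given, and, to keep everything in closed form, it compares against an exponential tail rather than the log-normal discussed below (a log-normal truncated at x_min would need numerical optimization). The synthetic sample and all function names are assumptions made for the example.

```python
import math
import random

def fit_power_law(xs, x_min):
    """Continuous power-law MLE on the tail x >= x_min; returns (alpha, log-likelihood)."""
    tail = [x for x in xs if x >= x_min]
    n = len(tail)
    log_sum = sum(math.log(x / x_min) for x in tail)
    alpha = 1.0 + n / log_sum
    loglik = n * math.log((alpha - 1.0) / x_min) - alpha * log_sum
    return alpha, loglik

def fit_exponential(xs, x_min):
    """Shifted-exponential MLE on the same tail; returns (rate, log-likelihood)."""
    tail = [x for x in xs if x >= x_min]
    n = len(tail)
    excess = sum(x - x_min for x in tail)
    rate = n / excess
    loglik = n * math.log(rate) - rate * excess
    return rate, loglik

# Synthetic data drawn from a power law with known exponent (inverse-transform sampling).
random.seed(0)
x_min, alpha_true = 1.0, 2.5
sample = [x_min * (1.0 - random.random()) ** (-1.0 / (alpha_true - 1.0))
          for _ in range(5000)]

alpha_hat, ll_power_law = fit_power_law(sample, x_min)
_, ll_exponential = fit_exponential(sample, x_min)
log_likelihood_ratio = ll_power_law - ll_exponential  # > 0 favours the power law
print(alpha_hat, log_likelihood_ratio)
```

A positive log-likelihood ratio only says which model fits better; judging whether the difference is statistically meaningful requires an additional significance calculation, which this sketch omits.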

One particular discussion is the distinction between a powerlaw (PL) and a log-normal (LN) fit. For a lively debate over the two fits to city sizes and Zipf's law, see Eeckhout (2004, 2009) and Levy (2009), where the discussion has now settled on city sizes following a log-normal distribution instead of a powerlaw. A similar discussion exists for the firm size distribution: Simon and Bonini, 1958; Ijiri and Simon, 1977; Stanley et al., 1995; Sutton, 1997; Axtell, 2001; Okuyama et al., 1999; Cabral and Mata, 2003; Gaffeo et al., 2003; Aoyama et al., 2004; Fujiwara et al., 2004a,b; Kaizoji et al., 2006; Takayasu et al., 2008; Duchin and Levy, 2008; Schwarzkopf and Farmer, 2008, ...

This distinction matters for several reasons:

  • PL and LN arise from very similar models, and differences in initial conditions can lead to very different outcomes. PL involves a choice of x_min, below which the unit of observation cannot feasibly exist (e.g. minimum city size, firm size, word length, ...). LN has no minimum size.
  • A PL and an LN that look similar can mean the difference between infinite variance (PL) and large but finite variance (LN).
  • Shock propagation: unit-level shocks can be large enough to show up as aggregate perturbations if the distribution is a powerlaw with infinite variance, while these shocks wash out fast when the distribution is log-normal (Gabaix, 1999; Gabaix, 2009; Acemoglu et al., 2012, ...).

I have encountered some issues which I would like to explore further:

  1. The distinction between lognormal and powerlaw in the data is very sensitive to data truncation: in the above discussions, researchers have slightly different datasets, covering more or less of the population at hand. Left-truncation (i.e. observations not in the dataset because they are too small to be reported) can strongly drive the outcome of the fit, even when endogenizing the x_min cutoff. I have data on the universe of Belgian firms, much more complete than e.g. US Census data, on which I have done some preliminary tests. The question is then: how do we formalise this distinction, and what are the theoretical and practical caveats to look out for when applying this method?
  2. MLE fitting seems to be sensitive to the choice of units as well: rescaling a variable by a factor of 1000, 1000000, etc. seems to influence the endogenous x_min choice and hence the estimated parameter. This reminds me of some work on scaling invariance in negative binomial estimators. What is going on here?
  3. Can we set up a model that generalises both? I've been looking at Levy stable distributions, but have not done anything with them yet.
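One observation that may help narrow down the unit-sensitivity question: assuming the continuous-data estimator of Clauset, Shalizi and Newman (2009) with the cutoff held fixed in relative terms, the exponent estimate itself is invariant to rescaling by any factor c > 0, because it depends only on the ratios x_i / x_min. In the notation below, n is the number of observations at or above the cutoff.

```latex
\hat{\alpha}(x, x_{\min})
  = 1 + n \left[ \sum_{i=1}^{n} \ln \frac{x_i}{x_{\min}} \right]^{-1},
\qquad
\hat{\alpha}(c\,x, c\,x_{\min})
  = 1 + n \left[ \sum_{i=1}^{n} \ln \frac{c\,x_i}{c\,x_{\min}} \right]^{-1}
  = \hat{\alpha}(x, x_{\min}).
```

If that is right, any dependence on units would have to enter through the x_min selection step (for instance a fixed search grid, rounding, or treating rescaled data as discrete) rather than through the likelihood itself. This is a conjecture to check, not an established explanation.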

Interested people:

  • Corbain
  • Laura
  • Binyang
  • Sahil
  • Tirtha
  • Glenn
  • Junming

Literature: Clauset, Shalizi and Newman (2009), "Power-law distributions in empirical data"

App design for interaction registration

I would like to see a simple smartphone app that can track connections being made between people at events. This would make it possible to map the evolution of a network at, e.g., a networking meeting, SFI 2015, or social events. I know some people at MIT have been working on ID badges for nurses and doctors to track interactions in a hospital, and there are some business apps that offer a plethora of features for network events (e.g. the Yapp app). However, it would be nice to have a simple app that just registers a link between people when their phones are close enough for a certain interval of time. Additionally, it might record some of the conversation so as to create edge information as well. Unfortunately, I'm not a whiz kid and would need help from an app programmer to work this out. If it is feasible, I think SFI CSSS 2015 would be a great test case!
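The core rule (register a link once two phones have stayed close for long enough) could be sketched like this on the server side. Everything here is a hypothetical design: the input format of (timestamp, phone, phone) proximity pings, the 60-second duration threshold and the 15-second gap tolerance are assumptions for illustration, and how phones would detect proximity in the first place is left open.

```python
from collections import defaultdict

def register_links(sightings, min_duration=60, max_gap=15):
    """Return the set of pairs that stayed in proximity for at least min_duration seconds.

    sightings: iterable of (timestamp_seconds, phone_a, phone_b) proximity pings.
    A contact episode is broken when two consecutive pings are more than max_gap apart.
    """
    by_pair = defaultdict(list)
    for t, a, b in sightings:
        by_pair[tuple(sorted((a, b)))].append(t)

    links = set()
    for pair, times in by_pair.items():
        times.sort()
        start = prev = times[0]
        for t in times[1:]:
            if t - prev > max_gap:
                start = t  # gap too long: a new contact episode begins
            prev = t
            if prev - start >= min_duration:
                links.add(pair)
                break
    return links

# Hypothetical pings: alice and bob talk for a minute; bob and carol only pass each other twice.
pings = [(t, "alice", "bob") for t in range(0, 70, 10)]
pings += [(t, "carol", "bob") for t in (0, 10, 20, 100, 110)]
links = register_links(pings)
print(links)
```

Keeping the episode start times as well would give timestamped edges, which is what mapping the evolution of the network over an event would need.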



  • Laurence Brandenberger

    Organ Transplant Analysis

    Over 120,000 people in the United States are currently on the waiting list for an organ transplant. The size of this waiting list is associated with over 6,000 deaths a year among people waiting for a transplant and with tens of billions of dollars in government spending. I have access to the following data sets:

    • All transplants performed in the US from October 1987 to June 2014 (including follow-up data)
    • All living and deceased organ donors in the same time period (with follow-up data on living donors)
    • Waiting list data for everyone who signed up for the list
    • 2012 National Survey on Attitudes and Behaviors on Organ Donation
    • Social media data relating to organ donation since 2008

    I am open to ideas and suggestions for the topic. There are a lot of interesting questions to investigate, including cultural, racial and gender differences in organ donation. I have several preliminary reports and exploratory analyses on differences among donors.

    Second Meeting Thursday, June 11th 4:15 in the coffee shop!

    First Meeting Notes


    • Look into the differences in waiting time geographically and map how it changes over time.
    • Differences in cities compared to suburban areas.
    • What are the critical links for performing matches across multiple people?
    • Look at scaling laws in the data: if the number of donors grows, how does the number of successful transplants grow? This includes the correlation between wait time and organ donation or size of the waiting list.
    • Investigate trends and dynamical systems, review the following dynamics as a function of time:
      • Entry rate to the waiting list system
      • Exit rate from the waiting list (transplant, death, other)
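As a starting point for the dynamical view, the waiting list can be treated as a simple stock with inflows and outflows: the list size at the next period equals the current size plus entries minus exits (transplants, deaths, other removals). The sketch below implements only this bookkeeping; the numbers are invented placeholders and not values from the transplant data sets.

```python
def waiting_list_trajectory(initial_size, entries, transplants, deaths, other_exits):
    """Track waiting-list size over time from per-period entry and exit counts."""
    sizes = [initial_size]
    for e, t, d, o in zip(entries, transplants, deaths, other_exits):
        sizes.append(sizes[-1] + e - (t + d + o))
    return sizes

def exit_rates(sizes, transplants, deaths, other_exits):
    """Per-period exit rate: exits divided by list size at the start of the period."""
    return [(t + d + o) / n
            for n, t, d, o in zip(sizes, transplants, deaths, other_exits)]

# Hypothetical two-period example.
sizes = waiting_list_trajectory(100, entries=[10, 12], transplants=[5, 6],
                                deaths=[1, 1], other_exits=[0, 2])
rates = exit_rates(sizes, [5, 6], [1, 1], [0, 2])
print(sizes, rates)
```

Fitting how the entry and exit rates themselves change with time or with list size is where the actual dynamical-systems questions would begin.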

    There is a Google Doc with space for notes. Please email Christine Harvey for access.

    Multi-dimensional social networks in the evolution, development and resilience of informal economies.

    Informal economies are defined as economic activities that occur outside the purview of corporate public and private institutions. These types of economies proliferate where traditional economic actors are unable to productively exercise their activities, especially due to costly constraints (De Soto 2003). Firms may face adverse incentives to expand production by hiring more workers or incorporating more capital due to the low productivity of their workers or the predatory practices of rapacious elites or corrupt governments. Labor itself, i.e. people, may face hurdles in trying to make the jump towards entrepreneurial activities or jobs in the formal economy, which allow the accumulation of experience, retirement savings, and access to insurance or precautionary savings to face unexpected events (like disease, disability, etc.), due to a lack of access to capital (whether human or financial).

    For this project, we propose a different interpretation: We seek to understand and model how multi-faceted social networks provide robust alternatives to formal economies, which far from being seen as degenerate forms of social organization, in many instances co-exist, challenge and compete with formal employment and economic activities. While some types of informal economies operate under adverse contexts, their resiliency may be understood as a type of adaptive fitness and not the mere result of stubborn cultural path-dependency.

    For the most part, informal economies have been analysed as a counterpoint to an ideal type of formal economy, and characterized by low levels of trust, of respect for property rights, and of access to credit and public institutions such as the court system (see Losby et al. 2002 for a review). However, informal economies have been coupled with their formal counterparts since the development of capitalism in the 19th century. To name a famous historical example, the informal sector of London's East End was described harshly by Engels (1844) in his famous tract, "The Condition of the Working Class in England". But almost fifty years later, in a new preface to the book, Engels recognized the progress that had taken place there under the aegis of working-class organizations in the area.

    Research has begun to catch on to this idea, developing a literature in the areas of informal risk sharing and remittances, which are in some sense informal counterparts to insurance and banking. Consumption patterns in poor, rural villages are remarkably smooth, suggesting that risk-sharing measures are prevalent, although imperfect (Townsend 1994). Field studies of networks underlying village risk-sharing systems have found that households primarily receive help from existing social connections, such as friends and relatives, in the form of informal loans or transfers. Theoretical work in network economics, by authors such as Matt Jackson, has found that certain empirically prevalent network structures may directly benefit the stability of favor exchange systems. Another line of inquiry focuses on remittance income, generally informal transfers across kinship ties. World Bank researchers have found that remittances from overseas migrants respond dramatically to regional income shocks, replacing upwards of 60% of lost income in households with international migrants. Further studies, cited below, have generalized those results. Furthermore, reductions in the cost of sending remittances, which occur with the advent of mobile money, further mitigate exposure to risk in receiving households.

    Recently, authors have also written on how communities come together in situations where formal economies are unavailable because of external shocks (like disasters) or because of lack of access (due to conditions of abject poverty). In the case of the latter, Venkatesh (2008) provided gripping testimonies of how informal economies arose and developed on the South Side of Chicago in low-trust environments via cash transactions and informal service contracts enforced by (and for the benefit of) local criminal gangs. These gangs were seen, surprisingly, as alternative coordinating mechanisms to settle claims between neighbors. In the case of the former, Storr and Grube (2013), in a series of papers, have argued that "shared histories and perspectives, and the stability of social networks within the community" allow communities that have suffered disasters to cope and endogenously resolve immediate and complex problems. As an example of these social networks in action, research has found that remittances flow quickly into areas affected by natural disasters when there is a technology in place for this to happen.

    Taking these lines of inquiry together, the findings suggest that informal handling of risk takes place on a large scale and that social networks informally connect communities in ways that impact their economies. Recent trends suggest that the interaction of formal and informal economic processes is growing on a nationwide scale. First, the aggregate value of remittance flows is large and growing, nearing the value of foreign direct investment. Second, advancements that formalize previously informal transactions are expanding dramatically in emerging economies through various forms of branchless banking. National economies may be embedded in profoundly influential informal systems that have never been holistically studied.

    On a more general note, this project touches on lingering questions in economics. For example, some might argue that more stringent labor regulations should have increased informal employment, but in fact the opposite happened. While the economic rationale for the former statement remains valid, we suggest that unions, like other types of social organization through which people interact via organized networks, provide richer dimensions than those suggested by their members' interaction as mere economic agents. Hence unions, as well as faith-based, ethnic, community and other interest groups, may provide ways of interacting that escape narrow economic outcomes.


    • De Soto, Hernando (2000/2003) The Mystery of Capital. Basic Books.
    • Losby, Jan; Else, John; Kingslow, Marcia; Edgcomb, Elaine; Malm, Erika and Vivian Kao (2002) The Informal Economy: A Literature Review. ISED Consulting and Research and the Aspen Institute Working Paper.
    • Townsend, Robert M. "Risk and Insurance in Village India." (1994)
    • Fafchamps, Marcel, and Susan Lund. "Risk-sharing Networks in Rural Philippines." (2003)
    • Weerdt, Joachim De, and Stefan Dercon. "Risk-sharing Networks and Insurance against Illness." (2006)
    • Matthew O. Jackson, Tomas Rodriguez-Barraquer and Xu Tan. "Social Capital and Social Quilts: Network Patterns of Favor Exchange." (2012)
    • Yang, D. and H. Choi. "Are Remittances Insurance? Evidence from Rainfall Shocks in the Philippines." (2007)
    • Kurosaki, Takashi. "Consumption vulnerability to risk in rural Pakistan." (2006)
    • Jack, W., and T. Suri. "Risk Sharing and Transactions Costs: Evidence from Kenya's Mobile Money Revolution." (2014)
    • Engels, Friedrich (1844/1892) The Condition of the Working Class in England.
    • Venkatesh, Sudhir (2008) Gang Leader for a Day: A Rogue Sociologist Takes to the Streets. Penguin Books.
    • Storr, Virgil and Laura Grube (2013) The Capacity for Self-Governance and Post-Disaster Resiliency. George Mason University, Department of Economics Working Paper 13-37.
    • Blumenstock, Joshua Evan; Fafchamps, Marcel and Eagle, Nathan. "Risk and Reciprocity Over the Mobile Phone Network: Evidence from Rwanda." (2011)
    • Ratha, Dilip. "Workers' remittances: an important and stable source of external development finance." (2005)
    • Pénicaud, Claire, and Arunjay Katakam. State of the Industry 2013: Mobile Financial Services for the Unbanked. Rep. N.p.: GSMA MMU, 2013.

    If interested, please list your name below.


  • Eloy Fisher
  • Carolina Mattsson
  • Sharon Greenblum
  • Jakub Rojcek

    Exploring Community Formation through Analysis of Scholarly Corpus (ArXiv)

    The arXiv is a free online repository of scientific preprint articles, mostly from physics and mathematics. It currently contains over 800,000 articles, dating back to 1991. A lot of the data associated with this repository is openly available: paper submission dates, full texts of papers, author names and coauthorship information. Additionally, some citation data is available. (If necessary, I can also find papers' subdiscipline labels, submitting authors' email address domain names, and other things.)

    In the past I have been using these data to try and explore how communities of authors who have shared interests grow over time. For example, we can pinpoint the first ever paper about topological insulators, and search for all subsequent papers on that topic. As more papers are written, authors begin to join the field of research and to form strong ties by collaborating with one another. We can use the ArXiv data to visualize and analyze exactly how these communities form and grow.

    Brainstorming notes

    Isolation between topics, crossing interdisciplinary boundaries, finding (topological) separation distance between disciplines or research groups

    Tipping points - can we look for separation of or mergers of two groups or disciplines?

    Can we identify key papers (or groups of papers) that initiate ties between fields?

    Sentiment analysis: can we identify when a paper's citation is an endorsement or a refutation?

    Tools for Analysis: Change point detection; relational event models

    Can we find more comprehensive/better citation data?

    Lucene and indexing tools:

    • Can we re-index our text database after removing stop words?
    • Can we index the titles and abstracts?
    • Elastic Search: a tool for interacting with Lucene

    This set of text-processing tutorials is pretty handy for background info and inspiration. The software package doesn't simply plug into a Lucene/Solr/Elastic Search index, but it could be done. This package from Stanford seems very capable.

    Next meeting

    Thursday @ 9AM in Coffee Shop

    Political Speech

    Some references:

    Taddy (2013), "Multinomial Inverse Regression for Text Analysis"

    Blei and Lafferty (2007), "A Correlated Topic Model of Science"

    Genomic variations from a chaotic mapping

    Following Dabby, "Musical variations from a chaotic mapping," Chaos 6(2), 1996, this project will explore genomic variations in key metabolic genes that are known to be widely spread across bacterial phylogeny (e.g. the Nif cluster, which is used in nitrogen fixation). There are several ways that genomic data can be treated as "music":

    1) Nucleotide: 4 notes - A, G, C, T
    2) Amino Acids: 20 notes
    3) codon triplets: 64 notes
    4) tetra nucleotides: 256 notes.
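Three of these four encodings (nucleotides, codon triplets, tetranucleotides) are the same construction with a different word length k, giving 4^k possible notes. The sketch below maps a sequence to note indices this way. Reading the sequence in non-overlapping windows and numbering the notes in alphabetical order are arbitrary choices made for illustration, and the short sequence is made up. The amino-acid encoding would additionally need a codon translation table and is not shown.

```python
from itertools import product

NUCLEOTIDES = "ACGT"

def kmer_alphabet(k):
    """Map every length-k nucleotide word to a note index (4**k notes in total)."""
    return {"".join(p): i for i, p in enumerate(product(NUCLEOTIDES, repeat=k))}

def sequence_to_notes(seq, k):
    """Encode a sequence as note indices, reading non-overlapping windows of length k."""
    alphabet = kmer_alphabet(k)
    seq = seq.upper()
    usable = len(seq) - len(seq) % k  # drop any incomplete trailing window
    return [alphabet[seq[i:i + k]] for i in range(0, usable, k)]

# Alphabet sizes for nucleotides, codon triplets and tetranucleotides.
sizes = [len(kmer_alphabet(k)) for k in (1, 3, 4)]
notes = sequence_to_notes("ATGGCCAAA", 3)
print(sizes, notes)
```

The resulting integer sequence is the kind of "pitch sequence" that a chaotic-mapping variation scheme could then reorder before mapping back to nucleotides.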

    One possible goal is to take a seed sequence from some random organism and generate new variations. Then, using homology search (e.g. BLAST) or phylogenetics, we could determine whether each variation exists in nature. If it does not, we could perhaps model the potential protein folding (if possible) to determine whether that variation could exist in nature. This is VERY preliminary and can go in many directions.
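
    One way the variation step could work, loosely following the idea in Dabby's paper, is sketched below. The x-values of a reference Lorenz trajectory are paired with the symbols of the seed, and a second trajectory with slightly different initial conditions picks, for each of its x-values, the symbol attached to the smallest reference x-value that is at least as large. The step size, initial conditions, function names and the fallback for x-values above the whole reference range are assumptions made for this sketch.

```python
from bisect import bisect_left

def lorenz_x(n, start, dt=0.01, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """Return n x-values of a Lorenz trajectory integrated with RK4."""
    def f(s):
        x, y, z = s
        return (sigma * (y - x), x * (rho - z) - y, x * y - beta * z)

    def shift(s, k, h):
        return tuple(a + h * b for a, b in zip(s, k))

    state, xs = tuple(start), []
    for _ in range(n):
        xs.append(state[0])
        k1 = f(state)
        k2 = f(shift(state, k1, dt / 2))
        k3 = f(shift(state, k2, dt / 2))
        k4 = f(shift(state, k3, dt))
        state = tuple(
            s + dt / 6 * (a + 2 * b + 2 * c + d)
            for s, a, b, c, d in zip(state, k1, k2, k3, k4)
        )
    return xs

def variation(seed, ref_start=(1.0, 1.0, 1.0), new_start=(1.01, 1.0, 1.0)):
    """Generate a variation of seed from two nearby Lorenz trajectories."""
    n = len(seed)
    # Pair the i-th reference x-value with the i-th symbol, then sort by x.
    pairs = sorted(zip(lorenz_x(n, ref_start), seed))
    xs = [x for x, _ in pairs]
    out = []
    for x in lorenz_x(n, new_start):
        pos = bisect_left(xs, x)   # smallest reference x that is >= x
        pos = min(pos, n - 1)      # assumed fallback: reuse the largest one
        out.append(pairs[pos][1])
    return "".join(out)

seed = "ATGGCATTACGA"  # hypothetical seed sequence
print(variation(seed))
```

    With identical initial conditions the mapping returns the seed unchanged, so the distance between the two starting points controls how far the variation drifts from the original.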

    If interested, please list your name below or contact Jarrod at



    Preliminary Discussion Meetings

    Free Energy Theory

    The Free Energy (FE) minimisation framework tries to explain how biological systems (such as a cell or a brain) self-organise in order to occupy the (often very limited number of) non-equilibrium states that minimise free energy. This is also known as active inference. A simple corollary of active inference is that agents behave so as to minimise their prediction error, that is, the difference between prediction and reality. Thermodynamic free energy is a measure of the energy available in a system to do useful work. The same idea can be framed in an information-theoretic setting, as the difference between how the world is being represented and how it actually is. A better fit means a lower information-theoretic free energy, as more resources are being put to ‘good use’ in representing the world. The overarching logic of FE theory is that a better model of the world helps maintain structure and organisation, which ultimately helps the system resist increases in entropy.
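
    For readers who prefer a formula, one standard way to write the information-theoretic quantity is the variational free energy below. The notation is an assumption introduced here and not necessarily what the group would use: o stands for observations, s for hidden states of the world, q(s) for the system's internal representation, and p(o, s) for its generative model.

```latex
F[q] = \mathbb{E}_{q(s)}\big[\ln q(s) - \ln p(o, s)\big]
     = D_{\mathrm{KL}}\big[\,q(s)\,\|\,p(s \mid o)\,\big] - \ln p(o)
     \;\ge\; -\ln p(o)
```

    Because the KL divergence is non-negative, F is an upper bound on the surprise -ln p(o). Lowering F by adjusting q brings the representation closer to the posterior p(s | o), which is the sense in which a better fit means lower free energy.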

    This is not a set-in-stone project with any concrete aim (yet). We are a few people interested in exploring the theoretical and practical implications of these ideas, and you're more than welcome to join in!


    • Jelle Bruineberg
    • Tobias Morville
    • Susanne Pettersson
    • Maggie Simon
    • Sahil
    • Tirtha
    • Anna


    Improving the design of the power grid using our knowledge about network structure

    Context: Given all the renewable energy generation that is being installed and the increasing levels of uncertainty about the future power system, power transmission expansion planning is becoming more and more challenging. There is a lot of literature being published in the field, but it always applies "blind" techniques to the design, such as optimization where the possible lines to add to the system are represented as binary variables. This leads to optimization problems that are too large for real networks. As part of the European Commission FP7 project e-Highway, my team and I have developed methods to reduce the complexity of the network and work with a smaller system.

    I would like to explore a different avenue. Maybe it is possible to describe the structure of good network designs in terms of global parameters. For instance, what does the degree distribution look like for efficient power networks? Then, we could feed that information into the optimization problem, reducing the search space.
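
    As a small illustration of the kind of global parameter meant here, the sketch below computes the degree distribution of a network given as a list of lines between buses. The five-bus network is a made-up toy example, not the FP7 data, and the function name `degree_distribution` is hypothetical.

```python
from collections import Counter

# Hypothetical toy network: buses 1..5 and the transmission lines between them.
lines = [(1, 2), (1, 3), (2, 3), (3, 4), (4, 5)]

def degree_distribution(edges):
    """Return {degree: fraction of nodes with that degree}.

    Only nodes that appear in at least one edge are counted.
    """
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    counts = Counter(degree.values())
    n = len(degree)
    return {k: counts[k] / n for k in sorted(counts)}

dist = degree_distribution(lines)
print(dist)
```

    A distribution measured on networks known to be efficient could then, in principle, be used to rule out candidate lines that would push the design far away from it.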

    I have data originating in a European project from the FP7 programme.

    There is lots of interest in this field, a lot being published, lots of money going into projects and many research grants. Nobody seems to be looking at it from a structure perspective though.

    To know more...
    I uploaded a ppt with some initial ideas to my page on the wiki.

    Interested: Sara, Carolina, Alice, Federico, Jean Gab, Sola, Ilaria, Daniel T

    Decision support/network analysis of a complex socio-ecosystem in rural Zimbabwe (Melissa -

    10:45am in the senior common room (the room behind our lecture hall)

    Ebola virus disease spread (Junming -

    11:00am in the coffee shop

    Scaling effects in bodies, communities, ecosystems (Cobain -

    11:30am in the coffee shop

    Also tying in prehistoric hunting populations

    Dynamics of homicide (Matthew Ingram -

    Integrating temporal, spatial, and multi-level concepts

    1:30pm in the coffee shop

    Interested in the discussion on this project: Sola Omoju

    Interested - Nilton Cardoso

    Organ Transplant (Christine -

    2pm in the coffee shop

    Interested: Sahil Garg

    Multi-dimensional social networks in the evolution, development and resilience of informal economies

    9:00am in the lecture hall!!

    Eloy & Carolina -,

    Multiplex Adaptive Networks (Daniel -

    7 pm in the coffee shop

    Modeling brain diseases (or cancerous bio pathways) (Sahil -

    Interested people can propose a meeting time here at their convenience.
    No meeting time indicated
    One potential meeting time: 3pm in the coffee shop?
    Information-theoretic algorithms can also be explored for this problem. See, for example:
    "Discovering Structure in High-Dimensional Data Through Correlation Explanation"
    "Maximally Informative Hierarchical Representations of High-Dimensional Data"


    • Emilia Wysocka (
    • Laura Condon (

    Analysis of UK parliament speeches 1935-2014 (Stefano -

    No meeting time indicated

    Mapping Complexity/Human Knowledge as a Complex Adaptive System (

    2pm/Wed 6/10 in Conference Room (Tentative)

    Resource allocation trade-offs (Andre -

    Using evolutionary algorithms to investigate trade-offs in allocating goods among agents. This could potentially be done on a real dataset (if you have one) or on suitably parametrized synthetic data. It could also be framed as strategic bargaining between agents, which would introduce some dynamics into the process.
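
    To make the idea concrete, here is a minimal sketch of an evolutionary algorithm on synthetic data. Everything in it is an assumption chosen for illustration: the random valuations, the fitness function (total utility minus a penalty `lam` on the gap between the best-off and worst-off agent), and the simple keep-the-best-half scheme with single-gene mutation.

```python
import random

def evolve(values, lam=0.5, pop_size=30, generations=60, seed=0):
    """Search for an allocation of goods to agents with a simple EA.

    values[a][g] is how much agent a values good g. An allocation is a
    list giving, for each good, the index of the agent who receives it.
    """
    rng = random.Random(seed)
    n_agents, n_goods = len(values), len(values[0])

    def utilities(alloc):
        u = [0.0] * n_agents
        for good, agent in enumerate(alloc):
            u[agent] += values[agent][good]
        return u

    def fitness(alloc):
        u = utilities(alloc)
        # Trade-off: efficiency (total) against inequality (max - min).
        return sum(u) - lam * (max(u) - min(u))

    def mutate(alloc):
        child = list(alloc)
        child[rng.randrange(n_goods)] = rng.randrange(n_agents)
        return child

    pop = [[rng.randrange(n_agents) for _ in range(n_goods)]
           for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        history.append(fitness(pop[0]))
        elite = pop[: pop_size // 2]  # survivors are kept unchanged
        pop = elite + [mutate(rng.choice(elite))
                       for _ in range(pop_size - len(elite))]
    best = max(pop, key=fitness)
    return best, utilities(best), history

# Synthetic valuations: 3 agents, 6 goods.
data_rng = random.Random(1)
values = [[data_rng.random() for _ in range(6)] for _ in range(3)]
best, utils, history = evolve(values)
print(best, [round(u, 2) for u in utils])
```

    Re-running with different values of `lam` traces out how much total utility has to be given up for a more even allocation, which is one simple way to look at the trade-off.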

    Email me or sign up if interested and we'll set up a meeting time.

    Meeting Thursday 9:30, coffee shop

    Interested: Sola Omoju, Christine Harvey (

    Navigating Music, Brain and The Edge of Chaos (

    10:45am/Wed 6/10 in Conference Room
    9:00am/Thu 7/10 at the Coffee Shop

    Interested: Sahil Garg

    Rule-based modeling for brain diseases (molecular level) (

    Rule-based modeling features:

    • biological systems as concurrent processes
    • dynamics of post-translational modifications
    • domain availability
    • competitive binding
    • causality and intrinsic structure
    • binding sites
    • interaction rules replace reaction equations
    • infinite number of reactions with a small and finite number of rules
    • reduction of parameter space
    • “don't care don't write” - adjustable rule contextualization
    • single reaction rule and parameters generalize classes of multiple rules
    • modular and extensible language
    • specification language & simulation/integration environment
    • static and causal analysis
    • Kappa/KaSim & BioNetGen/NFsim -- specification language/network-free simulator
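
    The point about a small number of rules standing in for a large number of reactions can be illustrated with a plain Python sketch. This is not Kappa or BioNetGen syntax, and the function name `expand_rules` is hypothetical. It takes a protein with n independent phosphorylation sites, writes one rule per site, and expands those rules into the explicit reactions they imply. The example only shows exponential growth; truly unbounded reaction sets arise in systems such as polymerisation, which are not covered here.

```python
from itertools import product

def expand_rules(n_sites):
    """Expand n site-specific phosphorylation rules into explicit reactions.

    Rule i says: site i goes from unphosphorylated (0) to phosphorylated (1),
    whatever the state of the other sites.
    """
    species = list(product((0, 1), repeat=n_sites))
    reactions = []
    for site in range(n_sites):      # one rule per site
        for s in species:            # every species the rule matches
            if s[site] == 0:
                p = s[:site] + (1,) + s[site + 1:]
                reactions.append((s, p))
    return species, reactions

for n in (1, 3, 10):
    species, reactions = expand_rules(n)
    print(n, "rules ->", len(species), "species,", len(reactions), "reactions")
```

    Each rule matches half of the 2^n species, so n rules correspond to n * 2^(n-1) reactions, which is why a network-free simulator that works directly on rules avoids enumerating them.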


    • WEDNESDAY 11/06/2015

    Besides the biological questions and verification (matching results to published experiments or known behaviour), the project has other aspects worth addressing. So my plan is as follows:

    • divide the group (roughly) into people who look into biological meaning and validation, and people who analyse system phenomena and the evolution of stochastic, combinatorially complex signalling systems, abstracted from the biological meaning, in both a qualitative way (directed acyclic graphs, networks) and a quantitative way (time series generated for every agent/species formed or destroyed in the system), covering all possible states and scenarios of the system.

    Things that could be modelled (look in dropbox folder:

    • spontaneous flipping of interactions (phosphorylations and others) between proteins, described by the god of non-linear dynamics (S. Strogatz)

    Some refs:

    Ecology non-working group (williamkurtischang at gee-mail dot com)

    Informal ecology interest group.