Difference between revisions of "Complex Systems Summer School 2016-Projects & Working Groups"

From Santa Fe Institute Events Wiki


Revision as of 19:39, 14 June 2016

Complex Systems Summer School 2016


Network and language dynamics of Reddit (Inter-community conflict and anti-social behavior in social media)


I have a cleaned dataset of all Reddit comments from 2009 through 2014. (Reddit is a very popular social media forum with >30 million users; it is organized into thousands of user-moderated "subreddits", which are topically focused communities.) The data includes all the text of the comments, usernames, the upvotes, and the thread structures (so networks can be constructed). There are many interesting questions that could be investigated with the data, and I would love to hear ideas! (The data is totally public, and I am willing to share even if I don't work on the project.)

I would be really interested in looking at the network and language properties associated with inter-community conflict and anti-social behavior, but my current thoughts are very high-level (any concrete modeling suggestions are greatly appreciated). For example, I think it would be really interesting to measure the network and language dynamics of brigading/trolling (e.g., how do trolls' behaviors differ between their "home" communities/subreddits and the communities/subreddits they are attacking?).

I have done a lot of pre-processing and preliminary analysis on a manageable and very clean subset of all 2014 comments for 1200 mid-to-large subreddits (about 50 GB). My background is in natural language processing and network mining, and I have lots of preliminary analysis/machinery for measuring linguistic signals in the data.
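As a rough illustration of how the thread structure could be turned into a network, here is a minimal Python sketch. The field names (`id`, `parent_id`, `author`, `subreddit`) and the toy records are assumptions made for the example, not the actual schema of the dataset, and "home" subreddit is defined here (hypothetically) as the subreddit where a user comments most.

```python
from collections import Counter

def build_reply_network(comments):
    """Return a Counter mapping (replier, parent_author) -> number of replies."""
    author_of = {c["id"]: c["author"] for c in comments}
    edges = Counter()
    for c in comments:
        parent_author = author_of.get(c.get("parent_id"))
        # Skip top-level comments (no known parent) and self-replies.
        if parent_author is None or parent_author == c["author"]:
            continue
        edges[(c["author"], parent_author)] += 1
    return edges

def home_subreddit(comments):
    """Map each user to the subreddit where they comment most often."""
    counts = {}
    for c in comments:
        counts.setdefault(c["author"], Counter())[c["subreddit"]] += 1
    return {user: subs.most_common(1)[0][0] for user, subs in counts.items()}

# Invented toy records, only to show the expected input shape.
toy_comments = [
    {"id": "c1", "parent_id": None, "author": "alice", "subreddit": "a"},
    {"id": "c2", "parent_id": "c1", "author": "bob", "subreddit": "a"},
    {"id": "c3", "parent_id": "c2", "author": "alice", "subreddit": "a"},
    {"id": "c4", "parent_id": "c1", "author": "bob", "subreddit": "b"},
    {"id": "c5", "parent_id": "c3", "author": "bob", "subreddit": "a"},
]

edges = build_reply_network(toy_comments)
homes = home_subreddit(toy_comments)
```

With edges and home subreddits in hand, one could compare a user's replies inside and outside their home community, which is one possible starting point for the brigading question.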

Group Contact

Will Hamilton (

Interested participants, please sign up below

The assembly of plant-pollinator networks


networks, mutualisms, ecology, restoration, succession, assembly


I have a 10-year dataset of ~1500 observations of pollinators (bees, flies, butterflies, wasps, etc.) visiting plants in native plant restorations (hedgerows) in the Central Valley of CA. The assembling communities are paired with unrestored field margins (controls) and mature (non-assembling) hedgerows. The goal would be to examine how and why the structure of the network is changing through time. How are the individual species changing their interaction patterns? What does this mean for the topology/resilience of the network? There is also a spatial dimension (the meta-population dynamics of the networks?) that could be explored.
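One simple way to start tracking structural change through time is to compute a network metric per year. The sketch below computes connectance (realized links divided by possible plant-pollinator pairs) for each year. The `(year, plant, pollinator)` record layout and the species names are assumptions for illustration, not the format of the actual dataset.

```python
from collections import defaultdict

def yearly_connectance(observations):
    """Return {year: connectance} for (year, plant, pollinator) records.

    Connectance = distinct links / (number of plants * number of pollinators),
    counting only species observed interacting in that year.
    """
    links = defaultdict(set)
    for year, plant, pollinator in observations:
        links[year].add((plant, pollinator))  # repeated visits count once
    result = {}
    for year, pairs in links.items():
        plants = {plant for plant, _ in pairs}
        pollinators = {pollinator for _, pollinator in pairs}
        result[year] = len(pairs) / (len(plants) * len(pollinators))
    return result

# Invented toy observations.
toy_observations = [
    (2007, "rose", "bee"),
    (2007, "rose", "fly"),
    (2007, "sage", "bee"),
    (2008, "rose", "bee"),
    (2008, "rose", "bee"),
]

connectance = yearly_connectance(toy_observations)
```

Connectance is only one candidate; nestedness or modularity could be tracked the same way, and the same per-year grouping would apply to controls and mature hedgerows.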

For more information on the dataset, please see

Group Contact

Lauren Ponisio (

Interested participants, please sign up below

Ryan McGee (

Modeling prestige good economies


Prestige goods, in a nutshell, are goods whose value is in conveying social status/ranking, and there is a great deal of speculation about how they contribute to increasing social complexity. It would be interesting to explore these dynamics via an ABM.
A couple of ideas:

1) Kantner from the Santa Fe School of Advanced Studies has an interesting game-theory model of how people decide to invest in prestige goods. He includes some data on turquoise in archeological contexts in the Chacoan culture. It would be interesting to a) implement this as an agent-based model, and b) explore some additional dynamics.
2) How does a prestige economy respond when the prestige goods all come from outside and the supply dries up?
An interesting case study (and what inspired the question) is the Late Yayoi-period Japanese islands (roughly 100-300 CE), a network of chiefdoms closely connected to the Korean peninsula (as a source of ore) and the Chinese mainland (as a "tributary state" of the Han Dynasty). The chief status markers in Western Yayoi society were Chinese-produced bronze mirrors and swords -- so that when the Han Dynasty fell apart and the supplies of these goods dried up, we see evidence in the archeological record of upheaval (including an attempt to make homegrown replicas of the prized items). This may also have contributed to the subsequent formation of more complex chiefdoms. It would be interesting to put together an exploratory model of this (don't have data, at least not for the Japan example).
I know some folks at SFI have been working on the dynamics of prestige.

Group Contact

Ellen Badgley (

Interested participants, please sign up below

  • Simon Carrignon: I already wrote a general ABM to study this kind of thing, but would be really happy to start something new from scratch in Python or whatever!
    • Awesome! I started reading your paper and it looks like a great starting point - maybe we could extend the model to distinguish between common goods (subsistence-level, go away after each season) and permanent prestige goods that convey social value, as well as looking into the implementations of prestige on agent actions. If you would like to continue working on this for CSSS let's talk.

Tracking the migrations of urban hipsters


This is linked to the "Viscosity of Labor" question on the MITRE Challenge Questions list - see the separate link lower down.
In brief: looking at labor categories from US Census collected data and observing how they shift across spatial/temporal dimensions. Does this change with the scale of the urban area, and is there a link to income or do the labor distributions hold static?

Group Contact

External to CSSS (but here in Santa Fe): Matt Koehler (
Attending CSSS: Ellen Badgley (,

Interested participants, please sign up below

A preliminary model of the coupled human-natural system of swidden agriculture


Swidden agriculture, also known as slash-and-burn, is about as old as agriculture itself. It exists in diverse variants practiced by 200 to 500 million people in different regions of the world; all of these forms involve the slashing and burning of portions of forest, hence its name. There has been a historic controversy regarding swidden agriculture, with some publications presenting it as a destructive force that contributes to global deforestation and other publications highlighting its sustainability and ecological benefits when practiced as a means of subsistence. This controversy shows the need for tools to further research and assess the benefits and costs of swidden agriculture.

This project idea will address the following research question: How do human activities interact with the ecological landscape and the sustainability of swidden agriculture?

This project idea aims to produce a simple, preliminary model with the following components:

  • A simplified social network of swidden farmers exchanging agricultural labor, inspired by Downey (2010)
  • A landscape in the form of a 2-D grid in which cells are patches of forest / crops

Possible model outputs include:

  • Yearly harvest
  • Biodiversity
  • Yearly biomass production in forest and fallows
  • A representative indicator of the "net production" of the complete system
  • Sustainability of the complete system

Biodiversity, biomass production, and sustainability will need reasonable, simple definitions that follow modern literature on the subject.
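To make the landscape component concrete, here is a very small Python sketch of a 2-D grid in which each cell stores its fallow age (years since it was last burned). Every rule in it is an assumption chosen for illustration: farmers clear the oldest eligible patches each year, and harvest is taken to be proportional to fallow age. It leaves out the social network of labor exchange entirely and is not a proposed final design.

```python
def step(grid, plots_per_year, min_fallow):
    """Advance the landscape one year; return (new_grid, harvest)."""
    eligible = [
        (age, r, c)
        for r, row in enumerate(grid)
        for c, age in enumerate(row)
        if age >= min_fallow
    ]
    eligible.sort(reverse=True)  # oldest fallows are cleared first
    cleared = {(r, c) for _, r, c in eligible[:plots_per_year]}
    # Assumed rule: yield of a plot equals its fallow age at clearing.
    harvest = sum(grid[r][c] for r, c in cleared)
    new_grid = [
        [0 if (r, c) in cleared else age + 1 for c, age in enumerate(row)]
        for r, row in enumerate(grid)
    ]
    return new_grid, harvest

def simulate(size, years, plots_per_year, min_fallow, initial_age):
    """Run the toy model on a size x size grid; return (final_grid, yearly_harvests)."""
    grid = [[initial_age] * size for _ in range(size)]
    harvests = []
    for _ in range(years):
        grid, harvest = step(grid, plots_per_year, min_fallow)
        harvests.append(harvest)
    return grid, harvests

final_grid, harvests = simulate(
    size=2, years=3, plots_per_year=1, min_fallow=2, initial_age=5
)
```

The yearly harvest list corresponds to the first output above; summing the ages in `final_grid` could serve as a crude stand-in for biomass until a better definition is agreed on.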

Software and language

Every idea is worth discussing. NetLogo might be an option that fits with our limited calendar. Git and GitLab would be used to manage the source code.

Group contact

Fabio Correa <>

Interested participants, please sign up below

  • Julia Adams
  • Ellen Badgley (potentially, depends if anybody jumps on the MITRE spatial viscosity thing)

Modeling a City ('s Traffic?) In-Silico


I have a few datasets lying around that I cleaned for my thesis (and I’m not using for that purpose) and a few more that I obtained but never cleaned. All of them are data for the city of Madrid (Spain) on various geographical and time scales/resolutions, most of them related to social/environmental/demographic variables.

The one I’m most interested in using (and I have not cleaned yet) is a dataset of traffic intensity in Madrid. This uses sensors placed in most traffic lights (~3600), and the dataset provides counts of vehicles (and associated % capacity used) every 15 minutes, from 2013 to March 2016. Some of the data is messy (but they have a flag for unreliable data). This data is freely available at (look for “intensidad de trafico”).
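A first cleaning pass could drop the flagged readings and roll the 15-minute counts up to hourly totals per sensor. The sketch below shows that step in Python; the record keys (`sensor_id`, `timestamp`, `count`, `reliable`) and the toy values are assumptions for illustration and do not reflect the actual column names of the Madrid files.

```python
from collections import defaultdict
from datetime import datetime

def hourly_counts(readings):
    """Sum reliable 15-minute vehicle counts into (sensor_id, hour) totals."""
    totals = defaultdict(int)
    for reading in readings:
        if not reading["reliable"]:
            continue  # skip readings flagged as unreliable
        hour = reading["timestamp"].replace(minute=0, second=0, microsecond=0)
        totals[(reading["sensor_id"], hour)] += reading["count"]
    return dict(totals)

# Invented toy readings for one hypothetical sensor.
toy_readings = [
    {"sensor_id": 1001, "timestamp": datetime(2016, 3, 1, 8, 0), "count": 120, "reliable": True},
    {"sensor_id": 1001, "timestamp": datetime(2016, 3, 1, 8, 15), "count": 135, "reliable": True},
    {"sensor_id": 1001, "timestamp": datetime(2016, 3, 1, 8, 30), "count": 999, "reliable": False},
    {"sensor_id": 1001, "timestamp": datetime(2016, 3, 1, 9, 0), "count": 90, "reliable": True},
]

totals = hourly_counts(toy_readings)
```

Note that hours with dropped readings will undercount traffic, so a real analysis would probably need to track how many valid readings went into each total, or interpolate.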

Not sure how to insert an image here, but here's a link to a plot created with 45 days of this data on a single intersection. You can guess when the Holy Week and Holy Thursday/Friday happened this year: here

I somehow have the sense that using this data may be a cool experience, but I have not figured out yet what to do with it.

I have cleaned data on sociodemographic/socioeconomic characteristics at the census section level (~1500 people), geocoded commercial spaces data and more stuff that we can add to this. This also includes historical data on all properties (including houses, commercial spaces, etc.)

Contact me (or talk to me anytime!) if you are interested or want to discuss some ideas! Usama Bilal <>

Interested please sign up below

  • Ellen Badgley - I'm interested but don't have a question to go with this yet - will think about it more!

Can we use metabolic networks to predict the next beneficial mutation?


Richard Lenski (who is also associated with SFI) has evolved an E. coli population to use a new sugar source, and tracked its changes for 40,000 generations. This is a widely studied dataset (>200 papers published on it already...) in one of the most widely studied biological systems, but there is still a lot we do not understand about it. For example, although we know which genes evolved during the experiments, we do not know why it is these particular genes that changed. I think it would be cool to use the genetic network (which is also very well understood) to try to understand how new mutations rising in frequency change the performance of a bacterium, and how the previous state of performance changes what the next mutation should be. The genetic interactions are pretty well mapped in E. coli (RegulonDB). Also, in the following paper they claim that the performance plateaued, but the mutations that accumulated were still beneficial. How did that happen? (Genome-wide Mutational Diversity in an Evolving Population of Escherichia coli)

From a more computer science perspective: 40,000 generations doesn't seem like very long for an iterative process. Does the network structure of metabolic pathways allow rapid adaptation? What can we learn from these networks to apply to computer science problems? These ideas are pretty raw... but I think something evolution-related, that looks not just at a property of a system but also allows the system to change, would be really cool. My email is Chenling Xu <>.

Interested please sign up below

  • Ryan McGee (

Data Sets

MITRE Data Sets

The two data sets we have access to are Defense of the Ancients 2 (DOTA 2) and Polish Power grids. To access the data, please contact Juniper; she has it on a hard drive. If you have any specific questions about the data you can contact Matt Koehler at


MITRE Challenge Questions and Powergrid Data Overview

MITRE Challenge Questions Overview