Actions

Complex Systems Summer School 2018-Projects & Working Groups

From Santa Fe Institute Events Wiki

Revision as of 02:54, 15 June 2018 by Laishram


Projects

Characterizing the spatiotemporal transmission dynamics of smallpox in the United States prior to eradication


Smallpox is a highly contagious infectious disease that was eradicated through vaccination and social-distancing interventions. However, the city-to-city spatial transmission of smallpox is not well characterized. Understanding how smallpox moved between cities has important implications for understanding how re-emerging vaccine-preventable infections, such as measles, can potentially spread and subsequently be controlled in the future.

This project aims to apply a metapopulation model to weekly case data from a number of US cities to estimate the rate of transmission between cities, determine whether certain (i.e. larger) cities seeded epidemics in others (i.e. traveling waves), characterize any synchrony of epidemics across geographic regions, and examine the effects of vaccination on transmission.
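To make the modeling idea concrete, here is a minimal deterministic sketch of a two-city metapopulation SIR model in Python; the population sizes, rates, and coupling strength are invented purely for illustration, not estimates for smallpox:

```python
def simulate(beta=0.3, gamma=0.1, coupling=0.01, weeks=52):
    """Toy two-city SIR metapopulation model.

    City 0 seeds the epidemic; `coupling` mixes a small fraction of
    each city's force of infection into the other, which is the
    mechanism that could produce city-to-city traveling waves.
    """
    N = [500_000, 50_000]            # hypothetical city sizes
    S = [N[0] - 100.0, float(N[1])]  # susceptibles
    I = [100.0, 0.0]                 # infectious
    R = [0.0, 0.0]                   # recovered
    history = []
    for _ in range(weeks):
        new_I = []
        for c in (0, 1):
            other = 1 - c
            # force of infection: mostly local, partly imported
            lam = beta * ((1 - coupling) * I[c] / N[c]
                          + coupling * I[other] / N[other])
            infections = lam * S[c]
            recoveries = gamma * I[c]
            S[c] -= infections
            R[c] += recoveries
            new_I.append(I[c] + infections - recoveries)
        I = new_I
        history.append(tuple(I))
    return history

hist = simulate()
```

Fitting `beta`, `gamma`, and `coupling` to the weekly Project Tycho counts, rather than assuming them, would be the actual estimation task.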

Suggested papers

Grenfell BT, Bjornstad ON, Kappey J. Travelling waves and spatial hierarchies in measles epidemics. Nature 2001;414:716- 23.

Potential data

Project Tycho (data repository of MMWR notifiable diseases: https://www.tycho.pitt.edu/dataset/US.67924001/)

Interested Participants

  • talia

Meeting time

Friday @ SFI after the 1st lecture (10:00 am)


Understanding and creating music

This project has two directions:

  • 1) Understanding music from a complex system point of view
  • 2) Creating new music via neural style transformation

The two directions are not separate; if we are lucky, we hope to see them feeding each other :)

Understanding music from a complex system point of view

General Idea

Music is undeniably complex: it combines structure in time (e.g. melody) and in space (e.g. harmonic structure across instruments). Yet for all the beautiful music in the world, from the profound and somewhat mathematical works of Bach to the inspiring ones of Beethoven, and from rock and roll to electronic music, we still do not understand much about it.

In this project, we aim to understand music from a complex-systems point of view: can we define the “style” of each genre, era, or composer, and can we quantitatively analyze the structure of a musical piece? Music is composed of note sequences on different “layers”, with temporal information as well as notes interacting with each other in time. Although only a finite number of notes is available, the set of sequences they can generate is infinite. Mathematically, music could potentially be described as a “network”, but a very complex one: temporal, multilayer, and higher-order (a dyad may not be the best representation here).

One more detailed idea/question: using network theory, including multilayer networks, higher-order networks, and temporal networks, could we figure out how each music genre differs from the others and how each composer becomes characteristic?

Novelty: Representing music as a network is not new; however, little of the literature represents music as a network that is temporal, multilayer, and potentially higher-order, which would add a whole new level of complexity to the study.
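As a minimal sketch of such a representation, the following builds a weighted, directed transition network from a toy note sequence; the melody and the simple k-note context are invented for illustration (a real pipeline would extract notes from MIDI with music21 or mido):

```python
from collections import Counter

def transition_network(notes, order=1):
    """Weighted directed edge counts between note contexts.

    notes: sequence of note names (toy stand-in for a parsed MIDI track).
    order: length of the source context; order=2 already gives a simple
    higher-order (non-dyadic) representation.
    """
    edges = Counter()
    for i in range(len(notes) - order):
        src = tuple(notes[i:i + order])   # context node
        dst = notes[i + order]            # following note
        edges[(src, dst)] += 1
    return edges

# hypothetical opening of a melody, just for illustration
melody = ["C", "D", "E", "C", "C", "D", "E", "C", "E", "F", "G"]
first_order = transition_network(melody, order=1)
second_order = transition_network(melody, order=2)
```

Comparing these edge-weight distributions across composers or genres would be a first, crude version of the "style" question.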

Relevant papers

Neural style transfer in music styles via interacting agents

General idea

  • A) learn generative models of different music styles using neural networks.
  • B) let these networks ('agents') interact and see what `fusion' music styles result.

Novelty lies in (a) having multiple agents learn multiple styles independently and then letting them exchange information in a meaningful way (probably the trickiest bit), and (b) letting these fusion music styles evolve on a network etc. and seeing, for example, what "world music" results at the end.

Details will come...

Relevant papers

Potential Data

Packages to handle MIDI/music (based on python)

  • Python-based toolkit for computer-aided musicology: music21
  • Mido is a library for working with MIDI messages and ports
  • Python MIDI, though it is not maintained very well...

Thoughts?

  • For the generation direction, we could also use text corpora instead: Shakespeare etc., creating something like Shakespeare + Tolstoy, for example :D
  • ...

Interested Participants

(Please note your background and the direction you are potentially interested in, or provide a ranking if interested in both: A. understanding, B. generating)

  • Yuki
  • Vandana
  • Xindi (good at network science, data mining, a little bit machine learning, ranking: 1.A 2.B)
  • R Maria
  • Kevin
  • Priya
  • Ricky (multilayer networks, machine learning, data mining. ranking: 1A 2B)
  • Chris

Optimal representations of high dimensional data in deep learning and biological systems

What is the best way for a system to represent very high dimensional data? For example, how should the retina encode visual stimuli in neuron firing patterns? How does the immune system encode the space of antigens it might encounter? In each case, it would not be feasible (or efficient) to create a unique tag for each input. Rather, the systems in question must decide which features in the stimuli are most relevant, and trade off between specificity and generality.

Along these lines, there are two more specific questions to investigate:

-It has recently been conjectured that the success of deep learning networks is related to their optimization of a specific informational quantity in each layer https://arxiv.org/abs/1710.11324. Unfortunately this paper is not very clearly written, but basically the idea is that when binning inputs into representations, the distribution of bin sizes should be given by a specific power law, which optimizes the aforementioned information measure. Do biological systems employ the same strategy? With access to the right data, this idea should be straightforward to test. For example, if we have a list of antibodies together with the set of antigens they react to, we can compute this quantity and see whether the antigen "bins" are indeed distributed according to the predicted power law.
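As a sketch of how straightforward the test could be, the snippet below estimates a power-law exponent from a list of "bin sizes" with the standard continuous-approximation MLE; since we have no antibody data yet, the sizes are faked with a rank-size (Zipf) law, whose size distribution has exponent close to 2:

```python
import math

def powerlaw_mle(sizes, xmin=1):
    """Continuous-approximation MLE for a power-law exponent:
    alpha = 1 + n / sum(ln(x / (xmin - 0.5))), over all x >= xmin."""
    xs = [x for x in sizes if x >= xmin]
    return 1 + len(xs) / sum(math.log(x / (xmin - 0.5)) for x in xs)

# Hypothetical "bin sizes": number of antigens each antibody reacts to,
# faked here with a rank-size law as a stand-in for real reactivity data.
bins = [max(1, round(1000 / rank)) for rank in range(1, 201)]
alpha = powerlaw_mle(bins, xmin=5)
```

With real antibody-antigen reactivity data, `bins` would be the observed bin sizes, and `alpha` could be compared against the exponent the conjecture predicts.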

-A diverse collection of biological systems that are faced with this task seem to be well modeled by maximum entropy distributions, with a constraint on pairwise correlations and parameters (i.e. Lagrange multipliers) set near a critical point https://arxiv.org/pdf/1012.2242.pdf. This has been applied to the previously given examples of the retina and the immune system, as well as to flocking in birds. As far as I know, it is not yet known with certainty whether this kind of encoding scheme is optimal in some sense (as in the previous bullet), or whether it is an artifact of our own inference methods, but I think the answer is interesting either way. An immediate question is: if these maximum entropy models are a powerful tool for humans to model high dimensional systems, might biological systems also be producing their own maximum entropy models of environmental variables? That is, are maximum entropy models with constraints on pairwise correlations optimal in some information-theoretic sense that can be made precise? For example, would this be a particularly useful way to model the distribution of natural images one might encounter? While less straightforward than the previous bullet, I think these are questions well suited to the skills of the people here, and I think we could make significant progress!
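For intuition about what a pairwise maximum-entropy model is, here is a brute-force sketch that enumerates all states of a tiny Ising-type system and returns its exact moments; the fields and couplings are arbitrary illustrative values, and real fits to data would of course need sampling or mean-field methods:

```python
import itertools
import math

def ising_moments(h, J):
    """Exact moments of P(s) ~ exp(sum_i h_i s_i + sum_{i<j} J_ij s_i s_j)
    over spins s_i in {-1, +1}, by brute-force enumeration (small n only)."""
    n = len(h)
    Z = 0.0
    means = [0.0] * n
    corr = {(i, j): 0.0 for i in range(n) for j in range(i + 1, n)}
    for s in itertools.product((-1, 1), repeat=n):
        energy = sum(h[i] * s[i] for i in range(n))
        energy += sum(J[i][j] * s[i] * s[j]
                      for i in range(n) for j in range(i + 1, n))
        w = math.exp(energy)        # unnormalized Boltzmann weight
        Z += w
        for i in range(n):
            means[i] += w * s[i]
        for (i, j) in corr:
            corr[(i, j)] += w * s[i] * s[j]
    return [m / Z for m in means], {k: v / Z for k, v in corr.items()}

# three spins coupled in a chain 0 - 1 - 2, no external fields
h = [0.0, 0.0, 0.0]
J = [[0.0, 0.5, 0.0],
     [0.0, 0.0, 0.5],
     [0.0, 0.0, 0.0]]
means, corr = ising_moments(h, J)
```

Note how the chain induces a correlation between spins 0 and 2 even though they are not directly coupled, which is exactly the kind of structure these models capture with only pairwise constraints.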

If anyone has expertise to offer, your feedback/participation would be very much appreciated! In particular, I think this project would greatly benefit from those of you that have knowledge in machine learning and biology (my own area is physics and information theory). Feel free to email me at e.stopnitzky@gmail.com

Thoughts? Recommended Papers?

A genetic model with the resulting protein products could also be useful here (e.g. looking at expression levels and/or variants in a particular gene or set of genes as they pertain to the protein(s) coded by those genes). In sum, can we find/demonstrate an algorithmic basis for gene expression and/or protein coding? - Kofi

1. Castillo et al. "The Network Structure of Cancer Ecosystems." SFI WORKING PAPER: (2017)

Interested participants:

- Kofi Khamit-Kush (Background in Biology, specifically Cancer Genomics and data mining). kkhamitk@gmail.com
- Jacob
- Yuki
- Alice
- Sarah B. (experience with sequencing data/gene expression)
- Subash

The Emergence and Evolution of Legal Systems as Pertaining to Water Distribution

General Idea

There are numerous legal systems that have been identified, broadly categorized into large families: Common Law (Anglosphere and Commonwealth nations), Civil Law (Romance-language nations, Germany, China), Islamic Law (most Muslim nations), and Customary Law (India, sub-Saharan Africa). More importantly, most nations do not lie purely in one category, but tend to combine elements of multiple systems, either through merging (e.g. German law combining Germanic tradition with Civil traditions) or through subsidiarity (e.g. Louisiana having Napoleonic law, despite being in a Common Law nation). We are interested in determining how the legal systems of nations and states emerged, influenced each other, and interact across national boundaries.

This is an immense task, so to scope it, one idea has been to limit the project to laws pertaining to water distribution. This is of particular interest when looking at states of nations that have different legal systems, such as Louisiana in the U.S., Quebec in Canada, and Scotland in the U.K. For international interactions, sub-Saharan African nations might also be valuable to assess, as many border nations with different legal systems, and water is often a scarce resource in these areas.

If anyone has interest in this topic, and/or expertise in either legal systems or water distribution, feel free to sign up or discuss.

Recommended Papers

Energy and Efficiency in the Realignment of Common-Law Water Rights, Carol M. Rose, The Journal of Legal Studies 1990 19:2, 261-296
Theories of Water Law, Samuel C. Wiel, Harvard Law Review, Vol. 27, No. 6 (Apr., 1914), pp. 530-544

Interested Participants

1. Kevin Comer
2. Cedric Perret
3. Chris Fussner
4. Jared Edgerton

Academic hiring networks

General idea:

I am thinking about doing something around academic hiring networks in different disciplines and playing around with the idea of multilevel networks (e.g. looking at the interplay between different institutional norms in various disciplines and hiring dynamics). It would also be cool to look at the interplay between publishing and hiring networks. We could also explore other ideas related to the academia theme, such as factors that shape excellence/equality tradeoffs or promote gender balance in science.

Who would be interested?

I've created the channel #hiring_networks on Slack.

Literature:

*  A. Clauset, S. Arbesman and D.B. Larremore. 2015. Systematic inequality and hierarchy in faculty hiring networks. Science Advances 1(1), e1400005 (2015).

Interested Participants

1. Evgenia (Background in social network dynamics, psychology and organisation science)
2. Ricky (Background in multilayer networks, network resilience, machine learning)
3. Allie (Background in networks, science of science, gender)
4. Carlos Marino.(Background in network optimization)
5. Andrea (Background in networks, multiplex and multilayer networks, information theory)
6. Sanna (Background in networks and various social science disciplines)
7. Conor (Background in information theory)
8. Subash (Background in information theory - transfer entropy in specific, experimental design)
9. Patricia (Background in modeling dynamical systems, agent-based modeling, experience working on academic search committees)
10. Javier
11. Saska

Make deep neural networks more biologically accurate by including inter-neural travel times

General idea:

Make deep neural networks more biologically accurate by including inter-neural travel times. Train with some normal task like digit-recognition.

Motivation:

  • Currently, deep neural networks only share some similarity to actual neurons: threshold behavior and hierarchical representations.
  • However, in real neural networks, signals travel with finite speed and activations are integrated over time
  • This ignored aspect could be one reason why real neuronal networks/brains are superior
  • Further connecting the two fields of neuroscience and deep learning would be pretty cool
  • We could use the "regular" neural network machinery to optimize weights etc for tasks like forecasting/image recognition and then see whether we find neural avalanches and chaotic behavior etc.

Details (first ideas):

  • In artificial neural networks, different neurons are connected by weights. To this, we add another quantity per connection: the inter-neuron travel time.
  • The inter-neuron travel time is computed by an RNN.
  • Inference works by letting the network oscillate / come to an equilibrium.
  • Activation of neuron i at time t: a_i(t) = sum_over_connected_neurons [f(a_j(t)) * delta(rnn(j->i) - t) + exp(-lambda*t) f(a_i(t))], where delta is the Kronecker delta.
  • I.e., the signal from connected neurons arrives at the time specified by the RNN and then slowly decays with rate lambda.
  • If the RNN just gives t=1 for all travel times, this essentially reduces to the normal deep neural net output.
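A stripped-down, discrete-time version of these ideas can be sketched as follows; here the travel times are fixed integers rather than RNN outputs, activations decay exponentially between arrivals, and all weights, delays, and inputs are invented for illustration:

```python
import math

def run_delayed_net(weights, delays, inputs, steps, decay=0.5):
    """Toy discrete-time network with per-connection travel delays.

    weights[i][j], delays[i][j]: connection j -> i and its integer delay.
    inputs: dict neuron -> constant external drive.
    Returns history[t][i] = activation of neuron i at time t.
    """
    n = len(weights)
    history = [[0.0] * n]
    for t in range(1, steps + 1):
        prev = history[-1]
        a = []
        for i in range(n):
            # leftover activation decays; external input is added fresh
            total = decay * prev[i] + inputs.get(i, 0.0)
            for j in range(n):
                if weights[i][j] != 0.0:
                    t_src = t - delays[i][j]   # when the signal left j
                    if t_src >= 0:
                        total += weights[i][j] * math.tanh(history[t_src][j])
            a.append(total)
        history.append(a)
    return history

# two-neuron loop: 0 drives 1 with delay 3, 1 feeds back to 0 with delay 1
W = [[0.0, 0.5], [1.0, 0.0]]
D = [[0, 1], [3, 0]]
hist = run_delayed_net(W, D, inputs={0: 1.0}, steps=20)
```

Replacing the fixed `delays` table with the output of a small RNN, and training the whole thing end-to-end on digit recognition, would be the actual project.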

Evolution of social norms as a process within or between societies

General idea

Currently, there are two ideas floating around:

  • (1) How do social norms evolve *within* a society? Method-wise this is perhaps related to the spread of ideas/information on a social network. The agents in this network are people. Potentially relevant models: opinion formation, infectious (disease) spread, and/or games on social networks.
  • (2) Think of a whole society (group/tribe/nation/etc.) as an agent. A society may adopt or discard various social norms over time. If one of the chosen social norms (or a combination of them) is woefully impractical, it decreases the "fitness" of the whole society, which loses members/power/resources/territory to competing societies. Potentially relevant models: models from evolutionary game theory.

The project can focus on (1), (2), or both.

Branch: Agent Based Models and System Dynamics

This branch seeks to use the two tools of agent-based models (ABM) and system dynamics (SD) to further understand how social norms emerge through individual interaction from the bottom up (ABM) and how governing mechanisms then influence and shape those social norms from the top down (SD). Ideally this will even allow individual agents to select between emergent social norms and governing institutions, which then further influences the feedbacks and system behavior.

The current challenge is finding a parsimonious construct and identifying the key elements of this model, so as to create the desired dynamics and analyze the subsequent behavior.

Interested in Branch : Tom

Recommended Papers

For (1):

  • Ostrom, Elinor. "Collective action and the evolution of social norms." Journal of economic perspectives 14.3 (2000): 137-158.
  • Sethi, Rajiv, and Eswaran Somanathan. "The evolution of social norms in common property resource use." The American Economic Review (1996): 766-788.
  • Centola, Damon, et al. "Experimental evidence for tipping points in social convention." Science 360.6393 (2018): 1116-1119.

For (2):

  • Powers et al, "How institutions shaped the last major evolutionary transition to large-scale human societies" Phil. Trans. R. Soc. (2016)

Interested Participants

Alice, Vandana, Alan, Xindi, Jenn, Matt, Sandra, Kevin, Alex, Cedric, Thushara, Subash, Josefine, Tom

Topological features of neutral networks in evolution

Introduction

In a genotype network, nodes are genotypes and a link from genotype A to genotype B indicates that they are separated by a single mutation. Each genotype has a phenotype associated with it. In a fixed environment, a phenotype is associated with a fixed fitness value. So for every node, one has:

GENOTYPE -> PHENOTYPE -> FITNESS VALUE

The fitness values form a "fitness landscape", in which one can embed the genotype network. The set of nodes in a genotype network that correspond to the same fitness value forms a *neutral network*. These networks have received little or no attention from network scientists. Let's change that!

General idea

Depending on the interest of participants, this project could focus on (1) data analysis or (2) network theory.

(1) Szendro et al. mention that empirical data for genotype networks and their neutral networks are available. This is a somewhat recent development (<10 years). One could scout for one or several available data sets and study the topology of the networks. For example,

  • what are topological characteristics of genotype networks? Can these characteristics be explained by constraints of embedding on a curved manifold? (One could compare data to random graph models, e.g. Erdos-Renyi, small world, or geometric random graph models.)
  • how are neutral networks for high or low fitness values different?
  • one could also think of the genotype network as a multilayer network with a lot of layers ... and analyse its topology from a multilayer perspective.

(2) A neutral network is a "level-set network" in the genotype network. The genotype network is a network that is embedded in a curved manifold in a high-dimensional space. There is so much cool math/physics/topology that one could do with this!!
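For very small genotype spaces, both the genotype network and its neutral networks can simply be enumerated, which is enough to start measuring topology; the phenotype map below is a made-up example:

```python
from itertools import product
from collections import deque

def neutral_components(L, phenotype):
    """Enumerate all binary genotypes of length L, link single-mutation
    (Hamming-distance-1) neighbors, and return the connected components
    of each neutral network. Brute force, so only sensible for small L."""
    genotypes = ["".join(g) for g in product("01", repeat=L)]
    seen = set()
    components = []
    for g in genotypes:
        if g in seen:
            continue
        comp, queue = {g}, deque([g])
        seen.add(g)
        while queue:                      # BFS restricted to equal phenotype
            cur = queue.popleft()
            for i in range(L):
                nb = cur[:i] + ("1" if cur[i] == "0" else "0") + cur[i + 1:]
                if nb not in seen and phenotype(nb) == phenotype(g):
                    seen.add(nb)
                    comp.add(nb)
                    queue.append(nb)
        components.append((phenotype(g), comp))
    return components

# hypothetical phenotype: number of 1s, capped at 2 ("enough copies")
comps = neutral_components(4, lambda g: min(g.count("1"), 2))
```

On this toy landscape the high-phenotype genotypes form one large connected neutral network, while the intermediate phenotype shatters into isolated nodes, exactly the kind of topological difference question (1) asks about.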

Recommended Papers

  • https://en.wikipedia.org/wiki/Neutral_network_(evolution)
  • Szendro, Ivan G., et al. "Quantitative analyses of empirical fitness landscapes." Journal of Statistical Mechanics: Theory and Experiment 2013.01 (2013): P01005.
  • De Visser, J. Arjan Gm, and Joachim Krug. "Empirical fitness landscapes and the predictability of evolution." Nature Reviews Genetics 15.7 (2014): 480.
  • Kondrashov, Dmitry A., and Fyodor A. Kondrashov. "Topological features of rugged fitness landscapes in sequence space." Trends in Genetics 31.1 (2015): 24-33.

Interested Participants

Alice

Ricky

Carlos

Sarah B.

George

Luca

Kofi K. (background in cancer genomics, data mining, and bioinformatics tools) kkhamitk@gmail.com

Networks from thresholded normally distributed data

Observations:

- real-world networks are often created by thresholding dyadic interactions;
- lots of things are approximately normally distributed.

Idea:

- Suppose for each pair of nodes, i and j, there is a normally distributed interaction: x_ij ~ Normal(0,1);
- Then, we place edges between nodes i and j whenever x_ij>threshold;
- Edge correlations could be controlled by a single parameter, e.g. Cov(x_ij, x_ik) = beta for pairs of dyads sharing a node.

Conjecture:

- The resulting degree distributions have two limiting forms, approximately Poisson or power-law(ish) (with an exponential cut-off), with something intermediate in between (log-normal?)

Things to look at:

- Can we solve for the degree distribution of this model?
- Does this degree distribution look like real networks? Can we fit the model easily (e.g. maximum likelihood or method of moments)?
- What about the giant component phase transition?
- Does clustering vanish in the limit of large network size?

This would be a more mathematical/theoretical project, and less about real world data.
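One way to start playing with the model numerically is the sketch below; it realizes the edge correlations through a per-node latent Gaussian factor (one possible reading of the covariance parameter, so this particular construction is an assumption, not the definitive model):

```python
import math
import random
from collections import Counter

def threshold_network(n, theta, beta=0.5, seed=1):
    """Degrees of a network built by thresholding Gaussian dyadic scores.

    Each pair (i, j) gets
        x_ij = sqrt(beta) * (z_i + z_j) / sqrt(2) + sqrt(1-beta) * eps_ij,
    which is marginally N(0, 1); dyads sharing a node have correlation
    beta/2, and beta = 0 recovers an Erdos-Renyi-like graph.
    An edge is placed whenever x_ij > theta."""
    rng = random.Random(seed)
    z = [rng.gauss(0, 1) for _ in range(n)]     # per-node latent factor
    deg = [0] * n
    for i in range(n):
        for j in range(i + 1, n):
            x = (math.sqrt(beta) * (z[i] + z[j]) / math.sqrt(2)
                 + math.sqrt(1 - beta) * rng.gauss(0, 1))
            if x > theta:
                deg[i] += 1
                deg[j] += 1
    return deg

degrees = threshold_network(400, theta=2.0, beta=0.5)
dist = Counter(degrees)
```

Increasing `beta` at fixed `theta` should skew the degree distribution away from the Poisson-like Erdos-Renyi case, which is one face of the conjectured crossover.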

Interested participants:

  • George (background in physics and networks)
  • Alice
  • Yanchen
  • Guillaume
  • Jordan
  • Conor

The Evolution of Beliefs in Abrahamic Religions

General Idea

One commonality across all Abrahamic faiths (Judaism, Christianity, Islam, and others) is their reliance on the written word to solidify and codify beliefs, even centuries after the text was documented. Because of the large time difference between when the documents (Torah, New Testament, Qur'an) were written and when these beliefs grow and evolve, decisions are often linked to other texts as justification. For instance, when Ecumenical Councils declare a new testament of faith, they often point to previous texts from church fathers for justification (or sometimes from non-believers, like pre-Christian Greek philosophers). Similarly, when imams declare testaments of faith, these are often linked to the hadiths and sirahs as justification. Canon law and Islamic law are based on these two dynamics, respectively. Religions often influence each other, both as attractors (Islam prompted Iconoclasm in Eastern Christianity) and repulsors (Early Christianity set itself in opposition to Judaic practices, despite being considered a Jewish sect).

Recommended Papers

Interested Participants

1. Kevin
2. Carlos Marino
3. Pete K.

City as a Complex System: Clustering/Mobility Network Effects

Introduction

Cities are complex systems within which many sub-systems develop, interact, and evolve. Understanding how different systems within a city interact and connect with each other can help inform better urban planning decisions that support different communities and ecosystems.

Through this study, we aim to gain insights on human choices, development of ecosystems, and spatial distribution in a city. Data from Singapore is available as a case study.

Research Questions

Some possible research questions are listed below, but feel free to add on any ideas related to this topic and we can discuss how to go from there!

a. Business Clustering & Flow of Capital (human and/or monetary) between Industries

Motivation: To better distribute jobs closer to homes, a polycentric structure can be developed to establish multiple employment nodes in different areas of a city. Understanding how business ecosystems develop, and the factors that support successful business/industrial clusters, can help inform strategies to establish and facilitate the growth of polycentricity.

  • Are there clustering effects for businesses/industries across different sectors, and how can these be measured/analyzed?
  • What are the implications of clustering on the performance of businesses and industries?
  • What are the drivers to facilitate a sustainable business ecosystem?

b. Intra-city Public Transport (PT) Mobility Patterns

Motivation: The study of people’s PT mobility patterns within a city allows us to understand human movement, choices, and interactions. By understanding mobility patterns in relation to the built environment and demographic make-up of different areas, we can gain insights into the human-environment relationship. This facilitates more informed policy decisions and urban planning strategies that cater to the needs of society at both local and macro levels.

  • To study how PT mobility patterns within/across towns differ
  • Relationship between PT mobility and factors such as town demography profile, job/worker ratio, and land use mix

Interested Participants

Shantal, Alex, Jared, Sanna, Kevin, Chris, Sarah B.

The Evolution of Water Narratives in US Newspapers

Motivation

The complex interactions between physical and social factors in water management have led to the emergence of the new field of socio-hydrology. Socio-hydrologists study various dynamics, including the influence of economics, culture, and institutions on behaviors related to water. This study focuses on improving our understanding of the evolution of social narratives in the water domain.

Data

We have access to 500K+ newspaper articles (across 37 publications over 15 years) from the LexisNexis database that touch on water in some shape or form.

General Approach

The data provides a lot of opportunities for play! Currently, we are thinking of exploring a variety of natural language processing techniques (e.g., word2vec and sentiment analysis) and network evolution techniques to help us characterize and understand the evolution of narratives. There are of course other directions the project can evolve in. If you are interested, please put your name down below and join us on Slack (#waternewspapers) to be a part of the conversation!

Recommended Papers

I think it would be useful to think about how newspapers and their narratives affect the capacity for collective action around shared-pool resources... this might be useful towards that: https://dlc.dlib.indiana.edu/dlc/bitstream/handle/10535/1344/Marelli_119601.pdf?sequence=1

Interested Participants

Thushara, Saska, Jenn, Inga, Xindi, Yuki, Sandra, Eleonara, Javier, Kevin, Allie, Vandana

Reproducibility and Underdeterminacy in Mathematical Modeling

Motivation

The reproducibility crisis has shaken the scientific world as many findings have failed to replicate in new experiments and datasets. At the same time, the rise of highly accurate predictive machine learning methods challenges the notion that we need deep scientific understanding in order to make predictions about the world around us. Will developing scientific theory still be necessary, or practicably justifiable, if we can just get enough data?

General Approach

There are several dimensions of this tension that we could explore:
- We think of physics as the area that achieves the highest degree of predictive accuracy among all the sciences. Can deep learning predict physical scenes more accurately than physical models? If not, perhaps there is still hope for science. If so, perhaps we need to rethink either the practice of physics or the authority of prediction.
- Data is often brought to bear when trying to provide evidence for a mechanistic model. In order to establish firm evidence, strong alternative hypotheses must be specified. Yet in many cases, alternative models are not even compared, or when they are, those alternatives are weak strawmen. Can typical datasets actually uniquely identify mechanistic models among alternatives? One approach to answering this question is to generate data according to a known model (e.g., from published papers) and see whether an analyst who does not know the true model can infer it, or whether multiple different models provide equally good accounts of the data.
- If we want to hold out hope that our scientific models are useful for prediction in the face of machine learning, perhaps we can productively combine structured scientific models with less structured machine learning approaches, e.g., by predicting model residuals. Do structured models actually help in such a joint model, above and beyond machine learning alone?
- Can collecting richer data, such as finer-grain neural data or interviews / ethnographies in social science, help resolve any indeterminacy we identify, or would having richer data simply make machine learning more effective as an alternative?
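The "generate from a known model and see what fits" idea can be prototyped in a few lines; here the data come from a known exponential law, and we compare it against a power-law alternative via linear regression in the appropriate log space (all numbers are illustrative):

```python
import math
import random
from statistics import mean

def fit_linear(xs, ys):
    """Ordinary least squares: returns (intercept, slope)."""
    xb, yb = mean(xs), mean(ys)
    slope = (sum((x - xb) * (y - yb) for x, y in zip(xs, ys))
             / sum((x - xb) ** 2 for x in xs))
    return yb - slope * xb, slope

def sse(ys, preds):
    return sum((y - p) ** 2 for y, p in zip(ys, preds))

# data generated from a KNOWN model: y = 2 * exp(0.4 x), with mild noise
rng = random.Random(0)
xs = [1 + 0.5 * k for k in range(20)]
ys = [2.0 * math.exp(0.4 * x + rng.gauss(0, 0.05)) for x in xs]

# candidate 1 (the truth): exponential, linear in (x, log y)
a1, b1 = fit_linear(xs, [math.log(y) for y in ys])
pred_exp = [math.exp(a1 + b1 * x) for x in xs]

# candidate 2 (the alternative): power law, linear in (log x, log y)
a2, b2 = fit_linear([math.log(x) for x in xs], [math.log(y) for y in ys])
pred_pow = [math.exp(a2 + b2 * math.log(x)) for x in xs]

err_exp = sse(ys, pred_exp)
err_pow = sse(ys, pred_pow)
```

Shrinking the x-range or raising the noise level makes the two candidates harder to distinguish, which is precisely the underdeterminacy the bullet is after.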

Some additional random thoughts (in defense of science) by Jonas:
- a lot of the flexible (low inductive bias) function approximators need a lot of (labeled) data; is this realistic in all/many/some scientific disciplines? For instance, in psychology there are fundamental limits to how many subjective measurements one can take of an individual on a given day, both in frequency and in the number of variables p; in such situations, adding more inductive bias (e.g. through understandable parametric models) is possibly a good idea
- convenience samples vs. samples from a proper sampling scheme (probably not a big problem in classifying cat pictures, but maybe a bigger problem for more contextualized phenomena)
- observational data vs. experimentation
- predictive models (function approximation) vs. causal models (building a model of the world); Pearl argues for the latter and against the former in his new book (though I haven't read it yet)
- in many situations it is not so interesting to predict variables as to come up with useful interventions on them; this is difficult with black-box approximators whose parameters do not map to concepts we have about the real world

Relevant Papers

Kleinberg et al. (2017) The Theory Is Predictive, but Is It Complete? An Application to Human Perception of Randomness
Youyou et al. (2015) Computer-based personality judgments are more accurate than those made by humans
Pearl and Mackenzie (2018) The Book of Why

Slack Channel

link Slack


Interested Participants

Pete K.,

Alice

Jonas

Classifying language by grammatical motifs

General Idea

Every once in a while, when people from different countries sit around a table (at CSSS, for example!), we come across words, idioms, or concepts that we can't accurately translate from one language to another. There are lots of words that exist in only one language, and consequently lots of concepts that exist in only one language. In this project, let us explore the differences between languages through "higher-order grammatical structure", not just single words.

We can take a sentence and think of its structure as a small network (also called a motif) of words. Nodes are subjects and objects that are linked via verbs or prepositions (see for example https://en.wikipedia.org/wiki/Object_(grammar) ). Taking a text and counting the recurrence of sentence structures, we can get a distribution of motifs. Let us explore whether we can use this distribution of motifs to characterise different texts. Comparisons could be

  • between texts in different languages
  • between British and American English
  • between texts for different purposes (fiction/novels, news, scientific writing, policy, etc.)

Given the diversity of the CSSS crowd, this would be a unique opportunity to work on the comparison of texts in different languages!

The main challenge would be to develop a text mining algorithm that can give us the motif distribution for a text (in a given language). This project could benefit from

  • expertise in computational linguistics, text mining, and machine learning
  • a diverse team of people who speak different languages.
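Assuming sentences arrive already tagged (the tagging being exactly the language-dependent, hard part of the text mining challenge), counting a motif distribution is straightforward; the tag set and toy sentences here are invented for illustration, and a real pipeline would get tags from a parser such as those in NLTK or spaCy:

```python
from collections import Counter

def motif_distribution(tagged_sentences, k=3):
    """Normalized frequencies of length-k part-of-speech motifs.

    tagged_sentences: list of sentences, each a list of POS tags.
    Returns a dict mapping each k-tag motif to its relative frequency.
    """
    motifs = Counter()
    for tags in tagged_sentences:
        for i in range(len(tags) - k + 1):
            motifs[tuple(tags[i:i + k])] += 1
    total = sum(motifs.values())
    return {m: c / total for m, c in motifs.items()}

# toy hand-tagged sentences ("the dog bites the man", etc.)
text = [["DET", "NOUN", "VERB", "DET", "NOUN"],
        ["PRON", "VERB", "ADV"],
        ["DET", "NOUN", "VERB", "ADJ", "NOUN"]]
dist = motif_distribution(text)
```

Comparing such distributions between two corpora (e.g. British vs. American English) could then use any standard divergence between the two motif histograms.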

Recommended Papers

???

Interested participants

Alice, Vandana

Structures in Open Source Software Communities

General Idea

A lot of open source software projects organize through mailing lists. These mailing-list interactions, in combination with, for example, data from GitHub, could give some insight into how those groups organize. Possible interesting questions include:

  • How does project size influence the structure?
  • Which members collaborate more/less?
  • Who collaborates on specific pieces of code?
  • How does communication behavior influence the position of contributors in the community? (sentiment analysis?)
  • ... your ideas ...

Existing Work in this Field?

Useful Methods

Data Sets

Interested participants

Maria W, Cedric P


Measuring information distortion in networks (rumors/fake news)

General Idea

Analyzing analytically, numerically, and experimentally how information gets distorted in networks when passed between people. The network is layered (people in one layer pass the message to people in the next layer). In-degrees and out-degrees are fixed (1, 2, 3, ...).

Possible parameters: error rate, degree, length of chains, number of agents, speed of news propagation (internet vs newspapers etc.)
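A minimal numerical version of the setup might look like this; the bit-flip error model, the all-to-all wiring between layers, and the majority-vote merging are simplifying assumptions chosen to get started:

```python
import random

def simulate_distortion(msg_len=64, layers=10, width=3, error=0.02, seed=0):
    """Pass a binary message through a layered network of agents.

    Each layer has `width` agents; every agent receives copies from all
    agents in the previous layer, takes a bitwise majority vote, then
    retransmits with per-bit error probability `error`. Returns the
    fraction of bits still correct at each layer."""
    rng = random.Random(seed)
    truth = [rng.randint(0, 1) for _ in range(msg_len)]

    def noisy(bits):
        # flip each bit independently with probability `error`
        return [b ^ (rng.random() < error) for b in bits]

    layer = [noisy(truth) for _ in range(width)]   # first layer hears source
    accuracy = []
    for _ in range(layers):
        acc = sum(layer[0][k] == truth[k] for k in range(msg_len)) / msg_len
        accuracy.append(acc)
        nxt = []
        for _ in range(width):
            # bitwise majority over the previous layer, then noisy resend
            merged = [1 if sum(a[k] for a in layer) * 2 > width else 0
                      for k in range(msg_len)]
            nxt.append(noisy(merged))
        layer = nxt
    return accuracy

acc = simulate_distortion()
```

Sweeping `error`, `width` (the in-degree), and the chain length would map out how redundancy slows, but does not stop, the accumulation of distortion.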

Interested participants

1. Javier
2. Pete
3. Zohar
4. Cedric
5. Guillaume
6. Allie?
7. Yuki
8. Jonas
9. Yanchen
10. Jarno
11. R Maria
12. Josefine
13. George

Data Sets

Measuring epigenetic effect of stress at a macro scale

General Idea

Epigenetic processes describe environmental effects on genome expression/regulation that are transmitted to subsequent generations. In particular, recent research indicates that stress in humans can have transgenerational effects. Can these epigenetic effects be detected in data at a macro scale, for instance after a global stressful crisis (a world war, etc.)?

Relevant papers

1. Israel Rosenfield and Edward Ziff. "Epigenetics: The Evolution Revolution." The New York Review of Books (2018)

2. McGuinness et al. "Socio-economic status is associated with epigenetic differences in the pSoBid cohort." International Journal of Epidemiology (2012)

3. Uddin et al. "Epigenetic and immune function profiles associated with posttraumatic stress disorder." Proceedings of the National Academy of Sciences (2010)

4. Borders et al. "Chronic stress and low birth weight neonates in a low-income population of women." (2007) DOI: https://doi.org/10.1097/01.AOG.0000250535.97920.b5

5. Miller GE, Chen E, Parker KJ. "Psychological Stress in Childhood and Susceptibility to the Chronic Diseases of Aging: Moving Towards a Model of Behavioral and Biological Mechanisms." Psychological Bulletin (2011). doi:10.1037/a0024768

6. Jack P. Shonkoff, Andrew S. Garner. "The Lifelong Effects of Early Childhood Adversity and Toxic Stress." Pediatrics (2012). DOI: 10.1542/peds.2011-2663

Interested participants

1. Cedric P
2. Sarah B.
3. Chathika G.
4. Simon J.
5. Kofi K (background in bioinformatics, data-mining, behavioral psychology, microbiology)

Data Sets

1. ???

Robustness of the presidential information cascade on Twitter

General Idea

How does information dissemination change when Drumpf blocks other users on Twitter?

Interested participants

Alice

Topology of natural conversations

General Idea

Everyone who belongs to a WhatsApp political discussion group (or any other discussion group on a specific topic) knows that consensus is difficult to reach. People seem to go back and forth in their arguments, trying to convince others of their own views. Looks like a dynamical system to me! I would like to use what we learned from Joshua's talk, and what we will learn from Simon DeDeo's lectures, to represent each text sent as a point along a one-dimensional opinion continuum. The state of the conversation can then be represented as a point moving through a state space composed of every person participating in the conversation. Is there an attractor? Is it a strange attractor? What is its topology? What does that topology look like when people are arguing versus when they are planning or simply chatting? Hit me up if you are interested!
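
A standard first step before any attractor analysis is delay embedding of the opinion time series. A minimal sketch (the opinion values below are hypothetical stand-ins for the text-to-opinion mapping):

```python
def delay_embed(series, dim=3, tau=1):
    """Reconstruct a state-space trajectory from a 1-D time series via
    Takens delay embedding: each state is (x_t, x_{t+tau}, ..., x_{t+(dim-1)tau})."""
    n = len(series) - (dim - 1) * tau
    return [tuple(series[i + j * tau] for j in range(dim)) for i in range(n)]

# hypothetical per-message opinion scores on a [-1, 1] continuum
opinions = [0.1, 0.4, -0.2, 0.3, 0.0, -0.5, 0.2]
traj = delay_embed(opinions, dim=3, tau=2)
print(traj[0])  # (0.1, -0.2, 0.0)
```

The resulting trajectory is what one would inspect for attractor geometry (e.g. via recurrence plots or correlation-dimension estimates).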

Interested participants

1. Niccolo (proponent)
2. Cedric
3. Yuki

Scaling of information requirements in living things

Information about the environment is a resource that organisms must take in and process to survive, just like energy/nutrients. Inspired by West's talk, I wonder how this requirement might scale as a function of mass. Bacteria sense chemical concentrations in their environments, while more advanced organisms process increasingly sophisticated kinds of information (visual, social, and so on). However, we can simply ask how many bits per unit time are required by various creatures. By analogy with the principles underlying metabolic scaling, I would guess that bigger organisms are able to do more with less because larger networks might allow for greater processing power. On another level, innovations in processing like the emergence of nerves and brains might change that picture.

The nice thing about this project is that I think it ought to be relatively easy; if we read enough existing papers I think we should be able to produce reasonable estimates of information requirements, and there will be a story behind the answer one way or another.
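
The end product would presumably be a scaling exponent from a log-log fit of information rate against body mass. A sketch of just the fitting step, on made-up data (the real numbers would come from the literature survey this project proposes):

```python
import math

def fit_power_law(masses, rates):
    """Least-squares fit of rate ~ c * mass^alpha on log-log axes;
    returns the scaling exponent alpha."""
    xs = [math.log(m) for m in masses]
    ys = [math.log(r) for r in rates]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
            / sum((x - mx) ** 2 for x in xs))

# synthetic check: an exact rate = mass^0.75 law should recover alpha = 0.75
masses = [1e-12, 1e-6, 1.0, 1e3]
rates = [m ** 0.75 for m in masses]
print(round(fit_power_law(masses, rates), 2))  # 0.75
```

Whether the empirical exponent resembles the metabolic 3/4 or breaks at innovations like nervous systems is exactly the open question.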

Thoughts? Recommended Papers?

Interested Participants

1. Elan (proponent)
2. Yanchen
3. Subash
4. Kofi K (background in bioinformatics, data-mining, microbiology & genomics)
5. Conor (background in information theory)

Game of Coins: Developing a Robustness Analysis Tool for Decentralized Cryptocurrency Networks

Game Theory and Decentralized Governance Models

Changing the Data Paradigm: New Models in Data Ownership

Information Asymmetry in Distributed Systems: A Common Currency

Summary

Creating a tool, based on a set of metrics derived from available network data, that would determine the robustness and health of public decentralized cryptocurrency networks.

Since the inception of Bitcoin in 2009 there has been a huge rise in the development of decentralized (and centralized) networks; behind each Coin there is a network. However, since some (not all) of these networks are p2p-based, there are user thresholds that make certain networks (Coins) viable and secure (51% attacks, DoS, Sybil attacks, etc.).

Bitcoin is described as the most robust and secure financial network amongst cryptocurrency networks; however, there are thousands of other networks competing for some slice of the market.

Among these other networks battling each other (Bitcoin is generally categorized as a payment network), there are many viable use cases for decentralized networks (Coins) beyond payments:

  • Decentralized data market place
  • Tokenized securities
  • Governance models
  • Stable digital currency
  • Lending
  • Distributed computing
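
One candidate robustness metric against the 51% threshold mentioned above is the minimum number of entities whose combined mining power (or stake) exceeds a majority, sometimes called the Nakamoto coefficient. A sketch on illustrative pool shares (not real network data):

```python
def nakamoto_coefficient(shares, threshold=0.51):
    """Minimum number of entities whose combined share of mining power
    exceeds `threshold` -- a simple proxy for 51%-attack robustness.
    Shares need not be pre-normalized."""
    total = sum(shares)
    acc, count = 0.0, 0
    for s in sorted(shares, reverse=True):  # largest pools first
        acc += s
        count += 1
        if acc / total > threshold:
            return count
    return count

# hypothetical mining-pool shares, summing to 1.0
pool_shares = [0.25, 0.20, 0.15, 0.15, 0.10, 0.10, 0.05]
print(nakamoto_coefficient(pool_shares))  # 3
```

Real share distributions could be pulled from the data sources listed below, and the metric tracked over time per network.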

Potential Data

https://coinmetrics.io/data-downloads/
https://onchainfx.com/
https://bitinfocharts.com/
https://coin.dance/nodes
https://dappradar.com/

Suggested Literature

  • New P2P Paradigm: https://www.hindawi.com/journals/misy/2018/2159082/

Thoughts? Questions?

  • Possibility of doing other projects related to cryptocurrency? Data is widely available for decentralized networks.
  • Segmenting into different governance models
  • Energy consumption and GPU sales metrics/modeling

Interested Participants

1. Jared Edgerton
2. Laura Mann
3. Alice Schwarze
4. Chris Fussner
5. Louisa Di Felice

Twitractors: What kinds of non-linear dynamic attractors exist across OSM discussions

Online social media discussions center around emotion-driven exchanges of information on current topics in which participants often have considerable social and cognitive investment. Typically, participants in these discussions hold both opposing and supporting views, leading to the emergence of collective effects such as polarization or information cascades. The result is a "heartbeat" of emotion, signifying the global collective emotion of society regarding the topic under discussion.

In this project, we will explore this collective "heartbeat" over many topics on Twitter through non-linear time series analysis.
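
A simple first probe for such a "heartbeat" is the autocorrelation of the aggregate sentiment time series at candidate lags. A sketch on a synthetic periodic series (real input would be the sentiment-scored Twitter data):

```python
import math

def autocorrelation(series, lag):
    """Sample autocorrelation at a given lag -- a basic probe for
    periodic structure in a collective sentiment signal."""
    n = len(series)
    mean = sum(series) / n
    var = sum((x - mean) ** 2 for x in series)
    cov = sum((series[i] - mean) * (series[i + lag] - mean)
              for i in range(n - lag))
    return cov / var

# synthetic sentiment with a 24-step (e.g. daily) cycle
series = [math.sin(2 * math.pi * t / 24) for t in range(240)]
print(round(autocorrelation(series, 24), 2))  # 0.9: strong periodicity at lag 24
```

Genuinely non-linear structure would then call for the delay-embedding and attractor-reconstruction tools from the summer school lectures.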

Join the discussion at #Twitractor on slack

Available Datasets

Twitter Firehose data with sentiment analysis.

Interested Participants

1. Chathika
2. Laura
3. Subash
4. Evgenia

Social Networks and International Relations

Summary

This project draws on the logic of Paul Hooper's research on cooperation dynamics in communities, and on the fractal and scaling presentations. I think interactions between countries follow social dynamics similar to those of families, hunter-gatherer groups, organizations, and groups within countries. I would be interested in simulating conditions under which countries cooperate. I think there are clear analogs to periods of colonization, WWI, and WWII. This approach would also be novel in international relations research.

Potential Data

I expect this would be modeled with ABMs, referencing historical periods.
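
As a minimal sketch of the kind of pairwise interaction rule such an ABM might build on, here is a repeated Prisoner's Dilemma between two "countries" (the payoffs T=5, R=3, P=1, S=0 are textbook values chosen purely for illustration):

```python
def play_repeated_pd(strategy_a, strategy_b, rounds=10):
    """Repeated Prisoner's Dilemma between two agents. A strategy maps
    the opponent's previous move ('C', 'D', or None on round 1) to a move.
    Returns cumulative payoffs (T=5, R=3, P=1, S=0)."""
    payoff = {("C", "C"): (3, 3), ("C", "D"): (0, 5),
              ("D", "C"): (5, 0), ("D", "D"): (1, 1)}
    last_a = last_b = None
    score_a = score_b = 0
    for _ in range(rounds):
        move_a, move_b = strategy_a(last_b), strategy_b(last_a)
        pa, pb = payoff[(move_a, move_b)]
        score_a, score_b = score_a + pa, score_b + pb
        last_a, last_b = move_a, move_b
    return score_a, score_b

def tit_for_tat(last):
    return "C" if last in (None, "C") else "D"

def always_defect(last):
    return "D"

print(play_repeated_pd(tit_for_tat, tit_for_tat))    # (30, 30)
print(play_repeated_pd(tit_for_tat, always_defect))  # (9, 14)
```

A full ABM would put many such agents on a network and let strategies evolve, with parameters calibrated loosely against the historical periods of interest.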

Interested Participants

1. Jared Edgerton

Fluctuations in correlated data, random variables or models

When estimating observables (e.g. parameters) from datasets, we need to quantify the error associated with our estimate in order to decide whether or not it is statistically significant. In sets of correlated data, the correlations may produce fluctuations that affect the error of our estimators. In this project we are interested in studying how the fluctuations depend on the sample size in different sets of data, simulations, or models that the participants bring. In particular, when the fluctuations are anomalously suppressed, the phenomenon is known as hyperuniformity. The fingerprint of these systems is the suppression of fluctuations on large scales, manifesting a regularity that is not apparent on short scales. It can be found in systems of any dimension; examples are jammed packings, crystal-like materials, and some biological tissues such as the chicken retina.
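
To make this concrete, one can measure how the variance of the number of points in a window grows with window size: for uncorrelated (Poisson-like) points it grows linearly, while sub-linear growth signals hyperuniformity. A sketch on synthetic 1-D points:

```python
import random

def count_variance(points, window, domain=1.0, samples=2000, seed=0):
    """Variance of the number of points falling inside a randomly placed
    window of the given size, estimated by Monte Carlo sampling."""
    rng = random.Random(seed)
    counts = []
    for _ in range(samples):
        x0 = rng.uniform(0, domain - window)
        counts.append(sum(x0 <= p < x0 + window for p in points))
    mean = sum(counts) / samples
    return sum((c - mean) ** 2 for c in counts) / samples

rng = random.Random(42)
poisson_pts = [rng.random() for _ in range(1000)]  # uncorrelated points
v1 = count_variance(poisson_pts, 0.01)
v2 = count_variance(poisson_pts, 0.04)
print(v2 / v1)  # grows roughly linearly with window size for uncorrelated points
```

Running the same measurement on a participant's correlated dataset, and comparing the growth exponent against the Poisson baseline, is the basic diagnostic.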

Some literature:

  • foundations&examples: Torquato S. and Stillinger F. H., Phys. Rev. E, 68 (2003) 041113.
  • hyperuniformity in jammed particle systems: L. Berthier, P. Chaudhuri, C. Coulais, O. Dauchot, and P. Sollich, Phys. Rev. Lett. 106, 120601 (2011).
  • hyperuniformity in chicken retina: Jiao Y., Lau T., Hatzikirou H., Meyer-Hermann M., Corbo J. C. and Torquato S., Phys. Rev. E, 89 (2014) 022721.
  • hyperuniformity in an avalanche model: Garcia-Millan, R., Pruessner, G., Pickering, L., & Christensen, K. (2017). Correlations and hyperuniformity in the avalanche size of the Oslo Model, arXiv preprint arXiv:1710.00179.

Understanding Cardiac Dynamics in Health and Disease (#cardio)

Motivation

Arrhythmias (abnormal electrical activity of the heart) are common cardiac diseases and are amongst the most common causes of impaired quality of life and death. I am particularly interested in two of the most complex cardiac arrhythmias, namely 1. atrial fibrillation (disorganized electrical activity in the upper chambers of the heart, i.e. the atria; not lethal but very disabling) and 2. ventricular fibrillation (disorganized activity in the bottom part of the heart, i.e. the ventricles; lethal). We have a minimal understanding of the mechanisms of these arrhythmias, and our current therapeutic strategies (namely medications, implantable cardiac devices that can deliver electrical therapy, and ablation procedures in which we intentionally destroy heart tissue in specific areas of the heart) are relatively ineffective. The lack of effective treatments largely reflects our lack of understanding of the fundamental mechanisms responsible for these arrhythmias.

General Ideas

  • 1. I have intracardiac recordings of patients who are in atrial fibrillation before and after a therapeutic procedure. These are spatiotemporal data of simultaneous recordings from 64 locations inside the heart. We could use these data to develop creative ways to (a) understand the dynamics of the system, specifically phase transitions and changes in spatiotemporal structures, (b) develop markers that predict the success of the procedure, or (c) identify locations inside the heart that serve as "hot-spots" or are critical for sustaining the arrhythmia.
  • 2. I have several toy models of cardiac arrhythmias. These models are simulations of reaction-diffusion models (specific to cardiac dynamics) that give rise to solutions such as stable periodic activity, spiral waves, or wave breakdown with multiple daughter wavelets. These could be used for a more theoretical assessment of spatiotemporal phase transitions.
  • 3. Should any of the methods we come up with end up working, I plan to scale it up to large animal models and clinical (human) studies in the near future, and I would welcome your collaboration.

Specific Projects

  • 1. Representation of intracardiac recordings as networks using horizontal visibility graphs: we plan to analyze both synthetic (simulation) data as well as real patient data. Our preliminary plan is to develop such networks and compare network characteristics between different states.
  • 2. Use Koopman analysis to gain insight into the dominant spatiotemporal patterns that govern the dynamics of healthy and diseased heart rhythms. As above, we plan to analyze both synthetic (simulation) data and real patient data.
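
For project 1, the mapping from a recording to a horizontal visibility graph can be sketched as follows (toy signal shown; real input would be one of the 64 intracardiac channels):

```python
def horizontal_visibility_graph(series):
    """Map a time series to a horizontal visibility graph: nodes are time
    points, and i, j are linked iff every intermediate value lies strictly
    below both series[i] and series[j]. Returns the edge set; network
    measures on it can then be compared between cardiac states.
    Note: brute-force O(n^2) sketch, fine for toy data only."""
    edges = set()
    n = len(series)
    for i in range(n - 1):
        for j in range(i + 1, n):
            if all(series[k] < min(series[i], series[j])
                   for k in range(i + 1, j)):
                edges.add((i, j))
    return edges

signal = [3, 1, 2, 4]
print(sorted(horizontal_visibility_graph(signal)))
# [(0, 1), (0, 2), (0, 3), (1, 2), (2, 3)]
```

Degree distributions of these graphs are a common discriminator between periodic, chaotic, and stochastic signals, which is the comparison the project proposes between pre- and post-procedure states.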

Interested Participants

  • 1. Konstantinos (Cardiology, Translational Research)
  • 2. Antreas (Mathematics)
  • 3. Anastasya (Physics)

List of all available datasets

Exploring Income Inequality From a Game Theoretic (or Other) Perspective

Many economic markets are fundamentally unfair and lead to high levels of inequality. This has consequences for how people's opinions of fairness and trust develop and evolve. Data show that an American citizen's likelihood of making their way from the bottom to the top is lower than that of citizens of other advanced countries. Data also show that children born into "rich" families are more likely than not to remain rich. The literature also shows very strong demographic variation.

Thoughts? Recommended Papers?

Here is some relevant literature: https://www.jstor.org/stable/pdf/3088921.pdf?refreqid=excelsior%3A1839833f8090beb4f9e3f37e55cbf6c0 http://web.mit.edu/14.193/www/WorldCongress-IEW-Version6Oct03.pdf https://arxiv.org/pdf/1406.6620.pdf http://cailinoconnor.com/wp-content/uploads/2015/03/CRKE-2.pdf

One idea is to consider an evolutionary game theoretic model with a stratified market (stratified into different income levels). Within each stratum, there could be various groups of agents corresponding to different demographics. The model could include systemic barriers that may be unique to certain demographics. Agents could be self-interested, altruistic, spiteful, etc.

A non-game theoretic model could also work, so this is quite an open problem. If anybody else is interested in discussing this further, please contact Priya.
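
For the game-theoretic route, replicator dynamics is one natural starting point within each stratum. A minimal sketch for a symmetric 2x2 game (the payoff matrix below is illustrative, not calibrated to any market):

```python
def replicator_step(freqs, payoff, dt=0.01):
    """One Euler step of replicator dynamics for a symmetric 2x2 game:
    a strategy's frequency grows in proportion to its payoff advantage
    over the population average."""
    fitness = [sum(payoff[i][j] * freqs[j] for j in range(2)) for i in range(2)]
    avg = sum(f * fit for f, fit in zip(freqs, fitness))
    return [f + dt * f * (fit - avg) for f, fit in zip(freqs, fitness)]

# hawk-dove-style payoffs with an interior mixed equilibrium at (0.5, 0.5)
payoff = [[0, 3], [1, 2]]
freqs = [0.9, 0.1]
for _ in range(5000):
    freqs = replicator_step(freqs, payoff)
print([round(f, 2) for f in freqs])  # [0.5, 0.5]
```

A stratified version would run coupled populations with stratum-specific payoffs and barrier terms, which is where the demographic structure described above would enter.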

Another approach could be agent based modeling.
Some literature:
1. http://yildizoglu.fr/macroabm2/Submissions/15-Russo_et_al_Inequality_ABMacro.pdf
2. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4430112/

Interested participants:

- Priya
- Carlos Marino

Archived Projects ("Parking Lot")

This section is for projects that we decide not to continue with. Maybe they're ideas that can be picked back up later (hence the "parking lot").

Using Principles from Complex Systems in Thinking about AGI Development

AGI = Artificial General Intelligence, a catchphrase for "smarter-than-human" AI; a somewhat misleading phrase that basically means algorithms capable of performing a wide range of tasks with high efficacy without being explicitly programmed for each task.

For now, this is intentionally vague to keep open the various possibilities and gather together those who are interested. The project would move beyond current ML techniques, though, and either build on those techniques in significantly novel ways, propose new techniques, or consider from a theoretical standpoint how to design and train an agent (without specification of the implementation) which can perform a broad range of tasks "intelligently" and is aligned with human interests. An important focus is on ensuring alignment (doing what humans would want it to do), which is for various reasons quite hard to do both technically and philosophically.

There are two ways to use complex systems principles:

  • In the design and training process of the algorithm
  • In understanding how an algorithm will interact with the world around it

Specific project ideas:

  • Building in an adaptive mechanism for an agent to adjust its input-output map as the dynamics of its environment change
  • Using insights from various evolutionary processes to design a learning process that can produce an intelligent and aligned agent (either using existing AI techniques, or being implementation-agnostic and considering an arbitrary agent)

Feel free to add your name below, and any project ideas above! If we get a few interested people we can meet tonight or tomorrow.

Interested Participants:

  • Luca Rade
  • Nam Le