
Research Experiences for Undergraduates 2015

A complete list of resident faculty and postdoctoral fellows can be found here


Potential Mentors and Projects


Cristopher Moore, SFI REU PI, Santa Fe Institute, Resident Faculty


Mentors: Cris Moore and Pan Zhang
The idea of statistical significance is tricky in networks. Do these three definitions of statistically significant communities agree with each other? (A minimal test of definition 1 is sketched below the list.)
1. communities stay the same when a few edges are added or removed (robustness)
2. communities help predict missing edges (if we train the algorithm on the edges we know about)
3. the statistical physics definition: communities correspond to states of low free energy (low energy and high entropy)
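
A minimal sketch of testing definition 1, assuming the networkx and scikit-learn libraries: perturb a few edges at random, re-run a standard community-detection algorithm, and compare the two partitions with normalized mutual information (an NMI near 1 means the communities were robust to the perturbation).

```python
import random
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities
from sklearn.metrics import normalized_mutual_info_score

def to_labels(G, communities):
    # map each node to the index of its community, in a fixed node order
    label = {}
    for k, nodes in enumerate(communities):
        for v in nodes:
            label[v] = k
    return [label[v] for v in sorted(G.nodes())]

G = nx.karate_club_graph()
base = to_labels(G, greedy_modularity_communities(G))

# perturb the network: remove a few random edges, add a few random ones
H = G.copy()
for _ in range(3):
    u, v = random.choice(list(H.edges()))
    H.remove_edge(u, v)
    H.add_edge(*random.sample(list(H.nodes()), 2))

perturbed = to_labels(H, greedy_modularity_communities(H))
print("NMI between original and perturbed partitions:",
      normalized_mutual_info_score(base, perturbed))
```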


Nihat Ay, Santa Fe Institute, Resident Faculty



Andrew Berdahl, Santa Fe Institute, Omidyar Fellow



Tanmoy Bhattacharya, Santa Fe Institute, Resident Faculty


Simon DeDeo, Santa Fe Institute, External Faculty
Beyond the Mind-Virus: the Cultural Phylogeny of Anti-Semitism
Simon DeDeo / simon@santafe.edu
(external faculty at Santa Fe; professor of Informatics and Cognitive Science at Indiana University)

In any society, much of what passes for disinterested information is anything but: what individuals perceive to be prima facie reasonable facts are often complete fabrications. Their lack of veracity is compensated for by their social function.

One of the most persistent and perilous examples of this is anti-Semitism, the hatred and fear of the Jewish people. The varying yet correlated clusters of beliefs associated with anti-Semitism have been leveraged and incited by governments and mass political movements and have been a defining feature of global culture for thousands of years.

Far from being the simple "mind viruses" and memes described by Richard Dawkins, anti-Semitism is rather a network of prejudices and paranoias that draw on everything from envy and fear of political elites to hygienic, moral, and even sexual disgust. Varying in time and place, they force us to go beyond simple population-genetic models of cultural transmission, to look at epistatic interactions among syntactic and semantic variations.

What we'll do:

We'll try to understand the cognitively complex development of anti-Semitism by analyzing multiple editions of some of its most virulent and persistent documents, including texts such as the Protocols of the Elders of Zion and Conspiracy of the Six-Pointed Star. We'll draw our data from core texts found in the Hathi Trust, a large-scale library digitization project of which Indiana University is a member.

We'll learn some core tools for the modeling of natural language texts, including "topic modeling", a sophisticated statistical tool to extract large-scale regularities in word co-occurrence. Topic modeling has been used for everything from tracking signal emergence in the French Revolution to the study of Charles Darwin's reading habits. Here, it will allow us to decompose some of the themes -- and stylistic patterns -- of the texts, and thus the key ideas in play. Topic modeling should allow us to construct a geometry of anti-Semitic ideas (for students with a taste for abstract mathematics, there is a sub-project here in the differential geometry of information theory, with anti-Semitism as a sample space).
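
For a flavor of the method, here is a minimal topic-modeling sketch using scikit-learn's LDA implementation on invented toy documents; the actual project would run on page-level texts from the Hathi Trust.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

# toy stand-ins for real document texts
docs = [
    "conspiracy elites secret control press finance",
    "press finance banks secret influence control",
    "purity disease moral decay corruption fear",
    "moral corruption decay purity fear disgust",
]

vec = CountVectorizer()
counts = vec.fit_transform(docs)
lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(counts)

# show the top words in each inferred topic
words = vec.get_feature_names_out()
for k, weights in enumerate(lda.components_):
    top = [words[i] for i in weights.argsort()[::-1][:4]]
    print(f"topic {k}:", " ".join(top))
```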

Given successful completion of stage one, and based on these results, we will attempt an ambitious reconstruction of the evolutionary history of anti-Semitism in the 19th and 20th centuries. We will use phylogenetic reconstruction to track the complex branching, fragmentation, and synthesis of the multiple threads that define this terrifying phenomenon, along with graph clustering algorithms developed by other members of the Institute, such as the message-passing algorithms of Zhang and Moore.
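
One rough way to prototype this stage, assuming each edition has already been summarized as a topic-proportion vector by the model above: compute pairwise distances between editions and build a tree by hierarchical clustering, as a simple stand-in for full phylogenetic reconstruction. The edition names and vectors here are invented.

```python
import numpy as np
from scipy.cluster.hierarchy import dendrogram, linkage
from scipy.spatial.distance import pdist

editions = ["ed_1880", "ed_1903", "ed_1920", "ed_1935"]  # hypothetical editions
topics = np.array([
    [0.70, 0.20, 0.10],
    [0.60, 0.30, 0.10],
    [0.20, 0.50, 0.30],
    [0.10, 0.55, 0.35],
])

# Jensen-Shannon distance is a natural metric between topic distributions
tree = linkage(pdist(topics, metric="jensenshannon"), method="average")
print(dendrogram(tree, labels=editions, no_plot=True)["ivl"])  # leaf ordering
```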

Getting clear scientific answers to even the most basic of the questions addressed here may have profound importance not only for the cognitive, social, and political sciences, but for policy makers and those working to understand and manage a global world where rapid communication and idea-sharing can lead to both great good and great ill.

An ideal student would have some experience with programming (e.g., Python) and some comfort with mathematical reasoning; we can teach on the job and ahead of time. Most importantly, a prospective researcher should have a keen interest in the development of culture and how it interacts with the human mind. In a summer of intense work on a topic of profound political importance, we can expect plenty of unexpected results.

The researcher on this project will work with Simon DeDeo, who will be in residence from June 6th through August 16th and able to communicate beforehand on potential readings and places to begin. This work is in collaboration with Cassidy Sugimoto, professor of Information and Library Sciences, also at Indiana. Prof. Ken Stanley, an expert in evolutionary algorithms, will also be involved in this project, but will only be on site for the first few weeks of the REU.


Vanessa Ferdinand, Santa Fe Institute, Omidyar Fellow



Stephanie Forrest, Santa Fe Institute, External Professor

1. Understanding and evolving software diversity:

Neutral landscapes and mutational robustness are believed to be important enablers of evolvability in biology. We apply these concepts to software, defining mutational robustness to be the fraction of random mutations to program code that leave a program’s behavior unchanged. Test cases are used to measure program behavior, and mutation operators are taken from earlier work on genetic programming. Although software is often viewed as brittle, with small changes leading to catastrophic changes in behavior, our results show surprising robustness in the face of random software mutations. Depending on the student's interest, the REU project could involve either learning to use the GenProg software and running new experiments, or developing theoretical models based on earlier work in the biological literature.
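
To make the definition concrete, here is a toy sketch (not GenProg itself) that estimates mutational robustness for a four-statement program: apply random statement-level mutations (delete, swap, copy), run the test suite on each mutant, and report the fraction that still pass.

```python
import random

# a tiny "program": compute the maximum of a, b, c
PROGRAM = [
    "t = a",
    "if b > t: t = b",
    "if c > t: t = c",
    "result = t",
]
TESTS = [((1, 2, 3), 3), ((5, 1, 2), 5), ((2, 9, 4), 9)]

def passes_tests(stmts):
    for (a, b, c), expected in TESTS:
        env = {"a": a, "b": b, "c": c}
        try:
            exec("\n".join(stmts), {}, env)
        except Exception:
            return False
        if env.get("result") != expected:
            return False
    return True

def mutate(stmts):
    s = list(stmts)
    i, j = random.randrange(len(s)), random.randrange(len(s))
    op = random.choice(["delete", "swap", "copy"])
    if op == "delete":
        del s[i]
    elif op == "swap":
        s[i], s[j] = s[j], s[i]
    else:
        s[j] = s[i]  # copy statement i over statement j
    return s

N = 1000
robust = sum(passes_tests(mutate(PROGRAM)) for _ in range(N))
print(f"estimated mutational robustness: {robust / N:.2f}")
```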

2. Understanding and modeling large-scale cybersecurity threats:

Cyber security research has traditionally focused on technical solutions to specific threats, such as how to protect desktops and mobile devices against the latest malware. This approach has greatly enhanced our ability to defend against specific attacks, but technical improvements will not be sufficient on their own. Today’s cyber issues involve social, economic, organizational, and political components, which are often pursued in isolation from technical reality. Forrest and Prof. Robert Axelrod (Univ. of Michigan) are collaborating on a project that aims to address that gap by focusing on the problem of reducing risks arising from cyber conflicts, especially those among state actors. Our current plan is to focus on the attribution problem: once a cyberattack has been discovered, what does it take to hold a particular actor responsible? Depending on the student's interest and background, this project could range from statistical modeling, to developing game-theoretic or decision-theoretic models (e.g., when is it advantageous to plant false flags?), to researching the current (public) state of the art for attributing cyberattacks.
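
As a taste of the decision-theoretic angle, here is a deliberately toy model of the false-flag question with invented payoffs: an attacker weighs the gain from an attack against the expected penalty of being correctly attributed, and a false flag lowers the attribution probability at some planting cost.

```python
GAIN = 10.0      # attacker's payoff from a successful attack (invented)
PENALTY = 25.0   # cost if the defender attributes the attack correctly
FLAG_COST = 2.0  # cost of planting a convincing false flag

def expected_payoff(p_attribution, false_flag):
    cost = FLAG_COST if false_flag else 0.0
    return GAIN - p_attribution * PENALTY - cost

for p in (0.8, 0.5, 0.2):
    # suppose a false flag halves the chance of correct attribution
    print(f"p(attribution)={p}: no flag {expected_payoff(p, False):+.1f}, "
          f"false flag {expected_payoff(p / 2, True):+.1f}")
```

Even this crude model makes the qualitative point: false flags are most attractive when baseline attribution is likely and penalties are high.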

3. Distributed search in ant colonies and immune systems:

Forrest and Moses are collaborating on a project to examine how ant colonies and immune systems form distributed information-exchange networks to search, adapt, and respond to their environments. The project is characterizing search strategies quantitatively in terms of how information flows over networks of communicating components, in this case ants or immune cells. It will measure how information improves performance, in terms of how quickly a colony finds seeds or the immune system neutralizes pathogens. By studying in detail how distributed interaction networks guide search in two distinct systems, this project aspires to formulate a general theory describing how decentralized biological networks are organized to search, respond, and adapt to different environments, and how they effectively scale up to large sizes. As the research has progressed, we have begun to test our theoretical understanding of distributed search strategies by implementing them as search algorithms in robotic swarms. This demonstrates practical applications as well as providing a controlled experimental system in which to test theoretical predictions.
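
A minimal agent-based sketch in the spirit of this project, with invented parameters: random-walking "ants" search a grid for clustered seeds, with and without a simple recruitment rule in which walkers occasionally jump to the site of the most recent find. Clustered resources are what make recruitment pay off.

```python
import random

SIZE, ANTS, SEEDS, STEPS = 30, 20, 40, 2000

def run(recruit):
    # place seeds in a few piles; clustering is what rewards recruitment
    piles = [(random.randrange(SIZE), random.randrange(SIZE)) for _ in range(4)]
    seeds = set()
    while len(seeds) < SEEDS:
        px, py = random.choice(piles)
        seeds.add(((px + random.randint(-2, 2)) % SIZE,
                   (py + random.randint(-2, 2)) % SIZE))
    ants = [(SIZE // 2, SIZE // 2)] * ANTS
    found, last_find = 0, None
    for _ in range(STEPS):
        moved = []
        for x, y in ants:
            if recruit and last_find and random.random() < 0.1:
                x, y = last_find  # recruited to the most recent find
            dx, dy = random.choice([(0, 1), (0, -1), (1, 0), (-1, 0)])
            x, y = (x + dx) % SIZE, (y + dy) % SIZE
            if (x, y) in seeds:
                seeds.discard((x, y))
                found += 1
                last_find = (x, y)
            moved.append((x, y))
        ants = moved
    return found

print("seeds found, independent search:", run(recruit=False))
print("seeds found, with recruitment:  ", run(recruit=True))
```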




Mirta Galesic, Santa Fe Institute, Professor



Josh Grochow, Santa Fe Institute, Omidyar Fellow

Title: Exploring the space of algorithms, or, finding novel algorithms for hard and easy problems using genetic algorithms

Mentor(s): Josh Grochow

Despite nearly half a century of algorithm development, our understanding of algorithms is still shockingly poor. For example, how long does it take to multiply two n-digit numbers? The state of the art is nearly O(n log(n)) steps, much better than the standard way we are taught in elementary school. Yet it's consistent with current knowledge that this could be done in only O(n) steps - i.e., that multiplying two numbers is just about as easy as adding them! No one knows the right answer. Such issues are even worse for the famous P versus NP problem, which essentially asks whether brute-force search can always be improved upon. Unlike multiplying numbers, virtually no progress has been made on this problem in the half-century that it's been around.
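
To see how a non-obvious algorithm beats the schoolbook method, consider Karatsuba's classic trick, which multiplies n-digit numbers in roughly O(n^1.585) digit operations by replacing four recursive multiplications with three:

```python
def karatsuba(x, y):
    if x < 10 or y < 10:
        return x * y
    n = max(len(str(x)), len(str(y))) // 2
    p = 10 ** n
    a, b = divmod(x, p)  # x = a*p + b
    c, d = divmod(y, p)  # y = c*p + d
    ac = karatsuba(a, c)
    bd = karatsuba(b, d)
    mid = karatsuba(a + b, c + d) - ac - bd  # equals a*d + b*c, one multiply saved
    return ac * p * p + mid * p + bd

assert karatsuba(123456789, 987654321) == 123456789 * 987654321
```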

Part of the reason for our paltry understanding of these questions is that, despite thousands of new algorithms, we've actually only explored a tiny corner of the space of all possible algorithms. This was brought home with the discovery in the early 2000s of so-called holographic algorithms: classical algorithms, inspired by quantum computing, that take advantage of surprising cancellations to efficiently solve problems that no one expected to be tractable.

In this project we propose to explore more of the space of algorithms by several methods. We'll mention one such method here, and can discuss others in person if you're interested.

We propose to use novelty-seeking genetic algorithms to help find examples of truly new classes of algorithms. Given that genetic algorithms have long been used to search for new and better algorithms, what's special here? At least three things: (1) in the last decade or so there have been great advances in using genetic algorithms to search for novel solutions, rather than to directly search for better solutions; whether or not this finds better algorithms, in looking to further explore the space of algorithms, novelty is a key aspect; (2) the methods used in these novelty-seeking algorithms, such as indirect encoding, match up well with frontier problems in computational complexity; and (3) we propose to apply genetic algorithms to problems that we think may not have efficient solutions, such as NP-hard problems. If the genetic algorithm fails to find a good algorithm, we hope to still extract some insight from the bad algorithms it finds. Our hope is that we'll either find genuinely new algorithms for these problems, or gain theoretical insight into how hard these problems truly are.
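
For a feel for novelty search itself (separate from the complexity-theoretic application), here is a minimal sketch over bitstrings: instead of rewarding a fitness function, selection rewards distance from an archive of previously seen individuals.

```python
import random

LENGTH, POP, GENS, K = 20, 30, 50, 5

def novelty(ind, archive):
    # mean Hamming distance to the K nearest archived individuals
    dists = sorted(sum(x != y for x, y in zip(ind, other)) for other in archive)
    return sum(dists[:K]) / K

def mutate(ind):
    i = random.randrange(LENGTH)
    return ind[:i] + [1 - ind[i]] + ind[i + 1:]

population = [[random.randint(0, 1) for _ in range(LENGTH)] for _ in range(POP)]
archive = list(population)

for _ in range(GENS):
    ranked = sorted(population, key=lambda ind: novelty(ind, archive), reverse=True)
    archive.extend(ranked[:2])    # archive the most novel individuals
    parents = ranked[: POP // 2]  # truncation selection on novelty, not fitness
    population = parents + [mutate(random.choice(parents))
                            for _ in range(POP - len(parents))]

print("archive size after search:", len(archive))
```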


Laurent Hebert-Dufresne, Santa Fe Institute, Postdoctoral Fellow



Alfred Hubler, Santa Fe Institute, External Professor



Eric Libby, Santa Fe Institute, Omidyar Fellow



John Miller, Santa Fe Institute, External Professor



John Pepper, Santa Fe Institute, External Professor


Project Proposal:

This project is to refine and implement a new evolutionary heuristic algorithm for a classic NP-hard spatial optimization problem, the “Traveling Salesman Problem.” The implementation could be done in any object-oriented programming platform with good spatial graphics, so we could decide that part together if you’re interested. -Faculty mentor: Dr. John W. Pepper, external faculty
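
A minimal Python sketch of one possible starting point (not the object-oriented graphics platform the project would ultimately choose): an evolutionary loop over tours, with segment-reversal mutation and truncation selection.

```python
import math
import random

random.seed(1)
CITIES = [(random.random(), random.random()) for _ in range(25)]

def tour_length(tour):
    return sum(math.dist(CITIES[tour[i]], CITIES[tour[(i + 1) % len(tour)]])
               for i in range(len(tour)))

def mutate(tour):
    i, j = sorted(random.sample(range(len(tour)), 2))
    return tour[:i] + tour[i:j][::-1] + tour[j:]  # reverse one segment

POP, GENS = 50, 300
population = [random.sample(range(len(CITIES)), len(CITIES)) for _ in range(POP)]
for _ in range(GENS):
    population.sort(key=tour_length)
    survivors = population[: POP // 2]
    population = survivors + [mutate(random.choice(survivors))
                              for _ in range(POP - len(survivors))]

print("best tour length:", tour_length(min(population, key=tour_length)))
```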


John Rundle, Santa Fe Institute, External Professor




Sam Scarpino, Santa Fe Institute, Omidyar Fellow

The dynamics of Ebola underreporting during the 2014 West Africa epidemic
Full Description of Project
Introduction
Infectious disease surveillance data can be unreliable during unfolding crises in which resources are limited and public health authorities have poor access to affected communities. In the ongoing 2014 Ebola epidemic in West Africa, surveillance efforts primarily detect cases that are treated in healthcare facilities, and likely miss the sizable fraction of cases that do not seek care in a clinical setting [1, 2]. The CDC recently used a factor of 2.5 to correct for this underreporting in Sierra Leone and Liberia, but acknowledged that this quantity is essentially unknown [1]. Accurate outbreak projections and assessments of intervention strategies depend on reliable estimates of underreporting rates. However, underreporting can be a very dynamic process, potentially varying in time, space, and/or with outbreak size, and driven by intrinsic properties of the pathogen, human behavior, diagnostic practices, and the healthcare infrastructure. Mechanistic models that capture these factors are critically important for estimating, reducing, and correctly accounting for underreporting, but they do not yet exist.

The primary difficulty we have faced in estimating underreporting is the limited data commonly available during an outbreak, namely confirmed and suspected cases and mortalities. From these data alone, one cannot estimate the rate of underreporting without making strong modeling assumptions and, even with such assumptions, we often lack the statistical power to make precise estimates. However, new types of data are available from the ongoing EVD outbreak, including digital data (Facebook, Twitter, cell phones, HealthMap, etc.) and Ebola virus genomic data [3]. These data, when coupled with an appropriate mathematical and statistical framework, provide a novel opportunity to model underreporting. For example, Facebook surveys can poll urban populations in West Africa regarding healthcare-seeking behavior, and phylodynamic models can directly estimate the rate of underreporting from pathogen sequence data.

Here, we propose to construct an integrative framework, combining a mechanistic model of underreporting factors with survey-based and phylodynamic-based inference, to estimate dynamic rates of underreporting in the ongoing EVD outbreak. The results of this study will have an immediate impact, enabling better EVD projections and analyses of intervention policies, and will more broadly advance our understanding and statistical analysis of reporting biases in emerging epidemics.
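
A toy numerical sketch of the core estimation step, with invented numbers: if an independent source (say, a phylodynamic estimate) puts true infections at 1,000 while surveillance recorded 400 cases, the reporting rate has a simple binomial maximum-likelihood estimate, and its reciprocal is a correction factor analogous to the CDC's 2.5.

```python
import numpy as np
from scipy.stats import binom

true_infections = 1000  # hypothetical phylodynamic estimate of true incidence
reported_cases = 400    # hypothetical surveillance count

rates = np.linspace(0.01, 0.99, 99)
loglik = binom.logpmf(reported_cases, true_infections, rates)

mle = rates[np.argmax(loglik)]
print(f"MLE reporting rate: {mle:.2f}")             # ~0.40
print(f"implied correction factor: {1 / mle:.2f}")  # ~2.5
```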


Markus Schlapfer, Santa Fe Institute, Postdoctoral Fellow



Caitlin Stern, Santa Fe Institute, Omidyar Fellow



Pan Zhang, Santa Fe Institute, Postdoctoral Fellow

Mentors: Cris Moore and Pan Zhang
See the joint project description under Cristopher Moore above: do three definitions of statistically significant communities (robustness, link prediction, low free energy) agree with each other?