CSSS 2010 Santa Fe-Projects & Working Groups
From Santa Fe Institute Events Wiki
Students are required to craft a research project -- use this page to brainstorm and organize your efforts.
Evolution of Words (Dan Rockmore) - In a class on complex systems that I teach at Dartmouth, one of the final projects suggested, from a small and somewhat biased sample of English words, that word origins (as dated by one of the online dictionaries) cluster at certain times. As a start I would propose mining this information from an online dictionary, performing some initial analysis to see if "there is a there, there," and if so, keep going.
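A minimal sketch of a first-pass test, assuming the first-attestation year of each word has already been scraped from some online dictionary (the scraping step and the sample years are hypothetical). It bins years into decades and computes a Pearson chi-square statistic against a flat expectation; a large value relative to a chi-square distribution with (number of decades - 1) degrees of freedom hints at clustering.

```python
from collections import Counter

def decade_counts(first_use_years):
    """Bin first-attestation years (ints) into decades, e.g. 1597 -> 1590."""
    return Counter((year // 10) * 10 for year in first_use_years)

def chi_square_vs_uniform(counts, decades):
    """Pearson chi-square statistic of per-decade counts against a flat
    expectation (the same number of new words in every decade)."""
    decades = list(decades)
    total = sum(counts.get(d, 0) for d in decades)
    if total == 0:
        return 0.0
    expected = total / len(decades)
    return sum((counts.get(d, 0) - expected) ** 2 / expected for d in decades)

# Hypothetical first-attestation years, for illustration only.
years = [1590, 1595, 1598, 1601, 1750]
counts = decade_counts(years)
stat = chi_square_vs_uniform(counts, range(1590, 1760, 10))
```

A flat baseline is naive: the vocabulary and dictionary coverage both grow over time, so a serious analysis would compare against a baseline that accounts for that growth.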
Dynamics of Equities Market Structure (Dan Rockmore) -- In a paper with some of my buddies (some of whom you will meet this summer), "Topological Structures in the Equities Market," PNAS December 30, 2008, vol. 105 no. 52, 20589-20594, we found some interesting structure in the correlation network of the NYSE equities market. That analysis required choosing a time window. It would be interesting to see how, or whether, this structure changes over time and with window size, especially on either side of market crises. Scott Pauls has code that could be used for some of this analysis.
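To explore how a correlation network changes with time and window size, one could recompute it on sliding windows. The sketch below is a plain thresholded correlation network, not the topological method of the PNAS paper or Scott Pauls' code (neither is shown here); the ticker names, the toy return series, and the `window`, `step` and `threshold` values are all hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)

def correlation_edges(returns, start, window, threshold):
    """Ticker pairs whose returns over [start, start + window) correlate above threshold."""
    names = sorted(returns)
    edges = set()
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            r = pearson(returns[a][start:start + window], returns[b][start:start + window])
            if r > threshold:
                edges.add((a, b))
    return edges

def edge_counts_over_time(returns, window, step, threshold):
    """Slide the window through the series and report (start, number of edges)."""
    length = len(next(iter(returns.values())))
    return [(s, len(correlation_edges(returns, s, window, threshold)))
            for s in range(0, length - window + 1, step)]

# Hypothetical toy return series for three made-up tickers.
returns = {
    "AAA": [1, 2, 3, 4, 5, 6],
    "BBB": [2, 4, 6, 8, 10, 12],
    "CCC": [6, 5, 4, 3, 2, 1],
}
series = edge_counts_over_time(returns, window=3, step=1, threshold=0.5)
```

Repeating the sweep for several window sizes and lining the output up against crisis dates would give a crude first look at whether the structure shifts around crises.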
Style of Chess Play (Dan Rockmore) -- I am curious whether tools from machine learning can characterize the "style" of a chess player. The website www.playchess.com has a database of chess games. I'm not sure whether the annotations would identify particular players, but even without that, can clustering on the move data give sensible or interesting results about style of play?
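One way to pose the clustering question is k-means on per-player feature vectors. This is a sketch under assumptions: the two features are hypothetical normalized stand-ins (e.g. a capture rate and an opening-choice share) for whatever can actually be extracted from the playchess.com move data, and initialization simply uses the first k points.

```python
import math

def kmeans(points, k, iterations=20):
    """Plain k-means; initial centroids are simply the first k points."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                  for p in points]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

# Hypothetical, already-normalized style features for four players.
players = [(0.10, 0.20), (0.15, 0.22), (0.90, 0.80), (0.85, 0.75)]
labels, centroids = kmeans(players, k=2)
```

k-means++ initialization and feature scaling would be sensible next steps; the choice of k is itself part of the question of how many "styles" exist.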
Movement Careers of Couchsurfing.org members (Bogdan State) - I am working with Couchsurfing.org and two Stanford professors to analyze this social movement organization's member data. One aspect that both we and the Couchsurfing management are interested in is how members' involvement in the movement evolves over time. I would like to perform a preliminary analysis of these "movement careers" using a sample of about 10,000 nodes (out of 1.7 million) that we are scheduled to obtain soon.
"Genes for Breakfast" (Yixian Song) - I once read Redfield's (1993) paper "Genes for Breakfast: The Have-Your-Cake-and-Eat-It-Too of Bacterial Transformation". Though it's an old publication, I still find the idea very inspiring. Consider bacteria living in a gene pool with abandoned DNA strands: each bacterium can randomly "eat" free DNA strands and use them as nutrition, for DNA repair, or even for gene improvement. But the DNA strands were abandoned for a reason, and some of them can be virulent (!!!). Bacteria can, of course, also exchange DNA with each other. We can define a population size of bacteria, the amount of free DNA strands in the gene pool, the percentage of virulent DNA and its virulence (impact on bacterial fitness). We could also consider the bacteria as a metapopulation ("A metapopulation consists of a group of spatially separated populations of the same species which interact at some level," says wikipedia.org). The question to answer would be: "Under which conditions will the bacterial population eventually go extinct?"
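A toy simulation to make the parameters listed above concrete. It is not Redfield's model, and all of the dynamics are assumptions: each cell may eat one strand per generation, a virulent strand lowers fitness by `virulence` while a harmless one raises it by a hypothetical `benefit`, cells survive with probability fitness/2 and then divide, dead cells return DNA to the pool, and the population is capped at `pop_size`. DNA exchange between cells and metapopulation structure are left out.

```python
import random

def simulate(pop_size, free_dna, virulent_frac, virulence, uptake_prob=0.5,
             benefit=0.1, base_fitness=1.2, generations=50, seed=0):
    """Toy gene-pool model; returns population size per generation (stops at 0)."""
    rng = random.Random(seed)
    pop, pool = pop_size, free_dna
    history = [pop]
    for _ in range(generations):
        offspring = deaths = 0
        for _ in range(pop):
            fitness = base_fitness
            if pool > 0 and rng.random() < uptake_prob:
                pool -= 1  # the eaten strand leaves the pool
                if rng.random() < virulent_frac:
                    fitness -= virulence  # harmful strand
                else:
                    fitness += benefit  # hypothetical nutrition/repair gain
            # Survive with probability fitness/2 (clamped to [0, 1]), then divide,
            # so expected offspring equals fitness while 0 <= fitness <= 2.
            if rng.random() < min(1.0, max(0.0, fitness) / 2):
                offspring += 2
            else:
                deaths += 1
        pool += deaths  # dead cells release their DNA back into the pool
        pop = min(offspring, pop_size)  # carrying capacity
        history.append(pop)
        if pop == 0:
            break
    return history
```

Sweeping `virulent_frac` and `virulence` over a grid and recording the fraction of seeds that end in extinction would give a first map of the conditions under which the population dies out.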
Patterns in Cenozoic Western US volcanism (Leif Karlstrom) - Allen Glazner (UNC) has put together a neat database of volcanic activity over the past 65 million years in the Western US (here's a movie of it), including location, duration of activity and lava composition. These data are derived from several careers' worth of geologic mapping and dating of volcanic rocks exposed all over the West. While the record is not complete (not everything is preserved, and not everything has been mapped yet), there is a wealth of information about volcanic processes in it. I think it would be neat to mine this dataset for correlations, then think about ways to model it. Models could include actual physics and geology, but could also be purely data-driven.
Pitch diffusion in groups of musicians (Leif Karlstrom) - When the violin section of an orchestra tunes, the concertmaster gets up and plays a note that all the other violins try to match. As an undergraduate I did some experiments with John Toner (physics, U Oregon) in which we looked at what happens when the frequency of this tuning note shifts while players are actively trying to match one another. We found that the shifted pitch diffuses through the group if the shift is small (a few Hz), but is sensed immediately by the whole group if the shift is large. This suggests a transition from local to long-range interaction in how pitch matching occurs. For the local interactions we envisioned a process similar to flocking in birds, governed by an advection-diffusion equation. But we were unable to fit the data with this model, because it does not allow for long-range interactions. I still have the data, and would be interested in thinking again about how people process sound in groups.
- Sounds like a cool topic! A quick question: do you have data on the social structure of the orchestra? It would be interesting to look at the formal hierarchy, as well as at the informal social network, and see whether they have any influence on pitch diffusion, especially for the long-range interactions. (Question asked by Bogdan State)
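A minimal sketch contrasting the two regimes described in the pitch-diffusion proposal, assuming the players sit in a line with the concertmaster at one end and each nudges their pitch toward what they hear. With `mode="local"` each player hears only nearest neighbours, a discrete diffusion (there is no advection term, unlike the advection-diffusion model mentioned); with `mode="global"` everyone hears the whole section. The line geometry and the `coupling` constant are assumptions, and nothing here is fitted to the experimental data.

```python
def tune(pitches, leader_pitch, coupling, steps, mode="local"):
    """Each player (index >= 1) moves a fraction `coupling` of the way toward the
    mean pitch they hear; player 0 (concertmaster) is pinned to leader_pitch.
    mode="local": neighbours in a line only (discrete diffusion).
    mode="global": every other player (all-to-all coupling)."""
    p = list(pitches)
    p[0] = leader_pitch
    n = len(p)
    for _ in range(steps):
        new = p[:]  # synchronous update: everyone reacts to the previous step
        for i in range(1, n):
            if mode == "local":
                heard = [p[j] for j in (i - 1, i + 1) if 0 <= j < n]
            else:
                heard = [p[j] for j in range(n) if j != i]
            new[i] = p[i] + coupling * (sum(heard) / len(heard) - p[i])
        p = new
    return p

section = [440.0] * 6  # six violins tuned to A440 (illustrative)
local = tune(section, 444.0, coupling=0.5, steps=1, mode="local")
global_ = tune(section, 444.0, coupling=0.5, steps=1, mode="global")
```

After one step the shift has reached only the first neighbour in the local case but every player in the global case, which mirrors the qualitative contrast between small and large shifts; a model of the data might mix the two with a shift-dependent weight.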
Language Evolution in an Archipelago (Erika Fille Legara) - The Philippines is an archipelago of 7,106 islands with three broad divisions (three main island groups): Luzon, Visayas, and Mindanao. It has around 175 individual languages, four of which no longer have any known speakers. Moreover, the Constitution recognizes eight (8) major and twelve (12) regional languages (statistics are taken from Wikipedia on the Philippines). It is also interesting to note that most Filipinos know at least three languages: (1) his/her native language, (2) Filipino, and (3) English. If I could get data on the distribution of the different languages (per year or per decade) within the archipelago, it might give us new insights into how certain languages evolve. It would also be interesting to model or predict which languages will eventually thrive or die, and to predict what will happen to certain languages at regional boundaries after a few decades or a few centuries. Finally, taking a hint from Professor Dan's idea (above), it may also be interesting to look at how certain words in the Filipino dictionary evolve through time. Caveat: I still need to check whether the data will be available before June.
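As one possible starting point for predicting which languages thrive or die, here is the two-language competition model of Abrams and Strogatz (2003), swapped in for illustration; it is not part of the proposal. Here x is the fraction speaking language X, s is X's relative status, and a = 1.31 is the exponent they fitted; the Euler step size and time horizon are arbitrary choices.

```python
def abrams_strogatz(x0, s, a=1.31, c=1.0, dt=0.01, steps=5000):
    """Euler integration of dx/dt = (1-x) c x^a s - x c (1-x)^a (1-s).
    Returns the final fraction x of speakers of language X."""
    x = x0
    for _ in range(steps):
        gain = (1 - x) * c * x ** a * s        # Y speakers switching to X
        loss = x * c * (1 - x) ** a * (1 - s)  # X speakers switching to Y
        x = min(1.0, max(0.0, x + dt * (gain - loss)))
    return x
```

The model assumes monolingual speakers, so it cannot represent the trilingualism described above; bilingual extensions of this model exist and would be a more natural fit for the Philippine case.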
Social Cognition: Defining the Situation (Lynette Shaw) – A foundational concept in social cognition is that of the “mental representation.” Essentially, this is a preexisting framework of meaning that is automatically imposed on perceived information in order to develop the inferences necessary for generating interpretations and expectations from that information. This basic concept bears a strong relationship to many popular ideas in the social sciences such as the “categories” involved in discrimination, cultural “schemas,” the “frames” of social movements, organizational “scripts,” and the “mental models” that are associated with institutions.
In his foundational piece, “On Perceptual Readiness,” Bruner proposes a very simple model of how these representations are essentially “selected for” on the basis of inference validation. Since that time, the complex interdependencies of this automatic, cognitive process occurring within a social context have been explicitly noted in work dealing with “expectancy confirmation.” Implicitly, the interdependent nature of this process within the social context has arguably undergirded several bodies of both classical and contemporary social theory - especially those relying on an idea of individuals reaching a “shared definition of the situation.”
Though this inference-validation model of mental representation is a relatively simple one, little work to date has really sought to represent it in ways that could be formally or systematically elaborated upon. This project would translate this conceptual model into an agent-based computer simulation and, if time allows, begin exploring key parameterizations of it that have interesting real world analogs.
- If I understand this correctly, I find it interesting :-) Ligtvoet 21:37, 21 May 2010 (UTC)
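A minimal agent-based sketch of the inference-validation loop in the "Defining the Situation" proposal, under simplifying assumptions that are not Bruner's: each agent holds one of `n_frames` representations, expects a randomly chosen partner to behave according to that representation, and, when the partner's observed behaviour invalidates the expectation, switches with probability `switch_prob` to the representation that would have predicted it. `noise` makes observed behaviour occasionally random. In this stripped-down form the dynamics reduce to a noisy voter model; the returned value, the share of agents holding the most common frame, is a crude proxy for a "shared definition of the situation".

```python
import random
from collections import Counter

def simulate_frames(n_agents=50, n_frames=3, switch_prob=0.3, noise=0.1,
                    steps=20000, seed=1):
    """Toy inference-validation model; returns the share of agents holding the
    most common frame at the end (1.0 = fully shared definition)."""
    rng = random.Random(seed)
    frames = [rng.randrange(n_frames) for _ in range(n_agents)]
    for _ in range(steps):
        a, b = rng.sample(range(n_agents), 2)
        # b behaves according to its own frame, except for occasional noise
        observed = frames[b] if rng.random() >= noise else rng.randrange(n_frames)
        # a's expectation (its own frame) is invalidated: maybe switch to the
        # frame that would have predicted the observed behaviour
        if observed != frames[a] and rng.random() < switch_prob:
            frames[a] = observed
    return max(Counter(frames).values()) / n_agents

shared = simulate_frames()
```

Parameters with real-world analogs, such as who interacts with whom (a network instead of random pairing) or how ambiguous behaviour is (`noise`), are natural things to vary first.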
"Structure, Function and Spaces" (Giovanni Petri): Recently, networks have been studied in relation to their embeddings in (usually hyperbolic) spaces for a number of reasons, for example efficient navigation, data filtering or visualization (see here, here and here). To whet your appetite, one of the fascinating results is that any graph can be drawn without edge crossings on a surface of sufficiently high genus (i.e. with enough "donut holes" in the surface). I would be interested in studying whether this hidden-metric-space analogy goes a bit deeper: for example, whether there is a relation between the diffusion and transport properties of a network and its embedding space, and whether interacting systems (think of correlation matrices, multi-body systems, etc.) can be cast in this form, with some of their properties derived from characteristics of the embedding space (say genus, curvature, etc.). As I'm currently reading on the subject but don't have a precise idea of how to implement it, I would very much welcome feedback from any interested peers.
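A worked formula behind the genus remark: for a simple connected graph with V >= 3 vertices and E edges embedded on an orientable surface of genus g, Euler's formula V - E + F = 2 - 2g together with 3F <= 2E (every face has at least three edges and each edge borders at most two faces) gives g >= (E - 3V + 6)/6. For complete graphs K_n this bound is attained (the Ringel-Youngs theorem), so the two illustrative functions below agree on K_n.

```python
import math

def genus_lower_bound(n_vertices, n_edges):
    """Euler-formula lower bound on the orientable genus of a simple connected
    graph with at least 3 vertices: g >= ceil((E - 3V + 6) / 6), floored at 0."""
    return max(0, math.ceil((n_edges - 3 * n_vertices + 6) / 6))

def complete_graph_genus(n):
    """Ringel-Youngs theorem: genus of K_n is ceil((n - 3)(n - 4) / 12), n >= 3."""
    return math.ceil((n - 3) * (n - 4) / 12)
```

For example K_5 (V = 5, E = 10) gives a bound of 1, matching the fact that K_5 is non-planar but embeds on a torus.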
"Ego'o'war" (Giovanni Petri): Brandes et al. (link broken) recently extracted role models for ego-networks from a dataset obtained through a questionnaire in a large community of immigrants. It would be interesting to use some of the available data to try to identify behavioral archetypes (socialites, noobs, PKers, carebears, griefers, etc.) in online communities, and to study how they interact and evolve. I'm thinking of virtual worlds (such as Eve Online or Michael Szell's world), as they present a wider range of possible interactions than standard social networks, i.e. grouping, migrations, wars, commerce, etc. This project, however, sounds pretty data-intensive, and it might not be easy to get all the data involved.
- (Michael Szell) I have begun working on exactly this topic, as a follow-up to this paper. See the video of an aggressive player. One could follow the evolution of some players and their activities over time, and see how their "careers" evolve. I am sure one could observe many interesting things, e.g. "bursty" behavior, long-range correlations, non-Gaussian distributions of activity... I can try to extract data for some players, so we can take a look at it in June.
- (Giovanni Petri) Great! Another issue might be the mobility of virtual agents compared with real agents (say, from mobile networks). It would be interesting to see whether there are any similarities (what I'm thinking of is along the lines of: can we learn something useful for real-world applications from the virtual ones?), and there might be links to the project proposed by Bogdan State at the top of the page.
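A possible first measurement on extracted player data: the burstiness coefficient B = (σ - μ)/(σ + μ) of inter-event times (Goh and Barabási, 2008), where μ and σ are their mean and standard deviation. B = -1 for perfectly regular activity, B is about 0 for a Poisson process, and B approaches 1 for very bursty activity. The event timestamps below are made up for illustration.

```python
import statistics

def burstiness(event_times):
    """Burstiness coefficient of the gaps between sorted event times (needs >= 2 events)."""
    times = sorted(event_times)
    gaps = [later - earlier for earlier, later in zip(times, times[1:])]
    mu = statistics.mean(gaps)
    sigma = statistics.pstdev(gaps)
    if sigma + mu == 0:
        return 0.0  # all events at the same instant
    return (sigma - mu) / (sigma + mu)

regular = burstiness([0, 1, 2, 3, 4])                   # evenly spaced actions
bursty = burstiness([0, 0.1, 0.2, 10, 10.1, 10.2, 20])  # clustered actions
```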
Development of an online environment for simple behavioral experiments (Michael Szell): Classic "bottom-up" behavioral experiments, such as those conducted by Henrich et al. or Traulsen et al., face three main problems:
1) It is highly cumbersome and resource-intensive to set up a physical environment and to assemble enough subjects to take part in your experiments (usually they have to be paid).
2) The subjects are often students or another possibly non-representative/biased sample of the human population.
3) It is not possible to assemble more than a few dozen or a few hundred subjects, possibly leading to non-significant results, and cost scales linearly with the number of subjects.
It is baffling that scientists (with very few exceptions) have so far ignored the vast internet population for conducting such behavioral experiments. Problem 1) can be solved with a relatively small amount of resources, by setting up an online environment for experiments. Problem 2) does not disappear but shifts, since the bias depends on the subjects you attract. Problem 3), however, is solved instantly: >10^4 subjects, whom you can easily motivate over time (at practically zero running cost), should provide ample statistical power. My proposal is to gather experts in software engineering, web development and experimental setup to develop such an online environment (as simple as possible). I suggest it should be AJAX+LAMP-based, portable, open-source, and as easy as possible to embed in any page with MySQL/PHP behind it. This way it could serve its experiments as "mini-games", e.g. in bigger browser games or on other sites. The first implemented experiment could be the Ultimatum game. Note that I have no experience with AJAX, so this project would need someone qualified in this field.
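The core game logic such an environment would have to serve for the Ultimatum game is small. This sketch is in Python purely for illustration; the proposal envisions PHP/MySQL behind an AJAX front end, so it would need porting. The function names and the threshold-based responder (accept if the offer is at least a personal minimum) are assumptions; in a real experiment the responder would decide live.

```python
def ultimatum_round(pie, offer, min_acceptable):
    """One round: the proposer offers `offer` out of `pie`; the responder accepts
    if the offer reaches their threshold. Returns (accepted, proposer, responder)."""
    if not 0 <= offer <= pie:
        raise ValueError("offer must be between 0 and pie")
    if offer >= min_acceptable:
        return True, pie - offer, offer
    return False, 0, 0  # rejection: both players get nothing

def acceptance_rate(pie, rounds):
    """rounds: list of (offer, responder_threshold) pairs."""
    accepted = [ultimatum_round(pie, offer, thr)[0] for offer, thr in rounds]
    return sum(accepted) / len(accepted)
```

Storing each round's offer and decision in a table would be enough to compute acceptance rates and offer distributions across many thousands of online subjects.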
"Phenotypic Plasticity and Climate Change" (Kyla Dahlin): One of the biggest challenges to understanding how ecosystems will change with a changing climate is that we don't know species' fundamental niches. People like to take existing distributions ("realized niches"), correlate them with climate, then project where that climate will move in the future, but that ignores the fact that plants may actually be able to tolerate a much wider range of conditions than those in which we currently find them. It seems like you could get a better handle on this if you knew (1) a plant's generation time, (2) how many generations it takes to evolve a new trait, and (3) the climate the plant experienced in the past. If the timescale of a plant's evolution is similar to that of, say, glacial cycles, that would suggest that the plant could handle a pretty wide range of temperatures and weather extremes. I'd love to know if any of the evolutionary bio folks have thought about this or know the literature better than I do!
Quantitative Analysis of Northern New Mexico Acequia Infrastructure: An Applied Complexity Approach (John Paul)
New Mexico has community ditch irrigation systems called acequias that are some of the oldest decentralized European social structures in the Americas. Some work has been done by a previous CSSS student studying the social structure of acequias and how they are both sustainable and vulnerable to novel disturbances. More academic work discussing their social structures can be found here. Most of this work has been qualitative, traditional sociological work.
I've come across some data from the New Mexico Office of the State Engineer detailing acequia water-rights infrastructure that I think may be interesting to look at. Please see the spreadsheets on this page.
Cursory data analysis is a good place to start.
Eventually I'm interested in either crafting a model to simulate acequia network growth (theoretical) for historical research purposes, or researching the statewide structure of acequias in a way that could inform future policy recommendations (applied).
"Roadkill as a means of spreading disease in Tasmanian Devils" (Gavin Fay) - Living in Tasmania, it is hard not to become familiar with the plight of the Tasmanian Devil, whose population is currently dwindling due to Devil Facial Tumour Disease (DFTD), a rather nasty infectious cancer that has become prevalent through much of the state. DFTD infection relies on transmission of infected cells through contact, most likely biting, which these critters do a lot of during mating and around prey carcasses. A hot conservation topic right now is a forestry plan to build roads that would open up a wilderness area in the north of the state to ecotourism. The devil population in this area has so far remained disease-free. There are concerns that the road will increase the likelihood that DFTD spreads to the disease-free population: devils are scavengers and frequently feed on roadkill, so a new road may increase the frequency of contact between infected and disease-free devils. It might be interesting to investigate how introducing a fixed-location source of additional prey items (i.e. a road) to a devil population would change the devils' contact network, and how large the increase in contact frequency would need to be to allow transmission of DFTD from an infected devil population to a disease-free one.
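A back-of-the-envelope way to frame "how much extra contact is needed", assuming contacts between an infected and a disease-free devil are independent and each transmits DFTD with the same probability p. Under that assumption P(at least one transmission in n contacts) = 1 - (1 - p)^n, and the number of contacts needed to reach a target probability follows by solving for n. The values of p in the usage are placeholders, not estimates for DFTD.

```python
import math

def prob_at_least_one_transmission(p, n_contacts):
    """P(at least one transmission) over n independent contacts, each with probability p."""
    return 1 - (1 - p) ** n_contacts

def contacts_needed(p, target):
    """Smallest n with 1 - (1 - p)**n >= target, for 0 < p < 1 and 0 < target < 1."""
    if not (0 < p < 1 and 0 < target < 1):
        raise ValueError("p and target must lie strictly between 0 and 1")
    return math.ceil(math.log(1 - target) / math.log(1 - p))

# Placeholder per-contact probability, for illustration only.
n_for_even_odds = contacts_needed(0.1, 0.5)
```

A contact-network model of devils around a roadkill site could then supply n (how often infected and healthy animals meet at the road), turning the question into whether the road pushes n past such a threshold.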
"Manage lots of fish stocks, or a few?" (Gavin Fay) - Australia's Southern and Eastern Scalefish and Shark Fishery (SESSF) is a multispecies fishery with a large number of vessels using a range of gears. The fishery exploits 80+ species, with a subset of target species managed by a total allowable catch (TAC) under a quota management system. Other species within the fishery are managed through measures such as trip limits, gear restrictions, and spatial and seasonal closures. Setting TACs requires data collection and routine stock assessment in order to calculate suitable catch limits given an assessment of stock status.
It is not feasible to perform full quantitative assessments for each quota species every year (for reasons of data availability, institutional capacity, $$$, and others), and rapid assessment methods are prevalent (where any assessment exists at all). Because the fishery is multispecies, there are many technical interactions within it, i.e. targeting one species leads to catch of several others: single fishing operations (shots, hauls) are not single-species.
Given these interactions: Which species should we manage for? Is there a suite of species that we can actively manage such that the risk to other stocks is not too great? Should we target the high-value species, abundant species (which may be low in value per kg but add up to big $ overall), or manage to minimise the catch of vulnerable species? What are the effects of these options on ecosystem biomass, the proportion of stocks in danger of collapse, yield, profit, etc.?
Indeed, just describing the multispecies interactions (which species are associated with others in the data, how do these fishery assemblages vary over time and space (perhaps by port)), would be an exercise in itself. Perhaps one could also look at the relative costs to a port-based community associated with the fished assemblage shifting from one to another (perhaps as a result of climate change?).
"Estimating abundance trends of non-target species" (Gavin Fay) - Trends in abundance of non-target (bycatch) species in fisheries are generally estimated from fishery-independent surveys. Surveys are expensive and not always available. How, then, can we estimate trends when these data are not available? Direct effects (i.e. incidental harvesting) can be measured (time series of catch), and one might expect the relative trends in exploitation rate of bycatch species to be similar to those of the target species with which they are caught. This idea has been tried in multispecies assessments under a 'Robin Hood' approach (steal from the data-rich to give to the poor). One issue is that the lack of information for the data-poor species can degrade the performance of the data-rich assessment when the assessments are conducted simultaneously in a multispecies framework.
I'm thinking about a general multivariate state-space modelling framework for non-target species, which could use correlations of direct effects with target species derived from fisheries logbook data, and the 'known' abundances and trends of the target species. An additional question is how to quantify indirect effects of fishing on the abundance of non-target species. One possibility could be to guide the covariance with the results of food-web modelling, which could be limited to simply describing how connected non-target species are with the various target species. Another might be to use information about life history or trophic level to describe general expectations for the degree of correlation or change. It might be useful to work with a system where methods can be ground-truthed, i.e. an ecosystem for which survey data are available. Alternatively, one could subset the data-rich species.
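A stripped-down, univariate version of the state-space idea: a random-walk-plus-noise (local level) model for the log abundance index of one species, filtered with a Kalman filter. Years without a survey are passed as `None`, so the filter only propagates uncertainty through them. This is an illustrative simplification; the multivariate framework proposed above would instead share information with target species through the covariance of the process errors, and the variances `q` and `r` here are assumed known rather than estimated.

```python
def local_level_filter(observations, q, r, x0=0.0, p0=1e6):
    """Kalman filter for x_t = x_{t-1} + w_t (var q), y_t = x_t + v_t (var r).
    observations: log-abundance indices, None for years without data.
    Returns (filtered means, filtered variances)."""
    x, p = x0, p0  # vague prior on the initial state
    means, variances = [], []
    for y in observations:
        p = p + q  # predict: the random walk adds process variance
        if y is not None:
            k = p / (p + r)  # Kalman gain
            x = x + k * (y - x)  # update toward the observation
            p = p * r / (p + r)  # posterior variance, equal to (1 - k) * p
        means.append(x)
        variances.append(p)
    return means, variances
```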
"Evolution of life history strategies in sea lions" (Gavin Fay) - The Australian sea lion (ASL) is unique among the otariids (fur seals and sea lions) in that it exhibits a non-annual breeding strategy, with a breeding cycle of ~17 months, an extended pupping season at rookeries of 4-5 months, and non-synchronous pupping among subpopulations (rookeries). In contrast, all other sea lions breed on a 12-month cycle with short pupping seasons, and in most species pupping is synchronised among rookeries across the entire population.
One proposed idea is that the ASL strategy is a response to living in a low-productivity environment (most other fur seals and sea lions live in highly productive, nutrient-rich places), with the ability to vary the delay in implantation of fertilised eggs depending on environmental conditions, thus enabling individuals to invest in reproductive output only when the probability of pup survival is high. Indeed, there is evidence that the length of the breeding period is correlated with environmental conditions. Perhaps it would be neat to see whether the two life history strategies observed are consistent with these hypotheses under evolutionary pressure. I am not familiar with the methods involved, but one could evolve a suite of life histories under different environmental regimes and see which survive?
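One very simple reading of "evolve a suite of life histories and see which survive" is two-strategy replicator dynamics. Everything here is a toy assumption: each breeding opportunity is good with probability `q_good`; the fixed (annual) strategy always breeds, paying `cost` and getting pup survival `s_good` or `s_bad`; the flexible (delayed-implantation) strategy breeds only in good conditions but pays a small `delay_cost`. Cycle length (17 vs 12 months) is ignored, and the parameter values in the usage are invented.

```python
def expected_payoffs(q_good, s_good, s_bad, cost, delay_cost):
    """Expected payoff per breeding opportunity for (fixed, flexible) strategies."""
    fixed = q_good * s_good + (1 - q_good) * s_bad - cost
    flexible = q_good * (s_good - cost) - delay_cost
    return fixed, flexible

def replicator(freq_flexible, payoffs, generations=200, baseline=1.0):
    """Discrete replicator dynamics; returns the final frequency of the flexible strategy."""
    w_fixed, w_flex = (baseline + x for x in payoffs)
    if w_fixed <= 0 or w_flex <= 0:
        raise ValueError("baseline too small: fitness must stay positive")
    f = freq_flexible
    for _ in range(generations):
        mean_w = f * w_flex + (1 - f) * w_fixed
        f = f * w_flex / mean_w  # strategies grow in proportion to relative fitness
    return f

# Invented environments: poor (good years rare, low survival otherwise) vs rich.
poor = expected_payoffs(q_good=0.3, s_good=0.8, s_bad=0.1, cost=0.4, delay_cost=0.02)
rich = expected_payoffs(q_good=0.9, s_good=0.8, s_bad=0.5, cost=0.3, delay_cost=0.02)
```

Under these assumptions the flexible strategy spreads whenever its expected payoff exceeds the fixed one, which happens when poor conditions are common and pup survival in them is low, in line with the low-productivity hypothesis.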
"Complexity: Friend or Foe?" (Erik Van den broecke). I am setting up a new initiative – code name “The Brussels Institute for Complexity Based Solutions” – that aims to foster “smart leadership” (read: complexity-aware leadership) in business, government and academia. By mid-July we will finalize the service portfolio. For each service we need to define the service value, pricing and target groups. Furthermore, high-level service descriptions (process, techniques, tools) need to be written. From mid-July we will go to market.
So far we have identified four candidate services: strategy development; strategy implementation & organizational performance; complexity-related training for leaders & decision makers; and complexity-related methodology coaching, especially for consultants.
The first step is to brainstorm leadership/management-related, complexity-based services.