Structure, Statistical Inference and Dynamics in Networks: From Graphs to Rich Data

Organized by Cris Moore (SFI), Aaron Clauset (UC Boulder & SFI), and Mark Newman (University of Michigan). Held at the Santa Fe Institute.

Network science is a thriving and increasingly important cross-disciplinary domain that focuses on the representation, analysis and modeling of complex social, biological and technological systems as networks or graphs. Much of the work so far has focused on simple topological approaches, in which the system is modeled as a static list of nodes and edges.

However, the structure and dynamics of many real-world complex systems are not so easily represented. Nodes can have locations, directions, memory, demographic characteristics, content, and preferences. Edges can have lengths, directions, capacities, costs, durations, and types. Moreover, the structure of a network is often not fixed, with edges and nodes appearing, disappearing, and changing their characteristics over time. Capturing, modeling, and understanding these more complicated aspects of networks is one of the most exciting emerging areas within network science.

For instance, in food webs (sets of biological species and their energetic or predatory interactions), we typically have only static topological information (who eats whom), but our goal is to say something useful about, e.g., dynamics, assembly, and robustness over time. A similar situation exists in most social networks, with the additional complication that the networks there are often enormous. Augmenting these simple networks with additional information about the nodes and edges allows us to go beyond statements about the pattern of interactions to address more fundamental questions about the underlying processes. So far, however, relatively few of our models or algorithms speak to such rich data.

An emerging framework for addressing these outstanding needs in network science comes from statistical inference and so-called “generative models.” These techniques offer a number of highly attractive properties for network analysis, and have so far produced principled and scalable approaches for learning directly from a wide variety of static network structures, including assortative and disassortative communities, overlapping communities, and hierarchical structure. These methods draw on sophisticated techniques from machine learning and statistical physics, and can deal gracefully with data that is noisy or incomplete, helping us predict missing or spurious links or test complex organizational hypotheses.
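As a concrete illustration of what a generative model for community structure looks like, here is a minimal sketch of sampling from a stochastic block model, one common model of this kind. The function name `sample_sbm` and its parameters are hypothetical and chosen only for this example; the sketch assumes an undirected graph without self-loops and ignores the scalability concerns that real implementations must address.

```python
import random
from itertools import combinations


def sample_sbm(group_sizes, edge_prob, seed=None):
    """Sample an undirected graph from a stochastic block model.

    group_sizes: list of ints, the number of nodes in each group.
    edge_prob:   square matrix (list of lists); edge_prob[r][s] is the
                 probability of an edge between a node in group r and
                 a node in group s (assumed symmetric).
    Returns (labels, edges): labels[i] is the group of node i, and
    edges is a list of (u, v) pairs with u < v.
    """
    rng = random.Random(seed)
    # Assign consecutive node ids to groups 0, 1, 2, ...
    labels = [g for g, size in enumerate(group_sizes) for _ in range(size)]
    edges = []
    # Each pair of nodes is connected independently, with a probability
    # that depends only on the groups of its two endpoints.
    for u, v in combinations(range(len(labels)), 2):
        if rng.random() < edge_prob[labels[u]][labels[v]]:
            edges.append((u, v))
    return labels, edges


# Assortative example: dense within groups, sparse between them.
labels, edges = sample_sbm([5, 5], [[0.8, 0.05], [0.05, 0.8]], seed=1)
```

Swapping the matrix so that the off-diagonal entries are larger than the diagonal ones gives a disassortative structure instead. Inference runs this process in reverse: given only the edges, it estimates the labels and the probability matrix.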

These approaches offer particular promise for understanding the rich network data described above. But there are also considerable challenges, and elucidating these is a primary goal of this workshop. For instance, making practical inferences from these richer data sets requires new generative models that naturally incorporate the additional data, yet hew closely enough to the underlying system's structure to produce useful conclusions. Additionally, model selection techniques are needed to sort among alternative network hypotheses, and principled measures and tests of model goodness-of-fit are needed to avoid overfitting the data with imaginative models. Finally, these models and algorithms must scale up to efficiently handle the large networks that increasingly pervade science.

Simply developing new models, however, is not enough. We must also meet the challenge of understanding when and how the results of these algorithms, like the identification of communities or modules, help us understand network dynamics. For instance, if we use the stochastic block model to lump nodes together into communities, thereby reducing the number of variables as in compartmentalized models in epidemiology, do we get a good approximation of the original dynamics? Similarly, what does the topology of a food web tell us about its ability to respond robustly to species loss or the introduction of invasive species? Can the predators or prey within a functional community be lumped together, giving good coarse-grainings of systems of Lotka-Volterra-type equations?
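To make the lumping question concrete, the following is a small worked sketch under an idealized assumption that is not part of the workshop description: every species in a group shares the same growth rate and the same interaction coefficients. The symbols x_i, r_i, a_{ij}, g(i), R_g, A_{gh}, and X_g are introduced here only for illustration.

```latex
% Generalized Lotka-Volterra dynamics for species abundances x_i
\frac{dx_i}{dt} = x_i \Bigl( r_i + \sum_j a_{ij} x_j \Bigr)

% Assumption: rates depend only on group membership g(i)
r_i = R_{g(i)}, \qquad a_{ij} = A_{g(i)\,g(j)}

% Summing over the species in group g, with X_g = \sum_{i : g(i) = g} x_i
\frac{dX_g}{dt} = X_g \Bigl( R_g + \sum_h A_{gh} X_h \Bigr)
```

Under this assumption the group totals obey a closed Lotka-Volterra system of their own, so the coarse-graining is exact. When the rates vary within a group, the lumped equations are only an approximation, and how good that approximation is remains the open question posed above.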

This workshop will bring together researchers from physics, computer science, statistics, biology, and the nascent field of network science to explore how to deal with rich network data in new ways, to understand the strengths and weaknesses of our current statistical and computational techniques, to develop new models and algorithms, and to look for types of structure in real-world networks that we haven’t seen before. As in a recent successful workshop on the power grid, we will have a relatively small number of hour-long talks with a strong pedagogical component, shorter talks on current research, and large stretches of free time for discussion and collaboration.