# Georg M Goerg

### From Santa Fe Institute Events Wiki

## My path to SFI

I am a PhD candidate (starting my 4th year) in Statistics at Carnegie Mellon. I received my master's degree in mathematics (applied / econometrics / time series) from the Vienna University of Technology, Austria. Before coming to the US, I spent a year in Chile teaching statistics (mainly time series) at PUC. For more details you can visit my website. You can email me at "my_3_initials_in_lowercase"@stat.cmu.edu.

I am very eager to participate in the CSSS; especially because of the inter-disciplinary research / collaborations on real world problems with people from many backgrounds - that's what statistics is all about (at least for me). So I am looking forward to meeting all of you and I am sure we'll have a great month ahead of us.

## Research Interests

In my thesis I work on local statistical complexity (LSC) - a measure of
*interestingness* for spatio-temporal fields. We develop the
statistical methods and algorithms to i) forecast a spatio-temporal
system, and ii) discover patterns automatically solely from the data. We
do this using modern non-parametric statistical / machine learning
techniques with good properties for any kind of (complex)
spatio-temporal system.

One reason why I work on spatio-temporal systems is that I have always been drawn to time series (a la "My interest lies in the future because I am going to spend the rest of my life there." - Charles F. Kettering) and to methods that try to solve real-world problems. These include time series clustering, forecasting, blind source separation techniques for forecastable time series, and time-varying parameter models. Another side project is skewed and heavy-tailed distributions, in particular how we can transform random variables to introduce skewness and heavy tails. As a statistician, what's even more relevant to me is how to reverse this transformation, so that I can take data and remove skewness, power laws, and heavy tails.

I do all my statistical computing in R -- for user-friendly code and R packages (two so far), and Python -- for huge data tasks.

In my spare time I like playing soccer and volleyball, salsa dancing, traveling, ...

## SFI Project: Traffic pattern analysis - Can we estimate car velocity by only observing car counts?

**Disclaimer:** The model/framework I am thinking about can be applied to many systems where one observes only an overall ensemble total (the number of "particles" in a system) at each time point t, which is the sum of all particles/entities that have "survived" until time t (see also Conceptual view below). However, the really interesting stuff goes on in the individual entities / particles (how long do they typically stay alive?). So the traffic example is just one of many where these ideas could be applied; if you have a similar situation and real-world data, I would be happy to shift the focus to your particular problem if it fits this framework.

#### Problem statement

Imagine you have a monitored highway section with a start and end point. At both points you count the number of cars that pass by. The question I'd like to answer / simulate / estimate is: can we make some inference about the velocity of cars although we only have their counts? This would be very useful from an engineering / economic perspective because it's much easier / cheaper to count cars instead of actually tracking them from A to B.

#### Ideas on how I would approach this

I have some intuition about how to go about this, but my ideas are purely statistical (think of it as a birth-and-death process, or as particles in a system that have a certain lifetime: cars in the highway section are like particles in a system, and their velocity is inversely proportional to their lifetime in this highway section). I would like to see if explicit physical modeling of motion and agent-based modeling of traffic flow could shed more light on this problem.
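As a rough illustration of the "lifetime in the section" idea, here is a minimal Python sketch of one count-only baseline. It is not the method proposed in this project; it is just Little's law (mean number in the system = arrival rate x mean time in the system) applied to the two counters. All numbers (section length, arrival rate, speed range) and all variable names are hypothetical, and the simulated cars are independent and never interact, which real traffic of course does not satisfy.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: a 2 km section observed for 10 hours.
section_km = 2.0
T = 36_000                      # observation window in seconds
n_cars = rng.poisson(0.3 * T)   # about 0.3 cars per second enter

# Hidden "truth" that the counters never see: entry time and speed per car.
entry_times = np.sort(rng.uniform(0, T, size=n_cars))
speeds_kmh = rng.uniform(80, 120, size=n_cars)
travel_s = section_km / speeds_kmh * 3600.0   # lifetime in the section
exit_times = np.sort(entry_times + travel_s)

# What the two counters record: cumulative counts at start and end point.
grid = np.arange(T + 1)
cum_in = np.searchsorted(entry_times, grid, side="right")
cum_out = np.searchsorted(exit_times, grid, side="right")
in_section = cum_in - cum_out   # cars currently inside the section

# Little's law: mean travel time = mean number inside / arrival rate.
arrival_rate = cum_in[-1] / T
mean_travel_hat = in_section.mean() / arrival_rate
true_mean_travel = travel_s.mean()

# Section length over mean travel time gives the harmonic mean of the speeds.
speed_hat = section_km / mean_travel_hat * 3600.0
harmonic_speed = n_cars / np.sum(1.0 / speeds_kmh)

print(f"mean travel time: estimated {mean_travel_hat:.1f} s, true {true_mean_travel:.1f} s")
print(f"mean speed (harmonic): estimated {speed_hat:.1f} km/h, true {harmonic_speed:.1f} km/h")
```

Note that this only recovers a mean travel time (and hence a harmonic-mean speed), not the distribution of velocities, which is the more interesting target.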

**Update 06/05/12:** Just today we saw *Takens' theorem* about how we can infer a system's structure from observing only a subset of its variables. Well, it seems like that's exactly what this project is about.

#### Existing approaches

First of all, I am not a civil engineer and I do not work in public policy, so I am not aware of the current state of the technology / "art". If you happen to know of references that approach the problem exactly this way, please let me know.

- Hazelton tries to do something similar, but the method uses more observables than just the counts (it also uses occupancy rates). Nevertheless, I guess this would be a starting point for the project.
- A glossary of traffic analysis terms
- Freeway Traffic Speed Estimation Using Single Loop Outputs
- Road Traffic Data: Collection Methods and Applications: contains many sources of information and existing real-world approaches / technologies. Includes references to online data-sources.

### Math / Statistics

#### Conceptual view

Parke proposes an error duration model (EDM) for how time series observed in a system come about, which is very different from the typical auto-regressive (moving-average) explanation of stochastic phenomena:

*The basic mechanism for an error duration model is a sequence of shocks of stochastic magnitude and stochastic duration. The variable observed in a given period is the sum of those shocks that survive to that point.*

The point of this formulation is that the distribution of the (unobserved) survival times determines the correlation structure of the observed series. Conversely, we should be able to infer the lifetime distribution of the shocks from the correlation structure. This matters because in practice we observe neither the individual shocks nor their lifetimes, but we can estimate the correlations of the observations. Thus in principle it should be possible to infer/estimate the lifetime distribution from the counts alone.
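To make this concrete, here is a minimal Python sketch under strong simplifying assumptions that are mine and not Parke's: time is discrete, the number of cars entering per step is Poisson with rate `lam`, each car is a "shock" of magnitude 1, and lifetimes are i.i.d. integers independent of the arrivals. Under these assumptions the autocovariance of the number of cars in the section at lag k is `lam` times the sum of S(j) over j >= k, where S(j) = P(lifetime > j). Differencing then gives S(k) = (gamma_k - gamma_{k+1}) / lam. The lifetime distribution (uniform on 3 to 7 steps) and all names are hypothetical, and this is an illustration of the idea rather than a proposed estimator.

```python
import numpy as np

rng = np.random.default_rng(1)

T = 500_000      # number of time steps (hypothetical)
lam = 2.0        # mean number of cars entering per step (hypothetical)
burn_in = 100    # drop the start-up phase where the section fills up
max_lag = 10

# Hidden process: each car enters at some step and stays `lifetime` steps.
arrivals = rng.poisson(lam, size=T)
entry = np.repeat(np.arange(T), arrivals)
lifetime = rng.integers(3, 8, size=entry.size)   # uniform on {3, ..., 7}
exit_ = np.minimum(entry + lifetime, T)          # first step the car is gone

# Observed series: cars in the section = cumulative entries - cumulative exits.
delta = (np.bincount(entry, minlength=T + 1)
         - np.bincount(exit_, minlength=T + 1))[:T]
n_in_section = np.cumsum(delta)


def autocov(x, max_lag):
    """Empirical autocovariances gamma_0, ..., gamma_max_lag."""
    x = x - x.mean()
    n = x.size
    return np.array([np.dot(x[:n - k], x[k:]) / n for k in range(max_lag + 1)])


gamma = autocov(n_in_section[burn_in:], max_lag)
lam_hat = arrivals.mean()                  # entry counter gives the rate

# Survival function recovered from counts only: S(k) = (gamma_k - gamma_{k+1}) / lam.
s_hat = (gamma[:-1] - gamma[1:]) / lam_hat
s_true = np.array([np.mean(np.arange(3, 8) > k) for k in range(max_lag)])

for k in range(max_lag):
    print(f"lag {k}: estimated S = {s_hat[k]: .3f}, true S = {s_true[k]:.1f}")
```

The differenced autocovariances are noisy, so a long series is needed here; with real data, arrivals are unlikely to be Poisson or independent of lifetimes, and the relation above would have to be adapted.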

#### Formal details

Follows later or link to external pdf.