Georg M Goerg: Difference between revisions

Revision as of 22:32, 5 June 2012

My path to SFI

I am a PhD candidate (starting 4th year) in Statistics at Carnegie Mellon. I received my master's in mathematics (applied / econometrics / time series) from the Vienna University of Technology, Austria. Before coming to the US, I spent a year in Chile teaching statistics (mainly time series) at PUC. For more details you can visit my website. You can email me at "my_3_initials_in_lowercase"@stat.cmu.edu.

I am very eager to participate in the CSSS; especially because of the inter-disciplinary research / collaborations on real world problems with people from many backgrounds - that's what statistics is all about (at least for me). So I am looking forward to meeting all of you and I am sure we'll have a great month ahead of us.

Research Interests

In my thesis I work on local statistical complexity (LSC) - a measure of interestingness for spatio-temporal fields. We develop the statistical methods and algorithms to i) forecast a spatio-temporal system, and ii) discover patterns automatically solely from the data. We do this using modern non-parametric statistical / machine learning techniques with good properties for any kind of (complex) spatio-temporal system.

One reason why I work on spatio-temporal systems is that I have always been drawn to time series (à la "My interest lies in the future because I am going to spend the rest of my life there." - Charles F. Kettering) and methods that try to solve real-world problems. These include time series clustering, forecasting, blind source separation techniques for forecastable time series, and time-varying parameter models. Another side project is skewed and heavy-tailed distributions, in particular how we can transform random variables to introduce skewness and heavy tails. As a statistician, what's even more relevant to me is how I can reverse this transformation, so that I can take data and remove skewness, power laws, and heavy tails.

I do all my statistical computing in R -- for user-friendly code and R packages (two so far), and Python -- for huge data tasks.

In my spare time I like to play soccer and volleyball, go salsa dancing, travel, ...

SFI Project: Traffic pattern analysis - Can we estimate car velocity by only observing car counts?

Problem statement

Imagine you have a monitored highway section with a start and end point. At both points you count the number of cars that pass by. The question I'd like to answer / simulate / estimate is: can we make some inference about the velocity of cars although we only have their counts? This would be very useful from an engineering / economic perspective because it's much easier / cheaper to count cars instead of actually tracking them from A to B.

Ideas on how I would approach this

I have some intuitions about how to go about this, but these are purely statistical: think of it as a birth and death process, or as particles in a system that have a certain lifetime. Cars in the highway section are like particles in a system, and their velocity is inversely proportional to their lifetime in this highway section. I would like to see if using explicit physical modeling of motion and agent-based modeling of traffic flow could shed more light on this problem.
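To make the "lifetime" intuition concrete, here is a minimal simulation sketch. It is not a worked-out method for this project, just one standard way to turn counts into a time: Little's law (mean number of cars in the section = arrival rate x mean time spent in the section). All names and numbers (simulate_section, speed_from_counts, a 10 km section, 500 cars per hour, speeds of 80 or 120 km/h, Poisson arrivals, constant speed per car) are assumptions made up for illustration.

```python
import random

def simulate_section(rate_per_h, length_km, speeds_kmh, horizon_h, seed=0):
    """Simulate unlabeled counter events at the start and end of a section.

    Assumptions (illustrative only): Poisson arrivals, and each car keeps
    one constant speed drawn at random from `speeds_kmh`.
    """
    rng = random.Random(seed)
    t, entries, exits = 0.0, [], []
    while True:
        t += rng.expovariate(rate_per_h)  # inter-arrival time in hours
        if t > horizon_h:
            break
        entries.append(t)
        exits.append(t + length_km / rng.choice(speeds_kmh))
    # Sorting discards which exit belongs to which entry: counts only.
    return sorted(entries), sorted(exits)

def speed_from_counts(entries, exits, length_km, horizon_h):
    """Estimate speed as length / mean lifetime, using Little's law.

    The integral of (cumulative entries - cumulative exits) over the
    observation window equals the total car-hours spent in the section.
    """
    area_in = sum(horizon_h - t for t in entries)
    area_out = sum(horizon_h - t for t in exits if t <= horizon_h)
    mean_travel_time_h = (area_in - area_out) / len(entries)
    return length_km / mean_travel_time_h

entries, exits = simulate_section(500.0, 10.0, [80.0, 120.0], 20.0)
estimate = speed_from_counts(entries, exits, 10.0, 20.0)
print(round(estimate, 1))
```

Note that this recovers the section length divided by the mean lifetime, i.e. a harmonic-type mean speed (about 96 km/h in this setup rather than the arithmetic mean of 100 km/h), and that cars still inside the section at the end of the window bias the estimate slightly upwards.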

Update 06/05/12: Just today we saw Takens' theorem about how we can infer a system's structure from only observing a subset of its variables. Well, it seems like that's exactly what this project is about.
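In practice, Takens' theorem is usually applied through a delay embedding: a single observed series is turned into vectors of lagged values, which under suitable conditions reconstruct the geometry of the full system state. A minimal sketch follows; the function name delay_embed, the toy series, and the choice of dimension and lag are assumptions for illustration only, and whether car counts satisfy the theorem's conditions is an open question here.

```python
def delay_embed(series, dim, lag):
    """Return delay vectors (x[t], x[t - lag], ..., x[t - (dim - 1) * lag])."""
    start = (dim - 1) * lag  # first index with a full history available
    return [
        tuple(series[t - i * lag] for i in range(dim))
        for t in range(start, len(series))
    ]

# Toy series standing in for the number of cars observed per time step.
occupancy = [0, 1, 2, 3, 4, 5]
vectors = delay_embed(occupancy, dim=3, lag=2)
print(vectors)
```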

Math / Statistics

Conceptual view

Parke proposes an error duration model (EDM) for how time series observed in a system come about, which is very different from the typical autoregressive (moving-average) explanation of stochastic phenomena:

The basic mechanism for an error duration model is a sequence of shocks of stochastic magnitude and stochastic duration. The variable observed in a given period is the sum of those shocks that survive to that point.

The point of this formulation is that the distribution of the (unobserved) survival times determines the correlation structure of the observed series. Conversely, we should be able to infer the lifetime distribution of the shocks from the correlation structure. This matters because in practice we observe neither the individual shocks nor their lifetimes, but we can estimate the correlations of the observations. Thus, in principle, it should be possible to estimate the lifetime distribution from the counts alone.
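The argument above can be checked on simulated data. The sketch below assumes i.i.d. zero-mean shocks with variance sigma^2 that are independent of their durations. Writing p_k for the probability that a shock survives at least k periods (p_0 = 1), the lag-k autocovariance is then gamma_k = sigma^2 * (p_k + p_{k+1} + ...), so p_k = (gamma_k - gamma_{k+1}) / (gamma_0 - gamma_1). The geometric durations (p_k = q^k), the function names, and the sample size are assumptions chosen for illustration, not part of the formal details of this project.

```python
import random

def simulate_edm(n, q, seed=0):
    """Simulate y_t = sum of all shocks still alive at time t.

    Each period one N(0, 1) shock is born; it survives k further periods
    with probability q**k (geometric duration, an illustrative choice).
    """
    rng = random.Random(seed)
    delta = [0.0] * (n + 1)  # net change of the level at each time step
    for t in range(n):
        eps = rng.gauss(0.0, 1.0)
        duration = 0
        while rng.random() < q:
            duration += 1
        delta[t] += eps              # shock enters at time t
        end = t + duration + 1
        if end <= n:
            delta[end] -= eps        # shock drops out after it dies
    y, level = [], 0.0
    for t in range(n):
        level += delta[t]
        y.append(level)
    return y

def autocov(y, k):
    """Sample autocovariance of y at lag k."""
    n = len(y)
    mean = sum(y) / n
    return sum((y[t] - mean) * (y[t + k] - mean) for t in range(n - k)) / n

def survival_from_autocov(y, max_lag):
    """Estimate p_0, ..., p_max_lag from differences of autocovariances."""
    g = [autocov(y, k) for k in range(max_lag + 2)]
    sigma2 = g[0] - g[1]  # equals sigma^2 * p_0 with p_0 = 1
    return [(g[k] - g[k + 1]) / sigma2 for k in range(max_lag + 1)]

y = simulate_edm(100_000, 0.5)
p_hat = survival_from_autocov(y, 3)
print([round(p, 2) for p in p_hat])
```

With q = 0.5 the estimates should land close to 1, 0.5, 0.25, 0.125, up to sampling noise. Only the observed series y enters the estimator: neither the shocks nor their lifetimes are used.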

Formal details

Follows later or link to external pdf.

@@ Line 25: / Line 25: @@
 ==== Problem statement ====
 Imagine you have a monitored highway section with a start and end point. At both points you count the number of cars that pass by. The question I'd like to answer / simulate / estimate is: can we make some inference about the velocity of cars although we only have their counts? This would be very useful from an engineering / economic perspective because it's much easier / cheaper to count cars instead of actually tracking them from A to B.
-==== Ideas on how to approach this ====
+==== Ideas on how I would approach this ====
 I have some intuition about how to go about this, but these are purely statistical (think of it as birth and death process; or as particles in a system that have a certain lifetime - cars in the highway section are like particles in a system, and their velocity is just inverse proportional to their lifetime in this highway section). I would like to see if using explicit physical modeling of motion and agent-based modeling of traffic flow could shed more light on this problem.