Movie Project

Predicting Metadata from Network Structure

Summary

This is a 'meta' task. The idea is to use machine learning or other techniques to predict a movie's metadata, such as its success or genre, from the structure of its character network.

First Few Tasks

Script to download all movie galaxies (MS -- done; see Andrew's post in Slack)
Conversion from Gephi to a usable format (MS -- done; note that one file is broken and two movies have only 1 and 0 nodes!)
Network comparison (MS -- running at the moment; note that you need ORCA for graphlet counting)
Get DigitalSmiths data into a usable format (Will -- almost done; lots of useful metadata, such as Rotten Tomatoes scores)
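The Gephi conversion step and the notes about the broken file and the near-empty movies can be sketched as follows. This assumes the downloaded movie galaxies are GEXF XML files (the format Gephi exports); the function names and the choice of features are hypothetical, and treating each graph as undirected and simple is our assumption. Broken files return `None` instead of raising, and graphs with fewer than two nodes are skipped before computing features.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def _local(tag: str) -> str:
    """Drop any XML namespace: '{http://www.gexf.net/1.2draft}node' -> 'node'."""
    return tag.rsplit("}", 1)[-1]


def gexf_to_edges(xml_text: str) -> Optional[tuple]:
    """Return (node_ids, [(source, target, weight), ...]), or None if unparseable."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError:
        return None  # e.g. the broken file: skip it instead of aborting the batch
    nodes, edges = [], []
    for el in root.iter():
        kind = _local(el.tag)
        if kind == "node":
            nodes.append(el.get("id"))
        elif kind == "edge":
            # Edges without a weight attribute are treated here as weight 1.
            edges.append((el.get("source"), el.get("target"), float(el.get("weight", "1"))))
    return nodes, edges


def basic_features(nodes, edges) -> Optional[dict]:
    """Simple size/density features, treating the graph as undirected and simple."""
    n, m = len(nodes), len(edges)
    if n < 2:
        return None  # 0- or 1-node movies carry no structure to compare
    return {"n_nodes": n, "n_edges": m,
            "density": 2 * m / (n * (n - 1)),
            "mean_degree": 2 * m / n}


# Made-up three-character example in GEXF form.
sample = (
    '<gexf xmlns="http://www.gexf.net/1.2draft" version="1.2"><graph defaultedgetype="undirected">'
    '<nodes><node id="0" label="A"/><node id="1" label="B"/><node id="2" label="C"/></nodes>'
    '<edges><edge id="0" source="0" target="1" weight="3"/><edge id="1" source="1" target="2"/></edges>'
    "</graph></gexf>"
)
nodes, edges = gexf_to_edges(sample)
print(basic_features(nodes, edges))
```

Features like these could serve as a baseline input for a metadata predictor; graphlet counts from ORCA would be a richer description of each network but are not reproduced here.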

Interested

Michael Schaub
Andrew Meller
Xiao (Thomas) Zhang
Lu Liu
Harrison Smith
Will Hamilton

Network Construction and Time Dynamics

Summary

The main goal here is to look at the time dynamics of the movie character networks, with a particular focus on how characters are introduced into the network. This analysis can show how stories develop as the network is constructed, and the results can be compared across movies to see how similar their network construction and dynamics are.
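A minimal sketch of tracking when characters enter the network, assuming interactions are available as hypothetical `(time_step, character_a, character_b)` tuples (for example, scene indices); neither the input format nor the function names come from an existing tool. The growth curve records what fraction of the final cast is present at each step, which gives a simple fixed-length signature that can be compared across movies.

```python
from typing import Iterable


def introduction_times(interactions: Iterable[tuple[int, str, str]]) -> dict[str, int]:
    """First time step at which each character interacts with anyone."""
    first_seen: dict[str, int] = {}
    for t, a, b in sorted(interactions):  # process interactions in time order
        for c in (a, b):
            first_seen.setdefault(c, t)
    return first_seen


def growth_curve(first_seen: dict[str, int], n_steps: int) -> list[float]:
    """Fraction of the final cast already in the network at steps 0..n_steps-1."""
    total = len(first_seen)
    if total == 0:
        return [0.0] * n_steps
    return [sum(t0 <= t for t0 in first_seen.values()) / total for t in range(n_steps)]


def curve_distance(c1: list[float], c2: list[float]) -> float:
    """Mean absolute difference between two equal-length growth curves."""
    return sum(abs(x - y) for x, y in zip(c1, c2)) / len(c1)


# Hypothetical toy movies: one introduces characters gradually, one all at once.
gradual = introduction_times([(0, "A", "B"), (1, "A", "C"), (1, "C", "B"), (3, "B", "D")])
sudden = introduction_times([(0, "A", "B"), (0, "C", "D")])
print(growth_curve(gradual, 4), growth_curve(sudden, 4))
```

Comparing curves by mean absolute difference is just one simple choice; rescaling time steps to each movie's length first would make movies of different durations comparable.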

Interested

Moriah Echlin (moriah.echlin@gmail.com)
Dan Biro (daniel.biro@med.einstein.yu.edu)
Will Hamilton (wleif@stanford.edu)

Trope Network

Summary

There is another dataset from TV Tropes (http://tvtropes.org) that I would be happy to bring into this project. Tropes are storytelling elements (if you go to http://tvtropes.org/pmwiki/pmwiki.php/Main/Tropes and read a few entries, you will quickly get a sense of them). The dataset contains ~3,500 movies with a list of tropes for each, as well as each movie's year, IMDb rating, and box office.

I am interested in studying story archetypes (typical plots). From a network perspective, it may be possible to build a directed network of "narrative" tropes (identified at http://tvtropes.org/pmwiki/pmwiki.php/Main/NarrativeTropes , though the list may need closer inspection), where edge directions represent temporal order. The TV Tropes data do not record the order in which tropes occur, so I wonder whether any of Will's datasets could shed some light on it. If the network construction succeeds, extracting the network's backbone should reveal the most commonly used story arcs in movies.

This is only a half-baked idea, and I would love to hear any ideas or comments. If you are interested, please let me (Elise) know.
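The trope-network idea above can be sketched as follows. Because TV Tropes does not record trope order, the input sequences are hypothetical placeholders for whatever ordering another dataset might supply, and linking only consecutive tropes is our simplifying choice. For backbone extraction we swap in the disparity filter of Serrano, Boguñá and Vespignani (2009), one standard method among several; applying it to out-edges only and always keeping a node's sole out-edge are further simplifications, not part of the original proposal.

```python
from collections import defaultdict


def build_trope_network(sequences):
    """Directed weighted edges: how often trope a is immediately followed by trope b."""
    weights = defaultdict(float)
    for seq in sequences:
        for a, b in zip(seq, seq[1:]):
            if a != b:  # ignore self-loops from repeated tropes
                weights[(a, b)] += 1.0
    return dict(weights)


def disparity_backbone(weights, alpha=0.05):
    """Keep edges whose weight is significant relative to the source node's
    out-strength (disparity filter, applied to out-edges only)."""
    out_edges = defaultdict(list)
    for (a, b), w in weights.items():
        out_edges[a].append((b, w))
    backbone = {}
    for a, nbrs in out_edges.items():
        k = len(nbrs)
        s = sum(w for _, w in nbrs)
        for b, w in nbrs:
            if k == 1:
                backbone[(a, b)] = w  # no alternative edges to compare against: keep
                continue
            # Probability under the null model of uniformly split out-strength.
            p_value = (1.0 - w / s) ** (k - 1)
            if p_value < alpha:
                backbone[(a, b)] = w
    return backbone


# Hypothetical ordered trope sequences, one list per movie.
sequences = [["call", "mentor"]] * 10 + [["call", "refusal"], ["call", "ordeal"], ["mentor", "ordeal"]]
weights = build_trope_network(sequences)
print(disparity_backbone(weights))
```

Here the frequent "call" to "mentor" transition survives the filter while the rare alternatives out of "call" are pruned; the threshold `alpha` controls how aggressive the pruning is.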

Interested

Yizhi (Elise) Jing (jingy@indiana.edu)

Natural Language Processing of Dialogues

Summary

Data

This subproject works with the Cornell Movie-Dialogs Corpus (www.mpi-sws.org/~cristian/Cornell_Movie-Dialogs_Corpus.html), which is already clean.
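A sketch of loading the utterances, assuming `movie_lines.txt` stores one utterance per row as five fields (line ID, character ID, movie ID, character name, text) separated by ` +++$+++ `, as we recall from the corpus README; check the README before relying on it. Ordering lines by the numeric part of the line ID is a further assumption about script order, and the rows in the example are made up.

```python
from collections import defaultdict

SEP = " +++$+++ "  # assumed field separator in movie_lines.txt


def parse_movie_lines(rows):
    """Parse rows assumed to hold: line ID, character ID, movie ID, character name, text."""
    records = []
    for raw in rows:
        parts = raw.rstrip("\n").split(SEP)
        if len(parts) != 5:
            continue  # skip malformed rows rather than fail
        line_id, char_id, movie_id, name, text = parts
        records.append({"line_id": line_id, "char_id": char_id,
                        "movie_id": movie_id, "name": name, "text": text})
    return records


def lines_by_movie(records):
    """Group utterances per movie, ordered by the numeric part of the line ID
    (assumption: higher IDs come later in the script)."""
    grouped = defaultdict(list)
    for rec in records:
        grouped[rec["movie_id"]].append(rec)
    return {movie: [rec["text"] for rec in sorted(recs, key=lambda r: int(r["line_id"].lstrip("L")))]
            for movie, recs in grouped.items()}


# Made-up rows in the assumed layout, plus one malformed row.
rows = [
    "L3 +++$+++ u1 +++$+++ m0 +++$+++ BOB +++$+++ Fine, thanks.\n",
    "L2 +++$+++ u0 +++$+++ m0 +++$+++ ALICE +++$+++ Hello there, how are you?\n",
    "not a valid row\n",
]
print(lines_by_movie(parse_movie_lines(rows)))
```

When reading the real file, an explicit encoding such as `encoding="iso-8859-1"` may be needed; we believe the corpus is not UTF-8, but this is an assumption.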

Objective

The aim is to explore the semantic information contained in dialogues (both dynamic and static) and, ideally, to complement the other subprojects (on the films they have in common) by providing new features for data mining.

Ideas

Put your ideas here

(Juste) Use sentiment analysis to build temporal profiles of how sentiment evolves over a movie. Try to find typical profiles, e.g. by time-series clustering, and check whether they correspond to existing movie classifications.

(Lu) Study differences between male and female characters using sentiment analysis, and how these gender differences evolve over time and across genres.
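Both ideas above can be prototyped with the sketch below. The lexicon is a hypothetical toy stand-in for an established sentiment lexicon or model, and the movie's lines are assumed to be already in script order. Each movie's sentiment arc is the mean line score in a fixed number of consecutive chunks, and the arcs are grouped with plain k-means, which is a simple Euclidean form of time-series clustering (no time warping). The gender comparison assumes a gender label can be attached to each line, for instance from character metadata if available; all function names are our own.

```python
import re
from collections import defaultdict
from statistics import mean

# Hypothetical toy lexicon for illustration only.
LEXICON = {"love": 1.0, "happy": 1.0, "great": 1.0, "hate": -1.0, "sad": -1.0, "dead": -1.0}


def line_score(text):
    """Sum of lexicon scores of the words in one line of dialogue."""
    return sum(LEXICON.get(w, 0.0) for w in re.findall(r"[a-z']+", text.lower()))


def sentiment_arc(texts, n_bins=4):
    """Mean line score in n_bins consecutive chunks of a movie's ordered lines."""
    scores = [line_score(t) for t in texts]
    n = len(scores)
    arc = []
    for i in range(n_bins):
        chunk = scores[i * n // n_bins:(i + 1) * n // n_bins]
        arc.append(mean(chunk) if chunk else 0.0)
    return arc


def _sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(arcs, k, iters=20):
    """Plain k-means on equal-length arcs; centers start at the first k arcs."""
    centers = [list(a) for a in arcs[:k]]
    labels = [0] * len(arcs)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: _sq_dist(a, centers[c])) for a in arcs]
        for c in range(k):
            members = [a for a, lab in zip(arcs, labels) if lab == c]
            if members:
                centers[c] = [mean(col) for col in zip(*members)]
    return labels, centers


def mean_score_by_gender(lines):
    """lines: (gender_label, text) pairs; returns mean line score per gender label."""
    groups = defaultdict(list)
    for gender, text in lines:
        groups[gender].append(line_score(text))
    return {g: mean(v) for g, v in groups.items()}


print(sentiment_arc(["I love it", "happy", "I hate this", "so sad"], n_bins=2))
```

Initialising k-means with the first k arcs keeps the sketch deterministic; with real data, several random restarts would be a safer choice.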

Interested

Marius Somveille (marius.somveille@zoo.ox.ac.uk)
Lu Liu
Juste Raimbault
Will Hamilton (wleif@stanford.edu)
Harrison Smith

From Santa Fe Institute Events Wiki
