Does anyone remember the movie Paycheck? Originally based on a Phillip K. Dick novel, it features Ben Affleck as a genius engineer that builds a portal that looks into the future. Alas, his invention works too well. He becomes determined to stop it at all costs, using the information he obtained about the future. Add a couple explosions and you have the general idea of the movie. The longer I wait the more Sci-Fi becomes truth. While looking 20 years into the future is still unobtainable, scientists are working are immediate future-predictive videos.
CSAIL recently released a youtube video outlining of how they predict a future visual sequence of events. They used two neural networks being trained on the same data sets. One neural networks job was to create videos, and the others networks job was to determine if the videos are real. They then simultaneously trained the two networks for two years of unsupervised learning. After which they were then able to produce results.
Given a sequence of images, they are able to produce the most likely future sequence. This makes sense because there are a limited amount of actions that things regularly make. In example, people may walk, jog or run as a means of transporting themselves. Its is very unlikely that they would turn around and moonwalk to their destination. Most improbably, they could be picked up by an alien and dropped off at the destination. Basically, the state space of all likely actions is much larger then the probable actions. And by limiting it to the probable actions it is possible to choose the most likely next one.
Another thing to think about is weather the network needs to process the background on every frame change. If only some objects are changing position, it might be necessary to only focus on those, having the background give context. If there is a ball rolling across the scene, how interested are we in what the surface it is rolling across?
A good question would be, what is this technology applicable to. I don't think it would be able to process heavily edited videos. It might only be able to handle real-life video where the observations are continuous, any break would probably throw off any predictions. Right now, its only able to handle very short sequences. It's hard to think of any good business applications for this technology. It might be after further research what this technology can do.
In the accompanying paper they suggest one of the next works would be to see if you can make a static image move. That would be very cool and reminds me of the image drawing algorithm we talked about last week. the Another way I can relate this to class is that they use RANSAC to find the shapes between frames.
The link to the video clip is here
https://www.youtube.com/watch?v=Pt1W_v-yQhw
Of course, this is also covered in a paper called “Generating Videos with Scene Dynamics”, which can be found in the video link.