Tuesday, December 6, 2016

Videos of the Future

Does anyone remember the movie Paycheck? Based on a Philip K. Dick short story, it features Ben Affleck as a genius engineer who builds a portal that looks into the future. Alas, his invention works too well. Using the information he obtained about the future, he becomes determined to stop it at all costs. Add a couple of explosions and you have the general idea of the movie. The longer I wait, the more sci-fi becomes truth. While looking 20 years into the future is still unobtainable, scientists are working on immediate future-predictive videos.

CSAIL recently released a YouTube video outlining how they predict a future visual sequence of events. They used two neural networks trained on the same data set. One network's job was to create videos, and the other's job was to determine whether the videos were real. They trained the two networks simultaneously, without supervision, on two years' worth of video, after which they were able to produce results.
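
The two-network setup they describe is what is now called a generative adversarial network. Below is a minimal sketch of that training loop in PyTorch; the tiny fully connected networks and random tensors are stand-ins for the real video model (none of the sizes come from the paper), just to show how the generator and discriminator push against each other.

import torch
import torch.nn as nn

latent_dim, video_dim = 16, 64          # toy sizes, not from the paper

G = nn.Sequential(nn.Linear(latent_dim, 128), nn.ReLU(), nn.Linear(128, video_dim))
D = nn.Sequential(nn.Linear(video_dim, 128), nn.ReLU(), nn.Linear(128, 1))

opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCEWithLogitsLoss()

for _ in range(1000):
    real = torch.randn(32, video_dim)            # stand-in for a batch of real clips
    fake = G(torch.randn(32, latent_dim))        # the generator's attempt

    # Train the discriminator: real clips are labeled 1, generated clips 0.
    loss_d = bce(D(real), torch.ones(32, 1)) + bce(D(fake.detach()), torch.zeros(32, 1))
    opt_d.zero_grad(); loss_d.backward(); opt_d.step()

    # Train the generator: try to make the discriminator label its clips real.
    loss_g = bce(D(fake), torch.ones(32, 1))
    opt_g.zero_grad(); loss_g.backward(); opt_g.step()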

Given a sequence of images, they are able to produce the most likely future sequence. This makes sense because there is a limited number of actions that things regularly take. For example, people may walk, jog, or run as a means of transporting themselves. It is very unlikely that they would turn around and moonwalk to their destination. Even more improbably, they could be picked up by an alien and dropped off at the destination. Basically, the state space of all possible actions is much larger than the set of probable actions, and by limiting the search to probable actions it is possible to choose the most likely next one.

Another thing to think about is whether the network needs to process the background on every frame change. If only some objects are changing position, it might be enough to focus on those and let the background provide context. If there is a ball rolling across the scene, how interested are we in the surface it is rolling across?
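
If I understand the paper correctly, it does something along these lines: it generates a static background and a moving foreground separately and blends them with a mask. A sketch of that compositing step, with made-up sizes and random arrays standing in for the generated streams:

import numpy as np

T, H, W = 32, 64, 64                     # frames, height, width (made-up sizes)
background = np.zeros((1, H, W))         # one static frame, reused for all T frames
foreground = np.random.rand(T, H, W)     # per-frame moving content
mask = np.random.rand(T, H, W)           # per-pixel blend weights in [0, 1]

video = mask * foreground + (1 - mask) * background   # broadcasts over the frames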

A good question would be: what is this technology applicable to? I don't think it would be able to process heavily edited videos. It might only be able to handle real-life video where the observations are continuous; any break would probably throw off the predictions. Right now, it is only able to handle very short sequences. It's hard to think of any good business applications for this technology. It might take further research to discover what this technology can do.

In the accompanying paper they suggest that one piece of future work would be to see if you can make a static image move. That would be very cool, and it reminds me of the image drawing algorithm we talked about last week. Another way I can relate this to class is that they use RANSAC to find the shapes between frames.

The link to the video clip is here

https://www.youtube.com/watch?v=Pt1W_v-yQhw

Of course, this is also covered in a paper called "Generating Videos with Scene Dynamics", which is linked from the video.

Sunday, November 27, 2016

Transfer Learning with Satellite Imagery

Looking for new papers on machine learning this last week, I came across an interesting article. Titled "Combining Satellite Imagery and Machine Learning to Predict Poverty", it hits on some topics from class, specifically the data clustering in linear spaces that we have been discussing over the last week. The basic premise of the article was that by using various types of satellite imagery as training data, the authors could accurately predict the poverty level (or wealth) of areas in the developing world. This was of importance to the authors because there are still large economic "data gaps" within Africa, the continent of interest. The basic gist of the problem is that for most African nations, surveying for income is cost-prohibitive. If these growing governments had more accurate reports about the finances of all areas in their country, it would help them immensely in distributing aid.

One of the most interesting things about this article is the use of transfer learning. Transfer learning leverages the fact that convolutional neural networks are layered, re-purposing layers that were previously trained on another data set. The article doesn't go into the specific details of the algorithms used; it just gives a high-level overview of the authors' process. The authors' first step in building their CNN was, strangely enough, training on simple images. A set of labeled images spanning 1000 different everyday categories gives the CNN the ability to discern simple properties of images. The paper gives the example of "cat" as a possible label in the data. The data couldn't have less to do with the wealth distribution of nations. At this point in the training process, the CNN is still general purpose.
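
As a rough sketch of what a general-purpose, ImageNet-trained CNN gives you, here is how you might load one and pull out its convolutional features in PyTorch. The choice of VGG16 is my assumption; the article doesn't name the architecture.

import torch
import torchvision.models as models

cnn = models.vgg16(pretrained=True)      # trained on 1000 everyday ImageNet labels
with torch.no_grad():
    feats = cnn.features(torch.randn(1, 3, 224, 224))   # generic image features
print(feats.shape)                       # torch.Size([1, 512, 7, 7])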

The next step after general-purpose image detection was to re-tune the CNN, training it to predict nighttime light intensities from daytime satellite images. Essentially, the model within the CNN is now learning to break daytime images apart into a linear space and quantify, via a linear mapping, what the corresponding nighttime values would be. Fortunately, Google Maps has high-resolution data that was available to the researchers for this task. Note that this step is now starting to form the information used by the next training phase. It is building predictive clusters to be re-used.
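
Continuing the same hedged sketch (VGG16 is still my stand-in), re-tuning might look like freezing the general layers and swapping the 1000-way classification head for a single intensity output, then training only that head on (daytime image, nighttime intensity) pairs:

import torch.nn as nn
import torchvision.models as models

cnn = models.vgg16(pretrained=True)
for p in cnn.parameters():
    p.requires_grad = False              # keep the general-purpose layers fixed
cnn.classifier[-1] = nn.Linear(4096, 1)  # fresh head: predicted nightlight intensity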

Lastly, the authors trained the CNN one more time using what little data they had from the aforementioned wealth surveys. This step re-correlates the daytime linear space formed from the satellite images to an actual metric they possessed for poverty. The formed clusters now, somehow, become a mapping from a picture of a piece of land to wealth and asset holdings. The reasoning the authors give for this being more informative than simply using nighttime lights is interesting as well. Lights, by their nature, are binary: a light is either on or off. Roads, by contrast, are much more informative: how many are there, and how well are they maintained? Looking at actual physical features tells you much more. By using regression, the CNN is able to work out which features are most dominant or, conversely, unimportant.
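
If I recall the paper correctly, this last hop is a regularized linear regression from CNN image features to the survey wealth measure. A toy version with placeholder data (the sizes and random arrays are mine, not theirs):

import numpy as np
from sklearn.linear_model import Ridge

features = np.random.rand(500, 4096)     # placeholder CNN features per survey site
wealth = np.random.rand(500)             # placeholder survey wealth index

model = Ridge(alpha=1.0).fit(features, wealth)
estimates = model.predict(features)      # wealth estimates from imagery alone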

I have seen pictures of the lights on the East Coast at nighttime hundreds of times but never thought beyond the fact that it looked interesting. Seeing a group of people put that information to use reminds me that just because I haven't thought of anything interesting to do with a piece of data doesn't mean there isn't anything that can be done with it.

The article I am referring to in this post can be found here.

http://science.sciencemag.org/content/353/6301/790

Monday, November 21, 2016

Unsupervised Learning

Unsupervised learning is similar to supervised learning in that it takes certain inputs, images for example, and can apply labels to them. Supervised learning, however, learns by example. It requires a human to show the program a collection of images and categorize them for it; the human must specify what is in each photograph. The program is then able to use the previously labeled images to infer labels for new images it sees.
Unsupervised learning differs from supervised learning in that it doesn't require previous inputs to be categorized or labeled. It can take an image and analyze it without any previous examples. For instance, our current programming assignment is to take a given image and analyze it to find lines. The program doesn't have any previous examples of lines to look for. Instead, it uses the RANSAC algorithm and keeps finding lines in the image until no more can be found. This can be extended to full-resolution, full-color images: the program can be told to cluster the image into similar sections and colors, which can help it find the edges of objects in the picture. These edges can help create wire meshes for the images or slice the image into different sections.
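
A minimal sketch of that loop: repeatedly fit a line through two random points, count the inliers, and keep the best-supported fit. The point data and thresholds here are placeholders, not the assignment's.

import numpy as np

def ransac_line(points, iters=200, threshold=1.0):
    """Find the single best-supported line among 2-D points."""
    best = np.zeros(len(points), dtype=bool)
    for _ in range(iters):
        i, j = np.random.choice(len(points), 2, replace=False)
        d = points[j] - points[i]
        n = np.array([-d[1], d[0]]) / (np.linalg.norm(d) + 1e-12)  # unit normal
        dist = np.abs((points - points[i]) @ n)     # point-to-line distances
        inliers = dist < threshold
        if inliers.sum() > best.sum():
            best = inliers
    return best    # remove these points and run again to find the next line
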
Unsupervised learning can also be used on text instead of images. A program could look at a corpus of text and sort it into groups by clustering. For example, a program could try to categorize a small corpus of posts on a blogging site. Following the spirit of the RANSAC algorithm, the program could randomly pick a few words from random posts and find inliers for those words (posts with similar words). It would then keep refitting the model to the new inliers it picked up until the model stops changing. This would create clusters of different categories across the site and would help people browse the categories they are interested in.
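
That pick-seeds, gather-inliers, refit-until-stable loop is essentially what k-means clustering does. A toy version over a few hypothetical posts, using TF-IDF word weights as the feature space:

from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

posts = ["ransac finds lines in images",          # hypothetical blog posts
         "fitting lines with ransac",
         "clustering posts on a blog"]
X = TfidfVectorizer().fit_transform(posts)
labels = KMeans(n_clusters=2).fit_predict(X)      # a category id for each post
print(labels)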

Yahoo did something similar to this when it tried indexing the whole web. Yahoo is primarily used to input search queries and find specifically what you are looking for, but it also offers predefined categories to choose from. You can then explore the web sites they have filed under that category. They can build this organization in a fashion similar to the blogging-site example: create a narrow topic, then find all sites that fall into that model using unsupervised learning. These are only a few of the many uses of unsupervised learning.

Monday, November 14, 2016

Limits of Linear Regression


For this week's blog post, I have some thoughts on linear regression, in particular on the limits of using regression in the context of artificial intelligence.

To analyze the weaknesses of linear regression, it might be appropriate to talk about one of its strengths first. The best thing linear regression has going for it is simplicity. Anyone who has taken a course in algebra or basic statistics knows enough to understand the goal of the process: given a set of observed data, try to describe what function produced it. Even in more advanced topics, such as fitting multivariate functions, the basic idea of finding a function of best fit remains the same.
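
As a concrete example of "given observed data, describe the function that produced it", here is ordinary least squares recovering a slope and intercept from noisy points (the data is synthetic, made up just for the demonstration):

import numpy as np

x = np.linspace(0, 10, 50)
y = 3.0 * x + 2.0 + np.random.normal(scale=0.5, size=x.size)   # y = 3x + 2 + noise

A = np.column_stack([x, np.ones_like(x)])          # design matrix for a line
(slope, intercept), *rest = np.linalg.lstsq(A, y, rcond=None)
print(slope, intercept)                            # close to 3 and 2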

This simplicity, however, is also the main weakness of regression: you need to be able to model the situation as a function. This isn't hard in some fields where regression dominates, where you are gathering easily discretized data, but there is no guarantee that you will be operating in an area with nice data. Even with a large bag of functions to fit (exponential growth and decay, power laws), you are still assuming that the underlying function is well behaved. Not only does the data have to be easy to dissect, it also has to be easy to digest.

In some ways, most of the algorithms we have talked about are the antithesis of linear regression. While before we had been striving for optimal solutions, in regression we are attempting to approximate the true solution, and the approximation will always be sub-optimal. The solution's error at any given point might be negligible, but how do you know? This is a frustrating world we have stepped into: we can only hope to have a good approximation of the solution.

I don't actually think over-fitting is a huge problem with regression, as it is a method for avoiding over-fitting to begin with; methods like Lagrange interpolation and cubic splines are much more aggressive. Where over-fitting does occur, it's more operator error than a fault of the method. It is a problem of throwing all of the data at the equation without thinking about it.
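
A quick illustration of that operator error, on the same kind of synthetic noisy line as before: a high-degree polynomial chases the noise, while a degree-1 fit smooths it out.

import numpy as np

x = np.linspace(0, 10, 20)
y = 3.0 * x + 2.0 + np.random.normal(scale=2.0, size=x.size)

wild = np.polyfit(x, y, 12)   # near-interpolation: wiggles through the noise
sane = np.polyfit(x, y, 1)    # a straight line: ignores the noise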

And I feel that is actually what people want to be able to do. Forget the inherent logistical problems of managing large amounts of data; figuring out some way to sift through the information and decide what's important is just as much of a challenge. So while a neural network might not be as easily understandable, the ability to simply feed it all of your data is awesome. If the hardest part about dealing with the information you have is deciphering where to start, regression might be the wrong tool.


Don’t get me wrong, I think linear regression definitely has its place. I would personally use it over a neural network whenever I could get away with it.

Tuesday, November 8, 2016

Handwriting Analysis

Have you ever struggled to read someone's handwriting? It's a classic problem: if someone is writing only for themselves, their handwriting tends to become nearly illegible to anyone else, yet they can still read it perfectly fine. So if you were handed someone else's grocery list, would you be able to go get everything they need? Better still, what if you had an app that could make sure you got everything on the list? Think about being able to scan your notes into an editable format; wouldn't that be useful? How about detecting when someone attempts to fake your signature? There are many interesting applications for handwriting recognition, and the technology would be useful to have in society.

Handwriting recognition comes in two flavors: online and offline. Online refers to handwriting being inputted directly into a system, while offline refers to the analysis of a photograph of handwriting. The earlier example of a grocery list would be offline, whereas signing the credit card terminal at the store would be an example of online. The benefit of online recognition is that you have more situational data to analyze: length of contact with the screen, amount of pressure applied, etc. Obviously the downside is that there is more to keep track of.

A straightforward way to tackle this problem is simplification: attempt to identify one character at a time. After that, simple learning techniques like k-nearest neighbors will work. It is important to specify what domain is being used for this technique, i.e. the Roman alphabet or Modern Standard Arabic; the more you are able to reduce the domain, the more success this technique enjoys.
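
A sketch of that per-character approach using scikit-learn's bundled 8x8 digit images (digits rather than letters, but the idea is the same):

from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

digits = load_digits()                       # 8x8 grayscale images of 0-9
X_train, X_test, y_train, y_test = train_test_split(digits.data, digits.target)
knn = KNeighborsClassifier(n_neighbors=5).fit(X_train, y_train)
print(knn.score(X_test, y_test))             # typically well above 0.9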

A more common way to analyze handwriting is to train a neural network on several handwriting characteristics. It is important to decide carefully which characteristics of the handwriting to represent in your feature vector: things such as aspect ratio, curvature, and location relative to other letters. You can see how breaking a problem down into its fundamental aspects is a property of learning. How you choose to break down the information will shape the overall classification process and therefore your results. There are even some crazy people attempting to use unsupervised learning to classify handwriting, with varying degrees of success.
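
To make the feature-vector idea concrete, here is a sketch that computes a few such characteristics from a binary glyph image and feeds them to a small network. The random glyphs and labels are placeholders, and ink density stands in for a real curvature measure.

import numpy as np
from sklearn.neural_network import MLPClassifier

def glyph_features(glyph):                  # glyph: 2-D boolean array of ink pixels
    ys, xs = np.nonzero(glyph)
    aspect = (xs.ptp() + 1) / (ys.ptp() + 1)          # width over height
    density = glyph.mean()                            # stand-in for curvature
    position = ys.mean() / glyph.shape[0]             # vertical placement
    return [aspect, density, position]

X = np.array([glyph_features(np.random.rand(32, 32) > 0.5) for _ in range(200)])
y = np.random.randint(0, 26, 200)           # placeholder letter labels
clf = MLPClassifier(hidden_layer_sizes=(32,), max_iter=500).fit(X, y)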

Something very cool is that there are actually competitions for this, to see whose algorithm can identify handwriting the best. There is a conference called ICFHR (International Conference on Frontiers in Handwriting Recognition) that holds an annual competition. The examples they give of pages that will be analyzed (the actual pages themselves are kept secret until the competition) look very impressive. I couldn't understand anything from just trying to eyeball them, which shows that in practice a computer can sometimes read better than a human.

Monday, November 7, 2016

Image Recognition

Image recognition is a vast field. Self-driving cars use it to distinguish road signs and markings, while Google Photos uses it to recognize objects in photos and group photos together by type. These image recognition tools use supervised learning to determine what they see in a photo. Supervised learning requires a human to show the program a collection of images and categorize them for it; the human must specify what is in each photograph. The program is then able to guess what objects it sees in new images based on the examples the human trainer gave it.
An example of image recognition being used in self-driving cars is the company Comma.ai, a small company working on retrofitting current cars with additional hardware that allows limited self-driving on highways. The company adds a camera to the car that uses image recognition software to determine where to drive. Before Comma.ai puts this product into production, they need to train it with supervised learning, so they released a phone app that you can use while driving. The app records the road in front of your car and sends videos and pictures to the company, gathering lots of real-world training data for Comma.ai to use when training its algorithm.
Comma.ai also released another tool that lets ordinary people train its algorithm: a web tool that displays an image received from the user base. The user of the web tool can then label parts of the image, for example marking the sky, road, cars, signs, road markings, and other parts of the scene. These users are acting as trainers for the algorithm. The algorithm takes all these examples and learns to determine what is in a given image on its own, which allows it to be deployed in a car. The product can then determine what is on the road ahead and drive the car accordingly.
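
A sketch of what that labeling plausibly boils down to: each labeled image becomes a per-pixel class mask paired with the original frame, which is exactly the training signal a road-scene model needs. The class list and sizes here are my guesses, not Comma.ai's actual scheme.

import numpy as np

CLASSES = {0: "sky", 1: "road", 2: "car", 3: "sign", 4: "lane marking"}  # guessed
image = np.zeros((480, 640, 3), dtype=np.uint8)   # stand-in camera frame
mask = np.zeros((480, 640), dtype=np.uint8)       # one class id per pixel

mask[:200, :] = 0       # a user marked the top of the frame as sky
mask[300:, :] = 1       # and the bottom as road
training_pair = (image, mask)                     # one supervised example
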
This supervised learning is crucial in teaching the car to know what it is looking at on the road. It is not the whole picture, though: the car must still know what to do with the data it has acquired from the photos. It can determine that there are lines on the road, but it still needs to know that it must drive on the right side of a double yellow line, for example. Other algorithms must come together with supervised learning to make self-driving technology work.

Monday, October 31, 2016

Stanford Parser

In my recent e-travels searching for a parsing solution for a project, I stumbled across a very cool NLP parser available under the GNU GPL. It is called the Stanford Parser, and it parses out the grammatical structure of sentences using a context-free grammar. That means it breaks a sentence down into its individual pieces, building an abstract syntax tree. It should be mentioned that it is probabilistic, so it gives you the most likely derivation instead of an absolute one. In fact, if you specify an amount, it will print that number of highest-scoring possible parses for a sentence. For any given sentence there might be thousands of possible parses, creating a state space too large to search exhaustively. There is also the ability to train the parser yourself, if you have your own corpus lying around that you specifically need to use.

An example parse would be -

Warren enjoys parsing out words for fun.

(ROOT
  (S
    (NP (NNP Warren))
    (VP (VBZ enjoys)
      (S
        (VP (VBG parsing)
          (PRT (RP out))
          (NP
            (NP (NNS words))
            (PP (IN for)
              (NP (NN fun)))))))
    (. .)))

As you can see, it is in somewhat of a LISP list format; linguists actually refer to this as a treebank. This specific format is the Penn Treebank, and without getting too far into the inside baseball, it tags each part of the sentence with its linguistic element. For example, looking at

(RP out)

RP is simply a particle, so out is a particle in this representation of the parse tree.

(VBZ enjoys)

VBZ refers to Verb (VB), 3rd person singular present (Z). Again, this is how the word enjoys could be represented in a parse tree. The Stanford Parser also supports other output models, such as simple tagging of each word in the sentence, or measuring the dependencies within the sentence (other linguistic measures of interest).

Perhaps the best part about the parser is that by using different treebanks, you can parse different languages (there are versions for Chinese, German, and Arabic linked from the website). So, interestingly enough, the parser is language independent. I don't know enough about linguistics to say why, but I am guessing any spoken language that can be described with a context-free grammar can be represented by a treebank. It should also be noted that treebanks can be set up to represent things differently within the same language; there can be any number of treebanks for a language, if a computational linguist is looking to measure something differently.

So what can you use this for? Well, any task where you need to parse complex meaning out of language. It is too heavy-duty for parsing simple structured commands in English (ANTLR would still be best for that task), but if you need to do machine translation or relationship analysis, you are going to need something this heavy-duty.
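
If you want to try it from Python, NLTK ships a wrapper around the parser's jars. The jar paths below are assumptions; they depend on where you unpack the download from the Stanford NLP site, and the models jar name varies by release.

from nltk.parse.stanford import StanfordParser

parser = StanfordParser(
    path_to_jar="stanford-parser.jar",                  # adjust to your download
    path_to_models_jar="stanford-parser-models.jar",    # name varies by release
)
for tree in parser.raw_parse("Warren enjoys parsing out words for fun."):
    tree.pretty_print()   # prints a Penn Treebank-style tree like the one above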