Tuesday, December 6, 2016

Videos of the Future

Does anyone remember the movie Paycheck? Originally based on a Phillip K. Dick novel, it features Ben Affleck as a genius engineer that builds a portal that looks into the future. Alas, his invention works too well. He becomes determined to stop it at all costs, using the information he obtained about the future. Add a couple explosions and you have the general idea of the movie. The longer I wait the more Sci-Fi becomes truth. While looking 20 years into the future is still unobtainable, scientists are working are immediate future-predictive videos.

CSAIL recently released a youtube video outlining of how they predict a future visual sequence of events. They used two neural networks being trained on the same data sets. One neural networks job was to create videos, and the others networks job was to determine if the videos are real. They then simultaneously trained the two networks for two years of unsupervised learning. After which they were then able to produce results.

Given a sequence of images, they are able to produce the most likely future sequence. This makes sense because there are a limited amount of actions that things regularly make. In example, people may walk, jog or run as a means of transporting themselves. Its is very unlikely that they would turn around and moonwalk to their destination. Most improbably, they could be picked up by an alien and dropped off at the destination. Basically, the state space of all likely actions is much larger then the probable actions. And by limiting it to the probable actions it is possible to choose the most likely next one.

Another thing to think about is weather the network needs to process the background on every frame change. If only some objects are changing position, it might be necessary to only focus on those, having the background give context. If there is a ball rolling across the scene, how interested are we in what the surface it is rolling across?

A good question would be, what is this technology applicable to. I don't think it would be able to process heavily edited videos. It might only be able to handle real-life video where the observations are continuous, any break would probably throw off any predictions. Right now, its only able to handle very short sequences. It's hard to think of any good business applications for this technology. It might be after further research what this technology can do.

In the accompanying paper they suggest one of the next works would be to see if you can make a static image move. That would be very cool and reminds me of the image drawing algorithm we talked about last week. the Another way I can relate this to class is that they use RANSAC to find the shapes between frames.

The link to the video clip is here


 Of course, this is also covered in a paper called “Generating Videos with Scene Dynamics”, which can be found in the video link.

Sunday, November 27, 2016

Transfer Learning with Satellite Imagery

Looking for new papers on Machine Learning this last week, I came across an interesting article. Titled “Combining Satellite Imagery and Machine Learning to Predict Poverty” it hits on some topics from class, specifically about data clustering in linear spaces that we have been discussing over the last week. The basic premise of the article was that by using various types of satellite imagery as training data, they could accurately predict the poverty level (or wealth) of third world areas. This was of importance to the authors because there are still large economic "data gaps" within Africa, the continent of interest. The basic gist of the problem is that for most African nations, surveying for income issues is cost prohibitive. If the growing governments had more accurate reports about finances of all areas in their country, it would help them immensely in distributing aid.

One of the most interesting things about this article is the use of transfer learning. Transfer learning is leveraging the principle that Convolutional Neural Networks are layered and re-purposing one of the layers that was previously trained on another data set. The article doesn’t go into the specific details of the algorithms used; it just gives a high level overview of the process the authors used. The authors’ first step to building their CNN was, strangely enough, training on simple images. A set of labeled images consisting of 1000 different simple categories gives CNN the ability to discern between simple properties of images. The paper gives the example of "cat" as a possible label in the data. The data couldn't have anything less to do with wealth distribution of nations. At this point in the training process, the CNN is still general purpose.

The next step after general purpose image detection was to re-tune the CNN, training it to predict the nighttime light intensities based off of daytime satellite images. Essentially, the model within the CNN is now learning to break apart daytime images into a linear space and quantify what the corresponding results at nighttime would be based off of the respective values via a linear mapping. Fortunately, Google maps has hi-res data that was available to the researchers to use for this task. Note that this step is now starting to form information used by the next training phase. It is building predictive clusters to be re-used.

Lastly, the authors train the CNN one more time using what little data they had from the aforementioned wealth surveys. This is to re-correlate the daytime linear space formed from the satellite images to an actual metric they possessed for poverty. The formed clusters now, somehow, become mapped from a picture of a piece of land to wealth and asset holdings. The reasoning the authors give for this being more informative than simply using nighttime lights is interesting as well. Lights, by their nature, are binary.  Is it on or off? Where as looking at roads is much more informative:  how many roads and how well are they maintained? Looking at actual physical features is much more informative. By using regression, the CNN is able to work out what features become the most dominant or, conversely, unimportant.

I have seen pictures of the lights on the East Coast at nighttime hundreds of times but never thought beyond the fact it looked interesting. To see a group of people use that information reminds me, just because I haven't thought of anything interesting to do with the information, doesn't mean that there isn't anything that can be done with it.

The article I am referring to in this post can be found here.


Monday, November 21, 2016

Unsupervised Learning

Unsupervised learning is very similar to supervised learning in that it takes certain inputs, images for example, and can apply labels to that input. Supervised learning, by contrast, follows by example. Supervised learning requires a human to show the program a bunch of images and categorize them for the program. The human must specify what is in each photograph. The program will then be able to use its previously learned images to infer labels onto new images it sees.
Unsupervised learning differs from supervised learning in that it doesn’t require previous inputs to be categorized or labeled. Unsupervised learning can take an image and analyze it without any previous examples. For instance, our current programming assignment is to take a given image and analyze it to find lines. The program didn’t have any previous examples of lines in images to look for. The program instead uses the RANSAC algorithm and keeps finding lines in the image until no more lines can be found. This can be extended to full resolution and full color images. The program can be told to cluster the image into similar sections and colors. This can help the program find edges of objects in the pictures. These edges can help create wire meshes for the images or can slice the image off into different sections.
Unsupervised learning can also be used with text instead of images. A program could look at a corpus of text and sort that data into certain groups for clustering. For example, a program could try to categorize a small corpus of posts on a blogging site. To follow the RANSAC algorithm, the program could randomly pick a few words from random posts and find inliers for those words (blogs with similar words). The program would then continue to refit the model to the new inliers it picked up until the model stops changing. This would create clusters of different categories throughout the site and would help people browse categories they are interested in.

Yahoo does something similar to this where it tried indexing the whole web. Yahoo is primarily used to input search queries and find specifically what you are looking for on the web, but they also have the option to choose a category that they have predefined. You could then explore the web sites that they have in this category. They can create this organization in a similar fashion to the blogging site example. They create a narrow topic and then find all sites that fall into this model using unsupervised learning. These are only a few of the many examples for uses of unsupervised learning.

Monday, November 14, 2016

Limits of Linear Regression

For this weeks blog post, I have some thoughts on linear regression - in particular on the limits of using regression in the context of artificial intelligence.

To analyze the weaknesses of linear regression, it might be appropriate to talk about one of the strengths of regression first. The best thing that linear regression has going for it is simplicity. Anyone who has taken a course in algebra or basic statistics has enough knowledge to be able to understand the goal of this process. Given a set of observed data, try to describe what function produces it. Even in some of the more advanced topics such as fitting to multivarite functions, the basic idea of trying to create a function of best fit remains the same.

This simplicity, however, is also the main weakness of regression in that you need to be able to model the situation as a function. This isn’t a hard task in some fields where regression dominates, where you are gathering easily discretized data. There are no guarantees that you are going to be operating in a area with nice data. Even with a large bag of functions to fit to (exponential growth/decay, power law), you are still assuming that the function is well behaved. Not only does data have to be easy to dissect, it also has to be easy to digest.

In some ways, most of the algorithms we have talked about are the antithesis of linear regression. While before we had been striving for optimal solutions, in linear regression we are attempting to model the true solution, which will always be sub-optimal. The solutions error at any given point might be negligible, but how do you know? This is frustrating world we have stepped into to, we can only hope to have a good approximation to the solution.

I don’t actually think over-fitting is a huge problem with regression as it is a method to avoid overfitting error to begin with. Methods like Lagrange interpolation and cubic splines are much more aggressive. While over-fitting is a problem, it’s actually more operator error as opposed to fault with the method. This is more a problem of trying to throw all of the data at the equation without thinking about it.

And, I feel that is actually what people want to be able to do. Forget about the inherent logistical problems in managing large amounts of data, figuring out some way to sift through information and figure out whats important is just as much of a challenge. So while a neural network might not be as easily understandable, the ability to simply feed it all of your data is awesome. If the hardest part about dealing with the information you have is actually deciphering where to start, regression might be the wrong tool.

Don’t get me wrong, I think linear regression definitely has its place. I would personally use it over a neural network whenever I could get away with it.

Tuesday, November 8, 2016

Handwriting Analysis

Have you ever struggled to read someones handwriting? That seems to be a classic problem, if someone is only writing for themselves, their handwriting seems to become nearly illegible to anyone else. Yet, they can still read it perfectly fine. So if you were to get someone else's grocery list, would you be able to go get everything they need? Better still, what if you had an app that could make sure that you got everything on their list. Think about being able to scan your notes into an editable format, would that be very useful? How about possibly detecting when someone attempts to fake your signature? There are many interesting applications for handwriting recognition. This technology would be useful to have in society.

Handwriting recognition comes in two flavors, on line and off line. On line refers to any handwriting being directly inputted into a system, off line refers to the analysis of a photography of handwriting. The previous example of a grocery list would be off-line, where as signing the credit card system at the store would be an example of on line. The benefits off on line is that you have more situational data to analyze, length of contact with the screen, amount of pressure applied etc. Obviously the downside to this is that there is more to keep track of.

A straightforward way to tackle this problem is simplification, attempting to identify one character at a time. After that, simple learning techniques like k-nearest neighbors will work. It is important to specify what domain is being used for this technique, i.e. Roman alphabet of Modern Standard Arabic, the more you are able to reduce the domain the more success this technique enjoys.

A more common way to analyze handwriting is through training of a neural network based off of several handwriting characteristics. It is important when deciding what to represent in your vector as a characteristic of handwriting. Some things such as aspect ratio, curvature and location relevant to other letters. You can see how breaking down a problem into its fundamental aspects is a property of learning. How you choose to break down the information will determine the overall classification process and therefore your results. There are even some crazy people attempting to use unsupervised learning to classify handwriting, to varying degrees of success.

Something that is very cool, is that actually have competitions for this, whose algorithms can identify handwriting the best. There is a conference called ICFHR (International Conference on Frontiers in Handwriting Recognition) that holds a annual competition. The examples they give for pages that will be analyzed (the actual pages themselves are kept secret till the competition) actually look very impressive. I couldn’t understand anything from just trying to eyeball it, and shows that in practice it is possible for a computer to read better then a human.

Monday, November 7, 2016

Image Recognition

Image Recognition

               Image recognition is a large and vast field. Self-driving cars use it to distinguish road signs and markings while Google Photos uses it to recognize objects in photos and group photos together by type. These image recognition software tools use supervised learning to help guide the image recognition to determine what it sees in the photo. Supervised learning requires a human to show the program a bunch of images and categorize them for the program. The human must specify what is in each photograph. The program will then be able to make guesses on what objects it sees in new images presented to it based on the examples it was given from the human trainer.
               An example of this image recognition being used in self-driving cars is the company Comma.ai. Comma.ai is a small company that is working on retrofitting current cars with additional hardware that will allow limited self-driving on highways. The company will add a camera onto the car that uses image recognition software to determine where to drive the car. Before Comma.ai puts this product into production, they need to train it with supervised learning. Comma.ai released a phone app that you can use while driving. This app will record the road in front of your car and send videos and pictures to the company. This will gather lots of real world training data for Comma.ai to use when training its algorithm.
Comma.ai also released another tool that will allow normal people to train its algorithm. They put up a web tool that will display an image they received from their user-base. The user of the web tool will then be able to label parts of the image. For example, the user can mark they sky, road, cars, signs, road markings, and other parts of the image. These users are acting as trainers for the algorithm. The algorithm will take all these examples and will then be trained to determine what is in a certain image on its own. This will allow the algorithm to be deployed into a car. The product will then be able to determine what is on the road ahead of the car and drive the car accordingly.
This supervised learning is crucial in teaching the car to know what it is looking at on the road. This is not the whole picture, though. The car must still know what to do with the data is has acquired from the photos. It can determine that there are lines on the road, but it still needs to know that it must drive on the right side of a double yellow line, for example. Other algorithms along with supervised learning are required to come together to make self-driving technology.

Monday, October 31, 2016

Stanford Parser

In my recent e-travels searching for a parsing solution in a project, I stumbled across a very cool NLP parser for available under the full GNU GPL. It is called the Stanford Parser, and it parses out the grammatical structure of sentences from context free languages. That means it takes down a sentences into it’s individual pieces, building it into abstract syntax tree. It should be mentioned that it is probabilistic, so it gives you the most likely derivation, instead of an absolute one. In fact, if you specify an amount, it will print that number of highest-scoring possible parses for a sentence. For any given sentence there might be thousands of possible parses, creating a state space too large to exhaustively search. There is also the ability to train the parser, if you have your own corpus lying around that you specifically need to use.

An example parse would be -

Warren enjoys parsing words for fun.

    (NP (NNP Warren))
    (VP (VBZ enjoys)
        (VP (VBG parsing)
          (PRT (RP out))
            (NP (NNS words))
            (PP (IN for)
              (NP (NN fun)))))))
    (. .)))

As you can see it is in somewhat of a LISP list format, it is actually referred to as a treebank by linguists. This specific format is the Penn-Treebank, and without getting too much into the inside-baseball, it is tagging each part of the sentence with its linguistic element. For example looking at

(RP out)

RP is simply a particle, so out is a particle in this representation of the parse tree.

(VBZ enjoys)

VBZ refers to, Verb (VB) 3rd person singular present (Z). Again, this is the how the word enjoys could be represented in a parse tree. Also, the Stanford Parser supports other models, such as simple tagging of each word in the sentence, or measuring the dependencies of the sentence (other linguistic measures of interest).

Perhaps the best part about the parser though is that by using different tree banks, you can parse with different languages (There is a version for Chinese, German and Arabic linked from the website). So interestingly enough this parser is language independent. I don’t know enough about linguistics as to why, but I am guessing any spoken language can be represented by a treebank (anything with a context free grammar). It should also be noted that treebanks can be set up to represent things differently using the same language. There can be any number of treebanks set up for a language, if a computational linguists was looking to measure something differently.

So what can you use this for? Well, any task where you need to parse out complex meaning from language. It is too heavy duty for parsing simple structured commands from English (ANTLR would still be the best for that task). But if you need to do machine translation or relationship analysis, you are going to need something this heavy-duty.

Tuesday, October 25, 2016

Facial Recognition

Facial recognition is an interesting topic in Artificial Intelligence. In this subfield there is math being used from multiple topics, some heavy Linear Algebra and Bayesian Statistics. It is also important because facial recognition is used everywhere today, from Facebook identify your friends in a photo, to Google street view blurring out someones face. The heavy use of these algorithms brings up ethical issues, such as privacy concerns and data rights. Todays blog post however, will focus on the underlying technology that some of the algorithms are based on.

First and foremost, there must be way to mathematically break down someones face, in order to quantify it. One of the main ways this is done is through a method called principal component analysis (PCA). Without getting to much into the math, what principal component analysis does is it breaks down the face into a linear basis based off of pieces (or the features) of a humans face. By representing the face this way, you have a way of measuring the individual magnitude of the features a persons face. This allows us to have a possible unique representation of a face as a linear basis. PCA is closely related to using a Singular Valued Decomposition as a form of measurement. Note there are other methods, like independent component analysis or simply using the Eigen face (a face represented by Eigen values) that rely upon the same methodology of representing a face as a linear basis.

The second part to the aforementioned algorithms is using some sort of statistical analysis on the newly acquired linear basis.  The statistical can be some sort of Bayesian methodology, wether it be a simple Hidden Markov Model or a full featured network. Which can either be achieved though unsupervised or supervised data set processing, depending on the choice made by the developer. Like all training methods, the recogintion made by the algorithms is therefore dependent on the data sets used.

The only method I have seen found that tends to break from decomposing a face into a linear basis is a method called Dynamic Link Matching, which instead decomposes the face into a graph made by stretching the face over a 2d lattice layer by layer. However, this is still in some putting the information into layers (you could argue that is dimensions), simply this method is not concerned with maintaining orthogonality. It just creates a 'blob' which is then fed through a neural network, and has some similar statistical analysis run on it which matches it to the closest model. This method leverages the power a neural network gives simply by passing off the face as a 'blob' of information.

Some of the challenge facial recognitions has are still similar to the challenges humans have, such as poor viewing angles, improper lighting and an object obstructing someones face. This is because if you are missing any region of the face, the principal component analysis will be off. If the decomposition used is missing information about the domain, it is invalid. Only having one side of the face completely foils everything. Another thing that can throw off the principal component analysis is if someone is making a exaggerated gesture of their face. These algorithms assume that people faces are more less static, not changing very much.

Monday, October 24, 2016

Automated Planning

STRIPS was an automated planner developed at Stanford University and which paved the way for the beginnings of automated planning AI. Automated planning allows planners to make their way through an environment or a task using the same algorithm to solve completely different tasks. This can have many applications all over the world but is most commonly used for robotics.

               An automated planning algorithm needs to know certain information about the world it is functioning in to be able to plan correctly. The algorithm needs to be aware of actions that it is capable of doing. It can use these actions to choose what to do. It also needs to know what objects it is able to interact with. These objects can be chosen to be used with certain actions to be at certain states. Once the algorithm knows everything of everything around it, it can start planning for an optimal path to the goal.

Implementing these algorithms into robots is extremely helpful because it can teach robots to reach certain goals without needing a completely different algorithm to be coded for the environment it’s in. The robot just needs to know the actions that are doable in that environment and the objects it can interact with. For example, a robot working at a storage warehouse could be given the knowledge that there are boxes stacked around the warehouse and that they can be moved. If a box needs to be retrieved, the robot can go pick up the box and bring it to the destination. If the desired box is covered in a stack of other boxes, the robot must plan the best way to move the other boxes to get to the desired box. The robot will know the action of moving boxes and will know that there are boxes in the world that it can interact with.

               This same robot could then be reprogrammed to vastly different tasks. The actions and objects of this different domain would be taught to the robot. For example, the robot could have a knowledge of gardening. It could know the act of planting, watering, and weeding. It could then have a goal of always having a clean and watered garden bed. The same algorithm would be used to find the best way to plant all the seeds and watering them. The robot would also know of certain preconditions that would have to be fulfilled. For example; an area of dirt must have a seed in it for it to be watered. Or an area of dirt must have a weed in it for the robot to act in removing the weed. These algorithms can adapt to very different domains as long as they know the required objects and actions they will be dealing with. 

Tuesday, October 18, 2016

Expert Systems in the Business World

I was dubious about the efficacy of expert systems when we talked about them in class last week. It seemed to me that an expert system would be well suited for simple yes and no problems. Acting more as a reflex agent then something truly intelligent. I didn't see a great difference between having an expert system and having an extensive run book, or an elaborate flow diagram. Even though both of these are very useful, why bother turning them into expert systems, instead of just having them remain as searchable text.

In fact, it seemed like there would be only very special use cases where the cost of developing an expert system would be practical versus the savings it could generate. Expert systems are not simple to build at all, and require a large amount of man hours in setting up and maintaining them. Not to mention that the labor needed for them is very skilled, there is only a select subset of the population capable of programming them. In other words, you need an expert to build expert systems, and experts are expensive.

Another issue would be getting the subject matter experts to cooperate with the programmer. It would not be surprising if some experts do not want a computer to know what they know; job security is an important aspect to some people. What happens for issues where experts disagree on how a situation should be resolved, if it happens once it might not be very problematic, but if it happens continuously one experts knowledge might need to be excluded from the system. Actually getting the knowledge is out of someone's brain is a much bigger issue then we made it seem. This is known as the knowledge acquisition problem, and it actually has deeper roots as a philosophical issue.

I see an expert system as having an inherent lack of flexibility, one mistruth in the knowledge base could pollute all possible solutions. Finding these would take up most of the developers time, again leading to more overhead. Trusting a knowledge base as complete is dangerous as things can change quickly, or Simpsons Paradox could take hold in a knowledge base. There is a strong reliance on the fact that the knowledge bases can be exhaustive. Yet, after studying state spaces for half a semester, one realizes just how large "exhaustive" can mean.

Even with my skepticism expert systems have seen some interest over the last forty years. Perhaps the most intriguing application of the expert system is the multiple attempts for medical diagnosis. There is a compelling reason for experts to want to help build the system, saving lives. And any cost involved in development can be justified due to the fact saving one life outweighs any monetary cost. The question is why didn't they catch on then? I haven't been able to find any argument as to why. It isn't clear if the business reasons were at fault, or if they in fact just didn't work. Maybe the reason was that people are just distrust-worthy of a computer as a doctor.

Monday, October 10, 2016

Is Art For Humans Only?

Google's Deep Dream is amazing, if you haven't seen any of the pictures it generates I recommend checking it out. To talk about the results of Deep Dream is simple, it is psychedelic art.

Deep Dream was essentially made as a way to visualize what happens in a convolutional neural network. At the time of conception (and still now) much of what happens inside a neural network is invisible to us, a black box function analyzing data. The goal of Deep Dream was to visualize what happens by measuring change via a picture. By training it to recognize faces and classify images the neural network could output what it thought it was recognizing. However, what this led to was crazy art, Dali-esque images on landscapes and backgrounds. I could speak more here about neural networks and back propagation, but I would rather focus on another aspect of this program.

Ignoring the fact that Deep Dream doesn't actually do anything, it is still interesting as a thought exercise about what is art, and does this qualify?

In class we have had the discussion numerous times about what is intelligence and how do we classify it, and I think that art is a very poignant topic. It doesn't have a pure function for existence, it is more of an existential quest. The fact that we can have computers create art (or something close to it) is a very strong indication to me about how far artificial intelligence has come. You could argue that Deep Dream still needs a seed image, and that might not be wrong. But I think the process it follows greatly mirrors the human creative process.

The fact Deep Dream had to train on large amounts of data to be able to "create" this images is just like an artist has to train for years before they master their medium. Not to mention, years of living as a human also trains the brain in some way to process things visually. And then is having a seed image truly that different from an idea? Every artist has to start from some idea, some initial concept that motivates them to bring the brush to the canvas, or charcoal to paper. The style might be similar for every picture, but aren't all artists famous for a style they pioneered?

I think from some perspectives Deep Dream has more in common with a humans creative process then people would like to admit.

Note I don't think Deep Dream is totally there in all aspects yet; however, it is quite a significant step towards artificial intelligence. It isn't measuring things in purely in what is the lowest cost to get from A to B, instead it is actually creating something novel and interesting.

Monday, October 3, 2016

Sentiment Analysis - a brief intro

One of the more interesting challenges being posed in NLP today is sentiment analysis. Sentiment analysis is the process of using text analysis and computational linguistics to extract information about that selected piece of language. Generally, the information being extracted is being tested against a predetermined sentiment; henceforth, why it is called sentiment analysis. An example of this might be running a communication letter through an analysis machine to determine the writers feelings towards the recipient of the letter. Another example might be examining a political article to measure the authors bias; if they are left leaning or conversly if they are right leaning.

Sentiment analysis is used in today mostly by companies on strongly controlled sanitized data sets. When given an easily measurable goal, sentiment analysis shines. If company want to measure reviews based of service in restaurants and they have a larger data set, say ten thousand reviews, it is easy to pick out say a positive review like "the food was delicious, and the service was prompt", versus a negative review like "meal was bland". The company can then draw conclusions about menus and service based off of the results of the analysis. Of course, the business value of the information is in the eye of the beholder, and it largely depends on the initial collection and categorization of the data.

Some of the techniques used for sentiment analysis are: pre-weighted techniques, statistical methods, and hybrid approach of the former two. By pre-weighted, that means that words have a pre-decided value attached to them. One generally also has a knowledge base to draw from, where all the values are stored and any relations between them. While, on the other hand statistical methods use techniques like support vector machines, or latent semantic analysis. The details behind both these methods are more complex; they both rely on statistical inference to judge the classification of a word. (Note: both these methods are worthy of their own blog posts, very fascinating stuff)

One the classical problems that sentiment analysis faces is a problem all NLP has, context. To cover what context is very quickly, when speaking (or reading) any language there is always some level of an implied situation. For example, if I said "I went down to the local store", there are some implied items to the recipient of the sentence. Firstly, we both probably have an agreed idea of what local means. A computer could possibly have a pre-progammed value of what local is, but it would have to differ greatly based off of many variables. For example, local in New York city probably differs from what local is to Durham NH. In addition, what type of store it is might be clear to a human based off of previous conversation, but the machine lacks the clues inferred from the context.

High-context and low-context are both found in languages, cultures have developed efficient communication with both methods. This strongly dictates what is implied when someone talks, and how much they actually have to say when communicating a point. Strangely enough, it is not clear which of high-context or low-context languages are easier to handle when parsing; because, both come with their own set of problems. In a high context language, where context is frequently implied but not mentioned, assuming the wrong context would be fatal. Where as in a low context language, context may switch without a sentiment analysis machine realizing it several times.

For example, if you ran all of Stephen Colbert's articles through a political sentiment analysis machine, it would come back with very right leaning scores. Similar to Rush Limbaugh, or some one on that side of the spectrum. The machine would lack the context to know when an article is satire, it doesn't have the context of the fact that Mr. Colbert is a comedian. Much of what he says is sardonic in nature. The machine lacks any notion of hyperbole or sarcasm; those human behaviors seem to be confusing even to other humans.

But still, the subtleties of human interaction in communication is still what makes sentiment analysis and NLP an interesting field. Or - I should say that is what makes humans interesting. There is much more to sentiment analysis then I have had time to go over today, I recommend anyone interested to do some of their own research. This is a loosely defined area where people have not come to a conclusion on best way to do things.

Tuesday, September 27, 2016

Genetic Algorithms

An interesting application of heuristics is something called a genetic algorithm, even though we have yet to talk about these in class (and probably will not), the topic is still very interesting.

It belongs to the evolutionary family of algorithms; meaning that as this algorithm runs it has continuous changes. The way a genetic algorithm works is by having an initial population of possible solutions and each can be represented as a "genome" (this allows for change of the organism). This population also needs to have a way to be evaluated for “fitness” of the genome when compared to the solution. The final ingredient is allowing the population to have a way to change, be it randomly or through interactions. Then by using the idea of mutation on the entire population, you can force evolution of the potential solutions. The population is subject to change iteration by iteration as the algorithm runs. On every new iteration the algorithm attempts to produce a new population that is (possibly) a better fit to the solution. Eventually, if all goes well there will be a solution found deemed to be acceptable set by the algorithms tolerance level. Or, you could simply run out of computation time.

This is particularly good for problems where one might have no idea in what part of the domain the optimal (or acceptable) solution may belong to. By using an initial stochastic distribution of the population throughout a search space, through evolution the algorithm has a higher chance of converging to a local minima/maixima in the solution. It should be noted that this technique is best used when a optimal solution isn’t needed, only one that is good enough to the user. In many cases like that it might be impossible to brute force an optimal solution, or simply take too much time.

Some of the downsides to this method is that the structure of the population and the evaluation method have to be defined by the programmer. Therefore any bias is hardcoded in from the start, and may be missing any crucial connections that weren’t known at the time. Of course, this problem seems to be present in all forms of research and design. Another problem of notice is that they tend to be used in discrete solution sets, looking for a solution. Which is a problem of setting up a finite domain for the solution, some things may not be well formed for this.

This class of algorithm sees great use everywhere, even though some new methods of thinking in evolutionary techniques have risen to prominence over the last decade. It still continues to work very well for many problems. Because, in this algorithms heart it is a optimization technique, so any problem well posed as a function is suited for genetic algorithms. It excels at things like scheduling tasks and financial analysis. It just has a much cooler sounding name then multivariable-optimization.

I think an interesting idea for a final project might be implementing a genetic algorithm. Finding an interesting problem to implement it on is another problem though.

Monday, September 26, 2016

Adversarial Search

Adversarial Search

Adversarial search, or Minimax search, is an algorithm to try to result with the best outcome of a situation while also accounting for the actions of an opponent. It is similar to other simpler search algorithms in that it creates a tree of the possible states and tries to walk down it. At every other level of the tree, the algorithm must account for the opponent. In most cases when the opponent is choosing, you expect the opponent to choose the worst outcome for you and the best outcome for them. This search has many uses and is applied in many games.
In games where minimax is applicable there can only be two players. A good example of a game for this is the game of checkers. It is a simple game so it is possible for a computer to compute all the possible moves in the game. If played correctly, the game is actually impossible to lose. A minimax algorithm is a great algorithm to use for this game. You can quantify the goals of the game. For example, if you win the game, you get infinite points. If you lose the game, you lose infinity points. If you lose a piece, you lose a point. If the opponent loses a piece, you gain a point. The minimax algorithm can then look down the tree to see where the possible moves are that would give it the most points.
We can assume that the opponent wants us to get the least number of points possible as well. To apply this to checkers for example, we wouldn’t want to move our piece in front of the opponent’s piece. This would allow for the opponent to immediately jump and remove our piece, decreasing our number of points.
There is a slight oddity of behavior though when encountered with a board that is impossible to win. Say you only have one piece left that is in a corner. The entire rest of the board, save a few empty spaces around you, are your opponent’s pieces. The algorithm won’t choose to fight as long as possible. It will just accept defeat and choose any path that might or might not lead straight towards your loss. This might not really matter because you’ve already analyzed every possible play on the board and have concluded that there is no way for you to win. But it is an interesting habit nonetheless. 

Monday, September 19, 2016

StarCraft AI Bots

One of the most interesting things that was mentioned in class last week was an AI competition; the AAIDE Starcraft AI Competition hosted by the University of Alberta. It sounded pretty cool to me, so I decided to check out some of the starter code that was posted for the competition. ( They give you some starter code for your bot, if you choose to use it.) There is a huge amount of what would seem to miscellaneous code to the non-StarCraft player. Things like managing workers who are building something, or making sure a proper amount of resources are being obtained for the player. Inside baseball knowledge seems to be necessary to pick out what is important and what isn’t. Perhaps the most entertaining part of the misceaallenous tasks to be encoded is that the dynamics for each race are different, even when building things.

One would assumes battle strategy has to differ greatly due to the fact each race has different types of army, but by extension everything needs to be re-coded for the three races. So it might be more accurate to say your are programming three AI’s instead of just one. While one build-order heuristic for one race might be great ( when to build what type of unit), it is completely useless for another race, because they don’t share any units or buildings.

 Another implication brought about by the management of the workers is that there is a huge amount of information that needs to be checked continuously about the game state. This is to the advantage of the computer as opposed to the human, seeing as computers excel at completing sequential trivial tasks. A little research reveals that the top Korean pro players have an APM (actions per minute) of around 300-400. I couldn’t find a breakdown as to how much of that is spent on infrastructure as opposed to actual battles. Amateurs seem to be all over the place in APM, but breaking 75 seems to be amazing. So the computer has a massive advantage here. Just based off this information about the limits of humans, it seems that computers would always be able to win.

But in fact, no. The top human players are still able to beat top AI’s consistently. In fact it seemed to be a complaint among better players that the AI that shipped with the game was laughably bad. Too predictable and easy to manipulate. These AI projects do not have the massive amounts of resources behind like Alpha Go does, but the fact remains real time strategy games are a much more complex problem.

My conclusion after reading through the starter code and documentation, is that a large amount of work goes into every bot to be programmed. Not only that — domain knowledge is crucial in this. It seems to me that there is a very small subset of people who would be able to successfully program this AI bot by themselves. It requires you to be an expert in the video game of StarCraft, as well as being well versed in AI techniques and algorithms. This seems to be a common thread when it comes to implementing most of the algorithms covered in class, deep domain knowledge is needed. 

So yes, in this case you could play video games for school work, in order to refine your domain knowledge.

For the actual meat of the AI algorithms used, check this page.


Tuesday, September 13, 2016

Thinking of better heuristics

I thought that the most challenging part of this weeks assignment was writing a good heuristic, after all it seems to be something that can always be improved upon. Furthermore, when designing a heuristic for a more complicated problem one might work better form of the problem, but then perform poorly on the same problem when another variable is introduced. The key to writing a good heuristic is to simplify the problem before analyzing it, but what to do when someone doesn't have a simple problem. Most problems can be represented in a way were exhaustible search is not a feasible option. So how can we improve?

Well - one idea is a hyper-heuristic, a sort of multi-level heuristic. The idea is that by having a set of possible heuristics you can fine tune. Instead of writing one mega-heuristic that tries to jam pack all possible information about the state into one, bundle them into smaller single heuristics. Similar to a multi-variable optimization problem, there could never be a solution (h*) found for the problem, but it can be approximated as much as possible. Instead of traditional numerical analysis methods being used to find the "roots", they instead use machine learning to tune the heuristic being made. By mixing and matching different type of heuristics that measure several aspects of the problem, you can hopefully build a more robust heuristic.

This is an intriguing approach to building a heuristic, instead of having to scrap all of the existing values, you can simply add the new heuristic to the hyper-heuristic. Then as your algorithm runs, the heuristic will tune itself to accommodate the added heuristic. Of course, this is assuming that the problem needs to be run many times (or their is time in between solutions for training); there would be no benefit for forming a set of heuristics for a solution that is trivial. Conversely, there is probably great value in having a set of heuristics that has been trained/tuned over a long period of time. If you have a hyper heuristic that works extremely well on a problem, there is a good chance it can re-purposed for other problems in the same set.

Considered to be closely related are heuristics that generate other heuristics. While the efficacy of those might be the same or more, as the domain expert you might lose out in terms of insight as to what is driving the solution. As one might want to try to re-weight the "equation" to see what difference it makes, if only for their own knowledge. The practical value in having insight into what is the major hang-up points in a solution can be more valuable then having a planner that silently works around them every time.

Monday, September 12, 2016

AI Pathfinding in Games

When it comes to having units in games make their way from one waypoint to another, there are many different approaches for finding the optimal route. A* is definitely the most popular pathing algorithm that developers get to choose from, but the problem is slightly more complicated than that. A* is still a fairly generic algorithm and game developers still need to tune the algorithm to work best with what their game does. Every developer has their own different game and different requirements. A* is still the best starting point and is implemented in many games.

Take a game like Civilization V for example. Civ V is a turn-based strategy game. It requires you to move units around a map made up of a hexagonal grid. Players move one unit at a time and units can’t collide. An algorithm like A* would work very well in this situation because the map is always static from the point of view of the single unit being moved. That unit could easily try to map a way through a hexagonal grid towards the goal, which would be the location the player wanted the unit to move to. This implementation works well because the world is static, though. Games where everything is always moving would require a different algorithm.

Blizzard’s real-time strategy game, Starcraft, is a good example of a dynamic map. In Starcraft, players move units around the map nonstop, multiple players moving units at the same time. If a player tells a unit to move across the map, the developer could continue to use the A* algorithm to move the unit across the map. But that would require the A* algorithm to compute an entirely new path whenever the unit encounters a new obstacle that it didn’t know of before. That would be quite inefficient and there are better algorithms to do this. Enter “Dynamic A*”, or D* for short. D* is much better suited for dynamic maps because it can change the cost of an arc as the algorithm runs. If it encounters an obstacle, it can dynamically search for other options to get to the goal.

In short, developers can use whatever algorithms they want to use for their AI pathfinding. A* is the gold standard for static pathfinding. Developers can customize A* to be what they need. D* works very well for dynamic solutions and maps that are constantly changing because it can recalculate the best path as it receives new information. 

Monday, September 5, 2016

Natural Language Generation Final Project Idea

We haven’t gotten into any Natural Language Processing (and wouldn’t until the end of this class - if we do), but I wanted to talk about it anyways.

For my project, I was thinking of doing something along the lines of Natural Language Generation with a functional language, namely Clojure. I feel like the expressive tendencies of functional languages fit well with human languages, and this could be something interesting to explore.

What is Natural Language Generation? Well, to start it is a sub-field of Natural Language Processing, but where NLP (as a whole) is concerned with understand human speech NLG would be concerned with sounding human.  An example of this we have already seen in class would be Siri. It needed to parse out the information from the question which would be the understanding, but then the response would be the generation aspect of the problem. So, some NLG projects might be getting a robot to narrate what it is doing to a human, or having a kiosk at a mall interact with a customer via generated voice.

More interesting than just having a one to one mapping from actions to phrases, would be having a program that summarizes data. One of the big problems with the large amounts data companies have nowadays, is that it is hard to draw conclusions from raw data, but there is no story that frames the data for the end user. Problems like creating summaries for videos and sound clips are common topics for Computer Scientists to tackle.

So my idea is this: Make a program that takes all the news from an aggregator on one very specific topic, for a period of time, and condense the most important ideas to summarize over one page. I would limit the domain to start with to one very specific topic, with  six months for my time period.

This is mildly ambitious, a problem that most of my ideas suffer from. But, I think this would probably touch upon many areas of this class. To go beyond simple template driven text generation will require lots of filtering along with the intelligence to pick out the important parts. I think though if I can get away with using libraries for some of the heavy lifting in text processing (breaking everything into n-grams) and sentiment analysis, that would let me focus on the text generation part.  The problem would be to choose the right algorithms at this step.

I know people have had success using convolutional neural networks to pick out sentiment from text, and that would give the program enough information to pick out what is important and what isn’t. The only problem is a neural network might become to heavy for this. I was talking to a developer who works for a company that does sentiment analysis APIs and he said that sometimes a simple node making snap decisions can work better. It's tough to tell what to use because I personally don't know enough about this part of the system yet. Frequency of something being mentioned and the sentiment attached to it would probably be the most important details for judging if that piece of information makes it into the summary.

Another challenge for this would be how to avoid keeping the summary from becoming something of a fill in the blanks, instead having a more organic feel. Most of what I have read so far on this topic seems be template based, so there is more research to be done on this topic.

In any case, I think making something from nothing is always interesting. One of my goals in taking this class was to be able to take on projects like this on my own!

Tuesday, August 30, 2016


This blog is for students in CS 730 Intro to AI at UNH, especially those who would like writing-intensive credit.  Three kinds of posts are encouraged: 1) design idea for your final project (which kind of algorithm is appropriate for your problem and why, what modifications would improve it for your application), 2) critiques and extensions of something we've covered in class (state-space search is too limited because it doesn't handle...), 3) ideas on how something we talked about in class might apply to a real-world problem.  Please don't post entries that are just how you felt about something without any analysis or argumentation or entries that could have been written by someone who hadn't actually taken the class and studied the material.