In my recent e-travels searching for a parsing solution for a project, I stumbled across a very cool NLP parser available under the full GNU GPL. It is called the Stanford Parser, and it works out the grammatical structure of sentences using a probabilistic context-free grammar. That means it breaks a sentence down into its individual pieces and builds them into a parse tree, much like the abstract syntax tree a compiler builds. Because it is probabilistic, it gives you the most likely derivation rather than an absolute one. In fact, if you specify a number, it will print that many of the highest-scoring parses for a sentence. For any given sentence there might be thousands of possible parses, a state space too large to search exhaustively. You can also train the parser, if you have your own corpus lying around that you specifically need to use.
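To make the "most likely derivation" idea concrete, here is a minimal sketch of probabilistic CKY (the Viterbi variant), the textbook dynamic-programming algorithm for parsing with a probabilistic context-free grammar. It is not the Stanford Parser's code or model: the tiny grammar, its probabilities and the sentence are all made up for illustration. Because the chart keeps only the best-scoring analysis for each span and label, it resolves the classic prepositional-phrase ambiguity without enumerating every possible parse.

```python
# Toy PCFG in Chomsky normal form. Hypothetical grammar and probabilities,
# not the Stanford Parser's model. Rules for each left-hand side sum to 1.
BINARY = {  # (left child, right child) -> [(parent, probability)]
    ("NP", "VP"): [("S", 1.0)],
    ("V", "NP"): [("VP", 0.7)],
    ("VP", "PP"): [("VP", 0.3)],
    ("NP", "PP"): [("NP", 0.2)],
    ("P", "NP"): [("PP", 1.0)],
}
LEXICON = {  # word -> [(category, probability)]
    "Warren": [("NP", 0.3)],
    "eats": [("V", 1.0)],
    "fish": [("NP", 0.3)],
    "chopsticks": [("NP", 0.2)],
    "with": [("P", 1.0)],
}

def viterbi_cky(words, lexicon, binary):
    """chart[i][j][label] = (best probability, backpointer) for words[i:j]."""
    n = len(words)
    chart = [[{} for _ in range(n + 1)] for _ in range(n + 1)]
    for i, word in enumerate(words):
        for label, p in lexicon.get(word, []):
            chart[i][i + 1][label] = (p, word)
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length
            cell = chart[i][j]
            for k in range(i + 1, j):  # every split point
                for left, (p_left, _) in chart[i][k].items():
                    for right, (p_right, _) in chart[k][j].items():
                        for parent, p in binary.get((left, right), []):
                            prob = p * p_left * p_right
                            # Keep only the most probable analysis per label and span.
                            if prob > cell.get(parent, (0.0, None))[0]:
                                cell[parent] = (prob, (k, left, right))
    return chart

def build_tree(chart, i, j, label):
    """Follow backpointers to print the best tree in bracketed form."""
    _, back = chart[i][j][label]
    if isinstance(back, str):  # reached a word
        return f"({label} {back})"
    k, left, right = back
    return f"({label} {build_tree(chart, i, k, left)} {build_tree(chart, k, j, right)})"

words = "Warren eats fish with chopsticks".split()
chart = viterbi_cky(words, LEXICON, BINARY)
print(round(chart[0][len(words)]["S"][0], 5))  # 0.00378
print(build_tree(chart, 0, len(words), "S"))
# (S (NP Warren) (VP (VP (V eats) (NP fish)) (PP (P with) (NP chopsticks))))
```

With these made-up probabilities, attaching "with chopsticks" to the verb phrase (0.00378) beats attaching it to "fish" (0.00252). Keeping the top k entries per chart cell instead of just one is the usual way to extend this toward k-best output.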
Here is an example parse:
Warren enjoys parsing out words for fun.
(ROOT
  (S
    (NP (NNP Warren))
    (VP (VBZ enjoys)
      (S
        (VP (VBG parsing)
          (PRT (RP out))
          (NP
            (NP (NNS words))
            (PP (IN for)
              (NP (NN fun)))))))
    (. .)))
As you can see, the output is in a LISP-like bracketed list format. Linguists call a collection of sentences annotated this way a treebank, and this specific bracketing and tag set come from the Penn Treebank. Without getting too deep into the inside baseball, it tags each part of the sentence with its linguistic category. For example, look at
(RP out)
RP is simply a particle, so out is a particle in this representation of the parse tree.
(VBZ enjoys)
VBZ stands for a verb (VB) in the third person singular present tense (Z). Again, this is how the word "enjoys" is represented in the parse tree. The Stanford Parser also supports other kinds of output, such as simple part-of-speech tagging of each word in the sentence, or the dependencies between the words of the sentence (other linguistic measures of interest).
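To see how mechanical the bracketed format is, here is a minimal sketch using only the Python standard library (not the Stanford Parser's own API) that reads the example tree into nested tuples and pulls out each word with its part-of-speech tag, such as ("RP", "out") and ("VBZ", "enjoys"). The function names are illustrative.

```python
import re

def read_tree(text):
    """Parse a Penn Treebank-style bracketed string into (label, children) tuples."""
    tokens = re.findall(r"\(|\)|[^\s()]+", text)
    pos = 0

    def parse_node():
        nonlocal pos
        if tokens[pos] != "(":
            raise ValueError(f"expected '(' at token {pos}")
        label = tokens[pos + 1]
        pos += 2
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(parse_node())  # nested constituent
            else:
                children.append(tokens[pos])   # a word (leaf)
                pos += 1
        pos += 1  # skip the closing ")"
        return (label, children)

    return parse_node()

def tagged_words(tree):
    """Collect (tag, word) pairs from the preterminal nodes, left to right."""
    label, children = tree
    if len(children) == 1 and isinstance(children[0], str):
        return [(label, children[0])]
    pairs = []
    for child in children:
        if isinstance(child, tuple):
            pairs.extend(tagged_words(child))
    return pairs

EXAMPLE = """
(ROOT (S (NP (NNP Warren))
  (VP (VBZ enjoys)
    (S (VP (VBG parsing) (PRT (RP out))
      (NP (NP (NNS words)) (PP (IN for) (NP (NN fun)))))))
  (. .)))
"""
print(tagged_words(read_tree(EXAMPLE)))
# [('NNP', 'Warren'), ('VBZ', 'enjoys'), ('VBG', 'parsing'), ('RP', 'out'),
#  ('NNS', 'words'), ('IN', 'for'), ('NN', 'fun'), ('.', '.')]
```

The tag sequence this produces is essentially the simple part-of-speech tagging output mentioned above, recovered from the full tree.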
Perhaps the best part about the parser, though, is that by using models trained on different treebanks, you can parse different languages (versions for Chinese, German and Arabic are linked from the website). So, interestingly enough, the parsing approach itself is language independent. I don't know enough about linguistics to say why, but my guess is that any spoken language can be represented by a treebank, as long as its grammar can be captured by a context-free grammar. It should also be noted that treebanks can be set up to represent the same language differently. There can be any number of treebanks for a language, if a computational linguist wants to measure something differently.
So what can you use this for? Well, any task where you need to parse complex meaning out of language. It is too heavy-duty for parsing simple structured commands in English (ANTLR would still be the better tool for that task). But if you need to do machine translation or relationship analysis, you are going to need something this heavy-duty.
Thoughts and comments from students at the University of New Hampshire on artificial intelligence.
Monday, October 31, 2016
Tuesday, October 25, 2016
Facial Recognition
Facial recognition is an interesting topic in artificial intelligence. The subfield draws on math from multiple areas, notably heavy linear algebra and Bayesian statistics. It is also important because facial recognition is used everywhere today, from Facebook identifying your friends in a photo to Google Street View blurring out people's faces. The heavy use of these algorithms raises ethical issues, such as privacy concerns and data rights. Today's blog post, however, will focus on the underlying technology that some of these algorithms are based on.
First and foremost, there must be a way to mathematically break down someone's face in order to quantify it. One of the main ways this is done is through a method called principal component analysis (PCA). Without getting too much into the math, PCA breaks the face down into a linear basis built from components (or features) of human faces, learned from a set of training faces. Representing a face this way lets you measure the individual magnitude of each feature in a person's face, which gives a possibly unique representation of that face as coordinates in the linear basis. PCA is closely related to the singular value decomposition (SVD), which is a common way to compute it. Related methods, such as independent component analysis or the classic eigenface approach (PCA applied directly to face images, where the "eigenfaces" are eigenvectors of the training faces' covariance matrix), rely on the same idea of representing a face in a linear basis.
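A minimal sketch of the PCA step, assuming NumPy and a training set of equally sized, aligned grayscale face images flattened into rows (random numbers stand in for real faces below). It computes the basis with `numpy.linalg.svd` on mean-centered data; the function names are illustrative, not from any particular face-recognition library.

```python
import numpy as np

def fit_face_basis(faces: np.ndarray, k: int):
    """Learn a k-dimensional linear basis from training faces.

    faces: shape (n_images, n_pixels); each row is one flattened face image.
    """
    mean_face = faces.mean(axis=0)
    centered = faces - mean_face
    # Rows of vt are the principal directions ("eigenfaces"),
    # sorted by decreasing singular value.
    _, singular_values, vt = np.linalg.svd(centered, full_matrices=False)
    return mean_face, vt[:k], singular_values[:k]

def project(face: np.ndarray, mean_face: np.ndarray, basis: np.ndarray) -> np.ndarray:
    """Coordinates of one face in the basis: one feature magnitude per component."""
    return basis @ (face - mean_face)

def reconstruct(weights: np.ndarray, mean_face: np.ndarray, basis: np.ndarray) -> np.ndarray:
    """Approximate the original face from its k weights."""
    return mean_face + weights @ basis

# Usage with random stand-in data; real use needs aligned face images.
rng = np.random.default_rng(0)
faces = rng.random((10, 64))  # 10 hypothetical 8x8 "faces"
mean_face, basis, sv = fit_face_basis(faces, k=5)
weights = project(faces[0], mean_face, basis)
print(weights.shape)  # (5,)
```

Each face then becomes a short vector of weights, one per component, and that vector is what a later matching or classification step works with.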
The second part of these algorithms is running some sort of statistical analysis on the face's coordinates in the newly acquired linear basis. The statistics can follow a Bayesian methodology, whether that is a simple hidden Markov model or a full-featured Bayesian network. Training can be done with either unsupervised or supervised learning, depending on the choice made by the developer. Like all trained methods, the recognition made by these algorithms therefore depends on the data sets used.
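The Bayesian step can be illustrated with the simplest member of that family, a Gaussian naive Bayes classifier, swapped in here in place of the hidden Markov models or Bayesian networks mentioned above. The sketch assumes the supervised case and assumes each face has already been turned into a short vector of coordinates in the learned basis; the two-dimensional vectors and the names "alice" and "bob" are made up.

```python
import math
from collections import defaultdict

def fit_gaussian_nb(vectors, labels, var_floor=1e-6):
    """Per person: a prior plus an independent Gaussian for each coordinate."""
    groups = defaultdict(list)
    for vec, person in zip(vectors, labels):
        groups[person].append(vec)
    model = {}
    for person, rows in groups.items():
        dims = range(len(rows[0]))
        means = [sum(r[i] for r in rows) / len(rows) for i in dims]
        # Floor the variance so a coordinate that never varies can't divide by zero.
        variances = [max(sum((r[i] - means[i]) ** 2 for r in rows) / len(rows), var_floor)
                     for i in dims]
        model[person] = (math.log(len(rows) / len(vectors)), means, variances)
    return model

def log_posterior(model, person, vec):
    """log P(person) + sum of log Gaussian densities: the posterior up to a constant."""
    log_prior, means, variances = model[person]
    return log_prior + sum(
        -0.5 * math.log(2 * math.pi * var) - (x - m) ** 2 / (2 * var)
        for x, m, var in zip(vec, means, variances))

def identify(model, vec):
    """Pick the person with the highest posterior probability."""
    return max(model, key=lambda person: log_posterior(model, person, vec))

train = [(1.0, 1.2), (1.1, 0.9), (0.9, 1.0), (-1.0, -0.8), (-1.2, -1.1), (-0.9, -1.0)]
names = ["alice", "alice", "alice", "bob", "bob", "bob"]
model = fit_gaussian_nb(train, names)
print(identify(model, (0.95, 1.05)))  # alice
```

The "naive" part is the assumption that coordinates are independent given the person; richer Bayesian models relax exactly that assumption, and as noted above, the result is only as good as the training data.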
The only method I have found that breaks from decomposing a face into a linear basis is called Dynamic Link Matching, which instead represents the face as a graph made by stretching the face over a 2D lattice, layer by layer. This still puts the information into layers in some sense (you could argue those are dimensions), but the method is not concerned with maintaining orthogonality. It creates a 'blob' that is then fed through a neural network, and similar statistical analysis matches it to the closest model. This method leverages the power of a neural network simply by passing the face off as a 'blob' of information.
Some of the challenges facial recognition faces are still similar to the ones humans have, such as poor viewing angles, improper lighting and objects obstructing someone's face. This is because if any region of the face is missing, the principal component analysis will be off: a decomposition that is missing information about its domain is invalid. Only having one side of the face completely foils everything. Another thing that can throw off principal component analysis is someone making an exaggerated facial expression. These algorithms assume that people's faces are more or less static, not changing very much.
Monday, October 24, 2016
Automated Planning
STRIPS was an automated planner developed at the Stanford Research Institute (SRI), and it paved the way for automated planning in AI. Automated planning lets an agent make its way through an environment or a task, using the same algorithm to solve completely different tasks. This has many applications all over the world, but it is most commonly used in robotics.
An automated planning algorithm needs to know certain information about the world it is functioning in to be able to plan correctly. The algorithm needs to be aware of the actions it is capable of doing, so it can choose among them. It also needs to know which objects it is able to interact with, since applying actions to those objects moves them into new states. Once the algorithm knows about everything around it, it can start planning an optimal path to the goal.
Implementing these algorithms in robots is extremely helpful because it lets robots reach certain goals without needing a completely different algorithm to be coded for the environment they are in. The robot just needs to know the actions that are doable in that environment and the objects it can interact with. For example, a robot working at a storage warehouse could be given the knowledge that there are boxes stacked around the warehouse and that they can be moved. If a box needs to be retrieved, the robot can go pick up the box and bring it to the destination. If the desired box is buried under a stack of other boxes, the robot must plan the best way to move the other boxes to get to it. The robot will know the action of moving boxes and will know that there are boxes in the world that it can interact with.
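The warehouse scenario can be written down in the STRIPS style: each action lists the facts it requires (preconditions), the facts it makes true (add list) and the facts it makes false (delete list). This is a hypothetical, minimal sketch, not the original STRIPS system: facts are plain strings, the box and action names are invented, and a breadth-first search stands in for a real planner's search strategy because it is simple and returns a shortest plan.

```python
from collections import deque
from typing import FrozenSet, List, NamedTuple, Optional

class Action(NamedTuple):
    name: str
    pre: FrozenSet[str]     # facts that must hold before the action
    add: FrozenSet[str]     # facts the action makes true
    delete: FrozenSet[str]  # facts the action makes false

def plan(start, goal, actions) -> Optional[List[str]]:
    """Domain-independent breadth-first search; returns a shortest plan or None."""
    start = frozenset(start)
    frontier = deque([(start, [])])
    seen = {start}
    while frontier:
        state, steps = frontier.popleft()
        if goal <= state:
            return steps
        for act in actions:
            if act.pre <= state:
                nxt = (state - act.delete) | act.add
                if nxt not in seen:
                    seen.add(nxt)
                    frontier.append((nxt, steps + [act.name]))
    return None

def warehouse_actions(boxes):
    """Hypothetical warehouse domain: the robot carries one box at a time."""
    fs = frozenset
    acts = []
    for x in boxes:
        acts.append(Action(f"pickup({x})", fs({f"floor({x})", f"clear({x})", "handempty"}),
                           fs({f"holding({x})"}), fs({f"floor({x})", f"clear({x})", "handempty"})))
        acts.append(Action(f"putdown({x})", fs({f"holding({x})"}),
                           fs({f"floor({x})", f"clear({x})", "handempty"}), fs({f"holding({x})"})))
        acts.append(Action(f"deliver({x})", fs({f"holding({x})"}),
                           fs({f"atdock({x})", "handempty"}), fs({f"holding({x})"})))
        for y in boxes:
            if x != y:
                acts.append(Action(f"unstack({x},{y})",
                                   fs({f"on({x},{y})", f"clear({x})", "handempty"}),
                                   fs({f"holding({x})", f"clear({y})"}),
                                   fs({f"on({x},{y})", f"clear({x})", "handempty"})))
    return acts

# Box A sits on the floor under B, which is under C; A must reach the dock.
start = {"floor(A)", "on(B,A)", "on(C,B)", "clear(C)", "handempty"}
steps = plan(start, {"atdock(A)"}, warehouse_actions("ABC"))
print(steps)
```

Nothing in `plan` knows about boxes: all of the warehouse knowledge lives in the action list and the start state, which is what lets the same search be pointed at a different domain.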
This same robot could then be reprogrammed for vastly different tasks by teaching it the actions and objects of the new domain. For example, the robot could be given knowledge of gardening: the acts of planting, watering, and weeding. It could then have the goal of always keeping a clean and watered garden bed. The same algorithm would be used to find the best way to plant all the seeds and water them. The robot would also know certain preconditions that must be fulfilled. For example, an area of dirt must have a seed in it before it is watered, and an area of dirt must have a weed in it before the robot acts to remove the weed. These algorithms can adapt to very different domains as long as they know the required objects and actions they will be dealing with.
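The gardening domain and its preconditions can be encoded the same STRIPS way. In this hypothetical sketch (the plot names, action names and facts are all invented), pulling a weed requires a weed, planting requires a weed-free plot, and watering requires a seed; the breadth-first planner contains no gardening knowledge at all.

```python
from collections import deque
from typing import FrozenSet, List, NamedTuple, Optional

class Action(NamedTuple):
    name: str
    pre: FrozenSet[str]     # preconditions
    add: FrozenSet[str]     # facts made true
    delete: FrozenSet[str]  # facts made false

def plan(start, goal, actions) -> Optional[List[str]]:
    """Domain-independent breadth-first search; returns a shortest plan or None."""
    start = frozenset(start)
    frontier = deque([(start, [])])
    seen = {start}
    while frontier:
        state, steps = frontier.popleft()
        if goal <= state:
            return steps
        for act in actions:
            if act.pre <= state:
                nxt = (state - act.delete) | act.add
                if nxt not in seen:
                    seen.add(nxt)
                    frontier.append((nxt, steps + [act.name]))
    return None

def garden_actions(plots):
    """Hypothetical gardening domain with the preconditions described above."""
    fs = frozenset
    acts = []
    for p in plots:
        acts.append(Action(f"pull_weed({p})", fs({f"weed({p})"}),
                           fs({f"weedfree({p})"}), fs({f"weed({p})"})))
        acts.append(Action(f"plant({p})", fs({f"weedfree({p})"}), fs({f"seeded({p})"}), fs()))
        acts.append(Action(f"water({p})", fs({f"seeded({p})"}), fs({f"watered({p})"}), fs()))
    return acts

plots = ["p1", "p2"]
start = {"weed(p1)", "weedfree(p2)"}  # only plot p1 has a weed
goal = {fact for p in plots for fact in (f"weedfree({p})", f"seeded({p})", f"watered({p})")}
steps = plan(start, goal, garden_actions(plots))
print(steps)
```

The preconditions force a sensible order on their own: the plan never waters a plot before planting it and never tries to weed plot p2, which has no weed.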
Tuesday, October 18, 2016
Expert Systems in the Business World
I was dubious about the efficacy of expert systems when we talked about them in class last week. It seemed to me that an expert system would be well suited only for simple yes-or-no problems, acting more as a reflex agent than as something truly intelligent. I didn't see a great difference between having an expert system and having an extensive run book or an elaborate flow diagram. Even though both of these are very useful, why bother turning them into expert systems instead of just leaving them as searchable text?
In fact, it seemed like there would be only very special use cases where the cost of developing an expert system would be justified by the savings it could generate. Expert systems are not simple to build at all, and they require a large number of man-hours to set up and maintain. Not to mention that the labor needed for them is very skilled; only a select subset of the population is capable of programming them. In other words, you need an expert to build expert systems, and experts are expensive.
Another issue is getting the subject matter experts to cooperate with the programmer. It would not be surprising if some experts did not want a computer to know what they know; job security matters to some people. And what happens when experts disagree on how a situation should be resolved? If it happens once it might not be very problematic, but if it happens continually, one expert's knowledge might need to be excluded from the system. Actually getting the knowledge out of someone's brain is a much bigger issue than we made it seem. This is known as the knowledge acquisition problem, and it has deeper roots as a philosophical issue.
I see an expert system as having an inherent lack of flexibility: one falsehood in the knowledge base could pollute any solution that depends on it. Finding such errors would take up most of the developer's time, again adding overhead. Trusting a knowledge base as complete is dangerous, since things can change quickly, or Simpson's paradox could take hold in a knowledge base. There is a strong reliance on the assumption that the knowledge base can be exhaustive. Yet after studying state spaces for half a semester, one realizes just how large "exhaustive" can be.
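The pollution concern is easy to see with forward chaining, an inference style common in rule-based expert systems: a rule fires whenever all of its premises are known, and its conclusion becomes a new fact that other rules can build on. In this minimal sketch the rules are a made-up, run-book-style example, not taken from any real system.

```python
def forward_chain(facts, rules):
    """Apply if-then rules repeatedly until no new fact can be derived."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if conclusion not in known and premises <= known:
                known.add(conclusion)
                changed = True
    return known

# Hypothetical run-book rules: (premises, conclusion)
RULES = [
    (frozenset({"server_unreachable", "network_ok"}), "server_down"),
    (frozenset({"server_down"}), "page_on_call_engineer"),
    (frozenset({"server_unreachable", "network_down"}), "check_router"),
]

observed = {"server_unreachable", "network_down"}
print(sorted(forward_chain(observed, RULES) - observed))
# ['check_router']

# One stale "fact" left in the knowledge base changes the conclusions.
polluted = observed | {"network_ok"}
print(sorted(forward_chain(polluted, RULES) - polluted))
# ['check_router', 'page_on_call_engineer', 'server_down']
```

The bad fact never gets flagged; it simply enables a chain of rules, and every conclusion downstream of it inherits the error.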
Skepticism aside, expert systems have seen some interest over the last forty years. Perhaps the most intriguing application is the multiple attempts at medical diagnosis. There, experts have a compelling reason to help build the system: saving lives. And any development cost can be justified by the fact that saving one life outweighs any monetary cost. So why didn't they catch on? I haven't been able to find any argument as to why. It isn't clear whether business reasons were at fault or whether they in fact just didn't work. Maybe people are just distrustful of a computer as a doctor.
Monday, October 10, 2016
Is Art For Humans Only?
Google's Deep Dream is amazing; if you haven't seen any of the pictures it generates, I recommend checking it out. Describing the results of Deep Dream is simple: it is psychedelic art.
Deep Dream was essentially made as a way to visualize what happens inside a convolutional neural network. At the time of its conception (and still now), much of what happens inside a neural network is invisible to us: a black-box function analyzing data. The goal of Deep Dream was to make that visible through a picture. Instead of just asking a network trained to classify images for a label, Deep Dream adjusts the input image itself so that it amplifies whatever patterns the network responds to, effectively showing what the network thinks it is recognizing. What this led to was crazy art: Dalí-esque images layered over landscapes and backgrounds. I could say more here about neural networks and backpropagation, but I would rather focus on another aspect of this program.
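The core trick can be shown in miniature: instead of adjusting a network's weights to fit an image, Deep Dream keeps the weights fixed and does gradient ascent on the image, nudging it so a chosen layer's activations grow. In this toy sketch a single hand-picked 1-D filter stands in for a trained convolutional layer and a short list of numbers stands in for a picture; these are assumptions for illustration. The real system works on a trained CNN with many layers and extra tricks (such as processing at multiple scales) not shown here.

```python
def conv1d(x, kernel):
    """'Valid' 1-D cross-correlation: one activation per window position."""
    k = len(kernel)
    return [sum(w * x[i + j] for j, w in enumerate(kernel)) for i in range(len(x) - k + 1)]

def objective(x, kernel):
    """Half the squared L2 norm of the layer's activations (bigger = stronger response)."""
    return 0.5 * sum(a * a for a in conv1d(x, kernel))

def gradient(x, kernel):
    """d(objective)/dx: each activation pushes back on the inputs it looked at."""
    grad = [0.0] * len(x)
    for i, a in enumerate(conv1d(x, kernel)):
        for j, w in enumerate(kernel):
            grad[i + j] += a * w
    return grad

def dream(x, kernel, steps=10, lr=0.1):
    """Gradient ascent on the input while the 'network' (kernel) stays fixed."""
    x = list(x)
    for _ in range(steps):
        g = gradient(x, kernel)
        scale = sum(abs(v) for v in g) / len(g) or 1.0  # keep the step size stable
        x = [xi + lr * gi / scale for xi, gi in zip(x, g)]
    return x

image = [0.1, -0.2, 0.3, 0.0, 0.5, -0.1, 0.2, 0.05]  # stand-in "image"
edge_filter = [1.0, -1.0]                            # stand-in learned feature
dreamed = dream(image, edge_filter)
print(objective(image, edge_filter), objective(dreamed, edge_filter))
```

Because this toy objective is a convex quadratic in the input, every ascent step strictly increases it, so the second number printed is larger than the first; the "image" drifts toward whatever pattern the filter likes most, which is the same idea that paints a real network's favorite shapes into a photo.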
Setting aside the fact that Deep Dream doesn't serve any practical purpose, it is still interesting as a thought exercise: what is art, and does this qualify?
In class we have discussed numerous times what intelligence is and how we classify it, and I think art is a very pertinent topic here. Art doesn't exist to serve a purely practical function; it is more of an existential quest. The fact that we can have computers create art (or something close to it) is a very strong indication to me of how far artificial intelligence has come. You could argue that Deep Dream still needs a seed image, and that might not be wrong. But I think the process it follows closely mirrors the human creative process.
The fact that Deep Dream's network had to train on large amounts of data to be able to "create" these images is much like how an artist has to train for years before mastering a medium. Not to mention that years of living as a human also train the brain to process things visually. And is having a seed image truly that different from having an idea? Every artist has to start from some idea, some initial concept that motivates them to bring the brush to the canvas, or the charcoal to paper. The style might be similar in every picture, but aren't all artists famous for a style they pioneered?
I think from some perspectives Deep Dream has more in common with a human's creative process than people would like to admit.
Note that I don't think Deep Dream is totally there in all aspects yet; however, it is quite a significant step for artificial intelligence. It isn't measuring things purely in terms of the lowest cost to get from A to B; instead, it is actually creating something novel and interesting.
Monday, October 3, 2016
Sentiment Analysis - a brief intro
One of the more interesting challenges posed in NLP today is sentiment analysis. Sentiment analysis is the process of using text analysis and computational linguistics to extract information about a selected piece of language. Generally, the extracted information is tested against a predetermined sentiment, hence the name. An example might be running a letter through an analysis machine to determine the writer's feelings toward its recipient. Another example might be examining a political article to measure the author's bias: whether they are left leaning or, conversely, right leaning.
Today, sentiment analysis is used mostly by companies on strongly controlled, sanitized data sets. When given an easily measurable goal, sentiment analysis shines. If a company wants to measure restaurant reviews of its service and has a large data set, say ten thousand reviews, it is easy to pick out a positive review like "the food was delicious, and the service was prompt" versus a negative review like "meal was bland". The company can then draw conclusions about menus and service based on the results of the analysis. Of course, the business value of the information is in the eye of the beholder, and it largely depends on how the data was initially collected and categorized.
Some of the techniques used for sentiment analysis are pre-weighted techniques, statistical methods, and hybrid approaches combining the two. Pre-weighted means that words have a pre-decided value attached to them; one generally also has a knowledge base to draw from, where all the values and any relations between them are stored. Statistical methods, on the other hand, use techniques like support vector machines or latent semantic analysis. The details behind these methods are more complex; both rely on statistical inference to judge how a word should be classified. (Note: both of these methods are worthy of their own blog posts; very fascinating stuff.)
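A pre-weighted approach can be sketched in a few lines: each word carries a hand-assigned score and a text's sentiment is the sum. The word list and weights below are invented for illustration, and flipping a word's score after "not", "no" or "never" is an assumed, very crude heuristic; real lexicons are far larger, and the statistical methods mentioned above learn their weights from data instead.

```python
import re

# Hypothetical hand-assigned weights; a real lexicon would be far larger.
LEXICON = {
    "delicious": 2.0, "prompt": 1.0, "friendly": 1.0, "great": 1.5,
    "bland": -1.5, "slow": -1.0, "rude": -2.0, "cold": -1.0,
}
NEGATORS = {"not", "no", "never"}

def score(text: str) -> float:
    """Sum the pre-assigned weights of known words; a preceding negator flips the sign."""
    tokens = re.findall(r"[a-z']+", text.lower())
    total = 0.0
    for i, tok in enumerate(tokens):
        weight = LEXICON.get(tok, 0.0)
        if i > 0 and tokens[i - 1] in NEGATORS:
            weight = -weight
        total += weight
    return total

def label(text: str) -> str:
    s = score(text)
    return "positive" if s > 0 else "negative" if s < 0 else "neutral"

print(label("the food was delicious, and the service was prompt"))  # positive
print(label("meal was bland"))                                       # negative
```

This works well on exactly the kind of sanitized, narrowly scoped reviews described above, and it has no way to notice context, sarcasm or satire, which is where the next problems come in.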
One of the classic problems sentiment analysis faces is a problem all of NLP has: context. To cover context very quickly, when speaking (or reading) any language there is always some level of implied situation. For example, if I said "I went down to the local store", some things are implied to the listener. Firstly, we probably share an agreed idea of what "local" means. A computer could have a pre-programmed value for "local", but that value would have to vary greatly with many variables; "local" in New York City probably differs from what local means in Durham, NH. In addition, what type of store it is might be clear to a human from previous conversation, but the machine lacks the clues inferred from that context.
Both high-context and low-context communication styles exist, and different cultures have developed efficient communication with each. This strongly dictates what is implied when someone talks, and how much they actually have to say to communicate a point. Strangely enough, it is not clear whether high-context or low-context languages are easier to handle when parsing, because both come with their own set of problems. In a high-context language, where context is frequently implied but not stated, assuming the wrong context would be fatal to the analysis. In a low-context language, on the other hand, the context may switch several times without a sentiment analysis machine realizing it.
For example, if you ran all of Stephen Colbert's satirical material through a political sentiment analysis machine, it would come back with very right-leaning scores, similar to Rush Limbaugh or someone else on that side of the spectrum. The machine would lack the context to know when a piece is satire; it doesn't know that Mr. Colbert is a comedian, or that much of what he says is sardonic in nature. The machine lacks any notion of hyperbole or sarcasm, human behaviors that can confuse even other humans.
But still, the subtleties of human communication are what make sentiment analysis and NLP an interesting field. Or, I should say, that is what makes humans interesting. There is much more to sentiment analysis than I have had time to go over today, and I recommend that anyone interested do some research of their own. This is a loosely defined area where people have not yet come to a conclusion on the best way to do things.