Facial recognition is an interesting topic in Artificial Intelligence. In this subfield there is math being used from multiple topics, some heavy Linear Algebra and Bayesian Statistics. It is also important because facial recognition is used everywhere today, from Facebook identify your friends in a photo, to Google street view blurring out someones face. The heavy use of these algorithms brings up ethical issues, such as privacy concerns and data rights. Todays blog post however, will focus on the underlying technology that some of the algorithms are based on.
First and foremost, there must be way to mathematically break down someones face, in order to quantify it. One of the main ways this is done is through a method called principal component analysis (PCA). Without getting to much into the math, what principal component analysis does is it breaks down the face into a linear basis based off of pieces (or the features) of a humans face. By representing the face this way, you have a way of measuring the individual magnitude of the features a persons face. This allows us to have a possible unique representation of a face as a linear basis. PCA is closely related to using a Singular Valued Decomposition as a form of measurement. Note there are other methods, like independent component analysis or simply using the Eigen face (a face represented by Eigen values) that rely upon the same methodology of representing a face as a linear basis.
The second part to the aforementioned algorithms is using some sort of statistical analysis on the newly acquired linear basis. The statistical can be some sort of Bayesian methodology, wether it be a simple Hidden Markov Model or a full featured network. Which can either be achieved though unsupervised or supervised data set processing, depending on the choice made by the developer. Like all training methods, the recogintion made by the algorithms is therefore dependent on the data sets used.
The only method I have seen found that tends to break from decomposing a face into a linear basis is a method called Dynamic Link Matching, which instead decomposes the face into a graph made by stretching the face over a 2d lattice layer by layer. However, this is still in some putting the information into layers (you could argue that is dimensions), simply this method is not concerned with maintaining orthogonality. It just creates a 'blob' which is then fed through a neural network, and has some similar statistical analysis run on it which matches it to the closest model. This method leverages the power a neural network gives simply by passing off the face as a 'blob' of information.
Some of the challenge facial recognitions has are still similar to the challenges humans have, such as poor viewing angles, improper lighting and an object obstructing someones face. This is because if you are missing any region of the face, the principal component analysis will be off. If the decomposition used is missing information about the domain, it is invalid. Only having one side of the face completely foils everything. Another thing that can throw off the principal component analysis is if someone is making a exaggerated gesture of their face. These algorithms assume that people faces are more less static, not changing very much.
No comments:
Post a Comment
Note: Only a member of this blog may post a comment.