Fun With Artificial Neural Networks



Introduction

For centuries, man(1) has wondered if he could ever build an artificial version of himself, a machine that could think, reason, and most importantly learn as he did. Not until the advent of the digital computer has this dream come so close to fruition. Today computers capable of nearly one billion instructions per second sit on the desktops of millions of people around the world, both at work and at home, yet we still don't seem much closer to an almost-human computer. This series of articles covers an important computing algorithm in the arsenal of Artificial Intelligence (AI) research, the Artificial Neural Network (ANN). It hopes to change this state of affairs by explaining in simple terms some of the theory behind the ANN and walking through its design.



This article assumes a basic knowledge of some procedural computer language. I chose to use C++ for my designs for its object-orientedness and my familiarity with the language, but I will try to make the code as language-independent as possible. A fair amount of math, including some calculus, is needed to understand how a neural net is trained, but much of it can be skipped if you understand how the network works. Hopefully by the end of this article you will be able to construct an example neural network and start showing your computer how to learn.



What is a Neural Network?

An artificial neural network, or 'neural net' for short(2), is a network of computing elements designed to accept a set of stimuli as input, process those stimuli, and return some set of stimuli as output. This black box view of a neural net stresses its resemblance to a function in a programming language; in this respect, there doesn't seem to be very much special about this functional unit. As an example problem to pass to this black box, say we want to determine from a black-and-white photograph of a person's face whether the person is male or female. We could make the inputs to our black box correspond to the darkness levels of every pixel in a scan of the picture (say a 640x480x8 bit, or 307,200 pixel x 256 values/pixel scan), and give the box two outputs, one for female and one for male. If the black box thinks the input corresponds to a female, the 'female' output would be 1.0 (or near 1.0) and the 'male' output would be near 0.0 (vice-versa if the box thinks the input is a male). All we need is an algorithm to stick into the box that can 'magically' determine this characteristic.



We often solve problems by trying to discover an algorithm for solving the problem, then implementing that algorithm. To play a compact disc we reason that we must turn the CD into sound, we know we have a device to do this, and that the CD must be placed into the device and the device must be told to play it. If we had to automate this process, all we would have to do is look at the above steps, or algorithm, and turn it into a mechanism that follows the same steps. The same holds for our Gender Classifier example; generally a programmer first works out how he or she determines a person's gender from a picture, then attempts to code these features into an algorithm. There are several features we humans use to categorize people by gender: makeup, hair length, facial hair, bone structure, ornamentation, etc. A programmer would have the daunting task of sifting through the 300k pixels of data looking for features; the resulting code could easily become huge, specifying every possible combination of features, yet still fail to correctly classify many new pictures. And then someone says, "Why not teach the algorithm to perform the task itself?"



Neurons

A neural network is modelled after how biological neurons are thought to work and store information. Basically a neuron is a single cell which accepts stimuli from other nearby neurons (its input) and, if the stimuli are above a certain level, fires a stimulus to nearby listening neurons (its output). When you accidentally touch a hot stove, the pain nerve cells generate stimuli which are transmitted to listening nerve cells. Those stimuli would be really large, causing the listening nerves to exceed a certain level and making them fire, causing the nerves listening to them to fire, etc., etc., until the signal reaches your spinal cord, a long, complex bundle of nerves. Certain nerves would fire according to this arriving signal, and others wouldn't. The ones that would fire happen to belong to the nerve path leading to the muscles in your arm, causing you to jerk your hand out of the way. It's this selective firing of neurons that is thought to be where information is stored and learning takes place, and this is what artificial neural networks attempt to model.



A neural network is simply a network of neurons. For our artificial neural network we have to model a single neuron, then connect several of them together in some sort of network. The basic characteristics of a neuron are as follows:



This model of a neuron is called a perceptron. The leftmost lines are inputs, numbered x0 through x(n-1). Each accepts a stimulus represented by a real number, usually between 0.0 and 1.0. Each input is scaled by its own real-valued weight, w0 through w(n-1): the input x value is multiplied by its corresponding w value. Each perceptron maintains its own set of weights, and they are allowed to change. In fact, the weights are where information in a neural net is stored: training consists of providing the perceptron a set of inputs and adjusting the weights until it produces the desired output. After each input is scaled by its weight, the resulting products are summed to get an excitation value. This value determines whether the perceptron fires; if over a certain value, the perceptron outputs 1.0, otherwise it outputs 0.0. This is the job of the output function, which in its simplest variation is f(x) = sign(x)/2.0 + 0.5. This function outputs 1.0 if x >= 0.0 and 0.0 if x < 0.0 (taking sign(x) = 1.0 for x = 0.0). A problem with this function is that it's not continuous at x=0, making it non-differentiable there. The output function needs to have a derivative if our training function is going to work later on. Other functions use e^x to approximate this simple output function with a curve that has a derivative; we'll look at these later.
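
The description above can be sketched in C++ roughly as follows. This is a minimal illustration, not the design used later in this series: the names Perceptron, excitation, output and step are made up for this example, and step follows the 'outputs 1.0 if x >= 0.0' convention.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Simplest output function from the text: fire (1.0) when the
// excitation is >= 0.0, otherwise stay quiet (0.0).
double step(double x) {
    return x >= 0.0 ? 1.0 : 0.0;
}

// Hypothetical perceptron type, for illustration only.
struct Perceptron {
    std::vector<double> weights;  // w0 .. w(n-1), one per input

    // Scale each input by its weight and sum the products.
    double excitation(const std::vector<double>& inputs) const {
        assert(inputs.size() == weights.size());
        double sum = 0.0;
        for (std::size_t i = 0; i < weights.size(); ++i) {
            sum += inputs[i] * weights[i];
        }
        return sum;
    }

    // Pass the excitation through the output function.
    double output(const std::vector<double>& inputs) const {
        return step(excitation(inputs));
    }
};
```

A perceptron built as Perceptron p{{0.5, -1.0}} has two inputs; p.output({1.0, 0.25}) scales and sums them to an excitation of 0.25 and therefore fires. Keeping the weights in a plain vector makes it easy to adjust them during training, which is where the learning happens.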



An example should make it easier to see how a perceptron works. Let's train a simple 2-input perceptron to behave like an AND gate, outputting a 1.0 when the inputs are both 1.0. The weights w0 and w1 are initialized to random, small numbers. Our objective is to find the values of w0 and w1 such that the inputs of a training set times the weights, summed and passed to the output function equal the corresponding outputs of the training set. That is, for each pair (x0, x1) in the set {(0,0), (0,1), (1,0), (1,1)} (the inputs of the training set), calculating the output yields the corresponding member of the set {0, 0, 0, 1} (this is the AND function).



We can derive the generic equation represented by our example 2-input perceptron pretty simply. The output is found by multiplying each input to the perceptron by its corresponding weight, summing the products, and passing this sum to the output function:



x0w0 + x1w1 = sum

f(sum) = output

output = f(x0w0 + x1w1)



We have a definition for f(), a set of four (x0, x1) input pairs from our training set, and a set of four training set outputs that correspond to the input pairs; all we need now is to find w0 and w1 such that the output is correct for every input pair in the training set. Let's pick some random values for the weights, say 0.3 and -0.1, and find the actual outputs from our training input pairs:

x0    x1    desired output    actual output
0     0     0                 0
0     1     0                 0
1     0     0                 1
1     1     1                 1
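
The table can be reproduced with a few lines of C++. This is a sketch under stated assumptions: the function names are illustrative, and the output function here treats a sum of exactly 0 as 'not firing'. That choice is what matches the (0, 0) row of the table; under the 'x >= 0.0' convention that row's actual output would read 1 instead.

```cpp
#include <array>

// Output function with the tie at 0 resolved to "not firing".
// Assumption: chosen to match the (0, 0) row of the table.
double fires(double sum) {
    return sum > 0.0 ? 1.0 : 0.0;
}

// output = f(x0*w0 + x1*w1) for a 2-input perceptron.
double perceptron2(double x0, double x1, double w0, double w1) {
    return fires(x0 * w0 + x1 * w1);
}

// Actual outputs for the four AND training inputs, in the order
// (0,0), (0,1), (1,0), (1,1).
std::array<double, 4> and_outputs(double w0, double w1) {
    return {{perceptron2(0, 0, w0, w1), perceptron2(0, 1, w0, w1),
             perceptron2(1, 0, w0, w1), perceptron2(1, 1, w0, w1)}};
}
```

Calling and_outputs(0.3, -0.1) gives the 'actual output' column: the sums are 0, -0.1, 0.3 and 0.2, so only the last two inputs fire.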


Looking at the actual outputs, our guesses for weights were pretty good, except that the third training example didn't yield the desired output. In fact, if you try to find a pair of real-valued weights that will make our perceptron perform the AND function you won't be able to do it. (Can you see why?) Let's plot a graph of the AND function, with the inputs being the X and Y axes and placing '0' on the graph where the output is 0 and '1' where the output is 1:

This graph diagrams the AND function. There are 3 '0's on the graph corresponding to the three inputs that yield 0, and one '1' for the one input that outputs 1 (using the name 'x' for both axes of a graph might take a bit of getting used to). Our perceptron has to somehow differentiate between those two classes of outputs (1 and 0) based on the inputs they correspond to (the axes). In other words, it must somehow divide this graph into two different regions, so that some inputs yield 0 and all the rest yield 1. This graph shows a possible line dividing the graph into two regions. The '1' region corresponds to all inputs to the above function that yield 1, and similarly for the '0' region. Now this possible division would make a correct AND function, but our perceptron cannot (yet) divide the AND graph like this; to see why, we have to look at how the perceptron categorizes inputs by separating the input graph into regions.



Which region an input falls in depends on what goes into our output function f(x). If x is positive, f(x) returns 1; if x is negative, f(x) returns 0. So x determines what side of the line in the above graph we are on. The line separating the two regions corresponds to x=0 (the f(x) we defined above actually maps x=0 to 0.5 if sign(0) returns 0, but this shouldn't change our analysis). Therefore the argument to our output function, (x0w0 + x1w1), determines our regions, as well as our line. To find out what line our perceptron can represent, all we have to do is set (x0w0 + x1w1)=0 and solve for x1:

x0w0 + x1w1 = 0

x1w1 = -x0w0

x1 = -x0(w0/w1)

x1 = (-w0/w1) x0 + 0

In this form, x1 corresponds to the Y axis and x0 to the X axis. This is the equation of a line (assuming w1 is not 0); from it we can see that the slope of the dividing line of our perceptron is -w0/w1, and that the y-intercept is always 0. What this shows is that any dividing line that our perceptron could possibly generate for any combination of weights has to pass through the origin! There is no way to draw a line through the origin of the AND graph and separate the AND graph into two regions where all the '0's are on one side and all the '1's are on the other. What we need is a perceptron that can represent any line (this isn't hard...can you see how to do this?).
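
The claim that no pair of weights works can be spot-checked by brute force. The sketch below is only an illustration, not a proof: it tries every (w0, w1) on a coarse grid and counts how many reproduce the AND outputs. The grid range of -2.0 to 2.0, the 0.1 step, and all function names are arbitrary choices for this example. A sum of exactly 0 is assumed to mean 'not firing'; with the opposite convention the (0, 0) input, whose sum is always 0, would always come out wrong.

```cpp
// Output function; a sum of exactly 0 is assumed not to fire.
double threshold(double sum) {
    return sum > 0.0 ? 1.0 : 0.0;
}

// True if weights (w0, w1) make the 2-input perceptron from the
// text reproduce AND on all four training inputs.
bool computes_and(double w0, double w1) {
    const double inputs[4][2] = {{0, 0}, {0, 1}, {1, 0}, {1, 1}};
    const double desired[4] = {0, 0, 0, 1};
    for (int k = 0; k < 4; ++k) {
        double sum = inputs[k][0] * w0 + inputs[k][1] * w1;
        if (threshold(sum) != desired[k]) return false;
    }
    return true;
}

// Count the grid points in [-2.0, 2.0] x [-2.0, 2.0] (step 0.1)
// whose weights compute AND.
int count_and_solutions() {
    int count = 0;
    for (int i = -20; i <= 20; ++i) {
        for (int j = -20; j <= 20; ++j) {
            if (computes_and(i * 0.1, j * 0.1)) ++count;
        }
    }
    return count;
}
```

count_and_solutions() comes back with 0. The reason is visible in the checks themselves: the (0, 1) and (1, 0) inputs force w1 and w0 to be at most 0, while the (1, 1) input needs w0 + w1 to be above 0, and both cannot hold at once.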



In the next installment I'll fix our perceptron so that it can separate an input space (e.g. our AND graph) into any two regions, explain why this is even important and what this all means in the big picture of neural networks, and begin connecting these little suckers together to make a neural network.

1. For brevity and in lieu of political correctness, singular masculine pronouns/nouns may be used to represent both masculine & feminine.

2. AI people make a distinction between 'neural network' and 'artificial neural network', the former including biological systems such as the human brain.