Fun With Artificial Neural Networks
Part 2



Review

In our last article we covered some basic concepts in artificial neural networks. A neural network, again, is simply a network of neurons: simple processing elements that accept one or more input values and output a single value based on those inputs and the current state of the neuron. This state is stored in a set of weights, one per input. Each input value is multiplied by its corresponding weight, and all the resulting products are summed to get an activation value, which determines whether the neuron "fires" or not. This determination is made by a (usually) simple output function, which outputs a large value when the activation is above a certain threshold and a small value otherwise.
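
To make the description concrete, here is a minimal Python sketch of a single neuron. It assumes a step output function with a threshold of 0.0 that outputs 1.0 (the "large" value) or 0.0 (the "small" value). The names `step` and `neuron_output` are hypothetical, chosen only for illustration.

```python
def step(activation, threshold=0.0):
    """Output function: 1.0 above the threshold, 0.0 otherwise."""
    return 1.0 if activation > threshold else 0.0


def neuron_output(inputs, weights, output_fn=step):
    """Multiply each input by its weight, sum the products, apply the output function."""
    if len(inputs) != len(weights):
        raise ValueError("need exactly one weight per input")
    activation = sum(x * w for x, w in zip(inputs, weights))
    return output_fn(activation)


print(neuron_output([1.0, 0.5], [0.4, 0.2]))   # activation 0.5, so the neuron fires
print(neuron_output([1.0, 0.5], [-0.4, 0.2]))  # activation about -0.3, so it does not
```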

We also went through a simple design example using a single artificial neuron, or perceptron, to perform the boolean AND function, but found that our model wasn't quite up to the task. This is a minor, easily fixed detail that we need to get out of the way to have a general-purpose neuron that can perform its main task: separating an n-dimensional space into two parts, effectively classifying points in that space based on the inputs. I'll explain that complicated-sounding function in simpler terms later; for now, let's fix our perceptron. (For the most part this article assumes that you have read or have access to the previous article, particularly its figures.)
 

Perceptrons and n-dimensional input space

Recall that our two-input perceptron, attempting to emulate a 2-input AND gate, failed because it could not separate its input space into two regions, with the inputs (0,0), (0,1) and (1,0), which should output 0, on one side and the input (1,1), which should output 1, on the other. The function a perceptron performs is to divide its input space, the range of all possible input values, into two regions: one region represents an output of 1.0 and the other an output of 0.0. The problem with our perceptron is that it can only separate its input space with a line through the origin, not with an arbitrary line, and the AND function needs a line that does not pass through the origin. The fix is to add an imaginary input to the perceptron that is always set to 1.0. This input has its own weight like all the others; only its input value is fixed at 1.0. So a perceptron with n inputs actually has n+1 inputs once we include this extra input, called the bias input. The next section explains, with a fair bit of math, exactly why we need the bias input; it can be skipped.
 

Bias Input Value

$$f_p(x_0, x_1, \ldots, x_{n-1}) \in \{0.0,\ 1.0\}$$

This equation describes the function performed by a perceptron: it accepts n inputs, x0 through xn-1, processes them, and outputs either 0.0 or 1.0. The inputs define an n-dimensional input space over which there are only two possible outputs, 0.0 and 1.0. Because the perceptron thresholds a linear combination of the inputs and some constant weights, this space is divided into two regions by a surface of dimension n-1, one less than the input space. A perceptron with one input, for example, divides its one-dimensional input space (a line) with a 0-dimensional "surface" (a point):

$$f_p(x_0) = f(x_0 w_0)$$

Notice that (for a nonzero weight w0) all positive values of x0 produce one output and all negative values produce the other, meaning the one-dimensional input space is divided at the origin, x0 = 0.0. A perceptron with two inputs divides its two-dimensional input space (a plane) with a 1-dimensional "surface" (a line):

$$f_p(x_0, x_1) = f(x_0 w_0 + x_1 w_1)$$

Here the boundary condition is x0*w0 + x1*w1 = 0.0, which gives the family of lines x1 = -(w0/w1) * x0, all of which pass through the origin x0 = x1 = 0.0, x0 and x1 being the axes of the input space. Higher-dimensional input spaces behave the same way: a 3-input perceptron has its 3-dimensional input volume divided by a plane through the origin, a 4-input perceptron has its 4-D input space divided by a 3-D volume through the origin, and so on. The problem with these divisions is that they aren't general enough: every dividing surface must pass through the origin. This problem is corrected with what's called a bias input value.

The solution is easy to see when we look at the equation of the 2-input perceptron's dividing line:

$$x_1 = -\frac{w_0}{w_1}\,x_0$$

This line has a slope of -w0/w1 but an intercept of 0 on the x1 axis (the "y"-intercept). In fact, for n dimensions the general equation for the surface separating the input space into two regions is

$$x_{n-1} = -\frac{w_0}{w_{n-1}}\,x_0 - \frac{w_1}{w_{n-1}}\,x_1 - \cdots - \frac{w_{n-2}}{w_{n-1}}\,x_{n-2}$$

Again the equation shows that the surface must go through the origin, since there's no "y"-intercept or constant value added. Here's where the bias input value comes in; it supplies the needed constant value to allow the surface to be anywhere in the input space and not require it to pass through the origin. To the perceptron the bias input value looks like an extra input with its own weight like all the others, but this input is set to 1.0 and never changes. With this bias input added, the equation for the n-input perceptron now looks like this:
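
The effect of the bias input can be checked numerically on the AND problem. The sketch below assumes a step output function that fires (outputs 1) only for a strictly positive sum, and tries every weight combination on an arbitrary grid from -2.0 to 2.0: without a bias weight no pair reproduces AND, while with one, many sets do. A grid search only illustrates the point; the short argument is that input (1,0) forces w0 <= 0 and input (1,0) forces w1 <= 0, so the (1,1) sum w0 + w1 can never be positive. The names `is_and` and `AND_TABLE` are hypothetical.

```python
import itertools

# Truth table for AND: ((x0, x1), desired output)
AND_TABLE = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]


def step(activation):
    """Fire (1) only for a strictly positive weighted sum."""
    return 1 if activation > 0.0 else 0


def is_and(w0, w1, bias_weight=0.0):
    """True if the 2-input perceptron reproduces AND; bias_weight=0.0 means no bias input."""
    return all(step(x0 * w0 + x1 * w1 + 1.0 * bias_weight) == desired
               for (x0, x1), desired in AND_TABLE)


grid = [i / 10 for i in range(-20, 21)]  # -2.0 .. 2.0 in steps of 0.1

without_bias = [(w0, w1) for w0, w1 in itertools.product(grid, repeat=2) if is_and(w0, w1)]
with_bias = [w for w in itertools.product(grid, repeat=3) if is_and(*w)]

print(len(without_bias))  # 0: no line through the origin separates (1,1) from the rest
print(len(with_bias))     # nonzero once the bias weight can shift the line
```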

$$\begin{aligned} \text{output} &= f(x_0 w_0 + x_1 w_1 + \cdots + x_{n-1} w_{n-1} + 1.0 \cdot w_n) \\
&= f(x_0 w_0 + x_1 w_1 + \cdots + x_{n-1} w_{n-1} + w_n) \end{aligned}$$
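
Here is a minimal Python sketch of that n-input perceptron with its bias input, assuming the same 0.0/1.0 step output function. The name `perceptron` and the convention of storing the bias weight w_n last are illustrative choices, not code from this series. The one-input example shows the bias weight moving the dividing point away from the origin.

```python
def step(activation):
    """Output 1.0 for a strictly positive activation, 0.0 otherwise."""
    return 1.0 if activation > 0.0 else 0.0


def perceptron(inputs, weights):
    """n inputs and n + 1 weights; the last weight, w_n, belongs to the bias input."""
    if len(weights) != len(inputs) + 1:
        raise ValueError("expected one weight per input plus one bias weight")
    activation = sum(x * w for x, w in zip(inputs, weights))  # zip stops before w_n
    activation += 1.0 * weights[-1]                           # bias input is always 1.0
    return step(activation)


# One input with w0 = 1.0 and bias weight w1 = -0.5: the dividing point moves
# from x0 = 0.0 to x0 = 0.5, so it no longer sits at the origin.
for x0 in (0.0, 0.25, 0.75, 1.0):
    print(x0, perceptron([x0], [1.0, -0.5]))
```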

Showing that this equation does what it claims is trivial and left as an exercise to the reader; I'm behind schedule on explaining the really neat stuff.
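
For the two-input case the exercise takes two lines: set the weighted sum, including the bias term, to zero and solve for x1 (assuming w1 is nonzero). The slope is unchanged, but the bias weight now contributes an intercept of -w2/w1, so choosing w2 places the line anywhere in the plane. The same steps give the general n-input surface.

```latex
\begin{aligned}
x_0 w_0 + x_1 w_1 + w_2 &= 0 \\
x_1 &= -\frac{w_0}{w_1}\,x_0 - \frac{w_2}{w_1} \\[6pt]
x_{n-1} &= -\sum_{i=0}^{n-2} \frac{w_i}{w_{n-1}}\,x_i - \frac{w_n}{w_{n-1}}
\end{aligned}
```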

AND Function Revisited

With this correction, our 2-input AND perceptron now has three inputs and just as many weights (an object-oriented implementation of a perceptron can hide the bias input from the outside world, since it is always fixed at 1.0). Let's try to find weights that correctly yield the AND function, say 0.3, -0.5 and -0.9 for w0, w1 and w2 (the bias weight), respectively:
 

x0   x1   desired output   actual output
0    0    0                0
0    1    0                0
1    0    0                0
1    1    1                0

This set of weights comes close, but not quite. The weights need to be adjusted so that the last input pair gives the correct output. For the perceptron to output 1.0, the sum of the products of the inputs and their respective weights, plus the bias weight, must be positive. For an input of (1.0, 1.0) we get 1.0*0.3 + 1.0*(-0.5) + (-0.9) = -1.1. Let's try raising w1, say to 0.7. Recalculating the table, we find that the set of weights {0.3, 0.7, -0.9} works: the (1.0, 1.0) input now gives 0.3 + 0.7 - 0.9 = 0.1, while the other three inputs still give negative sums. In fact, there is no single correct set of weights; many different sets work, and this is what gives neural networks a measure of robustness. A single weight can change slightly without making a large change in the output of a perceptron. Now let's connect perceptrons together to make a neural network. We'll also change our output function from the simple sign() function to one that's similar in shape but differentiable.
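
The table recalculation can be done mechanically. This sketch assumes the step output function fires (outputs 1) only for a strictly positive sum, and checks the first weight set (0.3, -0.5, -0.9) and the same set with w1 raised to 0.7. The helper names are hypothetical.

```python
def step(activation):
    """Output 1 for a strictly positive sum, 0 otherwise."""
    return 1 if activation > 0.0 else 0


def and_perceptron(x0, x1, w):
    """w = (w0, w1, w2); w2 is the weight on the bias input, fixed at 1.0."""
    return step(x0 * w[0] + x1 * w[1] + 1.0 * w[2])


def truth_table(w):
    """Rows of (x0, x1, desired output, actual output)."""
    return [(x0, x1, x0 & x1, and_perceptron(x0, x1, w)) for x0 in (0, 1) for x1 in (0, 1)]


for weights in [(0.3, -0.5, -0.9), (0.3, 0.7, -0.9)]:
    print("weights", weights)
    for x0, x1, desired, actual in truth_table(weights):
        print(x0, x1, desired, actual)
```

Nearby values such as (0.3, 0.8, -0.9) also reproduce AND, which is the robustness described above.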
 

Neural Networks

A neural network is just a set of connected neurons, the output of one feeding the inputs of one or more others. Neural networks are traditionally organized into layers of neurons, each layer consisting of one or more neurons. There are three categories of layers: input layer, output layer, and hidden layer. An input layer is the input to the entire neural network, an output layer is the network's output, and a hidden layer is a layer of neurons between the input and output layers. There is only one input layer and one output layer in a neural network; there may be more than one hidden layer in a neural network.

The neurons in each layer differ slightly from the perceptron model we've been working with. The hidden layer neurons are unchanged. The input layer neurons have only a single input. The output layer neurons may or may not have an output function, depending on the format of the output you want. Also, the neurons in all layers except for the input layer have bias input values.

The input layer's sole purpose is to accept input from the outside world, one input value per neuron. It is your job to obtain this input and format it for the input layer. Let's look at a real-world example: character recognition. We'll design a neural net that reads a handwritten character and tells us which letter of the English alphabet it is. The first step is getting the input into a format the neural net can understand, and to do this we need to define our input. Since our input is a handwritten character, it makes sense to represent the character as a 2-dimensional array of pixels. One could draw a letter on an input tablet or with a mouse and generate an m x n pixel image of the character, where pixels containing part of a stroke (dark pixels) are set to 1 and all others are set to 0. We can feed all m*n of these pixels into the neural network by giving it an input layer with m*n neurons, each receiving the state of its corresponding pixel from the image data. More pixels mean better resolution, but also more input neurons and more processing. Other applications would have different input data: aircraft speed and attitude, voice data, text from a chat room. The neural net is good at classifying its inputs into categories; you merely have to provide it with all the inputs that are significant to your problem.
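
Turning the pixel grid into one value per input neuron is a simple flattening step. This sketch assumes the image is a list of rows with 1 for dark pixels and 0 otherwise, and uses row-major order; any fixed order works, as long as the same one is used for training and recognition. The function name and the 5 x 3 "L" bitmap are hypothetical.

```python
def image_to_inputs(image):
    """Flatten an m x n grid of pixels (1 = dark, 0 = blank) into m*n input values,
    one per input neuron, in row-major order."""
    m = len(image)
    n = len(image[0]) if m else 0
    if any(len(row) != n for row in image):
        raise ValueError("all rows must have the same length")
    return [float(pixel) for row in image for pixel in row]


# A hypothetical 5 x 3 bitmap of a capital "L".
letter_l = [
    [1, 0, 0],
    [1, 0, 0],
    [1, 0, 0],
    [1, 0, 0],
    [1, 1, 1],
]
inputs = image_to_inputs(letter_l)
print(len(inputs))  # 15 values, one for each of the 5 * 3 input neurons
```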

The input neurons, having only one input each, simply multiply the value at that input by their weight, send the result through the output function, and pass the output to every neuron in the second layer, which is almost always a hidden layer. Most neural networks have at least three layers. The weights in the input layer essentially serve as input scaling values, although inputs are generally pre-scaled to between -1.0 and 1.0 before being fed to the network.
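
The article doesn't say how to pre-scale inputs; one common choice, assumed here, is a linear mapping from a known range [lo, hi] onto [-1.0, 1.0]. The function name and the 0 to 600 knot airspeed range are hypothetical.

```python
def scale_to_unit_range(value, lo, hi):
    """Linearly map value from the known range [lo, hi] onto [-1.0, 1.0]."""
    if hi <= lo:
        raise ValueError("hi must be greater than lo")
    return 2.0 * (value - lo) / (hi - lo) - 1.0


# Hypothetical example: an airspeed input known to lie between 0 and 600 knots.
print(scale_to_unit_range(0, 0, 600))    # -1.0
print(scale_to_unit_range(300, 0, 600))  # 0.0
print(scale_to_unit_range(600, 0, 600))  # 1.0
```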

The hidden layer(s) do most of the work, along with the output layer. The weights of the neurons in these layers hold the learned information of the neural net and allow it to classify the input (which I'll explain shortly). Each of these layers takes the outputs of the previous layer, processes them, and sends the results to the next layer.

The output layer is similar to a hidden layer, except that, depending on the form of output you want, its neurons may or may not use an output function: use one if you want boolean outputs from your network, and leave it out if you want linear outputs. For our character recognition example, a neural network designer would typically create an output layer with as many neurons as there are possible responses from the network. To recognize all uppercase letters, we would have 26 output neurons, each with an output function. The network would be trained, based on the pixel input data, to output 1.0 at exactly one output neuron, the one corresponding to the letter drawn, and 0.0 at all the others. This is the classification function performed by our neural network.
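
The target encoding for the 26 output neurons can be sketched in Python. The names `LETTERS`, `target_vector` and `decode` are hypothetical. The `high`/`low` parameters default to the 1.0/0.0 targets described above. Reading the network's answer as the letter whose neuron produced the largest output is an assumption, since the text only specifies the training targets.

```python
import string

LETTERS = string.ascii_uppercase  # one output neuron per letter, A..Z


def target_vector(letter, high=1.0, low=0.0):
    """Training target: `high` at the neuron for `letter`, `low` at the other 25."""
    index = LETTERS.index(letter)
    return [high if i == index else low for i in range(len(LETTERS))]


def decode(outputs):
    """Assumed read-out rule: pick the letter whose output neuron is largest."""
    if len(outputs) != len(LETTERS):
        raise ValueError("expected one output per letter")
    best = max(range(len(outputs)), key=lambda i: outputs[i])
    return LETTERS[best]


t = target_vector("C")
print(t[:4])      # [0.0, 0.0, 1.0, 0.0]
print(decode(t))  # C
```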

As mentioned earlier, we will now replace the simple sign() output function with a smooth approximation that is more amenable to the training procedure:

$$f_o(x) = \frac{1}{1 + e^{-x}}$$

This function (the logistic, or sigmoid, function) is similar in shape to the one we were using, but has no abrupt corner at x = 0. This means it is differentiable (has a well-defined slope) at every point, and we need the derivative of the output function for the training process. Another difference is that this function never actually reaches 0.0 or 1.0; it only approaches them as x goes to negative or positive infinity. So we can't expect exact boolean outputs of 0.0 and 1.0 from our output neurons. Instead we will use values close to 0.0 and 1.0, such as 0.1 and 0.9, as our low and high training target values.
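
Here is a short Python sketch of this output function and its derivative, using the standard identity f_o'(x) = f_o(x) * (1 - f_o(x)). The function names are hypothetical. The two-branch form computes the same value while keeping `math.exp` from overflowing for large negative x.

```python
import math


def sigmoid(x):
    """f_o(x) = 1 / (1 + e^-x); mathematically always strictly between 0 and 1."""
    if x >= 0.0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)  # same value, rearranged so exp() cannot overflow for large negative x
    return z / (1.0 + z)


def sigmoid_derivative(x):
    """Slope of the sigmoid at x, via f_o'(x) = f_o(x) * (1 - f_o(x))."""
    fx = sigmoid(x)
    return fx * (1.0 - fx)


for x in (-4.0, 0.0, 4.0):
    print(x, sigmoid(x), sigmoid_derivative(x))
```

At x = 0 the output is 0.5 and the slope is at its maximum, 0.25. A target of 0.9 is reached at the finite activation x = ln 9 (about 2.2), whereas an exact 1.0 would need an infinite activation.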

This arrangement of layers is called a feedforward network, since an input propagates forward through the neurons to the output. The layered organization makes it simple to write an algorithm that processes an applied input one layer at a time. The algorithm is as follows:

1. current_layer = input layer
2. while(current_layer)
2a. current_neuron = first neuron in layer
2b. while(current_neuron)
2b.1. calculate output of current_neuron
2b.2. current_neuron = next neuron in layer, or NULL if none
2b.3. end while
2c. current_layer = next layer in network, or NULL if none
2d. end while
3. end
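
The loop above translates almost line for line into Python. This is a minimal sketch under several assumptions. The network is stored as nested lists of weights, a hypothetical representation rather than the object-oriented design promised for next time. The sigmoid is the output function in every layer, and each non-input neuron stores its bias weight last. Input neurons have one weight and no bias, as described earlier. Setting `squash_output_layer=False` gives linear outputs.

```python
import math


def sigmoid(x):
    """Output function f_o(x) = 1 / (1 + e^-x), written to avoid overflow."""
    if x >= 0.0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def feed_forward(inputs, layers, squash_output_layer=True):
    """Propagate `inputs` through `layers` one layer at a time; return the last layer's outputs.

    layers[0] is the input layer: one single-weight neuron per input value, no bias.
    Each neuron in a later layer is a list of weights, one per neuron in the
    previous layer, followed by the weight for its bias input (fixed at 1.0).
    """
    if len(inputs) != len(layers[0]):
        raise ValueError("need one input value per input neuron")
    # Input layer: each neuron multiplies its single input by its weight.
    outputs = [sigmoid(x * neuron[0]) for x, neuron in zip(inputs, layers[0])]
    last = len(layers) - 1
    for depth in range(1, len(layers)):             # next layer, until none remain
        next_outputs = []
        for neuron in layers[depth]:                # each neuron in the layer
            *weights, bias_weight = neuron
            if len(weights) != len(outputs):
                raise ValueError(f"layer {depth}: expected {len(outputs)} input weights")
            activation = sum(o * w for o, w in zip(outputs, weights)) + 1.0 * bias_weight
            if depth == last and not squash_output_layer:
                next_outputs.append(activation)     # linear output, no output function
            else:
                next_outputs.append(sigmoid(activation))
        outputs = next_outputs                      # this layer feeds the next one
    return outputs


# A hypothetical 2-2-1 network with arbitrary, untrained weights.
network = [
    [[1.0], [1.0]],                          # input layer: 1 weight per neuron
    [[0.5, -0.4, 0.1], [0.3, 0.8, -0.2]],    # hidden layer: 2 weights + bias weight
    [[1.2, -0.7, 0.05]],                     # output layer: 2 weights + bias weight
]
print(feed_forward([1.0, 0.0], network))
```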

Next time we'll look at an object-oriented design for a neuron and a neural net, and hopefully get into training the thing, which is really the cornerstone of the neural net.