Machine Learning – Back Propagation
EECE 5136/6036: Intelligent Systems Homework 3
This homework will focus on a problem of “real-world application” size, though still quite small
compared to full-fledged applications. It will involve complex programming, simulation,
and reporting, and will take a long time. That’s why you are being given three weeks to do it. You will need
them, so please start immediately.
This and Homework 4 (also being posted at the same time) are, in fact, two parts of a single large
homework. This one is due March 30, and is the one that involves almost all the programming work.
Homework 4 will not require writing a program for any new algorithm; it will use the program
you develop for Homework 3 on the same dataset, with only small modifications.
It is extremely important that you save the final networks obtained in both problems of Homework 3
(i.e., their final weight matrices), as well as all associated data (the training and test sets selected, the error
on every test data point, and the learning rate and momentum used).
1. (200 points) One of the most widely used data sets for evaluating machine learning applications in the
image analysis area is the MNIST dataset, which provides images of handwritten digits and letters. In
this homework, you will use the numbers subset from this dataset.
Two data files are included:
• Image Data File: MNISTnumImages5000.txt is a text file that has data for 5,000 digits, each a grayscale
image of size 28 × 28 pixels (i.e., 784 pixels each). Each row of the data file has 784 values representing
the intensities of the image for one digit between 0 and 9. The first hundred images are shown in the
included file first100.jpg.
• Label Data File: MNISTnumLabels5000.txt is a text file with one integer in each row, indicating the
correct label of the image in the corresponding row in the image data file. Thus, the first entry '7'
indicates that the first row of the image data file has data for a handwritten number 7.
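If you work in Python with numpy, the two files can be read with something like the following sketch. It assumes the files are whitespace- or tab-delimited plain text with the names given above; adjust the delimiter (and the pixel scaling) to match the files you actually received.

    import numpy as np

    # Load the image and label files (assumed whitespace/tab-delimited).
    images = np.loadtxt("MNISTnumImages5000.txt")             # shape (5000, 784)
    labels = np.loadtxt("MNISTnumLabels5000.txt", dtype=int)  # shape (5000,)

    # If the pixel values are 0-255 rather than already in [0, 1],
    # rescale them so they match the sigmoid output range used later:
    # images = images / 255.0

    # Any row can be reshaped to 28 x 28 to view it as an image.
    first_image = images[0].reshape(28, 28)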
You need to do the following:
1. Write a program implementing multi-layer feed-forward neural networks and training them with
backpropagation including momentum. Your program must be able to handle any number of hidden
layers and hidden neurons, and should allow the user to specify these at run-time.
2. Randomly choose 4,000 data points from the data files to form a training set, and use the remaining
1,000 data points to form a test set.
3. Train a 1-hidden layer neural network to recognize the digits using the training set. You will probably
need a fairly large number of hidden neurons – in the range of 100 to 200 – and several output neurons.
I suggest using 10 output neurons – one for each digit – such that the correct neuron is required to
produce a 1 and the rest 0. To evaluate performance during training, however, you can use “target
values” such as 0.75 and 0.25, as discussed in class. You will probably need hundreds of epochs for
learning, so consider using stochastic gradient descent, where only a random subset of the 4,000 points
is shown to the network in each epoch. The performance of the network in any epoch is measured by
the fraction of correctly classified points in that epoch (Hit-Rate). Save this value at the beginning, and
then in every tenth epoch (as in Homework 2).
4. After the network is trained, test it on the test set. To evaluate performance on the test data, you can
use a max-threshold approach, where you consider the output correct if the correct output neuron
produces the largest output among all 10 output neurons.
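For those working in Python with numpy, the sketch below shows one possible skeleton for steps 1–4: a single-hidden-layer sigmoid network trained by backpropagation with momentum on a random subset of the training points each epoch, with the hit-rate recorded at the beginning and every tenth epoch and test performance judged by the max-output rule. It is only a sketch under those assumptions – the layer sizes, learning rate, momentum, epoch count, and subset size are placeholders to tune, and your full program must also handle an arbitrary number of hidden layers as required in step 1.

    import numpy as np

    rng = np.random.default_rng(0)

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    class OneHiddenLayerNet:
        # Sketch of a 784-h-10 sigmoid network trained by backprop with momentum.
        def __init__(self, n_in=784, n_hidden=150, n_out=10):
            # Small random initial weights; the last column of each matrix is the bias.
            self.W1 = rng.uniform(-0.1, 0.1, (n_hidden, n_in + 1))
            self.W2 = rng.uniform(-0.1, 0.1, (n_out, n_hidden + 1))
            self.dW1 = np.zeros_like(self.W1)   # previous updates, for momentum
            self.dW2 = np.zeros_like(self.W2)

        def forward(self, x):
            x1 = np.append(x, 1.0)              # append bias input
            h1 = np.append(sigmoid(self.W1 @ x1), 1.0)
            y = sigmoid(self.W2 @ h1)
            return x1, h1, y

        def train_step(self, x, target, eta=0.1, alpha=0.9):
            x1, h1, y = self.forward(x)
            # Output-layer delta for squared error with sigmoid units.
            d_out = (y - target) * y * (1.0 - y)
            # Hidden-layer delta (drop the bias column when backpropagating).
            d_hid = (self.W2[:, :-1].T @ d_out) * h1[:-1] * (1.0 - h1[:-1])
            # Momentum updates: new step = -eta * gradient + alpha * previous step.
            self.dW2 = -eta * np.outer(d_out, h1) + alpha * self.dW2
            self.dW1 = -eta * np.outer(d_hid, x1) + alpha * self.dW1
            self.W2 += self.dW2
            self.W1 += self.dW1

        def predict(self, x):
            # Max-output rule: the output neuron with the largest value wins.
            return int(np.argmax(self.forward(x)[2]))

    def hit_rate(net, X, labels):
        return float(np.mean([net.predict(x) == y for x, y in zip(X, labels)]))

    def train(net, X_train, y_train, epochs=500, subset=500, eta=0.1, alpha=0.9):
        history = []
        for epoch in range(epochs):
            sample = rng.choice(len(X_train), size=subset, replace=False)
            if epoch % 10 == 0:     # record at the beginning and every tenth epoch
                history.append((epoch, hit_rate(net, X_train[sample], y_train[sample])))
            for i in sample:
                target = np.full(10, 0.25)      # soft targets, as suggested above
                target[y_train[i]] = 0.75
                net.train_step(X_train[i], target, eta, alpha)
        return history

    # Example split into 4,000 training and 1,000 test points (images/labels from above):
    # idx = rng.permutation(5000)
    # train_idx, test_idx = idx[:4000], idx[4000:]
    # net = OneHiddenLayerNet()
    # history = train(net, images[train_idx], labels[train_idx])
    # test_accuracy = hit_rate(net, images[test_idx], labels[test_idx])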
Write a report providing the following information. Each item required below should be placed in a
separate section with the heading given at the beginning of the item:
• System Description: A description of all the choices you made – number of hidden neurons, learning
rate, momentum, output thresholds, rule for choosing initial weights, criterion for deciding when to stop
training, etc. You may need to experiment with several parameter settings and hidden-layer sizes before
you get good results.
• Results: Report performance of the final network on the training set and the test set using a confusion
matrix. This is a 10 × 10 matrix with one row and one column for each of the classes (digits). In the (i, j)
cell of the matrix, you will put the number of class i items classified as class j. Thus, cell (2, 4) of the
confusion matrix will show how many 2s were (incorrectly) classified as 4. Of course, the diagonal will
indicate the correct classifications. You will get one confusion matrix for the training set and another for
the test set (a minimal counting sketch is given after this list).
Also plot the time series of the error (1 – Hit-Rate) during training using the data saved at every tenth
epoch.
• Analysis of Results: You should describe, discuss and interpret the results you got, and why you think
they are as they are.
• Appendix: Program: Printouts of your program. You may use any programming language, but you
cannot use toolboxes, libraries or simulators that provide pre-programmed versions of backpropagation.
You must implement the full algorithm yourself.
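For reference, a confusion matrix of the kind described under Results can be accumulated with a few lines; this sketch assumes numpy and a predict function like the one in the network sketch above.

    import numpy as np

    def confusion_matrix(net, X, labels, n_classes=10):
        # cm[i, j] counts the items of true class i that were classified as class j.
        cm = np.zeros((n_classes, n_classes), dtype=int)
        for x, true_label in zip(X, labels):
            cm[int(true_label), net.predict(x)] += 1
        return cm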
The text part of the report, excluding the figures and program, should be no more than 2 pages, 12-point
type, double spaced.
2. (200 points) In this problem you will train an auto-encoder network using the same data as in Problem
1, i.e., the same training and test sets. In this case, the goal is not to classify the images, but to obtain a
good set of features for representing them. Using the simulator developed in Problem 1, you will set up
a one hidden-layer feed-forward network with 784 inputs, 784 output neurons and the same number of
hidden neurons as your final network in Problem 1. The input presented to the network will be one 28 ×
28 image at a time, and the goal of the network will be to produce exactly the same image at the output.
Thus, the network is learning a reconstruction task rather than a classification task.
The network will be trained using back-propagation with momentum. Since this is not a classification
problem, and the goal is to produce real-valued outputs, you will use the J2 loss function to quantify
error, as we used when deriving back-propagation in class.
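Assuming the J2 loss is the usual sum-of-squared-errors criterion from the in-class derivation (use whatever exact definition was given in class), it can be computed for one image as follows; the 1/2 factor is the common textbook convention and may be dropped if your derivation did not include it.

    import numpy as np

    def j2_loss(output, target):
        # Sum-of-squared-errors over the 784 output neurons for one image.
        return 0.5 * np.sum((output - target) ** 2)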
As in Problem 1, the system will be trained on a training set of 4,000 data points and tested on the other
1,000. During training, you will calculate the value of the loss function at the beginning and in every
tenth epoch, and save this. Training should continue until the loss function on the training set is
sufficiently low. At the end, you should also calculate the loss function over the test set. No confusion
matrices are calculated because they apply only to classification problems.
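One way to reuse the Problem 1 simulator for this is sketched below. It assumes the OneHiddenLayerNet class, rng, and j2_loss helper from the earlier sketches, configures the network as 784–h–784, presents each image as both input and target, and records the mean per-image loss on the training set at the beginning and every tenth epoch, then once on the test set at the end. The parameter values are placeholders, and the sketch also assumes the pixel values lie in [0, 1] so the sigmoid outputs can match them.

    def train_autoencoder(X_train, X_test, n_hidden=150,
                          epochs=500, subset=500, eta=0.1, alpha=0.9):
        net = OneHiddenLayerNet(n_in=784, n_hidden=n_hidden, n_out=784)
        loss_history = []
        for epoch in range(epochs):
            if epoch % 10 == 0:   # loss at the beginning and every tenth epoch
                loss = np.mean([j2_loss(net.forward(x)[2], x) for x in X_train])
                loss_history.append((epoch, loss))
                # You could instead stop once the loss falls below a chosen threshold.
            sample = rng.choice(len(X_train), size=subset, replace=False)
            for i in sample:
                net.train_step(X_train[i], X_train[i], eta, alpha)  # target = input
        test_loss = np.mean([j2_loss(net.forward(x)[2], x) for x in X_test])
        return net, loss_history, test_loss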
After training is complete, each hidden neuron can be seen as having become tuned to a particular 28 ×
28 feature for which it produces the strongest output. To visualize the feature for any hidden neuron,
take its 784 weights and plot them as a 28 × 28 grayscale image (the first 28 numbers are the first row,
the next 28 the second, and so on). This image will show what input the hidden neuron is most
responsive to.
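A plotting sketch for this is given below, assuming matplotlib and a weight layout like the earlier network sketch (each row of W1 holds one hidden neuron's 784 input weights followed by its bias); the grid size is just an example and can be enlarged to show all of your hidden neurons.

    import numpy as np
    import matplotlib.pyplot as plt

    def plot_features(W1, rows=10, cols=10, filename="features.png"):
        # Plot hidden-neuron input weights as 28 x 28 grayscale images, like first100.jpg.
        fig, axes = plt.subplots(rows, cols, figsize=(cols, rows))
        for k, ax in enumerate(axes.ravel()):
            if k < W1.shape[0]:
                feature = W1[k, :784].reshape(28, 28)  # drop the bias weight, fill row by row
                ax.imshow(feature, cmap="gray")
            ax.axis("off")
        fig.savefig(filename, dpi=150)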
Write a report providing the following information. Each item required below should be placed in a
separate section with the heading given at the beginning of the item:
• System Description: A description of all the parameter choices you made – learning rate, momentum,
rule for choosing initial weights, criterion for deciding when to stop training, etc. Again, you may need to
try several parameter values. However, the number of hidden neurons in this case should be the same
as the final network in Problem 1.
• Results: Report the performance of the final network on the training set and the test set using the loss
function. In this case, this will just be two values, which you should plot as two bars side by side. Also
plot the time series of the error during training using the data saved at every tenth epoch.
• Features: Plot the images for a large number – and if possible, all – of your features, just as the data is
shown in first100.jpg.
• Analysis of Results: Describe, discuss and interpret the results you got, and why you think they are as
they are. In particular, comment on the features you found, and what they suggest.
The text part of the report, excluding the figures and program, should be no more than 2 pages, 12-
point type, double spaced. You do not need to include a program for this because you should have used
the same program here as in Problem 1.
Points will be awarded for: 1) Correctness; 2) Clarity of description; 3) Quality of the strategy; and 4)
Clarity of arguments and presentation.
As in previous homework, the report text should not be mixed in with the program. It should be a
standalone document with text, tables, figures, etc., with the program as an appendix. None of the
information required in the report should be given as a comment or note in the program. It must all be
in the report. You may consult your colleagues for ideas, but please write your own programs and come
to your own conclusions.