python - TensorFlow: 2-layer feed-forward neural net
I'm trying to implement a simple fully-connected feed-forward neural net in TensorFlow (Python 3 version). The network has 2 inputs and 1 output, and I'm trying to train it to output the XOR of the two inputs. My code is as follows:
import numpy as np
import tensorflow as tf

sess = tf.InteractiveSession()

inputs = tf.placeholder(tf.float32, shape=[None, 2])
desired_outputs = tf.placeholder(tf.float32, shape=[None, 1])

weights_1 = tf.Variable(tf.zeros([2, 3]))
biases_1 = tf.Variable(tf.zeros([1, 3]))
layer_1_outputs = tf.nn.sigmoid(tf.matmul(inputs, weights_1) + biases_1)

weights_2 = tf.Variable(tf.zeros([3, 1]))
biases_2 = tf.Variable(tf.zeros([1, 1]))
layer_2_outputs = tf.nn.sigmoid(tf.matmul(layer_1_outputs, weights_2) + biases_2)

error_function = -tf.reduce_sum(desired_outputs * tf.log(layer_2_outputs))

train_step = tf.train.GradientDescentOptimizer(0.05).minimize(error_function)

sess.run(tf.initialize_all_variables())

training_inputs = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
training_outputs = [[0.0], [1.0], [1.0], [0.0]]

for i in range(10000):
    train_step.run(feed_dict={inputs: np.array(training_inputs),
                              desired_outputs: np.array(training_outputs)})

print(sess.run(layer_2_outputs, feed_dict={inputs: np.array([[0.0, 0.0]])}))
print(sess.run(layer_2_outputs, feed_dict={inputs: np.array([[0.0, 1.0]])}))
print(sess.run(layer_2_outputs, feed_dict={inputs: np.array([[1.0, 0.0]])}))
print(sess.run(layer_2_outputs, feed_dict={inputs: np.array([[1.0, 1.0]])}))
It seems simple enough, but the print statements at the end show that the neural net is nowhere near the desired outputs, regardless of the number of training iterations or the learning rate. Can anyone see what I am doing wrong?
Thank you.
EDIT: I've tried the following alternative error function:
error_function = 0.5 * tf.reduce_sum(tf.sub(layer_2_outputs, desired_outputs) * tf.sub(layer_2_outputs, desired_outputs))
That error function is the sum of the squares of the errors. It results in the network always outputting a value of 0.5, which indicates a mistake somewhere in the code.
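For reference, the same squared-error loss can be written with ops that survived the later API renames (a sketch only; it assumes layer_2_outputs and desired_outputs are defined exactly as above, and uses tf.square instead of multiplying two tf.sub results, since tf.sub was renamed tf.subtract in TensorFlow 1.0):

# Equivalent squared-error loss; relies on layer_2_outputs and
# desired_outputs from the script above.
diff = layer_2_outputs - desired_outputs  # element-wise difference
error_function = 0.5 * tf.reduce_sum(tf.square(diff))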
EDIT 2: I've found that the code works fine for AND and OR, but not for XOR. I'm extremely puzzled now.
There are several issues in your code. In the following I'm going to comment each line to bring you to the solution.
Note: XOR is not linearly separable. You need more than one hidden layer.
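As a quick aside, here is an empirical way to convince yourself that XOR is not linearly separable (plain NumPy, an illustration rather than a proof; X, y, w and b are just names used for this sketch):

import numpy as np

# XOR truth table
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
y = np.array([0., 1., 1., 0.])

# Try many random linear threshold units (w . x + b > 0) and keep the best
# accuracy on the 4 XOR points. No single line separates the two classes,
# so the best any of them achieves is 3 out of 4 correct (0.75).
rng = np.random.default_rng(0)
best = 0.0
for _ in range(100000):
    w = rng.uniform(-5, 5, size=2)
    b = rng.uniform(-5, 5)
    pred = (X @ w + b > 0).astype(float)
    best = max(best, float((pred == y).mean()))

print(best)  # 0.75

The hidden layers with sigmoid non-linearities in the network below are what give it the curved decision boundary that XOR needs.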
N.B.: the lines that start with # [!] are the lines that were wrong.
import numpy as np
import tensorflow as tf

sess = tf.InteractiveSession()

# A batch of inputs of 2 values each
inputs = tf.placeholder(tf.float32, shape=[None, 2])

# A batch of outputs of 1 value each
desired_outputs = tf.placeholder(tf.float32, shape=[None, 1])

# [!] define the number of hidden units in the first layer
hidden_units = 4

# connect the 2 inputs to the hidden units
# [!] initialize the weights with random numbers, to make the network learn
weights_1 = tf.Variable(tf.truncated_normal([2, hidden_units]))

# [!] the biases are single values per hidden unit
biases_1 = tf.Variable(tf.zeros([hidden_units]))

# connect the 2 inputs to every hidden unit. Add the bias
layer_1_outputs = tf.nn.sigmoid(tf.matmul(inputs, weights_1) + biases_1)

# [!] The XOR problem is that the function is not linearly separable
# [!] An MLP (multilayer perceptron) can learn to separate non-linearly separable points
# (you can think of it as learning hypercurves, not only hyperplanes)
# [!] Let's add a new layer and change layer 2 to output more than 1 value

# connect the first hidden units to 2 hidden units in the second hidden layer
weights_2 = tf.Variable(tf.truncated_normal([hidden_units, 2]))
# [!] the same as above
biases_2 = tf.Variable(tf.zeros([2]))

# connect the hidden units to the second hidden layer
layer_2_outputs = tf.nn.sigmoid(
    tf.matmul(layer_1_outputs, weights_2) + biases_2)

# [!] create the new layer
weights_3 = tf.Variable(tf.truncated_normal([2, 1]))
biases_3 = tf.Variable(tf.zeros([1]))

logits = tf.nn.sigmoid(tf.matmul(layer_2_outputs, weights_3) + biases_3)

# [!] The error function you chose is good for a multiclass classification task, not for XOR.
error_function = 0.5 * tf.reduce_sum(tf.sub(logits, desired_outputs) *
                                     tf.sub(logits, desired_outputs))

train_step = tf.train.GradientDescentOptimizer(0.05).minimize(error_function)

sess.run(tf.initialize_all_variables())

training_inputs = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
training_outputs = [[0.0], [1.0], [1.0], [0.0]]

for i in range(20000):
    _, loss = sess.run([train_step, error_function],
                       feed_dict={inputs: np.array(training_inputs),
                                  desired_outputs: np.array(training_outputs)})
    print(loss)

print(sess.run(logits, feed_dict={inputs: np.array([[0.0, 0.0]])}))
print(sess.run(logits, feed_dict={inputs: np.array([[0.0, 1.0]])}))
print(sess.run(logits, feed_dict={inputs: np.array([[1.0, 0.0]])}))
print(sess.run(logits, feed_dict={inputs: np.array([[1.0, 1.0]])}))
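Side note: the script above uses the pre-1.0 TensorFlow API. On TensorFlow 1.x, as far as I'm aware, the only renames needed are the ones sketched below; everything else stays the same:

# tf.sub was renamed tf.subtract; tf.square gives the same squared error
error_function = 0.5 * tf.reduce_sum(tf.square(tf.subtract(logits, desired_outputs)))

# tf.initialize_all_variables was replaced by tf.global_variables_initializer
sess.run(tf.global_variables_initializer())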
I increased the number of training iterations to be sure that the network converges, no matter what the random initialization values are.
The output, after 20000 training iterations, is:
[[ 0.01759939]]
[[ 0.97418505]]
[[ 0.97734243]]
[[ 0.0310041]]
It looks pretty good.