import React from "react";
import SyntaxHighlighter from "react-syntax-highlighter";

const Blog1 = () => {
  return (
    <div>
      <div className="blog-body-text">
        We hear plenty of buzzwords surrounding artificial intelligence and
        neural networks. They can sound like scary, complex systems that are
        tough to learn. Despite the wealth of resources on the internet
        explaining what a neural network is, I found it difficult to come
        across a basic neural network implementation that was well structured
        and flexible. So I made one myself!
      </div>

      <div className="blog-body-text">
        I am not going to get into how neural networks work, as there are many
        resources around, like DeepLizard's amazing channel{" "}
        <a
          href="https://www.youtube.com/channel/UC4UJ26WkceqONNF5S26OiVw"
          target="_blank"
          rel="noopener noreferrer"
        >
          here
        </a>{" "}
        that do a much better job of explaining the intuition behind a neural
        network and the math as well.
      </div>

      <div className="blog-body-text">
        What I set out to do was recreate the popular Keras library. While the
        concepts of a Node and Layer were familiar, I wanted to code the
        interactions myself to really understand what is going on. I think this
        helps hugely in understanding how neural networks work, and it has
        helped immensely when I stumble on some exciting new neural network
        architecture and try to follow along.
      </div>

      <div className="blog-section-header">
        Node – The basic unit of a neural network
      </div>

      <div className="blog-code-view">
        <SyntaxHighlighter language="python">
          {`from random import gauss
from math import sqrt
from activation_functions import sigmoid, relu, softmax

class Node(object):
    """
    Object to represent a node in a neural network
    """

    def __init__(self, number_of_inputs):
        """
        :param number_of_inputs: type int. the number of inputs into this node. 
                                0 if it's in the input layer.
        
        The weights here are represented as an array, where the value at
        i is the weight between this node, and the ith previous node

        We are going to use Xavier initialization in order to reduce the
        unstable gradient issues that may arise. Each weight is a random
        value pulled from a Normal(0, 1) distribution and then scaled by
        sqrt(1 / number of inputs), which sets the variance of the weights
        coming into this node to 1 / number of inputs.
        """
        self.value = 0
        if number_of_inputs > 0:
            xavier_coefficient = sqrt(1 / number_of_inputs)
            self.weights = [gauss(0, 1)*xavier_coefficient for _ in range(number_of_inputs)]
        else: # No weights coming into a node in the input layer
            self.weights = []

    def forward_update(self, activation_function, layer_input_matrix, index):
        """
        :param activation_function: type str. The activation function to be used in this node
        :param layer_input_matrix: type list. The matrix of weighted sums into this layer
        :param index: type int. The index of this node in the layer

        Updates the value of this node with the dot product, and subsequently the activated value
        """
        self.value = layer_input_matrix[index]
        self.__apply_activation_function(activation_function, layer_input_matrix, index)

    def __apply_activation_function(self, activation_function, layer_input_matrix, index):
        """
        :param activation_function: type str. The activation function to be used in the layer
        :param layer_input_matrix: type list. The input values to the nodes in this layer,
            the weighted sum of the previous layer's nodes and the weights in each node in this layer
        :param index: The index of this node in the layer matrix

        Applies the activation function to the value in the node
        """
        if activation_function == 'sigmoid':
            self.value = sigmoid(self.value)
        elif activation_function == 'relu':
            self.value = relu(self.value)
        elif activation_function == 'softmax':
            self.value = softmax(layer_input_matrix)[index]
        else:
            raise ValueError('Activation Function not found')`}
        </SyntaxHighlighter>
      </div>

      <div className="blog-body-text">
        We see here that the Node object keeps track of its current value, as
        well as its weight connections to each node in the previous layer. We
        randomly initialize the weights from a standard normal distribution,
        then apply the popular Xavier initialization, scaling each weight so
        that the variance of the weights coming into the node is
        1/(number_of_inputs). This helps prevent unstable gradient issues like
        vanishing or exploding gradients.
      </div>
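
      <div className="blog-body-text">
        To make the scaling concrete, here is a minimal sketch (not part of the
        library) that draws weights the same way Node does and measures their
        sample variance. The helper name xavier_weights and the sample sizes
        are my own choices for illustration. Each printed row shows the number
        of inputs, the target variance 1/(number_of_inputs), and the measured
        variance, which should land close to the target.
      </div>

      <div className="blog-code-view">
        <SyntaxHighlighter language="python">
          {`from random import gauss, seed
from math import sqrt
from statistics import pvariance

def xavier_weights(number_of_inputs):
    # Same scheme as Node.__init__: Normal(0, 1) samples scaled by sqrt(1 / n)
    xavier_coefficient = sqrt(1 / number_of_inputs)
    return [gauss(0, 1) * xavier_coefficient for _ in range(number_of_inputs)]

seed(0)  # fixed seed so repeated runs give the same numbers
for number_of_inputs in (10, 100, 1000):
    # Pool many draws so the sample variance is a stable estimate
    samples = [w for _ in range(200) for w in xavier_weights(number_of_inputs)]
    print(number_of_inputs, 1 / number_of_inputs, pvariance(samples))`}
        </SyntaxHighlighter>
      </div>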

      <div className="blog-body-text">
        When a node updates in the feed-forward process, it applies an
        activation function to its input. I've included a few popular
        activation functions: sigmoid, ReLU, and softmax (for the output
        layer).
      </div>
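
      <div className="blog-body-text">
        The activation_functions module that Node imports from is not shown in
        this post. As a rough sketch of what such a module could contain, here
        are the textbook definitions of the three functions. These are my
        assumptions for illustration and not necessarily the exact code in the
        repo.
      </div>

      <div className="blog-code-view">
        <SyntaxHighlighter language="python">
          {`from math import exp

def sigmoid(x):
    # Squashes any real number into the range (0, 1)
    return 1 / (1 + exp(-x))

def relu(x):
    # Passes positive values through and clips negatives to 0
    return max(0, x)

def softmax(values):
    # Subtracting the max first avoids overflow in exp for large inputs
    largest = max(values)
    exponentials = [exp(v - largest) for v in values]
    total = sum(exponentials)
    return [e / total for e in exponentials]

print(sigmoid(0))               # 0.5
print(relu(-3), relu(2.5))      # 0 2.5
print(sum(softmax([1, 2, 3])))  # approximately 1.0`}
        </SyntaxHighlighter>
      </div>

      <div className="blog-body-text">
        Note that softmax takes the whole list of layer inputs rather than a
        single value, which is why Node passes layer_input_matrix and then
        picks out its own index.
      </div>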

      <div className="blog-body-text">
        That's about it for the Node! All it needs to know is its input value
        and activation function. The Layer is where most of the brute force
        calculations are made in a neural network.
      </div>

      <div className="blog-section-header">
        Layer – The meat of the neural network
      </div>

      <div className="blog-body-text">
        Since this is where most of the computation goes on, I'll break it up
        incrementally.
      </div>

      <div className="blog-code-view">
        <SyntaxHighlighter language="python">
          {`def __init__(self, number_of_nodes, number_of_inputs=0):
    """
    :param number_of_nodes: type int. the number of nodes in this layer
    :param number_of_inputs: type int. The number of nodes in the previous layer
    """
    self.is_input_layer = False
    self.is_output_layer = False
    self.previous_layer = None
    self.next_layer = None
    self.set_learning_rate()
    self.nodes = [Node(number_of_inputs=number_of_inputs) for _ in range(number_of_nodes)]`}
        </SyntaxHighlighter>
      </div>

      <div className="blog-body-text">
        The initialization process here creates some instance variables that let
        the layer know where it is in the network: input, output, or hidden (if
        it is neither of those). It then initializes the nodes in this layer.
      </div>

      <div className="blog-code-view">
        <SyntaxHighlighter language="python">
          {`def forward_update(self, node_input_values):
    """
    :param node_input_values: type list(float). The output values of the nodes in the previous layer

    This function updates the current layer with the previous layer inputs
    during feed forward step.

    It applies the activation function over the nodes in the layer.
    """
    # print("Applying %s activation function" % (self.activation_function))
    # We want to cache the inputs to this layer so they can be used in back propagation
    self.layer_input_matrix = [dot_product(node_input_values, node.weights) for node in self.nodes]
    self.__forward_update(self.layer_input_matrix)`}
        </SyntaxHighlighter>
      </div>

      <div className="blog-body-text">
        We then get to the forward updating process, where we calculate the dot
        product of the inputs to the node (which are the previous layer's nodes)
        and the weights that this node carries for each of those nodes. We want
        to store this matrix so that we can use it later in backpropagation. We
        then forward update each node with those values.
      </div>
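
      <div className="blog-body-text">
        The dot_product helper used above is not shown in this post. Here is a
        minimal sketch of what it could look like, with the length check added
        as my own assumption, followed by the same list comprehension the
        layer uses to build its inputs.
      </div>

      <div className="blog-code-view">
        <SyntaxHighlighter language="python">
          {`def dot_product(values, weights):
    # Multiply each input by its matching weight and add up the results
    if len(values) != len(weights):
        raise ValueError('values and weights must have the same length')
    return sum(value * weight for value, weight in zip(values, weights))

previous_layer_outputs = [1.0, 2.0, 3.0]
# One weight list per node in the current layer
node_weights = [[0.5, 0.0, -1.0], [1.0, 1.0, 1.0]]
layer_input_matrix = [dot_product(previous_layer_outputs, w) for w in node_weights]
print(layer_input_matrix)  # [-2.5, 6.0]`}
        </SyntaxHighlighter>
      </div>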

      <div className="blog-body-text">
        Since the goal of a neural network is to minimize the loss function
        computed at the end of feeding forward, we must take the derivative of
        that loss function with respect to each weight in the network, allowing
        us to fine tune that weight for optimal network performance.
      </div>

      <div className="blog-body-text">
        We do that process using back propagation, which involves the
        multivariable partial derivatives of the loss function with respect to
        each weight first in the output layer, and then subsequently working our
        way back towards the input layer. It is important that we store each
        calculation in the layer, because the gradient calculated at one layer
        has parts used in the previous layer, and so on.
      </div>
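
      <div className="blog-body-text">
        One way to convince yourself that the chain rule gives the right answer
        is to compare it against a numerical estimate. The sketch below does
        this for a single sigmoid node with a squared-error loss. The loss
        function and all of the names are assumptions for illustration and are
        not taken from the library. The two printed numbers should agree to
        several decimal places.
      </div>

      <div className="blog-code-view">
        <SyntaxHighlighter language="python">
          {`from math import exp

def sigmoid(x):
    return 1 / (1 + exp(-x))

def loss_for_weight(weight, node_input, target):
    # Forward pass for one node: weighted input -> sigmoid -> squared error
    output = sigmoid(weight * node_input)
    return 0.5 * (output - target) ** 2

weight, node_input, target = 0.8, 1.5, 1.0

# Chain rule: dLoss/dWeight = dLoss/dOutput * dOutput/dInput * dInput/dWeight
output = sigmoid(weight * node_input)
loss_wrt_output = output - target
output_wrt_input = output * (1 - output)
input_wrt_weight = node_input
analytic_gradient = loss_wrt_output * output_wrt_input * input_wrt_weight

# Central finite difference as an independent check
step = 1e-6
numeric_gradient = (loss_for_weight(weight + step, node_input, target)
                    - loss_for_weight(weight - step, node_input, target)) / (2 * step)

print(analytic_gradient, numeric_gradient)`}
        </SyntaxHighlighter>
      </div>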

      <div className="blog-code-view">
        <SyntaxHighlighter language="python">
          {`def __back_propagate_hidden_layer(self, activation_function_differential):
    """
    :param activation_function_differential: type function. The differential of the 
                                            activation function in the layer

    We know we are in a hidden layer here, so we do the calculation
    a bit differently.
    """
    self.loss_differentials_wrt_activation_output = []
    self.activation_differentials_wrt_node_input = []

    for current_index in range(len(self.nodes)):
        current_node = self.nodes[current_index]
        activation_differential = activation_function_differential(self.layer_input_matrix[current_index])
        self.activation_differentials_wrt_node_input.append(activation_differential)
        
        loss_differential = 0
        for next_node_index in range(len(self.next_layer.nodes)):
            next_layer_loss_differential = self.next_layer.loss_differentials_wrt_activation_output[next_node_index]
            next_layer_activation_differential = self.next_layer.activation_differentials_wrt_node_input[next_node_index]
            weight_between_this_node_and_that_node = self.next_layer.nodes[next_node_index].weights[current_index]
            loss_differential += (next_layer_loss_differential * next_layer_activation_differential * weight_between_this_node_and_that_node)
        
        self.loss_differentials_wrt_activation_output.append(loss_differential)

        for i in range(len(current_node.weights)):
            previous_activation_value = self.previous_layer.nodes[i].value
            total_differential = loss_differential * activation_differential * previous_activation_value
            current_node.weights[i] -= total_differential * self.learning_rate`}
        </SyntaxHighlighter>
      </div>

      <div className="blog-body-text">
        The goal here is to update each weight connected to a node using the
        predefined learning rate for the network. I won't get into the math too
        much here, since it is readily available elsewhere, like{" "}
        <a
          target="_blank"
          rel="noopener noreferrer"
          href="https://medium.com/@14prakash/back-propagation-is-very-simple-who-made-it-complicated-97b794c97e5c"
        >
          here
        </a>{" "}
        with a much better explanation than I could do!
      </div>
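
      <div className="blog-body-text">
        To see the update rule in isolation, here is a small sketch that
        repeatedly applies it to a single weight feeding a single sigmoid node.
        It assumes a squared-error loss, and train_single_weight is a
        hypothetical helper, not a function from the library. The loss
        recorded on the last round should be lower than on the first.
      </div>

      <div className="blog-code-view">
        <SyntaxHighlighter language="python">
          {`from math import exp

def sigmoid(x):
    return 1 / (1 + exp(-x))

def train_single_weight(weight, node_input, target, learning_rate, rounds):
    # Repeatedly nudges one weight against its gradient and records the loss
    losses = []
    for _ in range(rounds):
        output = sigmoid(weight * node_input)
        losses.append(0.5 * (output - target) ** 2)
        # Same three factors as in the layer code: loss differential,
        # activation differential, and the previous activation value
        gradient = (output - target) * output * (1 - output) * node_input
        weight -= gradient * learning_rate
    return weight, losses

final_weight, losses = train_single_weight(
    weight=0.1, node_input=1.5, target=1.0, learning_rate=0.5, rounds=20)
print(losses[0], losses[-1])`}
        </SyntaxHighlighter>
      </div>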

      <div className="blog-body-text">
        I want to get to the exciting part - the API to create your very own,
        self-defined neural network.
      </div>

      <div className="blog-section-header">Building your own network</div>

      <div className="blog-body-text">
        I struggled for a while deciding what API makes the most sense when
        thinking about creating a network. Here is what I came up with.
      </div>

      <div className="blog-code-view">
        <SyntaxHighlighter language="python">
          {`
def add_layer(self, layer):
    """
    :param layer: type Layer. A layer to be added to the network
    """
    if layer is None:
        raise ValueError('Make sure you create a layer before adding it to the network')
    if len(self.layers) == 0:
        # The first layer added is always the input layer
        layer.set_as_input_layer()
        self.layers.append(layer)
        return
    previous_layer = self.layers[-1]
    if len(self.layers) > 1:
        # The old output layer now sits between two layers, so it becomes hidden
        previous_layer.set_as_hidden_layer()
    # Link the two layers in both directions
    previous_layer.next_layer = layer
    layer.previous_layer = previous_layer
    # The most recently added layer is always the output layer
    layer.set_as_output_layer()
    self.layers.append(layer)`}
        </SyntaxHighlighter>
      </div>

      <div className="blog-body-text">
        First, we need to be able to add a layer to a network. Each time we add
        a layer, we must update it to let it know whether it is an input layer,
        output layer, or hidden layer. We could let the user define that, but I
        thought it better for the network to handle it. The Network keeps track
        of all the layers in an array, so really this is a sequential neural
        network.
      </div>
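
      <div className="blog-body-text">
        Because every layer holds references to its neighbours, the layers form
        a doubly linked chain: feeding forward can follow next_layer, and back
        propagation can walk back along previous_layer. Here is a small sketch
        of that idea using a stand-in class. SketchLayer, link, and walk are
        hypothetical names for illustration and are not part of the library.
      </div>

      <div className="blog-code-view">
        <SyntaxHighlighter language="python">
          {`class SketchLayer:
    """Stand-in for Layer that only keeps the linking fields."""

    def __init__(self, name):
        self.name = name
        self.previous_layer = None
        self.next_layer = None

def link(layers):
    # Connect each layer to its neighbours, as add_layer does one layer at a time
    for earlier, later in zip(layers, layers[1:]):
        earlier.next_layer = later
        later.previous_layer = earlier

def walk(start, direction):
    # Follow next_layer or previous_layer references until we run out of layers
    names = []
    layer = start
    while layer is not None:
        names.append(layer.name)
        layer = getattr(layer, direction)
    return names

layers = [SketchLayer(name) for name in ('input', 'hidden', 'output')]
link(layers)
print(walk(layers[0], 'next_layer'))       # ['input', 'hidden', 'output']
print(walk(layers[-1], 'previous_layer'))  # ['output', 'hidden', 'input']`}
        </SyntaxHighlighter>
      </div>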

      <div className="blog-body-text">
        Now, we can create our network and train it!
      </div>

      <div className="blog-code-view">
        <SyntaxHighlighter language="python">
          {`input_values = [[2, 4, 6, 8, 10, 12, 14, 15]]
output_values = [[1, 0, 0]]

for _ in range(250):
    input_values[0].append(5)

network = Network()

# input layer - we don't need an activation function here
layer1 = Layer(number_of_nodes=len(input_values[0]))

layer2 = Layer(number_of_nodes=4, number_of_inputs=len(input_values[0])) # hidden layer
layer2.set_activation_function('relu')

layer3 = Layer(number_of_nodes=2, number_of_inputs=4)
layer3.set_activation_function('sigmoid')

# The output layer should use softmax for conversion to probabilities
layer4 = Layer(number_of_nodes=len(output_values[0]), number_of_inputs=2)
layer4.set_activation_function('softmax')

network.add_layer(layer1)
network.add_layer(layer2)
network.add_layer(layer3)
network.add_layer(layer4)

network.set_learning_rate(0.01)

network.set_input_values(input_values)
network.set_expected_output_values(output_values)

for i in range(50):
    network.feed_forward() # feed the inputs forward through every layer
    network.back_propagate()
    print(f'Done with round: {i}')`}
        </SyntaxHighlighter>
      </div>

      <div className="blog-body-text">
        Running the above code gives us the following loss on each of the 50
        rounds:
      </div>

      <div className="blog-code-view">
        <SyntaxHighlighter language="text">
          {`Loss at output: 0.5364147272998702
Loss at output: 0.522763805591318
Loss at output: 0.5154013709847125
Loss at output: 0.5103218974989145
Loss at output: 0.5063594997303118
Loss at output: 0.5030354748485043
Loss at output: 0.5001138711369849
Loss at output: 0.4974639763177699
Loss at output: 0.4950071802214773
Loss at output: 0.49269321957403395
Loss at output: 0.49048838826598995
Loss at output: 0.48836920377805443
Loss at output: 0.4863187856735443
Loss at output: 0.48432467744857655
Loss at output: 0.48237748045072115
Loss at output: 0.4804699663089783
Loss at output: 0.4785964825752835
Loss at output: 0.47675254416647644
Loss at output: 0.4749345460243868
Loss at output: 0.47313955690662196
Loss at output: 0.47136516871888967
Loss at output: 0.46960938464422985
Loss at output: 0.4678705348667561
Loss at output: 0.4661472122447291
Loss at output: 0.46443822262077755
Loss at output: 0.4627425460173603
Loss at output: 0.4610593060277657
Loss at output: 0.45938774544790356
Loss at output: 0.45772720671031975
Loss at output: 0.45607711604937873
Loss at output: 0.4544369705915757
Loss at output: 0.45280632775829643
Loss at output: 0.45118479651097276
Loss at output: 0.44957203007486185
Loss at output: 0.44796771985763506
Loss at output: 0.4463715903396498
Loss at output: 0.4447833947592317
Loss at output: 0.44320291145212704
Loss at output: 0.44162994073214074
Loss at output: 0.4400643022217643
Loss at output: 0.4385058325587794
Loss at output: 0.4369543834184312
Loss at output: 0.4354098198016231
Loss at output: 0.4338720185482948
Loss at output: 0.43234086704216357
Loss at output: 0.4308162620787005
Loss at output: 0.42929810887284936
Loss at output: 0.4277863201867827
Loss at output: 0.4262808155611104
Loss at output: 0.4247815206355269`}
        </SyntaxHighlighter>
      </div>

      <div className="blog-body-text">
        We can see that the loss decreases on every round. Hooray! Not bad for a
        network library we created ourselves.
      </div>

      <div className="blog-body-text">
        For the full code, check out the{" "}
        <a
          href="https://github.com/teddymarchildon/Neural-Network-from-scratch/"
          target="_blank"
          rel="noopener noreferrer"
        >
          GitHub repo
        </a>{" "}
        and let me know any comments you may have! You can also play around with
        it from a browser in the{" "}
        <a
          href="https://repl.it/@teddymarchildon/Neural-Network-from-Scratch"
          target="_blank"
          rel="noopener noreferrer"
        >
          repl
        </a>.
      </div>

      <div className="blog-section-header">Next Steps</div>

      <div className="blog-body-text">
        If you are interested in learning more about various types of neural
        networks, some resources are here:
        <ul>
          <li>
            Towards Data Science has an{" "}
            <a
              href="https://towardsdatascience.com/the-mostly-complete-chart-of-neural-networks-explained-3fb6f2367464"
              target="_blank"
              rel="noopener noreferrer"
            >
              awesome compilation
            </a>{" "}
            of various types of neural networks. Some of them can be created
            with this library!
          </li>
          <li>
            If you're looking for a math refresher with regard to back
            propagation, the{" "}
            <a
              href="https://www.khanacademy.org/math/multivariable-calculus/multivariable-derivatives"
              target="_blank"
              rel="noopener noreferrer"
            >
              Khan Academy series on multivariable differentiation
            </a>{" "}
            is amazing, and is the core of the back propagation math.
          </li>
          <li>
            As I mentioned above, YouTube user DeepLizard has a{" "}
            <a
              href="https://www.youtube.com/playlist?list=PLZbbT5o_s2xq7LwI2y8_QtvuXZedL6tQU"
              target="_blank"
              rel="noopener noreferrer"
            >
              playlist
            </a>{" "}
            that can get you just about everything you need to know on the
            fundamentals of neural networks.
          </li>
          <li>
            Play around and see if you can update the back propagation process
            to handle a batch of inputs instead of just one at a time. There is
            a helper function started in there already!
          </li>
        </ul>
      </div>

      <div className="blog-body-text">
        Thanks for reading! Contact me via the mediums below with comments.
      </div>
    </div>
  );
};

export default Blog1;
