Artificial neural networks and deep learning





Artificial intelligence has attracted a great deal of attention over the last decade as the computational power of machines has grown and, with it, the range of feasible computational methods. In this article, we provide an overview of 2 popular supervised learning methods: artificial neural networks and deep learning.


In traditional machine learning (ML), a specialist derives informative features (variables) from a complex dataset, which requires considerable domain knowledge and expertise. Deep learning methods, by contrast, can learn from the data and derive important features without human intervention. Another difference between the supervised learning methods discussed in previous articles and deep learning concerns the learning process: in deep learning, features are extracted sequentially in a layered structure. These layered structures are called neural networks (NNs), and their basic building block is the perceptron, a reference to neurobiology.


A single-layer perceptron


The general concept of these layered structures is that the inputs to the first layer are the values of the variables in a dataset, and the output of the last layer corresponds to the dependent variable (response). Now, what does such a layer do, and what does it look like? We can start with the single-layer perceptron; an example is given in Figure 1.




Fig 1


Example of a single-layer neural network with a comparison to neurobiology. The input data (x0, x1, x2,…) on the left are transformed via weights (w0, w1, w2,…), combined with an optional numerical constant b, and used as an argument of an activation function f() to generate output.


In a single-layer NN, the input data (x0, x1, x2, …) are transformed via a system of weights (w0, w1, w2, …), which are the model coefficients that must be estimated during the learning process. In particular, the weighted input data are summed, and an optional numerical constant (threshold) b is added. This constant can be chosen depending on the desired output; for example, if we want higher output values, we can use a high value for b. The output we have so far is nothing more than a linear combination of the input data, similar to the ML models considered in the previous articles of this series. The number of inputs can be chosen arbitrarily, as can the number of "cell bodies" in this single layer, whose outputs are also called activations. In a single-layer network, the outputs of all cell bodies are combined directly into the output layer. Note how this is similar to the biological processes that occur in the human nervous system.
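To make this concrete, below is a minimal sketch in Python (using NumPy) of the linear combination just described; the specific numbers chosen for x, w, and b are arbitrary illustrative values, not taken from the article.

    import numpy as np

    # Illustrative input data (x0, x1, x2) and weights (w0, w1, w2);
    # in practice, the weights are estimated during the learning process.
    x = np.array([0.5, -1.2, 3.0])
    w = np.array([0.8, 0.1, -0.4])
    b = 0.5  # optional numerical constant (threshold)

    # Sum the weighted inputs and add b: a linear combination of the inputs.
    linear_output = np.dot(w, x) + b
    print(linear_output)  # approximately -0.42

At this point the output is still linear in the inputs; the next section shows how an activation function makes it nonlinear.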


Activation functions


Without an activation function, the cell bodies would generate only linear outputs. To obtain nonlinear outputs, the weighted sum of the input data, increased by the threshold b, is used as the argument of an activation function to generate the output.
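Continuing the sketch above, the complete computation of a single activation is f(w·x + b). Here we use a sigmoid (described in the next paragraph) as a stand-in for f; this is an assumption on our part, chosen purely for illustration.

    import numpy as np

    def f(z):
        # Example activation function: the sigmoid (see below).
        return 1.0 / (1.0 + np.exp(-z))

    x = np.array([0.5, -1.2, 3.0])
    w = np.array([0.8, 0.1, -0.4])
    b = 0.5

    # The weighted sum plus threshold b, passed through the activation f.
    output = f(np.dot(w, x) + b)
    print(output)  # approximately 0.40: a nonlinear output between 0 and 1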


Several activation functions are used in different applications; the most common are shown in Figure 2. In essence, all of these functions mimic the "activation" of a neuron once the incoming signal exceeds a particular value. For example, the sigmoid function is similar to the logistic function used in logistic regression: if the input value is large and positive, the output will be very close to 1, whereas if the input is large and negative, the output will be close to 0. Other commonly used functions are the rectified linear unit (ReLU) and its variants, whose appeal is that they suppress negative input values while keeping positive input values unchanged. ReLU sets negative inputs exactly to 0, whereas variants such as the leaky ReLU and the exponential linear unit merely reduce them.
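The definitions below are a minimal Python sketch of the activation functions just mentioned; the slopes alpha for the leaky ReLU and the exponential linear unit are common defaults we chose for illustration, not values given in the article.

    import numpy as np

    def sigmoid(z):
        # Large positive z gives values close to 1; large negative z, close to 0.
        return 1.0 / (1.0 + np.exp(-z))

    def relu(z):
        # Sets negative inputs exactly to 0; keeps positive inputs unchanged.
        return np.maximum(0.0, z)

    def leaky_relu(z, alpha=0.01):
        # Reduces negative inputs by a small slope instead of zeroing them.
        return np.where(z > 0, z, alpha * z)

    def elu(z, alpha=1.0):
        # Exponential linear unit: smooth reduction of negative inputs toward -alpha.
        return np.where(z > 0, z, alpha * (np.exp(z) - 1.0))

    z = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
    print(sigmoid(z))     # [0.12 0.38 0.5  0.62 0.88] (rounded)
    print(relu(z))        # [0.  0.  0.  0.5 2. ]
    print(leaky_relu(z))  # [-0.02  -0.005  0.     0.5    2.   ]
    print(elu(z))         # [-0.86 -0.39  0.    0.5   2.  ] (rounded)

Note how all four leave large positive inputs essentially unchanged while treating negative inputs differently, which is the practical distinction among these variants.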

