[ML] Activation Functions & Last Layer Functions


A sigmoid function, also known as a logistic function, is given by the relationship:

\[ \sigma(t) = \frac{1}{1+e^{-\beta t}} \]

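where \( \beta \) is a slope parameter that controls how steep the curve is around \( t = 0 \); \( \beta = 1 \) gives the standard logistic function.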
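As a minimal sketch of the formula above, the sigmoid can be written in plain Python. The function name `sigmoid`, the default `beta=1.0`, and the sign-based branching (used only to avoid overflow in `math.exp`) are illustrative choices, not part of the definition.

```python
import math


def sigmoid(t: float, beta: float = 1.0) -> float:
    """Logistic sigmoid 1 / (1 + exp(-beta * t)) with slope parameter beta."""
    z = beta * t
    # Branch on the sign of z so math.exp never gets a large positive argument.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


print(sigmoid(0.0))            # 0.5, the midpoint of the curve
print(sigmoid(1.0, beta=5.0))  # a larger beta pushes the output closer to 1
```

The output always lies between 0 and 1, and a larger `beta` makes the transition around zero sharper.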


The sigmoid function's computational cost (it requires evaluating an exponential) led to the emergence of ReLU, the rectified linear unit, which is defined by:

\[ f(x) = \max(0,x) \]

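It has three major advantages over the sigmoid function:

  1. It mitigates the vanishing gradient problem: its gradient is exactly 1 for every positive input, whereas the sigmoid's gradient shrinks toward 0 as the input grows in magnitude.
  2. It produces sparser activations, since every negative input maps to exactly 0.
  3. It is faster to compute, needing only a comparison instead of an exponential.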
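The three points above can be illustrated with a short sketch. The helper names `relu` and `relu_grad`, the sample inputs, and the convention of using gradient 0 at `x = 0` are assumptions made for this example.

```python
def relu(x: float) -> float:
    """ReLU: max(0, x). Only a comparison is needed, no exponential."""
    return max(0.0, x)


def relu_grad(x: float) -> float:
    """Gradient of ReLU: 1 for positive inputs, 0 otherwise.

    ReLU is not differentiable at x = 0; returning 0 there is a convention
    assumed for this sketch.
    """
    return 1.0 if x > 0 else 0.0


inputs = [-2.0, -0.5, 0.0, 1.5, 3.0]
outputs = [relu(x) for x in inputs]
# Fraction of outputs that are exactly zero, a simple measure of sparsity.
sparsity = sum(1 for y in outputs if y == 0.0) / len(outputs)

print(outputs)   # [0.0, 0.0, 0.0, 1.5, 3.0]
print(sparsity)  # 0.6
print([relu_grad(x) for x in inputs])  # [0.0, 0.0, 0.0, 1.0, 1.0]
```

For positive inputs the gradient stays at 1 however large the input is, which is why the gradient does not shrink as it passes back through many layers.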


The hyperbolic tangent (tanh) function is given by:

\[ \tanh(t) = \frac{e^{t}-e^{-t}}{e^{t}+e^{-t}} \]
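As a rough check of the tanh formula, the sketch below transcribes it directly and compares it with Python's built-in `math.tanh`. It also uses the identity that tanh is a rescaled logistic function with slope 2, that is, \( \tanh(t) = \frac{2}{1+e^{-2t}} - 1 \). The function names are hypothetical, and the direct transcription is meant for moderate inputs only, since `math.exp` overflows for very large ones.

```python
import math


def tanh_from_exp(t: float) -> float:
    """Direct transcription of (e^t - e^-t) / (e^t + e^-t)."""
    return (math.exp(t) - math.exp(-t)) / (math.exp(t) + math.exp(-t))


def tanh_via_logistic(t: float) -> float:
    """tanh written as a shifted, rescaled logistic function with slope 2."""
    return 2.0 / (1.0 + math.exp(-2.0 * t)) - 1.0


for t in [-2.0, -0.5, 0.0, 0.5, 2.0]:
    # All three columns should agree up to floating-point rounding.
    print(t, tanh_from_exp(t), tanh_via_logistic(t), math.tanh(t))
```

Unlike the logistic function, whose output lies in (0, 1), tanh produces outputs in (-1, 1) and is centered at zero.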


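The softmax activation function is used predominantly in the output layer of a classification system. It converts a vector of raw scores into posterior probabilities over the classes, which gives a measure of the model's certainty in each class. Writing \( \zeta_{i} \) for the raw score of output unit \( i \) and \( L \) for the set of output units, the softmax activation function is given as: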

\[ y_{i} = \frac{e^{\zeta_{i}}}{\sum_{j\in L}e^{\zeta_{j}}} \]
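A minimal sketch of the softmax formula in plain Python follows. The function name `softmax` and the example scores are assumptions for illustration. Subtracting the maximum score before exponentiating is a common numerical-stability trick rather than part of the formula; it leaves the result unchanged because the same factor cancels in the numerator and denominator.

```python
import math


def softmax(scores: list[float]) -> list[float]:
    """Convert raw scores into probabilities that sum to 1."""
    # Shift by the maximum so the largest exponent is exp(0) = 1 (no overflow).
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


probs = softmax([1.0, 2.0, 3.0])
print(probs)       # larger scores receive larger probabilities
print(sum(probs))  # approximately 1.0

print(softmax([1000.0, 1000.0]))  # [0.5, 0.5], with no overflow thanks to the shift
```

Each output is positive and the outputs sum to 1, so they can be read as class probabilities, with the highest raw score receiving the highest probability.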

