[ML] Activation Functions & Last Layer Functions
Sigmoid
A sigmoid function, also known as a logistic function, is given by the relationship:
\[ \sigma(t) = \frac{1}{1+e^{-\beta t}} \]
Where \( \beta \) is a slope parameter.
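As a concrete illustration, here is a minimal NumPy sketch of the sigmoid; the function name `sigmoid` and the default \( \beta = 1 \) are choices made for this example, not part of the definition above.

```python
import numpy as np

def sigmoid(t, beta=1.0):
    """Logistic sigmoid: 1 / (1 + exp(-beta * t))."""
    return 1.0 / (1.0 + np.exp(-beta * t))

# A larger beta gives a steeper transition around t = 0.
t = np.array([-2.0, 0.0, 2.0])
print(sigmoid(t))            # approximately [0.119 0.5   0.881]
print(sigmoid(t, beta=5.0))  # outputs pushed closer to 0 and 1
```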
ReLU
The sigmoid function’s expensive computational cost led to the emergence of ReLU, which is originally defined by:
\[ f(x) = \max(0, x) \]
It has three major advantages over the sigmoid function:
- It avoids the vanishing gradient problem.
- It provides more sparsity.
- It allows faster computation.
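A minimal NumPy sketch of \( f(x) = \max(0, x) \), applied element-wise to an array; the helper name `relu` is chosen here for illustration.

```python
import numpy as np

def relu(x):
    """Element-wise rectified linear unit: max(0, x)."""
    return np.maximum(0.0, x)

x = np.array([-3.0, -0.5, 0.0, 2.0])
print(relu(x))  # [0. 0. 0. 2.] -- negatives are zeroed out, which yields sparse activations
```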
Tanh
\[ \sigma(t) = \tanh(t) = \frac{e^{t}-e^{-t}}{e^{t}+e^{-t}} \]
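For comparison with the sigmoid sketch above, a short example using NumPy's built-in `np.tanh`, which computes the same ratio of exponentials:

```python
import numpy as np

t = np.array([-2.0, 0.0, 2.0])
print(np.tanh(t))  # approximately [-0.964  0.     0.964]; outputs lie in (-1, 1)
```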
Softmax
The softmax activation function is used predominantly in the output layer of a classification system. It converts raw scores into posterior probabilities, which provides a measure of certainty. The softmax activation function is given as:
\[ y_{i} = \frac{e^{\zeta_{i}}}{\sum_{j\in L}e^{\zeta_{j}}} \]
where \( \zeta_{i} \) is the raw score (logit) of unit \( i \) and the sum runs over all units \( j \in L \) in the output layer.
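A minimal NumPy sketch of the softmax; subtracting the maximum score before exponentiating is a standard numerical-stability trick added here and is not part of the formula above.

```python
import numpy as np

def softmax(zeta):
    """Convert raw scores (logits) into probabilities that sum to 1."""
    shifted = zeta - np.max(zeta)   # stability trick; does not change the result
    exp = np.exp(shifted)
    return exp / np.sum(exp)

scores = np.array([2.0, 1.0, 0.1])
probs = softmax(scores)
print(probs)        # approximately [0.659 0.242 0.099]
print(probs.sum())  # 1.0
```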