
$$ \text{L2 regularization: } R(W) = \sum_k \sum_l W_{k,l}^2 \\\text{L1 regularization: } R(W) = \sum_k \sum_l |W_{k,l}| $$
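As a minimal sketch (assuming NumPy; the weight matrix `W` and its shape below are just an example), both penalties are a single sum over every entry of W:

```python
import numpy as np

# Hypothetical weight matrix, e.g. 10 classes x 3073 input dimensions.
W = np.random.randn(10, 3073)

# L2 regularization: R(W) = sum_k sum_l W_{k,l}^2
l2_penalty = np.sum(W ** 2)

# L1 regularization: R(W) = sum_k sum_l |W_{k,l}|
l1_penalty = np.sum(np.abs(W))

print("L2 penalty:", l2_penalty)
print("L1 penalty:", l1_penalty)
```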

What is regularization, and why is it used?

It is used to make the model perform well not only on the training data but also on test (unseen) data.

The model may do slightly worse on the training data but better on the test data, so generalization improves.

In other words, it’s used to avoid overfitting.
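One common way to write the full objective (the data-loss term and the symbols $N$, $L_i$, $x_i$, $y_i$, and $\lambda$ are assumptions here, not defined above) is to add $R(W)$ to the data loss, with a hyperparameter $\lambda$ controlling how strongly simplicity is preferred:

$$ L(W) = \underbrace{\frac{1}{N}\sum_{i=1}^{N} L_i\big(f(x_i; W),\, y_i\big)}_{\text{data loss}} + \underbrace{\lambda\, R(W)}_{\text{regularization}} $$

A larger $\lambda$ pushes the weights toward smaller (L2) or sparser (L1) values, trading some training accuracy for better generalization.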

[Figure: two candidate fits to the same training points; f2 is the simpler one]

Intuition (Occam's razor): f2 might not fit/classify the training data perfectly, but it is simpler.

So f2 is preferred.

Occam's razor:

"Among multiple competing hypotheses, the simplest is the best." (William of Ockham, 1285-1347)