
$$ \text{L2 regularization: } R(W) = \sum_k \sum_l W_{k,l}^2 \\\text{L1 regularization: } R(W) = \sum_k \sum_l |W_{k,l}| $$
Regularization makes the model work well not only on the training data but also on test (unseen) data. The model may do slightly worse on the training data, but it does better on test data (generalization improves). In other words, it is used to avoid overfitting.
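A minimal sketch of the two penalties above, assuming `W` is the weight matrix and the data loss is computed elsewhere (the `lam` strength and placeholder `data_loss` are illustrative, not from the notes):

```python
import numpy as np

def l2_reg(W):
    # R(W) = sum over all entries of W_{k,l}^2
    return np.sum(W ** 2)

def l1_reg(W):
    # R(W) = sum over all entries of |W_{k,l}|
    return np.sum(np.abs(W))

W = np.array([[1.0, -2.0],
              [0.5,  0.0]])
lam = 0.1        # regularization strength (hyperparameter, chosen for illustration)
data_loss = 0.0  # placeholder for the data term of the full loss

# full loss = data loss + lambda * R(W)
total_loss = data_loss + lam * l2_reg(W)
```

Note that the regularization term depends only on the weights, not the data, so it pushes the optimizer toward simpler weight configurations regardless of how well the data is fit.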

intuition (occam’s razor 오컴의 면도날): it might not fit/classify the dataset perfectly, but is simpler.
f2 is more prefered
Occam's razor: among multiple competing hypotheses, the simplest is the best. — William of Ockham (1285–1347)