Lecture 5. Image Classification with CNNs

Recap of Deep Learning Basics

(in 13 slides)

Images (or, more generally, inputs) are represented as vectors or matrices. For example, a 32x32x3 image (height x width x channels) contains 32 x 32 x 3 = 3072 numbers.
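The arithmetic above can be checked with a short NumPy sketch. The image here is random data standing in for a real picture, and the variable names are chosen only for illustration.

```python
import numpy as np

# A hypothetical 32x32 RGB image with random pixel values,
# stored as (height, width, channels).
rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(32, 32, 3), dtype=np.uint8)

# Flatten the image into a single vector of 32 * 32 * 3 = 3072 numbers.
x = image.reshape(-1)

print(image.shape)  # (32, 32, 3)
print(x.shape)      # (3072,)
```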

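This input is passed to a neural network $f(x, W)$, where $x$ is the input and $W$ are the network's parameters, and the network produces a prediction suited to the task (for example, class scores for image classification).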

There are three ways to understand this:

Algebraic Viewpoint (Linear Algebra)
Visual Viewpoint
Geometric Viewpoint
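As a minimal sketch of the algebraic viewpoint, assume the simplest case where $f(x, W)$ is a linear classifier that computes $Wx$. The notes do not fix the form of $f$, so the linear form, the 10 classes, and the random weights below are all assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

num_classes = 10      # assumed number of classes
x = rng.random(3072)  # stand-in for a flattened 32x32x3 image

# W holds one row of weights per class (small random values for illustration).
W = 0.01 * rng.standard_normal((num_classes, 3072))

def f(x, W):
    """Assumed linear score function: returns one score per class."""
    return W @ x

scores = f(x, W)
predicted_class = int(np.argmax(scores))  # index of the highest score

print(scores.shape)  # (10,)
```

Each row of `W` produces one class score, so the prediction is the class whose row gives the largest dot product with the input.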

How do we train them?

We need a loss function that measures how far the network's predictions are from the desired outputs. Training then adjusts the network to minimize this loss.

For example, we can take the ground truth label $y$ and compute the loss from the difference between it and the prediction, written schematically as $y - f(x,W)$.

The final loss $L$ is the sum of two terms: the data loss (based on $y - f(x,W)$), which measures how well the predictions match the labels, and the regularization loss, which penalizes the parameters $W$ themselves.
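The sum of the two loss terms can be sketched as follows. The notes do not specify the exact form of either term, so this sketch assumes a mean squared error for the data loss and an L2 penalty for the regularization loss. The strength `lam` and the toy values are hypothetical.

```python
import numpy as np

def data_loss(y, y_pred):
    # Assumed choice: mean squared error between labels and predictions.
    return np.mean((y - y_pred) ** 2)

def regularization_loss(W, lam):
    # Assumed choice: L2 penalty on the weights, scaled by the strength lam.
    return lam * np.sum(W ** 2)

def total_loss(y, y_pred, W, lam=0.1):
    # Final loss L = data loss + regularization loss.
    return data_loss(y, y_pred) + regularization_loss(W, lam)

# Toy values: y_pred stands in for the network output f(x, W).
y = np.array([1.0, 0.0])
y_pred = np.array([0.5, 0.5])
W = np.array([[1.0, -1.0], [0.0, 2.0]])

L = total_loss(y, y_pred, W, lam=0.1)
print(L)  # data loss 0.25 plus regularization loss 0.6
```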

After defining the loss function, we need to optimize (update) the neural network’s parameters so that the loss decreases. There are multiple ways to do this, such as:

SGD
SGD + Momentum
RMSProp
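The three update rules listed above can be sketched on a toy problem. This is a simplified illustration under stated assumptions, not a reference implementation: the function names, hyperparameter values, and the objective $L(w) = w^2$ are chosen only for the example, and "stochastic" gradients are replaced by the exact gradient $2w$.

```python
import numpy as np

def sgd(w, grad, lr=0.1):
    # Plain gradient step.
    return w - lr * grad

def sgd_momentum(w, grad, v, lr=0.1, rho=0.9):
    # Keep a running "velocity" of past gradients and step along it.
    v = rho * v + grad
    return w - lr * v, v

def rmsprop(w, grad, sq, lr=0.1, decay=0.9, eps=1e-8):
    # Scale each step by a running average of squared gradients.
    sq = decay * sq + (1 - decay) * grad ** 2
    return w - lr * grad / (np.sqrt(sq) + eps), sq

def grad_fn(w):
    # Gradient of the toy objective L(w) = w^2.
    return 2.0 * w

w0 = np.array([5.0])
w_sgd, w_mom, w_rms = w0.copy(), w0.copy(), w0.copy()
v = np.zeros_like(w0)   # momentum state
sq = np.zeros_like(w0)  # RMSProp state

for _ in range(100):
    w_sgd = sgd(w_sgd, grad_fn(w_sgd))
    w_mom, v = sgd_momentum(w_mom, grad_fn(w_mom), v)
    w_rms, sq = rmsprop(w_rms, grad_fn(w_rms), sq)

# All three move w from 5.0 toward the minimum at 0.
print(w_sgd, w_mom, w_rms)
```

The optimizers differ only in the extra state they carry between steps: none for SGD, a velocity for momentum, and a squared-gradient average for RMSProp.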