
Checkout Normalization: What, When, Why, and How?
During Traning, in each forward pass, randomly set some neurons to zero; usually 0.5 (half of the neuron)

intuition: makes the nn model more robust
In test time, we usually use all the neuron, thus we need to scale it.