Book: Generative Deep Learning by David Foster
Chapter 1: Generative Modeling
Generative Modeling
- model the probability p(x) of observing an observation x
Discriminative Modeling
- model the probability p(y|x) of a label y given an observation x
Conditional Generative Model
- model the probability p(x|y) of an observation x given a label y
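
A minimal sketch (not from the book) of how the three quantities relate: fit class-conditional Gaussians as a generative model p(x|y), combine with class priors p(y), and recover the discriminative p(y|x) via Bayes' rule. The toy data and function names are illustrative assumptions.

    import numpy as np

    # Toy 1-D data: two classes drawn from different Gaussians (illustrative assumption)
    rng = np.random.default_rng(0)
    x0 = rng.normal(loc=-2.0, scale=1.0, size=500)  # observations with label y=0
    x1 = rng.normal(loc=+2.0, scale=1.0, size=500)  # observations with label y=1

    def gaussian_pdf(x, mu, sigma):
        """Density of a univariate Gaussian."""
        return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

    # Generative side: estimate p(x|y) for each class from the data
    mu0, sigma0 = x0.mean(), x0.std()
    mu1, sigma1 = x1.mean(), x1.std()

    # Class priors p(y)
    p_y0 = len(x0) / (len(x0) + len(x1))
    p_y1 = 1.0 - p_y0

    def p_x_given_y(x, y):
        """Conditional generative model p(x|y)."""
        return gaussian_pdf(x, mu0, sigma0) if y == 0 else gaussian_pdf(x, mu1, sigma1)

    def p_y_given_x(x):
        """Discriminative view p(y=1|x), obtained here via Bayes' rule."""
        joint0 = p_x_given_y(x, 0) * p_y0
        joint1 = p_x_given_y(x, 1) * p_y1
        return joint1 / (joint0 + joint1)

    print(p_y_given_x(0.5))     # probability the label is 1 given observation x=0.5
    print(p_x_given_y(0.5, 1))  # density of observing x=0.5 given label y=1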
Representation Learning
- high-dimensional data
- representation
- latent space
- encoder-decoder (see the sketch below)
- manifold
The fundamentals of representation learning are similar to the mathematics of nonlinear behavior in electrical engineering and digital communications theory.
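
A minimal encoder-decoder sketch, assuming a Keras-style API: the encoder maps high-dimensional data into a low-dimensional latent space, and the decoder maps latent points back into the data space. The layer sizes, latent dimension, and stand-in data are illustrative assumptions, not taken from the book.

    import numpy as np
    from tensorflow import keras
    from tensorflow.keras import layers

    LATENT_DIM = 2  # size of the latent space (illustrative assumption)

    # Encoder: high-dimensional input (flattened 28x28 image) -> latent vector
    encoder = keras.Sequential([
        keras.Input(shape=(784,)),
        layers.Dense(128, activation="relu"),
        layers.Dense(LATENT_DIM),  # a point in the latent space
    ], name="encoder")

    # Decoder: latent vector -> reconstruction in the original data space
    decoder = keras.Sequential([
        keras.Input(shape=(LATENT_DIM,)),
        layers.Dense(128, activation="relu"),
        layers.Dense(784, activation="sigmoid"),
    ], name="decoder")

    # Autoencoder: encode then decode, trained to reconstruct its own input
    autoencoder = keras.Sequential([encoder, decoder])
    autoencoder.compile(optimizer="adam", loss="mse")

    x = np.random.rand(64, 784).astype("float32")  # stand-in data (illustrative)
    autoencoder.fit(x, x, epochs=1, verbose=0)
    z = encoder.predict(x, verbose=0)  # latent representations, shape (64, 2)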
Chapter 2: Deep Learning
Multilayer Perceptron (MLP)
- discriminative model
- supervised learning
- loss function: compare predicted to actual
- optimizer: used to adjust the weights in the neural network based on the gradient of the loss function
- Adam (Adaptive Moment Estimation)
- RMSProp (Root Mean Square Propagation)
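
A minimal MLP sketch, assuming Keras and a 10-class classification task: the model is compiled with a loss function that compares predicted to actual labels and an Adam optimizer that adjusts the weights from the loss gradient. The input shape and layer sizes are illustrative assumptions.

    from tensorflow import keras
    from tensorflow.keras import layers

    # Simple MLP for 10-class classification on flattened 32x32x3 images (illustrative shapes)
    model = keras.Sequential([
        keras.Input(shape=(3072,)),
        layers.Dense(200, activation="relu"),
        layers.Dense(150, activation="relu"),
        layers.Dense(10, activation="softmax"),  # predicted class probabilities
    ])

    # The loss compares predicted to actual labels; the optimizer (Adam here,
    # RMSprop is another option) adjusts the weights using the loss gradient.
    model.compile(
        optimizer=keras.optimizers.Adam(learning_rate=0.0005),
        loss="categorical_crossentropy",
        metrics=["accuracy"],
    )
    model.summary()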
Convolutional Neural Network (CNN)
- Convolutional layer is a collection of filters
- strides: step size used to move the filter across input
- padding: padding="same" pads the input with zeros so the output is the same size as the input if strides=1
- stacking: convolutional layers are stacked so that deeper layers learn higher-level features (see the sketch below)
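
A minimal sketch of stacked convolutional layers, assuming Keras Conv2D: each layer is a collection of filters, strides controls the step size, and padding="same" with strides=1 keeps the spatial size unchanged. The input shape and filter counts are illustrative assumptions.

    from tensorflow import keras
    from tensorflow.keras import layers

    # Stacked convolutional layers (illustrative input shape: 32x32 RGB images)
    model = keras.Sequential([
        keras.Input(shape=(32, 32, 3)),
        # 10 filters of size 4x4; padding="same" with strides=1 keeps the 32x32 spatial size
        layers.Conv2D(filters=10, kernel_size=(4, 4), strides=1, padding="same", activation="relu"),
        # strides=2 moves each filter two pixels at a time, halving the spatial size to 16x16
        layers.Conv2D(filters=20, kernel_size=(3, 3), strides=2, padding="same", activation="relu"),
        layers.Flatten(),
        layers.Dense(10, activation="softmax"),
    ])
    model.summary()  # shows how strides and padding affect each layer's output shape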
- Batch normalization - addresses the problem where the calculated gradient grows too large, causing the weights to oscillate wildly
- covariate shift: as training progresses, the weights move farther away from their random initial values and the inputs to each layer drift
- training using batch normalization reduces the covariate shift problem
- prediction using batch normalization relies on moving statistics collected during training (see the sketch after this list)
- trainable parameters
- scale (gamma)
- shift (beta)
- nontrainable parameters
- moving average
- standard deviation
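
A minimal batch normalization sketch, assuming Keras: BatchNormalization carries trainable scale (gamma) and shift (beta) weights plus non-trainable moving statistics that are used at prediction time. The surrounding layers are illustrative assumptions.

    from tensorflow import keras
    from tensorflow.keras import layers

    model = keras.Sequential([
        keras.Input(shape=(32, 32, 3)),
        layers.Conv2D(32, (3, 3), padding="same"),
        # Trainable scale (gamma) and shift (beta); non-trainable moving mean and
        # moving variance are updated during training and used at prediction time.
        layers.BatchNormalization(momentum=0.9),
        layers.Activation("relu"),
        layers.Flatten(),
        layers.Dense(10, activation="softmax"),
    ])

    bn = next(l for l in model.layers if isinstance(l, layers.BatchNormalization))
    print([w.name for w in bn.trainable_weights])      # gamma, beta
    print([w.name for w in bn.non_trainable_weights])  # moving mean, moving variance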
- Dropout
- during training, choose a random set of units from the prior layer and set their output to zero
- reduces reliance on any one value so better at generalizing to unseen data
- Modern approaches tend to favor batch normalization
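
A minimal dropout sketch, assuming Keras: during training the Dropout layer zeroes a random fraction of the previous layer's outputs, while at prediction time it is a pass-through. The rate and layer sizes are illustrative assumptions.

    from tensorflow import keras
    from tensorflow.keras import layers

    model = keras.Sequential([
        keras.Input(shape=(3072,)),
        layers.Dense(200, activation="relu"),
        # During training, each unit's output is set to zero with probability 0.25,
        # so the network cannot rely too heavily on any one unit; this helps it
        # generalize to unseen data. Dropout does nothing at prediction time.
        layers.Dropout(rate=0.25),
        layers.Dense(10, activation="softmax"),
    ])
    model.compile(optimizer="adam", loss="categorical_crossentropy")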