This post introduces the basic concepts and derivations of generative modeling.

Generative Models

Depending on whether annotations are available, machine learning methods can be categorized as supervised or unsupervised learning. Depending on how they model the distribution of $x$, these methods can roughly be divided into two categories: generative and discriminative. In supervised learning, the most common approach is discriminative, which models the conditional probability $p_\theta(y|x)$ parameterized by $\theta$. With $\theta$ learned through backpropagation over the samples, the model directly predicts $y$ from $p_\theta(y|x)$. A generative model, on the other hand, models the joint probability $p(x, y)$ and predicts $y$ via $p(y|x) = \frac{p(x, y)}{p(x)}$. Knowing what a generative model cares about, we focus on unsupervised generative models used to generate images that are as realistic as possible.
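As a minimal sketch of the generative route to prediction (with hypothetical toy numbers), a model that stores the joint $p(x, y)$ recovers the discriminative quantity via Bayes' rule, $p(y|x) = p(x, y) / p(x)$:

```python
import numpy as np

# Hypothetical joint distribution over 3 values of x (rows) and 2 labels y (columns).
p_xy = np.array([
    [0.10, 0.05],
    [0.20, 0.25],
    [0.15, 0.25],
])
assert np.isclose(p_xy.sum(), 1.0)

p_x = p_xy.sum(axis=1, keepdims=True)   # marginal p(x)
p_y_given_x = p_xy / p_x                # Bayes: p(y|x) = p(x, y) / p(x)

y_pred = p_y_given_x.argmax(axis=1)     # predicted label for each value of x
```

The discriminative model would learn `p_y_given_x` directly; the generative model gets it for free once the joint table is known, and can additionally sample new $(x, y)$ pairs.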

Modeling over p(x)

Given a distribution $p_\theta(x)$ parameterized by $\theta$, we denote the true distribution as $p_{\theta^*}(x)$. To generate samples $x$, without any annotation, as close as possible to $p_{\theta^*}(x)$, we want to find the optimal $\theta$. That is, we want to find a $\theta$ that maximizes the log probability of $p_\theta(x)$ with $x \sim p_{\theta^*}(x)$.

$$\theta = \arg\max_\theta p_\theta(x), \quad x \sim p_{\theta^*}(x) \tag{1}$$

Empirically, this amounts to maximizing the joint likelihood of all training samples:

$$\theta = \arg\max_\theta \prod_{i=1}^m p_\theta(x_i) \tag{2}$$

This can be rewritten as maximization of the log likelihood, approximated with the samples drawn from $p_{\theta^*}(x)$:

$$\begin{aligned}
\theta &= \arg\max_\theta \log p_\theta(x), \quad x \sim p_{\theta^*}(x) \tag{3}\\
&= \arg\max_\theta \log \prod_{i=1}^m p_\theta(x_i) \tag{4}\\
&= \arg\max_\theta \sum_{i=1}^m \log p_\theta(x_i) \tag{5}
\end{aligned}$$
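Equation (5) is what we actually optimize in practice. As a toy sketch (assuming a 1-D Gaussian with unit variance, where $\theta$ is just the mean), gradient ascent on $\sum_i \log p_\theta(x_i)$ recovers the maximum-likelihood estimate, which for this model is the sample mean:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=1.0, size=1000)   # samples from p_{theta*}

def log_likelihood(mu, x):
    # sum_i log N(x_i; mu, 1)
    return np.sum(-0.5 * (x - mu) ** 2 - 0.5 * np.log(2 * np.pi))

mu = 0.0   # initial theta
lr = 0.1
for _ in range(500):
    grad = np.sum(x - mu)        # d/d(mu) of the summed log likelihood
    mu += lr * grad / len(x)     # gradient ascent step

# mu converges to x.mean(), the MLE for a unit-variance Gaussian
```

In deep generative models, $p_\theta(x)$ is far more complex and the gradient comes from backpropagation, but the objective is this same sum of per-sample log likelihoods.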

This can also be seen as maximizing the expectation of the log likelihood with respect to $p_{\theta^*}(x)$, which can be rewritten as minimization of the cross entropy between the two distributions. This in turn equals minimization of the KL divergence between them, since the entropy of $p_{\theta^*}(x)$ is a constant:

$$\begin{aligned}
\theta &= \arg\max_\theta \mathbb{E}_{x \sim p_{\theta^*}(x)}\left[\log p_\theta(x)\right] \tag{6}\\
&= \arg\min_\theta \int_x p_{\theta^*}(x) \log \frac{1}{p_\theta(x)} \, dx \tag{7}\\
&= \arg\min_\theta \int_x p_{\theta^*}(x) \log \frac{1}{p_\theta(x)} \, dx - \int_x p_{\theta^*}(x) \log \frac{1}{p_{\theta^*}(x)} \, dx \tag{8}\\
&= \arg\min_\theta \int_x p_{\theta^*}(x) \log \frac{p_{\theta^*}(x)}{p_\theta(x)} \, dx \tag{9}\\
&= \arg\min_\theta D_{KL}\left(p_{\theta^*}(x) \,\|\, p_\theta(x)\right) \tag{10}
\end{aligned}$$
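The identity behind the derivation above, that cross entropy differs from KL divergence only by the constant entropy of the true distribution, can be checked numerically on discrete distributions (toy numbers assumed):

```python
import numpy as np

p_star  = np.array([0.2, 0.5, 0.3])   # "true" distribution p_{theta*}
p_theta = np.array([0.3, 0.4, 0.3])   # model distribution p_theta

cross_entropy = -np.sum(p_star * np.log(p_theta))        # H(p*, p_theta)
entropy       = -np.sum(p_star * np.log(p_star))         # H(p*), constant in theta
kl            =  np.sum(p_star * np.log(p_star / p_theta))

# H(p*, p_theta) = H(p*) + KL(p* || p_theta), so the two objectives
# share the same argmin over theta
assert np.isclose(cross_entropy - entropy, kl)
```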

A taxonomy of deep generative models

We refer to the NIPS 2016 Tutorial on GANs by Ian Goodfellow for a taxonomy of generative models.

[Figure: taxonomy of different generative models]

Now that we have the goal, maximizing the log likelihood of $p_\theta(x)$ with $x \sim p_{\theta^*}(x)$, one of the biggest problems is how to define $p_\theta(x)$. Explicit density models define an explicit density function $p_\theta(x)$, which can be directly optimized through backpropagation. The main difficulty in explicit density models is designing a model that can capture all of the complexity of the data to be generated while still maintaining computational tractability.