Instead of fitting your generator \(q\) to your data distribution \(p\) via a discriminator in an adversarial manner, consider matching \(q\) and \(p\) via the maximum mean discrepancy (MMD) criterion in a lower-dimensional space given by a projection \(f_\theta\), while keeping the density ratios between \(p\) and \(q\) matched across the two spaces by minimising \(\int q(x) \left( \frac{p(x)}{q(x)} - \frac{p\left(f_\theta(x)\right)}{q\left(f_\theta(x)\right)} \right)^2 dx\) with respect to \(\theta\).

No more adversarial training :)

This yields an effective training method called *generative ratio matching* (GRAM) that trains deep networks to generate data of better quality than GANs while being as stable to train as MMD networks.
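The MMD part of the objective is easy to estimate from samples. Here is a minimal sketch, assuming a Gaussian kernel with a hand-picked bandwidth `sigma` (in GRAM both the kernel choice and the projection \(f_\theta\) are design decisions; this sketch computes MMD directly on the given samples, leaving the projection out):

```python
import numpy as np

def gaussian_kernel(a, b, sigma=1.0):
    # Pairwise Gaussian (RBF) kernel matrix between the rows of a and b.
    sq = np.sum(a**2, 1)[:, None] + np.sum(b**2, 1)[None, :] - 2 * a @ b.T
    return np.exp(-sq / (2 * sigma**2))

def mmd2(x, y, sigma=1.0):
    # V-statistic (biased) estimate of squared MMD between samples x ~ p, y ~ q.
    kxx = gaussian_kernel(x, x, sigma)
    kyy = gaussian_kernel(y, y, sigma)
    kxy = gaussian_kernel(x, y, sigma)
    return kxx.mean() + kyy.mean() - 2 * kxy.mean()
```

Minimising this quantity in the projected space, together with the ratio-matching penalty above, is what keeps the training criterion discriminator-free.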

How can you estimate a convergent infinite sum \(S = \sum_{k=1}^\infty T_k\)?

Spin a roulette: draw \(K \sim P(\tau) = (1 - \rho_{\tau+1})\prod_{s=1}^\tau \rho_s\), where the \(\rho_s\) are continuation probabilities; by choosing them appropriately, \(P(\tau)\) can represent an arbitrary discrete distribution.

Then \(\hat{S}_K = \sum_{k=1}^K \frac{T_k}{p_k}\) is an unbiased estimate of the sum, where \(p_k = \prod_{j=1}^k \rho_j\). Unbiasedness follows because the term \(T_k\) survives the roulette with probability \(P(K \ge k) = p_k\), which exactly cancels the weight \(1/p_k\).
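A minimal sketch of one spin, with the term function `T` and the continuation probabilities `rho` passed as callables (both names are illustrative); the example uses the geometric series \(T_k = 2^{-k}\), whose true sum is 1, with a constant continuation probability:

```python
import numpy as np

def roulette_estimate(T, rho, rng):
    """One unbiased draw of S = sum_{k>=1} T(k) via Russian roulette.

    T(k): k-th term of the series; rho(k): continuation probability in (0, 1].
    """
    k, p_k, s_hat = 0, 1.0, 0.0
    while rng.random() < rho(k + 1):  # keep spinning with probability rho_{k+1}
        k += 1
        p_k *= rho(k)          # p_k = rho_1 * ... * rho_k = P(K >= k)
        s_hat += T(k) / p_k    # importance-weight the surviving term
    return s_hat

# Geometric series: sum_k 2^{-k} = 1.
rng = np.random.default_rng(0)
draws = [roulette_estimate(lambda k: 0.5**k, lambda k: 0.9, rng)
         for _ in range(20_000)]
print(np.mean(draws))  # close to 1
```

Note the trade-off: smaller \(\rho_s\) means shorter (cheaper) spins but higher variance of each draw.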

The trick is used in my method *roulette-based amortized variational expectation* (RAVE) to perform amortized inference in deep models with an Indian buffet process prior.

PS: Gambling is risky and, therefore, is not recommended.

What can you do to visualise a probabilistic classifier? Well, if the data is in 2D, you'll probably visualise its decision boundary directly. But what if the data lives in a high-dimensional space?

Dimensionality reduction alone is the wrong answer! Why? Because low-dimensional embeddings are meaningless without the corresponding decision boundary: if one only sees the embeddings, one can interpret the final classification arbitrarily.

What should we do, instead?

The solution is to visualise both the data and the classifier in a lower-dimensional space.
This can be done by jointly performing dimensionality reduction and knowledge distillation via *Darksight*.
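As a rough illustration of the joint idea (not the actual Darksight model): learn a free 2-D coordinate per data point together with a simple linear-softmax "student", trained by plain gradient descent so that the student's prediction at each point matches the teacher's soft output. All names here are hypothetical.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def distill_embed(teacher_probs, steps=5000, lr=0.2, seed=0):
    """Jointly fit 2-D points Y and a linear student so the student's
    prediction at Y[i] matches the teacher's soft output teacher_probs[i]."""
    n, c = teacher_probs.shape
    rng = np.random.default_rng(seed)
    Y = rng.normal(scale=0.5, size=(n, 2))   # embedding: one 2-D point per datum
    W = rng.normal(scale=0.5, size=(2, c))   # student classifier weights
    b = np.zeros(c)
    for _ in range(steps):
        q = softmax(Y @ W + b)
        g = (q - teacher_probs) / n          # grad of mean cross-entropy wrt logits
        gY, gW, gb = g @ W.T, Y.T @ g, g.sum(axis=0)
        Y -= lr * gY
        W -= lr * gW
        b -= lr * gb
    loss = -(teacher_probs * np.log(softmax(Y @ W + b) + 1e-12)).sum(1).mean()
    return Y, loss
```

Plotting `Y` coloured by the student's prediction then shows the data and the (student's) decision boundary in the same 2-D picture, which is exactly what a plain embedding cannot give you.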