### To be adversarial or to be matched? This is not a question.

Instead of fitting your generator $$q$$ to your data distribution $$p$$ via a discriminator in an adversarial manner, consider matching $$q$$ and $$p$$ via the maximum mean discrepancy (MMD) criterion in a lower-dimensional space given by a projection $$f_\theta$$, while keeping the density ratios between $$p$$ and $$q$$ matched across the two spaces by minimising $$\int q(x) \left( \frac{p(x)}{q(x)} - \frac{p\left(f_\theta(x)\right)}{q\left(f_\theta(x)\right)} \right)^2 dx$$ w.r.t. $$\theta$$.

This yields an effective training method called generative ratio matching (GRAM), which trains deep networks to generate data of better quality than GANs while being as stable to train as MMD networks.
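Below is a minimal NumPy sketch of just the MMD part of this objective, computed between samples already projected by $$f_\theta$$; the ratio-matching term and the training of $$f_\theta$$ itself are omitted, and the RBF kernel, bandwidth, and toy Gaussian samples are illustrative assumptions rather than the GRAM implementation.

```python
import numpy as np

def rbf_kernel(X, Y, bandwidth=1.0):
    """Gaussian RBF kernel matrix: k(x, y) = exp(-||x - y||^2 / (2 * bandwidth^2))."""
    sq_dists = (X**2).sum(1)[:, None] + (Y**2).sum(1)[None, :] - 2.0 * X @ Y.T
    return np.exp(-sq_dists / (2.0 * bandwidth**2))

def mmd2(fx_p, fx_q, bandwidth=1.0):
    """Biased (V-statistic) estimate of squared MMD between two projected sample sets."""
    return (rbf_kernel(fx_p, fx_p, bandwidth).mean()
            + rbf_kernel(fx_q, fx_q, bandwidth).mean()
            - 2.0 * rbf_kernel(fx_p, fx_q, bandwidth).mean())

# Toy usage: stand-ins for f_theta(x) with x ~ p (data) and x ~ q (generator).
rng = np.random.default_rng(0)
fx_p = rng.normal(0.0, 1.0, size=(256, 2))
fx_q = rng.normal(0.5, 1.0, size=(256, 2))
print(mmd2(fx_p, fx_q))  # larger when the generator's samples look less like the data
```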

### Statistical machine learning is all about gambling.

How do you estimate an infinite sum $$S = \sum_{k=1}^\infty T_k$$ that converges?

Spin a roulette by drawing $$K \sim P(\tau) = (1 - \rho_{\tau+1})\prod_{s=1}^\tau \rho_s$$, where the $$\rho_s$$ are continuation probabilities and $$P(\tau)$$ can represent an arbitrary discrete distribution over the non-negative integers.

Then your unbiased estimate of the sum is $$\hat{S}_K = \sum_{k=1}^K \frac{T_k}{p_k}$$, where $$p_k = \prod_{j=1}^k \rho_j = P(K \geq k)$$: each term $$T_k$$ survives the roulette with probability $$p_k$$ and is reweighted by $$1/p_k$$, so the expectation recovers $$S$$.
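As a sanity check, here is a minimal NumPy sketch of the estimator with a constant continuation probability (so $$K$$ is geometric) and toy terms $$T_k = 2^{-k}$$; both choices are illustrative assumptions, and averaging many independent estimates recovers the true sum $$S = 1$$.

```python
import numpy as np

rng = np.random.default_rng(0)
rho = 0.9                      # constant continuation probability (K is then geometric)
T = lambda k: 0.5 ** k         # toy terms: S = sum_k 1/2^k = 1

def roulette_estimate(max_terms=10_000):
    """One unbiased single-sample estimate of S = sum_{k=1}^inf T_k."""
    s_hat, p_k = 0.0, 1.0
    for k in range(1, max_terms + 1):
        if rng.random() > rho:   # stop with probability 1 - rho before adding term k
            break
        p_k *= rho               # p_k = P(K >= k) = prod_{j <= k} rho_j
        s_hat += T(k) / p_k      # reweight the surviving term by 1 / p_k
    return s_hat

print(np.mean([roulette_estimate() for _ in range(100_000)]))  # close to 1.0
```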

This trick is used in my method, roulette-based amortized variational expectation (RAVE), to perform amortized inference in deep models with an Indian buffet process prior.

PS: Gambling is risky and, therefore, is not recommended.

### Low-dimensional embeddings are meaningless without the decision boundary

What can you do to visualise a probabilistic classifier? Well, if the data is in 2D, you can visualise its decision boundary directly, as in the sketch below.
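A minimal sketch of the 2D case, assuming scikit-learn's logistic regression as a stand-in probabilistic classifier and matplotlib for plotting:

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LogisticRegression

# Toy 2D data and a stand-in probabilistic classifier.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-1.0, 1.0, (100, 2)), rng.normal(1.0, 1.0, (100, 2))])
y = np.repeat([0, 1], 100)
clf = LogisticRegression().fit(X, y)

# Evaluate p(y = 1 | x) on a grid and draw the 0.5 contour as the decision boundary.
xx, yy = np.meshgrid(np.linspace(-4, 4, 200), np.linspace(-4, 4, 200))
probs = clf.predict_proba(np.c_[xx.ravel(), yy.ravel()])[:, 1].reshape(xx.shape)
plt.contourf(xx, yy, probs, levels=20, cmap="RdBu_r", alpha=0.5)
plt.contour(xx, yy, probs, levels=[0.5], colors="black")
plt.scatter(X[:, 0], X[:, 1], c=y, cmap="RdBu_r", edgecolors="k", s=15)
plt.show()
```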

But what if the data lives in high dimensions? Dimensionality reduction is the wrong answer! Why? Because low-dimensional embeddings are meaningless without the corresponding decision boundary: one can interpret the resulting classification arbitrarily if one only sees the embeddings.