Input Perturbation Reduces Exposure Bias in Diffusion Models
Denoising Diffusion Probabilistic Models have shown impressive generation
quality, although their long sampling chain leads to high computational costs.
In this paper, we observe that a long sampling chain also leads to an error
accumulation phenomenon, which is similar to the exposure bias problem in
autoregressive text generation. Specifically, we note that there is a
discrepancy between training and testing, since the former is conditioned on
the ground truth samples, while the latter is conditioned on the previously
generated results. To alleviate this problem, we propose a very simple yet
effective training regularization that consists of perturbing the ground truth
samples to simulate inference-time prediction errors. We empirically show
that, without affecting recall and precision, the proposed input perturbation
leads to a significant improvement in sample quality while reducing both
training and inference time. For instance, on CelebA
64$\times$64, we achieve a new state-of-the-art FID score of 1.27, while saving
37.5% of the training time. The code is publicly available at
https://github.com/forever208/DDPM-IP
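
For concreteness, the following is a minimal sketch of how such an input perturbation could be applied inside a standard DDPM noise-prediction training step. It illustrates the idea stated in the abstract and is not the reference implementation from the repository; the function name, the `model(x_t, t)` interface, the `alpha_bar` schedule tensor, and the value `gamma=0.1` are assumptions made here for the example.

```python
import torch
import torch.nn.functional as F

def ddpm_ip_training_loss(model, x0, alpha_bar, gamma=0.1):
    """One DDPM training loss with input perturbation (illustrative sketch).

    model:     predicts the noise eps from (x_t, t)
    x0:        batch of ground-truth samples, shape (B, C, H, W)
    alpha_bar: cumulative products of the noise schedule, shape (T,)
    gamma:     perturbation scale (hypothetical value chosen for illustration)
    """
    B = x0.shape[0]
    T = alpha_bar.shape[0]
    t = torch.randint(0, T, (B,), device=x0.device)
    a = alpha_bar.to(x0.device)[t].view(B, 1, 1, 1)

    eps = torch.randn_like(x0)   # noise that remains the regression target
    xi = torch.randn_like(x0)    # extra noise that perturbs the network input
    # Build x_t from the perturbed noise (eps + gamma * xi) to mimic the
    # prediction errors accumulated along the sampling chain at inference
    # time, while the loss still regresses the unperturbed eps.
    x_t = a.sqrt() * x0 + (1.0 - a).sqrt() * (eps + gamma * xi)

    return F.mse_loss(model(x_t, t), eps)
```

Under this reading, the perturbation is applied only during training; the standard DDPM sampling procedure is left unchanged at inference time.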