Adversarial Score Distillation: When score distillation meets GAN

Min Wei Jingkai Zhou Junyao Sun Xuesong Zhang

Overview

We propose Adversarial Score Distillation (ASD), based on the WGAN paradigm. The generator can be a NeRF or 3DGS in the text-to-3D task, a general generator, or just the image pixels in the 2D distillation and image editing tasks. ASD maintains an optimizable discriminator, whose parameters are implemented as an optimizable conditional embedding or LoRA, and updates this discriminator using the complete loss.
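The alternating generator/discriminator updates of the WGAN paradigm can be illustrated with a deliberately tiny sketch. Everything here is an assumption made for illustration: the "generator" is a single scalar `theta` (standing in for image pixels), the critic is the linear function `D(x) = w * x` with weight clipping, and the function name `train_toy` and all hyperparameters are hypothetical. It is not the ASD implementation, whose discriminator is realized through a conditional embedding or LoRA.

```python
def clip(value, bound):
    """Clamp value to [-bound, bound] (WGAN-style weight clipping)."""
    return max(-bound, min(bound, value))


def train_toy(real=2.0, theta=-1.0, w=1.0, steps=200,
              lr_g=0.05, lr_d=0.5, clip_bound=1.0, update_critic=True):
    """Toy 1D WGAN loop: scalar generator theta, linear critic D(x) = w * x."""
    for _ in range(steps):
        if update_critic:
            # Critic ascends on D(real) - D(fake) = w * (real - theta).
            w = clip(w + lr_d * (real - theta), clip_bound)
        # Generator descends on -D(fake) = -w * theta, i.e. theta += lr_g * w.
        theta += lr_g * w
    return theta, w


# Optimizable critic: theta ends up oscillating close to the real value.
theta_adv, w_adv = train_toy(update_critic=True)
# Frozen critic: theta keeps drifting in the same direction past the target.
theta_fixed, w_fixed = train_toy(update_critic=False)
print(f"optimizable critic: theta = {theta_adv:.3f}")
print(f"fixed critic:       theta = {theta_fixed:.3f}")
```

In this toy, freezing the critic lets the parameter overshoot the target, while updating it keeps the parameter near the target. This is only an analogy for why a fixed discriminator can be sub-optimal; it does not reproduce the paper's analysis.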

Abstract

Existing score distillation methods are sensitive to the classifier-free guidance (CFG) scale, which manifests as over-smoothness or instability at small CFG scales and over-saturation at large ones. To explain and analyze these issues, we revisit the derivation of Score Distillation Sampling (SDS) and decipher existing score distillation with the Wasserstein Generative Adversarial Network (WGAN) paradigm. With the WGAN paradigm, we find that existing score distillation either employs a fixed sub-optimal discriminator or conducts incomplete discriminator optimization, resulting in the scale-sensitive issue. We propose Adversarial Score Distillation (ASD), which maintains an optimizable discriminator and updates it using the complete optimization objective. Experiments show that the proposed ASD performs favorably in 2D distillation and text-to-3D tasks against existing methods. Furthermore, to explore the generalization ability of our paradigm, we extend ASD to the image editing task, which achieves competitive results.
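For readers unfamiliar with the quantities named in the abstract, the CFG-guided noise prediction and the SDS gradient are commonly written as below. The notation is an assumption borrowed from the usual SDS literature rather than taken from this page: $\epsilon_\phi$ is the pretrained noise predictor, $y$ the text prompt, $\emptyset$ the null prompt, $s$ the CFG scale, $g(\theta)$ the generator, and $w(t)$ a timestep weighting. Some papers parameterize the scale as $1 + \omega$ instead of $s$.

```latex
\hat{\epsilon}_\phi(x_t; y, t)
  = \epsilon_\phi(x_t; \emptyset, t)
  + s \,\bigl( \epsilon_\phi(x_t; y, t) - \epsilon_\phi(x_t; \emptyset, t) \bigr)

\nabla_\theta \mathcal{L}_{\mathrm{SDS}}
  = \mathbb{E}_{t, \epsilon} \Bigl[ w(t) \,
      \bigl( \hat{\epsilon}_\phi(x_t; y, t) - \epsilon \bigr) \,
      \frac{\partial x}{\partial \theta} \Bigr],
  \qquad x = g(\theta), \quad x_t = \alpha_t x + \sigma_t \epsilon
```

The scale $s$ in the first equation is the quantity to which, according to the abstract, existing score distillation methods are sensitive.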

Generated 2D images

Similar to ancestral sampling, ASD generates realistic images without over-saturated colors or noisy artifacts.

a dog with its reflection below

a DSLR photo of a hamburger inside a restaurant

a professional photo of a sunset behind the Grand Canyon

a dumpster full of trash

a castle in the middle of a marsh

a boy portrait with sunglasses

a cozy living room with a painting of a corgi

a group of elephants walking in muddy water

a photograph of a hamster

a red fire hydrant spraying water

a small kitchen with a low ceiling

cliffs at day time

To explore the generalization ability of our paradigm, we extend ASD to the image editing task with caption modification, which achieves competitive results.

a cat is posing next to a laptop computer

a dog is posing next to a laptop computer

a cartoon elephant

a ~~cartoon~~ elephant

Generated 3D NeRFs

Thanks to the complete discriminator optimization, the proposed ASD can obtain clean and photorealistic 3D NeRFs.

a delicious hamburger

a baby bunny sitting on top of a stack of pancakes

an ice cream sundae

a delicious croissant

a small saguaro cactus planted in a clay pot

a steaming hot plate piled high with spaghetti and meatballs

a wedge of cheese on a silver platter

a pineapple

a tarantula, highly detailed

Sydney opera house, aerial view

Generated 3D Gaussians (experimental implementation)

Thanks to its better generative stability, ASD can generate high-quality 3D models with 3DGS, which imposes weaker 3D consistency constraints. Several challenges remain. The rasterization procedure of 3DGS lacks multi-view consistency, which results in divergent 2D intersection planes when the model is viewed from different perspectives. Moreover, projecting a 3D Gaussian into ray space with an affine matrix is accurate only in the vicinity of the center and loses perspective accuracy in the peripheral areas.
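The loss of accuracy away from the center can be seen in a small numerical sketch. It assumes a simplified pinhole camera with unit focal length and compares the exact perspective projection `(x/z, y/z)` with its first-order (affine) approximation around a center point. The function names and sample points are hypothetical, and this is not the 3DGS rasterizer.

```python
def project(p):
    """Exact perspective projection with unit focal length."""
    x, y, z = p
    return (x / z, y / z)


def affine_project(p, center):
    """First-order Taylor expansion of project() around `center`."""
    cx, cy, cz = center
    dx, dy, dz = p[0] - cx, p[1] - cy, p[2] - cz
    u0, v0 = project(center)
    # Jacobian of (x/z, y/z) is [[1/z, 0, -x/z^2], [0, 1/z, -y/z^2]].
    u = u0 + dx / cz - cx * dz / cz ** 2
    v = v0 + dy / cz - cy * dz / cz ** 2
    return (u, v)


def approximation_error(p, center):
    """Euclidean distance between the exact and affine projections."""
    (u, v), (ua, va) = project(p), affine_project(p, center)
    return ((u - ua) ** 2 + (v - va) ** 2) ** 0.5


center = (1.0, 0.5, 4.0)
near = (1.05, 0.5, 4.1)   # close to the center
far = (2.0, 0.5, 6.0)     # peripheral point
print("error near center:", approximation_error(near, center))
print("error far away:   ", approximation_error(far, center))
```

The affine approximation matches the exact projection at the center itself, and its error grows as the point moves away, which is consistent with the limitation described above.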

A <Taylor_Swift> wearing sunglasses
* using Taylor Swift LoRA for SD2.1

A portrait of SCARLETT JOHANSSON, head, photorealistic, 8K, HDR.

3mm4s, head
* using Emma Stone LoRA for SD1.5

A portrait of IRONMAN, head, photorealistic, 8K, HDR.

Gandalf, white hair, head, HDR, photorealistic, 8K.

A portrait of BATMAN, head, photorealistic, 8K, HDR.

a delicious hamburger

a typewriter

a pineapple

Citation

Please consider citing the following paper if you make use of this work and/or the corresponding code:

@inproceedings{asd_cvpr_2024,
    title = {Adversarial Score Distillation: When score distillation meets GAN},
    author = {Wei, Min and Zhou, Jingkai and Sun, Junyao and Zhang, Xuesong},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month = {June},
    year = {2024}
}