SDXL VAE
The decoder shares the encoder's attention block, specifically the KV weights.
WF-VAE operates in the wavelet domain rather than the pixel domain: the input image is first decomposed with a multi-level Haar wavelet transform and only then processed by the encoder.
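To make the multi-level Haar decomposition concrete, here is a minimal NumPy sketch for a single-channel 2D array. It illustrates only the transform and is not WF-VAE's implementation: the function names `haar2d` and `haar2d_multilevel`, the orthonormal scaling, and the detail-band labels are assumptions chosen for illustration, and the sketch requires the height and width to be divisible by 2 at every level.

```python
import numpy as np


def haar2d(x):
    """One level of an orthonormal 2D Haar transform on an (H, W) array.

    H and W must be even. Returns the low-frequency band and a tuple of
    three detail bands, each of shape (H/2, W/2).
    """
    # The four pixels of every non-overlapping 2x2 block.
    a = x[0::2, 0::2]  # top-left
    b = x[0::2, 1::2]  # top-right
    c = x[1::2, 0::2]  # bottom-left
    d = x[1::2, 1::2]  # bottom-right

    low = (a + b + c + d) / 2.0          # block average (scaled)
    detail_cols = (a - b + c - d) / 2.0  # difference between columns
    detail_rows = (a + b - c - d) / 2.0  # difference between rows
    detail_diag = (a - b - c + d) / 2.0  # diagonal difference
    return low, (detail_cols, detail_rows, detail_diag)


def haar2d_multilevel(x, levels):
    """Apply haar2d repeatedly to the low-frequency band.

    Returns the final low-frequency band and a list of detail-band tuples,
    ordered from the finest level to the coarsest.
    """
    low = np.asarray(x, dtype=np.float64)
    details = []
    for _ in range(levels):
        low, bands = haar2d(low)
        details.append(bands)
    return low, details


if __name__ == "__main__":
    image = np.random.default_rng(0).standard_normal((8, 8))
    low, details = haar2d_multilevel(image, levels=2)
    print(low.shape)                     # low-frequency band after 2 levels
    print([d[0].shape for d in details])  # detail-band shapes per level
```

Each level halves the spatial resolution of the low-frequency band, so an 8x8 input yields a 2x2 low band after two levels. Because the scaling is orthonormal, the total energy of the coefficients equals that of the input, which is a convenient sanity check.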
References
- 2411.17459
- 2510.22852 (Figure 5)