Overall pipeline of our PixelHacker. PixelHacker builds upon the latent diffusion architecture by introducing two fixed-size LCG embeddings to separately encode latent foreground and background features. We employ linear attention to inject these latent features into the denoising process, enabling intermittent structural and semantic multiple interactions. This design encourages the model to learn a data distribution that is both structurally and semantically consistent.
@misc{xu2025pixelhacker,
title={PixelHacker: Image Inpainting with Structural and Semantic Consistency},
author={Ziyang Xu and Kangsheng Duan and Xiaolei Shen and Zhifeng Ding and Wenyu Liu and Xiaohu Ruan and Xiaoxin Chen and Xinggang Wang},
year={2025},
eprint={2504.20438},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2504.20438},
}