# MaX-DeepLab

MaX-DeepLab is the first fully **end-to-end** method for panoptic segmentation
[1], removing the need for previously hand-designed priors such as object
bounding boxes (used in DETR [2]), instance centers (used in Panoptic-DeepLab
[3]), non-maximum suppression, thing-stuff merging, *etc*.

The goal of panoptic segmentation is to predict a set of non-overlapping masks
along with their corresponding class labels (e.g., person, car, road, sky).
MaX-DeepLab achieves this goal directly by predicting a set of class-labeled
masks with a mask transformer.

<p align="center">
   <img src="../img/max_deeplab/overview_simple.png" width=450>
</p>

The mask transformer is trained end-to-end with a panoptic quality (PQ) inspired
loss function, which matches the predicted masks to the ground truth masks and
optimizes them with a PQ-style similarity metric. In addition, our proposed mask
transformer introduces a global memory path alongside the pixel-path CNN and
employs all 4 types of attention between the two paths, allowing the CNN to read
and write the global memory in any layer.

<p align="center">
   <img src="../img/max_deeplab/overview.png" width=500>
</p>
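
To make the PQ-style matching concrete, the sketch below scores each predicted
mask against each ground-truth mask by the product of the predicted probability
of the ground-truth class and the Dice overlap between the two masks, and then
computes a one-to-one assignment that maximizes the total similarity. This is a
minimal NumPy illustration of the idea only; the function `pq_style_similarity`,
the tensor shapes, and the use of `scipy` for the matching are our own
simplifications, not this repository's training code.

```python
# Minimal sketch of PQ-style matching: similarity = class probability x Dice.
# Illustration only; not the loss implementation used in this repository.
import numpy as np
from scipy.optimize import linear_sum_assignment


def pq_style_similarity(pred_masks, pred_probs, gt_masks, gt_labels):
  """Builds a [num_pred, num_gt] PQ-style similarity matrix.

  Args:
    pred_masks: float array [N, H, W], soft mask predictions in [0, 1].
    pred_probs: float array [N, C], per-mask class probabilities.
    gt_masks: float array [K, H, W], binary ground-truth masks.
    gt_labels: int array [K], ground-truth class indices.
  """
  # Soft Dice overlap between every (prediction, ground truth) mask pair.
  intersection = np.einsum('nhw,khw->nk', pred_masks, gt_masks)
  areas = (pred_masks.sum(axis=(1, 2))[:, None]
           + gt_masks.sum(axis=(1, 2))[None, :])
  dice = 2.0 * intersection / np.maximum(areas, 1e-6)
  # Probability each prediction assigns to the matched ground-truth class.
  class_prob = pred_probs[:, gt_labels]  # [N, K]
  return class_prob * dice


# One-to-one matching that maximizes total similarity (Hungarian algorithm).
sim = pq_style_similarity(
    pred_masks=np.random.rand(5, 8, 8),
    pred_probs=np.full((5, 3), 1.0 / 3),
    gt_masks=(np.random.rand(2, 8, 8) > 0.5).astype(np.float32),
    gt_labels=np.array([0, 2]))
pred_idx, gt_idx = linear_sum_assignment(-sim)  # negate to maximize similarity
```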

## Prerequisites

1. Make sure the software is properly [installed](../setup/installation.md).

2. Make sure the target dataset is correctly prepared (e.g.,
   [COCO](../setup/coco.md)).

3. Download the ImageNet pretrained
   [checkpoints](./imagenet_pretrained_checkpoints.md), and update the
   `initial_checkpoint` path in the config files (one way to script this update
   is sketched below).
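
For step 3, one way to update the path programmatically is sketched below. It
assumes the repo's config protos are compiled and importable as
`deeplab2.config_pb2`, and that the top-level `ExperimentOptions` message
exposes the pretrained path as `model_options.initial_checkpoint`; please
verify these names against `config.proto` in your checkout before relying on
this (editing the textproto by hand works just as well).

```python
# Hypothetical helper for updating `initial_checkpoint` in a textproto config.
# The proto module, message, and field names below are assumptions; check them
# against config.proto in your checkout.
from google.protobuf import text_format

from deeplab2 import config_pb2  # assumed compiled proto module


def set_initial_checkpoint(config_path, checkpoint_path):
  """Points a textproto experiment config at a downloaded pretrained checkpoint."""
  with open(config_path, 'r') as f:
    config = text_format.Parse(f.read(), config_pb2.ExperimentOptions())
  config.model_options.initial_checkpoint = checkpoint_path  # assumed field
  with open(config_path, 'w') as f:
    f.write(text_format.MessageToString(config))


set_initial_checkpoint(
    'configs/coco/max_deeplab/max_deeplab_s_os16_res641_100k.textproto',
    '/path/to/imagenet_pretrained/ckpt')
```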

## Model Zoo

We explore MaX-DeepLab model variants that are built on top of several backbones
(e.g., ResNet model variants [4]).

1. **MaX-DeepLab-S** replaces the last two stages of ResNet-50-beta with
   axial-attention blocks and applies a small dual-path transformer; a
   conceptual sketch of axial attention is given below. (ResNet-50-beta
   replaces the ResNet-50 stem with the Inception stem [5].)
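
The axial-attention blocks factorize full 2D self-attention into two 1D
self-attentions, one along the height axis and one along the width axis (see
the Axial-DeepLab citation below). The NumPy sketch that follows is only a
conceptual illustration of this factorization under simplifying assumptions
(single head, no learned query/key/value projections, no positional terms); it
is not the block implementation used in this repository.

```python
# Conceptual sketch of axial attention: 1D attention along the height axis,
# then along the width axis, instead of full 2D attention over H * W pixels.
# Illustration only; real blocks use learned projections, multiple heads, and
# position-sensitive terms.
import numpy as np


def softmax(x, axis=-1):
  x = x - x.max(axis=axis, keepdims=True)
  e = np.exp(x)
  return e / e.sum(axis=axis, keepdims=True)


def attention_1d(x):
  """Single-head self-attention along the second-to-last axis of x."""
  # x: [..., L, C]; queries, keys and values are the raw features here to keep
  # the illustration minimal.
  scores = x @ np.swapaxes(x, -1, -2) / np.sqrt(x.shape[-1])  # [..., L, L]
  return softmax(scores) @ x


def axial_attention_2d(x):
  """Applies 1D attention along the height axis, then along the width axis."""
  # Height axis: treat each column as a sequence of length H.
  x = np.transpose(x, (1, 0, 2))   # [H, W, C] -> [W, H, C]
  x = attention_1d(x)
  x = np.transpose(x, (1, 0, 2))   # back to [H, W, C]
  # Width axis: each row is already a sequence of length W.
  return attention_1d(x)


features = np.random.rand(16, 16, 32)  # a toy [H, W, C] feature map
out = axial_attention_2d(features)
```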

### COCO Panoptic Segmentation

We provide checkpoints trained on the COCO 2017 panoptic train set and evaluated
on the val set. If you would like to train those models by yourself, please find
the corresponding config files under the directory
[configs/coco/max_deeplab](../../configs/coco/max_deeplab).

All the reported results are obtained with *single-scale* inference and
*ImageNet-1K* pretrained checkpoints.

Model | Input Resolution | Training Steps | PQ \[\*\] | PQ<sup>thing</sup> \[\*\] | PQ<sup>stuff</sup> \[\*\] | PQ \[\*\*\]
----- | :--------------: | :------------: | :-------: | :-----------------------: | :-----------------------: | :---------:
MaX-DeepLab-S ([config](../../configs/coco/max_deeplab/max_deeplab_s_os16_res641_100k.textproto), [ckpt](https://storage.googleapis.com/gresearch/tf-deeplab/checkpoint/max_deeplab_s_os16_res641_100k_coco_train.tar.gz)) | 641 x 641 | 100k | 45.9 | 49.2 | 40.9 | 46.36
MaX-DeepLab-S ([config](../../configs/coco/max_deeplab/max_deeplab_s_os16_res641_200k.textproto), [ckpt](https://storage.googleapis.com/gresearch/tf-deeplab/checkpoint/max_deeplab_s_os16_res641_200k_coco_train.tar.gz)) | 641 x 641 | 200k | 46.5 | 50.6 | 40.4 | 47.04
MaX-DeepLab-S ([config](../../configs/coco/max_deeplab/max_deeplab_s_os16_res641_400k.textproto), [ckpt](https://storage.googleapis.com/gresearch/tf-deeplab/checkpoint/max_deeplab_s_os16_res641_400k_coco_train.tar.gz)) | 641 x 641 | 400k | 47.0 | 51.3 | 40.5 | 47.56
MaX-DeepLab-S ([config](../../configs/coco/max_deeplab/max_deeplab_s_os16_res1025_100k.textproto), [ckpt](https://storage.googleapis.com/gresearch/tf-deeplab/checkpoint/max_deeplab_s_os16_res1025_100k_coco_train.tar.gz)) | 1025 x 1025 | 100k | 47.9 | 52.1 | 41.5 | 48.41
MaX-DeepLab-S ([config](../../configs/coco/max_deeplab/max_deeplab_s_os16_res1025_200k.textproto), [ckpt](https://storage.googleapis.com/gresearch/tf-deeplab/checkpoint/max_deeplab_s_os16_res1025_200k_coco_train.tar.gz)) | 1025 x 1025 | 200k | 48.7 | 53.6 | 41.3 | 49.23

\[\*\]: Results evaluated by the official script. \[\*\*\]: Results evaluated by
our pipeline. See Q4 in [FAQ](../faq.md).
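
For reference, the PQ metric defined in [1] is computed per class as the sum of
IoUs over matched (IoU > 0.5) segment pairs divided by |TP| + 0.5 |FP| +
0.5 |FN|, and then averaged over classes. The snippet below illustrates only
this formula for a single class; it is not the official COCO panoptic
evaluation script nor the evaluation pipeline of this repo.

```python
# Minimal illustration of the PQ formula from [1] for a single class:
# PQ = sum(IoU of matched pairs) / (|TP| + 0.5 * |FP| + 0.5 * |FN|),
# where a prediction matches a ground truth when their IoU exceeds 0.5.
def panoptic_quality(matched_ious, num_false_positives, num_false_negatives):
  """Computes PQ from the IoUs of matched segments and the unmatched counts."""
  true_positives = len(matched_ious)
  denominator = (true_positives + 0.5 * num_false_positives
                 + 0.5 * num_false_negatives)
  if denominator == 0:
    return 0.0
  return sum(matched_ious) / denominator


# Example: three matched segments, one spurious prediction, one missed segment.
print(panoptic_quality([0.9, 0.8, 0.75],
                       num_false_positives=1,
                       num_false_negatives=1))  # ~0.61
```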

Note that the results are slightly different from the paper because of the
following implementation differences:

1. Stronger pretrained checkpoints are used in this repo.

2. A `linear` drop path schedule is used, rather than a `constant` schedule.

3. For simplicity, Adam [6] is used without weight decay, rather than
   RAdam [7] + Lookahead [8] with weight decay.

## Citing MaX-DeepLab

If you find this code helpful in your research or wish to refer to the baseline
results, please use the following BibTeX entry.

* MaX-DeepLab:

```
@inproceedings{max_deeplab_2021,
  author={Huiyu Wang and Yukun Zhu and Hartwig Adam and Alan Yuille and Liang-Chieh Chen},
  title={{MaX-DeepLab}: End-to-End Panoptic Segmentation with Mask Transformers},
  booktitle={CVPR},
  year={2021}
}
```

* Axial-DeepLab:

```
@inproceedings{axial_deeplab_2020,
  author={Huiyu Wang and Yukun Zhu and Bradley Green and Hartwig Adam and Alan Yuille and Liang-Chieh Chen},
  title={{Axial-DeepLab}: Stand-Alone Axial-Attention for Panoptic Segmentation},
  booktitle={ECCV},
  year={2020}
}
```

### References

1. Alexander Kirillov, Kaiming He, Ross Girshick, Carsten Rother, and Piotr
   Dollar. "Panoptic Segmentation." In CVPR, 2019.

2. Nicolas Carion, Francisco Massa, Gabriel Synnaeve, Nicolas Usunier,
   Alexander Kirillov, and Sergey Zagoruyko. "End-to-End Object Detection with
   Transformers." In ECCV, 2020.

3. Bowen Cheng, Maxwell D. Collins, Yukun Zhu, Ting Liu, Thomas S. Huang,
   Hartwig Adam, and Liang-Chieh Chen. "Panoptic-DeepLab: A Simple, Strong, and
   Fast Baseline for Bottom-Up Panoptic Segmentation." In CVPR, 2020.

4. Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. "Deep Residual
   Learning for Image Recognition." In CVPR, 2016.

5. Christian Szegedy, Vincent Vanhoucke, Sergey Ioffe, Jon Shlens, and Zbigniew
   Wojna. "Rethinking the Inception Architecture for Computer Vision." In
   CVPR, 2016.

6. Diederik P. Kingma and Jimmy Ba. "Adam: A Method for Stochastic
   Optimization." In ICLR, 2015.

7. Liyuan Liu, Haoming Jiang, Pengcheng He, Weizhu Chen, Xiaodong Liu, Jianfeng
   Gao, and Jiawei Han. "On the Variance of the Adaptive Learning Rate and
   Beyond." In ICLR, 2020.

8. Michael R. Zhang, James Lucas, Geoffrey Hinton, and Jimmy Ba. "Lookahead
   Optimizer: k Steps Forward, 1 Step Back." In NeurIPS, 2019.