Generate depth maps from images
Describe images and extract text with Florence-2
Segment and caption objects in images and videos