Multimodal Diffusion Segmentation Model for Object Segmentation from Manipulation Instructions
Yui Iioka, Yu Yoshida, Yuiga Wada, Shumpei Hatanaka, Komei Sugiura
In this study, we aim to develop a model that comprehends a natural language instruction (e.g., “Go to the living room and get the nearest pillow to the radio art on the wall”) and generates a segmentation mask for the target everyday object. The task is challenging because it requires (1) the understanding of the referring expressions for multiple objects in the instruction, (2) the prediction of...