Description: We propose a method that can generate an unambiguous description (known as a referring expression) of a specific object or region in an image, and that can also comprehend or interpret such an expression to infer which object is being described.