The Machine Intelligence Research (MIR) special issue "Multi-Modal Representation Learning" is now openly calling for original submissions. The deadline is July 1, 2023. Contributions are welcome!
The past decade has witnessed the impressive and steady development of single-modal (e.g., vision, language) AI technologies in several fields, thanks to the emergence of deep learning. Less studied, however, is multi-modal AI – commonly considered the next generation of AI – which utilizes complementary context concealed in different-modality inputs to improve performance.
One typical example of multi-modal AI is the contrastive language-image pre-training (CLIP) model, which has recently demonstrated strong generalization ability in learning visual concepts under the supervision of natural language. CLIP can be applied to a broad range of vision-language tasks, such as vision-language retrieval, human-machine interaction, and visual question answering. Similarly, other approaches, such as BERT and its multi-modal variants, also tend to focus on two modalities, i.e., vision and language.
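As a rough illustration (not part of the call itself), the core of CLIP-style pre-training is a symmetric contrastive objective over a batch of paired image and text embeddings. The toy encoders and dimensions below are illustrative assumptions, not CLIP's actual architecture:

```python
# A minimal sketch of a CLIP-style symmetric contrastive objective.
# ToyEncoder stands in for the real image/text backbones (e.g., ViT, Transformer);
# all names and sizes here are illustrative, not CLIP's API.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyEncoder(nn.Module):
    def __init__(self, in_dim: int, embed_dim: int = 128):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(in_dim, 256), nn.ReLU(), nn.Linear(256, embed_dim))

    def forward(self, x):
        return F.normalize(self.net(x), dim=-1)  # unit-norm embeddings

def clip_contrastive_loss(img_emb, txt_emb, temperature=0.07):
    """Symmetric InfoNCE loss: the i-th image and i-th text form the positive pair."""
    logits = img_emb @ txt_emb.t() / temperature      # (B, B) similarity matrix
    targets = torch.arange(img_emb.size(0))           # diagonal entries are positives
    return 0.5 * (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets))

if __name__ == "__main__":
    image_encoder = ToyEncoder(in_dim=2048)           # e.g., pooled visual features
    text_encoder = ToyEncoder(in_dim=512)             # e.g., pooled token embeddings
    images, texts = torch.randn(8, 2048), torch.randn(8, 512)  # toy batch of 8 pairs
    loss = clip_contrastive_loss(image_encoder(images), text_encoder(texts))
    print(f"contrastive loss: {loss.item():.4f}")
```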
By contrast, humans naturally learn from multiple modalities (i.e., sight, hearing, touch, smell, and taste), even when some are incomplete or missing, to form a global concept. Thus, in addition to the two popular modalities, other types of data, such as depth, infrared information, events (captured by event cameras), audio, and user interaction, are also important for multi-modal learning in real-world scenes (e.g., contactless virtual social networks). Further, to address the inefficiencies that still exist in multi-modal representation learning, algorithms should (1) consider human attention mechanisms, (2) handle missing modalities, (3) guarantee the privacy of data from certain modalities, and (4) learn from a limited number of training samples, as humans do.
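As a similarly hedged sketch, point (2) above (handling missing modalities) can be illustrated by a fusion module that projects whichever modalities are present into a shared space and simply ignores the rest; all module names and dimensions here are illustrative assumptions:

```python
# Minimal sketch of missing-modality-tolerant fusion: each available modality
# is projected into a shared embedding space and the projections are averaged.
import torch
import torch.nn as nn

class MaskedFusion(nn.Module):
    def __init__(self, dims: dict, embed_dim: int = 128):
        super().__init__()
        # one projection head per registered modality (e.g., rgb, depth, audio)
        self.proj = nn.ModuleDict({m: nn.Linear(d, embed_dim) for m, d in dims.items()})

    def forward(self, inputs: dict):
        # inputs: modality name -> (B, dim) tensor; missing modalities are simply omitted
        feats = [self.proj[m](x) for m, x in inputs.items() if m in self.proj]
        assert feats, "at least one modality must be present"
        return torch.stack(feats, dim=0).mean(dim=0)   # (B, embed_dim)

if __name__ == "__main__":
    fusion = MaskedFusion({"rgb": 2048, "depth": 512, "audio": 256})
    batch = {"rgb": torch.randn(4, 2048), "audio": torch.randn(4, 256)}  # depth missing
    print(fusion(batch).shape)  # torch.Size([4, 128])
```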
Our goal for this special issue is to bring together innovative solutions for robust representation learning in multi-modal scenes. We are interested in works related to theoretical, algorithmic, metric, and dataset advances, as well as new applications. This special issue will provide a timely collection of highly novel and original ideas for the broader communities, e.g., computer vision, image processing, natural language processing, and pattern analysis and machine intelligence.
Topics of interest include, but are not limited to:
1) Theoretical aspects of robust multi-modal learning models;
2) Efficient multi-modal representation architectures, including human attention mechanisms for specific modalities;
3) Novel multi-modal representation models for image (RGB-D, RGB-T) and video domains;
4) Generative models for multi-modal networks;
5) Multi-modal representation models under point-cloud, 3D, 360°, and 4D scenes;
6) Multi-modal models under different levels of supervision, e.g., fully-/semi-/self-/unsupervised learning;
7) Uncertainty techniques for multi-modal learning;
8) Multi-modal learning combining visual data with audio, text, events, and tactile senses;
9) Novel metrics for multi-modal representation learning;
10) Large-scale datasets specific to multi-modal learning. Data should be publicly available without requiring access permission from the PI, and any related codes should be open source;
11) Multi-modal representation learning designs for low-level vision tasks, e.g., image restoration, saliency detection, edge detection, interactive image segmentation, and medical image segmentation;
12) SLAM techniques for multi-modal learning;
13) Lightweight and general backbone designs for multi-modal representation;
14) Applications for AR/VR, automatic driving, robotics, and social good such as human interaction and ecosystems;
15) Federated learning models for multi-modal representation;
16) Innovative learning strategies that can exploit imperfect/incomplete/synthesized labels for multi-modal representation;
17) Out-of-distribution models.
Submission deadline: July 1, 2023
Submission site (now open):
https://mc03.manuscriptcentral.com/mir
When submitting, please select in the system:
“Step 6 Details & Comments: Special Issue and Special Section---Special Issue on Multi-Modal Representation Learning”.
Deng-Ping Fan (*primary contact), Researcher, Computer Vision Lab, ETH Zurich, Switzerland. denfan@ethz.ch.
Nick Barnes, Professor (Former leader of CSIRO Computer Vision), Australian National University, Australia. nick.barnes@anu.edu.au.
Ming-Ming Cheng, Professor (TPAMI AE), Nankai University, China. cmm@nankai.edu.cn.
Luc Van Gool, Professor (Head of Toyota Lab TRACE), ETH Zurich, Switzerland. vangool@vision.ee.ethz.ch.
We would like to thank the following scholars for their help (e.g., reviewing, organizing) and suggestions for this special issue:
Zongwei Zhou, Postdoc, Johns Hopkins University, USA. zzhou82@jh.edu.
Mingchen Zhuge, Ph.D. Student, KAUST AI Initiative. mingchen.zhuge@kaust.edu.sa.
Ge-Peng Ji, Ph.D. Student, ANU. gepeng.ji@anu.edu.au.
Machine Intelligence Research (MIR; formerly International Journal of Automation and Computing) is sponsored by the Institute of Automation, Chinese Academy of Sciences, and began publication under its current title in 2022. Rooted in China and oriented toward the world, MIR serves national strategic needs by publishing the latest original research papers, surveys, and commentaries in machine intelligence, comprehensively reporting fundamental theory and cutting-edge research results in the field, promoting international academic exchange and disciplinary development, and supporting national progress in AI science and technology. The journal has been selected for the China Science and Technology Journal Excellence Action Plan and is indexed in ESCI, EI, Scopus, the China Science and Technology Core Journals list, CSCD, and other databases.