Chenfiona's personal blog: http://blog.sciencenet.cn/u/Chenfiona


Call for Papers | Multimodal Learning, Temporal Modeling, and Foundation Models for Video Understanding

Posted 2024-10-22 17:48 | Category: Latest News


The MIR special issue "Multimodal Learning, Temporal Modeling, and Foundation Models for Video Understanding" is now openly soliciting original manuscripts. The submission deadline is March 31, 2025. Contributions are welcome!

Call for Papers

Introduction to the Special Issue on Multimodal Learning, Temporal Modeling, and Foundation Models for Video Understanding

Video understanding focuses on interpreting dynamic visual information from video data to recognize objects, actions, interactions, and environments in a time-structured manner. It has emerged as a critical area of research in computer vision due to its wide-ranging applications in autonomous systems, video surveillance, entertainment, healthcare, and human-computer interaction. Recent advances in deep learning, especially in spatiotemporal processing, multimodal learning, and graph-based modeling, have significantly enhanced models' ability to comprehend complex video scenes. Despite this progress, the following key challenges continue to pose obstacles to developing accurate, efficient, and robust systems:

1. High Dimensionality and Computational Complexity: Videos are inherently high-dimensional data, with multiple frames contributing to a vast amount of information. Analyzing these sequences requires models capable of efficiently processing both spatial and temporal information, which often leads to high computational costs. Balancing the accuracy of video understanding models with the need for real-time processing in applications such as autonomous driving or video surveillance is a pressing challenge (a minimal sketch of one common mitigation, factorized spatiotemporal convolution, appears after this list).

2. Temporal Coherence and Long-term Dependencies: Understanding events in a video often relies on tracking objects and interpreting their actions over time. Capturing temporal coherence, especially over long sequences, is difficult due to the need to model both short-term interactions and long-term dependencies between different entities. Traditional methods struggle with maintaining consistent object tracking and event detection across extended time frames.

3. Multimodal Integration: Video data encompasses more than just visual information; auditory cues, textual descriptions, and motion data are also essential for comprehending scenes. The challenge lies in effectively fusing these modalities to provide a holistic understanding of the scene. Many systems still struggle with aligning and integrating multimodal inputs in a way that meaningfully improves recognition and interpretation accuracy (a minimal late-fusion sketch also appears after this list).

4. Ambiguity in Action and Event Recognition: Distinguishing between similar actions or events in a video can be highly ambiguous. For example, the actions of sitting down and falling can appear visually similar, yet have vastly different meanings. Accurately recognizing and categorizing these nuanced actions requires models with a deep understanding of spatiotemporal context, which is challenging to achieve, especially in complex environments with multiple actors and activities.

5. Occlusion and Viewpoint Variations: In real-world scenarios, objects or people in a video often get occluded or appear from different angles. These occlusions and viewpoint changes can obscure key parts of the scene, leading to ambiguity in identifying actions and objects. Models need to be robust enough to handle partial visibility, changes in camera angles, and dynamic environments, but current systems frequently fall short in such situations.

6. Data Annotation and Scalability: Training effective video understanding models often requires large, annotated datasets. However, manually labeling video data is time-consuming and expensive, particularly when considering both spatial and temporal dimensions. The scalability of current solutions is limited by the availability of large annotated datasets, and the development of models capable of learning from less data or through self-supervision is still in its infancy.

7. Adaptation and Generality: Many state-of-the-art models are trained on curated datasets that may not fully represent the complexity of real-world environments. When deployed in the real world, these models often encounter variations in lighting, weather, and unpredictable interactions, leading to performance degradation. Ensuring that models can generalize to unseen environments and adapt to changing conditions is an ongoing challenge.
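As a concrete illustration of challenge 1, below is a minimal sketch (PyTorch; the module name, channel widths, and kernel sizes are illustrative assumptions, not a method prescribed by this call) of factorizing a full 3D convolution into a per-frame spatial convolution followed by a per-pixel temporal convolution. For C input and output channels, a full t x k x k kernel costs on the order of C^2 * t * k^2 multiplications per output location, while the factorized pair costs roughly C^2 * (k^2 + t), and the extra nonlinearity between the two stages often helps accuracy.

import torch
import torch.nn as nn

class Factorized3DConv(nn.Module):
    """(2+1)D block: per-frame spatial conv, then per-pixel temporal conv."""
    def __init__(self, in_ch, out_ch, k=3, t=3):
        super().__init__()
        # Spatial stage: kernel (1, k, k) mixes pixels within each frame.
        self.spatial = nn.Conv3d(in_ch, out_ch, kernel_size=(1, k, k),
                                 padding=(0, k // 2, k // 2))
        # Temporal stage: kernel (t, 1, 1) mixes features across frames.
        self.temporal = nn.Conv3d(out_ch, out_ch, kernel_size=(t, 1, 1),
                                  padding=(t // 2, 0, 0))
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):  # x: (batch, channels, frames, height, width)
        return self.act(self.temporal(self.act(self.spatial(x))))

clip = torch.randn(2, 3, 16, 112, 112)   # 2 clips of 16 RGB frames, 112x112
feat = Factorized3DConv(3, 64)(clip)     # -> (2, 64, 16, 112, 112)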
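Similarly, for challenge 3, the following is a minimal late-fusion sketch (PyTorch; every module name and dimension is an illustrative assumption): per-modality embeddings are projected into a shared space, concatenated, and classified. Real systems must also solve temporal alignment between modalities, which this sketch deliberately omits.

import torch
import torch.nn as nn

class LateFusionClassifier(nn.Module):
    """Fuses visual, audio, and text embeddings by projection + concatenation."""
    def __init__(self, vis_dim=2048, aud_dim=128, txt_dim=768,
                 shared_dim=256, num_classes=400):
        super().__init__()
        self.proj_v = nn.Linear(vis_dim, shared_dim)   # one projection
        self.proj_a = nn.Linear(aud_dim, shared_dim)   # per modality into
        self.proj_t = nn.Linear(txt_dim, shared_dim)   # a shared space
        self.head = nn.Linear(3 * shared_dim, num_classes)

    def forward(self, vis, aud, txt):
        z = torch.cat([self.proj_v(vis).relu(),
                       self.proj_a(aud).relu(),
                       self.proj_t(txt).relu()], dim=-1)
        return self.head(z)

logits = LateFusionClassifier()(torch.randn(4, 2048),   # clip features
                                torch.randn(4, 128),    # audio features
                                torch.randn(4, 768))    # caption features
# logits: (4, 400) class scores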

Scope of the Call (including but not limited to)

We believe that this special issue will offer a timely collection of research outcomes to benefit video understanding in the long run. Topics of interest include but are not limited to:

• Temporal Dynamics and Spatiotemporal Feature Extraction: Leveraging advanced techniques to model temporal dependencies and relationships between objects and events over time, e.g., graph neural networks and transformers.

• Multimodal Learning for Video Understanding: Integrating visual, auditory, textual, or motion information to improve scene comprehension.

• Scene Segmentation in Videos: Enhancements in accurately segmenting dynamic scenes across frames, e.g., video semantic segmentation, video instance segmentation, video panoptic segmentation, video object segmentation, motion segmentation, scene change detection, interactive video segmentation, and video salient object detection.

• Object Tracking in Videos: Advancements in accurately tracking objects across video frames, e.g., single/multiple object tracking, long-term object tracking, trajectory prediction, video object/person re-identification, multimodal object tracking, 3D object tracking, and joint tracking and segmentation.

• Action Recognition and Event Detection: New methods for identifying and distinguishing complex actions and events in videos, e.g., action segmentation, video summarization/captioning, action label prediction, video prediction, video retrieval, procedure and action understanding, and video grounding.

• Data/Label-Efficient Video Learning: Developing new techniques for self-supervised, unsupervised, few-shot, and semi-supervised learning with videos (a minimal contrastive sketch follows this list).

• Personalization of Large Foundation Models for Video Understanding: Advanced techniques for personalizing large foundation models (LFMs) for video understanding, e.g., using LFMs for video segmentation and tracking.
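As referenced in the data/label-efficient item above, here is a minimal self-supervised sketch (PyTorch; the function name, batch size, and temperature are illustrative assumptions): an InfoNCE-style contrastive loss that treats two clips sampled from the same video as a positive pair and clips from other videos in the batch as negatives, so representations can be learned without manual labels.

import torch
import torch.nn.functional as F

def clip_contrastive_loss(z1, z2, temperature=0.07):
    """z1, z2: (batch, dim) embeddings of two clips from the same videos."""
    z1 = F.normalize(z1, dim=-1)                 # cosine similarity via
    z2 = F.normalize(z2, dim=-1)                 # normalized dot products
    logits = z1 @ z2.t() / temperature           # (batch, batch) similarities
    targets = torch.arange(z1.size(0))           # positives on the diagonal
    return F.cross_entropy(logits, targets)

loss = clip_contrastive_loss(torch.randn(8, 256), torch.randn(8, 256))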

Submission Guidelines

1) Deadline: March 31, 2025

2) Submission site (now open):

https://mc03.manuscriptcentral.com/mir

When submitting, please select in the system:

“Step 6 Details & Comments: Special Issue and Special Section---Special Issue on Multimodal Learning, Temporal Modeling, and Foundation Models for Video Understanding”.

3) Submission and peer-review guidelines:

Full-length manuscripts and peer review will follow the MIR guidelines. For details: https://www.springer.com/journal/11633

Guest Editors

Yun Liu

Agency for Science, Technology and Research (A*STAR), Singapore

Email: vagrantlyun@gmail.com 

Guolei Sun

ETH Zurich, Switzerland

Email: sunguolei.kaust@gmail.com

Radu Timofte

University of Würzburg, Germany & ETH Zurich, Switzerland

Email: Radu.Timofte@uni-wuerzburg.de 

Ender Konukoglu

ETH Zurich, Switzerland

Email: ender.konukoglu@vision.ee.ethz.ch

Luc Van Gool

ETH Zurich, Switzerland & KU Leuven, Belgium & Institute for Computer Science, Artificial Intelligence and Technology (INSAIT), Bulgaria

Email: vangool@vision.ee.ethz.ch

Free Print Copy Delivery

Machine Intelligence Research

MIR offers free delivery of print issues to all readers. If you are interested in this article, please click the link below to fill in your mailing address, and the editorial office will send you a free print copy of the full text as soon as possible!

Note: if delivery is not possible for special reasons, mailing will be postponed. Inquiry hotline: 010-82544737

Mailing address registration:

https://www.wjx.cn/vm/eIyIAAI.aspx#

About Machine Intelligence Research

Machine Intelligence Research (MIR, formerly International Journal of Automation and Computing) is sponsored by the Institute of Automation, Chinese Academy of Sciences, and has been published under its current title since 2022. Rooted in China and facing the world, MIR serves national strategic needs by publishing original research papers, surveys, and commentaries on the latest advances in machine intelligence, comprehensively reporting fundamental theory and frontier research in the field, promoting international academic exchange and disciplinary development, and supporting national progress in artificial intelligence. The journal was selected for the China Science and Technology Journal Excellence Action Plan and is indexed by more than 20 international databases, including ESCI, EI, Scopus, the China Science and Technology Core Journals list, and CSCD; it was also rated a T2 "well-known journal" in the graded catalog of image and graphics journals. Its first CiteScore (2022) placed it in Q1 across eight subcategories of computer science, engineering, and mathematics, with a best ranking in the top 4%, and its 2023 CiteScore remained in Q1. In 2024 the journal received its first Impact Factor of 6.4, placing it in JCR Q1 in both Artificial Intelligence and Automation & Control Systems.





https://blog.sciencenet.cn/blog-749317-1456492.html
