
[Repost] [Computer Science] [2017.11] Geometry and Uncertainty in Deep Learning for Computer Vision

Read 1913 times | 2019-1-5 09:58 | Category: Research notes | Source: Repost

This is a doctoral thesis from the University of Cambridge, UK (author: Alex Guy Kendall), 208 pages in total.

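Deep learning and convolutional neural networks have become the dominant tools for computer vision. These techniques excel at learning complex representations from data with supervised learning. In particular, image recognition models now outperform humans under constrained settings. However, the goal of computer vision is to build machines that can see intelligently. This requires models that extract richer information from images and video than recognition alone. In general, extending these deep learning models from recognition to other problems in computer vision is considerably more challenging.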

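This thesis studies several core computer vision problems: scene understanding, camera pose estimation, stereo vision, and video semantic segmentation, and proposes end-to-end deep learning architectures for them. Our models outperform traditional approaches and advance the state of the art on a number of challenging computer vision benchmarks. However, these end-to-end models are often not interpretable and require large amounts of training data.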

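To address this, we put forward two ideas: (1) we do not need to learn everything from scratch, because we already know a lot about the physical world; and (2) we cannot learn everything from data, so our models should be aware of what they do not know. This thesis explores these ideas using concepts from geometry and uncertainty. In particular, we show how to improve end-to-end deep learning models by exploiting the underlying geometry of the problem. We explicitly model concepts such as epipolar geometry to learn without supervision, which improves performance. Second, we introduce ideas from probabilistic modeling and Bayesian deep learning to understand uncertainty in computer vision models. We show how to quantify different types of uncertainty, improving safety in real-world applications.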

Deep learning and convolutional neural networks have become the dominant tool for computer vision. These techniques excel at learning complicated representations from data using supervised learning. In particular, image recognition models now out-perform human baselines under constrained settings. However, the science of computer vision aims to build machines which can see. This requires models which can extract richer information than recognition, from images and video. In general, applying these deep learning models from recognition to other problems in computer vision is significantly more challenging. This thesis presents end-to-end deep learning architectures for a number of core computer vision problems; scene understanding, camera pose estimation, stereo vision and video semantic segmentation. Our models outperform traditional approaches and advance state-of-the-art on a number of challenging computer vision benchmarks. However, these end-to-end models are often not interpretable and require enormous quantities of training data. To address this, we make two observations: (i) we do not need to learn everything from scratch, we know a lot about the physical world, and (ii) we cannot know everything from data, our models should be aware of what they do not know. This thesis explores these ideas using concepts from geometry and uncertainty. Specifically, we show how to improve end-to-end deep learning models by leveraging the underlying geometry of the problem. We explicitly model concepts such as epipolar geometry to learn with unsupervised learning, which improves performance. Secondly, we introduce ideas from probabilistic modeling and Bayesian deep learning to understand uncertainty in computer vision models. We show how to quantify different types of uncertainty, improving safety for real-world applications.
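The abstract's final point, quantifying different types of uncertainty, can be illustrated with a minimal sketch. It assumes the common split into aleatoric uncertainty (noise in the data) and epistemic uncertainty (uncertainty in the model), and a hypothetical network that outputs a mean and a log-variance for each prediction. The function names and the toy numbers are illustrative assumptions, not code or results from the thesis.

```python
import math
from statistics import fmean, pvariance


def heteroscedastic_loss(y_true, y_pred, log_var):
    """Gaussian negative log-likelihood (up to a constant) for one scalar target.

    Predicting log-variance instead of variance keeps the loss numerically
    stable, because exp(-log_var) is always positive.
    """
    return 0.5 * math.exp(-log_var) * (y_true - y_pred) ** 2 + 0.5 * log_var


def decompose_uncertainty(samples):
    """Split predictive variance into aleatoric and epistemic parts.

    `samples` is a list of (mean, log_var) pairs, assumed to come from
    repeated stochastic forward passes (for example with dropout left on).
    """
    means = [mean for mean, _ in samples]
    variances = [math.exp(log_var) for _, log_var in samples]
    aleatoric = fmean(variances)   # average predicted data noise
    epistemic = pvariance(means)   # spread of the predicted means
    return aleatoric, epistemic


# Two hypothetical forward passes that disagree on the mean.
aleatoric, epistemic = decompose_uncertainty([(1.0, 0.0), (3.0, 0.0)])
print(aleatoric, epistemic)
```

In this sketch, a large residual can be absorbed by predicting a larger variance, at the price of the `0.5 * log_var` penalty, while disagreement between forward passes shows up only in the epistemic term.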

1 Introduction

2 Scene Understanding

3 Localisation

4 Stereo Vision

5 Motion

6 Conclusions

Download the original English text:

http://page5.dfpan.com/fs/clc7j2d21f29e1690a1/


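To repost this article, please contact the original author for permission and note that it comes from Liu Chunjing's ScienceNet blog.
Link: https://blog.sciencenet.cn/blog-69686-1155423.html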

