|||
Understanding a residual block is quite easy. In traditional neural networks, each layer feeds into the next layer. In a network with residual blocks, each layer feeds into the next layer and directly into the layers about 2–3 hops away. That’s it. But understanding the intuition behind why it was required in the first place, why it is so important and how similar it looks to some other state of the art architectures is where we are going to focus on. There are more than one interpretations of why residual blocks are awesome and how & why they are one of the key ideas that can make a neural network show state of the art performances on wide range of tasks. Before diving into the details, here is a picture of how a residual block actually looks like.
We know neural networks are universal function approximators and that the accuracy increases with increasing number of layers. But there is a limit to the number of layers added that result in accuracy improvement. So, if neural networks were universal function approximators then it should have been able to learn any simplex or complex function. But it turns out that, thanks to some problems like vanishing gradients and curse of dimensionality, if we have sufficiently deep networks, it may not be able to learn simple functions like an identity function. Now this is clearly undesirable.
Archiver|手机版|科学网 ( 京ICP备07017567号-12 )
GMT+8, 2024-10-19 22:25
Powered by ScienceNet.cn
Copyright © 2007- 中国科学报社