Depth (d): Scaling network depth is the most common approach used by many ConvNets. The intuition is that a deeper ConvNet can capture richer and more complex features, and generalize well to new tasks. However, deeper networks are also more difficult to train due to the vanishing gradient problem (Zagoruyko & Komodakis, 2016). Although several techniques, such as skip connections (He et al., 2016) and batch normalization (Ioffe & Szegedy, 2015), alleviate the training problem, the accuracy gain of very deep networks diminishes: for example, ResNet-1000 has similar accuracy to ResNet-101 even though it has many more layers.
Width (w): Scaling network width is commonly used for small-size models. Wider networks tend to capture more fine-grained features and are easier to train. However, extremely wide but shallow networks tend to have difficulty capturing higher-level features.
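A corresponding sketch of width scaling (illustrative only; the rounding to multiples of 8 is a common hardware-friendly convention assumed here, not stated in the text): each per-stage channel count of a baseline configuration is multiplied by w.

```python
# Minimal sketch of width scaling (illustrative; rounding scheme is an assumption):
# every per-stage channel count in a baseline configuration is multiplied by w.
def scale_width(base_channels, w, divisor=8):
    c = int(base_channels * w + divisor / 2) // divisor * divisor  # round to multiple of divisor
    return max(divisor, c)

base_cfg = [32, 64, 128, 256]                         # baseline per-stage channels
wide_cfg = [scale_width(c, w=1.5) for c in base_cfg]  # -> [48, 96, 192, 384]
```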
Resolution (r): With higher resolution input images, ConvNets can potentially capture more fine-grained patterns.
Starting from 224x224 in early ConvNets, modern ConvNets tend to use 299x299 (Szegedy et al., 2016) or 331x331
(Zoph et al., 2018) for better accuracy. Recently, GPipe
(Huang et al., 2018) achieves state-of-the-art ImageNet accuracy with 480x480 resolution. Higher resolutions, such as
600x600, are also widely used in object detection ConvNets
(He et al., 2017; Lin et al., 2017). Figure 3 (right) shows the
results of scaling network resolutions, where indeed higher
resolutions improve accuracy, but the accuracy gain diminishes for very high resolutions (r = 1.0 denotes resolution
224x224 and r = 2.5 denotes resolution 560x560).
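The following sketch illustrates how resolution scaling can be applied to the input (the bilinear resize and the helper name are assumptions): the input batch is resized to round(r * 224), matching the convention that r = 1.0 denotes 224x224 and r = 2.5 denotes 560x560.

```python
# Minimal sketch of resolution scaling (illustrative; bilinear resizing is an assumption):
# the input batch is resized to round(r * 224), so r = 1.0 -> 224x224 and r = 2.5 -> 560x560.
import torch
import torch.nn.functional as F

def scale_resolution(images, r, base_size=224):
    size = round(r * base_size)
    return F.interpolate(images, size=(size, size), mode="bilinear", align_corners=False)

x = torch.randn(8, 3, 224, 224)     # a batch at the baseline resolution
x_hi = scale_resolution(x, r=2.5)   # -> shape (8, 3, 560, 560)
```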
Observation 1 – Scaling up any dimension of network width, depth, or resolution improves accuracy, but the accuracy gain diminishes for bigger models.
Observation 2 – In order to pursue better accuracy and efficiency, it is critical to balance all dimensions of network width, depth, and resolution during ConvNet scaling.
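To make Observation 2 concrete, the sketch below scales all three dimensions jointly instead of pushing a single dimension in isolation (the specific multipliers and rounding are illustrative assumptions, not a prescribed scaling rule).

```python
# Minimal sketch of balancing depth, width, and resolution together (illustrative only;
# the multipliers d, w, r below are arbitrary, not a prescribed scaling rule).
def scale_network(base_channels, base_repeats, d, w, r, base_size=224):
    channels = [max(8, int(c * w + 4) // 8 * 8) for c in base_channels]  # wider stages
    repeats = [max(1, round(d * n)) for n in base_repeats]               # deeper stages
    resolution = round(r * base_size)                                    # larger inputs
    return channels, repeats, resolution

# Modest, balanced scaling of a baseline rather than scaling one dimension alone:
print(scale_network([32, 64, 128, 256], [2, 3, 4, 2], d=1.2, w=1.1, r=1.15))
```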