Deep Residual Learning for Image Recognition

  • Link: https://arxiv.org/abs/1512.03385

This paper is better known as ResNet.

Residual

Neural Network์—์„œ ํŠน์ • layer๋ฅผ ํ†ต๊ณผํ•œ input๊ณผ ํ•ด๋‹น layer์˜ output์˜ ์ฐจ์ด๋ฅผ Residual(์ž”์ฐจ)๋ผ๊ณ  ํ•œ๋‹ค.

Residual learning refers to learning this difference, i.e., the residual, directly.

๋”ฅ๋Ÿฌ๋‹์—์„œ ์ผ๋ฐ˜์ ์œผ๋กœ๋Š” layer๋ฅผ ํ†ต๊ณผํ•œ input $x$ ์— ๋Œ€ํ•œ output $H(x)$ ๋ฅผ ์–ป์Šต๋‹ˆ๋‹ค. Residual Learning์—์„œ๋Š” ์ด output $H(x)$ ๋Œ€์‹ , input $x$ ์™€ $H(x)$ ์‚ฌ์ด์˜ Residual(์ž”์ฐจ) $F(x)=H(x)โˆ’x$๋ฅผ ์ง์ ‘์ ์œผ๋กœ ํ•™์Šตํ•œ๋‹ค.

Residual Learning์˜ ํ•ต์‹ฌ ์•„์ด๋””์–ด๋Š”, ๋งŒ์•ฝ $H(x)$๊ฐ€ $x$ ์™€ ๋งค์šฐ ๊ทผ์ ‘ํ•œ ๊ฐ’์ด๋ผ๋ฉด $F(x)$ ๋Š” ๊ฑฐ์˜ $0$์— ๊ฐ€๊นŒ์šธ ๊ฒƒ์ด๋‹ค. ๋”ฐ๋ผ์„œ, $H(x)$ ๋ฅผ ์˜ˆ์ธกํ•˜๋Š” ๋Œ€์‹  $F(x)$ ๋ฅผ ์˜ˆ์ธกํ•จ์œผ๋กœ์จ, network๋Š” ๋” ์‰ฝ๊ฒŒ ํ•™์Šต๋  ์ˆ˜ ์žˆ๊ณ (0์ด๋ผ๋Š” ์ˆซ์ž ๊ฐœ๋…์œผ๋กœ ์ˆ˜๋ ดํ•˜๊ฒŒ ํ•˜๋Š” ๊ฒƒ์ด ๋” ์‰ฝ๊ธฐ ๋•Œ๋ฌธ), ์ด๋Š” ๋ชจ๋ธ์ด ๊นŠ์–ด์งˆ๋•Œ ๋ฐœ์ƒํ•  ์ˆ˜ ์žˆ๋Š” ์—์„œ gradient vanishing problem์„ ์™„ํ™”์‹œํ‚ค๊ณ , ํ•™์Šต์„ ๋” ํšจ์œจ์ ์œผ๋กœ ํ•  ์ˆ˜ ์žˆ๊ฒŒ ํ•œ๋‹ค.

“What is the difference from simply passing the input through unchanged?”

→ Rather than learning a function that simply copies the input, training targets $H(x) - x = 0$, which improves the network's training efficiency and alleviates the problems that can arise as depth grows.
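To make this concrete, here is a minimal sketch of a residual block, assuming PyTorch (the paper itself is framework-agnostic). The two convolutions compute $F(x)$, and the identity shortcut adds $x$ back before the final activation, so the layers only need to model the residual.

```python
import torch
import torch.nn as nn

class BasicResidualBlock(nn.Module):
    """Minimal residual block: output = ReLU(F(x) + x)."""

    def __init__(self, channels: int):
        super().__init__()
        # F(x): two 3x3 convolutions with batch normalization
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        residual = self.bn2(self.conv2(self.relu(self.bn1(self.conv1(x)))))  # F(x)
        return self.relu(residual + x)  # H(x) = F(x) + x

x = torch.randn(1, 64, 56, 56)
y = BasicResidualBlock(64)(x)
print(y.shape)  # torch.Size([1, 64, 56, 56])
```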

Bottleneck Block

Bottleneck block์€ ๋”ฅ๋Ÿฌ๋‹ ๋ชจ๋ธ์—์„œ ๋ฉ”๋ชจ๋ฆฌ์™€ ์—ฐ์‚ฐ ๋น„์šฉ์„ ์ค„์ด๊ธฐ ์œ„ํ•ด ์‚ฌ์šฉ๋˜๋Š” ํšจ์œจ์ ์ธ ๊ตฌ์กฐ์ด๋‹ค. ์ฃผ๋กœ 1x1, 3x3, 1x1 ํฌ๊ธฐ์˜ convolution layer๋ฅผ ์ˆœ์ฐจ์ ์œผ๋กœ ์Œ“์•„์„œ ๊ตฌ์„ฑํ•˜๋ฉฐ, ์ฃผ๋กœ ResNet๊ณผ ๊ฐ™์€ deep neural network์—์„œ ์‚ฌ์šฉ๋œ๋‹ค.

๋‹ค์Œ๊ณผ ๊ฐ™์€ ํŠน์ง•์ด ์žˆ๋‹ค.

  • Dimension Reduction: a small convolution such as a 1x1 convolution reduces the channel dimension of the input, so the computation is carried out in a smaller space with fewer parameters, making the model more efficient.
  • Deep Block: the output in the reduced dimension is then processed by the 3x3 convolution layer, which learns more complex and abstract features.
  • Dimension Increase: finally, a convolution layer restores the output to the original dimension to produce the final output.
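Below is a minimal sketch of such a bottleneck block, again assuming PyTorch. The 256 → 64 → 64 → 256 channel layout and the expansion factor of 4 follow the convention used in ResNet-50; the projection shortcut is a simplification for the case where the input and output channel counts differ.

```python
import torch
import torch.nn as nn

class BottleneckBlock(nn.Module):
    """1x1 (reduce) -> 3x3 -> 1x1 (restore) bottleneck with a shortcut connection."""

    expansion = 4  # output channels = mid_channels * expansion

    def __init__(self, in_channels: int, mid_channels: int):
        super().__init__()
        out_channels = mid_channels * self.expansion
        # 1x1: reduce the channel dimension
        self.reduce = nn.Conv2d(in_channels, mid_channels, kernel_size=1, bias=False)
        self.bn1 = nn.BatchNorm2d(mid_channels)
        # 3x3: main computation in the reduced dimension
        self.conv = nn.Conv2d(mid_channels, mid_channels, kernel_size=3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(mid_channels)
        # 1x1: restore the channel dimension
        self.restore = nn.Conv2d(mid_channels, out_channels, kernel_size=1, bias=False)
        self.bn3 = nn.BatchNorm2d(out_channels)
        self.relu = nn.ReLU(inplace=True)
        # Projection shortcut when input/output channel counts differ
        self.shortcut = (
            nn.Identity()
            if in_channels == out_channels
            else nn.Conv2d(in_channels, out_channels, kernel_size=1, bias=False)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out = self.relu(self.bn1(self.reduce(x)))
        out = self.relu(self.bn2(self.conv(out)))
        out = self.bn3(self.restore(out))
        return self.relu(out + self.shortcut(x))

x = torch.randn(1, 256, 56, 56)
y = BottleneckBlock(256, 64)(x)
print(y.shape)  # torch.Size([1, 256, 56, 56])
```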

Abstract

Deep neural networks become harder to train as their depth increases. To ease the training of deeper models, the paper proposes a residual learning framework.

Residual network ์ด์šฉํ•˜์—ฌ ๋” ์‰ฝ๊ฒŒ optimize ํ•˜๊ณ , depth๊ฐ€ ์ƒ๋‹นํžˆ ๊นŠ์–ด์ง์— ๋”ฐ๋ผ ๋†’์€ ์ •ํ™•๋„๋ฅผ ์–ป์„ ์ˆ˜ ์žˆ๋‹ค๋Š” ๊ฒฝํ—˜์ ์ธ ์ฆ๊ฑฐ๋ฅผ ๋ณด์—ฌ์ค€๋‹ค.

Using 152 layers, 8x deeper than VGG nets, the network still has lower complexity than VGG and achieves lower error on the ImageNet test set.

Introduction

Deep CNN์€ image classification์—์„œ ํš๊ธฐ์ ์ธ(breakthroughs) ์„ฑ๋Šฅ์„ ๋ณด์—ฌ์คฌ๋Š”๋ฐ, ๋ชจ๋ธ์ด ๊นŠ์–ด์งˆ ์ˆ˜๋ก ์ข‹์€ ์„ฑ๋Šฅ์„ ๋ณด์—ฌ์คฌ๋‹ค. ๋ชจ๋ธ์ด ๊นŠ์–ด์งˆ์ˆ˜๋ก vanishing/exploding gradients problem์ด ๋ฐœ์ƒํ•˜๋Š”๋ฐ, ์ด๋Š” normalized initialization, intermediate normalization layers ๋“ฑ์œผ๋กœ ์–ด๋А์ •๋„ ํ•ด๊ฒฐ์„ ํ•ด์™”๋‹ค.

Still, as the network gets deeper, a degradation problem appears in which accuracy drops. Unlike overfitting, both training accuracy and test accuracy degrade (with overfitting, training accuracy rises while test accuracy falls).

The paper views this as a problem that arises because deeper stacks of layers are harder to optimize, and compares a shallow architecture with a deeper one. A deeper architecture can be constructed by taking a learned shallower model and adding identity mapping layers (a mapping whose output equals its input; here, passing the input directly to the output through skip connections), but simply stacking layers this way did not turn out to be a good solution.

[Figure: shortcut connection skipping layers in a residual block]

์ด ๋…ผ๋ฌธ์—์„œ๋Š” degradation problem์„ ํ•ด๊ฒฐํ•˜๊ธฐ ์œ„ํ•ด deep residual learning framework๋ฅผ ์ œ์‹œํ•œ๋‹ค. ์ด๋Š” ๊ธฐ์กด mapping์ธ $H(x)$๋ฅผ $F(x) := H(x)-x$ ์ฆ‰, $H(x):=F(x)+x$๋กœ mapping ํ•˜๊ฒŒ ๋งŒ๋“ ๋‹ค. ์ด residual mappingd ๊ธฐ์กด์˜ mapping๋ณด๋‹ค optimizeํ•˜๊ธฐ ๋” ์‰ฌ์šด๊ฒƒ์œผ๋กœ ๊ฐ€์ •ํ•œ๋‹ค.

The formulation $F(x) + x$ is realized with shortcut connections, so called because, as in the figure above, they skip one or more layers. Another advantage is that identity shortcut connections add neither extra parameters nor extra computational complexity.

Unlike earlier work in which simply making the model deeper improved performance, this paper tackles the degradation problem that appears as models grow deeper by introducing residual learning in the Residual Network (ResNet), which keeps gaining accuracy as depth increases. From 50 layers onward, bottleneck blocks are used to reduce the computational cost. As a result, ResNet with residual learning outperformed the plain networks modeled after VGG.
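As a small sanity check on the layer counts, the per-stage bottleneck block counts reported in the paper (Table 1) reproduce the named depths; the extra 2 layers are the initial 7x7 convolution and the final fully connected layer.

```python
# Per-stage bottleneck block counts for the deeper ResNets (paper, Table 1).
STAGES = {
    "ResNet-50": (3, 4, 6, 3),
    "ResNet-101": (3, 4, 23, 3),
    "ResNet-152": (3, 8, 36, 3),
}

for name, blocks in STAGES.items():
    # Each bottleneck block contributes 3 conv layers; add the first 7x7 conv
    # and the final fully connected layer.
    depth = 3 * sum(blocks) + 2
    print(f"{name}: {depth} layers")
```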

