Deformable DETR: Deformable Transformers for End-to-End Object Detection

  • Link : https://arxiv.org/abs/2010.04159

💡 Motivation

Deformable DETR was proposed to improve the efficiency and convergence speed of DETR. The paper addresses DETR's limitations and presents a way to predict objects of varying sizes and positions more accurately.

The original DETR converges slowly and struggles to detect small objects. It needs a very long training schedule to reach high accuracy because attention is applied over every position in the image feature map. In addition, DETR does well on large objects but is weak on small ones, since the high-resolution feature maps that small objects require are too costly for attention over all positions.

💡 Deformable Attention Mechanism

[Figure: deformabledetr1]

Instead of attending to every position in the image, Deformable DETR attends only to a small set of sampling points chosen around the object. This greatly improves computational efficiency and speeds up convergence.
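As a rough illustration of the efficiency gain, the sketch below counts query-key pairs for dense attention and for attention over a fixed number of sampled points. The 32x32 feature map and the 4 points per query are hypothetical values chosen for the example, and the count ignores attention heads and projection costs.

```python
def dense_attention_pairs(height: int, width: int) -> int:
    """Query-key pairs when every position attends to every position."""
    n = height * width
    return n * n


def deformable_attention_pairs(height: int, width: int, num_points: int) -> int:
    """Query-key pairs when every position attends to a fixed number of sampled points."""
    return height * width * num_points


# Hypothetical sizes, not taken from the paper's experiments.
dense = dense_attention_pairs(32, 32)
sparse = deformable_attention_pairs(32, 32, num_points=4)
print(dense, sparse, dense // sparse)
```

The dense count grows quadratically with the number of positions, while the sampled count grows linearly, which is why higher-resolution feature maps become affordable.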

๊ฐ ์ฟผ๋ฆฌ ํ† ํฐ์€ Reference Points๋ฅผ ๊ธฐ์ค€์œผ๋กœ ๊ด€์‹ฌ ์œ„์น˜์— ๋Œ€ํ•œ offset์„ ํ•™์Šตํ•˜์—ฌ ๊ฐ์ฒด์˜ ํŠน์ง•์ ์ธ ์œ„์น˜๋งŒ์„ ์„ ํƒ์ ์œผ๋กœ ํƒ์ง€ํ•œ๋‹ค. ์ด ๋ฐฉ์‹์„ ํ†ตํ•ด ๋ชจ๋ธ์€ ์ค‘์š”ํ•œ ์œ„์น˜์—๋งŒ ์ง‘์ค‘ํ•  ์ˆ˜ ์žˆ๊ฒŒ ๋œ๋‹ค.

💡 Multi-Scale Feature Aggregation

[Figure: deformabledetr2]

Deformable DETR works on multi-scale feature maps, similar in spirit to an FPN (Feature Pyramid Network), to detect objects of various sizes effectively. The paper notes that the deformable attention module itself exchanges information across levels, so FPN's top-down pathway is not required. Aggregating information from every feature level makes detection possible across scales, and this substantially improves detection of small objects.
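One way to picture the aggregation: a single normalized reference point is rescaled to the resolution of each feature level before sampling. The sketch below assumes four hypothetical level shapes and a plain `x * width` scaling; real implementations may use a different pixel-center convention.

```python
def to_level_coords(ref_xy, level_shapes):
    """Map one normalized (x, y) in [0, 1] to pixel coordinates on each feature level."""
    x, y = ref_xy
    return [(x * w, y * h) for (h, w) in level_shapes]


# Hypothetical (height, width) of four feature levels, finest first.
level_shapes = [(64, 64), (32, 32), (16, 16), (8, 8)]
coords = to_level_coords((0.5, 0.25), level_shapes)
for shape, xy in zip(level_shapes, coords):
    print(shape, xy)
```

The same point lands on every level, so the sampled features from fine and coarse maps can be combined for one query.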

💡 How Reference Points Are Set

In Deformable DETR, a "reference point" is assigned to each query token. The reference point marks the central location in the image that the query token should attend to. The deformable attention mechanism then determines the "sampling points" relative to this reference point.

๋ชจ๋ธ์€ ํ•™์Šต ๊ณผ์ •์„ ํ†ตํ•ด ๊ฐ query์™€ ๊ฐ€์žฅ ๋ฐ€์ ‘ํ•˜๊ฒŒ ์—ฐ๊ด€๋œ๋‹ค๊ณ  ํŒ๋‹จ๋˜๋Š” ์ด๋ฏธ์ง€ ๋‚ด์˜ ํŠน์ • ์œ„์น˜๋กœ ์ฐธ์กฐ์ ์„ ์„ค์ •ํ•˜๊ฒŒ ๋œ๋‹ค.

💡 How Sampling Points Are Set

Deformable attention assigns a fixed number of sampling points to each query. To do so, it learns a fixed number of offset values per query, measured from the reference point. The offsets are learned separately for each query, so the sampling points can land at different positions. The final sampling points are obtained by adding the learned offsets to the reference point. The resulting points tend to fall on object boundaries or other important parts of the object.
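The addition step can be sketched in a few lines. The reference point and offsets below are hypothetical pixel-space values; in the model the offsets come from a learned projection of the query.

```python
def sampling_points(ref_xy, offsets):
    """Add each (dx, dy) offset to the reference point to get sampling locations."""
    rx, ry = ref_xy
    return [(rx + dx, ry + dy) for dx, dy in offsets]


# Hypothetical values: one reference point and K = 3 offsets.
reference = (10.0, 5.0)
offsets = [(-1.5, 0.0), (2.0, 1.0), (0.0, -2.5)]
points = sampling_points(reference, offsets)
print(points)
```

Because the offsets are real-valued, the resulting locations are generally fractional, so the features at those locations have to be interpolated from neighboring pixels.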

Attention is then computed only over these sampling points, so the model focuses on the important locations. As a result, each query token effectively learns attention values concentrated on those locations.
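Putting the pieces together, a minimal single-head, single-level sketch looks like this. It assumes a scalar-valued feature map, bilinear interpolation with zeros outside the map, and hand-picked attention logits; in the model the logits are predicted from the query rather than computed from query-key dot products.

```python
import math


def bilinear_sample(feature, x, y):
    """Sample a 2D grid (list of rows) at fractional (x, y); out-of-range reads as 0."""
    h, w = len(feature), len(feature[0])
    x0, y0 = math.floor(x), math.floor(y)
    value = 0.0
    for yi in (y0, y0 + 1):
        for xi in (x0, x0 + 1):
            if 0 <= xi < w and 0 <= yi < h:
                weight = (1 - abs(x - xi)) * (1 - abs(y - yi))
                value += weight * feature[yi][xi]
    return value


def softmax(logits):
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def deformable_attention(feature, points, logits):
    """Weighted sum of features sampled at the given points."""
    weights = softmax(logits)  # weights over the K points sum to 1
    return sum(a * bilinear_sample(feature, x, y)
               for a, (x, y) in zip(weights, points))


# Hypothetical 3x3 feature map, two sampling points, equal logits.
feature = [[0.0, 1.0, 2.0],
           [3.0, 4.0, 5.0],
           [6.0, 7.0, 8.0]]
points = [(0.5, 0.5), (2.0, 1.0)]
out = deformable_attention(feature, points, logits=[0.0, 0.0])
print(out)
```

Only the pixels around the two sampled locations contribute to the output; every other position in the map is ignored for this query.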
