Generative Adversarial Nets

  • Link : https://arxiv.org/abs/1406.2661

Abstract

์ด ๋…ผ๋ฌธ์—์„œ๋Š” ๊ฒฝ์Ÿ์„ ํ†ตํ•ด Generative model์„ ํ›ˆ๋ จ์‹œํ‚ค๋Š” ์ƒˆ๋กœ์šด ํ”„๋ ˆ์ž„์›Œํฌ๋ฅผ ์ œ์‹œํ•œ๋‹ค. ๊ฒฝ์Ÿ์€ Generative model G์™€ Discriminative model D ์‚ฌ์ด์—์„œ ์ด๋ค„์ง€๋Š”๋ฐ, G๋Š” ์‹ค์ œ ๋ฐ์ดํ„ฐ์˜ ๋ถ„ํฌ๋ฅผ ๋ชจ์‚ฌํ•˜๋Š” ๋ชจ๋ธ์ด๊ณ , D๋Š” ๋ฐ์ดํ„ฐ $x$๊ฐ€ ์ฃผ์›Œ์กŒ์„ ๋•Œ, $x$๊ฐ€ G๊ฐ€ ์ƒ์„ฑํ•œ ๋ฐ์ดํ„ฐ ๋ถ„ํฌ์—์„œ sampling๋œ๊ฑด์ง€, ์‹ค์ œ ๋ฐ์ดํ„ฐ์ธ์ง€ ๊ตฌ๋ณ„ํ•˜๋Š” ๋ชจ๋ธ์ด๋‹ค.

๋ชจ๋ธ G์˜ ํ•™์Šต ๋ชฉ์ ์€ D๊ฐ€ ์ž˜๋ชป ํŒ๋‹จํ•  ํ™•๋ฅ ์„ ์ตœ๋Œ€ํ™”์‹œํ‚ค๋Š” ๊ฒƒ์œผ๋กœ, ์ด๋Ÿฌํ•œ ํ”„๋ ˆ์ž„์›Œํฌ๋Š” minimax two-player game๊ณผ ๊ฐ™๋‹ค.

์ž„์˜์˜ ํ•จ์ˆ˜ G์™€ D์— ๋Œ€ํ•ด, uniqueํ•œ solution์ด ์กด์žฌํ•˜๋ฉฐ ๊ทธ solution์€ G๊ฐ€ training data๋ฅผ ์™„๋ฒฝํžˆ ๋ชจ์‚ฌํ•˜์—ฌ, D๊ฐ€ ๋ฐ์ดํ„ฐ๋ฅผ ๊ตฌ๋ถ„ํ•ด๋‚ผ ํ™•๋ฅ ์ด 1/2์ผ ๋•Œ์ด๋‹ค.(์ฐ๋Š” ๊ฒƒ๊ณผ ๊ฐ™๋‹ค)

G์™€ D๊ฐ€ multilayer perceptrons์ผ ๊ฒฝ์šฐ, ์ „์ฒด ์‹œ์Šคํ…œ์€ Markov chains๋‚˜ unrolled approximate inference networks์—†์ด backpropagation์œผ๋กœ ํ•™์Šต์ด ๊ฐ€๋Šฅํ•˜๋‹ค.

1 Introduction

Backpropagation๊ณผ dropout๊ณผ ๊ฐ™์€ ์ฃผ์š” ์•Œ๊ณ ๋ฆฌ์ฆ˜ ๋•์— Deep Learing์—์„œ๋„ ํŠนํžˆ Discriminative model์ด ํฐ ์„ฑ๊ณต์„ ์ด๋ค˜๋‹ค.

  • Discriminative model์€ ๊ณ ์ฐจ์›์˜ ๋งŽ์€ ์ •๋ณด๊ฐ€ ๋‹ด๊ธด ๋ฐ์ดํ„ฐ์— class label์„ mappingํ•˜๋Š” ๋ชจ๋ธ์„ ๋งํ•œ๋‹ค.

ํ•˜์ง€๋งŒ Deep generative model์—์„œ๋Š” ์ฒ˜๋ฆฌํ•˜๊ธฐ ์–ด๋ ค์šด probablistic computation์œผ๋กœ ์ธํ•ด ๊ทธ ์˜ํ–ฅ์ด ์ ์—ˆ๋Š”๋ฐ, ์ด ๋…ผ๋ฌธ์—์„œ๋Š” ์ด๋Ÿฌํ•œ ์–ด๋ ค์›€๋“ค์„ ํ”ผํ•˜๋Š” ์ƒˆ๋กœ์šด generative model ์ถ”์ • ๊ณผ์ •์„ ์ œ์•ˆํ•œ๋‹ค.

Adversarial nets framework์—์„œ generative model์€ discriminative model์„ ์ ์œผ๋กœ ์ƒ๋Œ€ํ•˜๋Š”๋ฐ, discriminative model์€ sample์ด model distribution์—์„œ ๋‚˜์˜จ๊ฑด์ง€, data distribution์—์„œ ๋‚˜์˜จ๊ฑด์ง€ ๊ตฌ๋ณ„ํ•˜๋„๋ก ํ•™์Šตํ•œ๋‹ค.

Generative model์„ ์ง„์งœ์ธ์ง€ ๊ฐ€์งœ์ธ์ง€ ์‹๋ณ„๋˜์ง€ ์•Š๋Š” ๊ฐ€์งœ ํ†ตํ™”๋ฅผ โ€˜์ œ์กฐโ€™ํ•˜๋ ค๊ณ  ํ•˜๋Š” ์‚ฌ๊ธฐ๊พผ์ด๋ผ ํ•œ๋‹ค๋ฉด, discriminative model์€ ์‚ฌ๊ธฐ๊พผ์ด ๋งŒ๋“  ๊ฐ€์งœ ํ†ตํ™”๋ฅผ โ€˜์‹๋ณ„โ€™ํ•˜๋ ค๊ณ  ํ•˜๋Š” ๊ฒฝ์ฐฐ์ด๋ผ ๋น„์œ ํ•  ์ˆ˜ ์žˆ๋‹ค.

๋‘˜์˜ ๊ฒฝ์Ÿ์€ ์ง„์งœ ํ†ตํ™”์™€ ๊ฐ€์งœ ํ†ตํ™”๊ฐ€ ๋” ์ด์ƒ ๊ตฌ๋ณ„์ด ๋˜์ง€ ์•Š์„ ๋•Œ๊นŒ์ง€ ์ด๋ค„์ง€๋ฉฐ, ๊ทธ๋™์•ˆ ์‚ฌ๊ธฐ๊พผ์˜ ๊ฐ€์งœ ํ†ตํ™” โ€˜์ œ์กฐโ€™๋ฐฉ๋ฒ•๊ณผ ๊ฒฝ์ฐฐ์˜ ๊ฐ€์งœ ํ†ตํ™” โ€˜์‹๋ณ„โ€™๋ฐฉ๋ฒ•์€ ๊ฐœ์„ ๋˜๊ฒŒ ๋œ๋‹ค.

์ด๋Ÿฐ (๊ฒฝ์Ÿ)ํ”„๋ ˆ์ž„์›Œํฌ๋Š” ๋งŽ์€ ๋ชจ๋ธ์˜ ๊ตฌ์ฒด์ ์ธ ํ•™์Šต ์•Œ๊ณ ๋ฆฌ์ฆ˜๊ณผ ์ตœ์ ํ™” ์•Œ๊ณ ๋ฆฌ์ฆ˜์„ ๋งŒ๋“ค์–ด๋‚ผ ์ˆ˜ ์žˆ๋‹ค.

์ด ๋…ผ๋ฌธ์—์„œ๋Š” multilayer perceptron์ธ generative model์˜ input์œผ๋กœ random noise๋ฅผ ๋„ฃ์–ด sample์„ ๋งŒ๋“ค๊ณ , discriminative model ๋˜ํ•œ multilayer perceptron์ธ ํŠน๋ณ„ํ•œ ์ผ€์ด์Šค๋ฅผ ํƒ์ƒ‰ํ•ด๋ณด๊ณ ์ž ํ•œ๋‹ค.

์šฐ๋ฆฌ๋Š” ์ด๋Ÿฐ ํŠน๋ณ„ํ•œ ์ผ€์ด์Šค๋ฅผ Adversarial Nets๋ผ ํ•œ๋‹ค.

์ด ๊ฒฝ์šฐ์—, ๋‘ ๋ชจ๋ธ ๋ชจ๋‘ ์˜ค์ง backpropagtion๊ณผ dropout ์•Œ๊ณ ๋ฆฌ์ฆ˜๋งŒ์„ ์ด์šฉํ•ด์„œ ํ›ˆ๋ จ์‹œํ‚ค๊ณ , froward propagtion๋งŒ์œผ๋กœ generative model์œผ๋กœ sample์„ ๋งŒ๋“ค ์ˆ˜ ์žˆ๋‹ค.

3 Adversarial nets

Adversarial modeling framework๋Š” ๋ชจ๋ธ๋“ค์ด ๋ชจ๋‘ multilayer perceptrons(deep learning models)์ผ ๋•Œ ํšจ๊ณผ์ ์ด๋‹ค.

  • ๋ฐ์ดํ„ฐ $x$์— ๋Œ€ํ•œ generator์˜ distribution์„ $p_g$
  • input noise ๋ณ€์ˆ˜๋ฅผ $p_z(z)$
  • $p_z(z)$๋ฅผ data space์— mapping ํ•˜๋Š” ๋ชจ๋ธ $G(z:\theta_g)$

G๋Š” $\theta_g$๋กœ ์ด๋ฃจ์–ด์ง„ multilayer perceptron ๋ชจ๋ธ์ด๋ฉฐ, $D(x;\theta_d)$ ๋˜ํ•œ multilayer perceptron์œผ๋กœ x๊ฐ€ $p_g$๊ฐ€ ์•„๋‹Œ real data์˜ ๋ฐ์ดํ„ฐ์ผ ํ™•๋ฅ ์„ single scalar๋กœ ์ถœ๋ ฅํ•œ๋‹ค.

Training example G๊ฐ€ ๋งŒ๋“  samples์— ๋Œ€ํ•ด D๊ฐ€ label์„ ์ •ํ™•ํ•˜๊ฒŒ ๋ถ€์—ฌํ•  ํ™•๋ฅ ์„ ์ตœ๋Œ€ํ™”ํ•˜๋„๋ก D๋ฅผ ํ›ˆ๋ จ์‹œํ‚ค๋Š” ๋™์‹œ์—, G๋Š” $log(1-D(G(z)))$๋ฅผ ์ตœ์†Œํ™”ํ•˜๋„๋ก ํ›ˆ๋ จ์‹œํ‚จ๋‹ค.

G๊ฐ€ $log(1-D(G(z)))$๋ฅผ ์ตœ์†Œํ™”์‹œํ‚ค๋„๋ก ํ›ˆ๋ จ์‹œํ‚จ๋‹ค๋Š” ๊ฒƒ์€ D๊ฐ€ $G(z)$๋ฅผ 1์— ๊ฐ€๊น๋„๋ก ํ›ˆ๋ จ์‹œํ‚จ๋‹ค๋Š” ๊ฒƒ์ด๊ณ , ์ด๋Š” G๊ฐ€ $z$๊ฐ€ real data๋กœ ๋ถ„๋ฅ˜๋  ํ™•๋ฅ ์„ ์ตœ๋Œ€ํ™”ํ•˜๋„๋ก ํ›ˆ๋ จ์‹œํ‚จ๋‹ค๋Š” ๊ฒƒ์ด๋‹ค.

$\underset{G}{min}\ \underset{D}{max}\ V(D,G) = E_{x\sim p_{data}(x)}[log D(x)]+E_{z\sim p_z(z)}[log(1-D(G(z)))]$
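The value function can be estimated by Monte Carlo: each expectation becomes a sample mean. A minimal sketch, using toy stand-ins for D and G (the sigmoid discriminator, the shift generator, and both distributions below are assumptions, not the paper's networks):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins for illustration: D is a sigmoid applied to x directly,
# G simply shifts the noise. Both exist only so we can evaluate V.
D = lambda x: 1.0 / (1.0 + np.exp(-x))
G = lambda z: z - 2.0

x_real = rng.normal(1.0, 1.0, 10000)     # x ~ p_data
z = rng.standard_normal(10000)           # z ~ p_z(z)

# Monte Carlo estimate of V(D, G): sample means replace the expectations
V = np.mean(np.log(D(x_real))) + np.mean(np.log(1.0 - D(G(z))))
print(V)   # always negative: both terms are logs of probabilities
```

D tries to drive this quantity up (high $D(x)$ on real data, low $D(G(z))$ on fakes), while G tries to drive the second term down.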

Adversarial nets์˜ ์ด๋ก ์ ์ธ ๋ถ„์„์„ ์‚ดํŽด๋ณด์ž. ์—ฌ๊ธฐ์„œ๋Š” D์™€ G์˜ ํ•™์Šต ๊ณผ์ •์„ ๋ณด์—ฌ์ค€๋‹ค.

[Figure 1 from the paper: panels (a)–(d) show the data distribution $p_{data}$ (black, dotted), the generator distribution $p_g$ (green, solid), and the discriminative distribution D (blue, dashed) as training progresses]

(a) ํ•™์Šต์ด ๋˜์ง€ ์•Š์€ ์ดˆ๊ธฐ ์ƒํƒœ์ด๋‹ค. $p_g$์™€ $p_{data}$์˜ ํ˜•ํƒœ๊ฐ€ ๋น„์Šทํ•œ ์ •๋„์ด๋ฉฐ, discriminative function์ด ๋ฐ์ดํ„ฐ๋ฅผ ๋ถ€๋ถ„์ ์œผ๋กœ ์ •ํ™•ํ•˜๊ฒŒ ๋ถ„๋ฅ˜ํ•จ์„ ์•Œ ์ˆ˜ ์žˆ๋‹ค.

(b) D์˜ ์•Œ๊ณ ๋ฆฌ์ฆ˜ ๋‚ด๋ถ€์—์„œ D๊ฐ€ ๋ฐ์ดํ„ฐ์˜ sample์„ ์‹๋ณ„ํ•˜๋„๋ก ํ•™์Šต๋  ์ˆ˜ ์žˆ๊ฒŒ, D๊ฐ€ $D^*(x)=\frac{p_{data}(x)}{p_{data}(x)+p_g(x)}$(=data x๊ฐ€ G๊ฐ€ ๋งŒ๋“  ๋ฐ์ดํ„ฐ์ผ์ง€, ์‹ค์ œ ๋ฐ์ดํ„ฐ์ผ์ง€์— ๋Œ€ํ•œ ํ™•๋ฅ )๋กœ ์ˆ˜๋ ดํ•˜๋„๋ก ํ•™์Šต๋œ๋‹ค.

(c) G๋ฅผ ํ•™์Šต์‹œํ‚จ๋‹ค. ์ด๋•Œ, D์˜ gradient๊ฐ€ G(z)๊ฐ€ ์‹ค์ œ ๋ฐ์ดํ„ฐ๋กœ ๋ถ„๋ฅ˜๋˜๊ฒŒ๋” ํ•™์Šต๋˜๋„๋ก ํ•™์Šต์‹œํ‚จ๋‹ค.

(d) ์ด๋ ‡๊ฒŒ ํ•™์Šต์˜ ๊ณผ์ •์„ ๊ฑฐ์น˜๋‹ค ๋ณด๋ฉด, ์ฆ‰ G์™€ D๊ฐ€ ์ถฉ๋ถ„ํ•œ ๋Šฅ๋ ฅ์ด ๋˜๋ฉด ๋‘˜์˜ ์„ฑ๋Šฅ์ด ๋” ์ด์ƒ ํ–ฅ์ƒ๋˜์ง€ ๋ชปํ•˜๋Š” ํฌ์ธํŠธ์— ๋‹ค๋‹ค๋ฅธ๋‹ค. ๊ทธ ์ด์œ ๋Š” $p_g=p_{data}$๊ฐ€ ๋˜์–ด, discriminator๊ฐ€ ๋” ์ด์ƒ ๋‘ ๊ฐœ์˜ ๋ถ„ํฌ๋ฅผ ๊ตฌ๋ถ„ํ•˜์ง€ ๋ชปํ•˜๋Š” $D(x)=\frac{1}{2}$๊ฐ€ ๋˜๊ธฐ ๋•Œ๋ฌธ์ด๋‹ค.

  • $D(x)=\frac{1}{2}$๋ผ๋Š” ๊ฒƒ์€, D๊ฐ€ ๋ฐ์ดํ„ฐ๊ฐ€ ์ง„์งœ์ธ์ง€ ๊ฐ€์งœ์ธ์ง€ ๊ตฌ๋ถ„ํ•˜๋Š”๊ฒŒ ์ฐ๋Š”๊ฑฐ๋‚˜ ๋‹ค๋ฆ„์—†๋‹ค๋Š” ๋œป์ด๋‹ค. ์ฆ‰, ์ง„์งœ์ผ ํ™•๋ฅ ๊ณผ ๊ฐ€์งœ์ผ ํ™•๋ฅ ์ด 50%๋Œ€ 50%๋ผ๋Š” ๊ฒƒ

ํ•™์Šต ๊ณผ์ •์˜ inner loop์—์„œ D๊ฐ€ ์ตœ์ ํ™”๋˜๋„๋ก ๊ณ„์† ํ•™์Šต์‹œํ‚ค๋Š” ๊ฑด, ๊ณ„์‚ฐ์ ์œผ๋กœ ๋ถˆ๊ฐ€๋Šฅํ•˜๊ณ  overfitting์ด ๋  ์ˆ˜ ์žˆ๋‹ค. ๋Œ€์‹ ์— D๋ฅผ k๋ฒˆ ํ•™์Šตํ•˜๊ณ , G๋ฅผ 1๋ฒˆ ํ•™์Šตํ•˜๋Š” ๋ฐฉ์‹์„ ์‚ฌ์šฉํ•˜๋ฉด, D๊ฐ€ optimalํ•œ solution์— ๊ฐ€๊น๋„๋ก ์œ ์ง€ํ•  ์ˆ˜ ์žˆ๊ณ , ์ด๋ฅผ ํ†ตํ•ด G๊ฐ€ ์ถฉ๋ถ„ํžˆ ์ฒœ์ฒœํžˆ ํ–ฅ์ƒ๋˜๋„๋ก ํ•™์Šต์‹œํ‚ฌ ์ˆ˜ ์žˆ๋‹ค.

ํ•™์Šต ์ดˆ๋ฐ˜์—๋Š” G์˜ ์„ฑ๋Šฅ์ด ํ˜•ํŽธ ์—†๋Š”๋ฐ, ์ด๋•Œ๋Š” D๊ฐ€ data๋ฅผ ๋„ˆ๋ฌด ์ž˜ ๊ตฌ๋ณ„ํ•ด๋ฒ„๋ ค์„œ $log(1-D(G(z)))$๊ฐ€ saturate(ํฌํ™”)๋˜์–ด gradient๊ฐ’์ด ๋„ˆ๋ฌด ์ž‘์•„, ํ•™์Šต์ด ์ž˜ ๋˜์ง€ ์•Š๋Š”๋‹ค. ๋”ฐ๋ผ์„œ, G๊ฐ€ $log(1-D(G(z)))$๋ฅผ ์ตœ์†Œํ™”์‹œํ‚ค๋„๋ก ํ•˜๋Š” ๊ฒƒ๋ณด๋‹ค, G๊ฐ€ $D(G(z))$๋ฅผ ์ตœ๋Œ€ํ™”ํ•˜๋„๋ก ํ›ˆ๋ จ์‹œํ‚ค๋ฉด, ํ•™์Šต ์ดˆ๋ฐ˜์— ๋” ๋†’์€ gradient๋ฅผ ๋ฐ”ํƒ•์œผ๋กœ ํ•™์Šต์˜ ํšจ์œจ์„ ๋†’์ผ ์ˆ˜ ์žˆ๋‹ค.

4 Theoretical Results

4.1 Global Optimality of $p_g=p_{data}$

minmax game์ด $p_g=p_{data}$์˜ global optimum์„ ๊ฐ€์ง€๊ณ  ์žˆ์Œ์„ ์ฆ๋ช…ํ•œ๋‹ค. ์ฆ‰, $\underset{G}{min}\ \underset{D}{max}V(D,G)$์‹์ด $p_g=p_{data}$๊ฐ€ ๋˜๋„๋ก ํ•™์Šต๋  ์ˆ˜ ์žˆ์Œ์„ ๋ณด์ธ๋‹ค.

Proposition 1.

For G fixed, the optimal discriminator D is $D^*_G(x)=\frac{p_{data}(x)}{p_{data}(x)+p_g(x)}$.

Proof. Given any fixed G, the training criterion for the discriminator D is to maximize $V(G,D)$. $V(G,D)= \int_{x}p_{data}(x)log(D(x))dx\ + \ \int_{z}p_{z}(z)log(1-D(g(z)))dz\ = \ \int_{x}p_{data}(x)log(D(x))\ +\ p_g(x)log(1-D(x))dx$

To find the value of $D(x)$ that maximizes this expression, we differentiate. For each $x$, the integrand has the form $y\rightarrow a\ log(y)+b\ log(1-y)$; its derivative is $a/y - b/(1-y)$, and setting this to zero and solving for $y$ gives $y=a/(a+b)$. Since $a=p_{data}(x)$ and $b=p_g(x)$, the maximum is attained at $D(x)=p_{data}(x)/(p_{data}(x)+p_g(x))$.

D๋ฅผ ํ•™์Šต์‹œํ‚ค๋Š” ๋ชฉ์ ์€ ์กฐ๊ฑด๋ถ€ํ™•๋ฅ  $P(Y=y x)$์˜ log-likelihood๋ฅผ ์ตœ๋Œ€ํ™”ํ•˜๋Š” ๊ฒƒ์ด๋ผ ํ•ด์„ํ•  ์ˆ˜ ์žˆ๋Š”๋ฐ, ์ด๋•Œ $Y$๋Š” $x$๊ฐ€ $p_{data}(y=1)$์—์„œ ์˜จ๊ฑด์ง€, $p_g(y=0)$์—์„œ ์˜จ๊ฑด์ง€๋ฅผ ๋‚˜ํƒ€๋‚ธ๋‹ค.

๋”ฐ๋ผ์„œ, $\underset{G}{min}\ \underset{D}{max}V(D,G)$ ์‹์€ ๋‹ค์Œ๊ณผ ๊ฐ™์ด ๋‹ค์‹œ ์“ธ ์ˆ˜ ์žˆ๋‹ค.

$C(G) = \underset{D}{max}V(G,D) = E_{x \sim p_{data}}[log\frac{p_{data}(x)}{p_{data}(x)+p_g(x)}] + E_{x \sim p_g}[log\frac{p_g(x)}{p_{data}(x)+p_g(x)}]$

Theorem 1.

The global minimum of the training criterion $C(G)$ is achieved if and only if $p_g=p_{data}$. At that point, $C(G)=-log4$.

Proof.

When $p_g=p_{data}$, $D^*_G(x)=\frac{1}{2}$ (since $D^*_G(x)=p_{data}(x)/(p_{data}(x)+p_g(x))$).

Therefore, $C(G)=log\frac{1}{2}+log\frac{1}{2}=-log4$.

์ฆ‰, $C(G)$์˜ global minimum๊ฐ’์ด $-log4$๊ฐ€ ๋˜๊ณ , G๊ฐ€ ์‹ค์ œ data์˜ distribution์„ ์™„๋ฒฝํ•˜๊ฒŒ ๋ชจ์‚ฌํ•  ๋•Œ ($p_g=p_{data}$) ์–ป์–ด์งˆ ์ˆ˜ ์žˆ๋‹ค.

6 Advantages and disadvantages

Disadvantages

1) $p_g(x)$์— ๋Œ€ํ•œ ๋ช…์‹œ์ ์ธ ํ‘œํ˜„์ด ์กด์žฌํ•˜์ง€ ์•Š๋Š”๋‹ค.

2) ํ›ˆ๋ จ ์ค‘ D์™€ G๊ฐ€ ์„œ๋กœ ์ž˜ ๋™๊ธฐํ™” ๋˜์–ด์•ผ ํ•œ๋‹ค. ์ฆ‰, D๊ฐ€ ํ•™์Šต๋˜๊ธฐ ์ „์— G๊ฐ€ ๋„ˆ๋ฌด ๋นจ๋ฆฌ ๋งŽ์ด ํ•™์Šต๋˜์–ด์„œ๋Š” ์•ˆ๋œ๋‹ค. D์˜ ๊ฐ€์ค‘์น˜๊ฐ€ ์—…๋ฐ์ดํŠธ ๋˜๊ธฐ ์ „์— G์˜ ๊ฐ€์ค‘์น˜๋งŒ ๋„ˆ๋ฌด ๋งŽ์ด ์—…๋ฐ์ดํŠธ ๋˜๋ฉด, G๊ฐ€ $p_{data}$์˜ ๋ถ„ํฌ๋ฅผ ๋”ฐ๋ผ๋Š”๋ฐ ์ถฉ๋ถ„ํ•œ ๋‹ค์–‘์„ฑ์„ ์ง€๋‹ˆ์ง€ ๋ชปํ•˜๊ฒŒ ๋˜์–ด โ€˜Helvetica scenarioโ€™์— ๋น ์ง€๊ฒŒ ๋  ์ˆ˜๋„ ์žˆ๋‹ค.

Adavantages

1) backpropagation์œผ๋กœ gradient๋ฅผ ๊ตฌํ•  ์ˆ˜ ์žˆ์–ด Markov chain์ด ํ•„์š”ํ•˜์ง€ ์•Š๋‹ค.

2) ํ•™์Šต ์ค‘ inference๋ฅผ ํ•  ํ•„์š”๊ฐ€ ์—†๋‹ค.

3) ๋‹ค์–‘ํ•œ ํ•จ์ˆ˜์™€ Adversarian nets framework๋ฅผ ํ•ฉ์น  ์ˆ˜ ์žˆ๋‹ค.

4) Generator network๊ฐ€ data sample์„ ๋ฐ”ํƒ•์œผ๋กœ ์ง์ ‘ ์—…๋ฐ์ดํŠธ ๋˜๋Š”๊ฒŒ ์•„๋‹Œ, discriminator๋ฅผ ํ†ตํ•ด ์–ป์€ gradient๋กœ ์—…๋ฐ์ดํŠธ๋ฅผ ์ง„ํ–‰ํ•˜์—ฌ statistical advantage๋ฅผ ์–ป์„ ์ˆ˜ ์žˆ๋‹ค.

7 Conclusions and future work

  • G, D์— c๋ฅผ input์œผ๋กœ ์ถ”๊ฐ€ํ•˜๋ฉด, Conditional Generative Model $p(x c)$์„ ์–ป์„ ์ˆ˜ ์žˆ๋‹ค.
  • Auxiliary network๊ฐ€ $x$๋ฅผ ๊ฐ€์ง€๊ณ  $z$๋ฅผ ์˜ˆ์ธกํ•˜๋„๋ก ํ›ˆ๋ จํ•˜์—ฌ Learned approximate inference๋ฅผ ์ˆ˜ํ–‰ํ•  ์ˆ˜ ์žˆ๋‹ค.
  • Parameters๋ฅผ ๊ณต์œ ํ•˜๋Š” conditional model์„ ํ•™์Šตํ•˜์—ฌ, $x$์˜ ๋ชจ๋“  ๋ถ€๋ถ„์ง‘ํ•ฉ $S$์— ๋Œ€ํ•ด ์กฐ๊ฑด๋ถ€ํ™•๋ฅ  $p(x_s x_{\not{s}})$๋ฅผ ์–ป์„ ์ˆ˜ ์žˆ๋‹ค.
  • Semi-supervised learning : labeled ๋ฐ์ดํ„ฐ๊ฐ€ ์ ์„ ๋•Œ discriminator๋‚˜ inference net์—์„œ ์–ป์€ features๋กœ classifier์˜ ์„ฑ๋Šฅ์„ ํ–ฅ์ƒ์‹œํ‚ฌ ์ˆ˜ ์žˆ๋‹ค.
  • Efficiency improvements : G์™€ D๋ฅผ ์กฐ์ •ํ•˜๋Š” ๋” ์ข‹์€ ๋ฐฉ๋ฒ•์„ ๊ณ ์•ˆํ•˜๊ฑฐ๋‚˜ ํ›ˆ๋ จ ์‹œ $z$๋ฅผ sampleํ•˜๋Š” ๋” ์ข‹์€ ๋ฐฉ์‹์„ ์ •ํ•˜์—ฌ ํ•™์Šต ์†๋„๋ฅผ ํฌ๊ฒŒ ๊ฐ€์†ํ™”์‹œํ‚ฌ ์ˆ˜ ์žˆ๋‹ค.

์ด ๋…ผ๋ฌธ์€ adversarial modeling framework์˜ ์‹ค์šฉ ๊ฐ€๋Šฅ์„ฑ์„ ์ฆ๋ช…ํ•˜์˜€๊ณ , ์ด ์—ฐ๊ตฌ์˜ ๋ฐฉํ–ฅ์„ฑ์ด ์œ ์šฉํ•จ์„ ์‹œ์‚ฌํ•œ๋‹ค.
