June 29, 2022 3 min to read

Multimodal Unsupervised Image-to-Image Translation

논문 요약

Assumption

$x_1 \in X_1$이고 $x_2 \in X_2$ 인 두 도메인의 샘플들이 존재한다고 하겠습니다. 본 논문의 목표는 image to image translation model $p(x_{1 \rightarrow 2}|x_1)$, $p(x_{2 \rightarrow 1}|x_2)$를 배워 두 조건부 확률 $p(x_2|x_1)$과 $p(x_1|x_2)$를 추정하는 문제입니다. 일반적으로 $p(x_2|x_1)$과 $p(x_1|x_2)$는 복잡한 분포이기 때문에 deterministic translation model이 좋은 성능을 내지는 못합니다. 이러한 문제를 해결하기 위해 본 논문에서는 latent space를 부분적으로 공유하고 그 분포를 추정합니다. 본 논문에서 사용하는 latent code는 content code $c$, style code $s$입니다. 두 도메인이 공유하는 latent code는 c이며, $s$는 각 도메인이 개별적으로 가집니다. 즉 이것은 joint distribution에 속하는, 대응되는 한 이미지 쌍 $(x_1, x_2)$가 $x_1=G_1^(c,s_1)$, $x_2=G_2^(c,s_2)$를 통해 생성될 수 있다는 것을 뜻합니다. 여기서 $c, s_1, s_2$는 $x_1$ 또는 $x_2$로부터 나온 content code 및 style code 이며, 각 도메인의 $G_1^$과 $G_2^$은 generator들 입니다. 또한, 본 논문의 G는 deterministic function이며 각각 대응되는 Encoder E를 가집니다. 본 논문은 이전 UNIT에서 제안된 shared latent space assumption과 비슷한데, UNIT의 경우 latent space를 전체적으로 공유하나 본 논문의 MUNIT에서는 content space만 공유하고 style의 경우 공유하지 않습니다.

Method

Model overview

본 논문의 모델은 위 2가지의 AutoEncoder로 이뤄져있으며, 각 AE의 latent code는 content code $c$와 style code $s$로 구성됩니다. 본 논문에서는 real 이미지에서 변환된 이미지가 타겟 도메인에서 구분되지 않도록 adversarial 하게 모델을 학습했습니다. 그와 동시에 bidirectional reconstruction loss를 통해 두 이미지와 latent code를 복원할 수 있게 합니다.

latent code
- content code $c$
- style code $s$
loss function
- adversarial loss
- reconstruction loss

Bidirectional reconstruction loss

Image reconstruction loss

$L_{recon}^{x_1} = E_{x_1 \sim p(x_1)}[

G_1(E_1^c(x_1),E_1^s(x_1))-x_1

_1]$

Latent reconstruction loss

$L_{recon}^{c_1} = E_{c_1 \sim p(c_1), s_2 \sim q(s_2)}[

E_2^c(G_2(c_1,s_2))-c_1

_1]$

$L_{recon}^{s_2} = E_{c_1 \sim p(c_1), s_2 \sim q(s_2)}[||E_2^s(G_2(c_1,s_2))-s_1||1]$ $q(s_2) \sim N(0,I), p(c_1)$은 $c_1 = E_1^c(x_1)$과 $x_1 \sim p(x_1)$에서 얻어집니다. $L{recon}^{x_2},L_{recon}^{c_2}, L_{recon}^{s_1}$은 위와 비슷합니다. $L_1\ loss$는 sharp한 출력 이미지를 얻는데 도움이 되기 때문에 사용했습니다.

Adversarial loss

$L_{GAN}^{x_2} = E_{c_1 \sim p(c_1), s_2 \sim q(s_2)}[log(1-D_2(G_2(c_1,s_2)))] + E_{x_2 \sim p(x_2)}[logD_2(x_2)]$ $L_{GAN}^{x_1}$도 위와 비슷합니다.

Total loss

$\underset{E_1, E_2, G_1, G_2\ }{\overset{min}{}} \underset{\ D_1, D_2}{\overset{max}{}}L(E_1,E_2,G_1,G_2,D_1,D_2) = L_{GAN}^{x_1} + L_{GAN}^{x_2} + \lambda_x(L_{recon}^{x_1} + L_{recon}^{x_2}) + \lambda_c(L_{recon}^{c_1} + L_{recon}^{c_2}) + \lambda_s(L_{recon}^{s_1} + L_{recon}^{s_2})$ 여기서 $\lambda_x, \lambda_c, \lambda_s$는 각 reconstruction term의 중요도를 제어하는 weight

본 논문에서의 AutoEncoder 구조

Proposition

$E_1^, E_2^, G_1^, G_2^$이 존재하고, $E_1^=(G_1^)^{-1}$이고, $E_2^=(G_2^)^{-1}$이며, $p(x_{1 \rightarrow 2})=p(x_2)$와 $p(x_{2 \rightarrow 1})=p(x_1)$ 이다.
Latent distribution matching
1. p가 1번 도메인의 분포, q가 2번 도메인의 분포라고 할 때, optimality에 도달하면 다음과 같습니다. $p(c_1)=p(c_2), p(s_1)=q(s_1),p(s_2)=q(s_2)$
Joint distribution matching
1. optimality에 도달하면 다음과 같습니다.
  $p(x_1, x_{1 \rightarrow 2}) = p(x_{2 \rightarrow 1}, x_2)$
Style-augmented Cycle Consistency
1. $h_1 = (x_1, s_2) \in H_1, h_2=(x_2,s_1) \in H_1$이라 하겠습니다. 본 논문의 모델은 $H_1$에서 $H_2$로 변환되는 것을 목적으로 하고 있고, 이러한 mapping을 $F_{1 \rightarrow 2}$로 정의합니다. $F_{1 \rightarrow 2}(h_1) = F_{1 \rightarrow 2}(x_1, s_2)$와 같은 연산을 수행하게 되고, 이는 $(G_2(E_1^c(x_1),s_2),E_1^s(x_1))$으로 정의됩니다. optimality에 도달하면 다음과 같습니다.
  $F_{1 \rightarrow 2} = (F_{2 \rightarrow 1})^{-1}$
2. cycle consistency loss
  $L_{cc}^{x_1} = E_{x_1 \sim p(x_1), s_2 \sim q(s_2)}[|| G_1(E_2^c(G_2(E_1^c(x_1),s_2)), E_1^s(x_1))-x_1 ||_1]$

Experiment results

Reference

paper : https://arxiv.org/abs/1804.04732

RebuilderAI

Multimodal Unsupervised Image-to-Image Translation

Assumption

Method

Model overview

Bidirectional reconstruction loss

Adversarial loss

Total loss

본 논문에서의 AutoEncoder 구조

Proposition

Experiment results

Reference

Sanghyeon An

Multimodal Unsupervised Image-to-Image Translation

Assumption

Method

Model overview

Bidirectional reconstruction loss

Adversarial loss

Total loss

본 논문에서의 AutoEncoder 구조

Proposition

Experiment results

Reference

Share

Sanghyeon An