Optimal transport

banner-982162_1920

mathematics-1044070_1920

geometry-1023846_1920

lego-1124010_1920

Optimal transport

The Optimal transport (OT) problem has attracted a surge of research interest because it expresses the distance between probability distributions, known as the Wasserstein distance. This strong property enables us to apply this distance to diverse machine learning problems such as generative adversarial networks, graph optimal transport, clustering, and domain adaptation.

The Kantorovich relaxation formulation of the optimal transport (OT) problem is explained briefly. Let ${\bf a}$ and ${\bf b}$ be {probabilities} or positive weight vectors as ${\bf a}=(a_1, a_2, \ldots, a_m)^T \in \mathbb{R}_+^m$ and ${\bf b}=(b_1, b_2, \ldots, b_n)^T \in \mathbb{R}_+^n$ , respectively. Given two empirical distributions, i.e., discrete measures, ${\bf \nu}=\sum_{i=1}^m a_i{\bf \delta}_{{x}_i}$ , ${\bf \mu}=\sum_{j=1}^n b_j{\bf \delta}_{{y}_j}$ and the ground cost matrix ${\bf C} \in \mathbb{R}^{m \times n}$ between their supports, the problem can be formulated as

${\displaytyle \min_{\scriptsize {\bf T} \in \mathcal{U}({\bf a},{\bf b})}}\ \langle {\bf C} ,{\bf T}\rangle}$ ,

where ${\bf T} \in \mathbb{R}^{m \times n}$ represents the transport matrix, and where the domain $\mathcal{U}({\bf a},{\bf b})$ is defined as

$\mathcal{U}({\bf a},{\bf b}) = \Bigl\{ {\bf T} \in \mathbb{R}^{m \times n}_{+}: {\bf T}{\bf 1}_n = {\bf a},{\bf T}^{T}{\bf 1}_m = {\bf b} \Bigr\}$ ,

where ${\bf T}{\bf 1}_n = {\bf a}$ and ${\bf T}^{T}{\bf 1}_m = {\bf b}$ are the marginal constraints. Moreover, we present the sum of the two vectors respectively as $\alpha > 0$ and $\beta > 0$ , i.e., $\sum_{i=1}^m {{\bf a}_i} = \alpha$ and $\sum_{i=1}^n {{\bf b}_i} = \beta$ . Note that $\alpha$ is equal to $\beta$ in the standard OT formulation. The obtained OT matrix ${\bf T}^*$ brings powerful distances as $\mathcal{W}_p({\bf \nu},{\bf \mu}) = \langle {\bf T}^*,{\bf C} \rangle^{\frac{1}{p}}$ , which is known as the $p$ -th order Wasserstein distance. It is used in various fields according to the value of $p$ . Especially, the distance is applied to computer vision when $p=1$ , and clustering when $p=2$ . When ${\bf C}$ is the ground cost matrix and $p=1,$ we specifically designate the $1$ -th order Wasserstein distance as the OT distance.