Semi-relaxed OT

banner-982162_1920
mathematics-1044070_1920
geometry-1023846_1920
lego-1124010_1920
previous arrow
next arrow

Semi-Relaxed (SR) Optimal transport

Optimal transport (OT), which provides a distance between two probability distributions by considering their spatial locations, has been applied to widely diverse applications. Computing an OT problem requires a solution of linear programming with tight mass-conservation constraints. This requirement hinders its application to large-scale problems. One relaxes only one of the two marginal constraints in the standard optimal transport (OT) problem. This is designated as semi-relaxed OT (SROT) problem, or the semi-constrained OT (SCOT), which is formally defined as

\min_{\scriptsize {{\bf T}\geq {\bf 0},{\bf T}^T{\bf 1}_m = {\bf b}}}\ \langle {\bf C} ,{\bf T} \rangle + \Phi({\bf T}{\bf 1}_n,{\bf a}),

where \Phi({\bf x}, {\bf y}) is a smooth divergence measure function. This setting is useful, for example, with color transfer problems. Recently, the SROT formulation has been applied for graph dictionary learning. It also exhibits the robustness to outliers in generative models. 

Fast block-coordinate Frank-Wolfe (BCFW) for SROT

Addressing a convex semi-relaxed OT, we consider a fast block-coordinate Frank-Wolfe (BCFW) algorithm, which gives sparse solutions. Specifically, we provide their upper bounds of the worst convergence iterations and equivalence between the linearization duality gap and the Lagrangian duality gap. Three fast variants of the proposed BCFW are also proposed. Numerical evaluations of color transfer problems demonstrate that the proposed algorithms outperform state-of-the-art algorithms across different settings.

We particularly address the SROT problem with \Phi({\bf x},{\bf y})=\frac{1}{2\lambda}\|{\bf x}-{\bf y}\|_2^2 because it is not only smooth but also convex. The problem of interest is formally defined as

\min_{\scriptsize {\bf T}\geq {\bf 0}, {\bf T}^T{\bf 1}_m = {\bf b}}\left\{f({\bf T}):=\langle {\bf T},{\bf C} \rangle + \frac{1}{2\lambda}\|{\bf T}{\bf 1}_n-{\bf a}\|_2^2\right\},

where \lambda is a relaxation parameter. The domain is transformed into

\mathcal{M} = {b}_1\Delta_m \times {b}_2\Delta_m \times \cdots \times {b}_n \Delta_m,

where b_i\Delta_m represents the simplex of the summation b_i.

Semi-relaxed Sinkhorn (SR-Sinkhorn)

Addressing the KL divergence for the SROT problem, we formally define an SROT problem with the KL divergence as

\min_{{\bf T}\geq {\bf 0}, {\bf T}^T{\bf 1}_n={\bf b}}\Bigl\{ f({\bf T}) :=\langle {\bf C} ,{\bf T} \rangle + \tau\mathrm{KL}({\bf T}{\bf 1}_n,{\bf a})\Bigr\},

where \mathrm{KL}({\bf x},{\bf y}) stands for the KL divergence between {\bf x} \in \mathbb{R}_+^n and {\bf y} \in \mathbb{R}_+^n, which is defined as \sum_i {\bf x}_i \log {({\bf x}_i/{\bf y}_i)} - {\bf x}_i + {\bf y}_i. \tau is a regularization parameter of this KL divergence. We then define an entropy regularized SROT problem as

\min_{{\bf T}\geq {\bf 0}, {\bf T}^T{\bf 1}_n={\bf b}}}\Bigl\{ g({\bf T}) :=\langle {\bf C} ,{\bf T}\rangle + \tau\mathrm{KL}({\bf T}{\bf 1}_n,{\bf a})-\eta \mathrm{H}({\bf T})\Bigr\},

where \mathrm{H}({\bf T}) represents the entropy term as \mathrm{H}({\bf T})=-\sum_{i,j}{\bf T}_{i,j}(\log {\bf T}_{ij}-1), and where \eta is its regularization parameter. Under this formulation, considering the dual form of the equation above, the SR–Sinkhorn algorithm can be derived. For evaluation of how the constraint relaxation affects the algorithm behavior and solution, it is vitally necessary to present the theoretical convergence analysis in terms not only of the functional value gap, but also of the marginal constraint gap as well as the OT distance gap. However, no existing work has addressed all analyses simultaneously. To this end, this paper presents a comprehensive convergence analysis for SR–Sinkhorn. After presenting the \epsilon-approximation of the functional value gap based on a new proof strategy and exploiting this proof strategy, we give the upper bound of the marginal constraint gap. We also provide its convergence to the \epsilon-approximation when two distributions are in the probability simplex. Furthermore, the convergence analysis of the OT distance gap to the \epsilon-approximation is given as assisted by the obtained marginal constraint gap. The latter two theoretical results are the first results presented in the literature related to the SROT problem.