*DSW 2019: Proceedings of the IEEE Data Science Workshop 2019*, pp.237-244. 2019.

**Abstract:** Co-Clustering, the problem of simultaneously identifying clusters across multiple aspects of a data set, is a natural generalization of clustering to higher-order structured data. Recent convex formulations of bi-clustering and tensor co-clustering, which shrink estimated centroids together using a convex fusion penalty, allow for global optimality guarantees and precise theoretical analysis, but their computational properties have been less well studied. In this note, we present three efficient operator-splitting methods for the convex co-clustering problem: a standard two-block ADMM, a Generalized ADMM which avoids an expensive tensor Sylvester equation in the primal update, and a three-block ADMM based on the operator splitting scheme of Davis and Yin. Theoretical complexity analysis suggests, and experimental evidence confirms, that the Generalized ADMM is far more efficient for large problems.

**Publisher DOI:** 10.1109/DSW.2019.8755599

**Working Copy:** ArXiv 1901.06075

**Summary:** Chi and Lange (J. Computational and Graphical Statistics, 2015) established that splitting methods are an easy-to-implement highly-performant way to solve convex clustering problems. In this work, I extend their analysis to the general case of tensor co-clustering, considering three operator-splitting schemes:

- The classic ADMM
- A generalized (quadratically perturbed) ADMM
- Davis-Yin splitting (equivalent to AMA for this problem)

Classical ADMM has the best *per iteration* performance, but requires solving a (tensor) Sylvester equation at each iteration, which is typically prohibitive. To avoid this, I construct a generalized ADMM scheme that avoids any matrix decompositions or inverses in the updates.

\[\begin{align*}
\mathbf{U}^{(k+1)} &= \left(\alpha \mathbf{U}^{(k)} + \mathbf{X} + \rho\mathbf{D}_{\text{row}}^T(\mathbf{V}^{(k)} - \rho^{-1}\mathbf{Z}^{(k)}_{\text{row}} - \mathbf{D}_{\text{row}}\mathbf{U}^{(k)}) \right. \\ &\left.\qquad+ \rho(\mathbf{V}^{(k)}_{\text{col}} - \rho^{-1}\mathbf{Z}^{(k)}_{\text{col}} - \mathbf{U}^{(k)}\mathbf{D}_{\text{col}})\mathbf{D}_{\text{col}}^T \right)/(1+\alpha) \\
\begin{pmatrix} \mathbf{V}^{(k+1)}_{\text{row}} \\ \mathbf{V}^{(k+1)}_{\text{col}} \end{pmatrix} &= \begin{pmatrix} \text{prox}_{\lambda / \rho \|\cdot\|_{\text{row}, q}}(\mathbf{D}_{\text{row}}\mathbf{U}^{(k+1)} + \rho^{-1}\mathbf{Z}^{(k)}_{\text{row}}) \\
\text{prox}_{\lambda / \rho \|\cdot\|_{\text{col}, q}}(\mathbf{U}^{(k+1)}\mathbf{D}_{\text{col}} + \rho^{-1}\mathbf{Z}^{(k)}_{\text{col}}) \end{pmatrix} \\
\begin{pmatrix} \mathbf{Z}^{(k+1)}_{\text{row}} \\ \mathbf{Z}^{(k+1)}_{\text{col}} \end{pmatrix} &= \begin{pmatrix} \mathbf{Z}^{(k)}_{\text{row}} + \rho(\mathbf{D}_{\text{row}}\mathbf{U}^{(k+1)} - \mathbf{V}^{(k+1)}_{\text{row}}) \\ \mathbf{Z}^{(k)}_{\text{col}} + \rho(\mathbf{U}^{(k+1)}\mathbf{D}_{\text{col}} - \mathbf{V}^{(k+1)}_{\text{col}})
\end{pmatrix}
\end{align*}\]
This method has slightly worse *per iteration* performance, but has superior wall clock performance and better computational complexity.

The generalized ADMM is used internally in my **clustRviz** software for both the exact solver and the **CBASS** pathwise algorithm.

**Related Software:** *clustRviz*

**Citation:**

```
@INPROCEEDINGS{Weylandt:2019-BiClustering,
TITLE="Splitting Methods for Convex Bi-Clustering and Co-Clustering",
AUTHOR="Michael Weylandt",
CROSSREF={DSW:2019},
DOI="10.1109/DSW.2019.8755599",
PAGES={237-244}
}
@PROCEEDINGS{DSW:2019,
BOOKTITLE="{DSW} 2019: Proceedings of the 2\textsuperscript{nd} {IEEE} Data Science Workshop",
YEAR=2019,
EDITOR="George Karypis and George Michailidis and Rebecca Willett",
PUBLISHER="{IEEE}",
LOCATION="Minneapolis, Minnesota"
}
```