Graph analysis

previous arrow
next arrow

Graph embedding, network embedding

Many practical data systems use network structured data, or graph structured data, which can include web page networks, social networks, road traffic infrastructure, biological networks, and information networks. It is challenging to handle such network data effectively for analytical tasks such as link prediction, node classification, node recommendation, and network visualization. Classical topology-based network representation techniques are hindered by onerous bottlenecks encountered in handling large-scale and high-dimensional network data because they handle an input adjacency matrix directly and because they are adversely affected by noise, outliers, and redundant data. By contrast, network embedding (NE) or graph embedding (GE) has come to be a promising approach that seeks low-dimensional vector representations of nodes in a network to preserve and extract its latent network structure efficiently. Actually, NE alleviates such problems through low-dimensional representation while preserving the original intrinsic structure. Hence, it allows various off-the-shelf machine learning tools to apply this representation directly.

Sequential semi-orthogonal NMF with negative residual reduction (SSO-NRR-NMF)

One recent advanced NE has been achieved in approaches attempting to factorize a given designed matrix to obtain low-dimensional representation assuming that the node connectivity matrix is globally low-rank. It is, however, not always true when the matrix consists of a complex structure. This structure hinders ineffective representations from capturing all the observed connectivity patterns. In this regard, a novel multi-level NE framework (BoostNE) using nonnegative matrix factorizations (NMFs) has been proposed. The framework learns multiple embedding representations with different granularities, i.e., globally and locally low-rankness. However, its sequential multi-level embedding discards negative residuals to enforce residual matrices that are nonnegative at the succeeding level. It also does not consider mutually redundant bases across multiple levels. To alleviate these problems, building on BoostNE, this research presents a proposal of a sequential semi-orthogonal NMF with negative residual reduction, designated as SSO-NRR-NMF. Convergence analysis of SSO-NRR-NMF is also given. Numerical evaluations illustrate the effectiveness of SSO-NRR-NMF when used with several real-world datasets. 


  • R. Hashimoto and HK, “Sequential Semi-Orthogonal Multi-Level NMF with Negative Residual Reduction for Network Embedding,” IEEE ICASSP2020.


Graph matching

Graph-structured representation has attracted a surge of research interest for estimation of graph similarity in various graph-based problems. One example representing fundamental problems in machine learning, statistics, and computer vision is graph matching (GM). This problem seeks and aligns optimal correspondence between a pair of graph-structured data to minimize their vertices and edges disagreements. This is a challenging problem in theory and in practice. GM problem arises frequently in many applications including those of computer vision , network analysis, bioinformatics, biology, and data mining. One application is the detection of many variations of malware samples. Coding samples with the graph-structured representation can reformulate such a detection difficulty as a GM difficulty, where unknown samples are compared with already-known malware samples and clean samples. Another application is graph classification of graph-represented proteins against known proteins in databases, which identifies a new protein obtained in experiments.

The GM problem is generally divided into two categories. The first category is exact graph matching, also known as one-to-one mapping, which requires strict correspondence between the two graphs with the same number of vertices. The second one is, on the other hand, {\it inexact} graph matching, also called many-to-many mapping or best graph matching. This process relaxes the requirement in exact graph matching to allow the merging of multiple vertices to match another one. Therefore, inexact graph matching is applicable to matching between graphs with different number of vertices.

Unpublished works (under review)

  • J. Huang, Z. Fang and H. Kasai, “LCS graph kernel based on Wasserstein distance in longest common subsequence metric space,” arXiv preprint 2012.03612, 2020 (under review).

  • J, Huang and H. Kasai, “Graph embedding using multi-layer adjacent point merging model,” arXiv preprint 2010.14773, 2020 (under review).

  • Z. Fang, J. Huang, and H. Kasai, “YYY,” 2020 (under review).