Sunday, April 9, 2017

Distance and persistence diagrams

We assume we have a Morse-type function $f:X\to \R$, whose associated persistence diagram is $D(f) = \{f_1,\dots,f_n\}$, which we will think of as a collection of persistence birth-death pairs $f_i$ in the extended real plane $(\R^*)^2$. If the topological space $X$ was filtered without such a function, define one by $x\mapsto i$ where $i$ is the smallest index such that $x\in X_i$.

Definition: Let $f,g:X\to \R$ be two Morse-type functions with associated persistence diagrams $D(f)$, $D(g)$. The (Wasserstein) $q$-distance between $f$ and $g$ is defined as
\[
W_q(f,g) := \inf_{\sigma\in S_n} \left(\sum_{i=1}^n ||f_i-g_{\sigma(i)}||^q_\infty\right)^{1/q}.\]The bottleneck distance between $f$ and $g$ is
\begin{align*}
W_\infty(f,g) & := \lim_{q\to\infty} \left\{W_q(f,g)\right\} & (\text{limit of $q$-distances}) \\
& = \max_i\left\{||f_i-g_{\sigma(i)}||_\infty\ :\ \sigma = \arg W_q(f,g)\right\}. & (\text{length of longest edge in best matching})
\end{align*}
Example: Consider the torus of inner and outer radius 1 embedded in the natural way. Left $f,g:T^2\to \R$ be height functions of the torus, but projecting to the planes $z=-2$ and $z=x-4$, respectively. Note all critical points occur on the plane $y=0$. Below, the slice at this plane is given (distances along planes from the first critical point are shown), as well as $D(f), D(g)$ on the same diagram (degrees of homology classes are shown).
For $D(f) = \{(0,\infty),(2,\infty),(4,\infty),(6,\infty)\}$ and $D(g) = \{(0,\infty), (2,\infty),(2\sqrt 2,\infty),(2+2\sqrt 2,\infty)\}$, it is clear that $\sigma=\id$ will be the best matching. The $q$-distance between $f$ and $g$ is then given by
\[
W_q(f,g) = \left(||(4,\infty)-(2\sqrt 2,\infty)||^q_\infty+||(6,\infty)-(2+2\sqrt 2,\infty)||^q_\infty\right)^{1/q} = 2^{1/q}(4-2\sqrt 2),\]with bottleneck distance $4-2\sqrt2$. However, we would like to say that these two functions are the same in some way, as no critical points are switched, and extended persistence allows us to do that. The decomposed extended persistence module is given below.
The extended persistence classes have length 3 ($(1,4)$ for the 0-class, $(4,1)$ for the 2-class) and 1 ($(2,3)$ and $(3,2)$ for the 1-classes), no matter if we use $f$ or $g$ to define the $X_i$ and $X^j$.

Remark: An interesting question to ask is how long does it take for an essential homology class to be built? Some things to keep in mind while resolving this question:
  • The 0-class case should be treated spearately because of reduced homology
  • A class may be encountered several times (like the first 1-class in the example above)
  • What does it mean for a class to be "begin being built" (this is probably the key)
  • A class is certainly "done being built" (the first time) when it first appears in the persistence module
It seems that the extended persistence pair gives the length between when the class is "done being built" the first time $f$ encounters it fully and when it "begins to be built" the last time $f$ encounters it.

The bottleneck distance satisfies a nice stability condition for tame functions $f:X\to \R$, which have finite dimensional homology groups $H_k(f^{-1}(-\infty,a])$ for all $a\in \R$.

Theorem (Cohen-Steiner, Edelsbrunner, Harer 2007): Let $f,g:X\to \R$ be tame. Then $W_\infty(f,g) \leqslant ||f-g||_\infty$.

This bound is reached when $g=f+c$ for some constant $c$, and the Wasserstein distance is 0 when $g(p_i)=f(p_i)$ for all critical values. Hence it seems without stronger assumptions about $f$ and $g$, this bound is as good as we can get.

References: Edelsbrunner and Morozov (Persistent homology: theory and practice), Cohen-Steiner, Edelsbrunner and Harer (Stability of persistence diagrams)