mat-blag: algorithm

Sunday, March 12, 2017

Optimal sampling and arrangement on an n-sphere

The goal of this post is to create a "good" algorithm for sampling and arranging points on the $n$-sphere. We find the $\epsilon$-covering number of the $n$-sphere and arrange the points in a Hamiltonian path of small pairwise consecutive distance. This post relates to several previous posts:

2017-02-12: Generalizing planar detection to $k$-plane detection
2016-12-22: Sampling points uniformly on parametrized manifolds
2016-05-26: Reconstructing a manifold from sample data, with noise

Thanks to Professor Cheng Ouyang for a helpful discussion.

Although rejection sampling is a standard method to sample points uniformly on the $n$-sphere (sample points uniformly on the $(n+1)$-cube, check if the norm is less than or equal to 1, if it is, normalize the point to the $n$-sphere), this is not best for our scenario (the arranging part). A better suited approach is to take a parametrization $f$ from an $n$-cube into $\R^{n+1}$ of the unit $n$-sphere. We use
\[
\begin{array}{r c l}
f\ :\ [0,2\pi]^{n-1}\times[0,\pi) & \to & \R^{n+1}, \\
(\alpha_1,\dots,\alpha_n) & \mapsto & \big(\cos(\alpha_1), \\ && \sin(\alpha_1)\cos(\alpha_2),\\ && \vdots\\ && \sin(\alpha_1)\cdots\sin(\alpha_{n-1})\cos(\alpha_n), \\ && \sin(\alpha_1)\cdots\sin(\alpha_{n-1})\sin(\alpha_n)\big).
\end{array}
\]
Adapting the main Proposition from the "Sampling points" post, we have following proposition.

Proposition: The probability density function $g_n:[0,2\pi]^{n-1}\times[0,\pi] \to \R_{\geqslant0}$, defined as
\[
g_n(\alpha_1,\dots,\alpha_n)=\frac{\prod_{k=1}^{n-1}|\sin^{n-k}(\alpha_k)|}{2^{n-1}\pi\prod_{k=1}^{n-1}\int_0^\pi \sin^{n-k}(\alpha_k)\ d\alpha_k},
\]
is uniform on the natural embedding of the unit $n$-sphere $S^n$ in $\R^{n+1}$.

The denominator of $g_n$ does not seem to have closed form, though the ratios between consecutive terms are given by the denominators of $\Gamma(\frac{\ell+3}2)/\Gamma(\frac{\ell+2}2)$ and $\ell!!/(\ell+1)!!$, with appropriate powers of $\pi$. The first few terms of this sequence are
\[
4\pi,4\pi^2,\frac{32}3\pi^2,8\pi^3,\frac{256}{15}\pi^3,\frac{32}3\pi^4,\frac{2048}{105}\pi^4,\dots.
\]
Next, recall the $n$-surface of an $n$-sphere and $k$-volume of a $k$-ball are
\[
\text{surf}(n,r) = \frac{2\pi^{(n+1)/2}r^n}{\Gamma((n+1)/2)},\hspace{2cm}
\text{vol}(k,r) = \frac{\pi^{k/2}r^k}{\Gamma((k+2)/2)}.
\]
Adapting Proposition 3.2 of Niyogi, Smale and Weinberger, similarly to the "Reconstructing a manifold" post, we have the following proposition.

Proposition: A collection of $N$ points sampled uniformly from $S^n$ is $\epsilon$-dense in $S^n$ with certainty $1-\delta$, given
\[
N \geqslant \frac{\text{surf}(n,1)}{(1-\frac{\epsilon^2}{16})^{n/2}\text{vol}(n,\frac\epsilon2)}\log\left(\frac{\text{surf}(n,1)}{\delta(1-\frac{\epsilon^2}{64})^{n/2}\text{vol}(n,\frac\epsilon4)}\right).
\] Bauer and Polthier sample points "evenly" on the 2-hemisphere and then connect them with a winding path, which winds around the hemisphere 6 times. Generalizing this approach, suppose we wanted to have a path that wind around the $n$-sphere $\ell$ times and has a small distance between consecutive vertices of the path. The following algorithm describes one way of doing this.

Algorithm: SpherePathFinder
Input: Positive integers $n,\ell$ and real numbers $\epsilon,\delta\in (0,1)$
Output: A path on $S^n$ that winds around $\ell$ times, whose vertices are $\epsilon$-dense on $S^n$ with certainty $1-\delta$

Sample $\lceil N\rceil$ points on $[0,2\pi]^{n-1}\times[0,\pi]$ according to $g_n$ in a set $X$
Initiate an empty path $P=()$
for $k_n\in\{1,\dots,\ell\}$:
    for $k_{n-1}\in\{1,\dots,2\ell\}$:
       $\vdots$
           for $k_2\in\{1,\dots,2\ell\}$:
               Set $L=\{\alpha\in X\ :\ \alpha_n\in[(k_n-1)\frac\pi\ell,k_n\frac\pi\ell], \alpha_{n-t}\in[(k_{n-t}-1)\frac{2\pi}{2\ell},k_{n-t}\frac{2\pi}{2\ell}],1<t<n-1\}$
               Order $L$ by increasing values of $\alpha_1$
               Append $L$ to the end of $P$ and set $X=X\setminus L$
Return $P$

Since the sample space is $[0,2\pi]^{n-1}\times[0,\pi]$, finding the appropriate points in the nested for loop is very easy. We conclude with an experimental example with $n=2$, $\ell=12$, $\epsilon=.1$, and $\delta=.01$. We must sample at least 87 points, and we do so below.

Example: To demonstrate the results of the SpherePathFinder algorithm, we sample 100, 300, and 600 points on the 2-sphere. Only the paths are shown, which wind around 12 times. The range of distances $d$ between consecutive ordered points is also given, with an average $\widetilde d$.

As $N$ increases and the winding number stays the same, the path gets more and more jagged. To make the path smoother, we would need to increase the number of times the path winds around the sphere.

References: Bauer and Polthier (Detection of Planar Regions in Volume Data for Topology Optimization), Niyogi, Smale, and Weinberger (Finding the homology of submanifolds with high confidence from random samples), Sloane (OEIS A036069, A004731), Wikipedia (article "N-sphere")

Sunday, February 12, 2017

Generalizing planar detection to k-plane detection

In this post the planar detection algorithm in $\R^3$ of Bauer and Polthier in Detection of Planar Regions in Volume Data for Topology Optimization is generalized to detect $k$-planes with largest density in $\R^n$. Let $\Omega\subset \R^n$ be the compact support of a piecewise-constant probability density function $\rho:\R^n\to \R_{\geqslant 0}$.

Definition: Let $(G,\rho)$ be a grid, where $G \subset \lambda \Z^n + c \subset \R^n$ is a lattice in $\Omega$. A cell $x$ of the grid is $B_\infty(x,\lambda/2) = \{y\in \R^n\ :\ ||x-y||_\infty\leqslant\lambda/2\}$, for $x\in G$. Every cell is assigned a value
\[
\int_{B_\infty(x,\lambda/2)}\rho\ dx,
\]
called the mass of the cell, which may be though of as a type of Radon transform of $\rho$.

Assuming that $k$ is a global variable, running $Recursive(G,w,k)$ will give the desired result. This algorithm is the naive generalization of Bauer and Polthier, and suffers from calculating mass along the same $k$-plane several times, whenever $k<n-1$ (as any $k$-plane does not lie in a unique $(k+1)$-plane).

Algorithm: $k$PlaneFinder
$Recursive(G,w,k)$:
Input: A grid $(G,\rho)$, a width $w$ of fattened $k$-planes, the current plane dimension $k\leqslant k'<n$
Output: A $k$-planar connected component covering most mass in $G$

discretize the unit $(k'-1)$-hemisphere in an appropriate manner
order the vertices by a Hamiltonian path
for each vertex $\textbf{n}$:
    sort the grid cells in direction $\textbf{n}$
    discretize the range in direction $\textbf{n}$ equidistantly
    for each $k'$-plane $(\textbf{n},d)$:
        collect the cells closer than $w$ to the $k'$-plane into a graph $G'$
        if $k'\neq k$:
            run $Recursive(G',w,k'-1)$
        else:
            compute the connected component having the most mass in $G'$
return the connected $k$-component having most mass (and the corresponding $k$-plane)

Measuring along connected components of a $k$-plane works the same way as in the original version, as the gird on $\R^n$ similarly induces a connectivity graph.

Remark: Bauer and Polthier cite Kantaforoush and Shahshahani in evenly sampling points on the unit 2-sphere, but it is not clear how their method (using the inscribed icosahedron) generalizes. Another method would be uniformly sampling random points on $S^{k-1}$ and take all on one hemisphere. A Hamiltonian path could then be taken from an arbitrary point and then using the greedy algorithm (with respect to Euclidean distance) to find consecutive vertices (to keep down the time of consecutive sorting operations).

Recall the Grassmannian $Gr(n,k)$ of all $k$-planes in $\R^n$ through the origin, a compact manifold of dimension $k(n-k)$. Note that any $k$-plane $P\subset \R^n$ is a translation of an element $Q\in Gr(n,k)$ by an element of $Q^\perp$ (we conflate notation for $Q$ and its natural embedding in $\R^n$).

Remark: $Gr(n,k)$ is parametrizable, so by choosing directions in the unit $(n-k)$-hemisphere, the process of choosing $k$-planes in the algorithm may be completely parametrized. The quick sorting of points that was available in Bauer and Polthier's $n=3,k=2$ case may be replaced by an iterated restriction of the original data set through a complete flag $P\subset \cdots \subset \R^n$.

References: Bauer and Polthier (Detection of Planar Regions in Volume Data for Topology Optimization), Katanforoush and Shahshahani (Distributing points on the Sphere 1)