We follow the article [1] and add more background and clarifications. We also make some assumptions that the article does not state explicitly, in order to simplify the calculations.
Background in probability, measure theory, topology
Let X be a random variable over a space A. Recall that for an event $B \subseteq A$, the expression $P(X \in B)$ is a number in [0,1] describing the probability that X lies in B; the assignment of these numbers to events is called the probability distribution of X. Here we will consider continuous random variables, so $P(X = x) = 0$ for any single element $x \in A$.
Definition: The probability density function of X is the function $f\colon A \to \mathbb{R}$ satisfying
- $f(x) \geqslant 0$ for all $x \in A$, and
- $\int_B f(x)\, dx = P(X \in B)$ for any $B \subseteq A$.
The second condition implies $\int_A f(x)\, dx = 1$.
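As a quick sanity check of these two conditions, here is a short numerical sketch (the standard normal density and the interval B = [0,1] are arbitrary illustrative choices):

```python
import numpy as np
from scipy.integrate import quad

# An example density on A = R: the standard normal (illustrative choice)
def f(x):
    return np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)

# Condition 2 with B = A: the density integrates to 1 over all of R
total, _ = quad(f, -np.inf, np.inf)
print(total)  # ~1.0

# P(X in B) for B = [0, 1] is the integral of f over B
p, _ = quad(f, 0.0, 1.0)
print(p)  # ~0.3413
```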
Often authors use just P instead of f, and write P(x) instead of f(x).
Definition: Let Y=g(X) be another random variable. The expected value of Y is
$$E[Y] = E[g(X)] = \int_A g(x) f(x)\, dx.$$
The mean of X is $\mu = E[X]$, and the variance of X is $\sigma^2 = E[(X - \mu)^2]$. If $\vec X = (X_1\ \cdots\ X_n)^T$ is a multivariate random variable, then $\vec\mu = E[\vec X]$ is an n-vector, and the variance is an $(n \times n)$-matrix given as
$$\Sigma = E\left[(\vec X - E[\vec X])(\vec X - E[\vec X])^T\right] \qquad \text{or} \qquad \Sigma_{ij} = E\left[(X_i - E[X_i])(X_j - E[X_j])\right].$$
The covariance of X and Y is $E[(X - E[X])(Y - E[Y])]$. Note that the covariance of X with itself is just the usual variance of X.
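To make the matrix formula concrete, here is a small numpy sketch (the mixing matrix is a made-up example) that estimates $\Sigma$ from samples and checks it against numpy's built-in covariance:

```python
import numpy as np

rng = np.random.default_rng(0)

# 10,000 samples of a 3-dimensional random vector (made-up example data)
A = np.array([[1.0, 0.5, 0.0],
              [0.0, 1.0, 0.3],
              [0.0, 0.0, 1.0]])
X = rng.normal(size=(10_000, 3)) @ A

mu = X.mean(axis=0)                     # empirical mean vector
centered = X - mu
Sigma = centered.T @ centered / len(X)  # Sigma = E[(X - mu)(X - mu)^T]

print(Sigma)                               # empirical covariance matrix
print(np.cov(X, rowvar=False, bias=True))  # same matrix via numpy
```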
Example: One example of a probability distribution is the normal (or Gaussian) distribution, and we say a random variable with the normal distribution is normally distributed. If a random variable X is normally distributed with mean $\mu$ and variance $\sigma^2$, then the probability density function of X is
$$f(x) = \frac{\exp\left(-\frac{(x - \mu)^2}{2\sigma^2}\right)}{\sigma\sqrt{2\pi}}.$$
If $\vec X = (X_1\ \cdots\ X_n)^T$ is a normally distributed multivariate random variable, then $\vec\mu = (E[X_1]\ \cdots\ E[X_n])^T$ and the probability density function of $\vec X$ is
$$f(\vec x) = \frac{\exp\left(-\frac{1}{2}(\vec x - \vec\mu)^T \Sigma^{-1} (\vec x - \vec\mu)\right)}{\sqrt{(2\pi)^n \det(\Sigma)}}.$$
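As a check on this formula, the sketch below evaluates it directly and compares against scipy's multivariate normal density (the parameters $\vec\mu$, $\Sigma$, and the point $\vec x$ are arbitrary choices):

```python
import numpy as np
from scipy.stats import multivariate_normal

# Arbitrary 2-dimensional example parameters
mu = np.array([1.0, -2.0])
Sigma = np.array([[2.0, 0.3],
                  [0.3, 1.0]])
x = np.array([0.5, -1.5])

# The density evaluated directly from the formula above (n = 2)
n = len(mu)
diff = x - mu
num = np.exp(-0.5 * diff @ np.linalg.solve(Sigma, diff))
den = np.sqrt((2 * np.pi) ** n * np.linalg.det(Sigma))
print(num / den)

# The same value via scipy
print(multivariate_normal.pdf(x, mean=mu, cov=Sigma))
```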
Definition: A measure on $\mathbb{R}^D$ is a function $m\colon \{\text{subsets of } \mathbb{R}^D\} \to [0, \infty]$ such that $m(\emptyset) = 0$ and $m\left(\bigcup_{i \in I} E_i\right) = \sum_{i \in I} m(E_i)$ for $\{E_i\}_{i \in I}$ a countable sequence of disjoint subsets of $\mathbb{R}^D$. (Strictly speaking, m need only be defined on a σ-algebra of measurable subsets; see Chapter 10.1 of [2].) A probability measure on $\mathbb{R}^D$ is a measure m on $\mathbb{R}^D$ with the added condition that $m(\mathbb{R}^D) = 1$.
A probability distribution is an example of a probability measure.
Definition: Let $U = \{U_i\}_{i \in I}$ be a covering of a topological space M. The nerve of the covering U is a set N of subsets of I given by
$$N = \left\{\text{finite } J \subseteq I \ :\ \bigcap_{j \in J} U_j \neq \emptyset\right\}.$$
Note that this makes N into an abstract simplicial complex, as J∈N implies J′∈N for all J′⊆J.
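For intuition, here is a small sketch that computes the nerve of a finite covering (the covering is a made-up example: four sets whose overlaps form a cycle):

```python
from itertools import combinations

# A made-up covering of {0,...,5} by four sets arranged in a cycle
cover = {
    'a': {0, 1, 2},
    'b': {2, 3},
    'c': {3, 4, 5},
    'd': {5, 0},
}

# The nerve: all nonempty index sets J whose members share a common point
nerve = [
    J
    for k in range(1, len(cover) + 1)
    for J in combinations(cover, k)
    if set.intersection(*(cover[j] for j in J))
]
print(nerve)
# [('a',), ('b',), ('c',), ('d',), ('a','b'), ('a','d'), ('b','c'), ('c','d')]
# Four vertices and four edges: the nerve is a combinatorial circle,
# matching the cyclic pattern of overlaps in the covering.
```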
Let M be a smooth compact submanifold of $\mathbb{R}^d$. By the tubular neighborhood theorem (see Theorem 2.11.4 in [3]), M has an $\epsilon$-tubular neighborhood for some $\epsilon > 0$.
Definition: For a particular embedding of M, let the condition number of M be $\tau = \sup\{\epsilon \ :\ M \text{ has an } \epsilon\text{-tubular neighborhood}\}$. For example, a round circle of radius $\rho$ in $\mathbb{R}^2$ has condition number $\tau = \rho$.
Distributions on a manifold
Let M be a d-dimensional manifold embedded in $\mathbb{R}^D$, with $D > d$. Recall that every element in $NM \subseteq \mathbb{R}^D$, the normal bundle of M, may be represented as a pair $(\vec x, \vec y)$, where $\vec x \in M$ and $\vec y \in T^\perp$ (since M is a manifold, all the normal spaces are isomorphic). Hence we may consider a probability distribution P on NM, with $\vec X$ the d-multivariate random variable representing points on M and $\vec Y$ the (D−d)-multivariate random variable representing points in the space normal to M at a point of M. We make the assumption that $\vec X$ and $\vec Y$ are independent, that is, that
$$P(\vec X, \vec Y) = P_M(\vec X)\, P_{T^\perp}(\vec Y).$$
In particular, $P_{T^\perp}$ is a probability distribution that is the same at any point on the manifold.
Definition: Let P be a probability distribution on NM and $f_M$ the probability density function of $P_M$. In the context described above, P satisfies the strong variance condition if
- there exist $a, b > 0$ such that $f_M(\vec x) \in [a, b]$ for all $\vec x \in M$, and
- $P_{T^\perp}(\vec Y)$ is normally distributed with $\vec\mu = 0$ and $\Sigma = \sigma^2 I$.
The second condition implies that the covariance of $Y_i$ and $Y_j$ is zero for $i \neq j$, and that each $Y_i$ has the same variance $\sigma^2$. From the normally distributed multivariate example above, this also tells us that the probability density function $f_\perp$ of $\vec Y$ is
$$f_\perp(\vec y) = \frac{\exp\left(-\frac{1}{2\sigma^2} \sum_{i=1}^{D-d} y_i^2\right)}{\sigma^{D-d}\sqrt{(2\pi)^{D-d}}}.$$
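To see the strong variance condition concretely, here is a sketch sampling from such a distribution in the simplest interesting case, the unit circle in $\mathbb{R}^2$ (so d = 1 and D − d = 1; the uniform distribution on the circle and the value of σ are illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(1)
n, sigma = 2000, 0.05

# P_M: uniform on the unit circle, so f_M is bounded between
# a = b = 1/(2*pi), satisfying the first condition
theta = rng.uniform(0.0, 2.0 * np.pi, size=n)
x = np.column_stack([np.cos(theta), np.sin(theta)])  # points on M

# P_T⊥: normal, mean 0, variance sigma^2 along the normal direction;
# for the circle, the normal space at x is spanned by x itself
y = rng.normal(0.0, sigma, size=(n, 1)) * x

samples = x + y  # noisy sample concentrated near M
```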
Theorem: In the context described above, let P be a probability distribution on NM satisfying the strong variance condition, and let δ>0. If there is c>1 such that
$$\sigma < \frac{c\,\tau\left(\sqrt{9} - \sqrt{8}\right)}{9\sqrt{8(D-d)}},$$
then there is an algorithm that computes the homology of M from a random sample of n points, with probability 1−δ. The number n depends on τ,δ,c,d,D, and the diameter of M.
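To get a feel for how restrictive this is, one can evaluate the right-hand side of the bound numerically (the values of τ, c, d, and D below are made up):

```python
import numpy as np

tau, c, d, D = 1.0, 1.01, 2, 10   # hypothetical manifold parameters
bound = c * tau * (np.sqrt(9) - np.sqrt(8)) / (9 * np.sqrt(8 * (D - d)))
print(bound)  # ~0.0024: the allowed noise is tiny compared to tau
```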
The homology computing algorithm
Below is a broad view of the algorithm described in sections 3, 4, and 5 of [1]; a rough Python sketch of the whole pipeline follows the steps. Let M be a d-manifold embedded in $\mathbb{R}^D$, and P a probability measure on NM satisfying the strong variance condition.
1. Calculate the following numbers:
$$\begin{aligned} \tau &= \text{condition number of } M \\ \operatorname{vol}(M) &= \text{volume of } M \\ \sigma^2 &= \text{variance of } P \end{aligned}$$
2. Define (or choose) the following numbers:
$$\begin{aligned} \delta &\in (0, 1) \\ r &\in \left( 2\sqrt{2(D-d)}\,\sigma,\ \tfrac{\tau}{9}\left(3 - 2\sqrt{2}\right) \right) \\ n &> \text{function}(a, r, \tau, d, \delta, \operatorname{vol}(M)) \quad (\text{the } \max(A, B) \text{ in Proposition 9 of [1]}) \\ s &= 4r \\ \deg &> \tfrac{3a}{4}\left(1 - \left(\tfrac{r}{2\tau}\right)^2\right)^{d/2} \operatorname{vol}(B^d(r, 0)) \\ R &= (9r + \tau)/2 \end{aligned}$$
3. Choose n points randomly from NM according to P.
4. From these n points, construct the nearest neighbor graph G with distance s (an edge joins two points whenever they are within distance s of each other).
5. Remove from G all the vertices of degree <deg to get a refined graph G′.
6. Set $U = \bigcup_{\vec x \in V(G')} B^D(R, \vec x)$ and construct the simplicial complex K given by the nerve of this covering of U by balls.
7. Compute the homology of K, which is the homology of M, with probability 1−δ.
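Here is a rough end-to-end sketch of steps 3–7 for the unit circle in $\mathbb{R}^2$, with made-up parameter values instead of the carefully derived ones above. Two simplifications to be upfront about: the Vietoris–Rips complex at scale 2R stands in for the Čech nerve of the balls (the two complexes agree only approximately), and homology is computed over $\mathbb{Q}$ in degrees 0 and 1 only, as ranks of boundary matrices.

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(2)

# Step 3: sample n noisy points near the unit circle (illustrative model)
n, sigma = 80, 0.05
theta = rng.uniform(0, 2 * np.pi, n)
pts = (1 + rng.normal(0, sigma, n))[:, None] * np.column_stack(
    [np.cos(theta), np.sin(theta)])

s, deg, R = 0.3, 3, 0.25   # made-up stand-ins for the derived s, deg, R

# Step 4: graph G joining points within distance s of each other
dist = np.linalg.norm(pts[:, None] - pts[None, :], axis=-1)
adj = (dist < s) & ~np.eye(n, dtype=bool)

# Step 5: keep only vertices of degree >= deg
verts = [int(v) for v in np.where(adj.sum(axis=1) >= deg)[0]]

# Step 6: complex on the kept points; Rips at scale 2R stands in
# for the nerve of the radius-R balls around them
edges = [e for e in combinations(verts, 2) if dist[e] < 2 * R]
eset = set(edges)
tris = [t for t in combinations(verts, 3)
        if all(f in eset for f in combinations(t, 2))]

# Step 7: Betti numbers over Q from ranks of boundary matrices
def boundary(simplices, faces):
    idx = {f: i for i, f in enumerate(faces)}
    D = np.zeros((len(faces), len(simplices)))
    for j, sx in enumerate(simplices):
        for k in range(len(sx)):
            D[idx[sx[:k] + sx[k + 1:]], j] = (-1) ** k
    return D

d1 = boundary(edges, [(v,) for v in verts])
d2 = boundary(tris, edges)
r1 = np.linalg.matrix_rank(d1) if edges else 0
r2 = np.linalg.matrix_rank(d2) if tris else 0
b0 = len(verts) - r1            # number of connected components
b1 = len(edges) - r1 - r2       # number of independent loops
print(b0, b1)  # expect 1 1, the Betti numbers of a circle
```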
References:
[1] Niyogi, Smale, and Weinberger (A topological view of unsupervised learning from noisy data)
[2] Folland (Real analysis, Chapter 10.1)
[3] Bredon (Topology and Geometry, Chapter 2.11)