
Saturday, July 2, 2016

On the separation of nearest neighbors

We work through Lemma 3 (called the "AB Lemma" or the "cleaning procedure") of [2], adopting a cleaner and more thorough approach.

Necessary tools

Definition: The inverse of the complex-valued function $f(z) = ze^z$ is called the Lambert W-function and denoted by $W = f^{-1}$. When restricted to the real numbers, it is multi-valued on part of its domain (the interval $(-1/e, 0)$), so there it is split up into two branches: $W_0$, taking values in $[-1, \infty)$, and $W_{-1}$, taking values in $(-\infty, -1]$.
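To get a feel for the two real branches, here is a quick numerical illustration (my own, using SciPy's `lambertw`; it is not part of the original post): both branch values satisfy the defining equation $we^w = x$ for $x \in (-1/e, 0)$.

```python
import numpy as np
from scipy.special import lambertw

# On (-1/e, 0) the equation w * e^w = x has two real solutions,
# one on each branch of the Lambert W-function.
x = -0.15                      # note that -1/e is about -0.368
w0 = lambertw(x, k=0).real     # principal branch: W_0(x) >= -1
wm1 = lambertw(x, k=-1).real   # lower branch:     W_{-1}(x) <= -1

for w in (w0, wm1):
    print(w, w * np.exp(w))    # the second column is -0.15 both times
```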

Hoeffding's inequality gives an upper bound on the probability that the mean of a collection of bounded random variables deviates from its expected value by a given amount. The authors of [2] use a similar inequality, the Chernoff bound, but Hoeffding gives a tighter bound on the desired event.

Proposition:
(Hoeffding - Theorem 2 and Equation (1.4) of [1])
Let $X_1, \ldots, X_n$ be independent random variables, with $X_i$ bounded on the interval $[a_i, b_i]$. Then
$$P\left(\left|\frac{1}{n}\sum_{i=1}^n X_i - \frac{1}{n}\sum_{i=1}^n E[X_i]\right| \geq t\right) \leq 2\exp\left(-\frac{2t^2n^2}{\sum_{i=1}^n (b_i - a_i)^2}\right).$$
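As a sanity check (my illustration, not from [1]), the following sketch compares the empirical deviation probability of Bernoulli means against the Hoeffding bound; with every $[a_i, b_i] = [0, 1]$ the right-hand side reduces to $2e^{-2t^2 n}$.

```python
import numpy as np

rng = np.random.default_rng(0)
n, t, trials = 100, 0.1, 100_000

# X_i ~ Bernoulli(1/2), so each [a_i, b_i] = [0, 1] and E[mean] = 1/2.
means = rng.integers(0, 2, size=(trials, n)).mean(axis=1)
empirical = np.mean(np.abs(means - 0.5) >= t)
bound = 2 * np.exp(-2 * t**2 * n)

print(f"empirical: {empirical:.4f}")   # about 0.057
print(f"bound:     {bound:.4f}")       # about 0.271
```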

The union bound (or Boole's inequality) says that the probability that at least one of a countable collection of events occurs is no larger than the sum of the probabilities of the individual events.

Proposition:
Let $A_1, A_2, \ldots$ be a countable collection of events. Then $P\left(\bigcup_i A_i\right) \leq \sum_i P(A_i)$.

The setup

Let $P$ be a probability distribution on $\mathbb{R}^n$ and $X = \{x_1, \ldots, x_k\} \subseteq \mathbb{R}^n$ a finite set of points drawn according to $P$. These points may be considered as random variables $X_1, X_2, \ldots, X_k$ on the sample space $\mathbb{R}^n$, with $X_i$ evaluating to 1 only on $x_i$, and to 0 otherwise. Choose $s > 0$ and construct the nearest neighbor graph $G$ on $X$ with parameter $s$, joining two points whenever they lie within distance $s$ of each other (see the sketch after the assumptions below). Write $X = A \cup B$ and set
$$\eta := \inf_{a \in A,\, b \in B}\{\|a - b\|\}, \qquad \alpha_s := \inf_{a \in A}\{P(B_n(s, a))\}, \qquad \beta_s := \sup_{b \in B}\{P(B_n(s, b))\},$$
where $B_n(s, x)$ denotes the ball in $\mathbb{R}^n$ of radius $s$ centered at $x$, with $h = (\alpha_s - \beta_s)/2$. We assume that
  • $\eta > 0$, so $A$ and $B$ are disjoint;
  • $s < \eta/2$, so $A$ and $B$ lie in separate components of $G$; and
  • $\alpha_s > \beta_s$, so the $s$-ball around any point of $A$ carries more probability mass than the $s$-ball around any point of $B$.
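Here is a minimal sketch of this construction, under the within-distance-$s$ edge rule just described; the cluster parameters are arbitrary illustrative choices, not values from [2].

```python
import numpy as np

def nn_graph(points, s):
    # Adjacency matrix: points i != j are joined iff ||x_i - x_j|| <= s.
    dists = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    return (dists <= s) & ~np.eye(len(points), dtype=bool)

# Two well-separated clusters: here eta is about 7, so any s < eta/2
# leaves A and B in separate components of G.
rng = np.random.default_rng(1)
A = rng.normal(0.0, 0.1, size=(20, 2))
B = rng.normal(5.0, 0.1, size=(20, 2))
adj = nn_graph(np.vstack([A, B]), s=0.5)
print(adj[:20, 20:].any())   # False: no edges run between the clusters
```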
Proposition: Choose $\delta \in (0, 1)$. If $|X| > -W_{-1}\left(-\delta h^2 e^{-2h^2}\right)/(2h^2)$, then for all $a \in A$ and $b \in B$, with probability $1 - \delta$,
$$\frac{\deg_G(a)}{k-1} > \frac{\alpha_s + \beta_s}{2} \qquad \text{and} \qquad \frac{\deg_G(b)}{k-1} < \frac{\alpha_s + \beta_s}{2}.$$
The statement also holds with $\alpha_s, \beta_s$ replaced by any $\alpha, \beta$ such that $\alpha_s \geq \alpha > \beta \geq \beta_s$, which may be useful for bounding the degrees of vertices in $G$.
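Since the bound on $|X|$ involves $W_{-1}$, it can be evaluated directly with SciPy. A short sketch; the values of $\delta$ and $h$ below are arbitrary choices for illustration.

```python
import numpy as np
from scipy.special import lambertw

def min_sample_size(delta, h):
    # Smallest integer k strictly greater than -W_{-1}(-delta*h^2*e^{-2h^2}) / (2h^2).
    arg = -delta * h**2 * np.exp(-2 * h**2)
    return int(np.floor(-lambertw(arg, k=-1).real / (2 * h**2))) + 1

# Example: alpha_s = 0.7 and beta_s = 0.5 give h = 0.1; take delta = 0.05.
k = min_sample_size(delta=0.05, h=0.1)
print(k)                                              # 496 for these parameters
print(2 * k * np.exp(-2 * 0.1**2 * (k - 1)) < 0.05)   # True: delta exceeds the failure bound
```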

The proof

For each $i = 1, \ldots, k$, define new random variables $Y_{ij}$ ($j \neq i$) on the sample space $X$, with $Y_{ij}$ evaluating to 1 on $x_j$ iff $x_j \in B_n(s, x_i)$, and to 0 otherwise. The mean of each $Y_{ij}$ is $P(B_n(s, x_i))$. Since, for fixed $i$, the $Y_{ij}$ are independent with the same mean, Hoeffding's inequality gives that
$$P\bigg(\underbrace{\bigg|\frac{1}{k-1}\sum_{j \neq i} Y_{ij} - P(B_n(s, x_i))\bigg| \geq h}_{\text{event } A_i}\bigg) \leq 2e^{-2h^2(k-1)};$$
in words, the fraction of the sampled $x_j$ that land in $B_n(s, x_i)$ differs from $P(B_n(s, x_i))$ by at least $h$ with probability at most $2e^{-2h^2(k-1)}$.
The union bound gives that
$$P(\text{at least one } A_i \text{ occurs}) = P\bigg(\bigcup_{i=1}^k A_i\bigg) \leq \sum_{i=1}^k P(A_i) \leq 2ke^{-2h^2(k-1)}.$$
Note that $\sum_{j \neq i} Y_{ij} = \deg_G(x_i)$ for every $i$, so whenever $\delta > 2ke^{-2h^2(k-1)}$, with probability $1 - \delta$
$$\left|\frac{\deg_G(x_i)}{k-1} - P(B_n(s, x_i))\right| < h, \quad \text{or} \quad P(B_n(s, x_i)) - h < \frac{\deg_G(x_i)}{k-1} < P(B_n(s, x_i)) + h.$$
When $x_i \in A$ (resp. $x_i \in B$) we have the lower (resp. upper) bound $\alpha_s$ (resp. $\beta_s$) on $P(B_n(s, x_i))$. Indeed:
$$\frac{\deg_G(a)}{k-1} > \alpha_s - h = \frac{\alpha_s + \beta_s}{2} \qquad \text{and} \qquad \frac{\deg_G(b)}{k-1} < \beta_s + h = \frac{\alpha_s + \beta_s}{2}.$$
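As an aside, these degree bounds are easy to watch on simulated data (an illustration of mine, not part of the proof). Draw points from a mixture placing mass $0.7$ on one tight cluster and $0.3$ on another, so that $\alpha_s \approx 0.7$, $\beta_s \approx 0.3$, and the normalized degrees should straddle the midpoint $(\alpha_s + \beta_s)/2 = 0.5$.

```python
import numpy as np

rng = np.random.default_rng(1)
k, s = 600, 0.5
in_A = rng.random(k) < 0.7          # each point lands in cluster A w.p. 0.7
pts = np.where(in_A[:, None],
               rng.normal(0.0, 0.1, (k, 2)),    # cluster A, near the origin
               rng.normal(5.0, 0.1, (k, 2)))    # cluster B, near (5, 5)

dists = np.linalg.norm(pts[:, None] - pts[None, :], axis=-1)
deg = ((dists <= s).sum(axis=1) - 1) / (k - 1)  # deg_G(x_i) / (k - 1)

# A-degrees sit above the midpoint 0.5 and B-degrees below it.
print(deg[in_A].min() > 0.5, deg[~in_A].max() < 0.5)   # True True
```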
To find how many points we need to sample, we solve for $k$ in the inequality $\delta > 2ke^{-2h^2(k-1)}$. With the aid of a computer algebra system, we find that
$$k > -\frac{1}{2h^2} W_{-1}\!\left(-\delta h^2 e^{-2h^2}\right),$$
completing the proof.
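One can also check numerically (with SciPy, outside the original argument) that $W_{-1}$ is the right branch: both real branches solve the equality $2ke^{-2h^2(k-1)} = \delta$, but the strict inequality $\delta > 2ke^{-2h^2(k-1)}$ only holds past the larger root.

```python
import numpy as np
from scipy.special import lambertw

delta, h = 0.05, 0.1
c = -delta * h**2 * np.exp(-2 * h**2)
f = lambda k: 2 * k * np.exp(-2 * h**2 * (k - 1))   # we need f(k) < delta

k_small = -lambertw(c, k=0).real / (2 * h**2)    # W_0 root:    about 0.025
k_large = -lambertw(c, k=-1).real / (2 * h**2)   # W_{-1} root: about 495.75

print(np.isclose(f(k_small), delta), np.isclose(f(k_large), delta))  # True True
print(f(k_large + 1) < delta < f((k_small + k_large) / 2))           # True
```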

References:
[1] W. Hoeffding, "Probability inequalities for sums of bounded random variables," Journal of the American Statistical Association 58 (1963), 13–30.
[2] P. Niyogi, S. Smale, and S. Weinberger, "A topological view of unsupervised learning from noisy data," SIAM Journal on Computing 40 (2011), 646–663.
