We work through Lemma 3 (called the "A−B Lemma" or the "cleaning procedure") of [2], adopting a cleaner and more thorough approach.
Necessary tools
Definition: The inverse of the complex-valued function $f(z) = ze^z$ is called the Lambert W-function and denoted by $W = f^{-1}$. When restricted to the real numbers, it is multi-valued on part of its domain, so it is split up into two branches: $W_0$ (taking values $\geqslant -1$) and $W_{-1}$ (taking values $\leqslant -1$).
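Both real branches can be evaluated numerically; here is a minimal sketch using SciPy's scipy.special.lambertw (my choice of tool, not something used in [2]), checking that both branches invert $ze^z$ on $[-1/e, 0)$, where they coexist:

```python
import numpy as np
from scipy.special import lambertw

# Both real branches satisfy W(x) * exp(W(x)) = x on [-1/e, 0).
x = -0.1
w0 = lambertw(x, k=0).real    # principal branch W_0, takes values >= -1
wm1 = lambertw(x, k=-1).real  # lower branch W_{-1}, takes values <= -1
print(w0, w0 * np.exp(w0))    # ~ -0.1118, -0.1
print(wm1, wm1 * np.exp(wm1)) # ~ -3.5772, -0.1
```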
Hoeffding's inequality gives an upper bound on the probability that a sum of bounded random variables deviates from its expected value by a given amount. The authors of [2] use the related Chernoff bound, but Hoeffding gives a tighter bound on the event in question.
Proposition: (Hoeffding - Theorem 2 and Equation (1.4) of [1])
Let $X_1, \dots, X_n$ be independent random variables, with $X_i$ bounded on the interval $[a_i, b_i]$. Then
$$P\left(\left|\frac{1}{n}\sum_{i=1}^{n} X_i - \frac{1}{n}\sum_{i=1}^{n} E[X_i]\right| \geqslant t\right) \leqslant 2\exp\left(\frac{-2t^2n^2}{\sum_{i=1}^{n}(b_i-a_i)^2}\right).$$
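To make the bound concrete, here is a small simulation sketch (the choice of Uniform[0,1] variables, so $a_i = 0$ and $b_i = 1$, is mine): the empirical frequency of a deviation of at least $t$ stays well below the bound $2e^{-2t^2n}$.

```python
import numpy as np

rng = np.random.default_rng(0)

# n i.i.d. Uniform[0, 1] variables: a_i = 0, b_i = 1, E[X_i] = 1/2.
n, t, trials = 100, 0.1, 100_000
sample_means = rng.uniform(size=(trials, n)).mean(axis=1)
empirical = (np.abs(sample_means - 0.5) >= t).mean()
bound = 2 * np.exp(-2 * t**2 * n**2 / n)  # sum of (b_i - a_i)^2 equals n here
print(empirical, bound)  # the empirical frequency is far below the bound
```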
The union bound (or Boole's inequality) says that the probability that at least one of a countable collection of events occurs is no larger than the sum of their individual probabilities.
Proposition: Let $A_1, A_2, \dots$ be a countable collection of events. Then $P\left(\bigcup_i A_i\right) \leqslant \sum_i P(A_i)$.
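For illustration, a short simulation (with arbitrarily chosen overlapping events) compares the two sides of the inequality:

```python
import numpy as np

rng = np.random.default_rng(1)

# Overlapping events A_i = "a Uniform[0,1] draw lands in [i/10, i/10 + 0.15]".
u = rng.uniform(size=1_000_000)
hits = np.array([(u >= i / 10) & (u <= i / 10 + 0.15) for i in range(10)])
print(hits.any(axis=0).mean())  # P(union) = 1, since the intervals cover [0, 1]
print(hits.mean(axis=1).sum())  # sum of P(A_i) ~ 1.45, a valid but loose upper bound
```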
The setup
Let $P$ be a probability distribution on $\mathbb{R}^n$ and $X = \{x_1, \dots, x_k\} \subseteq \mathbb{R}^n$ a finite set of points drawn according to $P$. These points may be considered as random variables $X_1, X_2, \dots, X_k$ on the sample space $\mathbb{R}^n$, with $X_i$ evaluating to 1 only on $x_i$, and to 0 otherwise. Choose $s > 0$ and construct the nearest neighbor graph $G$ on $X$ with parameter $s$, joining distinct vertices $x_i$ and $x_j$ by an edge iff $x_j \in B^n(s, x_i)$, the open ball of radius $s$ centered at $x_i$. Write $X = A \cup B$ and set
$$\eta := \inf_{a\in A,\, b\in B} \|a-b\|, \qquad \alpha_s := \inf_{a\in A} P(B^n(s,a)), \qquad \beta_s := \sup_{b\in B} P(B^n(s,b)),$$
with $h := (\alpha_s - \beta_s)/2$. We assume that
- $\eta > 0$, so $A$ and $B$ are disjoint;
- $s < \eta/2$, so $A$ and $B$ are in separate components of $G$; and
- $\alpha_s > \beta_s$, so $P$ places more mass near every point of $A$ than near any point of $B$.

Then for any $\delta > 2ke^{-2h^2(k-1)}$, with probability at least $1-\delta$, every $a \in A$ and $b \in B$ satisfy
$$\frac{\deg_G(a)}{k-1} > \frac{\alpha_s+\beta_s}{2} \qquad\text{and}\qquad \frac{\deg_G(b)}{k-1} < \frac{\alpha_s+\beta_s}{2}.$$
The statement also holds with $\alpha, \beta$ in place of $\alpha_s, \beta_s$, for any $\alpha_s \geqslant \alpha > \beta \geqslant \beta_s$, which may be useful when bounding the degrees of vertices in $G$.
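Before starting the proof, here is a sketch of the setup (the two-cluster distribution and all parameter values are my own choices for illustration): build $G$ on a sample drawn from a dense cluster $A$ and a sparse, well-separated cluster $B$, and compare the normalized degrees on each side.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

def graph_degrees(X, s):
    """Degree of each row of X in the graph joining x_i ~ x_j iff ||x_i - x_j|| < s."""
    D = squareform(pdist(X))      # pairwise Euclidean distances
    adjacent = (D < s) & (D > 0)  # open s-balls, no self-loops
    return adjacent.sum(axis=1)

# A dense cluster A and a sparse, well-separated cluster B (so alpha_s > beta_s
# and s < eta/2 for the values below).
rng = np.random.default_rng(2)
A = rng.normal(loc=(0.0, 0.0), scale=0.2, size=(60, 2))
B = rng.normal(loc=(5.0, 0.0), scale=0.4, size=(15, 2))
X = np.vstack([A, B])
deg, k = graph_degrees(X, s=0.5), len(X)
print(deg[:60].min() / (k - 1))  # smallest normalized degree over A
print(deg[60:].max() / (k - 1))  # largest normalized degree over B: much smaller
```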
The proof
For each $i = 1, \dots, k$, define new random variables $Y_{ij}$ on the sample space $X$, with $Y_{ij}$ evaluating to 1 on $x_j$ iff $x_j \in B^n(s, x_i)$, and to 0 otherwise. The mean of $Y_{ij}$ is $P(B^n(s,x_i))$. For fixed $i$, the $Y_{ij}$ (over $j \neq i$) are independent, bounded on $[0,1]$, and share this mean, so Hoeffding's inequality (with $n = k-1$ and $t = h$) gives that
$$\left(\begin{array}{c}\text{the probability that the sampled } x_j \\ \text{have clustered around a point more than} \\ \text{a distance } h \text{ away from } B^n(s,x_i)\end{array}\right) = P\left(\underbrace{\left|\frac{1}{k-1}\sum_{j\neq i} Y_{ij} - P(B^n(s,x_i))\right| \geqslant h}_{\text{event } A_i}\right) \leqslant 2e^{-2h^2(k-1)}.$$
The union bound gives that
$$\left(\text{the probability that at least one } A_i \text{ occurs}\right) = P\left(\bigcup_{i=1}^{k} A_i\right) \leqslant \sum_{i=1}^{k} P(A_i) \leqslant 2ke^{-2h^2(k-1)}.$$
Note that $\sum_{j\neq i} Y_{ij} = \deg_G(x_i)$ for every $i$, so whenever $\delta > 2ke^{-2h^2(k-1)}$, with probability at least $1-\delta$ we have, for every $i$,
$$\left|\frac{\deg_G(x_i)}{k-1} - P(B^n(s,x_i))\right| < h, \qquad\text{or}\qquad P(B^n(s,x_i)) - h < \frac{\deg_G(x_i)}{k-1} < P(B^n(s,x_i)) + h.$$
When $x_i \in A$ (resp. $x_i \in B$) we have the lower bound $\alpha_s$ (resp. the upper bound $\beta_s$) on $P(B^n(s,x_i))$. Indeed:
$$\frac{\deg_G(a)}{k-1} > \alpha_s - h = \frac{\alpha_s+\beta_s}{2} \qquad\text{and}\qquad \frac{\deg_G(b)}{k-1} < \beta_s + h = \frac{\alpha_s+\beta_s}{2}.$$
To find how many points we need to sample, we solve the inequality $\delta > 2ke^{-2h^2(k-1)}$ for $k$: substituting $u = -2h^2k$ turns it into $ue^u > -\delta h^2 e^{-2h^2}$, which the branch $W_{-1}$ inverts. With the aid of a computer algebra system, we find that
$$k > -\frac{1}{2h^2}\, W_{-1}\!\left(-\delta h^2 e^{-2h^2}\right),$$
completing the proof.
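For concreteness, here is a sketch evaluating this sample-size bound numerically with SciPy's lambertw (the function name and parameter values below are mine):

```python
import numpy as np
from scipy.special import lambertw

def required_samples(alpha_s, beta_s, delta):
    """Smallest integer k with k > -W_{-1}(-delta h^2 e^{-2h^2}) / (2 h^2)."""
    h = (alpha_s - beta_s) / 2
    arg = -delta * h**2 * np.exp(-2 * h**2)  # lies in (-1/e, 0), so W_{-1} is real
    return int(np.ceil(-lambertw(arg, k=-1).real / (2 * h**2)))

k = required_samples(alpha_s=0.3, beta_s=0.1, delta=0.05)
h = 0.1
print(k, 2 * k * np.exp(-2 * h**2 * (k - 1)))  # check: the second value is < delta
```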
References:
[1] W. Hoeffding, "Probability inequalities for sums of bounded random variables," Journal of the American Statistical Association 58 (1963), 13–30.
[2] P. Niyogi, S. Smale, and S. Weinberger, "A topological view of unsupervised learning from noisy data," SIAM Journal on Computing 40 (2011), 646–663.