Asymptotic Expansion of Central Limit Theorem

1  Introduction

The impetus for revisiting the Central Limit Theorem (CLT) came from a study of synchrotron radiation. In an electron storage ring, depending on the kinetic energy, a single electron emits typically a few hundred photons per revolution. If one asks what is the distribution of the net energy loss per turn, then one is faced by needing to calculate the cumulative sum of many independent random events. Though the motivation was a specific problem, the solution technique is of general application and furnishes a way to compute multiple convolutions of a function with itself without performing any integrations. Essentially, the method is to build an asymptotic expansion about a gaussian kernel and relies upon the relations of cumulative moments to the CLT. The example of the ``uniform distribution'' was chosen to show the generality.

1.1  CLT problem

The CLT is discussed in References[1,3,2,4]. The CLT is concerned with the following problem. Let Ft be the probability density distribution of a random variable t. Consider now the cumulative sum Tn = åj = 1ntj, where j is the event index, and ask what is its distribution FTn for large n. FTn is the n-fold self-convolution of Ft, or symbolically FTn = Õj = 1nFt*, where * denotes convolution product. For small n, typically n = 2,3, the convolution can be performed numerically; but this quickly becomes daunting as n increases. Contrarily, in the asymptotic limit n®¥, the function FTn approaches gaussian form. It is the intention of this paper to ``bridge the gap'' between these two extremes and give an approximate method of computing multiple self-convolutions that completely avoids any integrations.

1.2  Strategy & Overview

We shall adopt the following strategy. First we shall find how the moments [`(Tnk)] evolve with n. (Let [`(t2)] = s2.) Next we shall construct a density function that has the same moments up to some order, and which approximates FTn. Essentially, this amounts to building a Tchebychev-Hermite expansion[2] about a gaussian core. In the following we lay the ground work and then derive the expansion; and then follow up (section 7) with some observations in the frequency domain relating to the issue ``how large does n have to be for the CLT to apply?'' But first an overview of the terrain.

2  Heuristic introduction to CLT

2.1  Characteristic function

It is convenient to introduce the characteristic function f(s) which is the Fourier transform of the probability distribution F(t).
f(s)
=
ó
õ
+¥

-¥ 
e+i s tF(t)dt
(1)
F(t)
=
1
2 p
ó
õ
+¥

-¥ 
e-i s tf(s)ds
(2)
Here i = [Ö(-1)]. Equation (1) gives the forward transform and (2) the inverse transform. We shall denote the operation of forward transform by FT and the inverse by FT-1.

If we expand the exp(i s t) term as a Taylor series, then it is clear that the characteristic function f(s) contains all the moments, [`(tk)], of F(t).

ft(s) = 1+i s
1!
_
t
 
 - s2
2!

t2
 
 -i s3
3!

t3
 
 + s4
4!

t4
 
 +...
(3)
The probability distribution of Tn is FTn the n-fold self-convolution of Ft. Now the transform of a convolution is the product of the individual transforms. Hence it follows that the Fourier transform of FTn is simply ft(s)n.

2.1.1  gaussian example

Let [`(t2)] = s2. Suppose, for example,
Ft(t) = exp[-t2/(2s2)] 1
s   æ
Ö

2p
 
and ft(s) = exp[-s2 s2/2]  .
(4)
Then it follows
FT{FTn} = [ft(s)]n = exp[-s2 ns2/2]  .
(5)
Forming the inverse transform with s,Tn as conjugate variables we find
FTn = 1
Ön
Ft æ
ç
è
t = Tn
Ön
ö
÷
ø
  .
(6)

2.2  Brief statement of CLT

The gaussian case is a particularly simple example of a general idea. The Central Limit Theorem states that for any function Ft for which a transform ft(s) exists,
FTn® exp é
ê
ë
-Tn2
2ns2
ù
ú
û
1
  æ
Ö

p2ns2
 
as n®¥  .
(7)
Hence for large enough n, the n-fold convolution of any reasonable Ft tends to a gaussian with r.m.s. width Ön times greater than that of FT1 = Ft. This immediately raises the central questions concerning the CLT.

  1. How large does n have to be for the CLT to apply?
  2. How accurately do you wish to represent the distribution of the cumulative sum?
  3. What moments of the distribution are important, and how faithfully should they be represented?

The second question is an indispensable complement to the first and means ``what relative fractional accuracy maintained over what range of the argument?'' Typically we shall consider a range of ±3 standard deviations because 99% of events lie within this range for the gaussian distribution. The third question is particularly profitable when only certain moments contribute to the evolution of some characteristic of a physical system. We shall delay a deeper discussion of the CLT until after introduction of further useful mathematical tools.

3  Moments

Moments play an important role in the CLT because of their relation to the characteristic function. Moreover, a probability density function can be reconstructed from its moments.

3.1  Normalized moments

Moments are essentially non-linear and so changing the length scale of the distribution will change the values of the moments. For example the uniform distribution equal to 1/2 over the interval [-1,+1] and zero elsewhere has moments [`(tn)] = 1/(n+1) for n even. But the uniform distribution equal to 1 over the interval [-1/2,+1/2] and zero elsewhere has moments [`(tn)] = (1/2)n/(n+1) for n even.

Let s2 º [`(t2)] be the variance, then s is the r.m.s. width about the mean; and for the gaussian distribution s is equal to one standard deviation. If in place of Ft(t) we deal with Fx(x) = sFt(t = xs) where x º t/s, then we shall have a set of moments which are length-scale invariant. Thus we define the normalized moments as:


xk
 
=

tk
 

sk
  , with
tk
 
= ó
õ
+¥

-¥ 
tk Ft(t) dt  .
(8)
It is the normalized moments, such as skewness [`(x3)] and curtosis [`(x4)], which really characterize the distribution.

3.2  Cumulative moments

Let us adopt the normalized moments, introduce Xn = Tn/s, and study the distribution of single events, Ft(t) = Fx(x = t/s)/s, and of cumulative events FTn(t) = FXn(x = t/s)/s, respectively. Consider now the characteristic functions of Fx and FXn expanded in Taylor series:
FT{ Fx} = ¥
å
k = 0 
aksk º A(s) and FT{ FXn} = ¥
å
k = 0 
ck(n)sk º C(s),
(9)
where the coefficients ck depend on n. From the previous discussion, we know that the coefficients of sk are directly proportional to the kth moments of the two distribution functions, that is

xk
 
= [i-k k! ]ak and
Xnk
 
= [i-k k!] ck(n)   .
(10)
Using these definitions, we can state how each of the moments of the distribution FXn evolves as n increases. We note that a0 = 1 because the integral ò-¥+¥Ftdt = ò-¥+¥Fxdx = 1. Note also that a1 = i[`t]/s. By a suitable choice of the origin of the coordinate t, it is always possible to make the first moment [`t] identically zero.

Now we have the relation

FT{ FXn} = [ FT{Fx}]n  , or C(s) = An(s)  .
(11)
To find the coefficients ck(n), we multiply out An and compare coefficients of powers of s. Hence the following correspondencies:

Xn0
 
= c0 = a0n = 1  ,
Xn1
 
= c1 = na1 = 0  ,
(12)
i2
2!

Xn2
 
= c2 = na2 = n i2
2!

x2
 
  , i3
3!

Xn3
 
= c3 = na3 = n i3
3!

x3
 
  .  
(13)
Comparing coefficients of i2, we have proven the additive property of variances under self-convolution: [`(Tn2)] = n[`(t2)]. Also we have shown the additive property of third moments [`(Tn3)] = n[`(t3)] provided that [`t] = 0. Further development is facilitated by the introduction of semi-invariants.

3.3  Exponential form and semi-invariants

Suppose that we can find a suitable function B so that A(s) º exp[B(s)] then
FT{FXn} = [A(s)]n = exp[n×B(s)]  .
(14)
Given that the moments [`(xl)] are computed, the task is as follows. Given a power series of order l, e.g. A(s) = a0+a1s +a2s2+...+alsl, find the power series B(s) = b0+b1s+b2s2+...+blsl such that A = eB up to order sl in A.

The coefficients bk are called semi-invariants[2]. The obvious method to find the general coefficient bk is to form the kth derivative of A and of B, then set s = 0 and compare the two expressions. This procedure has to be followed with k ascending from 0 to l and back-substitution of all previous solutions bj(a0,a1,a2,...) with 0 £ j < k to find closed form expressions. The first few terms are as follows:

a0 = 1, b0 = ln(a0) = 0, b1 = a1, b2 = a2-a12/2, b3 = a3-a1a2+a13/3  .
(15)
Now a1 can be made zero by a suitable choice of the origin of coordinate x. Hence the first few coefficients in B become: b0 = b1 = 0,
b2 = a2, b3 = a3, b4 = a4-a22/2, b5 = a5-a2a3, b6 = a6-a2a4+a23/3-a32/2
(16)
The coefficients in A are given by ak = [`(xk)]ik/k!. When normalized moments are employed, it automatically follows that [`(x2)] º 1, in which case which case a2 = -1/2 and so:
b2 = -1/2, b3 = a3, b4 = a4-1/8, b5 = a5+a3/2, b6 = a6+(a4-a32)/2-1/24  .
(17)

3.3.1  cumulative moments

We are now ready to give simpler expressions for the cumulative moments. Recall that C(s) = enB(s). We expand enB in a Taylor series and compare coefficients of powers of s with those in the series C. Hence the following correspondencies:

Xn0
 
= c0 = enb0 = 1
Xn1
 
= c1 = nb1 = 0
(18)
i2
2!

Xn2
 
= c2 = nb2 = n i2
2!

x2
 
i3
3!

Xn3
 
= c3 = nb3 = n i3
3!

x3
 
(19)
i4
4!

Xn4
 
= c4 = n[a4+a22(n-1)/2]
=
n[b4+nb22/2]
(20)
i5
5!

Xn5
 
= c5 = n[a5+a2a3(n-1)]
=
n[b5+nb2b3]
(21)
i6
6!

Xn6
 
= c6
=
n[a6+a32(n-1)/2+a2a4(n-1)+a23(n-1)(n-2)]
=
n[b6+n(b32/2+b2b4+nb23/6)]
(22)
Equation (22) is valid for n ³ 2. It is evident that the relations are more compactly formulated in terms of the coefficients bk rather than the al.

3.3.2  remainder form

Not only does the exponential form eB facilitate construction of powers of A, but it also gives a way to factor out the gaussian part of any characteristic function. We chose to split B(s) into a quadratic part and higher order remainder terms R(s). Thus we set
B(s) = -s2/2+R(s) where R(s) = åk = 3¥ bk sk. Hence A(s)/e-s2/2 = expR(s).

3.4  Reduced variable

The Fourier transform of the n-fold convolution of FXn is exp[n×B(s)]. Now the transform of a product is the convolution of the individual transforms, and so
FXn(x) º 1
  æ
Ö

2pn
 
e-x2/ 2n *G(x,n) where G(x,n) = FT-1{ en×R(s)}  .
(23)
Evidently, as n increases, so the length scale of FXn increases as Ön or faster. Thus it is inevitable that we should introduce the variable Zn = Xn/Ön = Tn/(sÖn) which has a length scale independent of n. From this definition it follows that [`(Xnk)] = [`(Znk)][Ö(nk)]. When considering FXn for various n, comparison is facilitated by graphing FXn versus Xn/Ön so that the range of the plot remains fixed at Zn = [-3,+3].

4  Central Limit Theorem

We are now prepared for a compact derivation and brief discussion of the CLT. Let us recall as follows:
FTn(Tn) = 1
s
FXn æ
ç
è
Xn = Tn
s
ö
÷
ø
  ,
(24)
FXn(Xn) = 1
Ön
FZn æ
ç
è
Zn = Xn
Ön
ö
÷
ø
  .
(25)
2pFXn(x) = ó
õ
+¥

-¥ 
ds e-isxenB(s) = ó
õ
+¥

-¥ 
ds e-is xe-ns2/2enb3 s3enb4 s4×...
(26)
We make the replacement sÞ s/Ön and find
2pFXn(x) = 1
Ön
ó
õ
+¥

-¥ 
dse-is(x/Ön)enB(s/Ön)  .
(27)
Comparing (25) and (27) we find that
2pFZn(z = x/Ön) = ó
õ
+¥

-¥ 
dse-iszenB(s/Ön) = ó
õ
+¥

-¥ 
ds e-is z e-s2/2eb3 s3/Öneb4 s4/n×...
(28)
Usually it is suggested that as n tends to infinity, so the higher order terms R(s/Ön) = åk = 3¥ (s/Ön)kbk tend to zero and enR® 1. This should make the reader uncomfortable, as the integration limits have s tending to ±¥. Indeed, modelling a function of infinite range by the first few terms of a Taylor expansion about the origin s = 0 should have already made the reader sceptical. However, the palliative to these worries also furnishes a simple method (section 7) to estimate how large n must be for the CLT to apply. For the moment, let us accept that it is so; in which case:
FTn(t)
=
1
sÖn
FZn æ
ç
è
z = t
sÖn
ö
÷
ø
(29)
2pFZn(z)
®
ó
õ
+¥

-¥ 
dse-is ze-s2/2 =   æ
Ö

2p
 
e-z2/2
(30)
FTn(t)
®
e-z2/2
s   æ
Ö

2pn
 
ê
ê
ê
ê
ê
ê



z = t/(sÖn) 
(31)

4.0.1  qualitative argument

Those familiar with Fourier transforms will recognize that the basic shape and characteristic length scale of FZn is set by the lowest frequencies present in its transform. The high frequency content of the transform is responsible for fine structure and localization. Localization is cutting off FZn to some finite region. Consequently, the important aspects of FZn come from near the origin of the transform space s/Ön. Ignoring the high frequency components, will have the effect of introducing longer (gaussian) tails into the approximation for FZn than are really present; but for practical applications the probabilities in these tails are so small that their effect is negligible. In the context of these ideas, it is evident that the gaussian is the transform-pair with the least possible structure in the domains of both conjugate coordinates.

4.1  Gaussian distributed moments

From the definitions of section 3, it should be clear that for small n the kth moment of FXn cannot be accurately predicted without knowing bk. However, for large values of n, the term in b2 dominates the moments:

Zn2
 
=
1
(32)
-i

Zn3
 

3!
=
b3
Ön
® 0
(33)

Zn4
 

4!
=
b22
2!
+ b4
n
® b22
2!
= æ
ç
è
-1
2
ö
÷
ø
2

 
1
2!
(34)
i

Zn5
 

5!
=
1
Ön
é
ê
ë
b2b3+ b5
n
ù
ú
û
® -1
2
b3
Ön
® 0
(35)
-

Zn6
 

6!
=
b23
3!
+ b2b4+b32/2
n
+ b6
n2
® b23
3!
= æ
ç
è
-1
2
ö
÷
ø
3

 
1
3!
(36)
-i

Zn7
 

7!
= 1
  _
Ö n
 
é
ê
ë
1
2
b22 b3 + b7
n2
+ 1
n
(b2b5+b3b4) ù
ú
û
® 1
2
1
Ön
æ
ç
è
-1
2
ö
÷
ø
2

 
b3 ® 0
(37)
It is evident that all even-order moments fall as 1/n and all odd-order moments diminish as 1/Ön. In fact, all of these normalized moments tend in the limit n®¥ to those of a gaussian distribution whose characteristic function is simply e-s2/2. Let us show this in detail for a general even-order moment.
i2k

Zn2k
 

(2k)!
® (b2)k
k!
(38)
which is equal to the coefficient of s2k in Taylor expansion of eb2 s2. Substituting for b2 = -1/2 we find

Zn2k
 
= æ
ç
è
1
2
ö
÷
ø
k

 
(2k)!
k!
= (2k-1)!!
(39)
where n!! = n×(n-2)×(n-4)×.... But this is just the 2kth moment of the gaussian distribution
1
  æ
Ö

2p
 
exp é
ê
ë
-Zn2
2
ù
ú
û
  .
(40)

Hence a possible statement of the CLT is that the probability density function F[Tn/(sÖn)] tends in the limit n®¥ to a distribution whose moments are identical with those of a gaussian Equations (7) or (31).

Another way to use these ideas is to estimate how large must n be that the gaussian distribution gives some moment to some desired absolute accuracy. For example, for the gaussian (which has zero skewness) to give the skewness [`(Xn3)] to an accuracy of 1% requires Ön ³ 100×|b3|3! = 100[`(x3)]. Likewise, for the gaussian approximation of the true function F(Xn) to give the curtosis [`(X4n)] to a relative fractional accuracy of 1% requires n ³ 100 |b4|2!/b22 = 100|[`(x4)]/3-1|.

4.2  Gaussian distribution

As a simple test of the method of semi-invariants, suppose Ft(t) is gaussian distributed with standard deviation s and zero mean and that we wish to find FTn(Tn) the distribution of the sum Tn = åi = 1nti. We adopt the dimensionless variables x = t/s and Xn = åi = 1nxi. Then Ft(t) = Fx(x)/s where Fx(x) = e-x2/2/Ö{2p}. The odd-order moments of Fx are zero. The even order moments are found by taking derivatives of Fx(xÖa) with respect to a, and then setting a = 1. The result is

x2k
 
= (-2)k dk
dak
1
Öa
ê
ê
ê


a = 1 
= (2k-1)!!  .
(41)
We find [`(x2)] = 1, [`(x4)] = 3, [`(x6)] = 15 and so on. We Taylor expand the characteristic function as FT{Fx} = A(s) and find a0 = 1, a2 = -1/2, a4 = 1/8, a6 = -1/48, and so on. We represent A(s) º eB(s) and find that b2 = -1/2 and all other bk are identically zero. Let Zn = Xn/Ön, then the probability distribution of the sum is exactly
FZn(z) = FT-1{ e-s2/2 } = e-z2/   æ
Ö

2p
 
  .
(42)
Now Zn = Tn/(sÖn) and so FTn(Tn) º FZn(z = Tn/s)/(sÖn).

5  Asymptotic expansions

The basic idea of the ``Central Limit Theorem" is that the first term of the characteristic function written in exponential form is quadratic in s followed by correction terms that shrink to zero as n tends to infinity. Accepting the ``central limit'' is tantamount to setting all semi-invariants to zero except b2 and accepting that the moments of the density function have become gaussian distributed.

5.1  Importance of higher order semi-invariants

Earlier, we introduced the remainder function R(s) = b3s3+b4s4+.... From our discussion of moments, it is now plain that each term we retain bk allows us to give accurate information about the evolution of [`(Tnk)] for low values of n. Hence if we retain the terms b3 and/or b4 in R(s) we shall obtain an approximation for FTn which correctly and exactly models the evolution of the skewness and/or curtosis for all values of the event summation limit n = 1,2,3,...¥. Contrarily, in a situation in which only knowledge of the evolution of the variance [`(Tn2)] is required, we may ignore all the terms in R(s) and use the pure gaussian form for the characteristic function enb2 s2; the inverse Fourier transform will then give a probability density that exactly models the correct evolution of the variance for all n = 1,2,3,...¥. This issue of the paramount importance of second moments is discussed by Jowett[5] in connection with Campbell's theorem.

5.1.1  convolution

Let G(z,n) = FT-1{ en×R(s/Ön)}  .
(43)
From the convolution integral (23) it follows that
FZn(z) = 1
  æ
Ö

2p
 
e-z2/2H(z,n) where H(z,n) = ó
õ
+¥

-¥ 
e(-u2/2+uz)G(u,n) du  .
(44)
In following articles we will further consider the CLT by examining the expansion H(z,n) in the space of Zn. Regarding the important questions about the CLT, evidently our task is to find for what n does H(z,n) become approximately constant over the range Zn = [-3,+3]. It will prove profitable to graph H versus Xn/Ön so that the range of the plot remains fixed at Zn = [-3,+3].

When G (or an approximation to it) is known analytically, one can quite profitably use the convolution form (44); and this is used in sections 7.1.2 and 7.2.2. A variant of this idea, which can be numerically more efficient, is to represent H by its truncated Taylor expansion:

H(z,n) » [^k]
å
k = 0 
zk
k!
hk where hk = ó
õ
+¥

-¥ 
e-u2/2ukG(u,n)du  ,
(45)
within some domain of convergence z < [^z] depending on the summation limit [^k]. Provided that the terms hk grow no more quickly than a geometric series, then this expansion will converge in the same manner as a Poisson distribution. Further, one can make a rough estimate of how many terms are required to achieve a desired accuracy. If we assume the hk are roughly constant, then to achieve an accuracy of 1% at Zn = ±3 requires 3k/k! < 1/100 with solution [^k] ³ 11.

5.2  The Tchebychev-Hermite expansion

In light of the arguments above, it is clear that finding an approximation to FZn that correctly reproduces not only the zeroth, first and second moments, but also the third, fourth and higher moments will be a considerable improvement over the gaussian asymptotic form for small n. Hence we propose to construct such a function based on our knowledge of the evolution of the moments. Let us recall equation (28) and expand the term exp[nR(s/Ön)] in a Taylor series:
2pFZn(z) = ó
õ
+¥

-¥ 
dse-iszenB(s/Ön) = ó
õ
+¥

-¥ 
ds e-is z e-s2/2exp ì
ï
ï
í
ï
ï
î
n ¥
å
k = 3 
sk
  __
Önk
 
bk ü
ï
ï
ý
ï
ï
þ
(46)
= ó
õ
+¥

-¥ 
e-isze-s2/2 é
ê
ë
1+ æ
ç
è
b3 s3
Ön
+ b4 s4
n
+... ö
÷
ø
+ 1
2!
æ
ç
è
b3 s3
Ön
+ b4 s4
n
+... ö
÷
ø
2

 
+... ù
ú
û
ds   .
(47)
Now there is the general theorem
2p æ
ç
è
i d
dz
ö
÷
ø
k

 
F(t) = ó
õ
+¥

-¥ 
e-izs[skf(s)]ds where f(s) = FT[F(t)]  .
(48)
Hence it follows that
FZn(z) = é
ê
ê
ê
ê
ë
1+ b3
Ön
æ
ç
è
i d
dz
ö
÷
ø
3

 
+ b4
n
æ
ç
è
i d
dz
ö
÷
ø
4

 
+ b5
  __
Ön3
 
æ
ç
è
i d
dz
ö
÷
ø
5

 
+ æ
ç
è
b32
n
+ b6
n2
ö
÷
ø
æ
ç
è
i d
dz
ö
÷
ø
6

 
+... ù
ú
ú
ú
ú
û
e-z2/2
  æ
Ö

2p
 
(49)
The derivatives of order k generates the Tchebychev-Hermite[2] polynomial of order k. Let FZn be composed as in Equation (44) with
H
=
1+ izb3
1! Ön
(3-z2)
(50)
+
1
2! n
[2(3 - 6z2 + z4)b4 - (-15 + 45z2 - 15z4 + z6)b32 ]
+
-iz
3!   __
Ön3
 
[(945 - 1260z2 + 378z4 - 36z6 + z8)b33 - 6(-105 + 105z2 - 21z4 + z6) b3b4
+
6(15 - 10z2 + z4)b5]
+
1
n2
é
ê
ë
z12 b34
4!
+ z10
4
(-11b34 - 2b32b4)+...+ 15
8
(231 b34 + 252 b32 b4 + 28 b42 + 56 b3 b5 + 8 b6) ù
ú
û
+...
The constant terms appearing in this expression guarantee that the integral of this probability density function is still identically equal to unity despite the correction terms that have been added to the gaussian. Equivalently we may write in terms of the normalized moments [`(xk)] = [`(tk)]/sk.
H
=
1+ z
Ön3!
(z2-3)
x3
 
+
1
n 2!(3!)2
[z6 (
x3
 
)2 - 3(9 + 5(
x3
 
)2 - 3
x4
 
) + 9 z2(6 + 5(
x3
 
)2 - 2
x4
 
) - 3z4(3 + 5(
x3
 
)2 -
x4
 
)]
+
1
  __
Ön3
 
é
ê
ê
ê
ê
ë
z9 (
x3
 
)3

3!(3!)2
+
z7
x3
 

4(3!)2
(-3 - 4 (
x3
 
)2 +
x4
 
)+...+ z
48
(35 (
x3
 
)3 +
x3
 
(45 - 35
x4
 
) +6
x5
 
) ù
ú
ú
ú
ú
û
+
1
n2
é
ê
ê
ê
ê
ë
z12(
x3
 
)4

4!(3!)4
- z10
4(3!)4
(
x3
 
)2(9 + 11(
x3
 
)2 - 3
x4
 
)+... ù
ú
ú
ú
ú
û
+...
(51)

5.3  Truncation of H and domain of convergence

Unfortunately, it is clear from the expression, that ranking the terms in powers of 1/Ön is less than ``half of the story''. High powers of (1/n) are also associated with high powers of z. Consequently, truncating the expansion at too low an order of 1/Ön could result in serious errors. For a given order of truncation of H in powers of 1/Ön, there will be a domain [^z] for which H converges. At the domain boundary, H will usually be dominated by terms like
1
k!
æ
ç
ç
ç
ç
è
z3
x3
 

3!Ön
ö
÷
÷
÷
÷
ø
k


 
(52)
which are distributed like a Poisson distribution. Evidently, for given n we have to retain enough terms in the asymptotic expansion that the last few terms are progressively very small compared to unity.

5.3.1  sufficiently large n

The CLT applies when for given [^z], the term in 1/Ön dominates over all other powers of 1/n and the same 1/Ön term itself becomes small compared with unity. We substitute k = 1 leading to the condition
( ^
z
 
)3
x3
 

3!Ön
£ e where e << 1  .
(53)
Conventionally, we substitute [^z] = 3. As an example, suppose we wish to know for what n does H become less than e = 1% for the single photon emission spectrum for which [`(x3)] = 3.662. Substituting we obtain the the estimate Ön ³ 16.48/e or n » 2.7×106.

5.4  Symmetric distribution

In the case of a symmetric distribution, the odd order moments are all zero; and this results in much simplification.
H
=
1+ 1
n
(3 - 6z2 + z4)b4
+
1
n22!
[(105 - 420z2 + 210z4 - 28z6 + z8)b42-2(-15 + 45z2 - 15z4 + z6)b6]
+
1
n3
é
ê
ë
z12 b43
3!
-z10 b4(11 b42 + b6)+... + -105(99 b43 + 45 b4 b6 + 4 b8) ù
ú
û
+...
(54)
Alternatively, in terms of the normalized moments moments we find
H
=
1+ 1
n
(
x4
 
-3)

1!(4!)
(3 - 6z2 + z4)
+
1
n2
é
ê
ê
ê
ê
ë
z8
(
x4
 
-3)2

2!(4!)2
+ z6
1440
( -255+180
x4
 
-35(
x4
 
)2+2
x6
 
)+...
+
1
384
(75 - 90
x4
 
+ 35(
x4
 
)2 - 8
x6
 
) ù
ú
û
+...
(55)

5.4.1  sufficiently large n

Following the previous argument, at the boundary of convergence [^z] of a truncated expansion of H, the expansion will usually be dominated by terms of the form
1
k!
æ
ç
ç
ç
ç
è
z4(
x4
 
-3)

n 4!
ö
÷
÷
÷
÷
ø
k


 
  .
(56)
Hence, by substituting k = 1 it follows that the CLT will apply when
( ^
z
 
)4|
x4
 
-3|

4! n
£ e where e << 1  .
(57)
As an example, take the ``top hat'' or uniform function for which [`(x4)] = 9/5 and evaluate for [^z] = 3; leading to n ³ 4.05/e, say n » 405 for e = 1% accuracy.

The conditions (53) and (57) are much more easily derived (section 7) in the transform space s by direct consideration of the characteristic function.

5.5  Convolution

The asymptotic expansion method given above allows to form an approximation to an n-fold self-convolution without performing any integrations provided that one knows the moments [`t]k. Using the computer program Mathematica we have created an asymptotic expansion up to order 1/Ö{n13} and up to order z39 with the 12 moments [`(x3)],  [`(x4)],  [`(x5)],...,[`(x13)],  [`(x14)],  [`(x15)] as free variables. Though the symbolic expressions are very large, as soon as n is chosen and the moments are substituted, the expressions collapse to a simple Taylor series in powers of z. Fortran-77 subroutines have been written to evaluate the expansion and generate coefficients for the Taylor series.

For a given domain [^z], there is a minimum n for which the truncated Taylor series converges, and its value depends on the form of Ft, the single event probability density. For example, for the top hat function n ³ 5 whereas for the single photon spectrum n ³ 270 when [^z] = 3.

5.6  Distribution function

Suppose random values t are distributed with a frequency Ft, the probability density. We introduce the distribution function
Dt(t) = ó
õ
t

-¥ 
Ft(u)du  .
(58)
Clearly Dt(-¥) = 0 and Dt(+¥) = 1. If we generate random numbers q uniformly in the interval q = [0,1] then t = D-1(q) will be distributed as Ft. Here D-1 denotes the inverse function. FZn as given by Equation (49) is an exact differential and so DZn can be computed almost trivially. Let us write
D(z) = F(z)+E(z)e-z2/2/   æ
Ö

2p
 
where F(z) = [1+Erf(z/Ö2)]/2
(59)
and Erf is the standard error function, and
E
=
1
1!Ön
ib3(1-z2)
(60)
+
z
2!n
[(15 - 10z2 + z4)b32 - 2(t2-3)b4]
+
i
3!   __
Ön3
 
[(105 - 420z2 + 210z4 - 28z6 + z8) b33 - 6 (-15 + 45z2 - 15z4 + z6)b3b4
+
6(3 - 6z2 + z4)b5]+...
(61)
Alternatively, we may write in terms of the normalized moments.
E
=
1
1!Ön

x3
 

3!
(1-z2)
+
1
2!(3!)2n
[-(
x3
 
)2 - 3z(9 + 5(
x3
 
)2 - 3
x4
 
) + z3(9 + 10(
x3
 
)2 - 3
x4
 
)] +...
(62)

Using the computer program Mathematica we have created an expansion of E up to order 1/Ö{n13} and up to order z38 with the 12 moments as free variables. Once n and the moments are substituted, the expansion collapses to a Taylor series in powers of z. Fortran-77 subroutines have been written to evaluate the expansion and generate coefficients for the Taylor series. In addition, we have written a further subroutine to evaluate the inverse D-1. Typically the correction term E(z)e-z2/2/Ö{2p} is much smaller than F(z). Hence we can adopt a two step strategy. In the first step we invert a good approximation to Erf. In the second step we use Newton-Raphson iteration to solve Dt(z) = q for z using an exact form for Erf. A useful and highly accurate approximation is

Erf(z) »   æ
Ö

1-exp(-4z2/p)
 
  .
(63)

5.7  Acknowledgement

Though we have discovered all of the above results concerning semi-invariants and asymptotic expansions for ourself, we later learned[2] that many of them had been previously derived by Tchebytchev in the late 1880s.

6  Examples

6.1  Uniform distribution

Consider the uniform or ``top hat'' Ft(t) distributed as follows:
Ft = ì
í
î
1/a
if |t| < a/2
0
otherwise
(64)
Suppose that we wish to find FTn the distribution of the sum Tn = åi = 1n ti. The variance of the distribution Ft is [`(t2)] = a2/12. We make a transform to coordinates wherein the moments are normalized by the variance: x = 2tÖ3/a. Then Ft(t) = Fx(x)×(2Ö3/a) where
Fx = ì
í
î
1/(2Ö3)
if |x| < Ö3
0
otherwise.
(65)

6.1.1  moments & semi-invariants

The odd order moments are zero. The even order moments are given by [`(xk)] = (Ö3)k/(k+1). For example, [`(x2)] = 1, [`(x4)] = 9/5, [`(x6)] = 27/7, [`(x8)] = 9, etc.. We Taylor expand the characteristic function as FT{Fx} = A(s) and find a0 = 1, a2 = -1/2, a4 = 3/40, a6 = -3/560, etc.. We represent A(s) º eB(s) and find that b2 = -1/2, b4 = -1/20, b6 = -1/105, etc., and that all odd order coefficients are zero. Note that b4 is negative even though the curtosis [`(x4)] is positive.

6.1.2  distribution of cumulative sum

Under the operation of convolution, the uniform distribution converges to gaussian form remarkably quickly (except the infinite tails are truncated) and FZn is already bell-shaped for n = 3. The asymptotic expansion for H(z,n) truncated to order 1/Ö{n13} converges for [^z] = 3 converges for n ³ 5. The expansion H(z,5), Figure 1, may be compared with an approximate (but probably more accurate) form, Figure 2, obtained by numerically performing the inverse Fourier transform of [ft(s)]5; the agreement is excellent.

The Figure 3 below shows shows the result of inverting DZn for n = 5 and 105 random trials.

square-05b.gif
Figure 1: H(z,5) constructed by asymptotic expansion versus Zn

square-05-mma.gif
Figure 2: Exact H(z,5) by numerical integration (of FT-1) versus Zn

In section 7.1.2, an approximation to H based on retaining the curtosis is given which is claimed to be valid for n ³ 20. We invite the reader to study Figures 4, 5 and 14; the agreement between them is excellent.

When the asymptotic expansion of H(z,n) is truncated at 1/Ö{n13} and n = 20, the domain of convergence is [^z] » 4.3. Consequently, we can graph and compare the expansion with a numerical evaluation out to Zn = 4, as below Figures 6 and 7.

square05.gif
Figure 3: Histogram of FZn for n = 5 based on 105 trials.

square-20b.gif
Figure 4: H(z,20) by asymptotic expansion versus Zn

square-20-mma.gif
Figure 5: Exact H(z,20) by numerical integration versus Zn

square-20-4.gif
Figure 6: H(z,20) constructed by asymptotic expansion versus Zn

square-20-4-mma.gif
Figure 7: Exact H(z,20) by numerical integration (of FT-1) versus Zn

6.2  Single photon distribution

The probability of emitting a single photon of energy u = t×uc is given by Fu(u) = Ft(t)/uc where
Ft(t) = ó
õ
¥

t 
K5/3(u)du/I0 and I0 = G(11/6)G(1/6)  .
(66)
Here K5/3 is the modified Bessel function of the second kind and G is the gamma function. Ft is sketched below, Figure 8.

bizare.gif
Figure 8: single photon emission spectrum Ft(t).

6.2.1  moments & semi-invariants

Using result (6.561.16) from Reference[6], the moments of Ft are given as follows:
I0
tk
 
= ó
õ
¥

0 
tk[ ó
õ
¥

t 
K5/3(z)dz]dt = ó
õ
¥

0 
tk+1
(k+1)
K5/3(t) dt º Ik
(67)
Ik º 2k
(k+1)
G æ
ç
è
1+ k+5/3
2
ö
÷
ø
G æ
ç
è
1+ k-5/3
2
ö
÷
ø
  .
(68)
To employ the CLT, the moments must be expressed with respect to the mean value of t. Hence we write t = [`t]+Dt, and the moments become:

(Dt)k
 
= ó
õ
¥

0 
(t- _
t
 
)k Ft(t)dt  .
(69)
For example, [`((Dt)2)] = I2/I0-(I1/I0)2 » 0.312593 . We define s2 º [`((Dt)2)] as the variance. Now, the moments must be normalized to give length scale invariance. We let x = Dt/s and PX(t) = Px(x)/s. The first few moments of Px are as follows:

x0
 
= 1,
x1
 
= 0,
x2
 
= 1,
x3
 
» 3.66205,
x4
 
» 23.0978,
x5
 
» 183.345,
x6
 
» 1786.43
(70)
Given that Pu(u) is a monotonically decreasing function, it may come as a surprise that the skewness [`(x3)] is positive. However, one must realize that whereas the range below [`u] extends only to zero, the range above [`u] extends to infinity. Hence the skewness will be dominated by the cubes of large positive numbers.

Now we expand the characteristic function as a Taylor series: FT{Px} = A(s). We find:

a2 = -1/2, a3 » -i×0.610341, a4 » 0.962408, a5 » i×1.52787, a6 » -2.48115, etc..
We represent A(s) º eB(s) and find that b0 = b1 = 0,
b2 = -1/2, b3 » -i×0.610341, b4 » +0.837408, b5 » +i×1.2227, b6 » -1.85535, etc..

6.2.2  distribution of cumulative sum

The asymptotic expansion for H(z,n) truncated to order 1/Ö{n13} converges for [^z] = 3 converges for n ³ 270. The expansion H(z,n) is given in Figure 9.

photon-270b.gif
Figure 9: H(z,270) versus Zn

photon-6000b.gif
Figure 10: H(z,6000) versus Zn

The Figure 11 below shows shows the result of inverting DZn for n = 270 and 105 random trials; the departure from a gaussian distribution is obvious.

photon-270.gif
Figure 11: Histogram of FZn for n=20 based on 105 trials.

In section 7.2.2, an approximation to H based on retaining the skewness is given which is claimed to be valid for n ³ 6000. We invite the reader to compare Figures 10 and 20; the agreement between them is excellent.

7  How large `n' for the CLT to apply

Let us examine again the question ``How large does n have to be for the CLT to apply?'' and answer in the Fourier transform space. We recall the definition (28). n must be chosen large enough that over the dominant range of the gaussian exp(-s2/2), the higher order terms exp[nR(s/Ön)] are negligible. The dominant range of the gaussian is ± three standard deviations, or s = ±3. Hence we find the conditions
n|bk| æ
ç
è
3
Ön
ö
÷
ø
k

 
<< 1   , k = 3,4,5,...
(71)
Provided that the series bk is not more divergent than a geometric series,1 then the conditions are distributed roughly like a Poisson distribution and by choosing n large enough the first non-zero term bk > 2 can be made to dominate over all other terms. Further for sufficiently large n this dominant term can be made to approach zero. If for the dominant k, the left side of inequality (71) is made less than 1/100 then enR(s/Ön will differ by less than 1% over the range s = ±3. In this case the true value of FZn, over the range Zn = [-3,+3], will only differ by up to 1% from its gaussian approximation. Notice that here we refer to the maximum modulus of the relative fractional error.

Because the range of the gaussian exp(-s2/2) is intimately related to its transform partner exp(-z2/2) so it follows that, for n large enough to satisfy the above conditions, the true FZn will be well approximated (i.e. to 1% or better) by a gaussian over the Zn-range ± three standard deviations, or the Tn-range of ±3Öns. If an approximation of FZn is required that is accurate over a smaller range of Zn, then one can relax the 3 standard deviations requirement appearing in equations (71); and the ``sufficiently large value of n" will decrease.

Let us denote by Nd the n for which a single term dominates nR(s/Ön) for |s| < 3. Further, let us call the value of n for which this term becomes negligible N¥. Once a desired accuracy for FXn and the range over which that accuracy must be maintained have been chosen, then the sufficiently large n for the CLT to be applicable, N¥, can be computed solely from a knowledge of the semi-invariants bk. The above arguments are not a proof. However, condition (71) is identical with that obtained in the Zn-space and, further, the two detailed numerical examples we shall present confirm all of these expectations. These examples are based on the notion of analytically performing the inverse Fourier transform of the remainder enR, but truncating R(s) at either b3s3 or b4s4. If we retain b3s3 then we reliably model the evolution of skewness; and if we keep b4s4, then we accurately model the curtosis. In fact, because of the exponentiation, the terms b3 and b4 ``feed up" and so also contribute to improving the estimates of the 5th, 6th and 7th moments, etc, even if we drop the terms b5,b6,b7 from r(s). For the range of values Nd < n < N¥, these numerical examples may be used as a ``bench mark'' for the asymptotic expansions derived earlier. In the following sections we apply these ideas to our two ``working examples''.

7.1  Uniform distribution

Suppose Ft(t) is distributed according to the uniform distribution, Equation (65), with the semi-invariants b2 = -1/2, b4 = -1/20, b6 = -1/105, etc..

7.1.1  Nd and N¥

We consider the first two terms of n×R(s/Ön) to ascertain for what value of n do they become negligible when evaluated at s = ±3. The term in s6 is
-nb6 æ
ç
è
3
Ön
ö
÷
ø
6

 
= 243
35 n2
(72)
and this falls below 1% for n ³ 26. The term in s4 is
-nb4 æ
ç
è
3
Ön
ö
÷
ø
4

 
= 81
20 n
(73)
and this falls below 1% for n ³ 405.

We may draw the following conclusions, valid for |s| < 3.

7.1.2  Retain curtosis in characteristic function

If we retain b4s4 in R, then we reliably model the evolution of the curtosis. Hence, for the range of values between Nd and N¥, that is 26 < n < 405 we may use the following approximation:
FZn(z) » 1
  æ
Ö

2p
 
e-z2/2*G(zÖn,n)Ön where G(z,n) = FT-1{ enb4 s4 }
(74)
G(z,n)
=
1
2p
ó
õ
+¥

-¥ 
dse-iszenb4s4 = 1
p
ó
õ
¥

0 
ds cos(sz)enb4 s4
=
1
p    _____
4Ön|b4|
 
ó
õ
¥

0 
du cos(ux)exp(-u4) ê
ê

x = z/[4Ö(n|b4|)] 
(75)
Here we have used the fact that for this particular case b4 is negative. The computer program ``Mathematica" can perform the last integration in terms of hypergeometric functions. The function G is sketched below, figure 12, for the case n = 1.

square-g.gif
Figure 12: G(z,1)

From the convolution integral (74) we may write

FZn » 1
  æ
Ö

2p
 
e-z2/2H(z,n) where H = Ön ó
õ
+¥

-¥ 
e-u2/2+uzG(uÖn,n)du  ,
(76)
and G is given by Equation (75). Figure 13 shows that H(z,1) tries to boost the gaussian for values of the argument around 2, and then H drops precipitously in an attempt to cut off the long gaussian tails. However, because b4s4 is not an adequate approximation to R(s), H(z,1) does not have sufficient dynamic range to restore FX1 to the true uniform or ``top-hat" distribution.

square-1.gif
Figure 13: H(z,1) versus z.

Figure 14 shows H(z,20) plotted versus Zn. For n = 20, the b4s4 terms starts to become a good approximation to R(s). However, as can be seen in figure 15 using the gaussian term alone to represent FZn would yield an error of up to 7% at Zn = 3.

square-20a.gif
Figure 14: H(z,20) versus Zn.

square-20b-mma.gif
Figure 15: H(z,20) versus Zn.

Figure 16 shows H(z,400) plotted versus Zn. We predicted above that H would differ from unity by only 1%. The numerical values appearing in the plot confirm accuracy to 0.4% out to Zn = 3. Hence our prediction was slightly conservative. When n = 400, figure 17 indicates that if the gaussian approximation is used in place of the true FZn, then the error will be less than 2% out to Zn = 4.

square-400a.gif
Figure 16: H(z,400) versus Zn.

square-400b.gif
Figure 17: H(z,400) versus Zn.

7.2  Single photon distribution

Suppose Ft is distributed according to the single photon emission spectrum, Equation (66), with the semi-invariants
b2 = -1/2, b3 » -i×0.610341, b4 » +0.837408, b5 » +i×1.2227, b6 » -1.85535, etc..

7.2.1  Nd and N¥

We consider the first two terms of the remainder function n×R(s/Ön) to ascertain for what value of n do they become negligible when evaluated at s = ±3. The term in s4 is
nb4 æ
ç
è
3
Ön
ö
÷
ø
4

 
= 67.8301
n
  .
(77)
In the range of frequencies |s| £ 3, this term falls below 1% for n ³ 6783. The term in s3 is
nb3 æ
ç
è
3
Ön
ö
÷
ø
3

 
= 16.4792
Ön
(-i)
(78)
and this falls below 1% for n ³ 2.7156×106.

We may draw the following conclusions, valid for |s| < 3.

7.2.2  retain the skewness in characteristic function

As previously, let Zn = åi = 1nzi be the cumulative sum and FZn its distribution. If we retain b3s3 in R then we accurately model the evolution of skewness. Hence, for the range of values between Nd and N¥, that is 6783 < n < 2.716×106 we may use the following approximation:
FZn(z) » 1
  æ
Ö

2p
 
e-z2/2*G(zÖn,n)Ön where G(z,n) = FT-1{ enb3 s3 }  .
(79)
G(z,n)
=
1
2p
ó
õ
+¥

-¥ 
dse-isze-in|b3|s3 = 1
p
ó
õ
¥

0 
ds cos(sz +n|b3| s3)
=
1
p    _____
3Ön|b3|
 
ó
õ
¥

0 
du cos(ux+u3) ê
ê

x = z/[3Ö(n|b3|)] 
(80)
The computer program ``Mathematica" can perform the last integration in terms of modified Bessel functions of the first kind. The function G is sketched below for the case n = 1.

photon-true.gif
Figure 18: G(z,1)

As before, from we may write the convolution integral in the form (76) but with G given by Equation (80). Again the task is to show that H(z,n) is approximately constant over the range Zn = [-3,+3]. Figure 19 shows that H(z,1) attempts to restore the shape of the true single-photon spectrum. However, because for n < 6000 the term b3s3/Ön is a poor approximation to nR(s/Ön), so it follows that e-z2/2H(z,1)/Ö{2p} is a poor approximation to the true PZn(z).

photon-1.gif
Figure 19: H(z,1)

Figure 20 shows H for large enough n that b3s3/Ön has become the dominant term and is a sufficient approximation to n×R(s/Ön) for frequencies |s| £ 3. However, n is not yet large enough that FZn can be well approximated by the gaussian term alone. H(z,6000) differs from unity by about 9% at the extremities of the range Zn = [-3,+3].

For very large values of n, ``Mathematica" has problems with numerical accuracy when performing the numerical integrations that give H(z,n). These difficulties arise not because the integrand is oscillatory, but rather because one term of the integrand is essentially 0/0 and taking the limit numerically is prone to errors that can only be avoided by truncating the integral. The problems are anticipated to be much more acute for negative values of z. Figure 21 shows H for n = 105. H(z) is given reliably for z > 0 and indicates that H deviates from unity by about 5% at Zn = 3.

photon-6000.gif
Figure 20: H(z,6000) versus Zn.

photon-100k.gif
Figure 21: Numerical approximation to H(z,105) versus Zn.

Figure 22 shows H for n = 106 large enough that the CLT starts to apply with some confidence. Unfortunately, n is also large enough that the numerical integrals which approximate to H are wildly inaccurate for z < 0. However, the numerical approximation to H is still acceptable for z > 0. Figure 23 shows that H differs from unity by less than 1% at Zn = 3. The reader will recall that we predicted above (section 7.2.1) that this level of accuracy would not occur until n = 2.7×106, and so our prediction was somewhat conservative.

photon-1ma.gif
Figure 22: Numerical approximation to H(z,106).

photon-1mb.gif
Figure 23: H(z,106) versus Zn mostly positive.

8  Conclusion

Inspired by the problem of ``fat photons'', finding the equivalent cumulative energy loss spectrum for electrons emitting synchrotron radiation, we have considered the Central Limit Theorem and its applicability when the number of random variables does not approach infinity. We have repeated the discoveries of Tchebychev-Hermite and constructed an asymptotic expansion that allows approximate computation of multiple self-convolution integrals merely from a knowledge of the moments of a density function. Using the program ``Mathematica'', the expansion has been carried through to a very high order and then coded into a Fortran subroutine for fast evaluation. A subroutine to construct the corresponding distribution function has also been coded in Fortran, and may be called to generate random events with the prescribed probability density. The subroutines take as arguments the moments of the users arbitrary function.

References

[1]
A.I. Khinchin: Mathematical Foundations of Statistical Mechanics, Dover Publications, New York (1949).
[2]
B.V. Gnedenko & A.N. Kolmorogrov: Limit Distributions for Sums of Independent Random Variables, Addison-Wesley Publishing Co.
[3]
Cramer: Mathematical Methods of Statistics, Princeton Univ. Press (1946).
[4]
M. Loeve: Probability Theory, Vol.I, Springer-Verlag (1977).
[5]
J.M. Jowett: Introductory Statistical Mechanics for Electron Storage Rings, SLAC-PUB-4033 (1986).
[6]
I. Gradshteyn & I. Rhyzik: Integrals, Series and Products, Academic Press (1980).


Footnotes:

1 This must be so, else the transform f(s) does not exist.


File translated from TEX by TTH, version 1.95.
On 28 Jan 2000, 09:31.