GEOPHYSICAL RESEARCH LETTERS, VOL. 27, NO. 20, PAGES 3333-3336, OCTOBER 15, 2000

O'NEILL ET AL.: REPORTING AEROSOL OPTICAL DEPTH STATISTICS

The lognormal distribution as a reference for reporting aerosol optical depth statistics; Empirical tests using multi-year, multi-site [...]

Users of AOD statistics require that the reported parameters adequately mimic sample histograms so that derived quantities meet the accuracy needs of model driven applications such as radiative forcing or aerosol dispersion. The more complex the parameterization the more freedom one has to achieve better histogram characterizations. However, the increased complexity necessarily renders all associated operations more difficult and ultimately wasteful when the level of parameterization is excessive relative to the information content of the measurements. It is thus of some importance to search for a degree of parameterization which is as simple as possible while achieving a level of distribution characterization which is commensurate with application requirements. In this note we empirically evaluate the applicability of arithmetic and geometric parameters in log τ and τ space and of the associated lognormal (ℒ) and normal (𝒩) probability distributions to a multi-year [...]

$$P(x) = \frac{1}{N}\,\frac{dN}{dx} \qquad (1)$$

where dN is the number of samples in the increment dx and N is the total number of samples. The use of the exact derivative in equation (1) implies that the series of N measurements has been repeated an arbitrarily large number of times. The ℒ distribution is simply a normal distribution with x = log τ;

$$P_{\mathcal{L}}(\log\tau) = \frac{1}{\sqrt{2\pi}\,\log\mu}\,\exp\left[-\frac{(\log\tau - \log\tau_g)^2}{2\log^2\mu}\right] \qquad (2a)$$

where τ is the aerosol optical depth, τ_g is the geometric mean and log μ is the geometric standard deviation (see Aitchison and Brown (1957) for example). Throughout the text "log" refers to log10 while "ln" represents log_e. The representation of the ℒ distribution in τ space is given by;

$$P_{\mathcal{L}}(\tau) = \frac{1}{\tau}\,\frac{1}{\ln 10}\,P_{\mathcal{L}}(\log\tau) \qquad (2b)$$

This functional representation is mathematically inconsistent (one does not just substitute the argument "τ" for "log τ" in P_ℒ(log τ)) but we have retained the formulation to keep the nomenclature as simple as possible. The 𝒩 distribution in τ space is;

$$P_{\mathcal{N}}(\tau) = \frac{1}{\sqrt{2\pi}\,\sigma}\,\exp\left[-\frac{(\tau - \langle\tau\rangle)^2}{2\sigma^2}\right] \qquad (3)$$

where ⟨τ⟩ and σ are the arithmetic mean and standard deviation in τ space.

Table 1 summarizes the nomenclature of these three analytical distributions while Figure 1 illustrates the form of the distributions and their basic statistical parameters. When a particular parameter is computed for a given analytical [...]

Figure 1; the left hand and right hand panels show the probability distributions and associated parameters in log τ and τ space respectively (c.f. Table 1). The arithmetic mean of the normal distribution is set equal to the arithmetic mean of the lognormal representation in linear τ space.

Table 1. Nomenclature for the analytical distributions. The analytical expressions under the P_ℒ(τ) column are derived from equation (2b). See Fig. 1 for illustrations of most of these parameters.

parameter            P_ℒ(τ)
arithmetic mean      $\langle\tau\rangle = \tau_g\,e^{\ln^2\mu/2}$
standard deviation   $\sigma(\tau) = \langle\tau\rangle\,\sqrt{e^{\ln^2\mu} - 1}$
mode                 $\tau_g\,e^{-\ln^2\mu}$

Below we compare and evaluate the quality of ℒ and 𝒩 fits to data histograms. These fits will not be in the sense of minimum residuals but rather in the more pragmatic sense of allowing the analytical frequency distribution (N × P_ℒ(log τ) or N × P_𝒩(τ)) to assume the same mean, standard deviation and number of measurements as the data histogram.

Some tests for the quality of normal or lognormal representations of data histograms

Two common higher order parameters for the characterization of data histograms are the skewness and kurtosis (γ1 and γ2, as defined in Abramowitz and Stegun [1972]). Skewness is an indicator of the distribution asymmetry and is negative for a distribution displaying a left hand tail, positive for a right hand tailed distribution and zero for a normal distribution. Kurtosis is an indicator of the peakedness of a distribution and is positive for a very peaked distribution, negative for a flat distribution and zero for a normal distribution. Skewness and kurtosis are measures of the general form of the data histogram and can be used as higher order indicators of how closely the form resembles a normal distribution.

A test which permits a more intuitive understanding of the quality of an 𝒩 or ℒ fit to a data histogram is to ascertain whether the fitted curve correctly predicts the AOD position of histogram features other than those used in constraining the fit. One such test is to estimate the histogram peak position (mode) in τ space which, as will be seen in the data histogram examples below, is not co-located with the mean. Another feature-position test is to ascertain whether the representation of the ℒ distribution in τ space can be used to predict the arithmetic mean (in the case of the 𝒩 distribution the test is irrelevant since the 𝒩 distribution mean is set equal to the arithmetic mean of the histogram). This arithmetic mean estimation test is interesting from the standpoint that a generally small mean error would ensure continuity with data sets of AOD arithmetic mean simply by employing the basic statistical parameters of the ℒ distribution to compute the arithmetic mean.

Table 2. Station and data ensemble parameters. Last two columns show associated aerosol classes for each station.

station               lat.     long.     ASL (m)  land cover     data years available  background aerosol class  aerosol influence(s) on top of background
Waskesiu, Sask., CAN  N53°55'  W106°04'  550      boreal forest  96 - 98               rural                     biomass burning
GSFC, MD, USA         N39°01'  W76°52'   50       suburban       96 - 98               rural                     industrial & urban emissions, maritime aerosols
Egbert, Ont., CAN     N44°13'  W79°45'   264      farmland       96, 98, 99            rural                     industrial & urban emissions
Thompson, Man., CAN   N55°47'  W97°50'   218      boreal forest  98                    rural                     biomass burning
Bondville, IL, USA    N40°03'  W88°22'   212      farmland       96 - 99               rural                     industrial
Lanai, HI, USA        N20°49'  W156°59'  80       island         97, 98                maritime                  volcanic emissions / Asian dust
Mongu, Zambia         S15°15'  E23°09'   1107     savanna        96 - 98               rural                     biomass burning
Bahrain               N26°19'  E50°30'   0        island         98                    maritime/dust             industrial

[Figure 2. Sample histogram panels, including Egbert, 1998 (N = 1206 measurements) and GSFC, 1998 (N = 1074 measurements). Horizontal axes: log10[τ(500 nm)] and τ(500 nm).]

Figure 3; (a) histogram skewness (skewness for a normal distribution, whether in log τ space or τ space, is zero). (b) histogram kurtosis (kurtosis for a normal distribution is zero). (c) error in the estimated histogram peak position in τ space as estimated using the lognormal fit distribution. (d) error in the arithmetic mean of the histogram as estimated using the lognormal fit distribution.

Logarithmic and linear representations of optical depth histograms

In this section we present a sampling of histograms using data acquired by CIMEL sunphotometers of the AERONET network over a variety of stations and a variable number of years. Detailed specifications of the AERONET instruments and data acquisition system are described elsewhere (Holben et al., 1998). Table 2 is a listing of the stations from which data were acquired and those years for which data were available.
The table includes information on the background regional aerosol as well as major aerosol influences which may dominate local sunphotometry at a given station. The choice of stations was largely influenced by a desire to represent the greatest possible variety of aerosol types. The study was limited to a standard wavelength of 500 nm. All the data chosen in our study were cloud screened according to the procedure defined in Smirnov et al. (2000). Three months of data were taken as the standard sampling period in order to achieve a frequency of measurements which was of sufficient density to permit a significant number of measurements per sampling bin. These three months corresponded to the summer period of June, July and August except in the case of Mongu where the August to October period was chosen in order to maximize the influence of the biomass burning season. Thirty sampling bins between extreme AOD values in log τ space and sixty sampling bins between extreme AOD values in τ space were used in the generation of data histograms. This number of bins seemed to give reasonably smooth histograms and distribution fits in both spaces; however it was ascertained that the generated statistical parameters were fairly insensitive to bin number and bin width.

Figure 2a shows some selected sample histograms along with the lognormal (ℒ) and normal (𝒩) fits in log τ and τ space. The two-panel figures were designed so that Figure 1 could serve as a reference template to indicate the salient features of the ℒ and 𝒩 analytical distributions respectively. These figures also provide an indication of the variation in the geometric mean τ_g and geometric standard deviation μ. The analytical fits qualitatively demonstrate the superiority of the ℒ fit over the 𝒩 fit both in log τ space, where the histogram is generally more symmetric and normal in appearance, and in linear τ space, where the asymmetric form of the ℒ representation is clearly better matched to the positively skewed form of the data histograms.
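The histogram construction and the moment-matched fits described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' processing code. It assumes NumPy, substitutes a synthetic lognormal sample for the AERONET measurements, and uses hypothetical function names. Scaling the analytical curves by bin width, so that N × P can be compared with bin counts, is also an assumption. The arithmetic mean, standard deviation and mode of the lognormal fit are the standard lognormal relations in terms of τ_g and ln μ.

```python
import numpy as np

LN10 = np.log(10.0)

def geometric_params(tau):
    """Geometric mean tau_g and geometric standard deviation mu (log10 convention)."""
    x = np.log10(tau)
    return 10.0 ** x.mean(), 10.0 ** x.std()

def p_lognormal_log(x, tau_g, mu):
    """Normal density in x = log10(tau): mean log10(tau_g), std log10(mu)."""
    s = np.log10(mu)
    return np.exp(-0.5 * ((x - np.log10(tau_g)) / s) ** 2) / (np.sqrt(2.0 * np.pi) * s)

def p_lognormal_tau(tau, tau_g, mu):
    """Same distribution expressed in tau space: P(log tau) / (tau ln 10)."""
    return p_lognormal_log(np.log10(tau), tau_g, mu) / (tau * LN10)

def p_normal_tau(tau, mean, std):
    """Normal density in tau space with arithmetic mean and standard deviation."""
    return np.exp(-0.5 * ((tau - mean) / std) ** 2) / (np.sqrt(2.0 * np.pi) * std)

def lognormal_moments(tau_g, mu):
    """Arithmetic mean, standard deviation and mode implied by tau_g and mu."""
    s2 = np.log(mu) ** 2
    mean = tau_g * np.exp(s2 / 2.0)
    return mean, mean * np.sqrt(np.exp(s2) - 1.0), tau_g * np.exp(-s2)

def skewness_kurtosis(x):
    """Skewness and excess kurtosis (both zero for a normal distribution)."""
    z = (x - x.mean()) / x.std()
    return (z ** 3).mean(), (z ** 4).mean() - 3.0

def histogram(x, bins):
    """Counts, bin centers and bin widths between the extreme values of x."""
    counts, edges = np.histogram(x, bins=bins)
    return counts, 0.5 * (edges[:-1] + edges[1:]), np.diff(edges)

# Synthetic stand-in for one station-season of 500 nm AOD (NOT real data).
rng = np.random.default_rng(0)
tau = 10.0 ** rng.normal(np.log10(0.15), np.log10(2.0), size=1200)
n = tau.size
tau_g, mu = geometric_params(tau)

# Thirty bins in log tau space, sixty bins in tau space.
counts_log, x_log, w_log = histogram(np.log10(tau), 30)
counts_lin, x_lin, w_lin = histogram(tau, 60)

# Analytical frequency curves sharing the histogram's mean, std and N.
fit_l_log = n * p_lognormal_log(x_log, tau_g, mu) * w_log
fit_l_lin = n * p_lognormal_tau(x_lin, tau_g, mu) * w_lin
fit_n_lin = n * p_normal_tau(x_lin, tau.mean(), tau.std()) * w_lin

# The four test parameters: skewness, kurtosis, peak error, mean error.
mean_l, std_l, mode_l = lognormal_moments(tau_g, mu)
peak = x_lin[np.argmax(counts_lin)]
skew_log, kurt_log = skewness_kurtosis(np.log10(tau))
skew_lin, kurt_lin = skewness_kurtosis(tau)

print("skewness (log tau, tau):", skew_log, skew_lin)
print("kurtosis (log tau, tau):", kurt_log, kurt_lin)
print("peak error, lognormal fit:", mode_l - peak)
print("peak error, normal fit:", tau.mean() - peak)  # a normal fit peaks at its mean
print("arithmetic mean error, lognormal fit:", mean_l - tau.mean())
```

With real sunphotometer data the synthetic `tau` array would be replaced by the cloud-screened 500 nm AOD values for one station and sampling period; everything downstream of that line is unchanged.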
Figure 2b shows some sample histograms where the lognormal (ℒ) fits were still generally superior to the normal (𝒩) fits but where certain features in the log τ histogram distribution degraded the quality of the ℒ fit. These include the bi-modal features in the GSFC histogram of 1998 and the negative skewness in the Mongu histogram of 1997.

Figure 3 shows the four test parameters of skewness, kurtosis, peak location error and arithmetic mean estimation error for all stations and available years. Figures 3a and 3b demonstrate that the skewness and kurtosis calculated in log τ space are systematically more normal-like (closer to zero) than the equivalent calculations in τ space. The τ space histograms are positively skewed; although the log τ space representation shows some positive skewness it is significantly less than the former.

Figures 3c and 3d show the errors in histogram peak location and estimated arithmetic mean. For all stations and all years the rms average peak location error for the 𝒩 distribution fit was 0.13 while the rms average error for the ℒ representation was 0.03. The use of the ℒ representation to estimate the arithmetic mean and standard deviation of the histogram in τ space yielded rms errors of 0.01 and 0.04 respectively for all stations and all years (the arithmetic mean for the 𝒩 distribution fit is set equal to the histogram arithmetic mean as indicated above). Thus the ℒ distribution can be used to estimate histogram features in linear τ space to accuracies which are of the order of or a little greater than typical sunphotometry errors of 0.01 to 0.02.

Conclusion

Multi-year and multi-station AOD data were employed to demonstrate that the lognormal probability distribution was systematically a better reference for reporting AOD statistics than a normal probability distribution. Comparative tests in log τ and τ space showed that data histograms in the former nearly always corresponded to smaller values of skewness and kurtosis and accordingly that this was a better space for a normal representation of the histograms. The estimation of the AOD value corresponding to the histogram peak in τ space was significantly better if a lognormal fit was applied to the data histogram. The use of the lognormal distribution to predict the arithmetic mean and standard deviation of the histogram in τ space yielded reasonably accurate estimates and as such provides a means of ensuring the continuity of data archives based on arithmetic means. In certain cases the data histograms displayed apparent bi-modal features or negative skewness in log τ space which neither distribution could adequately fit but for which the lognormal distribution was still a better reference.

Acknowledgements

The authors thank NASA and the National Science Foundation (NRC) for their support. The thoughtful advice of Sasha Smirnov is gratefully acknowledged.

References

Abramowitz, M., Stegun, I. A., Handbook of Mathematical Functions, Dover Publications Inc., New York, 1972.
Aitchison, J., Brown, J. A. C., The lognormal distribution, Cambridge University Press, 176 pp., 1957.
Campbell, J. W., The lognormal distribution as a model for bio-optical variability in the sea, J. Geophys. Res., Vol. 100, No. C7, pp. 13237-13254, 1995.
Holben, B. N., T. F. Eck, I. Slutsker, D. Tanre, J. P. Buis, A. Setzer, E. Vermote, J. A. Reagan, Y. J. Kaufman, T. Nakajima, F. Lavenu, I. Jankowiak, and A. Smirnov, AERONET - A federated instrument network and data archive for aerosol characterization, Rem. Sens. Env., 66(1), 1-16, 1998.
Ignatov, A., Stowe, L., Physical Basis, Premises, and Self-consistency Checks of Aerosol Retrievals from TRMM/VIRS, submitted to the Jour. App. Met., 1999.
Ignatov, A., Yu. Ratner, personal communication, 1995.
King, M. D., Byrne, D. M., Reagan, J. A., Herman, B. M., Spectral Variation of Optical Depth at Tucson Arizona between August 1975 and December 1977, Jour. App. Met., Vol. 19, pp. 723-732, 1980.
Malm, W. C., Walther, E. G., Cudney, R. A., The Effects of Water Vapor, Ozone and Aerosol on Atmospheric Turbidity, Jour. App. Met., Vol. 16, pp. 268-274, 1977.
Smirnov, A., Holben, B. N., Eck, T. F., Dubovik, O., Slutsker, I., Cloud Screening and Quality Control Algorithms for the AERONET data base, accepted for publication in Remote Sens. Environ., 2000.

(Received March 13, 2000; revised August 3, 2000; accepted August 7, 2000.)