Data normality

Most of the mainstem and tributary databases appeared normal, or approximately so, based on comparison of means and medians and, more analytically, on assessment of skew and kurtosis.
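The checks named above can be sketched in a few lines of Python. This is a minimal illustration, not the study's actual procedure: the function name `shape_summary`, the simple moment-based formulas for skew and kurtosis, and the simulated temperature values are all assumptions made for the example.

```python
import random
import statistics


def shape_summary(values):
    """Return n, mean, median, skewness, and excess kurtosis of a list of numbers."""
    n = len(values)
    mean = statistics.fmean(values)
    median = statistics.median(values)
    # Central moments computed with a divisor of n (adequate for large n).
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    m4 = sum((x - mean) ** 4 for x in values) / n
    return {
        "n": n,
        "mean": mean,
        "median": median,
        "skew": m3 / m2 ** 1.5,                  # 0 for a symmetrical distribution
        "excess_kurtosis": m4 / m2 ** 2 - 3.0,   # 0 for a normal distribution
    }


# Hypothetical stand-in for one database: 1248 simulated temperatures (deg C).
rng = random.Random(0)
temps = [rng.gauss(15.0, 2.0) for _ in range(1248)]

summary = shape_summary(temps)
print(summary)
```

For approximately normal data, the mean and median should nearly coincide, and both skew and excess kurtosis should be close to zero.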

Normal refers to a probability distribution that is symmetrical, with most of the observations clustered around the central peak and the probabilities of values farther from the mean tapering evenly in both directions. (See normality plots.)

In addition, because all the databases were large, each with 1248 to 2976 data points, the central limit theorem applied: the sampling distribution of each database mean could be treated as approximately normal, whatever the exact shape of the underlying data [1, 2].

The usual threshold for a large sample is about 30 data points [1]. The databases in this study clearly exceeded that threshold.

That is, with large databases, the estimated means (those calculated from the measurement data) were more likely to be near the true population means, and so could appropriately be used to represent those populations.

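This is a direct result of the fact that as sample size increases, the standard error of the mean decreases. The standard deviation determines the width of the data distribution itself. The standard error, which is the standard deviation divided by the square root of the sample size, determines the width of the distribution of sample means.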
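As a worked illustration, the standard error of the mean, which measures how far a calculated mean is likely to fall from the population mean, can be written with s as the sample standard deviation and n as the number of data points. The value s = 2 used below is purely hypothetical and is not taken from the study; the values of n are the conventional threshold of 30 and the smallest and largest database sizes reported here.

```latex
\[
\mathrm{SE}_{\bar{x}} = \frac{s}{\sqrt{n}}
\]
\[
n = 30:\ \frac{2}{\sqrt{30}} \approx 0.37, \qquad
n = 1248:\ \frac{2}{\sqrt{1248}} \approx 0.057, \qquad
n = 2976:\ \frac{2}{\sqrt{2976}} \approx 0.037
\]
```

Under this assumption, the largest database would give a standard error about one tenth of that from a sample of 30.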

If this study were repeated at the same location under exactly the same conditions, and the same large number of measurements were collected, the means from those databases would be similar, and the mean of those means would approach the true, but unknowable, population mean.
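This thought experiment can be imitated with a small simulation. The sketch below is illustrative only: the population mean and standard deviation, the number of repeats, and the use of normally distributed draws are all hypothetical choices, and only the database size of 1248 comes from the text.

```python
import random
import statistics

# All values are hypothetical, chosen only to illustrate the idea.
TRUE_MEAN = 15.0   # population mean (unknowable in a real study)
TRUE_SD = 2.0      # population standard deviation
N_POINTS = 1248    # smallest database size reported in the text
N_REPEATS = 200    # number of imagined repeat studies

rng = random.Random(42)


def one_study_mean():
    """Simulate one repeat of the study and return its calculated mean."""
    data = [rng.gauss(TRUE_MEAN, TRUE_SD) for _ in range(N_POINTS)]
    return statistics.fmean(data)


study_means = [one_study_mean() for _ in range(N_REPEATS)]
mean_of_means = statistics.fmean(study_means)
spread_of_means = statistics.stdev(study_means)

print(f"mean of study means:   {mean_of_means:.3f}")
print(f"spread of study means: {spread_of_means:.3f}")
print(f"SD / sqrt(n):          {TRUE_SD / N_POINTS ** 0.5:.3f}")
```

In such a simulation, the individual study means cluster tightly around the population mean, and their spread is close to the population standard deviation divided by the square root of the database size.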

This study had only one data collection per location and set of conditions. Because each database was large, however, the standard error was small, so the calculated mean was likely near the unknowable population mean and could appropriately be used to represent it.

Conditions here means factors that can change, such as precipitation, air temperature, hydrologic patterns, and vegetation cover, along with those that cannot change within a study time frame, such as geology.

It was therefore judged appropriate to use the calculated means to represent the populations sampled, for example for comparing temperatures among monitoring sites and evaluating them against criteria (see WAT plots), and for estimating elevations of cold-enough water.

References

[1] Frost, Jim, 2020, Hypothesis Testing: An Intuitive Guide for Making Data Driven Decisions, Statistics by Jim Publishing, State College, PA.

[2] Sokal, R. R. and F. J. Rohlf, 1969, Biometry, W. H. Freeman and Company, San Francisco.