Contents
Brief Contents
- Chapter 1 Basics of Climate Data Arrays, Statistics, and Visualization
- Chapter 2 Elementary Probability and Statistics
- Chapter 3 Estimation and Decision Making
- Chapter 4 Regression Models and Methods
- Chapter 5 Matrices for Climate Data
- Chapter 6 Covariance Matrices, EOFs, and PCs
- Chapter 7 Introduction to Time Series
- Chapter 8 Spectral Analysis of Time Series
- Chapter 9 Introduction to Machine Learning
Full Table of Contents
- Preface ix
- Acknowledgements xix
- Chapter 1: Basics of Climate Data Arrays, Statistics, and Visualization 1
- 1.1 Global temperature anomalies from 1880 to 2018 1
- 1.1.1 The NOAAGlobalTemp dataset 2
- 1.1.2 Visualize the data of global average annual mean temperature 3
- 1.1.3 Statistical indices 8
- 1.2 Commonly used climate statistical plots 11
- 1.2.1 Histogram of a set of data 11
- 1.2.2 Box plot 13
- 1.2.3 Q-Q plot 14
- 1.2.4 Plot a linear trend line 17
- 1.3 Read netCDF data file and plot spatial data maps 18
- 1.3.1 Read netCDF data 18
- 1.3.2 Plot a spatial map of temperature 20
- 1.3.3 Panoply plot of a spatial map of temperature 21
- 1.4 1D-space-1D-time data and Hovmöller diagram 22
- 1.5 4D netCDF file and its map plotting 24
- 1.6 Paraview and 4DVD 27
- 1.6.1 Paraview 27
- 1.6.2 4DVD 28
- 1.6.3 Other online climate data visualization tools 28
- 1.7 Chapter summary 30
- References and Further Readings 31
- Chapter 2: Elementary Probability and Statistics 36
- 2.1 Random variables 36
- 2.1.1 Definition 36
- 2.1.2 Probabilities of a random variable 40
- 2.1.3 Conditional probability and Bayes’ theorem 40
- 2.1.4 Probability of a dry spell 41
- 2.1.5 Generate random numbers 42
- 2.2 PDF and CDF 43
- 2.2.1 The dry spell example 43
- 2.2.2 Binomial distribution 46
- 2.2.3 Normal distribution 49
- 2.2.4 PDF and histogram 52
- 2.3 Expected values, variances and higher moments of an RV 55
- 2.3.1 Definitions 55
- 2.3.2 Properties of expected values 56
- 2.4 Joint distributions of X and Y 57
- 2.4.1 Joint distributions and marginal distributions 57
- 2.4.2 Covariance and correlation 60
- 2.5 Additional commonly used probabilistic distributions in climate science 60
- 2.5.1 Poisson distribution 60
- 2.5.2 Exponential distribution 62
- 2.5.3 Mathematical expression of normal distributions, mean, and the central limit theorem 64
- 2.5.4 Chi-square χ² distribution 66
- 2.5.5 Lognormal distribution 70
- 2.5.6 Gamma distribution 71
- 2.5.7 Student’s t-distribution 73
- 2.6 Chapter Summary 73
- References and Further Readings 75
- Chapter 3: Estimation and Decision Making 79
- 3.1 From data to estimate 79
- 3.1.1 Sample mean and its standard error 79
- 3.1.2 Confidence interval for the true mean 84
- 3.2 Decision making by statistical inference 92
- 3.2.1 Contingency table for the decision-making with uncertainty 93
- 3.2.2 Steps of hypothesis testing 94
- 3.2.3 Interpretations of the significance level, power, and p-value 98
- 3.2.4 Hypothesis testing for the 1997-2016 mean of the Edmonton January temperature anomalies 99
- 3.3 Effective sample size 101
- 3.4 Test the goodness of fit 111
- 3.4.1 The number of clear, partly cloudy, and cloudy days 112
- 3.4.2 Fit the monthly rainfall data to a Gamma distribution 113
- 3.5 Kolmogorov-Smirnov test for cumulative distributions 116
- 3.6 Decide the existence of a significant relationship 121
- 3.6.1 Correlation and t-test 122
- 3.6.2 Kendall tau test for the existence of a relationship 124
- 3.6.3 Mann-Kendall test for trend 126
- 3.7 Chapter summary 127
- References and Further Readings 128
- Chapter 4: Regression Models and Methods 131
- 4.1 Simple linear regression 131
- 4.1.1 Temperature lapse rate and an approximately linear model 131
- 4.1.2 Assumptions and formula derivations of the single variate linear regression 135
- 4.1.3 Statistics of slope and intercept: Distributions, confidence intervals, and inference 146
- 4.2 Multiple linear regression 159
- 4.2.1 Calculating the Colorado TLR when taking location coordinates into account 159
- 4.2.2 Formulas for estimating parameters in the multiple linear regression 162
- 4.3 Nonlinear fittings using the multiple linear regression 164
- 4.3.1 Diagnostics of linear regression: An example of global temperature 164
- 4.3.2 Fit a third order polynomial 169
- 4.4 Chapter Summary 172
- References and Further Readings 173
- Chapter 5: Matrices for Climate Data 177
- 5.1 Matrix definitions 177
- 5.2 Fundamental properties and basic operations of matrices 180
- 5.3 Some basic concepts and theories of linear algebra 185
- 5.3.1 Linear equations 186
- 5.3.2 Linear transformations 187
- 5.3.3 Linear independence 187
- 5.3.4 Determinants 188
- 5.3.5 Rank of a matrix 189
- 5.4 Eigenvectors and eigenvalues 192
- 5.4.1 Definition of eigenvectors and eigenvalues 192
- 5.4.2 Properties of eigenvectors and eigenvalues for a symmetric matrix 195
- 5.5 Singular Value Decomposition 198
- 5.5.1 SVD formula and a simple SVD example 199
- 5.6 SVD for the standardized sea level pressure data of Tahiti and Darwin 204
- 5.7 Chapter Summary 207
- References and Further Readings 208
- Chapter 6: Covariance Matrices, EOFs, and PCs 213
- 6.1 From a space-time data matrix to a covariance matrix 213
- 6.2 Definition of EOFs and PCs 221
- 6.2.1 Defining EOFs and PCs from the sample covariance matrix 221
- 6.2.2 Percentage variance explained 223
- 6.2.3 Temporal covariance matrix 225
- 6.3 Climate field and its EOFs 227
- 6.3.1 SVD for a climate field 227
- 6.3.2 Stochastic climate field and covariance function 228
- 6.4 Generating random fields 229
- 6.5 Sampling errors for EOFs 235
- 6.5.1 Sampling error of mean and variance of a random variable 235
- 6.5.2 Errors of the sample eigenvalues 236
- 6.5.3 North’s rule-of-thumb: Errors of the sample eigenvectors 237
- 6.5.4 EOF errors and mode mixing: A 1D example 238
- 6.5.5 EOF errors and mode mixing: A 2D example 247
- 6.5.6 The original paper of North’s Rule-of-Thumb 251
- 6.5.7 When there is serial correlation 252
- 6.6 Chapter Summary 252
- References and Further Readings 254
- Chapter 7: Introduction to Time Series 259
- 7.1 Examples of time series data 259
- 7.1.1 The Keeling curve: Carbon dioxide data of Mauna Loa 260
- 7.1.2 ETS decomposition of the CO2 time series data 262
- 7.1.3 Forecasting the CO2 data time series 266
- 7.1.4 Ten years of daily minimum temperature data of St. Paul, Minnesota, US 268
- 7.2 White noise 271
- 7.3 Random walk 274
- 7.4 Stochastic processes and stationarity 280
- 7.4.1 Stochastic processes 280
- 7.4.2 Stationarity 280
- 7.4.3 Test for stationarity 281
- 7.5 Moving average time series 282
- 7.6 Autoregressive process 285
- 7.6.1 Brownian motion and autoregressive model AR(1) 285
- 7.6.2 Simulations of AR(1) time series 286
- 7.6.3 Auto-covariance of AR(1) time series when X0 = 0 288
- 7.7 Fit time series models to data 292
- 7.7.1 Estimate the MA(1) model parameters 293
- 7.7.2 Estimate the AR(1) model parameters 294
- 7.7.3 Difference a time series 294
- 7.7.4 AR(p) model estimation using Yule-Walker equations 295
- 7.7.5 ARIMA(p, d, q) model and data fitting by R 295
- 7.8 Chapter Summary 300
- References and Further Readings 302
- Chapter 8: Spectral Analysis of Time Series 306
- 8.1 The sine oscillation 306
- 8.2 Discrete Fourier series and periodograms 311
- 8.2.1 Discrete sine transform 312
- 8.2.2 Discrete Fourier transform 313
- 8.2.3 Energy identity 333
- 8.2.4 Periodogram of white noise 334
- 8.3 Fourier transform in (−∞, ∞) 335
- 8.4 Fourier series for a continuous time series on a finite time interval [−T/2, T/2] 336
- 8.5 Chapter Summary 340
- References and Further Readings 343
- Chapter 9: Introduction to Machine Learning 351
- 9.1 K-means clustering 351
- 9.1.1 K-means setup and trivial examples 352
- 9.1.2 A K-means algorithm 358
- 9.1.3 K-means clustering for the daily Miami weather data 360
- 9.2 Support vector machine 370
- 9.2.1 SVM for a system of three points labeled in two categories 375
- 9.2.2 SVM mathematical formulation for a system of many points in two categories 379
- 9.3 Random forest method for classification and regression 385
- 9.3.1 RF flower classification for a benchmark iris dataset 385
- 9.3.2 RF regression for the daily ozone data of New York City 392
- 9.3.3 What does a decision tree look like? 396
- 9.4 Neural network and deep learning 399
- 9.4.1 An NN model for an automized decision system 399
- 9.4.2 An NN prediction of iris species 406
- 9.5 Chapter Summary 410
- References and Further Readings 411
- Index 415