23 June 2026

James G Scott - Collected Quotes

"A model is a metaphor, a description of a system that helps us to reason more clearly. Like all metaphors, models are approximations, and will never account for every last detail. A useful mantra here is: all models are wrong, but some models are useful." (James G Scott, "Statistical Modeling: A Gentle Introduction", 2017)

"[...] always remember that the construction of an ANOVA table is inherently sequential. For example, first we add the clutter variable, which remains in the model at every subsequent step; then we add the distance variable, which remains in the model at every subsequent step; and so forth. Thus the actual question being answered at each stage of an analysis of variance is: how much variation in the response can this new variable predict, in the context of what has already been predicted by other variables in the model? This point - the importance of context in interpreting an ANOVA table - is subtle, but important." (James G Scott, "Statistical Modeling: A Gentle Introduction", 2017)

"An obvious question is: do bootstrapped confidence intervals satisfy the frequentist coverage property? If your sample is fairly representative of the population, then the answer is a qualified yes. That is, the bootstrapping procedure yields nominal X% intervals that cover the true value 'approximately' X% of the time. Moreover, as the size of the original sample gets bigger, the quality of the approximation gets better. Alas, it is necessary to appeal to some very advanced probability theory to put both of these claims on firm footing." (James G Scott, "Statistical Modeling: A Gentle Introduction", 2017)

"At the core of the resampling approach to statistical inference lies a simple idea. Most of the time, we can’t feasibly take repeated samples of size n from the population, to see how our estimate changes from one sample to the next. But we can repeatedly take samples of size n from the sample itself, and apply our estimator afresh to each notional sample. The idea is that the variability of the estimates across all these samples can be used to approximate our estimator’s true sampling distribution. This process - pretending that our sample is the whole population, and taking repeated samples of size n with replacement from our original sample of size n - is called bootstrap resampling, or just bootstrapping" (James G Scott, "Statistical Modeling: A Gentle Introduction", 2017)

"By themselves, sums of squares are hard to interpret, because they are measured in squared units of the Y variable. But their ratios are highly meaningful. In fact, the ratio of PV to TV - or what fraction of the total variation has been predicted by the model - is one of the most frequently quoted summary measures in all of statistical modeling. This ratio is called the coefficient of determination, and is usually denoted by the symbol R2 [...] The correct interpretation of R2 sometimes trips people up, and is therefore worth repeating: it is the proportion of variance in the data that can be predicted using the statistical model in question." (James G Scott, "Statistical Modeling: A Gentle Introduction", 2017)

"Good estimators are those that usually yield estimates close to the truth, with minimal variation. Therefore, we typically summarize a sampling distribution using its standard deviation, which we refer to as the standard error. In quoting the standard error of an estimator’s sampling distribution, you are saying: 'If I were to take repeated samples from the population and use this estimatorfor every sample, my estimate is typically off from the truth by about this much.' Notice again that this is a claim about a procedure, not a particular estimate. The bigger the standard error, the less stable the estimator across different samples, and the less you can trust the estimate for any particular sample." (James G Scott, "Statistical Modeling: A Gentle Introduction", 2017)

"In fitting statistical models, we typically equate the trustworthiness of a procedure with its stability under the influence of luck, and we seek to measure the degree to which that procedure might have given a different answer if the forces of randomness had made the world look a bit different. Specifically, the question we seek to answer is: 'if our data set had been different merely due to chance, would our answer have been different, too?'" (James G Scott, "Statistical Modeling: A Gentle Introduction", 2017)

"Model-building requires much more than just technical knowledge of statistical ideas. It also requires care and judgment, and cannot be reduced to a flowchart, a table of formulas, or a tidy set of numerical summaries that wring every last drop of truth from a data set. There is almost never a single 'right' statistical model for some problem. But there are definitely such things as good models and bad models, and learning to tell the difference is important. Just remember: calling a model good or bad requires knowing both the tool and the task." (James G Scott, "Statistical Modeling: A Gentle Introduction", 2017)

"[...] complexity sometimes comes at the expense of explanatory power. We must avoid building models calibrated so perfectly to past experience that they do not generalize to future cases." (James G Scott, "Statistical Modeling: A Gentle Introduction", 2017)

"It is common to view a statistical model as nothing more than a recipe for calculating the fitted values, and to think that the residuals are just the errors made by this model. But we’ll have a richer picture if instead we view the residuals as part of the model. If you’ve ignored the variation in the residuals, then you really haven’t specified a complete forecast." (James G Scott, "Statistical Modeling: A Gentle Introduction", 2017)

"Resampling won’t yield the true sampling distribution of an estimator, but it is often good enough for approximating the standard error (which you’ll remember is just the standard deviation of the sampling distribution). We use the term bootstrapped standard error for the standard deviation of the bootstrapped sampling distribution. The bootstrapped standard error is an estimate of the true standard error." (James G Scott, "Statistical Modeling: A Gentle Introduction", 2017)

"Tables are almost always the best way to display categorical data sets with few classifying variables, for the simple reason that they convey a lot of information in a small space." (James G Scott, "Statistical Modeling: A Gentle Introduction", 2017)

"The residuals from a regression model are sometimes called 'errors'. This is especially true in experimental science, where measurements of some Y variable will be taken at different values of the X variable (called design points), and where noisy measurement instruments can introduce random errors into theobservations. But in many cases this interpretation of a residual as an error can be misleading. A regression model can still give a nonzero residual, even if there is no mistake in the measurement of the Y variable. It’s often far more illuminating to think of the residual as the part of the Y variable that it is left unpredicted by X." (James G Scott, "Statistical Modeling: A Gentle Introduction", 2017)
