02 June 2024

On the Least Squares Method

"From the foregoing we see that the two justifications each leave something to be desired. The first depends entirely on the hypothetical form of the probability of the error; as soon as that form is rejected, the values of the unknowns produced by the method of least squares are no more the most probable values than is the arithmetic mean in the simplest case mentioned above. The second justification leaves us entirely in the dark about what to do when the number of observations is not large. In this case the method of least squares no longer has the status of a law ordained by the probability calculus but has only the simplicity of the operations it entails to recommend it." (Carl Friedrich Gauss, "Anzeige: Theoria combinationis observationum erroribus minimis obnoxiae: Pars prior", Göttingische gelehrte Anzeigen, 1821)

"[…] in the Law of Errors we are concerned only with the objective quantities about which mathematical reasoning is ordinarily exercised; whereas in the Method of Least Squares, as in the moral sciences, we are concerned with a psychical quantity - the greatest possible quantity of advantage." (Francis Y Edgeworth, "The method of least squares", 1883) 

"The method of least squares is used in the analysis of data from planned experiments and also in the analysis of data from unplanned happenings. The word 'regression' is most often used to describe analysis of unplanned data. It is the tacit assumption that the requirements for the validity of least squares analysis are satisfied for unplanned data that produces a great deal of trouble." (George E P Box, "Use and Abuse of Regression", 1966)

"At the heart of probabilistic statistical analysis is the assumption that a set of data arises as a sample from a distribution in some class of probability distributions. The reasons for making distributional assumptions about data are several. First, if we can describe a set of data as a sample from a certain theoretical distribution, say a normal distribution (also called a Gaussian distribution), then we can achieve a valuable compactness of description for the data. For example, in the normal case, the data can be succinctly described by giving the mean and standard deviation and stating that the empirical (sample) distribution of the data is well approximated by the normal distribution. A second reason for distributional assumptions is that they can lead to useful statistical procedures. For example, the assumption that data are generated by normal probability distributions leads to the analysis of variance and least squares. Similarly, much of the theory and technology of reliability assumes samples from the exponential, Weibull, or gamma distribution. A third reason is that the assumptions allow us to characterize the sampling distribution of statistics computed during the analysis and thereby make inferences and probabilistic statements about unknown aspects of the underlying distribution. For example, assuming the data are a sample from a normal distribution allows us to use the t-distribution to form confidence intervals for the mean of the theoretical distribution. A fourth reason for distributional assumptions is that understanding the distribution of a set of data can sometimes shed light on the physical mechanisms involved in generating the data." (John M Chambers et al, "Graphical Methods for Data Analysis", 1983)

"Least squares' means just what it says: you minimise the (suitably weighted) squared difference between a set of measurements and their predicted values. This is done by varying the parameters you want to estimate: the predicted values are adjusted so as to be close to the measurements; squaring the differences means that greater importance is placed on removing the large deviations." (Roger J Barlow, "Statistics: A guide to the use of statistical methods in the physical sciences", 1989)

"Principal components and principal factor analysis lack a well-developed theoretical framework like that of least squares regression. They consequently provide no systematic way to test hypotheses about the number of factors to retain, the size of factor loadings, or the correlations between factors, for example. Such tests are possible using a different approach, based on maximum-likelihood estimation." (Lawrence C Hamilton, "Regression with Graphics: A second course in applied statistics", 1991)

"Fuzzy models should make good predictions even when they are asked to predict on regions that were not excited during the construction of the model. The generalization capabilities can be controlled by an appropriate initialization of the consequences (prior knowledge) and the use of the recursive least squares to improve the prior choices. The prior knowledge can be obtained from the data." (Jairo Espinosa et al, "Fuzzy Logic, Identification and Predictive Control", 2005)

"Often when people relate essentially the same variable in two different groups, or at two different times, they see this same phenomenon - the tendency of the response variable to be closer to the mean than the predicted value. Unfortunately, people try to interpret this by thinking that the performance of those far from the mean is deteriorating, but it’s just a mathematical fact about the correlation. So, today we try to be less judgmental about this phenomenon and we call it regression to the mean. We managed to get rid of the term 'mediocrity', but the name regression stuck as a name for the whole least squares fitting procedure - and that’s where we get the term regression line." (Richard D De Veaux et al, "Stats: Data and Models", 2016)
