28 July 2025

Statistical Tools IV: Urns

"The early experts in probability theory were forever talking about drawing colored  balls out of 'urns' . This was not because people are really interested in jars or boxes full of a mixed-up lot of colored balls, but because those urns full of balls could often be designed so that they served as useful and illuminating models of important real situations. In fact, the urns and balls are not themselves supposed real. They are fictitious and idealized urns and balls, so that the probability of drawing out any one ball is just the same as for any other." (Warren Weaver, "Lady Luck: The Theory of Probability". 1963) 

"The urn model is to be the expression of three postulates: (1) the constancy of a probability distribution, ensured by the solidity of the vessel, (2) the random-character of the choice, ensured by the narrowness of the mouth, which is to prevent visibility of the contents and any consciously selective choice, (3) the independence of successive choices, whenever the drawn balls are put back into the urn. Of course in abstract probability and statistics the word 'choice' can be avoided and all can be done without any reference to such a model. But as soon as the abstract theory is to be applied, random choice plays an essential role." (Hans Freudenthal, "The Concept and the Role of the Model in Mathematics and Natural and Social Sciences", 1961)

"Specifically, it seems to me preferable to use, systematically: 'random' for that which is the object of the theory of probability […]; I will therefore say random process, not stochastic process. 'stochastic' for that which is valid 'in the sense of the calculus of probability': for instance; stochastic independence, stochastic convergence, stochastic integral; more generally, stochastic property, stochastic models, stochastic interpretation, stochastic laws; or also, stochastic matrix, stochastic distribution, etc. As for 'chance', it is perhaps better to reserve it for less technical use: in the familiar sense of'by chance', 'not for a known or imaginable reason', or (but in this case we should give notice of the fact) in the sense of, 'with equal probability' as in 'chance drawings from an urn', 'chance subdivision', and similar examples." (Bruno de Finetti, "Theory of Probability", 1974)

"Statisticians talk about populations. In probability books, the equivalent concept is an urn with numbered balls as a prototype for a population. In fact, when sampling from populations, it is customary to number the population and pretend the population is an urn from which we are drawing the sample." (Juana Sánchez, "Probability for Data Scientists", 2020)

"Many people mistakenly think that the defining property of a simple random sample is that every unit has an equal chance of being in the sample. However, this is not the case. A simple random sample of n units from a population of N means that every possible col‐lection of n of the N units has the same chance of being selected. A slight variant of this is the simple random sample with replacement, where the units/marbles are returned to the urn after each draw. This method also has the property that every sample of n units from a population of N is equally likely to be selected. The difference, though, is that there are more possible sets of n units because the same marble can appear more than once in the sample." (Sam Lau et al, "Learning Data Science: Data Wrangling, Exploration, Visualization, and Modeling with Python", 2023)

"Several key assumptions enter into this urn model, such as the assumption that the vaccine is ineffective. It’s important to keep track of the reliance on these assumptions because our simulation study gives us an approximation of the rarity of an outcome like the one observed only under these key assumptions." (Sam Lau et al, "Learning Data Science: Data Wrangling, Exploration, Visualization, and Modeling with Python", 2023)

"The urn model is a simple abstraction that can be helpful for understanding variation.This model sets up a container (an urn, which is like a vase or a bucket) full of identical marbles that have been labeled, and we use the simple action of drawing marbles from the urn to reason about sampling schemes, randomized controlled experiments, and measurement error. For each of these types of variation, the urn model helps us estimate the size of the variation using either probability or simulation." (Sam Lau et al, "Learning Data Science: Data Wrangling, Exploration, Visualization, and Modeling with Python", 2023)
