15 August 2025

Steve McKillup - Collected Quotes

"A correlation between two variables means they vary together. A positive correlation means that high values of one variable are associated with high values of the other, while a negative correlation means that high values of one variable are associated with low values of the other." (Steve McKillup, "Statistics Explained: An Introductory Guide for Life Scientists", 2005)

"Accuracy is the closeness of a measured value to the true value. Precision is the ‘spread’ or variability of repeated measures of the same value." (Steve McKillup, "Statistics Explained: An Introductory Guide for Life Scientists", 2005)

"Correlation is an exploratory technique used to examine whether the values of two variables are significantly related, meaning whether the values of both variables change together in a consistent way. (For example, an increase in one may be accompanied by a decrease in the other.) There is no expectation that the value of one variable can be predicted from the other, or that there is any causal relationship between them." (Steve McKillup, "Statistics Explained: An Introductory Guide for Life Scientists", 2005)

"Designing a well-controlled, appropriately replicated and realistic experiment has been described by some researchers as an ‘art’. It is not, but there are often several different ways to test the same hypothesis, and hence several different experiments that could be done. Consequently, it is difficult to set a guide to designing experiments beyond an awareness of the general principles." (Steve McKillup, "Statistics Explained: An Introductory Guide for Life Scientists", 2005)

"Even an apparently well-designed mensurative or manipulative experiment may still suffer from a lack of realism." (Steve McKillup, "Statistics Explained: An Introductory Guide for Life Scientists", 2005)

"First, if you already know that the population from which your sample has been taken is normally distributed (perhaps you have data for a variable that has been studied before), you can assume the distribution of sample means from this population will also be normally distributed. Second, the central limit theorem […] states that the distribution of the means of samples of about 25 or more taken from any population will be approximately normal, provided the population is not grossly non-normal (e.g. a population that is bimodal). Therefore, provided your sample size is sufficiently large you can usually do a parametric test. Finally, you can examine your sample. Although there are statistical tests for normality, many statisticians have cautioned that these tests often indicate the sample is significantly non normal even when a t-test will still give reliable results." (Steve McKillup, "Statistics Explained: An Introductory Guide for Life Scientists", 2005)

"Graphs may reveal patterns in data sets that are not obvious from looking at lists or calculating descriptive statistics. Graphs can also provide an easily understood visual summary of a set of results." (Steve McKillup, "Statistics Explained: An Introductory Guide for Life Scientists", 2005)

"Inaccurate and imprecise measurements or a poor or unrealistic sampling design can result in the generation of inappropriate hypotheses. Measurement errors or a poor experimental design can give a false or misleading outcome that may result in the incorrect retention or rejection of an hypothesis." (Steve McKillup, "Statistics Explained: An Introductory Guide for Life Scientists", 2005)

"It has often been said, ‘There is no such thing as a perfect experiment.’ One inherent problem is that, as a design gets better and better, the cost in time and equipment also increases, but the ability to actually do the experiment decreases. An absolutely perfect design may be impossible to carry out. Therefore, every researcher must choose a design that is ‘good enough’ but still practical. There are no rules for this – the decision on design is in the hands of the researcher, and will be eventually judged by their colleagues who examine any report from the work." (Steve McKillup, "Statistics Explained: An Introductory Guide for Life Scientists", 2005)

"It is important to realise that Type 1 error can only occur when the null hypothesis applies. There is absolutely no risk if the null hypothesis is false. Unfortunately, you are most unlikely to know if the null hypothesis applies or not - if you did know, you would not be doing an experiment to test it! If the null hypothesis applies, the risk of Type 1 error is the same as the probability level you have chosen" (Steve McKillup, "Statistics Explained: An Introductory Guide for Life Scientists", 2005)

"Linear correlation analysis assumes that the data are random representatives taken from the larger population of values for each variable, which are normally distributed and have been measured on a ratio, interval or ordinal scale. A scatter plot of these variables will have what is called a bivariate normal distribution. If the data are not normally distributed, or the relationship does not appear to be linear, they may be able to be analysed by nonparametric tests for correlation [...]" (Steve McKillup, "Statistics Explained: An Introductory Guide for Life Scientists", 2005)

"Many statistics texts do not mention this and students often ask, ‘What if you get a probability of exactly 0.05?’ Here the result would be considered not significant, since significance has been defined as a probability of less than 0.05 (<0.05). Some texts define a significant result as one where the probability is less than or equal to 0.05 ( 0.05). In practice this will make very little difference, but since Fisher proposed the ‘less than 0.05’ definition, which is also used by most scientific publications, it will be used here." (Steve McKillup, "Statistics Explained: An Introductory Guide for Life Scientists", 2005)

"No hypothesis or theory can ever be proven - one day there may be evidence that rejects it and leads to a different explanation (which can include all the successful predictions of the previous hypothesis).Consequently we can only falsify or disprove hypotheses and theories – we can never ever prove them." (Steve McKillup, "Statistics Explained: An Introductory Guide for Life Scientists", 2005)

"One of the nastiest pitfalls is appearing to have a replicated manipulative experimental design, which really is not replicated." (Steve McKillup, "Statistics Explained: An Introductory Guide for Life Scientists", 2005)

"One way of generating hypotheses is to collect data and look for patterns. Often, however, it is difficult to see any pattern from a set of data, which may just be a list of numbers. Graphs and descriptive statistics are very useful for summarising and displaying data in ways that may reveal patterns." (Steve McKillup, "Statistics Explained: An Introductory Guide for Life Scientists", 2005)

"Parametric tests are designed for analyzing data from a known distribution, and the majority assume a normally distributed population. Although parametric tests are quite robust to departures from normality, and major ones can often be reduced by transformation, there are some cases where the population is so grossly non-normal that parametric testing is unwise. In these cases a powerful analysis can often still be done by using a non-parametric test." (Steve McKillup, "Statistics Explained: An Introductory Guide for Life Scientists", 2005)

"Sample statistics like the mean, variance, standard deviation, and especially the standard error of the mean are estimates of population statistics that can be used to predict the range within which 95% of the means of a particular sample size will occur. Knowing this, you can use a parametric test to estimate the probability that a sample mean is the same as an expected value, or the probability that the means of two samples are from the same population." (Steve McKillup, "Statistics Explained: An Introductory Guide for Life Scientists", 2005)

"Statistical tests are just a way of working out the probability of obtaining the observed, or an even more extreme, difference among samples (or between an observed and expected value) if a specific hypothesis (usually the null of no difference) is true. Once the probability is known, the experimenter can make a decision about the difference, using criteria that are uniformly used and understood."  (Steve McKillup, "Statistics Explained: An Introductory Guide for Life Scientists", 2005)

"The essential features of the ‘hypothetico-deductive’ view of scientific method are that a person observes or samples the natural world and uses all the information available to make an intuitive, logical guess, called an hypothesis, about how the system functions. The person has no way of knowing if their hypothesis is correct - it may or may not apply. Predictions made from the hypothesis are tested, either by further sampling or by doing experiments. If the results are consistent with the predictions then the hypothesis is retained. If they are not, it is rejected, and a new hypothesis formulated." (Steve McKillup, "Statistics Explained: An Introductory Guide for Life Scientists", 2005)

"The unavoidable problem with using probability to help you make a decision is that there is always a chance of making a wrong decision and you have no way of telling when you have done this." (Steve McKillup, "Statistics Explained: An Introductory Guide for Life Scientists", 2005)

"The use of a t-test makes three assumptions. The first is that the data are normally distributed. The second is that each sample has been taken at random from its respective population and the third is that for an independent sample test, the variances are the same. It has, however, been shown that t-tests are actually very ‘robust’ – that is, they will still generate statistics that approximate the t distribution and give realistic probabilities even when the data show considerable departure from normality and when sample variances are dissimilar." (Steve McKillup, "Statistics Explained: An Introductory Guide for Life Scientists", 2005)

"Unfortunately, the only way to estimate the appropriate minimum sample size needed in an experiment is to know, or have good estimates of, the effect size and standard deviation of the population(s). Often the only way to estimate these is to do a pilot experiment with a sample. For most tests there are formulae that use these (sample) statistics to give the appropriate sized sample for a desired power." (Steve McKillup, "Statistics Explained: An Introductory Guide for Life Scientists", 2005)

"When expected frequencies are small, the calculated chi-square statistic is inaccurate and tends to be too large, therefore indicating a lower than appropriate probability, which increases the risk of Type 1 error. It used to be recommended that no expected frequency in a goodness of fit test should be less than five, but this has been relaxed somewhat in the light of more recent research, and it is now recommended that no more than 20% of expected frequencies should be less than five." (Steve McKillup, "Statistics Explained: An Introductory Guide for Life Scientists", 2005)

"Whenever you make a decision based on the probability of a result, there is a risk of either a Type 1 or a Type 2 error. There is only a risk of Type 1 error when the null hypothesis applies, and the risk is the chosen probability level. There is only a risk of Type 2 error when the null hypothesis is false." (Steve McKillup, "Statistics Explained: An Introductory Guide for Life Scientists", 2005)

"Without compromising the risk of Type 1 error, the only way a researcher can reduce the risk of Type 2 error to an acceptable level and therefore ensure sufficient power is to increase their sample size. Every researcher has to ask themselves the question, ‘What sample size do I need to ensure the risk of Type 2 error is low and therefore power is high?’ This is an important question because samples are usually costly to take, so there is no point in increasing sample size past the point where power reaches an acceptable level." (Steve McKillup, "Statistics Explained: An Introductory Guide for Life Scientists", 2005)
