Showing posts with label DIKW. Show all posts

30 January 2022

On Data (1940-1949)

"Scientific data are not taken for museum purposes; they are taken as a basis for doing something. If nothing is to be done with the data, then there is no use in collecting any. The ultimate purpose of taking data is to provide a basis for action or a recommendation for action. The step intermediate between the collection of data and the action is prediction." (William E Deming, "On a Classification of the Problems of Statistical Inference", Journal of the American Statistical Association Vol. 37 (218), 1942)

"The faith of scientists in the power of mathematics is so implicit that their work has gradually become less and less observation, and more and more calculation. The promiscuous collection and tabulation of data have given way to a process of assigning possible meanings, merely supposed real entities, to mathematical terms, working out the logical results, and then staging certain crucial experiments to check the hypothesis against the actual, empirical results. But the facts [...] accepted by virtue of these tests are not actually observed at all." (Susanne K Langer, "Philosophy in a New Key", 1942)

"Statistics is the branch of scientific method which deals with the data obtained by counting or measuring the properties of populations of natural phenomena. In this definition 'natural phenomena' includes all the happenings of the external world, whether human or not." (Sir Maurice G Kendall & Alan Stuart, "Advanced Theory of Statistics", Vol. 1, 1943)

"True, the universe is more than a collection of objective experimental data; more than the complexus of theories, abstractions, and special assumptions devised to hold the data together; more, indeed, than any construct modeled on this cold objectivity. For there is a deeper, more subjective world, a world of sensation and emotion, of aesthetic, moral, and religious values as yet beyond the grasp of objective science. And towering majestically over all, inscrutable and inescapable, is the awful mystery of Existence itself, to confound the mind with an eternal enigma." (Banesh Hoffmann, "The Strange Story of the Quantum", 1947)

"There is only one kind of whiskey, but two broad classes of data, good and bad." (William E Deming, "On the Classification of Statistics", The American Statistician Vol. 2 (2), 1948)

22 April 2021

On Data (-1849)

"With a true view all the data harmonize, but with a false one the facts soon clash." (Aristotle, "The Nicomachean Ethics", Book I, cca. 349 BC)

"[...] one day the precision of the data might be brought to such perfection that the mathematician in his study would be able to calculate any phenomenon of chemical combination in the same way [...] as he calculates the movement of the heavenly bodies." (Antoine-Laurent Lavoisier, "Mémoires de l’Académie Royale des Sciences", 1782 [Published 1785])

"[...] to reason without data is nothing but delusion." (James Hutton, "The Theory of the Earth" Vol. 1, 1788)

"[...] mathematicians obtain the solution of a problem by the mere arrangement of data, and by reducing their reasoning to such simple steps, to conclusions so very obvious, as never to lose sight of the evidence which guides them." (Antoine-Laurent Lavoisier, "Elements of Chemistry In a New Systematic Order", 1790)

"The modern age has a false sense of superiority because of the great mass of data at its disposal. But the valid criterion of distinction is rather the extent to which man knows how to form and master the material at his command." (Johann Wolfgang von Goethe, "Theory of Colours", 1810)

"We ought then to consider the present state of the universe as the effect of its previous state and as the cause of that which is to follow. An intelligence that, at a given instant, could comprehend all the forces by which nature is animated and the respective situation of the beings that make it up, if moreover it were vast enough to submit these data to analysis, would encompass in the same formula the movements of the greatest bodies of the universe and those of the lightest atoms. For such an intelligence nothing would be uncertain, and the future, like the past, would be open to its eyes." (Pierre-Simon de Laplace, "Essai philosophique sur les probabilités", 1814)

"Knowledge signifies things known. Where there are no things known, there is no knowledge. Where there are no things to be known, there can be no knowledge. We have observed that every science, that is, every branch of knowledge, is compounded of certain facts, of which our sensations furnish the evidence. Where no such evidence is supplied, we are without data; we are without first premises; and when, without these, we attempt to build up a science, we do as those who raise edifices without foundations. And what do such builders construct? Castles in the air." (Frances Wright, "Course of Popular Lectures", 1820)

On Data (Unsourced)

"An extremely odd demand is often set forth but never met, even by those who make it; i.e., that empirical data should be presented without any theoretical context, leaving the reader, the student, to his own devices in judging it. This demand seems odd because it is useless simply to look at something. Every act of looking turns into observation, every act of observation into reflection, every act of reflection into the making of associations; thus it is evident that we theorize every time we look carefully at the world." (Johann Wolfgang von Goethe)

"Errors using inadequate data are much less than those using no data at all." (Charles Babbage)

"If you can’t have an experiment, do the best you can with whatever data you can gather, but do be very skeptical of historical data and subject them to all the logical tests you can think of." (Robert Hooke)

"In general, it is necessary to have some data on which to calculate probabilities. [...] Statisticians do not evolve probabilities out of their inner consciousness, they merely calculate them." (Leonard C Tippett)

"In these days of rapid scientific progress there is a tendency to accept the facts of nature, as at present known, without glancing back at the slow and difficult stages by which the knowledge of these facts has been arrived at. Yet such a retrospect is by no means unprofitable, since it warns us that hasty generalizations upon insufficient data retard rather than advance the progress of knowledge, and that the theories of the day must not be accepted as necessarily expressing absolute truths." (Archibald Garrod) 

"[…] numerous samples collected without a clear idea of what is to be done with the data are commonly less useful than a moderate number of samples collected in accordance with a specific design." (William C Krumbein)

"Progress in science depends not only upon new data but also upon the careful elaboration of new approaches to old data as well as new." (David Rindos)

"Too little attention is given to the need for statistical control, or to put it more pertinently, since statistical control (randomness) is so rarely found, too little attention is given to the interpretation of data that arise from conditions not in statistical control." (William E Deming)

On Data (1850-1899)

"In every branch of knowledge the progress is proportional to the amount of facts on which to build, and therefore to the facility of obtaining data." (James C Maxwell, [letter to Lewis Campbell] 1851)

"In the original discovery of a proposition of practical utility, by deduction from general principles and from experimental data, a complex algebraical investigation is often not merely useful, but indispensable; but in expounding such a proposition as a part of practical science, and applying it to practical purposes, simplicity is of the first importance: - and […] the more thoroughly a scientific man has studied higher mathematics, the more fully does he become aware of this truth - and […] the better qualified does he become to free the exposition and application of principles from mathematical intricacy." (William J M Rankine, "On the Harmony of Theory and Practice in Mechanics", 1856)

"It usually happens in scientific progress, that when a great fact is at length discovered, it approves itself at once to all competent judges. It furnishes a solution to so many problems, and harmonizes with so many other facts, - that all the other data as it were crystallize at once about it." (Edward Everett, "The Uses of Astronomy", [An Oration Delivered at Albany] 1856)

"The great problems which offer themselves on all hands for solution, problems which the wants of the age force upon us as practically interesting, and with which its intellect feels itself competent to deal, are far more complex in their conditions, and depend on data which to be of use must be accumulated in far greater masses, collected over an infinitely wider field, and worked upon with a greater and more systematized power than has sufficed for the necessities of astronomy. The collecting, arranging, and duly combining these data are operations which, to be carried out to the extent of the requirements of modern science, lie utterly beyond the reach of all private industry, means, or enterprise. Our demands are not merely for a slight and casual sprinkling to refresh and invigorate an ornamental or luxurious product, but for a copious, steady, and well-directed stream, to call forth from a soil ready to yield it, an ample, healthful, and remunerating harvest." (Sir John F W Herschel, "Essays from the Edinburgh and Quarterly Reviews with Addresses and Other Pieces", 1857)

"Consider an arbitrary figure in general position, indeterminate in the sense that it can be chosen from all such figures without upsetting the laws, conditions, and connections among the different parts of the system; suppose that given these data we have found one or more relations or properties, metric or descriptive, of that figure using the usual obvious inference (i.e., in a way regarded in certain cases as the only rigorous argument). Is it not obvious that if, preserving these very data, one begins to change the initial figure by insensible steps, or applies to some parts of the figure an arbitrary continuous motion, then is it not obvious that the properties and relations established for the initial system remain applicable to subsequent states of this system provided that one is mindful of particular changes, when, say, certain magnitudes vanish, change direction or sign, and so on - changes which one can always anticipate a priori on the basis of reliable rules." (Jean V Poncelet, "Treatise on Projective Properties of Figures", 1865)

"Mathematics may be compared to a mill of exquisite workmanship, which grinds you stuff of any degree of fineness; but, nevertheless, what you get out depends upon what you put in; and as the grandest mill in the world will not extract wheat-flour from peascod, so pages of formulae will not get a definite result out of loose data." (Thomas H Huxley, "Geological Reform", Quarterly Journal of the Geological Society of London Vol. 25, 1869)

"The mathematician starts with a few propositions, the proof of which is so obvious that they are called self-evident, and the rest of his work consists of subtle deductions from them. The teaching of languages, at any rate as ordinarily practised, is of the same general nature: authority and tradition furnish the data, and the mental operations are deductive." (Thomas H Huxley, 1869)

"The ignoring of data is, in fact, the easiest and most popular mode of obtaining unity in one's thought." (William James, "The Sentiment of Rationality", Mind Vol. 4, 1879)

"It is a capital mistake to theorise before one has data." (Arthur C Doyle, "The Adventures of Sherlock Holmes", 1892)

"All deduction rests ultimately upon the data derived from experience. This is the tortoise that supports our conception of the cosmos." (Percival Lowell, "Mars", 1895)

"Physical research by experimental methods is both a broadening and a narrowing field. There are many gaps yet to be filled, data to be accumulated, measurements to be made with great precision, but the limits within which we must work are becoming, at the same time, more and more defined." (Elihu Thomson, "Annual Report of the Board of Regents of the Smithsonian Institution", 1899)

On Data (1910-1919)

"The first step in beginning the scientific study of a problem is to collect the data, which are or ought to be 'facts'." (John A Thomson, "Introduction to Science", 1911)

"[...] it is a function of statistical method to emphasize that precise conclusions cannot be drawn from inadequate data." (Egon S Pearson & H O Hartley, "Biometrika Tables for Statisticians" Vol. 1, 1914)

"This diagrammatic method has, however, serious inconveniences as a method for solving logical problems. It does not show how the data are exhibited by cancelling certain constituents, nor does it show how to combine the remaining constituents so as to obtain the consequences sought. In short, it serves only to exhibit one single step in the argument, namely the equation of the problem; it dispenses neither with the previous steps, i.e., 'throwing of the problem into an equation' and the transformation of the premises, nor with the subsequent steps, i.e., the combinations that lead to the various consequences. Hence it is of very little use, inasmuch as the constituents can be represented by algebraic symbols quite as well as by plane regions, and are much easier to deal with in this form." (Louis Couturat, "The Algebra of Logic", 1914)

"As soon as science has emerged from its initial stages, theoretical advances are no longer achieved merely by a process of arrangement. Guided by empirical data, the investigator rather develops a system of thought which, in general, is built up logically from a small number of fundamental assumptions, the so-called axioms. We call such a system of thought a theory. The theory finds the justification for its existence in the fact that it correlates a large number of single observations, and it is just here that the 'truth' of the theory lies." (Albert Einstein, "Relativity: The Special and General Theory", 1916)

"The man of science, by virtue of his training, is alone capable of realising the difficulties - often enormous - of obtaining accurate data upon which just judgment may be based." (Sir Richard Gregory, "Discovery; or, The Spirit and Service of Science", 1916)

"Statistical tables are essentially specific in their meaning, and they require data that are uniformly specific in the same kind and degree." (W B Bailey & John Cummings, "Statistics", 1917)

"There is, then, in this analysis of variance no indication of any other than innate and heritable factors at work." (Sir Ronald A Fisher, "The Causes of Human Variability", Eugenics Review Vol. 10, 1918)

"Philosophy, like science, consists of theories or insights arrived at as a result of systematic reflection or reasoning in regard to the data of experience. It involves, therefore, the analysis of experience and the synthesis of the results of analysis into a comprehensive or unitary conception. Philosophy seeks a totality and harmony of reasoned insight into the nature and meaning of all the principal aspects of reality." (Joseph A Leighton, "The Field of Philosophy: An outline of lectures on introduction to philosophy", 1919)

On Data (1920-1929)

"Science is frankly empirical in method and aim; it seeks to discover the laws of concrete being and becoming, and to formulate these in the simplest terms, which are either immediate data of experience or verifiably derived therefrom." (J Arthur Thomson, "The System of Animate Nature" Vol. 1, 1920)

"We are, therefore, in danger of being overwhelmed by our data and of being unable to deal with the simpler problems first and understand their connection. The continual heaping up of data is worse than useless if interpretation does not keep pace with it." (Joseph H Woodger, "Biological Principles: A Critical Study", 1920)

"A 'poor evaluation' of the probability of anything may reflect ignorance of relevant data which 'ought' to be known [...]" (Clarence I Lewis, "Mind and the World-Order: Outline of a Theory of Knowledge", 1924)

"There is no such thing as the probability of four aces in one hand, or the probability of anything else. Given all the relevant data which there are to be known, everything is either certainly true or certainly false." (Clarence I Lewis, "Mind and the World-Order: Outline of a Theory of Knowledge", 1924)

"No human mind is capable of grasping in its entirety the meaning of any considerable quantity of numerical data." (Sir Ronald A Fisher, "Statistical Methods for Research Workers", 1925)

"[...] no one knows better than the engineer the need of discrimination between the sure ground of known data and formal logic, on the one hand - as exemplified, say, by mathematical operations - and acts of judgment on the other; and no one has learned through wider experience than the engineer the need of applying his conclusions in the light of that component part which, of necessity, has been dependent on estimate and judgment." (William F Durand, Transactions of The American Society of Mechanical Engineers Vol. 47, [address] 1925)

"Statistics may be regarded as (i) the study of populations, (ii) as the study of variation, and (iii) as the study of methods of the reduction of data." (Sir Ronald A Fisher, "Statistical Methods for Research Workers", 1925)

"The preliminary examination of most data is facilitated by the use of diagrams. Diagrams prove nothing, but bring outstanding features readily to the eye; they are therefore no substitutes for such critical tests as may be applied to the data, but are valuable in suggesting such tests, and in explaining the conclusions founded upon them." (Sir Ronald A Fisher, "Statistical Methods for Research Workers", 1925)

"There is no more pressing need in connection with the examination of experimental results than to test whether a given body of data is or is not in agreement with any suggested hypothesis." (Sir Ronald A Fisher, "Statistical Methods for Research Workers", 1925)

"The statistician’s job is to draw general conclusions from fragmentary data. Too often the data supplied to him for analysis are not only fragmentary but positively incoherent, so that he can do next to nothing with them. Even the most kindly statistician swears heartily under his breath whenever this happens." (Michael J Moroney, "Facts from Figures", 1927)

"When a man of science speaks of his 'data', he knows very well in practice what he means. Certain experiments have been conducted, and have yielded certain observed results, which have been recorded. But when we try to define a 'datum' theoretically, the task is not altogether easy. A datum, obviously, must be a fact known by perception. But it is very difficult to arrive at a fact in which there is no element of inference, and yet it would seem improper to call something a 'datum' if it involved inferences as well as observation. This constitutes a problem [...]" (Bertrand Russell, "The Analysis of Matter", 1927)

"In the attempt to achieve a conceptual formulation of the confusingly immense body of observational data, the scientist makes use of a whole arsenal of concepts which he imbibed practically with his mother’s milk; and seldom if ever is he aware of the eternally problematic character of his concepts." (Albert Einstein, "Concepts of Space: The History of Theories of Space in Physics", 1928)

On Data (1930-1939)

"Science works by the slow method of the classification of data, arranging the detail patiently in a periodic system into groups of facts, in series like the strata of the rocks. For each series there must be a vocabulary of special words which do not always make good sense when used in another series. But the laws of periodicity seem to hold throughout, among the elements and in every sphere of thought, and we must learn to co-ordinate the whole through our new conception of the reign of relativity." (William H Pallister, "Poems of Science", 1931)

"However, perhaps the main point is that you are under no obligation to analyse variance into its parts if it does not come apart easily, and its unwillingness to do so naturally indicates that one’s line of approach is not very fruitful." (Sir Ronald A Fisher, [Letter to Lancelot Hogben] 1933)

"The analysis of variance is not a mathematical theorem, but rather a convenient method of arranging the arithmetic." (Sir Ronald A Fisher, Journal of the Royal Statistical Society Vol. 1, 1934)

"Mathematics alone make us feel the limits of our intelligence. For we can always suppose in the case of an experiment that it is inexplicable because we don’t happen to have all the data. In mathematics we have all the data [...] and yet we don’t understand. We always come back to the contemplation of our human wretchedness. What force is in relation to our will, the impenetrable opacity of mathematics is in relation to our intelligence." (Simone Weil, "The Notebooks of Simone Weil" Vol. 2, 1935)

"[...] there is more difficulty in stating our principle so as to be applicable when our data are confined to a finite part of the universe. Things from outside may always crash in and have unexpected effects." (Bertrand Russell, "Religion and Science", 1935)

"[...] scientists are not a select few intelligent enough to think in terms of 'broad sweeping theoretical laws and principles'. Instead, scientists are people specifically trained to build models that incorporate theoretical assumptions and empirical evidence. Working with models is essential to the performance of their daily work; it allows them to construct arguments and to collect data." (Peter Imhof, Science Vol. 287, 1935–1936)

"Statistics is a scientific discipline concerned with collection, analysis, and interpretation of data obtained from observation or experiment. The subject has a coherent structure based on the theory of Probability and includes many different procedures which contribute to research and development throughout the whole of Science and Technology." (Egon Pearson, 1936)

"Science will stagnate only when all will agree that only one interpretation can be drawn from a given series of data." (Ross A Gortner, "Selected Topics in Colloid Chemistry with Especial Reference to Biochemical Problems", 1937)

"The mathematical machine works with unerring precision; but what we get out of it is nothing more than a rearrangement of what we put into it. In the last analysis observation - the actual contact with real events - is the only reliable way of securing the data of natural history." (William R Thompson, "Science and Common Sense", 1937)

"Because they are determined mathematically instead of according to their position in the data, the arithmetic and geometric averages are not ascertained by graphic methods." (John R Riggleman & Ira N Frisbee, "Business Statistics", 1938)

"The laws of science are the permanent contributions to knowledge - the individual pieces that are fitted together in an attempt to form a picture of the physical universe in action. As the pieces fall into place, we often catch glimpses of emerging patterns, called theories; they set us searching for the missing pieces that will fill in the gaps and complete the patterns. These theories, these provisional interpretations of the data in hand, are mere working hypotheses, and they are treated with scant respect until they can be tested by new pieces of the puzzle." (Edwin P Whipple, "Experiment and Experience", [Commencement Address, California Institute of Technology] 1938)

"An inference, if it is to have scientific value, must constitute a prediction concerning future data. If the inference is to be made purely with the help of the distribution theory of statistics, the experiments that constitute evidence for the inference must arise from a state of statistical control; until that state is reached, there is no universe, normal or otherwise, and the statistician’s calculations by themselves are an illusion if not a delusion. The fact is that when distribution theory is not applicable for lack of control, any inference, statistical or otherwise, is little better than a conjecture. The state of statistical control is therefore the goal of all experimentation." (William E Deming, "Statistical Method from the Viewpoint of Quality Control", 1939)

"Science [...] involves active, purposeful search; it discovers, accumulates, sifts, orders, and tests data; it is a slow, painstaking, laborious activity; it is a search after bodies of knowledge sufficiently comprehensive to lead to the discovery of uniformities, sequential orders or so-called 'laws'; it may be carried on by an individual, but it gains relevance only as it produces data which can be added to and tested by the findings of others." (Constantine Panunzio, "Major Social Institutions", 1939)

On Data (1950-1959)

"Anyone can easily misuse good data." (William E Deming, "Some Theory of Sampling", 1950)

"Data have an ephemeralness, a rhapsodic spontaneity, a nakedness so utterly at variance with the orderly instincts that pervade our being and with the given unity of our own experience as to be unfit for use in the building of reality. The constructs, on the other hand, are foot-loose, subjective, and altogether too fertile with logical implication to serve in their indiscriminate totality as material for the real world. They do, however, contain the solid logical substance which a stable reality must contain." (Henry Margenau, "The Nature of Physical Reality: A Philosophy of Modern Physics", 1950)

"Not even the most subtle and skilled analysis can overcome completely the unreliability of basic data." (Roy D G Allen, "Statistics for Economists", 1951)

"The enthusiastic use of statistics to prove one side of a case is not open to criticism providing the work is honestly and accurately done, and providing the conclusions are not broader than indicated by the data. This type of work must not be confused with the unfair and dishonest use of both accurate and inaccurate data, which too commonly occurs in business. Dishonest statistical work usually takes the form of: (1) deliberate misinterpretation of data; (2) intentional making of overestimates or underestimates; and (3) biasing results by using partial data, making biased surveys, or using wrong statistical methods." (John R Riggleman & Ira N Frisbee, "Business Statistics", 1951)

"The technical analysis of any large collection of data is a task for a highly trained and expensive man who knows the mathematical theory of statistics inside and out. Otherwise the outcome is likely to be a collection of drawings - quartered pies, cute little battleships, and tapering rows of sturdy soldiers in diversified uniforms - interesting enough in the colored Sunday supplement, but hardly the sort of thing from which to draw reliable inferences." (Eric T Bell, "Mathematics: Queen and Servant of Science", 1951)

"Mathematics, springing from the soil of basic human experience with numbers and data and space and motion, builds up a far-flung architectural structure composed of theorems which reveal insights into the reasons behind appearances and of concepts which relate totally disparate concrete ideas." (Saunders MacLane, "Of Course and Courses", The American Mathematical Monthly, Vol. 61 (3), 1954)

"When you learn how to mobilize your data and bring them to bear on your problems, you are no longer a rank amateur." (Edward Hodnett, "The Art of Problem Solving", 1955)

"We should admit in theory what is already very largely a case in practice, that the main currency of scientific information is the secondary sources in the form of abstracts, reports, tables, etc., and that the primary sources are only for detailed reference by very few people. It is possible that the fate of most scientific papers will be not to be read by anyone who uses them, but with luck they will furnish an item, a number, some facts or data to such reports which may, but usually will not, lead to the original paper being consulted. This is very sad but it is the inevitable consequence of the growth of science." (John D Bernal, "The Supply of Information to the Scientist: Some Problems of the Present Day", Journal of Documentation Vol. 13, 1957)

"Physicists do not start from hypotheses; they start from data. By the time a law has been fixed into an H-D [hypothetico-deductive] system, really original physical thinking is over." (Norwood R Hanson, "Patterns of Discovery", 1958)

"The statistics themselves prove nothing; nor are they at any time a substitute for logical thinking. There are […] many simple but not always obvious snags in the data to contend with. Variations in even the simplest of figures may conceal a compound of influences which have to be taken into account before any conclusions are drawn from the data." (Alfred R Ilersic, "Statistics", 1959)

On Data (1960-1969)

"When evaluating the reliability and generality of data, it is often important to know the aims of the experimenter. When evaluating the importance of experimental results, however, science has a trick of disregarding the experimenter's rationale and finding a more appropriate context for the data than the one he proposed." (Murray Sidman, "Tactics of Scientific Research", 1960)

"If data analysis is to be well done, much of it must be a matter of judgment, and 'theory' whether statistical or non-statistical, will have to guide, not command." (John W Tukey, "The Future of Data Analysis", Annals of Mathematical Statistics Vol. 33 (1), 1962)

"Philosophers of science have repeatedly demonstrated that more than one theoretical construction can always be placed upon a given collection of data." (Thomas Kuhn, "The Structure of Scientific Revolutions", 1962)

"Teaching data analysis is not easy, and the time allowed is always far from sufficient." (John W Tukey, "The Future of Data Analysis", Annals of Mathematical Statistics Vol. 33 (1), 1962)

"The most important maxim for data analysis to heed, and one which many statisticians seem to have shunned is this: ‘Far better an approximate answer to the right question, which is often vague, than an exact answer to the wrong question, which can always be made precise.’ Data analysis must progress by approximate answers, at best, since its knowledge of what the problem really is will at best be approximate." (John W Tukey, "The Future of Data Analysis", Annals of Mathematical Statistics Vol. 33 (1), 1962)

"The physical sciences are used to 'praying over' their data, examining the same data from a variety of points of view. This process has been very rewarding, and has led to many extremely valuable insights. Without this sort of flexibility, progress in physical science would have been much slower. Flexibility in analysis is often to be had honestly at the price of a willingness not to demand that what has already been observed shall establish, or prove, what analysis suggests. In physical science generally, the results of praying over the data are thought of as something to be put to further test in another experiment, as indications rather than conclusions." (John W Tukey, "The Future of Data Analysis", The Annals of Mathematical Statistics, Vol. 33 (1), 1962)

"We must include in any language with which we hope to describe complex data-processing situations the capability for describing data." (Grace Hopper, "Management and the Computer of the Future", 1962)

"A theory with mathematical beauty is more likely to be correct than an ugly one that fits some experimental data." (Paul A M Dirac, Scientific American, 1963)

"Science begins with the world we have to live in, accepting its data and trying to explain its laws. From there, it moves toward the imagination: it becomes a mental construct, a model of a possible way of interpreting experience." (Northrop Frye, "The Educated Imagination", 1963)

"Statistics is the branch of scientific method which deals with the data obtained by counting or measuring the properties of populations of natural phenomena." (Sir Maurice G Kendall & Alan Stuart, "The Advanced Theory of Statistics", 1963)

"The null hypothesis of no difference has been judged to be no longer a sound or fruitful basis for statistical investigation. [...] Significance tests do not provide the information that scientists need, and, furthermore, they are not the most effective method for analyzing and summarizing data." (Cherry A Clark, "Hypothesis Testing in Relation to Statistical Methodology", Review of Educational Research Vol. 33, 1963)

"Data are of high quality if they are fit for their intended use in operations, decision-making, and planning." (Joseph M Juran, 1964)

"It has been said that data collection is like garbage collection: before you collect it you should have in mind what you are going to do with it." (Russell Fox et al, "The Science of Science", 1964)

"The trouble with group theory is that it leaves so much unexplained that one would like to explain. It isolates in a beautiful way those aspects of nature that can be understood in terms of abstract symmetry alone. It does not offer much hope of explaining the messier facts of life, the numerical values of particle lifetimes and interaction strengths - the great bulk of quantitative experimental data that is now waiting for explanation. The process of abstraction seems to have been too drastic, so that many essential and concrete features of the real world have been left out of consideration. Altogether group theory succeeds just because its aims are modest. It does not try to explain everything, and it does not seem likely that it will grow into a complete or comprehensive theory of the physical world." (Freeman J Dyson, "Mathematics in the Physical Sciences", Scientific American Vol. 211 (3), 1964)

"Weighting a sample appropriately is no more fudging the data than is correcting a gas volume for barometric pressure." (Frederick Mosteller, "Principles of Sampling", Journal of the American Statistical Association Vol. 49 (265), 1964)

"Without the hard little bits of marble which are called 'facts' or 'data' one cannot compose a mosaic; what matters, however, are not so much the individual bits, but the successive patterns into which you arrange them, then break them up and rearrange them." (Arthur Koestler, "The Act of Creation", 1964)

"One of the chief motivations behind the attempt to defend a distinction between theoretical and observational terms has been the desire to explain how a theory can be tested against the data of experience, and how one theory can be said to 'account for the facts' better than another; that is, to give a precise characterization of the idea, almost universally accepted in modern times, that the sciences are 'based on experience', that they are 'empirical'." (Dudley Shapere, "Philosophical Problems of Natural Science", 1965)

"Throughout science there is a constant alternation between periods when a particular subject is in a state of order, with all known data falling neatly into their places, and a state of puzzlement and confusion, when new observations throw all neatly arranged ideas into disarray." (Sir Hermann Bondi, "Astronomy and the Physical Sciences", 1966)

"Modern science is characterized by its ever-increasing specialization, necessitated by the enormous amount of data, the complexity of techniques and of theoretical structures within every field. Thus science is split into innumerable disciplines continually generating new subdisciplines. In consequence, the physicist, the biologist, the psychologist and the social scientist are, so to speak, encapsulated in their private universes, and it is difficult to get word from one cocoon to the other." (Ludwig von Bertalanffy, "General System Theory", 1968)

On Data (1970-1979)

"At root what is needed for scientific inquiry is just receptivity to data, skill in reasoning, and yearning for truth. Admittedly, ingenuity can help too." (Willard V O Quine, "The Web of Belief", 1970)

"Statistical methods of analysis are intended to aid the interpretation of data that are subject to appreciable haphazard variability." (David V Hinkley & David Cox, "Theoretical Statistics", 1974)

"A theory is worthless without good supporting data." (Alexis L Romanoff, "Encyclopedia of Thoughts", 1975)

"Data are often presented in a form that is not immediately clear. The reader can then either ignore the data, analyze them himself, or return them to the author for him to analyze." (Andrew S C Ehrenberg, "Data Reduction", 1975)

"For the theory-practice iteration to work, the scientist must be, as it were, mentally ambidextrous; fascinated equally on the one hand by possible meanings, theories, and tentative models to be induced from data and the practical reality of the real world, and on the other with the factual implications deducible from tentative theories, models and hypotheses." (George E P Box, "Science and Statistics", Journal of the American Statistical Association Vol. 71, 1976)

"If enough data is collected, anything may be proved by statistical methods." (Arthur Bloch, "Murphy’s Law", 1977)

"In a way, science might be described as paranoid thinking applied to Nature: we are looking for natural conspiracies, for connections among apparently disparate data." (Carl Sagan, "The Dragons of Eden", 1977)

"Most scientific theories, however, are ephemeral. Exceptions will likely be found that invalidate a theory in one or more of its tenets. These can then stimulate a new round of research leading either to a more comprehensive theory or perhaps to a more restrictive (i.e., more precisely defined) theory. Nothing is ever completely finished in science; the search for better theories is endless. The interpretation of a scientific experiment should not be extended beyond the limits of the available data. In the building of theories, however, scientists propose general principles by extrapolation beyond available data. When former theories have been shown to be inadequate, scientists should be prepared to relinquish the old and embrace the new in their never-ending search for better solutions. It is unscientific, therefore, to claim to have 'proof of the truth' when all that scientific methodology can provide is evidence in support of a theory." (William D Stansfield, "The Science of Evolution", 1977)

"Data, seeming facts, apparent associations - these are not certain knowledge of something. They may be puzzles that can one day be explained; they may be trivia that need not be explained at all." (Kenneth N Waltz, "Theory of International Politics", 1979)

"If we gather more and more data and establish more and more associations, however, we will not finally find that we know something. We will simply end up having more and more data and larger sets of correlations." (Kenneth N Waltz, "Theory of International Politics", 1979)

On Data (1980-1989)

"Facts and theories are different things, not rungs in a hierarchy of increasing certainty. Facts are the world's data. Theories are structures of ideas that explain and interpret facts. Facts do not go away while scientists debate rival theories for explaining them." (Stephen J Gould, "Evolution as Fact and Theory", 1981)

"In natural science we are concerned ultimately, not with convenient arrangements of observational data which can be generalized into universal explanatory form, but with movements of thought, at once theoretical and empirical, which penetrate into the intrinsic structure of the universe in such a way that there becomes disclosed to us its basic design and we find ourselves at grips with reality. [...] We cannot pursue natural science scientifically without engaging at the same time in meta-scientific operations." (Thomas F Torrance, "Divine and Contingent Order", 1981)

"People often feel inept when faced with numerical data. Many of us think that we lack numeracy, the ability to cope with numbers. […] The fault is not in ourselves, but in our data. Most data are badly presented and so the cure lies with the producers of the data. To draw an analogy with literacy, we do not need to learn to read better, but writers need to be taught to write better." (Andrew Ehrenberg, "The problem of numeracy", American Statistician Vol. 35 (2), 1981)

"The fact must be expressed as data, but there is a problem in that the correct data is difficult to catch. So that I always say 'When you see the data, doubt it!' 'When you see the measurement instrument, doubt it!' [...] For example, if the methods such as sampling, measurement, testing and chemical analysis methods were incorrect, data. […] to measure true characteristics and in an unavoidable case, using statistical sensory test and express them as data." (Kaoru Ishikawa, Annual Quality Congress Transactions, 1981)

"There is a tendency to mistake data for wisdom, just as there has always been a tendency to confuse logic with values, intelligence with insight. Unobstructed access to facts can produce unlimited good only if it is matched by the desire and ability to find out what they mean and where they lead." (Norman Cousins, "Human Options: An Autobiographical Notebook", 1981)

"A scientist should not cheat or falsify data or quote out of context or do any other thing that is intellectually dishonest. Of course, as always, some individuals fail; but science as a whole disapproves of such action. Indeed, when transgressors are detected, they are usually expelled from the community." (Michael Ruse, "Response to the Commentary: Pro Judice", Science, Technology and Human Values Vol. 7 (41), 1982)

"In all human activities, it is not ideas or machines that dominate; it is people. I have heard people speak of 'the effect of personality on science'. But this is a backward thought. Rather, we should talk about the effect of science on personalities. Science is not the dispassionate analysis of impartial data. It is the human, and thus passionate, exercise of skill and sense on such data." (Philip Hilts, "Scientific Temperaments: Three Lives in Contemporary Science", 1982)

"Data in isolation are meaningless, a collection of numbers. Only in context of a theory do they assume significance […]" (George Greenstein, "Frozen Star", 1983)

"Excellence in statistical graphics consists of complex ideas communicated with clarity, precision, and efficiency. Graphical displays should show the data; induce the viewer to think about the substance rather than about the methodology, graphic design, the technology of graphic production, or something else; avoid distorting what the data have to say; present many numbers in a small space; make large data sets coherent; encourage the eye to compare different pieces of data; reveal the data at several levels of detail, from a broad overview to the fine structure; serve a reasonably clear purpose: description, exploration, tabulation, or decoration; [should] be closely integrated with the statistical and verbal descriptions of a data set." (Edward R Tufte, "The Visual Display of Quantitative Information", 1983)

"In all scientific fields, theory is frequently more important than experimental data. Scientists are generally reluctant to accept the existence of a phenomenon when they do not know how to explain it. On the other hand, they will often accept a theory that is especially plausible before there exists any data to support it." (Richard Morris, 1983)

"Inept graphics also flourish because many graphic artists believe that statistics are boring and tedious. It then follows that decorated graphics must pep up, animate, and all too often exaggerate what evidence there is in the data. […] If the statistics are boring, then you've got the wrong numbers." (Edward R Tufte, "The Visual Display of Quantitative Information", 1983)

"The purpose of models is not to fit the data but to sharpen the questions." (Samuel Karlin, 1983)

"There are those who try to generalize, synthesize, and build models, and there are those who believe nothing and constantly call for more data. The tension between these two groups is a healthy one; science develops mainly because of the model builders, yet they need the second group to keep them honest." (Andrew Miall, "Principles of Sedimentary Basin Analysis", 1984)

"Data is raw. It simply exists and has no significance beyond its existence (in and of itself). It can exist in any form, usable or not. It does not have meaning of itself. In computer parlance, a spreadsheet generally starts out by holding data." (Russell L Ackoff, "Towards a Systems Theory of Organization", 1985)

"Information is data that has been given meaning by way of relational connection. This 'meaning' can be useful, but does not have to be. In computer parlance, a relational database makes information from the data stored within it." (Russell L Ackoff, "Towards a Systems Theory of Organization", 1985)

"Intuition becomes increasingly valuable in the new information society precisely because there is so much data." (John Naisbitt, "Re-Inventing the Corporation", 1985)

"Probability is the mathematics of uncertainty. Not only do we constantly face situations in which there is neither adequate data nor an adequate theory, but many modern theories have uncertainty built into their foundations. Thus learning to think in terms of probability is essential. Statistics is the reverse of probability (glibly speaking). In probability you go from the model of the situation to what you expect to see; in statistics you have the observations and you wish to estimate features of the underlying model." (Richard W Hamming, "Methods of Mathematics Applied to Calculus, Probability, and Statistics", 1985)

"Thus statistics should generally be taught more as a practical subject with analyses of real data. Of course some theory and an appropriate range of statistical tools need to be learnt, but students should be taught that Statistics is much more than a collection of standard prescriptions." (Christopher Chatfield, "The Initial Examination of Data", Journal of the Royal Statistical Society A Vol. 148, 1985)

"Models are often used to decide issues in situations marked by uncertainty. However statistical inferences from data depend on assumptions about the process which generated these data. If the assumptions do not hold, the inferences may not be reliable either. This limitation is often ignored by applied workers who fail to identify crucial assumptions or subject them to any kind of empirical testing. In such circumstances, using statistical procedures may only compound the uncertainty." (David A Freedman & William C Navidi, "Regression Models for Adjusting the 1980 Census", Statistical Science Vol. 1 (1), 1986)

"The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from a given body of data." (John W Tukey, "Sunset Salvo", The American Statistician Vol. 40 (1), 1986)

"Beware of the problem of testing too many hypotheses; the more you torture the data, the more likely they are to confess, but confessions obtained under duress may not be admissible in the court of scientific opinion." (Stephen M Stigler, "Neutral Models in Biology", 1987)

"[…] no good model ever accounted for all the facts, since some data was bound to be misleading if not plain wrong. A theory that did fit all the data would have been ‘carpentered’ to do this and would thus be open to suspicion." (Francis H C Crick, "What Mad Pursuit: A Personal View of Scientific Discovery", 1988)

"Physicists are all too apt to look for the wrong sorts of generalizations, to concoct theoretical models that are too neat, too powerful, and too clean. Not surprisingly, these seldom fit well with data. To produce a really good biological theory, one must try to see through the clutter produced by evolution to the basic mechanisms. What seems to physicists to be a hopelessly complicated process may have been what nature found simplest, because nature could build on what was already there." (Francis H C Crick, "What Mad Pursuit: A Personal View of Scientific Discovery", 1988)

"[...] to acknowledge the subjectivity inherent in the interpretation of data is to recognize the central role of statistical analysis as a formal mechanism by which new evidence can be integrated with existing knowledge. Such a view of statistics as a dynamic discipline is far from the common perception of a rather dry, automatic technology for processing data." (Donald A Berry, "Statistical Analysis and the Illusion of Objectivity", American Scientist Vol. 76, 1988)

"Randomness is a difficult notion for people to accept. When events come in clusters and streaks, people look for explanations and patterns. They refuse to believe that such patterns - which frequently occur in random data - could equally well be derived from tossing a coin. So it is in the stock market as well." (Burton G Malkiel, "A Random Walk Down Wall Street", 1989)

"Some methods, such as those governing the design of experiments or the statistical treatment of data, can be written down and studied. But many methods are learned only through personal experience and interactions with other scientists. Some are even harder to describe or teach. Many of the intangible influences on scientific discovery - curiosity, intuition, creativity - largely defy rational analysis, yet they are often the tools that scientists bring to their work." (Committee on the Conduct of Science, "On Being a Scientist", 1989)

"When evaluating a model, at least two broad standards are relevant. One is whether the model is consistent with the data. The other is whether the model is consistent with the ‘real world’." (Kenneth A Bollen, "Structural Equations with Latent Variables", 1989)

On Data (1990-1999)

"The apparent simplicity of a model is due to a failure of imagination and limited data, unless the domain really is simple. If the world were really random, chemistry, cooking, and credit would not be possible, so our models cannot be figments of our imagination." (Peter Cheeseman, "On Finding the Most Probable Model", 1990)

"The use of Occam’s razor, along with the related critical, skeptical view toward any speculations about the unknown, is perhaps the most misunderstood aspect of the scientific method. People confuse doubt with denial. Science doesn’t deny anything, but it doubts everything not required by the data. Note, however, that doubt does not necessarily mean rejection, just an attitude of disbelief that can be changed when the facts require it." (Victor J Stenger, "Physics and Psychics: The Search for a World Beyond the Senses", 1990)

"We live in an era when it seems legitimate to try everything conceivable within the known laws of physics, particularly in the absence of data." (Geoffrey Burbidge, "Focal Point", Sky and Telescope Vol. 78 (6), 1990)

"What about confusing clutter? Information overload? Doesn't data have to be 'boiled down' and 'simplified'? These common questions miss the point, for the quantity of detail is an issue completely separate from the difficulty of reading. Clutter and confusion are failures of design, not attributes of information." (Edward R Tufte, "Envisioning Information", 1990)

"Data without generalization is just gossip." (Robert M Pirsig, "Lila: An Inquiry into Morals", 1991)

"Much of the technical literature is difficult to read, even for scientists and engineers. Even the best books tend to dwell on the mathematical models and don’t give the slightest hint what to do if one is lucky enough to have some data." (Foster Morrison, "The Art of Modeling Dynamic Systems: Forecasting for Chaos, Randomness & Determinism", 1991)

"Statistics is a very powerful and persuasive mathematical tool. People put a lot of faith in printed numbers. It seems when a situation is described by assigning it a numerical value, the validity of the report increases in the mind of the viewer. It is the statistician's obligation to be aware that data in the eyes of the uninformed or poor data in the eyes of the naive viewer can be as deceptive as any falsehoods." (Theoni Pappas, "More Joy of Mathematics: Exploring mathematical insights & concepts", 1991)

"A fundamental difference between religious and scientific thought is that the received beliefs in religion are ultimately based on revelations or pronouncements, usually by some long dead prophet or priest.[...] Dogma is interpreted by a caste of priests and is accepted by the multitude on faith or under duress. In contrast, the statements of science are derived from the data of observations and experiment, and from the manipulation of these data according to logical and often mathematical procedures." (John A Moore, "Science as a Way of Knowing: The Foundations of Modern Biology", 1993)

"[…] an honest exploratory study should indicate how many comparisons were made […] most experts agree that large numbers of comparisons will produce apparently statistically significant findings that are actually due to chance. The data torturer will act as if every positive result confirmed a major hypothesis. The honest investigator will limit the study to focused questions, all of which make biologic sense. The cautious reader should look at the number of ‘significant’ results in the context of how many comparisons were made." (James L Mills, "Data torturing", New England Journal of Medicine, 1993)

"Science demands a tolerance for ambiguity. Where we are ignorant, we withhold belief. Whatever annoyance the uncertainty engenders serves a higher purpose: It drives us to accumulate better data. This attitude is the difference between science and so much else. Science offers little in the way of cheap thrills. The standards of evidence are strict. But when followed they allow us to see far, illuminating even a great darkness." (Carl Sagan, "Pale Blue Dot: A Vision of the Human Future in Space", 1994)

"Science is not impressed with a conglomeration of data. It likes carefully constructed analysis of each problem." (Daniel E Koshland Jr, Science, Volume 263 (5144), [editorial] 1994)

"We do not realize how deeply our starting assumptions affect the way we go about looking for and interpreting the data we collect." (Roger A Lewin, "Kanzi: The Ape at the Brink of the Human Mind", 1994)

"When looking at the end result of any statistical analysis, one must be very cautious not to overinterpret the data. Care must be taken to know the size of the sample, and to be certain the method for gathering information is consistent with other samples gathered. […] No one should ever base conclusions without knowing the size of the sample and how random a sample it was. But all too often such data is not mentioned when the statistics are given - perhaps it is overlooked or even intentionally omitted." (Theoni Pappas, "More Joy of Mathematics: Exploring mathematical insights & concepts", 1994)

"Intuition is the art, peculiar to the human mind, of working out the correct answer from data that is, in itself, incomplete or even, perhaps, misleading." (Isaac Asimov, "Forward the Foundation", 1993)

"Having a scientific outlook means being willing to divest yourself of a pet hypothesis, whether it relates to easy self-help improvements, homeopathy, graphology, spontaneous generation, or any other concept, when the data produced by a carefully designed experiment contradict that hypothesis. Retaining a belief in a hypothesis that cannot be supported by data is the hallmark of both the pseudoscientist and the fanatic. Often the more deeply held the hypothesis, the more reactionary is the response to nonsupportive data." (Michael Zimmerman, "Science, Nonscience, and Nonsense: Approaching Environmental Literacy", 1995)

"Now that knowledge is taking the place of capital as the driving force in organizations worldwide, it is all too easy to confuse data with knowledge and information technology with information." (Peter Drucker, "Managing in a Time of Great Change", 1995)

"Some people derive satisfaction from accumulating data, whereas others are content to dream and leave experiments to colleagues. Still others flit from flower to flower rather than learning more and more about one situation. The difference in approach is a matter of temperament, and we all must understand our own strengths. All workers ultimately contribute to the matrix of facts, ideas, understandings, techniques, and visions that we know as science." (Arthur J Birch, "To See the Obvious", 1995)

"The science of statistics may be described as exploring, analyzing and summarizing data; designing or choosing appropriate ways of collecting data and extracting information from them; and communicating that information. Statistics also involves constructing and testing models for describing chance phenomena. These models can be used as a basis for making inferences and drawing conclusions and, finally, perhaps for making decisions." (Fergus Daly et al, "Elements of Statistics", 1995)

"Education is not the piling on of learning, information, data, facts, skills, or abilities - that's training or instruction - but is rather making visible what is hidden as a seed." (Thomas W Moore, "The Education of the Heart", 1996)

"So we pour in data from the past to fuel the decision-making mechanisms created by our models, be they linear or nonlinear. But therein lies the logician's trap: past data from real life constitute a sequence of events rather than a set of independent observations, which is what the laws of probability demand. [...] It is in those outliers and imperfections that the wildness lurks." (Peter L Bernstein, "Against the Gods: The Remarkable Story of Risk", 1996) 

"Paradigms are the most general - rather like a philosophical or ideological framework. Theories are more specific, based on the paradigm and designed to describe what happens in one of the many realms of events encompassed by the paradigm. Models are even more specific providing the mechanisms by which events occur in a particular part of the theory's realm. Of all three, models are most affected by empirical data - models come and go, theories only give way when evidence is overwhelmingly against them and paradigms stay put until a radically better idea comes along." (Lee R Beach, "The Psychology of Decision Making: People in Organizations", 1997)

"Data is discrimination between physical states of things (black, white, etc.) that may convey or not convey information to an agent. Whether it does so or not depends on the agent's prior stock of knowledge." (Max Boisot, "Knowledge Assets", 1998)

"The unit of coding is the most basic segment, or element, of the raw data or information that can be assessed in a meaningful way regarding the phenomenon." (Richard Boyatzis, "Transforming qualitative information", 1998)

"While hard data may inform the intellect, it is largely soft data that generates wisdom." (Henry Mintzberg, "Strategy Safari: A Guided Tour Through The Wilds of Strategic Management", 1998)

On Data (2000-2009)

"Building statistical models is just like this. You take a real situation with real data, messy as this is, and build a model that works to explain the behavior of real data." (Martha Stocking, New York Times, 2000)

"Data are generally collected as a basis for action. However, unless potential signals are separated from probable noise, the actions taken may be totally inconsistent with the data. Thus, the proper use of data requires that you have simple and effective methods of analysis which will properly separate potential signals from probable noise." (Donald J Wheeler, "Understanding Variation: The Key to Managing Chaos" 2nd Ed., 2000)

"No matter what the data, and no matter how the values are arranged and presented, you must always use some method of analysis to come up with an interpretation of the data. While every data set contains noise, some data sets may contain signals. Therefore, before you can detect a signal within any given data set, you must first filter out the noise." (Donald J Wheeler, "Understanding Variation: The Key to Managing Chaos" 2nd Ed., 2000)

"While all data contain noise, some data contain signals. Before you can detect a signal, you must filter out the noise." (Donald J Wheeler, "Understanding Variation: The Key to Managing Chaos" 2nd Ed., 2000)

"[…] you simply cannot make sense of any number without a contextual basis. Yet the traditional attempts to provide this contextual basis are often flawed in their execution. [...] Data have no meaning apart from their context. Data presented without a context are effectively rendered meaningless." (Donald J Wheeler, "Understanding Variation: The Key to Managing Chaos" 2nd Ed., 2000)

"The greatest plus of data modeling is that it produces a simple and understandable picture of the relationship between the input variables and responses [...] different models, all of them equally good, may give different pictures of the relation between the predictor and response variables [...] One reason for this multiplicity is that goodness-of-fit tests and other methods for checking fit give a yes–no answer. With the lack of power of these tests with data having more than a small number of dimensions, there will be a large number of models whose fit is acceptable. There is no way, among the yes–no methods for gauging fit, of determining which is the better model." (Leo Breiman, "Statistical modeling: The two cultures", Statistical Science Vol. 16 (3), 2001)

"The more data we have, the more likely we are to drown in it." (Nassim N Taleb, "Fooled by Randomness", 2001)

"Data is a fact of life. As time goes by, we collect more and more data, making our original reason for collecting the data harder to accomplish. We don't collect data just to waste time or keep busy; we collect data so that we can gain knowledge, which can be used to improve the efficiency of our organization, improve profit margins, and on and on. The problem is that as we collect more data, it becomes harder for us to use the data to derive this knowledge. We are being suffocated by this raw data, yet we need to find a way to use it." (Seth Paul et al. "Preparing and Mining Data with Microsoft SQL Server 2000 and Analysis", 2002)

"Good communication is not just data transfer. You need to show people something that addresses their anxieties, that accepts their anger, that is credible in a very gut-level sense, and that evokes faith in the vision." (John Kotter, "The Heart of Change: Real-Life Stories of How People Change Their Organizations", 2002)

"Data is transformed into graphics to understand. A map, a diagram are documents to be interrogated. But understanding means integrating all of the data. In order to do this it’s necessary to reduce it to a small number of elementary data. This is the objective of the 'data treatment' be it graphic or mathematic." (Jacques Bertin, [interview] 2003)

"The use of computers shouldn't ignore the objectives of graphics, that are: 
 1) Treating data to get information. 
 2) Communicating, when necessary, the information obtained." (Jacques Bertin, [interview] 2003)

"Thought, without the data on which to structure that thought, leads nowhere." (Victor J Stenger, "Has Science Found God?: The Latest Results in the Search for Purpose in the Universe", 2003)

"[...] because observations are all we have, we take them seriously. We choose hard data and the framework of mathematics as our guides, not unrestrained imagination or unrelenting skepticism, and seek the simplest yet most wide-reaching theories capable of explaining and predicting the outcome of today’s and future experiments." (Brian Greene, "The Fabric of the Cosmos", 2004)

"The only way to look into the future is to use theories since conclusive data is only available about the past." (Clayton Christensen et al., "Seeing What’s Next: Using the Theories of Innovation to Predict Industry Change", 2004)

"[...] when data is presented in certain ways, the patterns can be readily perceived. If we can understand how perception works, our knowledge can be translated into rules for displaying information. Following perception‐based rules, we can present our data in such a way that the important and informative patterns stand out. If we disobey the rules, our data will be incomprehensible or misleading." (Colin Ware, "Information Visualization: Perception for Design", 2nd Ed., 2004)

"The best scientists aren't the ones who know the most data; they're the ones who know what they're looking for." (Noam Chomsky, [Guardian] 2005)

"With our heads spinning in the world of coincidence and chaos, we nevertheless must make decisions and take steps into the minefield of our future. To avoid explosive missteps, we rely on data and statistical reasoning to inform our thinking." (Michael Starbird, "Coincidences, Chaos, and All That Math Jazz", 2005)

"Perception requires imagination because the data people encounter in their lives are never complete and always equivocal. [...] We also use our imagination and take shortcuts to fill gaps in patterns of nonvisual data. As with visual input, we draw conclusions and make judgments based on uncertain and incomplete information, and we conclude, when we are done analyzing the patterns, that our picture is clear and accurate. But is it?" (Leonard Mlodinow, "The Drunkard’s Walk: How Randomness Rules Our Lives", 2008)

"Traditional statistics is strong in devising ways of describing data and inferring distributional parameters from sample. Causal inference requires two additional ingredients: a science-friendly language for articulating causal knowledge, and a mathematical machinery for processing that knowledge, combining it with data and drawing new causal conclusions about a phenomenon." (Judea Pearl, "Causal inference in statistics: An overview", Statistics Surveys 3, 2009)

On Data (2010-2019)

"Finding patterns is easy in any kind of data-rich environment; that's what mediocre gamblers do. The key is in determining whether the patterns represent signal or noise." (Nate Silver, "The Signal and the Noise: Why So Many Predictions Fail-but Some Don't", 2012)

"The four questions of data analysis are the questions of description, probability, inference, and homogeneity. [...] Descriptive statistics are built on the assumption that we can use a single value to characterize a single property for a single universe. […] Probability theory is focused on what happens to samples drawn from a known universe. If the data happen to come from different sources, then there are multiple universes with different probability models. [...] Statistical inference assumes that you have a sample that is known to have come from one universe." (Donald J Wheeler, "Myths About Data Analysis", International Lean & Six Sigma Conference, 2012)

"Readability in visualization helps people interpret data and make conclusions about what the data has to say. Embed charts in reports or surround them with text, and you can explain results in detail. However, take a visualization out of a report or disconnect it from text that provides context (as is common when people share graphics online), and the data might lose its meaning; or worse, others might misinterpret what you tried to show." (Nathan Yau, "Data Points: Visualization That Means Something", 2013)

"The inherent nature of complexity is to doubt certainty and any pretense to finite and flawless data. Put another way, under uncertainty principles, any attempt by political systems to 'impose order' has an equal chance to instead 'impose disorder'." (Lawrence K Samuels, "Defense of Chaos: The Chaology of Politics, Economics and Human Action", 2013)

"The value of having numbers - data - is that they aren't subject to someone else's interpretation. They are just the numbers. You can decide what they mean for you." (Emily Oster, "Expecting Better", 2013)

"The data is a simplification - an abstraction - of the real world. So when you visualize data, you visualize an abstraction of the world, or at least some tiny facet of it. Visualization is an abstraction of data, so in the end, you end up with an abstraction of an abstraction, which creates an interesting challenge. […] Just like what it represents, data can be complex with variability and uncertainty, but consider it all in the right context, and it starts to make sense." (Nathan Yau, "Data Points: Visualization That Means Something", 2013)

"Without context, data is useless, and any visualization you create with it will also be useless. Using data without knowing anything about it, other than the values themselves, is like hearing an abridged quote secondhand and then citing it as a main discussion point in an essay. It might be okay, but you risk finding out later that the speaker meant the opposite of what you thought." (Nathan Yau, "Data Points: Visualization That Means Something", 2013)

"Any knowledge incapable of being revised with advances in data and human thinking does not deserve the name of knowledge." (Jerry Coyne, “Faith Versus Fact”, 2015)

"As business leaders we need to understand that lack of data is not the issue. Most businesses have more than enough data to use constructively; we just don't know how to use it. The reality is that most businesses are already data rich, but insight poor." (Bernard Marr, "Big Data: Using SMART Big Data, Analytics and Metrics To Make Better Decisions and Improve Performance", 2015)

"Even properly done statistics can’t be trusted. The plethora of available statistical techniques and analyses grants researchers an enormous amount of freedom when analyzing their data, and it is trivially easy to ‘torture the data until it confesses’." (Alex Reinhart, "Statistics Done Wrong: The Woefully Complete Guide", 2015)

"The term data, unlike the related terms facts and evidence, does not connote truth. Data is descriptive, but data can be erroneous. We tend to distinguish data from information. Data is a primitive or atomic state (as in ‘raw data’). It becomes information only when it is presented in context, in a way that informs. This progression from data to information is not the only direction in which the relationship flows, however; information can also be broken down into pieces, stripped of context, and stored as data. This is the case with most of the data that’s stored in computer systems. Data that’s collected and stored directly by machines, such as sensors, becomes information only when it’s reconnected to its context."  (Stephen Few, "Signal: Understanding What Matters in a World of Noise", 2015)

"To make progress, every field of science needs to have data commensurate with the complexity of the phenomena it studies. [...] With big data and machine learning, you can understand much more complex phenomena than before. In most fields, scientists have traditionally used only very limited kinds of models, like linear regression, where the curve you fit to the data is always a straight line. Unfortunately, most phenomena in the world are nonlinear. [...] Machine learning opens up a vast new world of nonlinear models." (Pedro Domingos, "The Master Algorithm", 2015)

"The power of deep learning models comes from their ability to classify or predict nonlinear data using a modest number of parallel nonlinear steps. A deep learning model learns the input data features hierarchy all the way from raw data input to the actual classification of the data. Each layer extracts features from the output of the previous layer." (N D Lewis, "Deep Learning Made Easy with R: A Gentle Introduction for Data Science", 2016)

"The second big myth of data science is that every data science project needs big data and needs to use deep learning. In general, having more data helps, but having the right data is the more important requirement" (John D Kelleher & Brendan Tierney, "Data Science", 2018)

"Any fool can fit a statistical model, given the data and some software. The real challenge is to decide whether it actually fits the data adequately. It might be the best that can be obtained, but still not good enough to use." (Robert Grant, "Data Visualization: Charts, Maps and Interactive Graphics", 2019)

"Apart from the technical challenge of working with the data itself, visualization in big data is different because showing the individual observations is just not an option. But visualization is essential here: for analysis to work well, we have to be assured that patterns and errors in the data have been spotted and understood. That is only possible by visualization with big data, because nobody can look over the data in a table or spreadsheet." (Robert Grant, "Data Visualization: Charts, Maps and Interactive Graphics", 2019)
