Universes or samples? Why do a census and not a survey?
If you want to be sure of something, you have to ask the whole universe, because every sample is subject to error. But sometimes this is impossible or very expensive, and at other times absurd. Can you imagine verifying that every bullet in a batch is in perfect condition before sending it to the battlefield? But the opposite question can also be raised: if censuses are so expensive, why are they carried out? Why aren’t all questions answered by taking samples?
I talked about this with the Italian Ugo Broggi (1880-1965), who arrived in Buenos Aires in 1910. From 1912 he taught higher mathematics at the National University of La Plata and became, according to Elías A. De Césare, the first to teach that discipline in a modern sense. Carlos Eugenio Dieulefait considered him the introducer of mathematical statistics in Argentina.
–Manuel Fernández López points out that you also made contributions to economic analysis. What did they consist of?
–In 1918, as part of the University Reform, I taught a free course with Luis Roque Gondra, introducing students to pure or mathematical economics and familiarizing them with the works of Enrico Barone, Antonio Osorio, Maffeo Pantaleoni, Vilfredo Pareto and Marie Esprit Léon Walras.
–You returned to Pareto when he died, in 1923.
–That’s right. Following his death, the Faculty of Economic Sciences of the UBA organized a memorial event, at which I spoke for the professors and Raúl Federico Prebisch spoke for the students. I criticized the Walrasian approach of simply counting the number of equations and unknowns to prove the existence of competitive general equilibrium, raising for the first time the need to take the analysis further. Gondra publicly disavowed me with an argument from authority: if Pareto said so, it must be fine. But Pareto was a railway engineer, not a doctor of mathematics. Years later, the second generation of competitive general equilibrium experts proved me right.
–Why doesn’t any sample, except by a fluke, accurately reflect the reality of a universe?
–That is what is called sampling error. Let’s set aside the pseudo-polls that tell candidates they will triumph in the next election, only for them to lose by a landslide. In politics, even the most professional polls can go wrong. British Prime Minister David Cameron called a referendum because pollsters told him that his position, that Britain should remain in the European Union, would win. Exactly the opposite happened: Brexit won and Cameron had to resign.
–What does the size of the sampling error depend on?
–On the size of the sample, on its being truly random and on its being well stratified. The latter means that the weighting of the sample data has to replicate the structure of the universe. In a neighborhood populated by 99 people from party X and one from party Y, voting intention cannot be estimated by simply averaging what one person from party X thinks with what the only person from party Y thinks.
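The neighborhood example can be worked through in a few lines of Python. This is only an illustrative sketch: the function name `weighted_share` and the assumption that exactly one person per party is interviewed, each backing their own party, are hypothetical and not part of the interview.

```python
def weighted_share(stratum_sizes, sample_shares):
    """Estimate support for party X, weighting each stratum by its size.

    stratum_sizes: number of people in each stratum of the universe.
    sample_shares: share supporting X observed in the sample of each stratum.
    """
    total = sum(stratum_sizes.values())
    return sum(stratum_sizes[s] / total * sample_shares[s] for s in stratum_sizes)


# Hypothetical neighborhood from the text: 99 people from party X, 1 from party Y.
sizes = {"X": 99, "Y": 1}
# Assumption: one person interviewed per stratum, each backing their own party.
shares = {"X": 1.0, "Y": 0.0}

# Unweighted average of the two interviews ignores the structure of the universe.
naive = sum(shares.values()) / len(shares)
# Weighted estimate replicates the 99-to-1 structure.
weighted = weighted_share(sizes, shares)

print(f"naive: {naive:.2f}, weighted: {weighted:.2f}")
```

The unweighted figure treats the two interviews as equally representative; the weighted one restores the proportions of the universe, which is what a well-stratified sample is meant to do.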
–And also that it is not biased.
–That’s what I meant when I said it had to be truly random. When the Ministry of Commerce “agrees” with businessmen that they will hold down the prices of the products INDEC uses to estimate the inflation rate, while leaving them free to do as they please with the rest, it is biasing the estimate. By the way, on this score we have a past that leaves something to be desired, but a present that is, in principle, correct. Nothing like Turkey, whose president has just fired the head of the statistics office for having reported that the country’s inflation rate in 2021 was 36.1%.
–You’ve convinced me. So let’s go to the other extreme. Why do censuses have to be done, since they are so expensive?
–Because a sample serves to understand the characteristics of a universe, but not to calculate its size. Example: it is not necessary to ask all the inhabitants whether they have running water at home to find out what proportion of the houses have water. But how can you find out how many people a country has, except by counting them all?
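The running-water example can be sketched as follows, under stated assumptions: the true share of 80%, the sample size of 1,000 and the function names are all hypothetical, chosen only to illustrate that a simple random sample pins down a proportion (with a standard error) while saying nothing about the size of the universe.

```python
import math
import random


def estimate_proportion(sample):
    """Return (p_hat, standard_error) for a simple random sample of 0/1 answers."""
    n = len(sample)
    p_hat = sum(sample) / n
    se = math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat, se


def total_with_water(p_hat, n_houses):
    """Turning the proportion into a total requires n_houses, which the sample cannot supply."""
    return p_hat * n_houses


random.seed(0)
TRUE_SHARE = 0.8  # assumed share of houses with running water (illustrative)
sample = [1 if random.random() < TRUE_SHARE else 0 for _ in range(1000)]

p_hat, se = estimate_proportion(sample)
print(f"estimated share: {p_hat:.3f} (standard error {se:.3f})")
```

Nothing in `estimate_proportion` depends on how many houses exist; `total_with_water` does, and that number has to come from a count.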
–An implication of what you just said occurs to me.
–I see where you’re going. If what I say is true, a population census should go to every dwelling, but ask only one question: how many people live in this property? To avoid duplication, it is usually phrased like this: how many people slept in this property last night? The rest of the information should come from samples. That would save time and computing effort: first of all for the respondents, but also for the enumerators and for all the officials in charge of the census.
–Give me other examples of the need to appeal to the universe.
–Covid-19. We do not need to consult the entire population to know approximately what proportion of the tested population is infected, just as we do not need to look at the whole universe to know what proportion of those hospitalized for Covid were vaccinated, and with how many doses. But from samples alone we cannot know the total number of infected, vaccinated, and so on.
–And in the economic sphere?
–Take the case of GDP. How can we know the size of an economy without calculating total GDP? How can we calculate GDP per inhabitant without estimating the country’s population?
–But what degree of reliability can the GDP estimate have, given the extent of the informal economy in, for example, Argentina?
–Good point. Here the extremes must be avoided: neither work to the decimal point nor ignore the estimates altogether. Because, as the saying goes, statistics are like bikinis: what they show is important, but what they hide is fundamental. This is the version to mention to statisticians; everyone else must be told the opposite: that what they hide is fundamental, but what they show is important.
–Any other defense of GDP estimates?
–A couple. On the one hand, from the productive point of view, formality and informality are not watertight compartments. For example: a factory that operates informally consumes electricity; the factory is probably hooked up to the grid illegally, but the power generation is recorded. The other defense is that analyses pay more attention to variations than to absolute levels, and it is plausible that the growth rates of the formal and informal portions of the economy do not differ much. Although, of course, it is an empirical question, and therefore subject to verification.
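The second defense can be made concrete with a small sketch. It assumes, purely for illustration, that total growth is the share-weighted average of formal and informal growth; the shares and rates used are hypothetical numbers, not estimates for any country.

```python
def total_growth(formal_share, g_formal, g_informal):
    """Growth of the whole economy as a share-weighted average of its two portions."""
    return formal_share * g_formal + (1 - formal_share) * g_informal


def measurement_gap(formal_share, g_formal, g_informal):
    """Error made when formal-sector growth is used as a stand-in for total growth."""
    return g_formal - total_growth(formal_share, g_formal, g_informal)


# Hypothetical economy: 60% formal, 40% informal.
same_rates = measurement_gap(0.6, 0.03, 0.03)       # both portions grow 3%
different_rates = measurement_gap(0.6, 0.03, 0.01)  # informal grows only 1%

print(f"gap with equal rates: {same_rates:.4f}")
print(f"gap with different rates: {different_rates:.4f}")
```

The gap equals the informal share times the difference between the two growth rates, so it vanishes when the two portions grow alike, however large the informal portion is. Whether they do grow alike remains the empirical question raised in the answer.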
–Don Ugo, thank you very much.