Population and sample in your thesis: how to define who takes part in your research

One of the moments that paralyzes thesis writers the most arrives when they have to answer this question in the methodological chapter: “what is your population and what is your sample?”. And the answer most of them give isn’t entirely wrong, but it isn’t complete either.
The underlying problem is that most methodology textbooks present these concepts in a rigid way, disconnected from the rest of the design. So the thesis writer learns to define population and sample, but not to understand why they exist, when they apply, and how they relate to the holotype of their research.
From the holistic understanding of science, proposed by Dr. Jacqueline Hurtado de Barrera, these concepts are framed within a broader system that includes the unit of study, the sources, and the informants. Understanding that complete architecture is what makes the difference between a solid methodological framework and one the committee will question.
Before talking about population: the unit of study
In the holistic understanding of science, the central term is not “population” but the unit of study. Why? Because “population” carries connotations that don’t always apply: it suggests a group of people, when in fact your research might study documents, organizations, events, processes, objects, or situations.
The unit of study is the singular element from which the relevant information for the research is obtained. It is the basic unit of analysis. Everything else (population, sample, data-collection techniques) is defined in relation to it.
Examples of units of study:
- In research on academic performance: the student (not the institution, not the classroom).
- In research on business management: the company (or the manager, depending on the event of study).
- In research on organizational communication: the message or the document (not the person who wrote it).
- In research on teaching practices: the class session (not the teacher or the student separately).
Defining the unit of study well before talking about “population” avoids one of the most common mistakes: confusing who takes part in the research with what is being studied in it.
What is the population in a research project?
The population (or, more precisely in holistic terminology, the holos) is the total set of units of study that share the characteristics defined in the event of study and that are part of the research context.
Three elements define it:
- The characteristics of the event of study: what qualities must the units have to be part of this research?
- The space-time context: where and during what period are they located?
- The boundaries of access: which ones are available to be studied?
Example: if your research seeks to describe the internal communication strategies of small manufacturing companies in the city of Caracas during 2025, your population is the set of all small manufacturing companies in Caracas active in that period. That is your reference universe.
When do you need a sample?
Here comes the point that generates the most confusion: not every research project requires a sample.
The sample is a subset of the population that is selected when it is not possible or convenient to study all of the units. It only makes sense when:
- The population is too large to be addressed in full.
- Time, budget, or access resources don’t allow studying the whole universe.
- The research holotype implies generalizing results (broad descriptive, explanatory, predictive, confirmatory).
Conversely, if your research is a qualitative single-case study, or you study a complete organization, or your holotype is projective and you only need information from a specific group to design a proposal, you may work with all available cases without drawing a formal sample.
The practical rule: if you can access your entire population and it makes sense to do so given your research question, work with all of it. If you can’t or don’t need to, you define a sample.
Types of sampling: which one applies to your thesis
There are two broad families of sampling. The choice between them depends on your research holotype and on what you intend to do with the results:
Probability sampling
It means that every unit of the population has a known probability (greater than zero) of being selected. It is the type of sampling that allows you to statistically generalize the results to the universe. Its most common variants are:
- Simple random: units are drawn at random from the whole population, each with the same chance of being selected.
- Stratified: the population is divided into subgroups (strata) and a random sample is drawn from each, usually in proportion to the size of the stratum.
- Cluster: already-formed groups (classrooms, departments, regions) are selected and studied fully or partially.
- Systematic: one unit is selected from each defined interval (every 10th, every 20th, etc.) from a random starting point.
It applies mainly to research of broad descriptive, comparative, explanatory, predictive, and confirmatory holotype, where the results aim to represent the whole population.
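The four variants above can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed procedure: the function names, the `seed` parameter, and the `stratum_of` helper are hypothetical, and the population is assumed to be an ordinary list of units (for example, company IDs from a registry).

```python
import random


def simple_random(units, n, seed=None):
    """Draw n units at random, without replacement, from the whole list."""
    rng = random.Random(seed)
    return rng.sample(units, n)


def systematic(units, n, seed=None):
    """Take one unit per interval of size k, from a random starting point."""
    rng = random.Random(seed)
    k = len(units) // n            # sampling interval (assumes n <= len(units))
    start = rng.randrange(k)       # random start inside the first interval
    return [units[start + i * k] for i in range(n)]


def stratified_proportional(units, stratum_of, n, seed=None):
    """Draw from each stratum in proportion to its share of the population.

    Because each share is rounded, the total can differ slightly from n.
    """
    rng = random.Random(seed)
    strata = {}
    for unit in units:
        strata.setdefault(stratum_of(unit), []).append(unit)
    sample = []
    for members in strata.values():
        share = round(n * len(members) / len(units))
        sample.extend(rng.sample(members, min(share, len(members))))
    return sample


def cluster(clusters, n_clusters, seed=None):
    """Pick whole, already-formed groups at random and keep all their units.

    `clusters` is assumed to be a dict mapping a group name to its units.
    """
    rng = random.Random(seed)
    chosen = rng.sample(list(clusters), n_clusters)
    return [unit for name in chosen for unit in clusters[name]]
```

Passing a fixed `seed` makes the draw reproducible, which helps when the committee asks how the units were selected.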
Non-probability sampling
The selection of units does not follow strict randomness criteria. It does not allow statistical generalization, but it is completely valid and rigorous when the research does not seek that kind of generalization. Its most-used variants are:
- Purposive or by criteria: the units that best represent the phenomenon are selected according to theoretical criteria defined by the researcher. Widely used in qualitative research.
- Snowball: one unit leads to the next, useful when the population is hard to access (specific groups, experts, closed communities).
- Convenience: you work with the available units. Valid in exploratory studies or when access is the only limitation.
- Quota: proportions of certain profiles are set and units are selected until each quota is filled.
It applies to exploratory, analytical, projective, interactive, and evaluative research, where the depth of analysis matters more than statistical representativeness.
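Of the non-probability variants, quota sampling is the most mechanical, so it lends itself to a short sketch. The code below is only an illustration under assumptions: `quota_sample`, `profile_of`, and the shape of `quotas` (a dict mapping each profile to the number of units wanted) are hypothetical names, and units are taken in the order in which they become available.

```python
def quota_sample(units, profile_of, quotas):
    """Select units in arrival order until every profile's quota is filled.

    Returns the selected units and the number still missing per profile.
    """
    remaining = dict(quotas)       # copy, so the caller's dict is untouched
    selected = []
    for unit in units:
        profile = profile_of(unit)
        if remaining.get(profile, 0) > 0:
            selected.append(unit)
            remaining[profile] -= 1
        if not any(remaining.values()):
            break                  # all quotas are filled
    return selected, remaining
```

If any value in the returned `remaining` dict is above zero, the available units were not enough to fill that quota, which is worth reporting in the methodological chapter.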
How to calculate the sample size
If probability sampling applies, the sample size is not an arbitrary decision. It is calculated with a formula that considers:
- The population size (N)
- The desired confidence level (usually 95%, equivalent to z = 1.96)
- The acceptable margin of error (e, frequently 5%)
- The estimated proportion of the characteristic in the population (p = 0.5 if unknown, because that value maximizes p × q and therefore yields the largest, most conservative sample)
The most widely used formula for finite populations is:
n = (N × z² × p × q) / (e² × (N − 1) + z² × p × q)
Where q = 1 − p. If the population is very large or unknown, the formula for infinite populations is used: n = (z² × p × q) / e².
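Both formulas can be checked with a short Python sketch. The function names are hypothetical, the defaults mirror the usual values mentioned above (z = 1.96, p = 0.5, e = 0.05), and rounding the result up to the next whole unit is an assumption: it is a common convention, not something the formula itself dictates.

```python
import math


def sample_size_finite(N, z=1.96, p=0.5, e=0.05):
    """n = (N * z^2 * p * q) / (e^2 * (N - 1) + z^2 * p * q), rounded up."""
    q = 1 - p
    n = (N * z**2 * p * q) / (e**2 * (N - 1) + z**2 * p * q)
    return math.ceil(n)


def sample_size_infinite(z=1.96, p=0.5, e=0.05):
    """n = z^2 * p * q / e^2, rounded up."""
    q = 1 - p
    return math.ceil(z**2 * p * q / e**2)


print(sample_size_finite(1000))   # 278
print(sample_size_infinite())     # 385
```

For a population of 1,000 units the calculation gives 278, while the infinite-population formula gives 385 under the same settings. The finite formula never asks for more units than the population contains.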
For non-probability research, the theoretical saturation criterion replaces the statistical calculation: you keep incorporating units until the information obtained stops adding new elements to the analysis.
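Saturation is ultimately a judgment the researcher makes, but the bookkeeping behind it can be sketched. The example below assumes each unit (an interview, a document) has already been coded into a set of categories, and it treats saturation as reached after a chosen number of consecutive units that add no new category. Both the function name and the `patience` threshold are hypothetical choices for illustration, not a standard rule.

```python
def saturation_point(coded_units, patience=2):
    """Return the 1-based position of the unit at which saturation is reached.

    Saturation here means `patience` consecutive units contributing no new
    category. Returns None if that never happens in the data provided.
    """
    seen = set()
    quiet = 0                      # consecutive units with nothing new
    for position, codes in enumerate(coded_units, start=1):
        new = set(codes) - seen
        seen |= new
        quiet = 0 if new else quiet + 1
        if quiet >= patience:
            return position
    return None


interviews = [{"workload", "pay"}, {"pay", "schedule"}, {"workload"}, {"schedule"}]
print(saturation_point(interviews))   # 4
```

In this made-up example the third and fourth interviews add no new category, so with `patience=2` the count stops at the fourth.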
The most frequent mistake: confusing who informs with what is studied
This is where thesis writers make their costliest mistake: confusing the unit of study with the source or with the informant.
Recall the distinctions from the previous blog article (“Unit of study, source, and informant: who’s who in your thesis?”):
- The unit of study is what is being investigated: the company, the student, the text, the process.
- The source is where the information comes from: a document, a record, a database.
- The informant is who provides the information about the unit of study: it can be the unit itself (a worker answering about their own experience) or someone external (a supervisor reporting on their team).
The classic error: a study on managerial practices in small companies that declares the managers of those companies as its “population.” The managers are the informants, not the unit of study. The unit of study is the company. This mistake seems minor, but it changes how you write the design, the techniques, and the interpretation of the data.
What does this mean for your thesis?
Before writing the population-and-sample section of your methodological chapter, answer these four questions in order:
- What is my unit of study? (the concrete entity the analysis will fall upon)
- What is the totality of those units in my context? (that is your population)
- Can and should I study all of them, or do I need to select a subset? (this determines whether there is a sample and what type of sampling applies)
- Who or what will provide the information about those units? (that is your informant or your source)
If you answer those four questions clearly, the population-and-sample section of your thesis writes itself. The problem, in almost every case, is that people try to answer the fourth without having properly resolved the first.