Douglas D. Heckathorn Discusses Respondent-Driven Sampling (RDS)
New Hot Paper Commentary, September 2010
Article: EXTENSIONS OF RESPONDENT-DRIVEN SAMPLING: ANALYZING CONTINUOUS VARIABLES AND CONTROLLING FOR DIFFERENTIAL RECRUITMENT
Authors: Heckathorn, DD
Douglas D. Heckathorn talks with ScienceWatch.com and answers a few questions about this month's New Hot Papers paper in the field of Social Sciences, general.
Why do you think your paper is highly cited?
This paper further improves my sampling method, Respondent-Driven Sampling (RDS), which has become the method of choice for studies of geographically dispersed hard-to-reach populations. Studying these populations is challenging because they lack a sampling frame (i.e., an exhaustive list of population members), they are usually small relative to the general population, and their social networks are difficult for strangers to penetrate.
Examples include groups such as drug users, prostitutes, the homeless, undocumented workers, street youth, musicians, artists, and gay men. These groups are important to social science research, public health, public policy, and studies of arts and culture.
The aforementioned populations have sometimes been studied using institutional or location-based sampling, but such studies are limited by an incomplete sampling frame. For example, in New York City only 22% of jazz musicians are members of the musicians' union, and these members are on average 10 years older and earn nearly double the income of nonmembers, who appear on no public list.
So, how does RDS function? First, it accesses members of hidden populations through their social networks, employing a variant of a snowball or "chain-referral" approach. As in all such samples, the study begins with a set of initial respondents who serve as "seeds." These then recruit their acquaintances, friends, or relatives who qualify for inclusion in the study to form the first "wave."
The first wave respondents then recruit the second wave, who in turn recruit the third wave, and so forth. The sample expands in this manner, growing wave by wave, in the manner of a snowball increasing in size as it rolls down a hill.
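The wave-by-wave growth described above can be sketched as a breadth-first process over a contact network. This is a minimal, hypothetical simulation of chain-referral recruitment only, with none of the RDS weighting: the example network, the function name `chain_referral_sample`, and the `coupons` cap on how many peers each respondent may recruit are assumptions made for illustration.

```python
import random
from collections import deque


def chain_referral_sample(network, seeds, coupons=3, max_size=50, rng=None):
    """Grow a chain-referral ("snowball") sample wave by wave.

    network:  dict mapping each person to a list of acquaintances (hypothetical data).
    seeds:    initial respondents, who form wave 0.
    coupons:  assumed cap on how many peers each respondent may recruit.
    max_size: stop once the sample reaches this many people.
    Returns a dict mapping each recruited person to the wave in which they entered.
    """
    rng = rng or random.Random(0)  # fixed seed so runs are reproducible
    wave = {s: 0 for s in seeds}
    queue = deque(seeds)  # respondents waiting to recruit, in order of entry
    while queue and len(wave) < max_size:
        recruiter = queue.popleft()
        # Only acquaintances not already in the sample can be recruited.
        eligible = [p for p in network[recruiter] if p not in wave]
        rng.shuffle(eligible)
        for recruit in eligible[:coupons]:
            if len(wave) >= max_size:
                break
            wave[recruit] = wave[recruiter] + 1  # recruits join the next wave
            queue.append(recruit)
    return wave


contacts = {"a": ["b", "c"], "b": ["a", "d"], "c": ["a", "e"], "d": ["b"], "e": ["c"]}
print(chain_referral_sample(contacts, seeds=["a"]))
```

In this toy network the seed "a" forms wave 0, its acquaintances "b" and "c" form wave 1, and their acquaintances "d" and "e" form wave 2.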
Then, RDS combines snowball sampling with a mathematical model that weights the sample to compensate for the fact that it was not drawn as a simple random sample. This procedure controls for four biases that are inherent in any snowball sample:
- The seeds cannot be recruited randomly, because if that were possible, the population would not qualify as "hidden" in the first place.
- Respondents recruit their acquaintances, friends, and family members, whom they tend to resemble in income, education, race/ethnicity, religion, and other factors.
- Respondents who are well-connected tend to be over-sampled, because more recruitment paths lead to them.
- Population subgroups vary in how effectively they can recruit, so the sample reflects disproportionately the recruitment patterns of the most effective recruiters.
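The third bias, over-sampling of well-connected respondents, is commonly handled by weighting each respondent by the inverse of his or her network size (degree). The sketch below shows that idea as the inverse-degree estimator found in the RDS literature (often called RDS-II or the Volz-Heckathorn estimator). It is offered as an illustration, not as the estimator used in this paper, and the function name and inputs are hypothetical.

```python
def rds_ii_proportion(in_group, degrees):
    """Inverse-degree-weighted estimate of the population share of a group.

    in_group: list of booleans, True if respondent i belongs to the group.
    degrees:  each respondent's self-reported network size (must be > 0).
    Well-connected respondents are down-weighted by 1/degree because more
    recruitment paths lead to them.
    """
    if len(in_group) != len(degrees):
        raise ValueError("in_group and degrees must have the same length")
    weights = [1.0 / d for d in degrees]
    return sum(w for w, g in zip(weights, in_group) if g) / sum(weights)


# Half the sample is in the group, but those respondents have large networks.
print(rds_ii_proportion([True, True, False, False], [10, 10, 2, 2]))
```

Here the raw sample share is 50%, but because the group's members report networks five times larger, the weighted estimate drops to 1/6 (about 0.167).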
RDS works because it is based on a mathematical model of the network-recruitment process which functions somewhat like a corrective lens, controlling for the distorting effects of network structure on the sampling process to produce an accurate estimate of population characteristics.
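One way to see how a model of the recruitment process can act as a corrective lens is the two-group reciprocity argument used in RDS estimators: in an undirected network, the number of ties running from group A to group B equals the number running from B to A. Writing N for group size, D for average degree, and S_AB for the share of A-recruiters' recruits who are in B, this gives N_A * D_A * S_AB = N_B * D_B * S_BA, so the population share of A is S_BA * D_B / (S_AB * D_A + S_BA * D_B). The sketch below assumes the average degrees are already estimated; the function and argument names are hypothetical.

```python
def rds_i_two_group(pairs, mean_degree):
    """Two-group estimate of the population share of group "A".

    pairs:       list of (recruiter_group, recruit_group) tuples, groups "A" or "B".
    mean_degree: dict of the (assumed already estimated) average network size of
                 each group, e.g. {"A": 4.0, "B": 8.0}.
    Based on reciprocity: N_A * D_A * S_AB = N_B * D_B * S_BA.
    Raises ZeroDivisionError if a group recruited no one or there are no
    cross-group recruitments at all.
    """
    def cross_share(src, dst):
        # Share of src-group recruiters' recruits who belong to dst.
        recruits = [r for g, r in pairs if g == src]
        return sum(1 for r in recruits if r == dst) / len(recruits)

    s_ab = cross_share("A", "B")
    s_ba = cross_share("B", "A")
    d_a, d_b = mean_degree["A"], mean_degree["B"]
    return (s_ba * d_b) / (s_ab * d_a + s_ba * d_b)


pairs = [("A", "A")] * 6 + [("A", "B")] * 4 + [("B", "A")] * 3 + [("B", "B")] * 7
print(rds_i_two_group(pairs, {"A": 4.0, "B": 8.0}))
```

With S_AB = 0.4, S_BA = 0.3, D_A = 4, and D_B = 8, the estimate is 2.4 / (1.6 + 2.4) = 0.6. Because S_AB and S_BA are proportions within each recruiter group, a group that simply recruits more peers does not by itself shift the estimate, which is how this kind of estimator addresses the fourth bias.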
Hundreds of studies in dozens of countries have used this sampling method, so ways of increasing its accuracy are of significant interest.
Does it describe a new discovery, methodology, or synthesis of knowledge?
RDS was originally introduced in the late 1990s in a paper titled "Respondent-Driven Sampling: A New Approach to the Study of Hidden Populations" (Heckathorn DD, Social Problems 44[2]: 174-99, May 1997). As the number of researchers using the method has grown, a research community has emerged to develop it further, so its development has become highly collaborative.
These efforts focus on identifying the conditions under which the method yields reliable and valid results, on reducing the method's dependence on restrictive assumptions, and on refinements to increase the method's validity and reliability.
Previous applications of the method had generally focused on categorical variables, such as race, ethnicity, and HIV status. The current paper's principal contribution is a means of analyzing continuous variables that is not biased when some groups recruit more of their peers than others. This increases the accuracy of quantitative estimates such as the level of HIV risk behavior among drug users or income among lower-tier workers.
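The interview does not spell out the paper's procedure for continuous variables, so the following is only a hypothetical sketch of the general idea, not the paper's estimator. It estimates a population mean for a continuous variable such as weekly income by (1) weighting respondents within each group by the inverse of their network size and (2) combining the group means using estimated population shares rather than each group's share of the sample. The group shares, function name, and inputs are assumptions.

```python
def weighted_mean_continuous(values, groups, degrees, pop_share):
    """Estimate a population mean for a continuous variable (e.g. weekly income).

    values:    the continuous measurement for each respondent.
    groups:    each respondent's group label.
    degrees:   each respondent's reported network size (> 0).
    pop_share: hypothetical estimated population share of each group
               (shares should sum to 1); every group listed must appear in the sample.
    Within each group, respondents are weighted by 1/degree; groups are then
    combined using pop_share instead of their share of the sample, so a group
    that recruited more heavily does not dominate the estimate.
    """
    total = 0.0
    for g, share in pop_share.items():
        idx = [i for i, gi in enumerate(groups) if gi == g]
        w = [1.0 / degrees[i] for i in idx]
        group_mean = sum(wi * values[i] for wi, i in zip(w, idx)) / sum(w)
        total += share * group_mean
    return total


est = weighted_mean_continuous(
    [100, 200, 300, 300, 300], ["A", "A", "B", "B", "B"], [1, 3, 2, 2, 2], {"A": 0.5, "B": 0.5}
)
print(est)
```

In this toy example the unweighted sample mean is 240, while the sketch gives 212.5 (group A's degree-weighted mean is 125 and group B's is 300). Adding more group-B respondents with the same values would leave the estimate unchanged, since each group's weight comes from `pop_share` rather than from how many recruits it produced.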
Would you summarize the significance of your paper in layman's terms?
This paper provides improved means for studying groups that are often missed by standard surveys, such as drug users, the homeless, and undocumented workers. It thereby makes it possible to replace impressionistic descriptions of these groups with scientifically valid estimates.
How did you become involved in this research, and how would you describe the particular challenges, setbacks, and successes that you've encountered along the way?
RDS grew out of a project focusing on reducing HIV-risk behavior among injecting drug users in Connecticut. Robert Broadhead (Professor of Sociology, University of Connecticut) and I had received a grant from the National Institute on Drug Abuse to combat HIV infection among rural drug users. We used a snowball method to recruit respondents into the study.
Mathematical modeling of the recruitment process led me to a surprising discovery: when the level of clustering in the drug users' networks was uniform across groups, the composition of the recruits became stable after a modest number of waves, and, furthermore, the recruits proved to be exactly representative of the population from which the sample was drawn. Through subsequent work, these unexpected discoveries led to the development of RDS.
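The stabilization of recruit composition after a modest number of waves can be illustrated with a simple Markov-chain model in which each wave's composition depends only on the previous wave. This is a simplified sketch, not the model from the original work: the two groups, the transition probabilities, and the function name are hypothetical. Whatever the seeds' composition, the waves approach the same equilibrium.

```python
def wave_composition(transition, start, waves):
    """Track the group composition of each wave under a Markov recruitment model.

    transition[i][j]: hypothetical probability that a recruiter in group i
                      recruits someone in group j (each row sums to 1).
    start:            composition of the seeds, e.g. [1.0, 0.0] for all seeds in group 0.
    Returns the list of compositions for waves 0..waves.
    """
    comps = [list(start)]
    n = len(start)
    for _ in range(waves):
        prev = comps[-1]
        # Next wave: share in group j = sum over recruiter groups i of prev[i] * P(i -> j).
        comps.append([sum(prev[i] * transition[i][j] for i in range(n)) for j in range(n)])
    return comps


T = [[0.7, 0.3], [0.2, 0.8]]
for seeds in ([1.0, 0.0], [0.0, 1.0]):
    print(wave_composition(T, seeds, 10)[-1])
```

With this transition matrix, seeds drawn entirely from group 0 and seeds drawn entirely from group 1 both converge toward a 40/60 split, because at equilibrium the flow from group 0 to group 1 (0.4 x 0.3 = 0.12) balances the flow back (0.6 x 0.2 = 0.12).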
Initially, this work evoked deep skepticism, for conventional wisdom, as reflected in sampling textbooks, held that snowball-type network samples could never be statistically or scientifically valid. Legitimacy came as network-based sampling methods gained acceptance, in particular adaptive sampling and link-tracing designs.
Where do you see your research leading in the future?
RDS provides a method for studying the structure of networks that are too large to study using brute-force methods. This potential rests on a proof showing that when the sampling process stabilizes, the method draws random samples of network ties. The implication is that RDS can provide a statistically based picture of very large networks of initially unknown size and structure.
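The claim about sampling network ties can be illustrated with a standard random-walk argument, under the simplifying assumption (not stated in the interview) that each respondent passes a single referral to a uniformly chosen contact. On a connected, undirected, non-bipartite network, such a walk settles into a distribution in which each person is reached in proportion to his or her degree; since the walk then leaves along each of that person's ties with probability 1/degree, every tie is traversed equally often. The network and function names below are hypothetical.

```python
def stationary_distribution(network, iterations=500):
    """Power iteration for a simple random walk on an undirected network.

    network: dict mapping each node to its neighbours; assumed connected,
             non-bipartite, with every node having at least one neighbour.
    """
    nodes = list(network)
    p = {u: 1.0 / len(nodes) for u in nodes}  # start from a uniform distribution
    for _ in range(iterations):
        nxt = {u: 0.0 for u in nodes}
        for u in nodes:
            share = p[u] / len(network[u])  # u refers to each contact equally often
            for v in network[u]:
                nxt[v] += share
        p = nxt
    return p


def tie_probabilities(network):
    """Probability that a single step at equilibrium travels along tie (u, v)."""
    p = stationary_distribution(network)
    return {(u, v): p[u] / len(network[u]) for u in network for v in network[u]}


net = {"a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b", "d"], "d": ["c"]}
print(tie_probabilities(net))
```

For a triangle a-b-c with an extra contact d attached to c, there are four ties (eight directions of travel), and every direction ends up with a probability of about 1/8, even though c, with three contacts, is visited three times as often as d.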
Do you foresee any social or political implications for your research?
The method has implications in several areas. One is public health. For example, the CDC employs it as part of its National HIV Behavioral Surveillance (NHBS) program, which focuses on persons at high risk of HIV infection, including injecting drug users and high-risk heterosexuals. The method is used both to estimate HIV prevalence and to estimate levels of risk behavior. This provides a means of efficiently targeting preventive interventions and of assessing their effectiveness. The method is also widely used internationally, for example by Global AIDS, UNAIDS, Gates India, and others in developing countries.
Another implication concerns lower-tier and undocumented workers, defined as the bottom third of the work force. A Russell Sage Foundation-funded RDS study of 4,387 workers in New York City, Chicago, and Los Angeles revealed that two thirds had suffered a workplace violation during the preceding week, such as not receiving overtime pay or the minimum wage, and that this had cost each worker an average of $31 of his or her $339 weekly income. The study also revealed which industries and which types of workers had the highest victimization rates, findings that can help in targeting enforcement resources.
Douglas D. Heckathorn
Professor of Sociology
Cornell University
Ithaca, NY, USA
KEYWORDS: HIDDEN POPULATIONS; MULTIPLICITY; USERS; MEN.