Passing Camels through the Eye of a Needle:
The Effort to Create Internationally Comparable Social-Science-Based Longitudinal Data Sets in Canada
Richard V. Burkhauser
Department of Policy Analysis and Management
Cornell University
Research Professor, DIW Berlin
We thank Gert Wagner for careful readings of earlier drafts of this paper and Reverend Michael Mahler for his biblical advice. Partial funding for the research on which this paper is based came from the United States National Institute on Aging. The findings and conclusions expressed here are solely those of the authors and do not represent the views of the NIA.
“It is easier to pass a camel through the eye of a needle than for a rich man to enter the kingdom of heaven” (Matthew 19:24).
As a kid growing up in a lower-middle-class Roman Catholic household, I always enjoyed watching my more prosperous fellow congregants squirm during the annual sermon based on this passage from Matthew. But my enjoyment has diminished over the years as my own fortunes have improved and the odds of me passing this test have concomitantly fallen.
Luckily, a recent talk with a biblical scholar has caused me to be more optimistic about my eternal reward. When this biblical passage was written, camels were the major means of transportation and the eye of the needle was the narrow passage between the mountains through which the camels were led, rather than the eye of my mother’s sewing needle I had always imagined. Hence, while my odds are still not good, I figure it’s not impossible.
With this in mind, let us say to our Canadian readers: “It is easier to pass a camel through the eye of a needle, than for a government statistical agency to successfully create and market an internationally comparable long-term, social-science-based longitudinal data set.”
The reasons for this prognosis are many, but perhaps the most important evidence is that it has never been done. Here we briefly discuss the most spectacular failure by government statistical agencies to launch ex ante comparable all-age, country-based, long-term panel data sets. We then contrast this experience with successful long-term all-age and older-age cohort panel data sets, either ex ante or ex post comparable, that have been launched by non-governmental organizations (NGOs). Finally, we outline what are, in our view, the most important hurdles Statistics Canada faces in successfully creating and marketing new internationally comparable all-age or older-age cohort long-term social-science-based longitudinal data sets. We do so from our vantage point as researchers who have worked over the last 15 years to make three of the longest running and most successful all-age, long-term, social-science-based panel data sets available to the international research community, with funding at various times from the United States National Institute on Aging and other government agencies and most especially with the cooperation of the researchers in charge of these country-based panels.
The promise of empirical evidence to inform policy makers about their population’s health, wealth, employment, and economic well being has propelled governments to invest in the harmonization of country-specific, social-science-based long-term longitudinal micro data over the last 15 years. Furthermore, the advent of high quality and relatively easy to collect biomarkers promises to dramatically improve the ability of these data sets to disentangle the consequences of nurture and nature on life cycle success outcomes both within and across countries. But to succeed, these projects have had to overcome specific country restrictions on access to their country’s data by the worldwide research community as well as the problems associated with either ex ante or ex post harmonization of country data.
1. Long-Term, Social-Science-Based Panel Data Collection Efforts
Most OECD countries regularly survey a large representative sample of their populations. They do so to document the economic well being, labor market outcomes, and health of their citizens, and to gauge whether and how effectively public policies have improved their lives, as measured by social success parameters developed from the data. For the most part these predominantly cross sectional data sets are collected by the country’s central statistical agency.
In addition, organizations in several OECD countries have launched or have attempted to launch all-age, long-term, social-science-based longitudinal surveys to capture movement in these measures as well as the life course events that influence them from a dynamic perspective. Three of the most successful of these surveys are: The United States Panel Study of Income Dynamics (PSID) begun in 1968, The German Socio-Economic Panel (SOEP) begun in 1984, and the British Household Panel Survey (BHPS) begun in 1991.
In addition, organizations in a growing number of countries have fielded or have begun to field longitudinal micro-samples that focus on different age cohorts as they pass through some critical life events. Examples include the National Longitudinal Survey of Children and Youth (NLSCY) in Canada, the National Child Development Study in Great Britain, and the National Longitudinal Survey of Youth in the United States, which focus on cohorts of children, as well as the Health and Retirement Study (HRS) in the United States, the English Longitudinal Study of Ageing (ELSA), and the new Survey of Health, Ageing and Retirement in Europe (SHARE), which focus on older-age cohorts transitioning into retirement.
Canada has many excellent representative cross sectional data sets under the direction of its central statistical agency as well as several short-term panels, including the all-age family of social-science-based panels contained in the Survey of Labour and Income Dynamics (SLID), which are supplemented with excellent administrative records data. But the only major longer term ongoing Canadian panel is the NLSCY, which is following a cohort of children aged 10-11 in 1994-1995 and is scheduled to end when the children reach age 25. (Its most recently released data are for 2000-2001.) To date, Canada has not invested in long-term longitudinal all-age or older-age cohort social-science-based data sets that, in principle, can span decades.
While most of the early long-term panel surveys were developed to evaluate outcomes in a specific country, over the last 15 years these data have increasingly been made ex post comparable (e.g. Cross-National Equivalent File, Consortium of Household Panels for European Socio-economic Research) for use in cross-national studies. And more recently a new generation of ex ante coordinated country-based surveys (SHARE) has been launched whose purpose is both to evaluate country outcomes and to compare those outcomes across countries. (See: Burkhauser and Smeeding, 2001 and Burkhauser and Lillard, 2005, for critical reviews of these data efforts.)
An even more recent international movement with respect to long-term panel data is that most of these data projects have already committed to or are drawing up plans to add biomarkers to their core social-science-based data. In doing so, they will expand the ability of social and physiological scientists to more objectively measure the impact of health on social outcomes as well as the impact of the social environment on the health outcomes of their populations. Thus, the addition of biomarker data to social science survey data will enable researchers to provide the information their country’s social policy makers need to produce evidence-based policy. (See: Burkhauser and Lillard, 2006 for a critical overview of the value of biomarker data in social-science-based data sets. See Cawley and Burkhauser, 2006 for an example of the value of better measures of fatness for social-science-based research on the consequences of obesity.)
As Statistics Canada considers the feasibility of developing new long-term panel data, it should review the relative merits of all-age versus age cohort panels and the importance of including biomarkers in these social-science-based data sets. More generally, Statistics Canada should review the added value of making its data either ex ante comparable or at least ensuring that they can feasibly be made comparable ex post with mature data sets from other OECD countries.
2. Past Failures to Create and Market Long-Term Panels by Statistical Agencies
The most comprehensive attempt by a governmental statistical agency to create an ex ante harmonized data set was the European Community Household Panel (ECHP) (Web address: www.epunet.essex.ac.uk/echp.php ). Led by Eurostat, the ECHP attempted, by using a common survey instrument, to create a set of country-based data sets that were comparable across countries. The ECHP goal was to create comparable panel data for all European Community countries.
While these data were collected from 1994 through 2001, the ECHP’s goal of creating harmonized data through a common survey instrument was not successful. The panels were abandoned in 2001. The ECHP was plagued by problems from the outset. In part, these problems arose because the ECHP was developed by Eurostat and implemented by each country’s statistical agency with little or no consultation with the research community. Hence, unlike the successful harmonization efforts discussed below, end users played only a minor role in the creation and implementation of the survey instrument. Most troubling, the ECHP project failed to utilize the long experience of researchers who were running mature panel surveys in European Community countries.
After two waves of the ECHP it was clear that several major country panel surveys had unsustainable attrition problems. In Germany, Great Britain, and the Netherlands, ECHP panels were abandoned and replaced with existing panels (SOEP, BHPS, and the Dutch Socio-Economic Panel, SEP, respectively) run by researchers outside their country’s statistical agencies.
After collecting data for only eight years, the ECHP ended in 2001. The accumulated problems which led to its demise included:
- Long delays in processing
- Problems with initial responses
- Problems with attrition rates
- Non-uniform implementation
- Lack of input from the research community in design and response to users over time
- Initial failure to take advantage of existing panels
- Poor dissemination strategy to get the data to the international research community
- High costs of use for individual researchers.
3. Successful Examples of Long-Term Panel Creation and Marketing
3.1. All-age Panels
The PSID is the first and longest running all-age social-science-based panel. Over the past 20 years the PSID has inspired a generation of similarly focused panels in other OECD countries, including the SOEP and BHPS. These three data sets have a common funding and administrative history. While all three surveys are funded either directly or indirectly by their federal governments, each receives funding through peer reviewed competitions. Each is affiliated either with a university or a peer reviewed research institute whose mission is primarily research based. Most of their key administrators are academics who are also active researchers. Most of their data managers also have active research agendas and are expected to publish as well as develop and monitor core data.
Hence, while all their team members are committed to the creation and distribution of high quality data, they do so from the perspective of the active researchers they serve. Furthermore, each organization has an outside board of overseers whose members represent both the agencies that finance the data collection and members of the research community that use the data. Perhaps as a consequence, each survey’s refunding importantly depends on the ability of the organization to provide timely and useful data to the end users, the research community. Finally, each organization and its board of overseers defines the research community to include not only domestic but also international researchers.
These non-governmental, research-centered organizations are therefore highly sensitive to the demands of end users and aware of cutting edge developments in theory and methods in the research literature, and they have more leeway to implement path breaking innovations in data and data collection than do data efforts directly controlled by central statistical agencies. Organizationally, this makes them more likely to correctly adjust their data collection efforts to changes in the theory-driven research demands for their data. Together, the characteristics of these organizations make it more likely that the data they produce will be widely used for future evidence-based policy making.
3.2. Cohort Panels
An important new source of data for cross-national research has been the creation of panel surveys capturing the economic well-being, labor force outcomes, health and wealth of a cohort of older working-age people. The model for almost every recent OECD country cohort study is the United States Health and Retirement Study (HRS). Begun in 1992, the HRS was funded by a consortium of government agencies led by the National Institute on Aging. Like the PSID, SOEP, and BHPS, the HRS is run by active researchers through a university or research centered institute. Its key decisions are also actively monitored by an outside board of overseers. In addition, the HRS has very active subcommittees of multidisciplinary researchers, most of whom are not directly affiliated with the home organization (University of Michigan), and who represent the various social science disciplines that use the data. (See: Juster and Suzman, 1995 for a history of the intellectual development of the HRS.)
Two other panel studies, both modeled closely on the design of the HRS, have been recently launched in Europe. The English Longitudinal Study of Ageing (ELSA), begun in 2002, has a funding and management structure similar to that of the HRS. However, ELSA has an even wider group of academic disciplines represented in its board of overseers and its subcommittees than does the HRS. The newest and most ambitious older age cohort study is the multi-country Survey of Health, Ageing and Retirement in Europe (SHARE). SHARE released its first wave of data for 10 European countries in 2005. In contrast to ECHP, SHARE organized itself using administrators who are primarily researchers. The administrators who manage and oversee the operation of SHARE are all researchers employed by NGOs or universities, rather than employees of central statistical agencies.
The development of the HRS was propelled by a collaborative effort of cutting-edge, social-science-based researchers in the disciplines of demography, economics, epidemiology, social psychology, and sociology with representatives of the leading government agencies responsible for data collection in the United States. Those involved in the planning of the HRS chose to include representatives from such a broad range of social science disciplines because they recognized that social-science-based research on aging would increasingly be done in multi-disciplinary teams.
The result was a data set that not only achieved its immediate purpose but has had a worldwide influence on data collection initiatives. The outstanding self-reported health and socio-economic information in HRS has permitted social scientists from various disciplines, both individually and in teams, to begin to show how individuals make decisions over their life course. That same model of cooperation was used by European social scientists in the development of ELSA and SHARE.
The long-term scientific value of these studies rests in large part on the ability of the NGOs who run them to rapidly change their data collection efforts with changes in the theory-driven demand of their end users. The next generation of life course researchers will include teams of social and physiological scientists. These teams will require more objective measures of health (biomarkers) than are currently provided by self-reported health variables even in the best existing social-science-based data sets. These data will have to include sufficient anatomical and biological characteristics of respondents to allow researchers to better understand the relative importance of nurture (the social environment) and nature (biology) on social success outcomes across the life course including at older ages.
The basic research done by this new generation of social and physiological scientists can then be used in empirical models that offer the possibility of distinguishing the relative importance of these two sources of variation in explaining individual outcomes. The creation of this new generation of data will be required not only to advance core knowledge of the process to better health and greater economic well-being at older ages but also to better inform policymakers of the possibilities and limitations of current and future social policies meant to improve the health and well-being of their constituents.
The collection and linking of biomarkers to these social-science-based longitudinal data sets has already begun. The second wave of ELSA (2004), which is on the verge of general release, contains some of these biomarkers (blood pressure, lung function, height and weight, saliva, etc.). The HRS has already collected DNA in its subsample on dementia. More importantly, the HRS has proposed to collect a broad set of physical performance, anthropometric, genetic, and biological data via enhanced face-to-face interviews during its next six-year cycle of funding. The PSID is also considering doing so, as part of its next wave of the Child Development Supplement (CDS). In addition, both the HRS and PSID are considering the non-face-to-face collection of some biomarkers via, for instance, postcards as supplements to their usual phone-based interviews. The integration of biomarkers into these social-science-based data sets, far ahead of such data sets directly controlled by central statistical agencies, is an example of NGOs’ greater flexibility and willingness to take risks.
4. Successful Examples of Cross-National Harmonization of Panel Data
4.1. Ex Post Harmonization of Existing All-age Panel Data
The Cross-National Equivalent File (CNEF) (Web address: http://www.human.cornell.edu/che/PAM/Research/Centers-Programs/German-Panel/Cross-National-Equivalent-File_CNEF.cfm) harmonizes a subset of the data found in five panel data sets: the United States Panel Study of Income Dynamics (PSID), the German Socio-Economic Panel (SOEP), the British Household Panel Survey (BHPS), the Canadian Survey of Labour and Income Dynamics (SLID) and, starting in 2007, the Household, Income and Labour Dynamics in Australia (HILDA) Survey. (Unlike the other four data sets, SLID is a short-term panel data set.)
CNEF primarily contains information on income and labor market outcomes but recently added self-reported health variables as each of its long-term country data sets began to add a richer mix of health variables to their core questions. CNEF uses the PSID as its model and harmonizes its key variables to the definition of variables in the PSID. By doing so, it provides a data set that is especially useful for making comparisons of outcomes in the United States to those in the other three countries. It allows researchers access not only to the original data sets from which the CNEF variables are created but also to the programs used to create them. Access to these programs allows individual researchers to review the algorithms used to create variables. It also allows researchers to customize the programs. Efforts are made to make it easier for researchers to merge CNEF data with data from each parent study. In this way researchers can append information from the original data to create new harmonized variables that are then made available to the cross-national research community.
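The workflow just described (renaming country-specific variables to a common definition, then merging the equivalized records back to the parent study) can be illustrated with a minimal Python sketch. It is not the actual CNEF programs: every variable name, the mapping table, and the helper functions `harmonize` and `merge_with_parent` are hypothetical and chosen only for illustration.

```python
# Illustrative sketch only. All variable names below are hypothetical and
# are NOT taken from the actual CNEF, PSID, or SOEP files.

# Map each survey's native variable names to a common (equivalized) name.
VARIABLE_MAP = {
    "PSID": {"pid": "person_id", "faminc": "hh_income", "hrswork": "annual_hours"},
    "SOEP": {"persnr": "person_id", "hheink": "hh_income", "jahrstd": "annual_hours"},
}


def harmonize(survey, records):
    """Rename a survey's native variables to the common definitions."""
    mapping = VARIABLE_MAP[survey]
    harmonized = []
    for rec in records:
        row = {common: rec[native] for native, common in mapping.items()}
        row["survey"] = survey  # keep track of the source study
        harmonized.append(row)
    return harmonized


def merge_with_parent(harmonized, parent_records, parent_id, extra_vars):
    """Append selected variables from the parent study to harmonized rows.

    Rows are matched on the person identifier; a variable is set to None
    when the person is not found in the parent records.
    """
    lookup = {rec[parent_id]: rec for rec in parent_records}
    merged = []
    for row in harmonized:
        source = lookup.get(row["person_id"], {})
        extras = {var: source.get(var) for var in extra_vars}
        merged.append({**row, **extras})
    return merged


# Example with made-up records: one PSID respondent, one SOEP respondent.
psid_rows = harmonize("PSID", [{"pid": 1, "faminc": 50000, "hrswork": 2000}])
soep_rows = harmonize("SOEP", [{"persnr": 7, "hheink": 42000, "jahrstd": 1800}])
pooled = psid_rows + soep_rows
```

A researcher who wants a variable that is not in the equivalized subset would call `merge_with_parent` with the parent study's own identifier and variable names, which mirrors the customization the text describes. A real harmonization would also have to reconcile units, currencies, and institutional definitions, which this sketch ignores.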
CNEF, administered by researchers at Cornell University, shares a common organizing theme with the groups that created the country panels from which it is drawn: active researchers conceived, planned, and carried out how the data would be harmonized. While data managers, some in government statistical agencies, were often involved in the process, it was researchers who decided how to define the variables of interest so they represented equivalent measures. In addition, the above efforts have involved researchers familiar with the institutions of each country. This involvement means that when a decision had to be made about how to harmonize data, the decision was not only informed by country-specific knowledge of institutions but also guided by an overall conceptual definition based on the latest research on that specific topic. Even using similarly designed country panel surveys, it is not a trivial exercise to harmonize the data consistently across countries. Researchers guided by theory and concepts flowing from the research pertinent to their studies are best able to make the assumptions necessary to harmonize data across countries.
CNEF was a major advance over the first generation of cross-national harmonization efforts in several important ways. It was the first effort to successfully harmonize panel data. But it also was able to provide direct access to the underlying country data that it harmonized so that all legitimate members of the international research community could both link the harmonized data to the more extensive variables in the country data and choose alternative harmonization strategies. Unlike the Luxembourg Income Study (LIS) (Web address: www.lisproject.org ), whose core country data sets were primarily from central statistical agencies that would not allow their data to be directly accessed by those outside their borders, CNEF was able to negotiate contracts with the SOEP and BHPS (core PSID data is available to all and hence no special contracts were necessary) for access to their core data for all legitimate researchers. By special arrangement with SLID, CNEF users outside of Canada were given permission to use the subset of CNEF data containing SLID equivalized data, but international users are required to use the indirect enclave method of access to the original SLID data. (CNEF is not the only successful attempt to ex post harmonize all-age long-term panel data. The Consortium of Household Panels for European Socio-Economic Research (Web address: www.ceps.lu/Cher/acceuil.cfm) is an ex post harmonization effort that uses the SOEP as its organizing model and is most valuable for those interested in comparing European panel data. It too is run by an NGO.)
As a result, researchers from any country can now use these data to evaluate public policies from a cross-national perspective. For two recent examples see Valletta (2006), who uses CNEF data to test the importance of government policies on poverty dynamics, and Burkhauser, Giles, Lillard and Schwarze (2005), who use the same data to look at how the economic well-being of women changes after the death of their husbands in these four countries and the relative importance of government and private sources in replacing lost earnings.
The use of biomarkers will greatly enhance the ability of the international research community to study cross-national differences in health. Collecting more objective anatomical and biological data offers the best alternative for overcoming the cultural biases contained in self-reported health measures. Hence, the use of biomarkers in country-specific panel data sets is likely to be the most effective way to capture pure health effects in cross-national comparisons of behavioral outcomes. But it is critical that these data can be directly used by the international research community. To date, central statistical agencies have found it more difficult to enter into the kinds of marketing agreements that permit such free access than have NGOs.
4.2. Ex Ante Harmonization of New Cohort Panels
The HRS has been a major source of information on the economic well-being, labor force behavior, health and wealth of men and women transitioning into retirement age in the United States over the last decade. It was the inspiration for ELSA and SHARE. Like ELSA, SHARE (Web address: www.share-project.org) is being led by a European network of researchers who are working together with United States-based researchers at the National Bureau of Economic Research (NBER). SHARE’s major advantage over previous efforts like ECHP is that it is run by an NGO led by a team of outstanding country-based researchers who have been working with their country’s all purpose panel survey data. Hence, this team is well aware of the problems of fielding panel surveys and has already avoided some of the pitfalls that befell the ECHP project. The SHARE organizers/researchers consulted with organizers/researchers who created the HRS and ELSA surveys in the development of their original set of English language questions. In addition, when they translated questionnaires, they used experts who were not only fluent in each country’s language but who were also familiar with each country’s social environment. The SHARE organizers/researchers have also made the early release of data a major priority and are doing so in a way that imposes a minimal cost (in effort only) on the researcher. While it is far too early to evaluate the success of this panel project, early indications are that it will be the first successful ex ante harmonized data set for Europe.
5. Threading the Eye of the Needle
Cross-national research using large representative panel data sets that have been ex post harmonized is still relatively new. It is only in the last decade that several ex post harmonized country panel data sets have become available. Yet these panel data have already become essential for those interested in knowing the relative economic well-being of OECD populations and their labor market outcomes. Dynamic cross-national analysis is now common on issues related to income mobility, poverty dynamics, and social policy (Biewen and Jenkins, 2005; Hungerford, 2003; Hacker, forthcoming). Better ex post harmonized data on self-reported health from these country panel sets are only now becoming available.
Reliable ex ante harmonized data efforts are even less far along. And as the history of the ECHP shows, not all investments in long-term panel data have produced benefits that exceeded their costs. While SHARE has the potential to succeed where ECHP failed, much remains to be done before it will be possible to get a set of truly ex ante harmonized panel data sets in the hands of researchers. Yet, the experience gained in the ex post harmonization efforts reviewed above provides every reason to believe that the greater involvement of researchers in ex ante harmonization efforts will lead to successful and useful harmonized data. SHARE is the best example to date of the value of researcher-driven ex ante data collection efforts. Hence, despite the shortcomings of past efforts, the potential of harmonized data to help identify key relationships between policies, socio-economic factors, and health outcomes makes the money invested in them a worthwhile venture.
To date, Canada has not invested in either all-age or older-age cohort panel data of the type discussed here. Its sole long-term ongoing panel, the NLSCY, is scheduled to phase out as the children reach age 25. In addition, while some data from the short-term panels in SLID are contained in CNEF, access to and research by the international community of scholars using the full SLID panels is more limited than it is for the long-term panel data sets in Germany, Great Britain, and the United States discussed above. Finally, Canada has not taken part in either the European ex ante harmonization efforts of SHARE or in an ELSA-like effort to use the HRS cohort model to capture the dynamic behavior of older age cohorts of Canadians.
Hence, Canadian researchers are not well positioned to compare outcomes in Canada with outcomes in other OECD countries. Nor will Canadian researchers be able to take advantage of the potential value of the biomarker data used in future cross-national studies of the impact of social policies on life course success outcomes to inform Canadian public policy.
The conference and the book that resulted from it are excellent first steps toward an informed decision on whether it is now time for Canada’s research and data collection efforts to become more fully integrated into the world research and data collection community by investing in long-term social-science-based all-age or older-age cohort panel data that are ex ante harmonized or, at least, can feasibly be harmonized ex post. In making these decisions, Canadian policy makers should consider both the successes and failures of past investments by other countries in such data. It is possible for Statistics Canada to create and market long-term, all-age or older-age cohort social-science-based panels. The creation of such data will provide the raw material for both the basic and applied policy research necessary to develop evidence-based social policies in Canada.
But in doing so, Statistics Canada will have to determine how important it is to involve its own country’s researchers as well as the international community of researchers in the process of design, implementation, marketing, and general oversight of these data efforts. To date, few central statistical agencies have been willing to actively engage “outsiders” in these activities. Not to do so, however, will make their journey through the mountains more perilous. But, returning to Matthew, not hopeless since, “With men this is impossible, but with God all things are possible.” (Matthew 19:26)
References
Biewen, Martin and Stephen P. Jenkins. 2005. “A framework for the decomposition of poverty differences with an application to poverty differences between countries.” Empirical Economics. 30 (2) (September): 331-358.
Burkhauser, Richard V., Philip Giles, Dean R. Lillard, and Johannes Schwarze. 2005. “Until Death Do us Part: An Analysis of the Economic Well-Being of Widows in Four Countries,” Journal of Gerontology, 60B, (5) (September): S238-S246.
Burkhauser, Richard V. and Dean R. Lillard. 2005. “The Contribution and Potential of Data Harmonization for Cross-National Comparative Research,” Journal of Comparative Policy Analysis, 7 (December): 313-330.
Burkhauser, Richard V. and Dean R. Lillard. 2006. “The Case for NIA Leadership in Integrating State-of-the-Art Biomarkers into Next Generation Social-Science-Based Data,” Cornell University Working Paper.
Burkhauser, Richard V. and Timothy M. Smeeding. 2001. “The Role of Micro-Level Panel Data in Policy Research,” Schmollers Jahrbuch: Journal of Applied Social Science Studies, 121 (4): 469-500.
Cawley, John and Richard V. Burkhauser. 2006. “Beyond BMI: The Value of More Accurate Measures of Fatness and Obesity in Social Science Research.” National Bureau of Economic Research Working Paper 12291, (June).
Hacker, Jacob S. forthcoming. “The Politics of Risk Privatization in US Social Policy.” in The Politics of the Market, Marc Landy, Martin Levin, and Martin Shapiro, editors. Brookings.
Hungerford, Thomas L. 2003. “Is there an American Way of Aging? Income Dynamics of the Elderly in the United States and Germany.” Research on Aging 25 (5): 435-455.
Juster, F. Thomas and Richard Suzman. 1995. “An Overview of the Health and Retirement Study,” Journal of Human Resources, 30 (Supplement): S7-S56.
Valletta, Robert G. 2006. “The Ins and Outs of Poverty in Advanced Economies: Government Policy and Poverty Dynamics in Canada, Germany, Great Britain and the United States,” The Review of Income and Wealth, 52 (2): 261-284.