Family Background and Outcomes Later in Life: A (Partial and Personal) Survey of Recent Research Using Swedish Register Data
Anders Björklund
Swedish Institute for Social Research (SOFI)
Stockholm University
1. Introduction
Questions about family background and outcomes later in life are often addressed in both academia and policy circles. Researchers from several disciplines study such relationships from many different perspectives. Historically, economists are arguably newcomers to this research field, in which psychologists and sociologists have long been active. This survey of recent research based on Swedish administrative register data should therefore be read with the author’s background as a labor economist in mind.
In order to put the research into perspective, I make a distinction between two different research purposes. The first one is mainly descriptive, although not always easy to carry out, and is concerned with questions like these: How important is family background for outcomes in adult life such as educational attainment and income? How strong are income and education correlations, or other measures of association, between parents and children and among siblings? Have these correlations changed over time? Are they stronger in some countries than in others? Such relationships are sometimes also discussed under the label “intergenerational mobility”. They are also often claimed to shed light on “equality of opportunity”, since a strong association between own labor-market performance and family background is considered a deviation from an equality-of-opportunity norm.
The second research purpose is concerned with purely causal questions. What is the causal impact of parental resources like education and income? What is the impact of events during childhood like parental separation or residential mobility? What is the impact of different choices like school choice? In such studies, “family factors” often show up as confounding factors that researchers want to eliminate, for example by regressing between-sibling differences in income or education on differences in school choice or the experience of childhood events.
Needless to say, such scientific studies require sophisticated data. Most of the issues require accurate information about two generations. Such data are not easy to get. The experience of the well-known PSID exemplifies this claim. Those who were children in the households interviewed in the first PSID wave in 1968 had to be followed for around 20 years until outcomes in adult life were realized and could be measured. Such a follow-up is costly and time-consuming, and for many research questions, such as those relating to non-standard family types, sample sizes become very small. Further, considerable attrition takes place along the way.
As an alternative to interviews, it is very appealing to get the data from nation-wide administrative registers. The purpose of this paper is to show how research on the topics just described has been carried out on register data from Statistics Sweden (SCB). In recent years, researchers have been allowed to extract research data sets from SCB’s registers. These data sets have been created by SCB by merging its own registers using the personal identifier available in Sweden. In de-identified form, and after the approval of an ethical committee, the data sets have been delivered to researchers in Sweden, who are obliged to treat the data with care and not to pass them on to any other researcher or use them outside of the country. Section 2 of the paper explains which basic data registers at SCB have made such research possible. Section 3 gives a flavour of the results from some studies. Section 4 concludes and discusses problems and prospects for the future.
2. Data sources
Studies like those typified in the introduction require quite sophisticated data. First of all, one must identify family relationships, i.e., find out who is related to whom. Second, one needs data on parental resources such as income and education as well as similar outcome measures for offspring.
2.1. Defining family connections
A major virtue of Swedish registers for the type of studies considered in this paper is that family connections can be identified simultaneously in two different ways. One data source, the so called Multi-generational register, contains connections between parents and children via biology and adoption. This register has been created specifically for research purposes by SCB in the last ten years out of other demographic information. The underlying population of this register is all Swedes who were born 1932 or later and registered as living in Sweden any time from 1961 onwards. For those individuals, the register contains information about biological and adoptive parents. Thus, one can also identify full and half siblings and in some cases more than two generations.
The other data source for family, or rather household, connections is the censuses conducted every five years from 1960 to 1990. There were also censuses in 1955 and earlier, but the data from these are not available in computerized form. Each census covered the whole population living in Sweden in the fall of the year in which it was conducted. A major effort in the censuses was to define the household in which each person lived. In addition, the censuses collected other types of information like housing conditions, employment and commuting distances for those who worked in the labor market, and in some cases also information about the educational level of the household members.
From the censuses it is thus possible to determine whether or not a child lives with its biological (or adoptive) parents. The presence of step parents and step children can also be determined. Consequently, the combination of data from the multigenerational register and the censuses makes it possible to define a number of family types from the child’s point of view: those living with two biological parents, those living with two adoptive parents, those living with one biological parent only, those living with one biological parent and one step parent and so on. Because some of these family types are quite rare, the sample in a typical survey study would contain only very few observations of them. In nation-wide data sets, however, even quite rare family types appear in reasonably large samples. The fact that the censuses were only conducted every five years is of course a limitation. Thus, an event like parental separation cannot be dated more exactly than that it occurred between two censuses five years apart.
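The classification described above can be sketched in a few lines of Python. This is only an illustration: the record layout (`ParentLinks`), the function names and the rule that any other adult in the household counts as a step parent are assumptions made for the example, not SCB's actual variable definitions.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Set, Tuple


@dataclass(frozen=True)
class ParentLinks:
    """Hypothetical extract of one child's parent links (identifiers are made up)."""
    bio_mother: Optional[str] = None
    bio_father: Optional[str] = None
    adoptive_mother: Optional[str] = None
    adoptive_father: Optional[str] = None


def family_type(links: ParentLinks, household_adults: Set[str]) -> str:
    """Classify the child's family type in one census year.

    household_adults holds the identifiers of the adults recorded in the
    child's census household. Assumption: any adult who is neither a
    biological nor an adoptive parent is treated as a step parent.
    """
    bio = {p for p in (links.bio_mother, links.bio_father) if p} & household_adults
    adoptive = {p for p in (links.adoptive_mother, links.adoptive_father) if p} & household_adults
    other_adults = household_adults - bio - adoptive
    if len(bio) == 2:
        return "two biological parents"
    if len(adoptive) == 2:
        return "two adoptive parents"
    if len(bio) == 1 and other_adults:
        return "one biological parent and one step parent"
    if len(bio) == 1:
        return "one biological parent only"
    return "other"


def separation_window(types_by_census: Dict[int, str]) -> Optional[Tuple[int, int]]:
    """Return the pair of census years between which the child stopped
    living with both biological parents, or None if that never happened.

    The change may reflect a separation or a parental death; the census
    data alone cannot date the event more precisely than this interval.
    """
    years = sorted(types_by_census)
    for earlier, later in zip(years, years[1:]):
        if (types_by_census[earlier] == "two biological parents"
                and types_by_census[later] != "two biological parents"):
            return (earlier, later)
    return None
```

For a child observed with both biological parents in 1970 but with only one of them in 1975, `separation_window` returns the interval (1970, 1975), which is all that five-yearly censuses can reveal about the timing.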
2.2. Income and education data
Income, ideally from all possible sources, is a natural variable to use for labor economists. SCB collects, from other governmental bodies, all income components that are part of the tax assessment procedure. Thus, SCB gets information about the income that each person declares in the annual tax assessment procedure. Over the years, the basic source of such information has gradually switched to compulsory reports from employers to the tax authority. Nowadays, in the annual tax assessment, a Swede basically confirms the income reported by employers.
Of course, income in the black market is not covered in such statistics. And this is definitely a weakness of the data source and limits the inference that can be made. At the same time, one can question whether typical surveys manage to obtain data on such income. It would be useful to get a solid evaluation of the income variables obtained via registers and a comparison with survey-collected data. However, I am not aware of any such study.
SCB also gets information about all public transfers that are not subject to income tax. The same applies to final income taxes paid. Thus, SCB can offer researchers the package of income components that add to a person’s disposable income. Since persons can be allocated to households, it is also possible to measure disposable income with the household as “the unit of income”.
Income data for the whole population go back to 1968, but the quality is lower and the division into income components is less detailed in the beginning of this period. Some aggregate income measures, however, are measured quite accurately ever since 1968 so for some cohorts the major part of a career is covered. Thus, a life-cycle approach to income determination can be taken with these nation-wide data; see Böhlmark & Lindquist (2006) for a recent study that examines the association between annual and lifetime income over this period.
Data on education are crucial for many applications in labor economics as well as in many other social science applications. The 1960 census collected some data on higher education and these are available for researchers in a computerized form. The 1970 census collected more complete education data. These data also form the starting point for an important nation-wide register that SCB started to establish from the late 1980s onwards. All Swedish schools and colleges are required to report graduation data to SCB so that an education register can be continuously updated. Special questionnaires are sent by mail to immigrants so that their educational levels are also covered. The education register has data on level and field of education, year of graduation as well as place of graduation. It has been evaluated several times and has proven to be of good quality.
Beginning with cohorts who graduated in 1988, SCB has also collected data on grades for all students who graduate from Swedish secondary and upper-secondary schools. In so doing, register researchers get access to useful school performance measures. These data also contain some useful information on school characteristics. For example, after some school reforms in the early 1990s, independent schools were established to a significant extent and the graduation registers contain information whether the student went to such a school or to a public school.
More recently, SCB has also started to collect data on compulsory national tests done at the end of secondary school and during upper-secondary school. In times when grade inflation is an issue, such information is a valuable complement to grade data.
3. Some results
3.1. How strong are family associations?
A sibling correlation is a very useful statistic for gauging the importance of family background for outcomes during adult life. The reason is that a sibling correlation has a straightforward interpretation: it is the fraction of the variation in a certain outcome, say earnings, that can be attributed to factors that siblings share. Siblings who have grown up together not only share the same family but also the same neighborhood conditions. Thus, it is a rather broad measure of the importance of childhood conditions. Somewhat paradoxically, the correlation itself (rather than its square) has the same interpretation as the R² from an estimated regression equation: the fraction of the variation in a variable that can be “explained” by certain factors.
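The variance-share interpretation can be illustrated with a minimal sketch. It assumes the simplest possible setting, in which each outcome is the sum of a component shared by the siblings and an individual component, and every family contributes exactly two siblings. The function name and the estimator (an analysis-of-variance calculation) are choices made for this example; the studies surveyed here use more elaborate estimators.

```python
from statistics import mean, variance


def sibling_correlation(pairs):
    """Estimate the sibling correlation from (sibling_1, sibling_2) outcome pairs.

    Assumed model: outcome = family component + individual component, so the
    correlation is var(family) / (var(family) + var(individual)).
    """
    if len(pairs) < 2:
        raise ValueError("need at least two sibling pairs")
    # For a pair, (y1 - y2)^2 / 2 is an unbiased estimate of the
    # individual (within-family) variance.
    within = mean((y1 - y2) ** 2 / 2 for y1, y2 in pairs)
    # The variance of the family means equals var(family) + var(individual) / 2,
    # so the second term is subtracted to isolate the family variance.
    family_means = [(y1 + y2) / 2 for y1, y2 in pairs]
    between = variance(family_means) - within / 2
    total = between + within
    if total <= 0:
        raise ValueError("outcome has no variation")
    return between / total
```

When siblings have identical outcomes the function returns 1, and it falls towards 0 as the differences within families grow relative to the differences between families.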
Income and earnings are obvious domains for labor economists. To study the importance of the family for earnings inequality, it is only meaningful to focus on measures of long-run income. The main reason is that income in a single year is affected by many transitory factors that generally have nothing to do with family background. By using more long-run measures, such transitory factors are “averaged out”.
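A short derivation shows why averaging matters. It is a textbook sketch under assumptions that are not taken from the studies themselves: income in year t is the sum of a long-run component and a transitory component, and the transitory components are uncorrelated across years, across siblings and with the long-run component. The symbols (y, v, T, the variances and the correlations) are introduced here for illustration only.

```latex
\begin{aligned}
y_{it} &= y_i + v_{it}, \qquad \operatorname{Var}(y_i) = \sigma_y^2, \quad \operatorname{Var}(v_{it}) = \sigma_v^2, \\
\bar{y}_i &= \frac{1}{T}\sum_{t=1}^{T} y_{it} = y_i + \frac{1}{T}\sum_{t=1}^{T} v_{it}, \\
\operatorname{Var}(\bar{y}_i) &= \sigma_y^2 + \frac{\sigma_v^2}{T}, \\
\rho_T &= \rho \cdot \frac{\sigma_y^2}{\sigma_y^2 + \sigma_v^2 / T} \;\longrightarrow\; \rho \quad \text{as } T \to \infty .
\end{aligned}
```

Here ρ is the sibling correlation in long-run income and ρ_T is the correlation in T-year averages. With a single year of data (T = 1) the estimate is scaled down by the share of long-run variance in annual variance. In practice transitory shocks are serially correlated, so the bias falls more slowly with T than this simple formula suggests.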
Björklund et al. (2002) estimate brother correlations in long-run earnings for Sweden, as well as for the neighboring Nordic countries Denmark, Finland and Norway, and compare them with similar estimates for the United States. Their estimate for Sweden is about 0.25 and about the same for two brother definitions, namely biological full siblings and boys who lived together in the same census household. The estimates for Denmark and Finland are quite similar, the estimate for Norway is significantly lower, and the estimate for the United States is above 0.40, as previous and subsequent studies have also shown. Björklund, Lindahl and Sund (2003) use the data on grades at age 16 to estimate sibling correlations in grade-point averages for closely spaced siblings. They estimated such statistics for cohorts who graduated from 1988 to 2000 in order to examine whether family and neighborhood background became more important during a period of market-oriented school reforms involving, among other things, more room for parental school choice and the establishment of independent schools. The results show that the correlations were very stable around 0.50 despite the far-reaching reforms. Thus, about half of the variation in grade-point averages could be attributed to family and neighborhood conditions.
Intergenerational associations have also been estimated using Swedish register data. The most frequently estimated statistic has been the intergenerational earnings elasticity of a son’s long-run earnings with respect to his father’s. For example, Björklund & Chadwick (2003) estimated such an elasticity at close to 0.25, in contrast to typical US estimates above 0.40. Österberg (2000), also using register data, estimated much lower elasticities. The most likely explanation for her lower estimates is that she observed fathers’ earnings at an older age. Österberg also estimated father-daughter, mother-daughter, and mother-son elasticities, and those for mothers in particular were very low. Hammarstedt & Palme (2005) have recently estimated intergenerational elasticities for different immigrant groups and find striking differences among immigrants from different source countries.
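An intergenerational elasticity is conventionally the slope from a regression of the logarithm of the son's long-run earnings on the logarithm of the father's. The sketch below computes that slope by ordinary least squares. The function name is hypothetical and the code omits the age controls and multi-year averaging that actual studies rely on.

```python
import math


def intergenerational_elasticity(father_earnings, son_earnings):
    """OLS slope of log(son's earnings) on log(father's earnings).

    Both arguments are equally long sequences of positive long-run earnings,
    matched father by son. No age controls are included (a simplification).
    """
    x = [math.log(v) for v in father_earnings]
    y = [math.log(v) for v in son_earnings]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return sxy / sxx
```

An elasticity of 0.25 means that a father's earnings being 10 percent higher is associated with the son's earnings being roughly 2.5 percent higher.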
3.2. Family associations by family types and variance decompositions
As I stressed in the introduction, the estimates presented in the previous subsection are “descriptive” in nature. In particular, it is important to stress that the true causal effects of changes in parental income and education, for example changes generated by specific policies, most likely differ from the intergenerational estimates presented above. Before discussing studies that explicitly aim at estimating true causal effects, I present results from a research approach that can be considered intermediate between purely descriptive family association studies and purely causal ones. The idea behind this approach is to estimate family associations like sibling correlations and intergenerational elasticities for different family types. In so doing, one can get hints about which mechanisms are more or less important behind the estimated average associations.
One example of this approach is Björklund, Jäntti & Solon (2005), who estimate sibling correlations in long-run earnings for nine different sibling types. The sibling types differ in two dimensions. First, they differ in genetic similarity, with MZ (identical) twins sharing the same genetic setup, DZ twins and full biological siblings sharing 50 percent, half siblings sharing 25 percent and finally adoptive siblings having no genetic similarity. Second, they are divided into siblings reared apart and siblings reared together. The nine sibling correlations estimated by Björklund, Jäntti & Solon (2005) reveal a general pattern that suggests that both genes and shared environment are important. Björklund, Jäntti & Solon then use these estimated correlations to decompose earnings variation into one component representing genetic inheritance and another representing environmental influences that are shared by siblings. The best-fitting model suggested that these two components were equally important for earnings. However, since the sibling correlations clustered around 0.20, somewhat lower than in Björklund et al. (2002) owing to different age limits, non-shared environment was even more important than genes and shared environment.
Björklund, Lindahl & Plug (2006) extend the standard intergenerational model by using data on adopted children’s biological and adoptive parents in the same intergenerational equation. They found that both parents’ characteristics matter significantly and that their coefficients more or less add up to the coefficient for birth parents in equations for children raised by their birth parents. They found, however, a tendency for the biological mother’s education to be more important than the adoptive mother’s education, whereas the adoptive father’s income had a bigger impact than the biological father’s. The overall conclusion from the study is that both pre-birth factors, proxied by biological parents’ resources, and post-birth factors, proxied by adoptive parents’ resources, matter for intergenerational associations.
Björklund & Chadwick’s (2003) analysis is in the same spirit, but they estimate intergenerational father-son elasticities by time spent with the father. They find significant differences in the expected direction, namely that the more time spent with the father, the higher the estimated elasticities are. So the mechanisms behind intergenerational associations include time living with the father.
Two studies have also looked into the question of whether family associations vary by birth order and family size. If they do, theories of intergenerational mobility should be able to explain what it is in parental behaviour that creates such differences. Somewhat confusingly, but not necessarily in conflict with each other, Björklund et al.’s (2004) results show that sibling correlations are fairly similar for different family sizes, whereas Lindahl (2007), studying intergenerational elasticities, finds a significant pattern such that the elasticities fall by birth order for larger families. More research is needed to figure out what is behind these results.
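The logic of such a decomposition can be written out for the simplest additive model. This is a stylized sketch, not necessarily the best-fitting model of the study: it assumes that genetic and environmental components are uncorrelated, that there is no assortative mating, and that siblings reared together share the same environment regardless of sibling type. The symbols are introduced here for illustration. With the outcome variance normalized to one, let h² be the genetic share, c² the shared-environment share, r_k the genetic relatedness of sibling type k, and e_k an indicator equal to one if the siblings were reared together.

```latex
\begin{aligned}
\rho_k &= r_k\,h^2 + e_k\,c^2, \qquad r_k \in \{1,\ 0.5,\ 0.25,\ 0\}, \quad e_k \in \{0, 1\}, \\
\text{MZ twins reared together:} \quad \rho &= h^2 + c^2, \\
\text{MZ twins reared apart:} \quad \rho &= h^2, \\
\text{full siblings reared together:} \quad \rho &= 0.5\,h^2 + c^2, \\
\text{adoptive siblings reared together:} \quad \rho &= c^2, \\
\text{non-shared share} &= 1 - h^2 - c^2 .
\end{aligned}
```

With more sibling types than unknowns, the two shares can be chosen to fit all the estimated correlations as closely as possible. The model also shows why modest sibling correlations leave a large role for non-shared factors: if full siblings reared together have a correlation of about 0.20, then 0.5h² + c² ≈ 0.20, so h² + c² can be at most 0.40 and the non-shared share is at least 0.60.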
3.3. Causal effects
Studies with the explicit aim of getting at causal effects, that is, of addressing the counterfactual question of what would happen if some intervention changed a factor like parental education or income by a specific amount, must be clearly distinguished from the more descriptive intergenerational mobility studies reported so far. There is, however, a very recent wave of studies that aim at disentangling the true causal effect of parental income or education on offspring. Three different methods have been used in this recent research. Plug (2004), for example, uses data on adopted children and their adoptive parents to purge the intergenerational association of the genetic component, which must definitely be considered a confounding factor if the purpose is to estimate the true causal effect of changes in parental income or education. Behrman & Rosenzweig (2002) use data on identical twins with different education to eliminate the influence of genetic inheritance and shared environment. Both Plug and Behrman & Rosenzweig use rather small US data sets. Finally, some studies have used “exogenous variation” in parental education generated by school reforms to estimate the causal impact of parental education on offspring’s outcomes; see Black, Devereux & Salvanes (2005) for a Norwegian study and Oreopoulos, Page & Stevens (2006) for a US one.
The first two of these methods can be applied to Swedish register data with the additional advantage that the sample size can be much larger. Björklund, Lindahl & Plug (2004) use data on all adoptees born in Sweden in the 1960s to pursue the adoption approach to this problem. They find that the intergenerational associations between parents and children who are related by adoption are about half of those for own-birth children who were raised by their biological parents. Yet the estimates were significantly different from zero, suggesting that there are some causal effects of parental resources like income and education. To dig deeper into this causal question, Holmlund, Lindahl & Plug (2006) use Swedish register data to estimate the causal effect by means of all three methods used in recent international research, applied to the same large data set. A Swedish school reform implemented experimentally during the 1940s and 1950s is exploited to get exogenous variation in parental education. Their main conclusion is that the results depend on the method, and thus that the different results in previous studies are not due to their using different data sets. It remains a challenge for future research to find out why the three different methods yield different results.
Register data with family connections have also been used in a series of other studies of causal effects. In particular, the technique of relating sibling differences in an outcome of interest, say income or educational attainment, to sibling differences in “events” like parental separation or school choice has been used in several different ways. The idea behind this approach is that siblings share many traits that are unobserved by the researcher, so a sibling-difference approach can control for confounding factors that a conventional cross-sectional regression analysis cannot control for. When only typical survey data sets are available, however, the samples of siblings who have experienced different events or made different choices are very small. Thus, large data sets obtained from nation-wide registers are crucial for a meaningful application of this research approach.
Björklund & Sundström (2007) apply this technique to the question of whether parental divorce has a detrimental effect on children’s educational attainment. They compare the educational attainment of older siblings, who had moved out of their parents’ home when the separation occurred, with that of younger siblings, who experienced the separation more directly. They did not find any significant educational differences among such siblings. Björklund, Ginther & Sundström (2007) examine the relationships between childhood family structure and subsequent earnings and educational attainment in Sweden and the United States. They also run sibling-difference models to test for causal effects of growing up in different types of families. They find surprisingly small differences between the two countries despite the differences in social policies.
Holmlund (2005) addresses the much-studied question of whether teenage childbearing has an impact on women’s educational attainment. The register data allow her to compare the sibling-difference approach, which has been applied in influential US research, with a model that controls for grade-point averages at age 16. She finds that sisters who have a child as a teenager have lower grades at age 16 than their sisters who do not have a child as a teenager. This result suggests that the sibling-difference model for applications like these would benefit from additional information like sibling-difference information on grades.
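The sibling-difference idea can be made concrete with a minimal sketch. It assumes a linear model in which the outcome equals an effect of the event, plus a family component shared by both siblings, plus individual noise. Differencing within the pair removes the family component. The function names and data layout are hypothetical, and the no-intercept estimator is one simple choice among several.

```python
def sibling_difference_estimate(pairs):
    """Estimate the effect of an event from within-family differences.

    pairs is a list of ((event_1, outcome_1), (event_2, outcome_2)) tuples,
    one per sibling pair, where event is 0 or 1 (or any exposure measure).
    Regressing the outcome difference on the event difference (no intercept)
    removes everything the two siblings share.
    """
    num = 0.0
    den = 0.0
    for (d1, y1), (d2, y2) in pairs:
        num += (d1 - d2) * (y1 - y2)
        den += (d1 - d2) ** 2
    if den == 0:
        # Only pairs whose exposure differs carry information.
        raise ValueError("no sibling pair differs in exposure")
    return num / den


def cross_section_estimate(pairs):
    """Conventional OLS slope that ignores family membership (for contrast)."""
    obs = [sib for pair in pairs for sib in pair]
    n = len(obs)
    mean_d = sum(d for d, _ in obs) / n
    mean_y = sum(y for _, y in obs) / n
    sxx = sum((d - mean_d) ** 2 for d, _ in obs)
    sxy = sum((d - mean_d) * (y - mean_y) for d, y in obs)
    return sxy / sxx
```

Pairs in which both siblings had the same exposure add nothing to the sibling-difference estimate, which is why the number of pairs with differing exposure, rather than the total sample size, determines its precision.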
Other recent applications of this research approach on Swedish register data involve topics such as the return to different college types (Lindahl & Regnér 2005), the impact of age at immigration for children’s school achievement (Böhlmark 2005), the scar effects of youth unemployment (Nordström Skans 2004), and inter-industry wage differentials (Björklund et al. 2007).
4. Prospects for future research
Statistics Sweden’s registers have turned out to be a gold mine for Swedish researchers, who now very actively exploit these data sources. In this paper, I have focused on studies that use such data to examine the importance of family background for income and education during adult life. Other labor economists have used the opportunity to match employer and employee data through the registers. Several studies of unemployment and labor market policy have exploited the opportunity to merge SCB’s register data with the National Labour Market Board’s data on the job search and program participation of the registered unemployed; e.g., Nordström Skans (2004) mentioned above. Yet other researchers have used data from hospitals and mortality data for medical research topics. Presently, much research output is coming out of these efforts and much more can be expected in the next couple of years when new generations of researchers have learnt to exploit this valuable research resource. A similar development with more and more access to register information is taking place in the other Nordic countries; see for example Røed and Raaum (2003).
What about the future? Are there any threats in sight to this use of register data? Will SCB be able to improve data availability and data quality even further?
Of course, data privacy (personal integrity) is often discussed in Sweden, and a new heated public discussion on such issues could force SCB to change its policy of making (de-identified) data sets available to researchers. One sign that privacy issues are involved is the current discussion about the need for a new census or a close substitute for a census. The idea of conducting a new census is not popular in political circles, so SCB has instead proposed to establish a close substitute, a “register of dwellings” that can tell where Swedes live and what households they form. SCB has shown that it is feasible to construct such a register out of different types of available information but without the apparatus of a new census. The formal political decision to build up such a register has however been delayed, most likely because the topic is sensitive.
At another level, one could argue that further globalization is a threat to this research approach. The more often that Swedes spend longer job or education spells abroad, the more difficult it will be to get complete data on earnings and educational attainment.
But there are also further improvements in sight for researchers who use Swedish register data. In recent years, the basic data have been kept by SCB in many different sub-registers, making it more costly to merge the data that are wanted for a specific research project. Some actions have now been taken to reduce that problem. One example is the so-called LINDA data base that contains a large number of income and earnings variables over a very long period of time, variables that especially labor market researchers want to use. By drawing a research sample directly from LINDA, costs can be considerably reduced. The Swedish Income Distribution Survey, which also is Sweden’s contribution to the Luxembourg Income Study, is drawn from the LINDA-sample. SCB has also arranged so that the 3 % LINDA sample contains monthly (working-time adjusted) earnings data.
Very recently, SCB has taken a step to make it possible for a researcher in Sweden to “log in” to an SCB server to get access to register data, so that the need to give out data sets will be reduced or even eliminated. There is, however, as yet very little experience of this effort.
It is also worth mentioning that documentation is always a problem when using data that were originally created for purposes other than research. To really understand the processes that create these data, it is not enough to read a questionnaire. Instead, one must understand how Swedish society works. Nonetheless, more and more of the frequently used variables are now documented, thus making life for researchers somewhat easier.
Finally, it is tempting to mention one potential of Swedish register data that has not been exploited: getting the censuses from 1955 and back in time in computerized form would open up new opportunities. But the fact that the Swedish personal identifier, which always is used to merge data from different sources, was introduced in 1947 makes it rather costly to go very far back in time.
References
Behrman Jere and Mark Rosenzweig (2002), “Does Increasing Women’s Schooling Raise the Schooling of the Next Generation?”, American Economic Review.
Björklund Anders, Tor Eriksson, Markus Jäntti, Oddbjörn Raaum & Eva Österbacka (2002), “Brother Correlations in Earnings in Denmark, Finland, Norway, and Sweden compared to the United States”, Journal of Population Economics.
Björklund Anders, Tor Eriksson, Markus Jäntti, Oddbjörn Raaum & Eva Österbacka (2004), “Family Structure and Labour Market Success: The Influence of Siblings and Birth Order on the Earnings of Young Adults in Finland, Norway and Sweden”. In Miles Corak (ed.) Generational Income Mobility in North America and Europe. Cambridge: Cambridge University Press.
Björklund Anders, Markus Jäntti and Gary Solon (2005), “Influences of Nature and Nurture on Earnings Variation: A Report from a Study of Various Sibling Types in Sweden.” In Unequal Chances: Family Background and Economic Success, edited by Samuel Bowles, Herbert Gintis and Melissa Osborne. Princeton University Press. 2005.
Björklund Anders, Mikael Lindahl & Erik Plug (2004), “Intergenerational Effects in Sweden: What can we Learn from Adoption Data”, IZA DP No. 1194, www.iza.org.
Björklund Anders, Mikael Lindahl & Erik Plug (2006), “The Origins of Intergenerational Associations: Lessons from Swedish Adoption Data”, Quarterly Journal of Economics.
Björklund Anders & Marianne Sundström (2007), “Parental Separation and Children’s Educational Attainment: A Siblings Analysis on Swedish Register Data”, Economica.
Björklund Anders, Donna Ginther & Marianne Sundström (2007), “Family Structure and Child Outcomes in United States and Sweden”, Journal of Population Economics.
Björklund Anders & Laura Chadwick (2003), “Intergenerational income mobility in permanent and separated families”, Economics Letters.
Björklund Anders, Melissa C. Clark, Per-Anders Edin, Peter Fredriksson & Alan B. Krueger (2005), The Market comes to Education in Sweden, Russell Sage Foundation.
Björklund Anders, Mikael Lindahl & Krister Sund (2003), “Family background and school performance during a turbulent era of school reforms”, Swedish Economic Policy Review.
Björklund Anders, Bernt Bratsberg, Tor Eriksson, Markus Jäntti & Oddbjörn Raaum (2007), “Inter-Industry Wage Differentials and Unobserved Ability: Siblings Evidence from Five Countries”, Industrial Relations.
Black Sandra, Paul J. Devereux & Kjell G. Salvanes (2005), “Why the Apple Doesn’t Fall Far: Understanding Intergenerational Transmission of Education”, American Economic Review.
Böhlmark Anders (2005), “Age at Immigration and School Performance: A Siblings Analysis Using Swedish Register Data”, WP 2005:6, SOFI, Stockholm University.
Böhlmark Anders & Matthew J. Lindquist (2006), “Life-Cycle Variation in the Association between Current and Lifetime Income: Country, Cohort and Gender Comparisons”, Journal of Labor Economics.
Fredriksson Peter & Per-Anders Edin (2000), “LINDA Longitudinal INdividual DAta for Sweden”, Working Paper, Department of Economics, Uppsala University.
Hammarstedt Mats & Mårten Palme (2005) “Intergenerational Mobility, Human Capital Transmission and the Economic Position of Second-Generation Immigrants in Sweden” Working Paper, Department of Economics, Stockholm University.
Holmlund Helena (2005), “Estimating Long-Term Consequences of Teenage Childbearing. An Examination of the Siblings Approach”, Journal of Human Resources.
Holmlund Helena, Mikael Lindahl & Erik Plug (2006), “Estimating Intergenerational Schooling Effects: a comparison of methods”, in Holmlund’s doctoral thesis, Swedish Institute for Social Research, thesis no. 68.
Isacsson Gunnar (1999), “Estimates to the return to schooling in Sweden from a large sample of twins”, Labour Economics.
Isacsson Gunnar (2004), “Estimating the Economic Return to Educational Levels Using Data on Twins”, Journal of Applied Econometrics.
Jonsson Jan O. & Michael Gähler (1997), “Family Dissolution, Family Reconstitution, and Children’s Educational Careers: Recent Evidence for Sweden”, Demography.
Lindahl Lena (2007), “Do Birth Order and Family Size Matter for Intergenerational Mobility? Evidence from Sweden”, Applied Economics.
Lindahl Lena & Håkan Regnér (2005), “College Choice and Subsequent Earnings: Results Using Swedish Sibling Data” Scandinavian Journal of Economics.
Meghir Costas & Mårten Palme (2005), “Educational Reform, Ability and Parental Background”, American Economic Review.
Nordström Skans Oskar (2004) “Scarring effects of the first labour market experience”, Working Paper 2004:14, IFAU.
Oreopoulos Philip, Marianne E. Page & Ann Huff Stevens (2006), “The Intergenerational Effects of Compulsory Schooling”, Journal of Labor Economics, forthcoming.
Plug Erik (2004), “Estimating the Effect of Mother’s Schooling on Children’s Schooling Using a Sample of Adoptees”, American Economic Review.
Røed Knut & Oddbjörn Raaum (2003), “Administrative Registers - Unexplored Reservoirs of Scientific Knowledge?”, Economic Journal.
Österberg Torun (2000), “Intergenerational Income Mobility in Sweden: What do Tax-Data Show?”, Review of Income and Wealth.