Bringing Big Data into Policy Research

Studying large public social programs can be a challenge for social scientists. Recent technological developments have allowed researchers to use big data (data sets that combine survey and administrative data) to examine how social programs work and how they affect their participants. One method researchers use is “text mining,” the process of finding patterns in large amounts of unstructured text. In this example, we explore how text mining contributed to a better understanding of the behavior of families in the foster care system and the role of the courts in reunification.

What We Did

Chapin Hall researchers, in partnership with leadership at the Illinois Department of Children and Family Services (DCFS), identified substance abuse and court delays as two issues that affect permanency for children. We worked with DCFS casework experts to identify the words and phrases that indicate substance abuse and court delays. We then designed a case note coding framework that programmers used to flag any instances of substance abuse or court delays in a case (and whether each instance was negative or positive).
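The coding framework itself is not published in this summary. As a rough illustration only, the Python sketch below assumes a framework can be represented as a dictionary of phrases per issue, and it interprets "negative" as a mention in which the issue is negated (for example, "negative drug screen") and "positive" as a mention in which it is asserted. The phrase lists, the negation cues, and the `code_note` function are all hypothetical; they are not the framework Chapin Hall and DCFS built.

```python
import re
from dataclasses import dataclass

# Hypothetical phrase lists for illustration only; not the actual DCFS/Chapin Hall framework.
CODING_FRAMEWORK = {
    "substance_abuse": ["substance abuse", "drug screen", "alcohol", "relapse"],
    "court_delay": ["continuance", "hearing rescheduled", "court delay"],
}

# Hypothetical negation cues; a match preceded by one of these is coded "negative".
NEGATION = re.compile(r"\b(no|not|denies|denied|without|negative)\b")


@dataclass
class CodedInstance:
    category: str  # which issue the phrase indicates
    phrase: str    # the framework phrase that matched
    polarity: str  # "positive" (issue asserted) or "negative" (issue negated)


def code_note(text: str, window: int = 30) -> list[CodedInstance]:
    """Flag every framework phrase in one case note and assign it a polarity."""
    lowered = text.lower()
    instances = []
    for category, phrases in CODING_FRAMEWORK.items():
        for phrase in phrases:
            for match in re.finditer(re.escape(phrase), lowered):
                # Look back a few characters, but only within the current sentence.
                preceding = lowered[max(0, match.start() - window):match.start()]
                preceding = preceding.split(".")[-1]
                polarity = "negative" if NEGATION.search(preceding) else "positive"
                instances.append(CodedInstance(category, phrase, polarity))
    return instances


notes = "Mother had a negative drug screen today. Hearing rescheduled due to court delay."
for inst in code_note(notes):
    print(inst)
```

Real case notes would call for much richer phrase lists and more careful negation handling (for instance, cues that follow the phrase, as in "the screen was negative"), which is where input from casework experts matters.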

Using this framework, we mined the case notes of 18,694 individual foster children in care in 2011. We extracted 1,174,989 case notes and identified patterns across them that indicated aspects of substance abuse and court delays for each child.
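Because the findings are reported per child rather than per note, coded notes have to be rolled up to the child level at some point. The summary does not say how this was done; the sketch below assumes a simple rule in which a child is flagged for an issue if any of that child's notes contains a positive instance. The `child_level_flags` function and its input format are hypothetical.

```python
from collections.abc import Iterable


def child_level_flags(
    child_ids: Iterable[str],
    coded_instances: Iterable[tuple[str, str, str]],
    categories: Iterable[str],
) -> dict[str, dict[str, bool]]:
    """Roll note-level codes up to one flag per child and issue.

    coded_instances holds (child_id, category, polarity) tuples, one per
    coded phrase found in any of the child's case notes.
    """
    categories = list(categories)
    # Start every child in the roster as unflagged, so children whose notes
    # contain no coded phrases still appear in the output.
    flags = {cid: {cat: False for cat in categories} for cid in child_ids}
    for child_id, category, polarity in coded_instances:
        # Assumed rule: one positive instance anywhere is enough to flag the child.
        if polarity == "positive":
            flags[child_id][category] = True
    return flags


instances = [
    ("A", "substance_abuse", "positive"),
    ("A", "court_delay", "negative"),
    ("B", "court_delay", "positive"),
    ("B", "court_delay", "positive"),
]
print(child_level_flags(["A", "B", "C"], instances, ["substance_abuse", "court_delay"]))
```

A stricter rule (for example, requiring positive instances in more than one note) would flag fewer children, so the choice of roll-up rule directly shapes child-level percentages such as those in the findings.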

What We Found

Based on the notes, we were able to determine the following patterns relating to substance abuse:

  • Overall, 71% of children came from families who had substance abuse issues.
  • Among white and African American foster children, 76% came from such families, compared with 67% of Hispanic children.
  • Children from families with issues related to substance abuse were slightly more likely to be placed in the care of a family member than other children in the child welfare system.
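Each percentage above is a share of children rather than of case notes. Assuming one child-level flag per child plus a race/ethnicity field, the overall and group-specific shares could be computed as in the sketch below. The records are invented toy data, not the study's, and the `share_flagged` function is hypothetical.

```python
from collections import Counter


def share_flagged(children: list[dict]) -> tuple[float, dict[str, float]]:
    """Return the overall share of flagged children and the share within each group."""
    totals = Counter()
    flagged = Counter()
    for child in children:
        totals[child["group"]] += 1
        if child["flagged"]:
            flagged[child["group"]] += 1
    overall = sum(flagged.values()) / sum(totals.values())
    by_group = {group: flagged[group] / totals[group] for group in totals}
    return overall, by_group


# Invented toy records for illustration; not data from the study.
toy = (
    [{"group": "X", "flagged": True}] * 3
    + [{"group": "X", "flagged": False}]
    + [{"group": "Y", "flagged": True}]
    + [{"group": "Y", "flagged": False}]
)
overall, by_group = share_flagged(toy)
print(f"overall: {overall:.0%}")  # overall: 67%
print(by_group)  # {'X': 0.75, 'Y': 0.5}
```

The overall share is weighted by group size, so it need not fall midway between the group-specific rates.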

With regard to court delays, children in Cook County had slightly fewer delays than children in the rest of Illinois. Children placed at younger ages experienced court delays at a higher rate than youth placed between the ages of 13 and 17.

What It Means

Text mining is one tool that researchers and system leaders can use to generate useful new information about social programs and the communities that interact with them. Our process produced information about individuals and the services they receive that is not usually available in structured data. Such data can not only inform caseworkers’ decision making but also support evaluation and other research.
