Measuring Civil Justice for All

How to Acquire Essential Facts about Access to Justice in the United States

Making Justice Accessible: Legal Services for the 21st Century

Improving access to the essential facts described in the first section of this report will help scholars and policy-makers, courts and legal services providers, social services organizations, legislators, and the public better understand the scope and significance of the civil justice gap, how it manifests in the lives of individual Americans, and the vital role of our courts and other justice system institutions. Only with these essential facts in hand can the United States ensure a legal system that is open and transparent and make changes to improve fairness and equity.

Collecting and releasing such data present a range of challenges. The courts and legal services providers must overcome technical limitations in their capacity to collect and share data, honor expectations of privacy, and assure appropriate use of data when it is shared. This section offers practical suggestions and tools for making facts accessible and thereby helping to make justice more accessible. It offers a model data use agreement that can be adapted to meet the needs of different stakeholders, identifies core concepts in data access, and makes recommendations about a range of data collection practices, including a call for data scraping and the creation of a data commons.

Some data held by data keepers raise genuine concerns about individual privacy and data security. People seeking the intervention of courts may understand that many court records are public records. Nonetheless, asking people who face potentially life-altering court cases to report their demographic characteristics, such as race, ethnicity, and gender, may feel intrusive to some or may prompt concern that such disclosures might adversely influence decision-making. People seeking help from legal services providers expect their personal and legal information to be held in confidence and may also be reluctant to share information that does not seem directly pertinent to their claims. Where the lines of privacy are drawn matters to the individuals themselves and also to the judges, lawyers, and organizations that are data keepers. Trust from clients, from litigants, and from the public generally is essential for these institutions to maintain their credibility and perform their roles effectively.

This report offers four approaches that can allow data keepers to collect, maintain, and safely share even sensitive data with scholars and other interested members of the public. Together, these four approaches can help to liberate civil justice data.



I. Liberating Civil Justice Data

A. Public Records

Most information about cases and litigants, as currently collected by courts, is a matter of public record and can be requested by researchers. This includes the personal identifying information of the parties. As court case records increasingly become electronic and as access to those records can be made available online rather than in the courthouse or the benefits office, courts and other government entities grapple with the loss of “practical obscurity” of their records. One consequence is the increasing risk that electronic records will be used by private entities and government agencies to monitor court involvement and curtail access to public benefits and private goods and services, like housing or employment.

What information is available now, and who uses it?

At present, most courts report at least some information in the aggregate, such as the number of cases heard in specific kinds of courts or the percentage of defaults in a particular kind of case, such as debt collection. These reports are sometimes posted publicly or shared with funders such as legislators, but they are not standardized across states or from year to year.

Many courts make case-level data available through means only accessible to commercial data aggregators. These for-profit companies draw together public records as information resources that they then sell to private third parties. For example, these companies sell information to landlords who want to know whether potential tenants have ever been evicted, have a criminal history, or have sued previous landlords for repairs under housing laws. They also sell information to employers wishing to perform criminal background checks.

Researcher Tools: FOIA and Public Records Requests

Federal agencies are required to make much of their data public under the Freedom of Information Act (FOIA).6 Of relevance to civil justice are records and information collected by such agencies as the Department of Homeland Security, the Department of Housing and Urban Development, the Department of Labor, and the Department of Justice. While FOIA is the primary statutory mechanism for accessing such public records, the Privacy Act also provides rights to and limitations on public access to government information.7 The Privacy Act governs the “collection, maintenance, use and dissemination” of agency records containing personal identifying information about U.S. citizens and lawful permanent residents.8 Notably, the Privacy Act rarely allows the disclosure of individually identifiable records without the written consent or request of the individual identified by the record, unless that disclosure is required by FOIA.9

Although nonfederal entities such as state and local governments are not covered under FOIA or the Privacy Act, many states have enacted their own public records laws. State sunshine laws, sometimes known as open records laws or public records laws, govern public access to governmental records in each state. These laws, however, mandate varying degrees of access to public records. In more than a dozen states, the state judiciary is exempt from state public records laws; a similar exemption applies in the District of Columbia.10 Some court rules provide access to court records. Most states have rules governing the bulk distribution of electronic case information, which vary from jurisdiction to jurisdiction.11 Each state court system and the federal judiciary has its own privacy policies for court records, with differing levels of restriction.12 At least one state has enacted a provision that allows researchers access to confidential court data involving children.13

B. Confidential Data

Information about cases and clients collected by legal services organizations, pro bono programs, and other direct service entities is typically not a matter of public record. Indeed, individual data collected during the course of legal representation are generally protected by the duty of confidentiality and attorney-client privilege.14

Depending on the agency or entity seeking to share data, other professional rules or laws may also be implicated. In the medical-legal partnership arena, for example, the Health Insurance Portability and Accountability Act (HIPAA) presents some obstacles to information sharing between physicians and attorneys. However, to further both the legal representation and the patient’s medical treatment, such partnerships often share data after obtaining patient/client consent. The consent form identifies the person or agency to whom the identifiable information will be provided, the purpose of the disclosure, and the nature and scope of the information being disclosed. Consent-form requirements vary depending on what information is being shared and by whom; HIPAA, for example, has its own requirements for the sharing of protected health information. The sharing of data with research institutions will also implicate federal institutional review board (IRB) requirements.




II. Data Use Agreements

Few courts and legal services providers have structured their privacy and confidentiality agreements in ways that allow them to lawfully share data while preserving the privacy of litigants. When data are shared, each agreement and relationship between a data keeper and researcher is structured on an ad hoc basis, making the process unnecessarily burdensome as data keepers and data users reinvent procedures each time. Common language in a standard data use agreement can be a useful starting point for negotiating the appropriate and safe sharing of civil justice data.15

What are Data Use Agreements (DUAs)?

Data sharing among and between courts and other agencies and institutions can be facilitated through a memorandum of understanding (MOU), a written agreement that outlines the relationship between two or more parties. Data use agreements (DUAs) are a type of MOU that facilitates the transfer or use of data. Well-executed agreements govern who, when, how, and why individuals and entities will be able to access and use the data, in addition to ensuring compliance with regulatory structures.

What is the purpose of a DUA?

The overall goal of a DUA is to facilitate sharing of data while ensuring that information exchange rests on a solid legal framework and protects individual privacy. DUAs serve two important practical purposes for the parties. First, they protect the agency providing the data, ensuring that the data will not be misused. Second, they play an important role in guiding parties to think through otherwise unanticipated details of a data sharing relationship.

Are there legal restrictions on types of data that may be shared?

All federal laws and most state laws allow for the sharing of data, even individually identifiable information, for certain purposes. At the same time, federal and state laws restrict the use of certain types of data, including the following major categories restricted by federal regulations:

  • Health Information. HIPAA applies to “protected health information” provided to health plans, doctors, hospitals, and other healthcare providers. HIPAA only applies when the information is produced by specific entities, such as healthcare providers. When health-related information is produced by a court or another legal entity, HIPAA would likely not apply. Note that courts are restricted in their ability to receive data that are protected by HIPAA.16
  • Education Records. The Family Educational Rights and Privacy Act (FERPA) applies to education records, broadly defined as records directly related to a student and maintained by an educational agency or institution or by a party acting for the agency or institution.17 FERPA does provide for the release of de-identified records if certain requirements are met.
  • Alcohol and Substance Abuse Treatment Records. Part 2 of Title 42 of the Code of Federal Regulations protects the confidentiality of alcohol and substance abuse treatment records regardless of who has possession of them, as long as the information was “received or acquired by a federally assisted alcohol or drug program.”18
  • Homelessness Data. Federal law protects the confidentiality of data collected through the Homeless Management Information System, a data collection system that exists in most locations, under the guidance of the U.S. Department of Housing and Urban Development.19

Even with these legal restrictions, government entities with data that fall under these federal regulations have successfully structured agreements that, while preserving the privacy of individuals, allow them to lawfully share data. For example, scholars have linked federal tax records and other data sources in strictly controlled confidential data sites, creating so-called big data about individuals that can be used in scientific analysis.

In addition to federal and professional restrictions on data sharing, vendor service agreements between data keepers and the companies that provide their data management tools may limit sharing. Both courts and legal services providers often use case-management systems that are designed and operated by third-party vendors. Courts and providers may not be aware of who “owns” the data (the court or provider itself, or the software vendor) and what authority they have to share the data with others under the terms of their service agreement.

What are the elements of a DUA?

Importantly, DUAs are made between organizations, not individuals. As a result, these agreements become organizational responsibilities and last longer than the tenure of a particular staff person.

The items in Table 1 are typically found in data use agreements.

Appendix A includes a data use agreement template.

Table 1: Items Typically Found in Data Use Agreements

  • Parties involved. The names of the agencies or programs entering into the agreement. Note: Be sure to specify who is a data provider and who is a data receiver.
  • Purpose of the agreement. The reason for the agreement and the allowed uses of the data.
  • Data description. The fields to be included, the level of detail, and the time period the data represent.
  • Data transmission. The file format and approved methods for transmission.
  • Data storage and security. Specifications of any security measures and, if appropriate, a date by which the data should be returned or destroyed.
  • Conditions for release of data to third parties. Provisions for the release of the file to third parties or prohibitions on such actions.
  • Conditions for release of results of analysis. Provisions for the release of any data analysis or results, including suppression rules to avoid identification of any individuals or agency names.
  • Fees and costs. A listing of all fees to be paid, including any associated fees or costs.
  • Time frame. The time period the agreement is in force and how often it must be renewed.
  • Amendment process. The process for amendments to the agreement.
  • Termination. The reasons why and the process by which either organization can terminate the agreement.
  • Signatures. Signatures by persons who have the right and authority to execute the agreement on behalf of the contracting agencies.
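When a draft agreement is circulated for review, the Table 1 items can double as a completeness checklist. The sketch below is a hypothetical illustration, not part of the Appendix A template: it assumes a draft DUA is held as a simple Python dictionary whose keys (invented for this example) correspond to the Table 1 items, and it lists the items that are still missing or blank.

```python
# Hypothetical keys standing in for the Table 1 items; not a standard schema.
DUA_ITEMS = [
    "parties",
    "purpose",
    "data_description",
    "data_transmission",
    "storage_and_security",
    "third_party_release",
    "results_release",
    "fees_and_costs",
    "time_frame",
    "amendment_process",
    "termination",
    "signatures",
]


def missing_items(draft: dict) -> list[str]:
    """Return the Table 1 items that a draft DUA leaves absent or blank."""
    return [item for item in DUA_ITEMS if not str(draft.get(item, "")).strip()]


# Illustrative draft with only some items filled in.
draft = {
    "parties": "Provider: county court; Receiver: university research team",
    "purpose": "Study of representation rates in eviction cases",
    "data_description": "Case type, filing date, disposition, 2015-2020",
    "time_frame": "Two years, renewable annually",
}
print(missing_items(draft))
```

A checklist of this kind only confirms that each item has been addressed; whether the terms are adequate remains a matter for the parties to negotiate.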


Limitations of DUAs

DUAs are an important mechanism to give researchers access to civil justice data, but they have significant limitations. They require finding data holders willing to enter into such agreements. In each case, the researchers and data holders need to devote resources to negotiating agreements that will vary from institution to institution and researcher to researcher. This approach is labor intensive, slows down the research process, and creates barriers to cross-jurisdictional comparisons, deterring researchers from entering the field.

A. Individual Consent and Participation in Human Subjects Research

Sometimes the use of data held by a data keeper requires the consent of the individual who is the subject of the record; sometimes this consent is not required. The principles governing the need for informed consent are contained in the Federal Policy for the Protection of Human Subjects, also known as the Common Rule.20 Under the Common Rule, an IRB oversees the research and determines whether individual consent is required. When determining whether individual consent should be obtained from the subjects of the records, the IRB must consider the intended use of the data.

The Common Rule allows for individual consent requirements to be waived when

  • the research involves no more than minimal risk to the subjects;
  • the waiver or alteration will not adversely affect the rights and welfare of the subjects;
  • the research could not practicably be carried out without the waiver or alteration; and
  • whenever appropriate, the subjects will be provided with additional pertinent information after participation.

Researchers have successfully launched research projects that involve obtaining individual consent, in collaboration with courts, legal services providers, and other civil justice data keepers.

Publicly available data such as court records are largely exempt from human subjects review.

Most data privacy laws authorize the use of administrative data for public purposes such as evaluation, audit, and research without individual consent under certain conditions. In these cases, although individual identifiers are used to link records across data sets, typically only de-identified information will be released to the researcher, auditor, or evaluator. Where no identifying information is released, individual consent is not necessary.
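One common way to link records across data sets while releasing only de-identified information is for the data keeper to replace direct identifiers with a keyed hash before release. The sketch below assumes two hypothetical extracts (court cases and legal aid intake) that share a name and date of birth; the secret key, field names, and records are all invented for illustration. Because the key stays with the data keeper, the researcher receives a stable linkage ID but not the identifiers themselves. Keyed hashing is one pseudonymization technique among several, and hashed IDs alone do not guarantee that a data set cannot be re-identified.

```python
import hashlib
import hmac

# Secret key held only by the data keeper (hypothetical value for illustration).
SECRET_KEY = b"data-keeper-secret"


def linkage_id(name: str, dob: str) -> str:
    """Derive a pseudonymous ID from normalized identifiers using HMAC-SHA256."""
    normalized = f"{name.strip().lower()}|{dob}".encode("utf-8")
    return hmac.new(SECRET_KEY, normalized, hashlib.sha256).hexdigest()


def deidentify(records: list[dict]) -> list[dict]:
    """Replace name and date of birth with a linkage ID before release."""
    released = []
    for rec in records:
        out = {k: v for k, v in rec.items() if k not in ("name", "dob")}
        out["person_id"] = linkage_id(rec["name"], rec["dob"])
        released.append(out)
    return released


court = [{"name": "Jane Doe", "dob": "1980-01-02", "case_type": "eviction"}]
legal_aid = [{"name": " jane doe", "dob": "1980-01-02", "served": True}]

court_out = deidentify(court)
aid_out = deidentify(legal_aid)

# The researcher joins on person_id without ever seeing the identifiers.
aid_by_id = {r["person_id"]: r for r in aid_out}
linked = [{**c, **aid_by_id[c["person_id"]]} for c in court_out if c["person_id"] in aid_by_id]
print(linked[0]["case_type"], linked[0]["served"])  # eviction True
```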

B. Researchers and Secondary Uses of Data

For administrative data or other data in which individual consent is not required, secondary uses do not typically require IRB approval, but, even when they do, researchers can seek expedited review or exempt status under Exempt Category 4, particularly when the data being analyzed are publicly available.21

In 2017, the Common Rule was modified to permit pooling of data and secondary uses of personally identifiable information (PII).22 While the HHS Secretary’s Advisory Committee on Human Research Protections published detailed “Recommendations for Broad Consent Guidance” in 2017,23 no guidance has yet been promulgated by HHS. In the absence of formal guidance, many academic IRBs have declined to implement broad consent within their universities.24

Academic institutions should adopt and promulgate guidance that can make effective use of the revised Common Rule. The University of Denver has created guidance and a broad consent template that are directly applicable to civil justice data.25 The core elements of broad consent are as follows:

  1. Researchers must be able to provide a general description of the types of research that may be conducted with the PII and which types of information might be used in research.
  2. Researchers must be clear about the period of time the PII will be available for future research and the types of institutions or researchers that might have future access to that information.
  3. Research participants must be told that they will not be provided with details about future research studies using the PII.
  4. Participants must be given contact information for questions and the opportunity to withdraw consent. If participants withdraw consent, it must be possible to remove their data from storage and stop sharing them.

The technology and data structuring required to isolate and remove individual participants from shared research repositories may be the most significant barrier to the implementation of broad consent. However, this barrier is not insurmountable.
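The withdrawal requirement is easier to meet if every record in a shared repository carries a participant identifier from the start. The sketch below assumes a hypothetical in-memory repository (the class and method names are invented) in which records are indexed by participant; withdrawing consent deletes that participant's records and remembers the withdrawal, so that data loaded later for the same person are refused and future extracts exclude them. A production system would also have to deal with copies already distributed to researchers, backups, and audit logs, which this sketch does not attempt.

```python
from collections import defaultdict


class ConsentRepository:
    """Toy research repository that supports withdrawal of broad consent."""

    def __init__(self) -> None:
        self._records: dict[str, list[dict]] = defaultdict(list)
        self._withdrawn: set[str] = set()

    def add(self, participant_id: str, record: dict) -> None:
        # Refuse new data from anyone who has already withdrawn consent.
        if participant_id in self._withdrawn:
            return
        self._records[participant_id].append(record)

    def withdraw(self, participant_id: str) -> int:
        """Remove all stored records for a participant; return how many were removed."""
        self._withdrawn.add(participant_id)
        return len(self._records.pop(participant_id, []))

    def extract(self) -> list[dict]:
        """Return the records currently eligible for sharing, tagged with participant ID."""
        return [
            {"participant_id": pid, **rec}
            for pid, recs in self._records.items()
            for rec in recs
        ]


repo = ConsentRepository()
repo.add("P1", {"matter": "debt collection"})
repo.add("P2", {"matter": "eviction"})
repo.add("P2", {"matter": "custody"})
print(repo.withdraw("P2"))                # 2
repo.add("P2", {"matter": "wage claim"})  # ignored after withdrawal
print(repo.extract())                     # [{'participant_id': 'P1', 'matter': 'debt collection'}]
```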




III. Alternative Strategies for Accessing Data

The sharing of administrative record data by the courts and other entities through DUAs is the means by which these data have traditionally been acquired, but it is neither the only way nor necessarily the best way of doing so. The volume of data on civil justice problems is quite limited. But when these matters make their way to courts, they become part of stores of administrative records that date back decades and can provide valuable insights into important questions about access to justice. While courts continue to improve the processes for gathering and managing administrative records, they have not been eager to share these data. Even when the courts are willing, they may not be able to share data in standard formats because of a lack of computing resources or because the laws governing data sharing in specific states can be cumbersome. Negotiation of an acceptable DUA, for example, can take up to two years. Given these limitations, it seems appropriate not only to expand opportunities for sharing but also to explore other means of obtaining relevant administrative record data on civil matters from the courts. Specifically, researchers may consider using public records requests and bulk downloading of data, including website data scraping, as means of acquiring administrative record data on the processing of civil matters in courts.

Public Records Requests

FOIA gives the public access to information about how the federal government conducts the people’s business, and almost every state has a similar law covering state government. These laws specify the processes for requesting information, and annual reports describe agency responses to these requests. Reporters have used this lever to great effect, but researchers have not used it to the same degree.

The Criminal Justice Administrative Records System (CJARS) at the University of Michigan is acquiring criminal justice data from the courts and other agencies. It has already obtained over 1.7 billion records from these organizations, approximately 17 percent of which were acquired through data requests. Its holdings include records for more than 21 million persons (47 percent of which were obtained from data requests).26 Measures for Justice is a nonprofit trying to assemble administrative record data from prosecutors and state courts throughout the nation.27 It has assembled data from more than 20 states, and it, too, relies on a mixed strategy of DUAs, information requests, and bulk downloads. While the criminal side of the court system differs from the civil side in many important ways, the success of these efforts on the criminal side is worthy of investigation as a strategy for acquiring the necessary administrative records on civil justice.28

Bulk Download

An alternative to establishing a DUA or submitting FOIA requests is accessing court data through bulk downloads, either by scraping court websites or by obtaining access to internal electronic records. State court rules vary on whether bulk downloading of records is permitted: some prohibit bulk downloads altogether, others require permission from the state judiciary, and still others freely allow the download of electronic files.29

Bulk data downloading is a common procedure already used by data aggregators, investigative journalists, and, increasingly, the academic research community to gather and analyze data available on the Internet. Gathering research data in this manner is considerably less time consuming than the process of establishing formal agreements, though it requires some basic knowledge of computer programming and an ability to process and store large volumes of data. Data scraping, in its most general form, refers to a technique in which a computer program extracts data from output that is generated by another program. For court records, that means culling information from case search websites that were originally designed to display information for individual cases in the same way that physical case files were pulled by court clerks. Scraping programs can be used to search individual cases in rapid succession and capture the information that appears for each search result. These programs can be written narrowly to capture a few pieces of information from each case, or they can be written broadly to capture all data that are displayed and to make copies of associated files.
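The core of a scraping program is a parser that pulls named fields out of the page a case search site returns for each case. Because every court's site is structured differently, the sketch below assumes a hypothetical case detail page in which each field sits in an HTML table cell carrying a `data-field` attribute; the page, the field names, and the case number are invented, and a real scraper would have to be written against the actual markup of the site in question. It uses only Python's standard-library `html.parser`, and it covers parsing only, not fetching.

```python
from html.parser import HTMLParser
from typing import Optional


class CasePageParser(HTMLParser):
    """Collect text from <td data-field="..."> cells on a hypothetical case page."""

    def __init__(self) -> None:
        super().__init__()
        self.fields: dict[str, Optional[str]] = {}
        self._current: Optional[str] = None
        self._buffer: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            # attrs is a list of (name, value) pairs; only labeled cells are kept.
            self._current = dict(attrs).get("data-field")
            self._buffer = []

    def handle_data(self, data):
        if self._current is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._current is not None:
            value = "".join(self._buffer).strip()
            # An empty cell (e.g., no attorney of record) is stored as None.
            self.fields[self._current] = value or None
            self._current = None


def parse_case_page(html: str) -> dict[str, Optional[str]]:
    """Parse one case detail page into a dictionary of named fields."""
    parser = CasePageParser()
    parser.feed(html)
    parser.close()
    return parser.fields


SAMPLE_PAGE = """
<table>
  <tr><th>Case</th><td data-field="case_number">2020-CV-000123</td></tr>
  <tr><th>Filed</th><td data-field="filing_date">2020-03-02</td></tr>
  <tr><th>Issue</th><td data-field="case_type">Landlord/Tenant</td></tr>
  <tr><th>Defendant attorney</th><td data-field="defendant_attorney"></td></tr>
  <tr><th>Judgment</th><td data-field="judgment">Default judgment for plaintiff</td></tr>
</table>
"""
print(parse_case_page(SAMPLE_PAGE))
```

In a full scraper, a function like `parse_case_page` would be called once for each page retrieved in the search loop, and the resulting dictionaries would be written to storage for later cleaning.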

Although the information architecture for court records varies widely from one jurisdiction to the next, the data that are available can be invaluable for understanding how civil legal problems are processed in U.S. courts. Most case search websites provide unique case numbers, names of litigants, filing and hearing dates, the legal issue in question, names of associated attorneys (or null fields when no attorney is present), names of presiding judges, and judgment details. Many jurisdictions also include the street address of litigants and attorneys. This information can be used to build detailed profiles of the civil legal docket, measure the growth or decline in adjudicated legal matters, detect geographic patterns for different legal issues, identify repeat litigants, and more. In recent years, court data have been used to identify abusive debt collection practices, monitor the eviction crisis, highlight discriminatory practices in civil forfeiture, and focus public attention on the wildly varying fines and fees that are levied in courts.30
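Once case records have been captured, simple tabulations can begin to answer several of the questions listed above. The sketch below assumes a hypothetical list of already-parsed records with `case_type`, `plaintiff`, and `defendant_attorney` fields, where a missing attorney is stored as `None`; it computes the share of defendants with counsel in one case type and flags repeat plaintiffs, defined here, as an arbitrary assumption, as parties appearing in at least two cases.

```python
from collections import Counter

# Hypothetical parsed records; None marks a null attorney field.
records = [
    {"case_type": "eviction", "plaintiff": "Landlord A", "defendant_attorney": None},
    {"case_type": "eviction", "plaintiff": "Landlord A", "defendant_attorney": "Attorney 1"},
    {"case_type": "eviction", "plaintiff": "Landlord B", "defendant_attorney": None},
    {"case_type": "debt collection", "plaintiff": "Creditor C", "defendant_attorney": None},
]


def representation_rate(cases: list[dict], case_type: str) -> float:
    """Share of cases of one type in which the defendant has an attorney of record."""
    subset = [c for c in cases if c["case_type"] == case_type]
    if not subset:
        return 0.0
    represented = sum(1 for c in subset if c["defendant_attorney"] is not None)
    return represented / len(subset)


def repeat_plaintiffs(cases: list[dict], threshold: int = 2) -> dict[str, int]:
    """Plaintiffs that appear in at least `threshold` cases (the cutoff is an assumption)."""
    counts = Counter(c["plaintiff"] for c in cases)
    return {name: n for name, n in counts.items() if n >= threshold}


print(round(representation_rate(records, "eviction"), 3))  # 0.333
print(repeat_plaintiffs(records))                          # {'Landlord A': 2}
```

In real data, plaintiff names usually need normalization (for example, variant spellings of the same company) before counts like these are meaningful.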

Working with data accessed through a bulk download process can be challenging, however. Administrative data often require a tremendous amount of cleaning to be usable in analysis, and documentation is rarely available to explain the contents of the downloaded data or any anomalies that might be found. Analysts should also be careful about the process they use to capture court data. They should read and respect the terms of service listed on case search websites and use programming techniques that dynamically delay the rate of data capture to avoid taxing court servers, particularly during peak business hours.31 But the value of this information to illuminate pressing public policy questions is clear.
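One way to implement a dynamically delayed capture rate is to scale the pause between requests to how slowly the server has been responding and to lengthen it further during business hours. The sketch below is a hypothetical policy, not a standard: the base delay, the slowdown factor, the peak multiplier, and the weekday 8:00 to 18:00 peak window are all assumptions that should be tuned to, and overridden by, whatever a court's terms of service require. The `fetch` function is supplied by the caller, so no particular HTTP library is assumed.

```python
import time
from datetime import datetime

BASE_DELAY = 2.0           # minimum pause between requests, in seconds (assumed)
SLOWDOWN_FACTOR = 5.0      # wait this many times the last response time (assumed)
PEAK_MULTIPLIER = 3.0      # extra caution during business hours (assumed)
PEAK_HOURS = range(8, 18)  # 8:00 to 17:59 local time, Monday to Friday (assumed)


def next_delay(last_response_seconds: float, now: datetime) -> float:
    """Pause longer when the server responds slowly and during peak hours."""
    delay = max(BASE_DELAY, SLOWDOWN_FACTOR * last_response_seconds)
    if now.weekday() < 5 and now.hour in PEAK_HOURS:
        delay *= PEAK_MULTIPLIER
    return delay


def polite_fetch_all(case_numbers, fetch):
    """Call a caller-supplied fetch(case_number) for each case, pausing between calls."""
    results = {}
    for number in case_numbers:
        start = time.monotonic()
        results[number] = fetch(number)
        elapsed = time.monotonic() - start
        time.sleep(next_delay(elapsed, datetime.now()))
    return results


print(next_delay(0.1, datetime(2021, 3, 3, 10, 0)))  # 6.0 (a Wednesday morning)
print(next_delay(0.1, datetime(2021, 3, 6, 10, 0)))  # 2.0 (a Saturday)
```

Tying the delay to observed response times means the scraper backs off automatically when a court's server is under load, rather than relying on a single fixed interval.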

The traditional practice of negotiating access between a single research team and a single institution is giving way to new approaches for data access.




IV. Moving toward a Civil Justice Data Commons

In health science research and other fields, data sharing systems known as “data commons” have emerged that facilitate access to data by researchers and stakeholders and incorporate data governance best practices to protect data privacy and security and the anonymity of data subjects.32 This model should be applied to the civil justice domain to facilitate the sharing of data by courts, legal services providers, and administrative agencies.33

A civil justice data commons would allow researchers to investigate questions about the basic functioning of the civil justice system, including whether legal representation makes a difference in outcomes and whether race, ethnicity, national origin, gender, or other demographic characteristics contribute to what happens in court. It would also facilitate the linkage of court data with financial, health, educational, and other data sets so that researchers could better understand the antecedents of civil justice problems—providing a basis for downstream interventions to prevent their emergence—and the long-term effects of involvement in the civil justice system on health, economic and housing security, and well-being. A civil justice data commons would also provide civil justice institutions and the public the ability to monitor organizational activities and patterns.

In other domains, data repositories have facilitated the sharing of data, and a surge of computational science research has produced insights relevant to public policy. There has been no similar surge of computational social science research in the civil justice field. A civil justice data commons would improve the functioning of courts, agencies, and legal services providers in a community and help communities describe the consequences of civil justice issues like debt and eviction, causally test what effect interventions like legal aid have on parent and child outcomes, and build predictive models for targeting limited legal resources to the households most in need.

How a Data Commons Would Work

To address the different interests of stakeholders, a civil justice data commons would provide a tiered system of “frictionless and facilitated” access to different types of civil justice stakeholders. In the first tier, authorized researchers would have access to cleaned and harmonized data and statistical software packages to do computational analyses. They would also have tools to link data sets from different civil justice institutions as well as data sets from other sources. A civil justice data commons would allow researchers to find, access, and analyze data to answer questions such as, How prevalent are various civil legal problems across jurisdictions? How often do individuals experience multiple civil legal problems simultaneously? Do particular events, such as a job loss or medical debt, increase the likelihood of involvement in the civil justice system? Does having a lawyer make a difference in case outcomes? If so, in what ways? Do inequities exist in court practices, and, if so, how can they be mitigated?

The next tiers would be designed and built to provide courts, legal services providers, and other civil justice institutions with near real-time information to help them understand their functioning and allocate resources equitably and efficiently. A dashboard or visualization might help a court administrator spot an increase in self-represented litigants. Similar tools would allow legal services providers to follow trends that relate to the service delivery model, including patterns in the types of matters they take or decline, applicant and client demographics, and geographic and other gaps in the distribution of their services. They could also use the information captured to demonstrate the effectiveness of their services to funders. Other forms of access might be provided to community groups, which might have an interest in how court involvement affects members of the community. The public could also have some form of access to civil justice data to understand better how the civil justice system works. In all cases, the commons would be designed to provide information tailored to the specific interests of the intended users.

Data commons are created through a governance regime established by the data sharers and other parties involved. The terms of that regime dictate the requirements of data sharing, access, and use. The terms also specify the privacy, confidentiality, and security controls that apply to the data. These requirements are built into the technical infrastructure of the data commons. Educational institutions already have extensive experience administering data commons and ensuring that governance terms are adhered to. The trust they have already accrued as faithful stewards makes them ideal partners for any new data commons involving civil justice data.
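Building governance terms into the infrastructure can be pictured as an access policy that maps each tier of user to the form of the data that tier may see. The sketch below is a hypothetical illustration: the three tiers, the list of direct identifiers, and the rule that public users see only counts of ten or more (smaller counts are suppressed as `None`) are assumptions made for the example, not features of any existing commons. Other tiers, such as community groups, could be added in the same way.

```python
from collections import Counter
from enum import Enum


class Tier(Enum):
    RESEARCHER = "researcher"    # de-identified, record-level data
    INSTITUTION = "institution"  # aggregate counts for courts and providers
    PUBLIC = "public"            # aggregates with small cells suppressed


MIN_PUBLIC_CELL = 10                      # assumed suppression threshold
DIRECT_IDENTIFIERS = {"name", "address"}  # assumed fields to strip for researchers


def view_for(tier: Tier, records: list[dict]):
    """Return only the form of the data that the given tier may see."""
    if tier is Tier.RESEARCHER:
        return [{k: v for k, v in r.items() if k not in DIRECT_IDENTIFIERS} for r in records]
    counts = Counter(r["case_type"] for r in records)
    if tier is Tier.INSTITUTION:
        return dict(counts)
    # Public tier: withhold counts small enough to risk identifying someone.
    return {k: (n if n >= MIN_PUBLIC_CELL else None) for k, n in counts.items()}


records = [{"name": f"Person {i}", "case_type": "eviction"} for i in range(12)]
records.append({"name": "Person X", "case_type": "guardianship"})
print(view_for(Tier.PUBLIC, records))       # {'eviction': 12, 'guardianship': None}
print(view_for(Tier.INSTITUTION, records))  # {'eviction': 12, 'guardianship': 1}
```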

At present, the development of data commons in the civil justice arena faces challenges, including lack of shared case taxonomies, regulatory barriers, organizational skepticism, and costs. But if these barriers can be overcome, a civil justice data commons promises to accelerate the production of knowledge about the civil justice system.34
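Until a shared case taxonomy exists, harmonizing data from different courts usually means maintaining a crosswalk from each jurisdiction's local case-type codes to common categories. The sketch below assumes two hypothetical courts with invented local codes and a small invented target taxonomy; codes with no mapping are reported rather than silently dropped, so that gaps in the crosswalk stay visible.

```python
# Hypothetical crosswalk: (jurisdiction, local code) -> shared category.
CROSSWALK = {
    ("court_a", "LT"): "housing/eviction",
    ("court_a", "SC-DEBT"): "consumer/debt collection",
    ("court_b", "UD"): "housing/eviction",
    ("court_b", "COLL"): "consumer/debt collection",
}


def harmonize(records: list[dict]) -> tuple[list[dict], set]:
    """Attach a shared category to each record and collect codes with no mapping."""
    harmonized, unmapped = [], set()
    for rec in records:
        key = (rec["jurisdiction"], rec["local_code"])
        category = CROSSWALK.get(key)
        if category is None:
            unmapped.add(key)
        harmonized.append({**rec, "category": category})
    return harmonized, unmapped


rows = [
    {"jurisdiction": "court_a", "local_code": "LT"},
    {"jurisdiction": "court_b", "local_code": "UD"},
    {"jurisdiction": "court_b", "local_code": "PROB"},
]
out, gaps = harmonize(rows)
print([r["category"] for r in out])  # ['housing/eviction', 'housing/eviction', None]
print(gaps)                          # {('court_b', 'PROB')}
```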


  • 32. Robert L. Grossman, Allison Heath, Mark Murphy, Maria Patterson, and Walt Wells, “A Case for Data Commons: Toward Data Science as a Service,” Computing in Science and Engineering 18 (5) (2016): 10–20, DOI: 10.1109/MCSE.2016.92.
  • 33. Margaret Hagan, Jameson Dempsey, and Jorge Gabriel Jiménez, “A Data Commons for Law (Part 1),” Legal Design and Innovation [blog], Medium, April 1, 2019 (accessed December 8, 2020).
  • 34. Other models exist for justice and administrative data repositories. In addition to CJARS, models include the NYU Administrative Data Research Facility, the Coleridge Initiative, the District of Columbia’s Statewide Longitudinal Education Data System, Harvard University’s Dataverse, the Children’s Data Network, ICPSR at the University of Michigan, and South Carolina’s Integrated Data System.