DHS Data Warehouse

The DHS Data Warehouse is a central repository of social and human services data related to DHS clients and the services they receive through DHS as well as through a number of other public entities. DHS created the Data Warehouse by consolidating its internal human services data (e.g., behavioral health, child welfare, intellectual disability, homelessness and aging). Over time, the warehouse expanded to include data from other sources. The Data Warehouse now includes data from 29 sources (e.g., DHS, PA Department of Human Services, Allegheny County and City of Pittsburgh Housing Authorities, almost 20 local school districts, the Allegheny County Medical Examiner, and the criminal justice system) and contains more than a billion records from over one million distinct clients.

The Data Warehouse was created in 1999 with support from the Human Service Integration Fund, a flexible funding pool created by a coalition of local foundations for the purpose of supporting integration and innovation within DHS. It was an important step in the ongoing integration of DHS, which was created in 1997 by the consolidation of previously separate and independent county departments.

Integrating Data Sources

Adding a new data source to the Data Warehouse requires developing trust and a shared vision with the partner, as well as coordinating details such as the form in which data will be provided. Most partners send information weekly, and it is loaded into the Data Warehouse through an Extract, Transform and Load (ETL) platform. The ETL is set up to accept data in different formats and load them into the central data area. At DHS, a team of programmers uses IBM DataStage to create each ETL and Oracle database management software to store the data. Setting up the ETL is the most complex function involved; it accounts for about 80 percent of the technological work of the Data Warehouse.
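To make the ETL idea concrete, here is a minimal sketch of accepting feeds in different formats and loading them into one central table. It is an illustration only, not the DHS implementation: DHS builds its ETLs in IBM DataStage and stores data in Oracle, whereas this sketch uses the Python standard library and an in-memory SQLite database. The table name, field names, source names and sample rows are all hypothetical.

```python
import csv
import io
import json
import sqlite3

# Hypothetical common schema standing in for the "central data area".
SCHEMA = (
    "CREATE TABLE service_records "
    "(source TEXT, client_name TEXT, birth_date TEXT, service TEXT)"
)


def extract_csv(text):
    """Extract rows from a CSV feed as a list of dicts."""
    return list(csv.DictReader(io.StringIO(text)))


def extract_json(text):
    """Extract rows from a JSON feed (a list of objects)."""
    return json.loads(text)


def transform(row, source, field_map):
    """Map source-specific field names onto the common schema and tidy values."""
    return (
        source,
        row[field_map["client_name"]].strip().upper(),
        row[field_map["birth_date"]].strip(),
        row[field_map["service"]].strip(),
    )


def load(conn, records):
    """Load transformed records into the central table."""
    conn.executemany("INSERT INTO service_records VALUES (?, ?, ?, ?)", records)
    conn.commit()


def run_etl(conn, feeds):
    """Run extract, transform and load for each (source, extractor, raw, field_map)."""
    for source, extractor, raw, field_map in feeds:
        rows = extractor(raw)
        load(conn, [transform(row, source, field_map) for row in rows])


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)

# Made-up sample feeds in two different formats.
school_csv = "student,dob,program\n Jane Doe ,2010-04-01,attendance\n"
housing_json = (
    '[{"tenant": "john roe", "birthdate": "1980-12-31", "unit_type": "voucher"}]'
)

run_etl(conn, [
    ("school_district", extract_csv, school_csv,
     {"client_name": "student", "birth_date": "dob", "service": "program"}),
    ("housing_authority", extract_json, housing_json,
     {"client_name": "tenant", "birth_date": "birthdate", "service": "unit_type"}),
])

loaded = conn.execute(
    "SELECT source, client_name, birth_date, service "
    "FROM service_records ORDER BY source"
).fetchall()
```

In this sketch, the per-source field map is where most of the setup effort goes: each new partner needs its own mapping and cleanup rules before its data can sit alongside the rest.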

Once client data are loaded into the Data Warehouse, each client is assigned a unique identifying number. In this way, all client-specific information can be pulled together to provide a comprehensive picture of client needs. It also ensures that individuals are not counted more than once.
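The sketch below illustrates how a unique identifying number can pull records together and prevent double counting. The matching rule used here (normalized name plus birth date) is an assumption made for illustration; this page does not describe how DHS actually matches clients across sources. All names, sources and records are made up.

```python
from itertools import count


def match_key(record):
    """Build a matching key from name and birth date (an assumed rule)."""
    # Uppercase and collapse repeated whitespace so minor typing
    # differences do not create separate clients.
    name = " ".join(record["name"].upper().split())
    return (name, record["birth_date"])


def assign_client_ids(records):
    """Return copies of the records with a 'client_id' shared by matching records."""
    ids = {}
    next_id = count(1)
    linked = []
    for rec in records:
        key = match_key(rec)
        if key not in ids:
            ids[key] = next(next_id)
        linked.append({**rec, "client_id": ids[key]})
    return linked


def client_view(linked_records, client_id):
    """List every source system in which a given client appears."""
    return sorted(r["source"] for r in linked_records if r["client_id"] == client_id)


# Hypothetical records arriving from three different program areas.
records = [
    {"source": "behavioral_health", "name": "Jane  Doe", "birth_date": "1990-05-17"},
    {"source": "homelessness", "name": "jane doe", "birth_date": "1990-05-17"},
    {"source": "child_welfare", "name": "John Roe", "birth_date": "1985-01-02"},
]

linked = assign_client_ids(records)

# Counting identifiers rather than rows avoids counting a person twice.
distinct_clients = len({r["client_id"] for r in linked})
```

Here three records resolve to two distinct clients, and `client_view(linked, 1)` returns the two program areas that serve the first client.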

The Data Warehouse requires ongoing data quality management. This is accomplished by an administrator who coordinates data, a support team that loads information, and an operational team that does weekly maintenance and performs data archiving.

Using Integrated Data

Data warehouses, also referred to as integrated data systems, are often used purely for research or for one-time projects. The DHS Data Warehouse differs in that it was envisioned primarily as a management decision-making tool rather than a research tool. The data can be retrieved and used in many ways, from client-specific to system-wide: seeing the impact of a specific intervention or practice model on a group of clients, determining the return on investment of a particular program, evaluating the effectiveness of a risk prediction tool, or comparing program effectiveness across a system.

Take the example of a child welfare caseworker who is responsible for making decisions that will impact a child’s safety and well-being. In the absence of integrated data, that caseworker must depend solely on what the child and family say and what is observable during a home visit. But with access to all of the data in the Data Warehouse, that caseworker can get reliable and immediate information about the family’s history with other social services, mental health treatment, involvement with the criminal justice system, whether the child is regularly attending school, and other issues that may pose a significant risk to that child.

A significant milestone in the use of the Data Warehouse occurred in 2009, with the creation of a data-sharing agreement with the Pittsburgh Public School district. Since then, similar agreements have been established with almost 20 local school districts, allowing for service and system activities designed to improve educational and well-being outcomes for school-aged children involved in human services. The DHS Methods page of the Education Resources section and Education page of the Research and Reports section of our website have more information about these data-sharing agreements.

The Data Warehouse allows department leadership to answer questions such as “Who are our clients?” “Are we duplicating services?” “How can we maximize resources by funding the best-performing services and strengthening others?” and make informed, data-driven decisions about service and resource allocation. Furthermore, these decisions can be made in the larger context of population, workforce and income trends that may affect the vulnerable populations served by DHS.

The charts and tables that can be configured from the Data Warehouse are used internally for operations and compliance reporting, as well as for program management and improvement, evaluating the effectiveness and quality of services, monitoring fiscal efficiencies, decision-making and predictive analytics. They are also available as a community resource for external researchers. Within strictly monitored privacy and confidentiality guidelines, DHS releases aggregate data (i.e., without markers that would allow for identification of individuals) to students, universities and other organizations that are conducting research in areas determined to benefit the field or a specific group.
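As a rough illustration of what releasing aggregate data can look like, the sketch below counts distinct clients per group and drops all individual identifiers from the output. The suppression of small groups, the threshold of 3, and all field names and records are assumptions added for illustration; this page does not specify the rules DHS applies.

```python
from collections import defaultdict

# Assumed threshold, for illustration only: groups with fewer distinct
# clients than this are suppressed because small counts can point to individuals.
MIN_CELL_SIZE = 3


def aggregate_for_release(records, group_field, min_cell=MIN_CELL_SIZE):
    """Count distinct clients per group; the result carries no identifiers."""
    clients_by_group = defaultdict(set)
    for rec in records:
        clients_by_group[rec[group_field]].add(rec["client_id"])

    table = {}
    for group, clients in sorted(clients_by_group.items()):
        n = len(clients)
        # None marks a suppressed cell rather than a true zero.
        table[group] = n if n >= min_cell else None
    return table


# Hypothetical client-level records; client 1 has two housing records.
records = [
    {"client_id": 1, "program": "housing"},
    {"client_id": 1, "program": "housing"},
    {"client_id": 2, "program": "housing"},
    {"client_id": 3, "program": "housing"},
    {"client_id": 4, "program": "aging"},
]

released = aggregate_for_release(records, "program")
```

With these sample records, the housing group is reported as three distinct clients, while the single-client aging group is suppressed.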

To Learn More About the Data Warehouse

The DHS Data Warehouse has been studied extensively, written about in a number of publications and recognized for its innovation in information technology. A selection of articles that appeared in national publications, as well as a podcast and a DHS-presented paper about the Data Warehouse, are available below.

How Allegheny County’s Data Warehouse is improving human services through integrated data
GovInnovator podcast, February 17, 2016

Data Warehouses: Using New Technology to Improve Human Services Administration
Re-issue of June 11 article below
Government Technology, June 12, 2014

Allegheny County, Pennsylvania: Department of Human Services’ Data Warehouse
Data-Smart City Solutions, Harvard University, June 11, 2014

Gaining Ground: A Guide to Facilitating Technology Innovation in Human Services
Data-Smart City Solutions, Harvard University, May 28, 2014

Allegheny County’s Data Warehouse: Leveraging Data to Enhance Human Service Programs and Policies
University of Pennsylvania, May 2014

Human Services: Sustained and Coordinated Efforts Could Facilitate Data-Sharing While Protecting Privacy
U.S. Government Accountability Office, February 2013

Data Warehousing, Flow Models and Public Policy
This paper, presented at the 28th Annual APPAM Research Conference in November 2006, describes the context and need for integrated data in human services, data warehousing technology and its unique challenges in the public sector, innovative data warehouse applications that have arisen in part from joint projects with universities, and future opportunities for data warehousing.