The TRACFed Data Warehouse

Definition: A data warehouse is an integrated compilation of data from various sources. Data warehouses often contain both transactional information (data about individual activities) and summarized data covering multiple years.

The TRACFed warehouse data come from many U.S. federal government databases, obtained largely through the Freedom of Information Act. Computerized data are supplemented by statistical reports, hardcover publications, and interviews with current and former government officials, attorneys, and agency data processing specialists. Our warehouse is updated on a regular basis, but the exact schedule depends on the release schedule of each agency.

Primary data sources - these sources provide the backbone of the warehouse.

  • Executive Office for United States Attorneys
  • Administrative Office of the United States Courts
  • Office of Personnel Management
  • Internal Revenue Service
  • Environmental Protection Agency
  • Census Bureau
  • other specialized agencies

Contextual data - extra data that provide essential background information.

  • Geographic boundaries
  • Population trends for calculating rates
  • Inflation rates for calculating constant / real dollars
  • Historical information to capture organizational change
  • Changes in data recording
  • other contextual data
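
As a rough illustration of how two of the contextual items above are typically used, the sketch below computes a per-capita rate from a population figure and converts a nominal dollar amount to constant dollars with a price index. The function names, the per-100,000 scale, and the choice of price index are assumptions made for illustration; they are not taken from TRAC's own procedures.

```python
def rate_per_100k(count, population):
    """Events per 100,000 residents (the scale is an illustrative assumption)."""
    if population <= 0:
        raise ValueError("population must be positive")
    return count * 100_000 / population


def to_constant_dollars(nominal, index_then, index_base):
    """Express a nominal amount in base-year ("constant" or "real") dollars.

    index_then: price index for the year the amount was recorded
    index_base: price index for the chosen base year
    Both indexes are hypothetical inputs supplied by the caller.
    """
    if index_then <= 0:
        raise ValueError("price index must be positive")
    return nominal * index_base / index_then


if __name__ == "__main__":
    # Made-up numbers, for demonstration only.
    print(rate_per_100k(count=50, population=1_000_000))
    print(to_constant_dollars(nominal=100.0, index_then=50.0, index_base=100.0))
```

Dividing by population lets districts of different sizes be compared directly, and rescaling by a price index lets dollar amounts from different years be compared on one footing.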

TRAC's general data procedures

  1. Obtain the data
    • requires background research on agency information systems, informal requests, formal FOIA requests, and sometimes lawsuits (see TRAC Sues IRS from April 2005)
  2. Run statistical validity checks (see Judging the Quality of Government Data)
  3. Prepare the data for entry into the warehouse
    • includes initial preparation to interpret codes, merge files, and more, plus monthly update preparation such as searching for and adding new codes, linking records, etc.
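
The preparation step can be pictured with a minimal sketch, assuming records arrive as lists of dictionaries. It shows three of the tasks named above: spotting codes that are missing from a lookup table, interpreting codes as readable labels, and linking records from two files on a shared key. All field names, codes, and helper names here are hypothetical and do not describe TRAC's actual tooling.

```python
def find_new_codes(records, code_table, field):
    """Return codes present in the records but absent from the lookup table."""
    return sorted({r[field] for r in records} - set(code_table))


def decode(records, code_table, field, unknown="UNKNOWN"):
    """Return copies of the records with a '<field>_label' entry added."""
    return [
        {**r, field + "_label": code_table.get(r[field], unknown)}
        for r in records
    ]


def link(left, right, key):
    """Attach fields from `right` to `left` records that share `key`.

    Assumes at most one `right` record per key; unmatched `left`
    records are kept unchanged.
    """
    index = {r[key]: r for r in right}
    return [{**rec, **index.get(rec[key], {})} for rec in left]


if __name__ == "__main__":
    # Hypothetical sample data, for demonstration only.
    cases = [
        {"case_id": 1, "charge_code": "A1"},
        {"case_id": 2, "charge_code": "Z9"},
    ]
    code_table = {"A1": "Example charge"}
    outcomes = [{"case_id": 1, "outcome": "closed"}]

    print(find_new_codes(cases, code_table, "charge_code"))
    print(link(decode(cases, code_table, "charge_code"), outcomes, "case_id"))
```

Checking for unseen codes before decoding means a monthly update can add them to the lookup table rather than silently labeling records as unknown.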
