Definition: A data warehouse is an integrated compilation of data from various sources.
Data warehouses often contain both transactional information (data about individual activities) and summarized data covering
multiple years.
The TRACFed warehouse data come from many U.S. federal government databases, obtained largely through the Freedom of Information Act. Computerized data are supplemented by
statistical reports, hardcover publications, and interviews with current and former government officials, attorneys, and agency data
processing specialists.
Our warehouse is updated regularly, but the exact schedule depends on the release schedule of each agency.
Primary data sources - these sources provide the backbone of the warehouse.
- Executive Office for United States Attorneys
- Administrative Office of the United States Courts
- Office of Personnel Management
- Internal Revenue Service
- Environmental Protection Agency
- Census Bureau
- other specialized agencies
Contextual data - extra data that provide essential background information.
- Geographic boundaries
- Population trends for calculating rates
- Inflation rates for calculating constant / real dollars
- Historical information to capture organizational change
- Changes in data recording
- other contextual data
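As an illustration of how population and inflation figures serve as context, the two calculations named above can be sketched as follows. This is a minimal sketch, not TRAC's actual procedure: the function names, the per-100,000 scaling, and the use of a generic price index are all assumptions made for the example.

```python
def rate_per_100k(count, population):
    """Express a raw count as a rate per 100,000 residents (assumed scaling)."""
    if population <= 0:
        raise ValueError("population must be positive")
    return count * 100_000 / population


def to_constant_dollars(nominal, index_then, index_base):
    """Convert a nominal dollar amount into base-year (constant/real) dollars.

    index_then: price index for the year the amount was recorded
    index_base: price index for the chosen base year
    Both indexes are hypothetical inputs; any consistent price index works.
    """
    if index_then <= 0:
        raise ValueError("price index must be positive")
    return nominal * index_base / index_then
```

With made-up inputs, `rate_per_100k(50, 1_000_000)` puts 50 events in a district of one million residents on the same footing as counts from districts of other sizes, and `to_constant_dollars(100.0, 50.0, 100.0)` restates an older amount in base-year dollars so that amounts from different years can be compared.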
TRAC's general data procedures
- Obtain the data
- requires background research on agency information systems, informal requests, formal FOIA requests, and sometimes lawsuits
(see TRAC Sues IRS from April 2005)
- Run statistical validity checks (see Judging the Quality of Government Data)
- Prepare the data for entry into the warehouse
- includes initial preparation (interpreting codes, merging files, and more) plus monthly
update preparation (searching for and adding new codes, linking records, etc.)
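The code-handling part of the preparation step can be pictured with a short sketch. It is only an illustration under stated assumptions: the record layout, the field name `charge_code`, and the lookup table are hypothetical and do not describe TRAC's actual files or tools.

```python
def find_new_codes(records, code_table, field):
    """Return, in sorted order, codes that appear in the records
    but are missing from the lookup table."""
    seen = {record[field] for record in records}
    return sorted(seen - set(code_table))


def decode(records, code_table, field):
    """Return copies of the records with a readable label added.

    Codes missing from the table get a label of None, so they stay
    visible until someone adds them to the table.
    """
    label_field = field + "_label"
    return [
        {**record, label_field: code_table.get(record[field])}
        for record in records
    ]


# Hypothetical lookup table and update file, for illustration only.
code_table = {"A1": "Example category one", "B2": "Example category two"}
update = [
    {"id": 1, "charge_code": "A1"},
    {"id": 2, "charge_code": "C3"},
]

new_codes = find_new_codes(update, code_table, "charge_code")
labeled = decode(update, code_table, "charge_code")
```

Here `new_codes` lists the one code in the update that the table does not yet cover, which is the kind of item a periodic update would need to research and add before the records are loaded.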