Identifying who plays each data-related role allows organizations to establish who has the responsibility of fielding external inquiries, designing sharing procedures, and executing requests. The first step in setting up a strong data governance model and maintaining institutional knowledge of the data sharing process is to establish and communicate these roles.
Although the roles below are described separately, the same person may exercise more than one role and may have a separate job title and function. For example, a single contractor may act as both the steward and custodian for the agency’s data. In small agencies, the data officer may also fulfill the data owner, custodian, and steward roles. The important takeaway is that agency leadership and the team need to know who to go to for each set of responsibilities in the data system.
Agency data officers serve as the main contact person for inquiries, requests, or concerns regarding access to the data of an agency. The agency data officer, in consultation with the Chief Data Officer and the executive agency head, establishes procedures to ensure that the agency complies with requests for data in an appropriate and prompt manner. 1
The role of agency data officer was established in Connecticut by Public Act 18-175, which required executive branch agencies to designate an employee to serve as the primary contact for inquiries, requests, and concerns about access to data at their agency, and to be responsible for implementing the provisions of P.A. 18-175. See the list of agency data officers.
The data owner is accountable for the quality and security of the data and holds the decision-making authority about data within their domain. The data owner varies by dataset, and there may be multiple data owners. For some datasets, a data officer is the data owner, and in others, it may be the program lead.
The data steward is responsible for the governance of data and ensures the fitness of content and metadata. In this effort, stewards apply established processes, policies, guidance, compliance requirements, and rules. They are usually the subject matter experts and data analysts who work with the data on a daily basis.
A data custodian is responsible for the technology used to store and transport the data. The role can be filled by either a person or team, and data custodians are usually database administrators, data analysts, or software engineers.
Legal counsel is a person or team that can evaluate data access and use and help craft appropriate legal agreements when needed.
The privacy and compliance officer is a person or team that develops and implements policies and procedures to protect individual rights and comply with federal and state law. The privacy and compliance officer also investigates any data incidents and breaches.
A publicly available data dictionary helps requesters understand what data your agency collects and possesses. It can also help them craft requests that reference specific tables and fields, making the request easier to fulfill. The data dictionary should:
Metadata is a set of information that describes a dataset and its fields. It provides data about your data. It includes information such as when and how the data was gathered or any other information that might describe an aspect of the data. It is important to keep detailed notes on the metadata and the process by which the data was collected because this information makes the data easier to use effectively later on.
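As a rough illustration of the kind of notes described above, the sketch below records when and how a dataset was gathered alongside its field descriptions, then serializes the record to JSON so it can travel with the data. The class names, field names, and sample values are all hypothetical and are not drawn from any agency system or state standard.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class FieldMetadata:
    """Describes one field (column) in a dataset. Hypothetical structure."""
    name: str
    description: str
    data_type: str


@dataclass
class DatasetMetadata:
    """Describes the dataset as a whole, not just its fields."""
    title: str
    collected_on: str        # date the data was gathered (ISO 8601 string)
    collection_method: str   # how the data was gathered
    notes: str = ""          # anything else that describes the data
    fields: List[FieldMetadata] = field(default_factory=list)


# Sample values are invented for illustration only.
record = DatasetMetadata(
    title="Example program enrollment",
    collected_on="2020-01-15",
    collection_method="Online intake form",
    notes="Town is self-reported and may be blank.",
    fields=[FieldMetadata("town", "Town of residence", "text")],
)

# asdict() converts nested dataclasses too, so the result is JSON-ready.
metadata_json = json.dumps(asdict(record), indent=2)
print(metadata_json)
```

Keeping the collection date and method next to the field descriptions reflects the point that metadata covers more than field definitions.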
One common misconception about metadata is that it is solely the definitions of the various fields in a dataset. However, metadata includes much more than these surface-level characteristics. Anything that gives additional information about the nature, structure, or gathering process of the dataset counts as metadata. Some examples of metadata for different types of media include:
When tracking metadata, it’s important to:
Connecticut’s High Value Data Inventory (Non GIS) and Connecticut’s High Value Data Inventory (GIS) are data catalogs that highlight general information about high-value datasets possessed by state agencies. The annual maintenance of the high value data inventories is required by P.A. 18-175. At the end of each year, OPM will ask agencies to provide updates by December 31 of that year. By keeping your agency’s datasets up to date in the catalog, you help other agencies and the public understand what data your agency owns and whom to contact for more information.
To update the inventory, email both scott.gaul@ct.gov and pauline.zaldonis@ct.gov with the subject line “CT High Value Data Inventory Change Request.”
As organizations become more data-driven, data experts are discovering more instances in which unaccounted-for biases in data perpetuate racism, sexism, and other forms of discrimination.
The data that government agencies, academic researchers, and other organizations collect most likely contain implicit biases. These biases can be introduced due to:
Consider possible sources of bias in your agency’s data carefully. If you don’t identify possible bias, communicate it to data requesters, and work to reduce it, the decisions made based on your data may have serious unintended societal implications.
Researchers discovered that a major health provider’s algorithm favored white patients over black patients when deciding who would benefit from extra medical care. The researchers attributed the algorithm’s bias to the data that was used to create it. Researchers noted that the health provider attempted to prevent bias by omitting the patient’s race in the algorithm. Nevertheless, the algorithm amplified underlying inequities in access to healthcare. In the US, white patients incur more medical costs than black patients due to long-standing disparities in wealth and access to healthcare. Because of this difference in access to care, the algorithm perpetuated the disparity by determining that white patients would benefit more from extra medical care than sicker black patients. 2
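The mechanism in this example can be shown with a toy sketch. It assumes, as the passage suggests, that past medical cost stood in for medical need, and it uses a handful of invented patient records (hypothetical groups "A" and "B", made-up condition counts and costs). It is not the health provider's algorithm or data. Group membership is never used in the ranking, yet ranking by cost still selects a different group than ranking by need.

```python
# Invented toy records: (group, chronic_conditions, annual_cost).
# Group B patients are sicker but have incurred lower costs,
# mirroring unequal access to care. All values are hypothetical.
patients = [
    ("A", 3, 9000),
    ("A", 2, 6500),
    ("A", 1, 2500),
    ("B", 5, 6000),
    ("B", 4, 5000),
    ("B", 2, 3000),
]


def top_k(rows, key_index, k):
    """Return the k rows with the largest value at key_index."""
    return sorted(rows, key=lambda row: row[key_index], reverse=True)[:k]


# Rank by past cost (the proxy) versus by number of conditions (the need).
selected_by_cost = top_k(patients, key_index=2, k=2)
selected_by_need = top_k(patients, key_index=1, k=2)

groups_by_cost = [row[0] for row in selected_by_cost]
groups_by_need = [row[0] for row in selected_by_need]

print("Selected by cost:", groups_by_cost)
print("Selected by need:", groups_by_need)
```

In this toy data, the cost-based ranking picks only group A patients, while the need-based ranking picks only group B patients, even though the group label plays no part in either ranking.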
Data analysts are ultimately responsible for how they use your agency’s data; however, as the data owners and experts, you can help data analysts avoid biases in data that perpetuate racism, sexism, and other forms of discrimination.
First, be open about the limitations of the agency’s data to reduce the likelihood that it will be used in ways that have unintended consequences. Second, work towards systemic changes to data collection practices. Finally, require data requesters to demonstrate responsible use of your agency’s data. The recommended reading below provides guidance on identifying and eliminating sources of implicit bias.
A clearly documented data request process can facilitate successful requests. This section covers some of the supporting documents to develop as part of a comprehensive data request process.
Remember that the data request process must abide by the regulations and laws that apply to each dataset. For more detailed information, refer to Establish a privacy policy and the report on Legal Issues in Interagency Data Sharing, including the appendices reviewing state and federal laws and regulations.
Ensure that the data requester answers the questions below so that you can evaluate the benefits and mitigate the risks of sharing data.
It’s important to have a way to illustrate or describe the data sharing process from start to finish. Common approaches include using a flow diagram or descriptions for each step.
A data dictionary describes the agency’s data. (See Create and publish a data dictionary.)
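One way a data dictionary can support the request process is by letting staff check that a request references documented tables and fields. The sketch below is a minimal illustration under that assumption. The table name, field names, descriptions, and the `unknown_fields` helper are all hypothetical.

```python
# Hypothetical data dictionary: table name -> {field name: description}.
data_dictionary = {
    "enrollment": {
        "student_id": "Unique identifier assigned at enrollment",
        "town": "Town of residence",
    },
}


def unknown_fields(dictionary, requested):
    """Return the requested (table, field) pairs the dictionary does not document."""
    return [
        (table, field_name)
        for table, field_name in requested
        if field_name not in dictionary.get(table, {})
    ]


# A hypothetical request that names one documented and one undocumented field.
request = [("enrollment", "town"), ("enrollment", "income")]
missing = unknown_fields(data_dictionary, request)
print(missing)
```

An empty result would mean every requested field is documented. Anything returned is worth clarifying with the requester before work on the request begins.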
A request fee schedule communicates the cost of requesting data. Both the Department of Public Health and P20-WIN have fee schedules, but each agency may have unique procedures for enacting request fees. We recommend that you consult your agency’s legal counsel for specific guidance on fee schedules.