State of Connecticut
Data Sharing Playbook

Enabling Data Sharing

The steps below are best practices for agencies to develop an efficient data sharing process.

Identifying who plays each data-related role allows organizations to establish who has the responsibility of fielding external inquiries, designing sharing procedures, and executing requests. The first step in setting up a strong data governance model and maintaining institutional knowledge of the data sharing process is to establish and communicate these roles.

Although the roles below are described separately, the same person may exercise more than one role and may have a separate job title and function.

Agency data officer

Agency data officers serve as the main contact person for inquiries, requests, or concerns regarding access to the data of an agency. The agency data officer, in consultation with the Chief Data Officer and the agency head, establishes procedures to ensure that the agency complies with requests for data in an appropriate and prompt manner.

Section 4-67p of the Connecticut General Statutes defines the role and responsibilities of agency data officers, and a list of agency data officers is published on the Connecticut Open Data Portal.

Data owner

The data owner is accountable for the quality and security of the data and holds the decision-making authority about data within their domain. The data owner varies by database, and there may be multiple data owners.

Data steward

The data steward is responsible for the governance of data and ensures the fitness of content and metadata. Stewards exercise established processes, policies, guidance, compliance, and rules in this effort. They are usually the subject matter experts and data analysts who work with the data on a daily basis.

Legal counsel

Legal counsel is a person or team that can evaluate data access and use and help craft appropriate legal agreements when needed.

Privacy and compliance officer

This person or team develops and implements policies and procedures to protect individual rights and comply with federal and state law. The privacy and compliance officer also investigates any data incidents and breaches.

Create and publish a data dictionary.

A publicly available data dictionary helps requesters understand what data your agency collects and maintains. It can also help them craft requests that reference specific tables and fields, making the request easier to fulfill. The data dictionary should:

  • Describe all of the datasets for which your agency is responsible
  • Contain information on how each of the datasets was collected
  • Define the individual fields in each of the datasets
  • Indicate levels of access for each of the datasets, including which data is already open, which is restricted, and which should not be used or is not available
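
As an illustration only, the items above could be captured in a small structure like the sketch below. The class names, the access level labels, and the sample dataset are assumptions made for this example, not a format prescribed by the playbook or by P20 WIN.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class AccessLevel(Enum):
    # Hypothetical labels that mirror the access categories listed above.
    OPEN = "open"
    RESTRICTED = "restricted"
    NOT_AVAILABLE = "not available"


@dataclass
class FieldDefinition:
    name: str
    description: str
    data_type: str


@dataclass
class DatasetEntry:
    name: str
    description: str
    collection_method: str
    access_level: AccessLevel
    fields: List[FieldDefinition] = field(default_factory=list)


def missing_items(entry: DatasetEntry) -> List[str]:
    """Return the parts of a data dictionary entry that are still blank."""
    problems = []
    if not entry.description.strip():
        problems.append("dataset description")
    if not entry.collection_method.strip():
        problems.append("collection method")
    if not entry.fields:
        problems.append("field definitions")
    for f in entry.fields:
        if not f.description.strip():
            problems.append(f"definition for field '{f.name}'")
    return problems


# Made-up entry, used only to show the check in action.
example = DatasetEntry(
    name="program_enrollment",
    description="One row per participant enrolled in a program.",
    collection_method="",
    access_level=AccessLevel.RESTRICTED,
    fields=[
        FieldDefinition("participant_id", "Unique participant identifier.", "text"),
        FieldDefinition("enroll_date", "", "date"),
    ],
)
print(missing_items(example))
```

A check like this can be run before publishing the dictionary so that blank descriptions or collection notes are caught early.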

P20 WIN Data Dictionary and metadata policies

P20 WIN has a data dictionary containing the elements available for request. Participating agencies provide updates each year in accordance with the P20 WIN metadata policies and processes.

Document metadata.

Metadata is information that describes a dataset and its fields. It provides data about your data. It includes when and how the data was gathered, along with anything else that describes an aspect of the data. Keep detailed notes on the metadata and on the process by which the data was collected, because this information makes the data easier to use effectively later on.

One common misconception about metadata is that it is solely the definitions of the various fields in a dataset. However, metadata includes much more than these surface-level characteristics. Anything that gives additional information about the nature, structure, or gathering process of the dataset counts as metadata. Some examples of metadata for different types of media include:

  • Photographs / images: date and time the photo was taken, who took the photo, location where the image was captured, and camera settings used to take the photo
  • Books / reports / documents: title, author, publishing information, year of publication, table of contents, index, date of last update / modification, and number of pages
  • Emails / communication records: person sending the communication, person receiving the communication, message text, date and time of correspondence, subject line, IP addresses of sender and responder, and encryption details
  • Spreadsheets / databases: names of column fields, explanation of fields, number of users / respondents surveyed, number of missing data entries, integrity constraints, data types included in the table, and date and time the information was collected (including multiple records if gathered over a period of time)

When tracking metadata, it’s important to:

  • Document as much information as you can about the higher-level aspects of a dataset: its source, update frequency, timestamps of collection, expected level of detail, explanations of tags, data quality, etc.
  • Be consistent about the language you use to describe metadata
  • Avoid acronyms and language that might be specific to you or your agency, since metadata can help recipients of data sharing understand what a dataset is all about
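
Some spreadsheet and database metadata, such as column names and the number of missing entries, can be gathered automatically. The sketch below is one possible approach using Python's standard csv module; the function name profile_table and the sample table are assumptions made for illustration.

```python
import csv
import io


def profile_table(csv_text: str) -> dict:
    """Summarize column names, row count, and missing entries per column."""
    reader = csv.DictReader(io.StringIO(csv_text))
    columns = list(reader.fieldnames or [])
    missing = {name: 0 for name in columns}
    row_count = 0
    for row in reader:
        row_count += 1
        for name in columns:
            value = row.get(name)
            # Treat absent or blank cells as missing entries.
            if value is None or value.strip() == "":
                missing[name] += 1
    return {"columns": columns, "row_count": row_count, "missing_entries": missing}


# Made-up sample table with two blank cells.
sample = "town,year,enrollment\nHartford,2021,120\nNew Haven,2021,\nStamford,,95\n"
summary = profile_table(sample)
print(summary)
```

Automated summaries like this cover only part of the picture. Source, collection method, and update frequency still need to be written down by someone who knows the data.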

Update Connecticut’s High Value Data Inventory.

Connecticut’s High Value Data Inventory is a data catalog that highlights general information about high-value datasets possessed by state agencies. Annual maintenance of the inventory is required by C.G.S. § 4-67p. At the end of each year, the Office of Policy and Management (OPM) will ask agency data officers to provide updates by December 31 of that year. By keeping your agency’s datasets up to date in the catalog, you help other agencies and the public understand what data your agency owns and whom to contact for more information.

To update the inventory, email both Scott Gaul and Pauline Zaldonis with the subject line “CT High Value Data Inventory Change Request.”

Review data for implicit biases.

As organizations become more data-driven, data experts are discovering more instances in which unaccounted biases in data perpetuate racism, sexism, and other forms of discrimination.

The data that government agencies, academic researchers, and other organizations collect most likely contain implicit biases. These biases can be introduced due to:

  • Whose data is collected: Does a dataset contain a representative sample of people across different demographics and backgrounds (e.g., multiple races, ethnicities, geographic locations, ages, genders, etc.)?
  • Whose data isn’t collected: Does the data leave out a specific demographic group that might not frequent the service where the data is collected?
  • How the data is collected: For example, is the data collected via interview in one area and via a form somewhere else?

Consider possible sources of bias in your agency’s data carefully. If you do not identify possible bias, communicate it to data requesters, and work to reduce it, the decisions made based on your data may have serious unintended societal implications.
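
One simple starting point for the "whose data is collected" question is to compare each group's share of a dataset with its share of a reference population. The sketch below is a minimal illustration under stated assumptions: the function name, the age groups, and the reference shares are all made up, and a gap is only a prompt for closer review, not proof of bias.

```python
from collections import Counter


def representation_gaps(records, group_key, reference_shares):
    """Compare each group's share of the records with a reference share.

    Returns {group: dataset_share - reference_share}. Negative values mean
    the group is under-represented relative to the reference.
    """
    counts = Counter(record[group_key] for record in records)
    total = sum(counts.values())
    gaps = {}
    for group, reference_share in reference_shares.items():
        share = counts.get(group, 0) / total if total else 0.0
        gaps[group] = round(share - reference_share, 3)
    return gaps


# Made-up records and reference shares, for illustration only.
records = (
    [{"age_group": "18-34"}] * 6
    + [{"age_group": "35-64"}] * 3
    + [{"age_group": "65+"}] * 1
)
reference = {"18-34": 0.30, "35-64": 0.45, "65+": 0.25}
print(representation_gaps(records, "age_group", reference))
```

Which reference population is appropriate depends on the program and the question being asked, so that choice should be documented alongside the result.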

Work to eliminate possible sources of bias.

Data analysts are ultimately responsible for how they use your agency’s data; however, as the data owners and experts, you can help data analysts avoid biases in data that perpetuate racism, sexism, and other forms of discrimination.

First, be open about the limitations of the agency’s data to reduce the likelihood that it will be used in ways that have unintended consequences. Second, work towards systemic changes to data collection practices. Finally, require data requesters to demonstrate responsible use of your agency’s data.

Develop a data request process.

A clearly documented data request process can facilitate successful requests. This section covers some of the supporting documents to develop as part of a comprehensive data request process.

Remember that the data request process must abide by the regulations and laws that apply to each dataset. For more detailed information, refer to Establish a privacy policy and the report on Legal Issues in Interagency Data Sharing, including the appendices reviewing state and federal laws and regulations.

Request form

Ensure that the data requester answers the questions below in order to evaluate the benefits and mitigate the risks of sharing data.

  • What is the requester’s contact information and organization?
  • What is the purpose of the request?
  • How does the requester plan to use the data?
  • Who will have access to the data?
  • What are the specific data they are requesting, and what are the specific parameters, such as individual or aggregate data and over what time period?
  • What methods will be used in the analysis of the data?
  • How will results be reported? With whom will they be shared? How will they be disseminated?
  • How frequently will this data be needed? For example, is this a one-time need or a recurring need?
  • How long is the requester seeking to keep the data? When and how will the data be destroyed? How is this reported or disseminated to the data owner?
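
If the request form is collected electronically, the questions above could be checked for completeness before a request is routed for review. The sketch below is one possible shape for such a check; the DataRequest class, its field names, and the sample request are assumptions made for illustration, not an official form.

```python
from dataclasses import dataclass, fields
from typing import List


@dataclass
class DataRequest:
    # Hypothetical field names, each loosely matching a question above.
    contact_and_organization: str = ""
    purpose: str = ""
    planned_use_and_methods: str = ""
    people_with_access: str = ""
    data_requested: str = ""
    reporting_and_dissemination: str = ""
    frequency: str = ""
    retention_and_destruction: str = ""


def unanswered(request: DataRequest) -> List[str]:
    """Return the names of questions the requester left blank."""
    return [
        f.name for f in fields(request)
        if not getattr(request, f.name).strip()
    ]


# Made-up request with only three questions answered.
request = DataRequest(
    contact_and_organization="Jane Doe, Example University",
    purpose="Evaluate program outcomes",
    data_requested="Aggregate enrollment counts, 2019 to 2021",
)
print(unanswered(request))
```

Returning the list of blank questions, rather than a simple pass or fail, lets the agency tell the requester exactly what is still needed.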

Flow diagram or detailed narrative of the steps

It’s important to have a way to illustrate or describe the data sharing process from start to finish. Common approaches include using a flow diagram or descriptions for each step.

Data dictionary

A data dictionary describes the agency’s data. (See Create and publish a data dictionary.)

Data request fees

A request fee schedule communicates the cost of requesting data. Each agency may have unique procedures for enacting request fees. Consult your agency’s legal counsel for specific guidance on fee schedules.

Enabling data sharing as a member of P20 WIN

Connecticut state agencies can better leverage data for decision-making through P20 WIN’s data governance framework. P20 WIN uses an enterprise framework to facilitate data sharing across participating agencies. By participating in the data governance structure of P20 WIN, state agencies can enable the secure sharing of data to address critical policy questions in the state.

If your agency is not yet a participating agency in P20 WIN, contact Katie Breslin, Outreach and Engagement Coordinator, to learn about how your agency can join P20 WIN.

This playbook is available on GitHub.