The Data Sharing Playbook is a resource for those navigating the data sharing process in Connecticut state government. The playbook presents strategies for enabling data sharing, making data requests, responding to requests, and transferring and linking data.
Sharing and integrating data from multiple state agencies can improve program administration, policy analysis, and performance management. Leveraging data from multiple systems can enable a whole-person perspective on data and enhance the ability to use data to inform decision-making.
The sharing of data across Connecticut state agencies occurs through various data sharing agreements and other frameworks, including the state’s longitudinal data sharing system, P20 WIN.
The Office of Policy and Management (OPM) and its Chief Data Officer are taking steps to make data sharing a more efficient, uniform, and secure process. The 2021 Legal Issues in Interagency Data Sharing Report summarizes recent progress, including building on the success of P20 WIN to expand its data sharing partners and drafting flexible and durable data sharing agreements between agencies and with outside entities.
The data sharing playbook is designed to support Connecticut state agencies in sharing data safely, securely, and ethically. This playbook will continue to be updated by the Data and Policy Analytics unit within OPM. More information about data sharing within P20 WIN and outside of it is summarized below.
P20 WIN is Connecticut’s statewide longitudinal data system and is the mechanism by which data from multiple agencies are matched to address critical policy questions. P20 WIN informs sound policies and practice through the secure sharing of longitudinal data across the participating agencies to ensure that individuals successfully navigate supportive services and educational pathways into the workforce.
P20 WIN has a membership of ten state agencies, institutions of higher education, and nonprofits, including:
All participating agencies sign the Enterprise Memorandum of Understanding (E-MOU), which establishes guidelines for data sharing, governance, and security within the P20 WIN data governance framework. Operating support to P20 WIN is provided by the Office of Policy and Management.
Requests for data from two or more agencies participating in P20 WIN should follow the P20 WIN data request process, summarized on the P20 WIN website. Data requests are prioritized if they align with either a participating agency’s individual research agenda or the P20 WIN Learning Agenda.
Data from state agencies is also shared outside the voluntary P20 WIN governance structure. A summary of data sharing efforts in executive branch agencies is included in the Legal Issues in Interagency Data Sharing Report.
The process for sharing data varies by agency. The strategies outlined in the Preparing a Successful Data Request section provide guidance for navigating the varying data request processes across Connecticut state government.
To learn how to request data from one or more state agencies, refer to these strategies:
If you represent an agency that receives data requests, the sections below describe best practices for enabling data sharing:
To learn about transferring data or linking data from different sources, refer to these strategies:
The data sharing playbook was developed to support Connecticut state agencies in sharing data safely, securely, and ethically. The playbook presents strategies both for those requesting data from state agencies and for the agencies receiving data requests. The strategies presented in this playbook are based on best practices from other states, specific examples and methods from Connecticut state agencies, and the recommendations found in OPM’s report Legal Issues in Interagency Data Sharing. The strategies are focused on interagency data sharing, although many practices will also benefit sharing data with the public.
Note that the playbook does not represent official IT or technology policy for Connecticut.
The playbook was first developed in 2019 by a team from the Connecticut Office of Policy and Management (OPM), the Office of Early Childhood (OEC), and Skylight Digital, a digital consultancy for government, and is being maintained and updated by OPM to reflect changes in the data sharing landscape in Connecticut. The research involved:
The steps below are best practices for making data requests. Data owners are frequently overburdened with daily operations. You can make it as easy as possible for them to fulfill your request by planning carefully and addressing all of the relevant questions up front.
Before you can make an effective data request, you need to know exactly what you are looking for and why. Below are some questions to answer before moving forward.
Knowing what you want before turning to data is a prerequisite to effective data use.
Thinking about what questions you want to answer will help you identify the most relevant metrics. Some examples of research questions for your data could include:
In the age of “big data” it may be tempting to collect as much information as possible and sort through it later. But this approach is counterproductive. Looking at too many metrics may overwhelm the analysis or distract you from your research question. And requesting more data than you need from a state agency may make responding to your request more time consuming. Some methods of reducing the amount of data you have to sift through include:
Before you can request data, you will need to identify the agency and program that is likely to have the data you are looking for. This data may be found in Connecticut’s High Value Data Inventory, which lists the high value data maintained by each executive branch agency. The high value data inventory should provide the name of the data owner, whom you can contact about the data you are interested in.
If you can’t find the data you are looking for in the high value data inventory, you can reach out to the Agency Data Officer at the agency that maintains the data you need. Agency data officers are the point people for data requests at each executive branch agency and can help direct your request.
Outside of the governance structure of P20 WIN, each agency has its own process for managing data requests. The agency data officer should be able to provide more information about how data requests are handled at their agency and what next steps you should follow for requesting the data.
To request data through P20 WIN, review the Data Dictionary to identify the data that you will need for your analysis.
Data owners are accountable for the proper use and security of their data. In order to evaluate the risks and benefits of sharing data, they will need an explanation of how you plan to protect and use it. The data owner’s agency will have its own privacy and security processes, and your practices will need to comply with them. When requesting data, you should be prepared to explain:
The data owner needs to understand exactly how to fulfill the request. Some useful parameters or filters to consider include:
Ensure that the data owner can fulfill the data request in a reasonable timeframe and with their available resources. If a data request is too taxing on the data owner, they may reject the request until the parameters change or the requester offers additional resources. Consider the time required for crafting and signing a data sharing agreement.
P20 WIN is the state longitudinal data system and is used to answer policy questions, fulfill federal and state reporting requirements, support program review and evaluation, and support research and analysis on a variety of topics. The P20 WIN system has its own data request process for requesting data from two or more participating agencies.
Participating Agencies that are part of P20 WIN include:
Requests for data through P20 WIN should include data from two or more participating agencies and should align with either a participating agency’s individual issue priorities or the P20 WIN Learning Agenda.
To submit a data request through P20 WIN, the requestor should follow the steps summarized below.
Once a data request has been submitted:
More information about the P20 WIN data request process can be found on the P20 WIN website and, with additional detail, in the P20 WIN Data Governance Manual.
The steps below are best practices for agencies to develop an efficient data sharing process.
Identifying who plays each data-related role allows organizations to establish who has the responsibility of fielding external inquiries, designing sharing procedures, and executing requests. The first step in setting up a strong data governance model and maintaining institutional knowledge of the data sharing process is to establish and communicate these roles.
Although the roles below are described separately, the same person may exercise more than one role and may have a separate job title and function.
Agency data officers serve as the main contact person for inquiries, requests, or concerns regarding access to the data of an agency. The agency data officer, in consultation with the Chief Data Officer and the agency head, establishes procedures to ensure that the agency complies with requests for data in an appropriate and prompt manner.
Section 4-67p of the Connecticut General Statutes defines the role and responsibilities of agency data officers, and a list of agency data officers is published on the Connecticut Open Data Portal.
The data owner is accountable for the quality and security of the data and holds the decision-making authority about data within their domain. The data owner varies by database, and there may be multiple data owners.
The data steward is responsible for the governance of data and ensures the fitness of content and metadata. Stewards exercise established processes, policies, guidance, compliance, and rules in this effort. They are usually the subject matter experts and data analysts that work with the data on a daily basis.
Legal counsel is a person or team that can evaluate data access and use and help craft appropriate legal agreements when needed.
This person or team develops and implements policies and procedures to protect individual rights and comply with federal and state law. The privacy and compliance officer also investigates any data incidents and breaches.
A publicly-available data dictionary helps requesters understand what data your agency collects and maintains. It can also help them craft requests that reference specific tables and fields, making the request easier to fulfill. The data dictionary should:
P20 WIN has a data dictionary containing the elements available for request. Participating agencies provide updates each year in accordance with the P20 WIN metadata policies and processes.
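As a rough illustration of how a data dictionary helps requesters reference specific tables and fields, the sketch below models a few dictionary entries and checks a request against them. The table names, field names, and the `unknown_fields` helper are all hypothetical and are not taken from the P20 WIN data dictionary or any agency system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldEntry:
    """One row of a hypothetical data dictionary."""
    table: str
    field: str
    data_type: str
    description: str
    sensitive: bool = False


# Hypothetical entries, for illustration only.
DATA_DICTIONARY = [
    FieldEntry("enrollment", "student_id", "string",
               "Agency-assigned student identifier", sensitive=True),
    FieldEntry("enrollment", "school_year", "integer",
               "Academic year of enrollment"),
    FieldEntry("enrollment", "program_code", "string",
               "Code for the program the student enrolled in"),
]


def unknown_fields(requested, dictionary):
    """Return requested (table, field) pairs that the dictionary does not list."""
    known = {(entry.table, entry.field) for entry in dictionary}
    return [pair for pair in requested if pair not in known]


# A request that names one documented field and one undocumented field.
request = [("enrollment", "school_year"), ("enrollment", "graduation_date")]
missing = unknown_fields(request, DATA_DICTIONARY)
print(missing)
```

A check like this lets a requester or data steward spot fields that need clarification before the request is submitted.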
Metadata is a set of information that describes the fields in a dataset. It provides data about your data. It includes information such as when and how the data was gathered or any other information that might describe an aspect of the data. It is important to keep detailed notes on the metadata and process by which the data was collected because this information can facilitate easier and more effective use later on.
One common misconception about metadata is that it is solely the definitions of the various fields in a dataset. However, metadata includes much more than these surface-level characteristics. Anything that gives additional information about the nature, structure, or gathering process of the dataset counts as metadata. Some examples of metadata for different types of media include:
When tracking metadata, it’s important to:
Connecticut’s High Value Data Inventory is a data catalog that highlights general information about high-value datasets possessed by state agencies. The annual maintenance of the high value data inventories is required by C.G.S. § 4-67p. At the end of each year, OPM will reach out to agency data officers to ask them to provide updates by December 31 of that year. By keeping your agency’s datasets up to date in the catalog, you help other agencies and the public understand what data your agency owns and who to contact for more information.
To update the inventory, email both Scott Gaul and Pauline Zaldonis with the subject line “CT High Value Data Inventory Change Request.”
As organizations become more data-driven, data experts are discovering more instances in which unaccounted biases in data perpetuate racism, sexism, and other forms of discrimination.
The data that government agencies, academic researchers, and other organizations collect most likely contain implicit biases. These biases can be introduced due to:
Consider possible sources of bias in your agency’s data carefully. If you do not identify possible bias, communicate it to data requesters, and work to reduce it, the decisions made based on your data may have serious unintended societal implications.
Data analysts are ultimately responsible for how they use your agency’s data; however, as the data owners and experts, you can help data analysts avoid biases in data that perpetuate racism, sexism, and other forms of discrimination.
First, be open about the limitations of the agency’s data to reduce the likelihood that it will be used in ways that have unintended consequences. Second, work towards systemic changes to data collection practices. Finally, require data requesters to demonstrate responsible use of your agency’s data.
A clearly documented data request process can facilitate successful requests. This section covers some of the supporting documents to develop as part of a comprehensive data request process.
Remember that the data request process must abide by the regulations and laws that apply to each dataset. For more detailed information, refer to Establish a privacy policy and the report on Legal Issues in Interagency Data Sharing, including the appendices reviewing state and federal laws and regulations.
Ensure that the data requester answers the questions below in order to evaluate the benefits and mitigate the risks of sharing data.
It’s important to have a way to illustrate or describe the data sharing process from start to finish. Common approaches include using a flow diagram or descriptions for each step.
A data dictionary describes the agency’s data. (See Create and publish a data dictionary.)
A request fee schedule communicates the cost of requesting data. Each agency may have unique procedures for enacting request fees. Consult your agency’s legal counsel for specific guidance on fee schedules.
Connecticut state agencies can better leverage data for decision-making through P20 WIN’s data governance framework. P20 WIN uses an enterprise framework to facilitate data sharing across participating agencies. By participating in the data governance structure of P20 WIN, state agencies can enable the secure sharing of data to address critical policy questions in the state.
If your agency is not yet a participating agency in P20 WIN, contact Katie Breslin, Outreach and Engagement Coordinator, to learn about how your agency can join P20 WIN.
Having an established protocol for responding to a request for data will save you time and effort (see Develop a data request process). The following suggestions will help you respond smoothly to different types of requests.
Establish a process for how key questions are asked, answered, and documented. We’ve included a detailed list of questions to ask in the section on developing a request process, including:
If the requester is planning to combine data with another dataset, this will require careful review and consideration from both teams. This could be a complex process, and we’ve included some discussion of data linking in the Linking Datasets section.
Legal counsel should advise you on the specific type of legal agreement needed to share data. However, the information below can help frame productive conversations with your data-sharing partners.
The type of agreement you will need depends on factors like:
There are multiple types of data sharing mechanisms available to state agencies. Each of them is governed by unique requirements and legal considerations.
Sharing data that is open to the public does not require an agreement. If the requesting party doesn’t need to identify specific individuals, it may be preferable to release the data to the public by publishing it on the Connecticut Open Data Portal (data.ct.gov). To publish data on the open data portal, refer to the publication guidelines developed by OPM.
The following section provides a brief description of these common types of agreement and when to use them:
Each of these agreements has a specific function and a context in which it is appropriate to use, but an agency looking to share data cannot always rely on a single agreement and move forward. The agreements often work together to define the full terms of a data sharing arrangement (for example, the E-MOU, DSA, and DUA tend to work together rather than operating alone).
MOUs are best suited for ongoing data transfers that have consistent and formalized parameters. An MOU:
MOUs are especially important when the basis for a data sharing relationship is grant funding or a service contract. The process of establishing an “MOU enables potential partners to identify similarities and differences in their priorities and goals, available resources (time, money, and expertise), project timelines, and expected outcomes prior to collaboration.”1
Data Use Agreements (DUAs) or Data Use Licenses (DULs) are best suited for individual data sharing transactions. DUAs precisely specify the parameters for the data transfer, who will have access to the data, the intended use of the data, and how the requester should destroy data.
They may also “include specific time parameters for data use or provide special provisions for data disclosure or requirements for the data holding agency to review resulting research before its publication.”2
An E-MOU is a long-term agreement signed by multiple parties in order to facilitate multiple and diverse data sharing requests. E-MOUs usually:
E-MOUs are mostly used to facilitate government agency to government agency data sharing and have been implemented in multiple states.3
Data Sharing Agreements are best suited for establishing long-term data sharing relationships that involve multiple transfers with different parameters. Data Sharing Agreements identify the involved parties and the terms and conditions for the partnership. They can stand independently or be an addendum to an MOU or E-MOU.
Since it defines an ongoing relationship for multiple transfers, a DSA may also define a process for authorizing data requests along with requirements for storing, protecting, and disposing of shared data.
A Business Associate Agreement is a written arrangement that specifies each party’s responsibilities when it comes to PHI (protected health information). HIPAA requires covered entities to only work with business associates who assure complete protection of PHI.
The statement of work is a detailed overview of the project in all its dimensions. It’s also a way to share what the project entails with those who are working on the project, whether they are collaborating or contracted to work on the project. This includes vendors and contractors who are bidding to work on the project.
A non-disclosure agreement is a binding contract between two or more parties that prevents sensitive information from being shared with any others.
The P20 WIN Enterprise Memorandum of Understanding (Enterprise MOU) outlines the guidelines for data sharing, governance structure, and confidentiality and security requirements for P20 WIN. The Enterprise MOU is signed by all agencies participating in P20 WIN.
Data Sharing Agreements are signed by Participating Agencies, the Data Integration Hub, and Data Recipients for specific data requests. This agreement outlines the responsibilities of all parties, data users, cell suppression policies, fees, and Exhibits. Exhibits include:
The steps below are best practices for protecting the security of data maintained by your agency.
Policies are high-level statements about how data should be handled, similar to a vision statement. Standards outline the rules that govern putting policies into action, and controls provide specific instructions about how to implement a standard.
In order to facilitate secure and compliant data sharing:
A privacy policy is an externally-facing document for the people from whom you might collect data. It explains how your agency uses personal information that may be collected when the public interacts with the agency. The privacy policy should include the types of information gathered, how the information is used, to whom the information is disclosed, and how the information is safeguarded.
Here are some of the questions to ask when you document a privacy policy:
Confidential Information (CI) is any non-public information pertaining to the agency’s business. Personally identifiable information (PII) is any data that can be used to identify an individual. Examples of PII include a user’s name, address, phone number, and social security number.
Data owners should also document subsets of PII, such as:
State agencies need to understand the laws that govern each dataset based on its CI and PII. The standards and laws that govern data are critical in order to know:
For more information about applicable federal and state laws, refer to the Legal Issues in Interagency Data Sharing report and accompanying appendices.
Define acceptable use standards based on the laws and regulations that govern the use of your agency’s data. These standards will help define the specific requirements in data sharing agreements for keeping data secure. For example, for sensitive data, the data owner may require that the requesting party dispose of the data after a specific amount of time.
Your agency will need legal assistance creating a comprehensive data-security program that adequately protects CI. The program will need to be consistent with and comply with all applicable federal and state laws and written policies related to protecting CI.
The data-security program should cover considerations like:
A control is a safeguard to avoid, detect, or minimize security risks that might compromise the confidentiality, integrity, and availability of data. For example, a data owner might require a quarterly review of all users with access to a database or that people working with the data undergo compliance training.
This section will give an overview of the steps involved in transferring data as part of a data request.
Depending on the data request, the data owner may need to de-identify data in order to protect the privacy and rights of the individuals represented in the data. There are a number of ways to de-identify data, and these are summarized below.
One way to de-identify data is to remove all of the fields that could be used to identify a specific individual from the data. Examples include names, phone numbers, and birthdays. (For more information about confidential data, see the section Document critical data elements.)
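A minimal sketch of this field-removal approach is shown below. The field names and sample records are invented for illustration, and which fields count as identifying is an assumption that each data owner would need to decide for their own dataset.

```python
# Hypothetical set of directly identifying fields; the real list depends on the dataset.
DIRECT_IDENTIFIERS = {"name", "phone", "birthday"}


def drop_identifiers(records, identifiers=DIRECT_IDENTIFIERS):
    """Return copies of the records without the identifying fields."""
    return [
        {key: value for key, value in record.items() if key not in identifiers}
        for record in records
    ]


# Invented sample records.
records = [
    {"name": "Pat Example", "phone": "555-0100", "birthday": "1990-04-12",
     "program": "A", "town": "Hartford"},
    {"name": "Sam Sample", "phone": "555-0101", "birthday": "1985-11-02",
     "program": "B", "town": "New Haven"},
]

deidentified = drop_identifiers(records)
print(deidentified)
```

The function builds new dictionaries rather than editing the originals, so the source records stay intact for the data owner.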
Data owners can also choose to aggregate data. This is accomplished by providing counts of specific fields for a dataset. For example, sensitive fields like birthday and address can be converted to age range and zip code in order to provide the counts of each age group living in a specific area.
When aggregating data, it’s important to ensure that groups aren’t split up so finely that it becomes possible to identify individuals. For example, if you’re aggregating based on school, test scores, grade, and race and ethnicity, the counts in each group must be large enough that no one can identify individual students.
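The aggregation and small-count concern described above can be sketched as follows. The minimum cell size of 5 is a hypothetical threshold, not a Connecticut or P20 WIN policy value, and the records use an `age` field directly rather than deriving it from a birthday.

```python
from collections import Counter

MIN_CELL_SIZE = 5  # hypothetical threshold; use the value your agency's policy sets


def age_range(age):
    """Map an age to a ten-year band such as '30-39'."""
    low = (age // 10) * 10
    return f"{low}-{low + 9}"


def aggregate(records, min_cell_size=MIN_CELL_SIZE):
    """Count records per (zip, age range) and suppress small counts with None."""
    counts = Counter((r["zip"], age_range(r["age"])) for r in records)
    return {
        cell: (n if n >= min_cell_size else None)
        for cell, n in counts.items()
    }


# Invented sample data: six people in one group, two in another.
records = (
    [{"zip": "06103", "age": 34} for _ in range(6)]
    + [{"zip": "06103", "age": 71} for _ in range(2)]
)

table = aggregate(records)
print(table)
```

Groups below the threshold are reported as `None` rather than as a small number, so a reader of the table cannot single out the individuals in them.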
Once the parties have agreed to share data, it’s time to consider the logistics of transferring the data. The method will vary based on the sensitivity of the data.
Data that is open to the public doesn’t require a secure channel for data transfer. Some options that might be suited for file transfers are:
Technology | File size limits | Usage notes |
---|---|---|
Email (ct.gov and po.state.ct.us) | 20MB, 35MB | The limit is 35MB, but it also depends on the recipient’s size limit, which could be 20MB. |
Microsoft Office 365 OneDrive (ct.gov and po.state.ct.us) | 100GB | |
Approved external device | varies | Ask IT department for more information |
Shared network drive | varies | Ask IT department for more information |
All data that isn’t open to the public should be transferred through secure channels. These data include data governed by HIPAA, FERPA, or state laws and data that are confidential, subject to misuse, or simply not authorized for public consumption due to outstanding approval.
Failure to transfer non-public data securely may result in harm to citizens, lawsuits filed against the responsible government office, and severe professional consequences for the offending employee. It’s important to pay careful attention when sharing non-public data. Secure channels include:
Technology | File size limits | Usage notes |
---|---|---|
Government-approved SFTP service | | |
Government-approved encrypted external drive | varies | |
To accelerate secure data transfer, zip and encrypt data files before initiating a data transfer.
To zip a file:
To encrypt a file:
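For agencies that prefer to script the zipping step, the sketch below uses Python’s standard library; the file names are placeholders. It covers zipping only. Python’s `zipfile` module cannot create encrypted archives, so encryption would still need to be done with a tool approved by your IT department.

```python
import tempfile
import zipfile
from pathlib import Path


def zip_file(source: Path, archive: Path) -> Path:
    """Compress a single file into a new zip archive and return the archive path."""
    with zipfile.ZipFile(archive, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        # arcname stores only the file name, not the full local path.
        zf.write(source, arcname=source.name)
    return archive


# Demonstration with a throwaway file in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp) / "extract.csv"  # placeholder file name
    source.write_text("id,value\n1,10\n2,20\n")
    archive = zip_file(source, Path(tmp) / "extract.zip")
    with zipfile.ZipFile(archive) as zf:
        names = zf.namelist()

print(names)
```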
The Department of Administrative Services Bureau of Information Technology Solutions (BITS) offers an Enterprise Secure File Transport (SFT) Service for Executive Branch agencies that need to share sensitive content securely with other agencies or business partners.
More information can be found on the DAS website. The website states that this service offers the following:
Use of our Enterprise Secure File Transport Service has the following conditions:
See Appendix A for a step by step guide to using SFTP.
Data linking or data matching is the process of combining two or more datasets. It allows program administrators to provide more integrated and client-friendly government services.
Data linking also provides policy analysts and researchers a wider lens to draw insights and improve services. There are two ways to link datasets — deterministic and probabilistic.
Deterministic data linking combines individual records only if the fields that are being compared match exactly. For example, two agencies could use social security numbers to combine their datasets. This type of data linking is most suitable when both datasets have a consistent, unique identifier.
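A minimal sketch of deterministic linking follows. It assumes both datasets carry a shared, consistent identifier, here a made-up `person_id` field, and that each identifier appears at most once in the second dataset. The records are invented.

```python
def link_exact(left, right, key):
    """Combine records from two datasets only when the key values match exactly."""
    # Index the right-hand dataset by key (assumes keys are unique there).
    index = {record[key]: record for record in right}
    linked = []
    for record in left:
        match = index.get(record[key])
        if match is not None:
            linked.append({**record, **match})
    return linked


# Invented sample data from two hypothetical agencies.
agency_a = [
    {"person_id": "A1", "program": "Job training"},
    {"person_id": "A2", "program": "Child care"},
]
agency_b = [
    {"person_id": "A1", "wage": 41000},
    {"person_id": "A3", "wage": 38000},
]

linked = link_exact(agency_a, agency_b, "person_id")
print(linked)
```

Only the record with an identifier present in both datasets is combined; records without an exact match are left out.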
Probabilistic data linking combines individual records using a special algorithm that compares multiple fields to determine if two records are the same entity. For example, P20 WIN’s data linking process uses identifiers such as name, birthday, and other fields present in both datasets to combine datasets. Probabilistic data linking is best suited for datasets that don’t have a unique identifier. It’s also best suited when two datasets have a unique identifier that’s inconsistently present or untrusted.
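The idea of comparing multiple fields can be illustrated with a simplified weighted similarity score. This is not P20 WIN’s algorithm: the field weights, the 0.85 threshold, and the use of `difflib` string similarity are all assumptions made for the sketch, and production probabilistic linkage typically estimates its weights from the data.

```python
from difflib import SequenceMatcher

# Hypothetical field weights and decision threshold.
WEIGHTS = {"first_name": 0.3, "last_name": 0.4, "birth_date": 0.3}
THRESHOLD = 0.85


def similarity(a, b):
    """String similarity between 0 and 1, ignoring case."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def match_score(rec_a, rec_b, weights=WEIGHTS):
    """Weighted average of per-field similarities."""
    return sum(w * similarity(rec_a[f], rec_b[f]) for f, w in weights.items())


def is_match(rec_a, rec_b, threshold=THRESHOLD):
    """Treat two records as the same entity when the score reaches the threshold."""
    return match_score(rec_a, rec_b) >= threshold


# Invented records: a and b differ by one letter in the first name.
a = {"first_name": "Katherine", "last_name": "Rivera", "birth_date": "1988-02-14"}
b = {"first_name": "Katharine", "last_name": "Rivera", "birth_date": "1988-02-14"}
c = {"first_name": "Marcus", "last_name": "Lee", "birth_date": "1975-09-30"}

print(is_match(a, b), is_match(a, c))
```

Unlike exact matching, a small spelling difference in one field lowers the score without automatically ruling out the pair.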
When comparing two datasets, checking every single possible pair is computationally taxing. For example, two datasets each containing 100 records would require 10,000 pairwise comparisons. This computational reality quickly becomes unmanageable when linking larger administrative datasets.
Blocking solves this computational challenge by only comparing pairs that are likely to match according to particular fields. For example, by using age for blocking, only records with the same birth year are compared to each other. A common strategy is to run multiple block passes because some fields are missing or erroneous. The Australian Government’s Open Data Toolkit provides a table of fields to consider when blocking.
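The effect of blocking on the number of comparisons can be sketched as below, using the 100-record example above. The records are synthetic, and blocking on birth year is just one possible choice of field.

```python
from collections import defaultdict
from itertools import product


def all_pairs(left, right):
    """Every possible pair across the two datasets (no blocking)."""
    return list(product(left, right))


def blocked_pairs(left, right, block_key):
    """Only pairs whose records share the same blocking value."""
    blocks = defaultdict(list)
    for record in right:
        blocks[block_key(record)].append(record)
    return [(l, r) for l in left for r in blocks.get(block_key(l), [])]


def birth_year(record):
    """Blocking value: the year portion of an ISO-formatted birth date."""
    return record["birth_date"][:4]


# Synthetic datasets of 100 records each, spread evenly over ten birth years.
left = [{"id": i, "birth_date": f"{1980 + i % 10}-01-01"} for i in range(100)]
right = [{"id": i, "birth_date": f"{1980 + i % 10}-06-15"} for i in range(100)]

n_all = len(all_pairs(left, right))
n_blocked = len(blocked_pairs(left, right, birth_year))
print(n_all, n_blocked)
```

With ten evenly sized birth-year blocks, the number of candidate pairs drops from 10,000 to 1,000. Real data is rarely this evenly distributed, and records with a missing or wrong birth year would be missed, which is why multiple blocking passes are common.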
There are two types of matching errors that arise when linking records. The first is a false negative, which occurs when two records fail to meet the set matching criteria but are in fact a match. The second is a false positive, which occurs when two records meet the set matching criteria but are in fact not a match. There is a tradeoff between false negatives and false positives when determining the stringency of any matching criteria. Stricter matching criteria will decrease the number of false positives but increase the number of false negatives, while more lenient matching criteria will increase the number of false positives but decrease the number of false negatives. The context of the data match provides guidance on which type of error to minimize.
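The tradeoff can be made concrete with a small sketch that counts both error types at two thresholds. The scored pairs and their true match status are invented; in practice the true status is rarely known, which is why it is hard to measure these errors directly.

```python
def error_counts(scored_pairs, threshold):
    """Return (false positives, false negatives) for a given score threshold.

    Each item in scored_pairs is (score, truly_a_match).
    """
    false_pos = sum(
        1 for score, truly in scored_pairs if score >= threshold and not truly
    )
    false_neg = sum(
        1 for score, truly in scored_pairs if score < threshold and truly
    )
    return false_pos, false_neg


# Invented match scores with known ground truth.
scored_pairs = [
    (0.95, True), (0.90, True), (0.80, True), (0.70, False),
    (0.65, True), (0.60, False), (0.40, False), (0.20, False),
]

strict = error_counts(scored_pairs, 0.85)   # stricter criteria
lenient = error_counts(scored_pairs, 0.55)  # more lenient criteria
print(strict, lenient)
```

On this toy data the strict threshold produces no false positives but two false negatives, while the lenient threshold reverses the pattern.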
It is very difficult to capture the true number of false negatives and false positives in a data merge. Researchers have implemented creative methods for assessing the accuracy of record linkage algorithms. For example, researchers at the University of Michigan tested a supervised learning, record linkage algorithm by training it on a large, novel dataset that includes biometric identifiers (fingerprints) to construct unbiased measures of error. Other researchers have examined the performance of widely-used record linking algorithms with hand-linked datasets and synthetic datasets. Synthetic datasets have known errors introduced to fields of interest.
P20 WIN’s data linking process uses identifiers such as name, birthday, and other fields present in both datasets to combine datasets. This video provides more details about P20 WIN’s data linking process. Additional detail can be found in the P20 WIN Data Governance Manual key processes.
DAS/BEST offers an Enterprise Secure File Transport (SFT) Service for executive branch agencies, as described in the Transferring Data section. The SFT service can be used to securely transfer data between state agencies or with other business partners. The steps for using the SFT service to transfer data are described below.
Identify the SFT Liaison at your agency. SFTP service is available to the executive branch, constitutional offices, and quasi-public agencies and their business partners. Agencies that have enrolled in the service should have appointed an SFT Liaison, who is responsible for administering the SFT service at each agency. If you are not sure who the SFT Liaison is at your agency, your IT office might be able to help find them.
Ask your SFT Liaison to set you up with SFTP access. They should be able to grant you access and tell you how to log on. Your username and password should be the same as the ones you use to log into Windows, but you should confirm this with your SFT Liaison.
Log into the SFTP site. Once your account has been set up, go to https://sft.ct.gov and log in. Be aware that the username and password are case sensitive.
Work with your SFT Liaison to set up folders to organize the files you wish to transfer. The SFT Liaison can set up folders for specific data transfers, and then grant access to the desired folders to the recipient of your data transfer.
Make sure the person you want to share data with also has SFTP access. If the person with whom you want to share data is at another Connecticut state agency, their SFT Liaison must grant them access to the SFTP site. Once they have access, your SFT Liaison should be able to give them access to the folder with the files you wish to share. Your SFT Liaison should also be able to help you share files with a partner outside of state government.
Ask your SFT Liaison to give the data recipient access to the folder on the SFTP site where your data is saved. Once you both have access to the shared folder, you should be able to use that folder to transfer files.
Upload the data that you want to share into the designated folder. Click the “Upload” button and select the file that you want to share. The person you want to share data with should also now have access to the file via your shared folder.
If you have difficulties using the SFTP site, ask your SFT Liaison for help, or call DAS/BEST at 860-622-2300 and select Option 9.
This page highlights resources from different organizations about data sharing and related topics.