Introduction

The Data Sharing Playbook is a resource for those navigating the data sharing process in Connecticut state government. The playbook presents strategies for enabling data sharing, making data requests, responding to requests, and transferring and linking data.

Data sharing in Connecticut

Sharing and integrating data from multiple state agencies can improve program administration, policy analysis, and performance management. Leveraging data from multiple systems can enable a whole-person perspective on data and enhance the ability to use data to inform decision-making.

The sharing of data across Connecticut state agencies occurs through various data sharing agreements and other frameworks, including the state’s longitudinal data sharing system, P20 WIN.

The Office of Policy and Management (OPM) and its Chief Data Officer are taking steps to make data sharing a more efficient, uniform, and secure process. The 2021 Legal Issues in Interagency Data Sharing Report summarizes recent progress, including building on the success of P20 WIN to expand its data sharing partners and drafting flexible and durable data sharing agreements between agencies and with outside entities.

The data sharing playbook is designed to support Connecticut state agencies in sharing data safely, securely, and ethically. This playbook will continue to be updated by the Data and Policy Analytics unit within OPM. More information about data sharing within P20 WIN and outside of it is summarized below.

Data sharing through P20 WIN

P20 WIN is Connecticut’s statewide longitudinal data system and is the mechanism by which data from multiple agencies are matched to address critical policy questions. P20 WIN informs sound policies and practice through the secure sharing of longitudinal data across the participating agencies to ensure that individuals successfully navigate supportive services and educational pathways into the workforce.

P20 WIN has a membership of ten state agencies, institutions of higher education, and nonprofits, including:

  • State Department of Education (SDE)
  • Connecticut State Colleges and Universities (CSCU)
  • University of Connecticut (UConn)
  • Department of Labor (DOL)
  • Connecticut Conference of Independent Colleges (CCIC)
  • Office of Early Childhood (OEC)
  • Office of Higher Education (OHE)
  • Department of Social Services (DSS)
  • Department of Children and Families (DCF)
  • Connecticut Coalition to End Homelessness (CCEH)

All participating agencies sign the Enterprise Memorandum of Understanding (E-MOU), which establishes guidelines for data sharing, governance, and security within the P20 WIN data governance framework. Operating support to P20 WIN is provided by the Office of Policy and Management.

Requests for data from two or more agencies participating in P20 WIN should follow the P20 WIN data request process, summarized on the P20 WIN website. Data requests are prioritized if they align with either a participating agency’s individual research agenda or the P20 WIN Learning Agenda.

Other data sharing arrangements

Data from state agencies is also shared outside the voluntary P20 WIN governance structure. A summary of data sharing efforts in executive branch agencies is included in the Legal Issues in Interagency Data Sharing Report.

The process for sharing data varies by agency. The strategies outlined in the Preparing a Successful Data Request section provide guidance for navigating the varying data request processes across Connecticut state government.

How to use this playbook

To learn how to request data from one or more state agencies, refer to these strategies:

If you represent an agency that receives data requests, the sections below describe best practices for enabling data sharing:

To learn about transferring data or linking data from different sources, refer to these strategies:

About the data sharing playbook

The data sharing playbook was developed to support Connecticut state agencies in sharing data safely, securely, and ethically. The playbook presents strategies both for those requesting data from state agencies and for the agencies receiving data requests. The strategies presented in this playbook are based on best practices from other states, specific examples and methods from Connecticut state agencies, and the recommendations found in OPM’s report Legal Issues in Interagency Data Sharing. The strategies are focused on interagency data sharing, although many practices will also benefit sharing data with the public.

Note that the playbook does not represent official IT or technology policy for Connecticut.

Methodology

The playbook was first developed in 2019 by a team from the Connecticut Office of Policy and Management (OPM), the Office of Early Childhood (OEC), and Skylight Digital, a digital consultancy for government, and is being maintained and updated by OPM to reflect changes in the data sharing landscape in Connecticut. The research involved:

  • User interviews with 13 data sharing practitioners across Connecticut state agencies
  • Research into best practices from data sharing experts
  • Research into case studies from other states
  • A detailed technology tool review to document the secure channels for data transfer available to Connecticut state agencies

Preparing a Successful Data Request

The steps below are best practices for making data requests. Data owners are frequently overburdened with daily operations. You can make it as easy as possible for them to fulfill your request by planning carefully and addressing all of the relevant questions up front.

Design questions that can be answered with data.

Before you can make an effective data request, you need to know exactly what you are looking for and why. Below are some questions to answer before moving forward.

“What is the overarching goal or objective that I want to accomplish?”

Knowing what you want before turning to data is a prerequisite to effective data use.

“What do I want to find out or measure?”

Thinking about what questions you want to answer will help you identify the most relevant metrics. Some examples of research questions for your data could include:

  • To what extent is the service provided by [Program X] reaching constituents?
  • Do budget allocations meet needs? Are budget allocations used by the populations in need?
  • How has enrollment in [Program X] changed over the last five years?
  • How did [Event X] impact residents of Connecticut?

“How much data do I actually need?”

In the age of “big data” it may be tempting to collect as much information as possible and sort through it later, but this approach is counterproductive. Looking at too many metrics may overwhelm the analysis or distract you from your research question, and requesting more data than you need from a state agency may make responding to your request more time-consuming. Some methods of reducing the amount of data you have to sift through include:

  • Consider the time frame. Looking at smaller windows of time can provide clues about how better to dissect the bigger picture without having to sort through as much information.
  • Identify the fields that you need for your analysis. By having a clear understanding of what you want to measure, you can limit the number of fields in your data request.
  • Build some redundancy into the data you request. Once you’ve identified the information you need, consider how you will work around a record that’s missing data in a field. Are there other fields you could use as a proxy? Add these fields to your request.
  • Use derived statistics whenever possible. If you know what question you’d like to answer, you should be able to define specific stats that provide an answer. Using a limited selection of fields, you can compute averages, differences, percentages, and other derived statistics that construct these measurements out of less data while still answering your question.
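
As a rough illustration of the last point, the Python sketch below derives year-over-year change and percent change from just two fields. The field names (year, enrolled) and the counts are hypothetical and chosen only to show the idea; a real request would use the fields defined by the data owner.

```python
# Hypothetical yearly enrollment counts; the values are invented for illustration.
enrollment = [
    {"year": 2019, "enrolled": 1200},
    {"year": 2020, "enrolled": 1100},
    {"year": 2021, "enrolled": 1320},
]


def year_over_year(rows):
    """Return the change and percent change between consecutive years."""
    rows = sorted(rows, key=lambda r: r["year"])
    changes = []
    for prev, curr in zip(rows, rows[1:]):
        diff = curr["enrolled"] - prev["enrolled"]
        # Percent change is undefined when the earlier count is zero.
        pct = round(100 * diff / prev["enrolled"], 1) if prev["enrolled"] else None
        changes.append({"year": curr["year"], "change": diff, "percent_change": pct})
    return changes


changes = year_over_year(enrollment)
for row in changes:
    print(row)
```

Two fields are enough here to answer a question such as “How has enrollment in [Program X] changed over the last five years?” without requesting record-level detail.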

Identify the data.

Before you can request data, you will need to identify the agency and program that are likely to have the data you are looking for. This data may be found in Connecticut’s High Value Data Inventory, which lists the high value data maintained by each executive branch agency. The high value data inventory should provide the name of the data owner whom you can contact about the data you are interested in.

If you can’t find the data you are looking for in the high value data inventory, you can reach out to the Agency Data Officer at the agency that maintains the data you need. Agency data officers are the point people for data requests at each executive branch agency and can help direct your request.

Outside of the governance structure of P20 WIN, each agency has its own process for managing data requests. The agency data officer should be able to provide more information about how data requests are handled at their agency and what next steps you should follow for requesting the data.

To request data through P20 WIN, review the Data Dictionary to identify the data that you will need for your analysis.

Explain your objective and the details of your plan clearly.

Data owners are accountable for the proper use and security of their data. In order to evaluate the risks and benefits of sharing data, they will need an explanation of how you plan to protect and use it. The data owner’s agency will have its own privacy and security processes, and your practices will need to comply with them. When requesting data, you should be prepared to explain:

  • What your objective is (i.e. the question you would like to answer)
  • How a partnership with the partner agency can help you answer that question, and how that question is a shared inquiry for both agencies
  • What data you need to study your objective (including as much specific discussion of the fields of interest as possible)
  • How you will account for potential sources of implicit bias in the data you are requesting. See Review data for implicit biases for more information.
  • Who will have access to the data if you receive it
  • How you will ensure that the data is handled safely, ethically, and securely
  • The timeline with which you hope to answer the question and analyze the data
  • How communication between your agency and your partner agency will take place (should there be weekly check-ins about the data usage? Reports filed about data activity? etc.)

Specify the parameters of the request.

The data owner needs to understand exactly how to fulfill the request. Some useful parameters or filters to consider include:

  • The date range
  • Specific fields or columns you are requesting
  • Specific datasets or databases
  • Filters such as age, people tied to a specific program, or geography
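
One way to keep these parameters unambiguous is to write them down in a structured form before sending the request. The Python sketch below is only an illustration: the DataRequest class, its attribute names, and the example values are all hypothetical and do not reflect any agency’s actual request format.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class DataRequest:
    """Hypothetical container for the parameters of one data request."""
    dataset: str
    start_date: date
    end_date: date
    fields: list
    filters: dict = field(default_factory=dict)

    def problems(self):
        """List gaps a data owner would likely ask about before starting work."""
        issues = []
        if self.start_date > self.end_date:
            issues.append("date range ends before it starts")
        if not self.fields:
            issues.append("no fields or columns specified")
        return issues


# Example values are invented for illustration.
request = DataRequest(
    dataset="program_x_enrollment",
    start_date=date(2019, 1, 1),
    end_date=date(2023, 12, 31),
    fields=["enrollment_date", "town", "age_group"],
    filters={"age_group": "18-24", "town": "Hartford"},
)
print(request.problems())  # an empty list means nothing obvious is missing
```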

Keep the scope and timeframe realistic.

Ensure that the data owner can fulfill the data request in a reasonable timeframe and with their available resources. If a data request is too taxing on the data owner, they may reject the request until the parameters change or the requester offers additional resources. Consider the time required for crafting and signing a data sharing agreement.

Requesting data through P20 WIN

P20 WIN is the state longitudinal data system and is used to answer policy questions, fulfill federal and state reporting requirements, support program review and evaluation, and support research and analysis on a variety of topics. The P20 WIN system has its own data request process for requesting data from two or more participating agencies.

Participating Agencies that are part of P20 WIN include:

  • State Department of Education (SDE)
  • Connecticut State Colleges and Universities (CSCU)
  • University of Connecticut (UConn)
  • Department of Labor (DOL)
  • Connecticut Conference of Independent Colleges (CCIC)
  • Office of Early Childhood (OEC)
  • Office of Policy and Management (OPM)
  • Office of Higher Education (OHE)
  • Department of Social Services (DSS)
  • Department of Children and Families (DCF)
  • Connecticut Coalition to End Homelessness (CCEH)

Requests for data through P20 WIN should include data from two or more participating agencies and should align with either a participating agency’s individual issue priorities or the P20 WIN Learning Agenda.

To submit a data request through P20 WIN, the requestor should follow the steps summarized below.

  1. Submit a written proposal that outlines the request and data needed to the P20 WIN staff at OPM.
  2. Once the P20 WIN Outreach and Engagement Coordinator reviews the data request proposal and determines viability, the data requestor will receive a Data Sharing Request Form to complete, which will then be reviewed by P20 WIN staff for completeness and alignment with the P20 WIN Learning Agenda.

Once a data request has been submitted:

  1. Data Governing Board members whose agencies’ data are included in the data request review the request and determine whether the requested data may be shared. The request is then approved, rejected, or recommended to be modified.
  2. The Outreach and Engagement Coordinator communicates the decision from the Data Governing Board to the data requestor.
  3. If the request is approved, the Outreach and Engagement Coordinator drafts the data sharing agreement, circulates it for signatures, and collects confidentiality and related documents according to the data request.
  4. P20 WIN staff and data stewards coordinate the production and secure transfer of data files for the data matching process to the data requestor.
  5. The data requestor conducts their analysis using the data files provided.

More information about the P20 WIN data request process can be found on the P20 WIN website and, with additional detail, in the P20 WIN Data Governance Manual.

Enabling Data Sharing

The steps below are best practices for agencies to develop an efficient data sharing process.

Identifying who plays each data-related role allows organizations to establish who has the responsibility of fielding external inquiries, designing sharing procedures, and executing requests. The first step in setting up a strong data governance model and maintaining institutional knowledge of the data sharing process is to establish and communicate these roles.

Although the roles below are described separately, the same person may exercise more than one role and may have a separate job title and function.

Agency data officer

Agency data officers serve as the main contact person for inquiries, requests, or concerns regarding access to the data of an agency. The agency data officer, in consultation with the Chief Data Officer and the agency head, establishes procedures to ensure that the agency complies with requests for data in an appropriate and prompt manner.

Section 4-67p of the Connecticut General Statutes defines the role and responsibilities of agency data officers, and a list of agency data officers is published on the Connecticut Open Data Portal.

Data owner

The data owner is accountable for the quality and security of the data and holds the decision-making authority about data within their domain. The data owner varies by database, and there may be multiple data owners.

Data steward

The data steward is responsible for the governance of data and ensures the fitness of content and metadata. Stewards apply established processes, policies, guidance, and rules in this effort. They are usually the subject matter experts and data analysts who work with the data on a daily basis.

Legal counsel

Legal counsel is a person or team that can evaluate data access and use and help craft appropriate legal agreements when needed.

Privacy and compliance officer

This person or team develops and implements policies and procedures to protect individual rights and comply with federal and state law. The privacy and compliance officer also investigates any data incidents and breaches.

Create and publish a data dictionary.

A publicly available data dictionary helps requesters understand what data your agency collects and maintains. It can also help them craft requests that reference specific tables and fields, making the request easier to fulfill. The data dictionary should:

  • Describe all of the datasets for which your agency is responsible
  • Contain information on how each of the datasets was collected
  • Define the individual fields in each of the datasets
  • Indicate levels of access for each of the datasets, including which data is already open, which data is restricted, and which data should not be used or is not available
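
To make the list above concrete, the Python sketch below shows what a single machine-readable data dictionary entry might look like, with a small check for missing pieces. The key names, access labels, and example dataset are assumptions made for illustration; they are not the P20 WIN Data Dictionary format or any agency’s standard.

```python
# Access labels are an assumption for this sketch, mirroring the levels named above.
ACCESS_LEVELS = {"open", "restricted", "unavailable"}
REQUIRED_KEYS = {"dataset", "description", "collection_method", "access_level", "fields"}

# Hypothetical entry; the dataset and field names are invented.
entry = {
    "dataset": "program_x_enrollment",
    "description": "One row per person enrolled in Program X.",
    "collection_method": "Entered by caseworkers at intake.",
    "access_level": "restricted",
    "fields": [
        {"name": "enrollment_date", "type": "date", "definition": "Date the person enrolled."},
        {"name": "town", "type": "text", "definition": "Town of residence at enrollment."},
    ],
}


def check_entry(entry):
    """Return a list of problems found in one data dictionary entry."""
    problems = [f"missing key: {key}" for key in sorted(REQUIRED_KEYS - set(entry))]
    if entry.get("access_level") not in ACCESS_LEVELS:
        problems.append("access_level must be one of: " + ", ".join(sorted(ACCESS_LEVELS)))
    for item in entry.get("fields", []):
        if not item.get("definition"):
            problems.append(f"field {item.get('name')} has no definition")
    return problems


print(check_entry(entry))  # an empty list means the entry covers every item
```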

P20 WIN Data Dictionary and metadata policies

P20 WIN has a data dictionary containing the elements available for request. Participating agencies provide updates each year in accordance with the P20 WIN metadata policies and processes.

Document metadata.

Metadata is a set of information that describes a dataset and its fields. It provides data about your data. It includes information such as when and how the data was gathered or any other information that might describe an aspect of the data. It is important to keep detailed notes on the metadata and the process by which the data was collected because this information can facilitate easier and more effective use later on.

One common misconception about metadata is that it is solely the definitions of the various fields in a dataset. However, metadata includes much more than these surface-level characteristics. Anything that gives additional information about the nature, structure, or gathering process of the dataset counts as metadata. Some examples of metadata for different types of media include:

  • Photographs / images: date and time the photo was taken, who took the photo, location where the image was captured, and camera settings used to take the photo
  • Books / reports / documents: title, author, publishing information, year of publication, table of contents, index, date of last update / modification, and number of pages
  • Emails / communication records: person sending the communication, person receiving the communication, message text, date and time of correspondence, subject line, IP addresses of sender and responder, and encryption details
  • Spreadsheets / databases: names of column fields, explanation of fields, number of users / respondents surveyed, number of missing data entries, integrity constraints, data types included in the table, and date and time the information was collected (including multiple records if gathered over a period of time)

When tracking metadata, it’s important to:

  • Document as much information as you can about the higher-level aspects of a dataset: its source, update frequency, timestamps of collection, expected level of detail, explanations of tags, data quality, etc.
  • Be consistent about the language you use to describe metadata
  • Avoid acronyms and language that might be specific to you or your agency, since metadata can help recipients of data sharing understand what a dataset is all about
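
The last point can be partly checked automatically. The Python sketch below flags all-caps tokens in a metadata note that do not appear in a glossary. It is a simple heuristic under stated assumptions: the function name, the glossary, and the example note (including the made-up acronym “ABC”) are hypothetical, and a pattern match like this will miss agency-specific jargon that is not written in capitals.

```python
import re


def undefined_acronyms(text, glossary):
    """Return all-caps tokens in text that the glossary does not define."""
    candidates = re.findall(r"\b[A-Z]{2,}\b", text)
    flagged = []
    for token in candidates:
        if token not in glossary and token not in flagged:
            flagged.append(token)
    return flagged


glossary = {"SDE": "State Department of Education"}
note = "Counts come from the ABC intake form and are reported to SDE each year."
print(undefined_acronyms(note, glossary))  # prints ['ABC']
```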

Update Connecticut’s High Value Data Inventory.

Connecticut’s High Value Data Inventory is a data catalog that highlights general information about high-value datasets possessed by state agencies. C.G.S. § 4-67p requires that the high value data inventories be maintained annually. At the end of each year, OPM reaches out to agency data officers to request updates by December 31 of that year. By keeping your agency’s datasets up to date in the catalog, you help other agencies and the public understand what data your agency owns and whom to contact for more information.

To update the inventory, email both Scott Gaul and Pauline Zaldonis with the subject line “CT High Value Data Inventory Change Request.”

Review data for implicit biases.

As organizations become more data-driven, data experts are discovering more instances in which unaccounted biases in data perpetuate racism, sexism, and other forms of discrimination.

The data that government agencies, academic researchers, and other organizations collect most likely contain implicit biases. These biases can be introduced due to:

  • Whose data is collected — Does a dataset contain a representative sample of people across different demographics and backgrounds (i.e. multiple races, ethnicities, geographic locations, ages, genders, etc.)?
  • Whose data isn’t collected — Does the data leave out a specific demographic group that might not frequent the service where the data is collected?
  • How the data is collected — For example, is the data collected via interview in one area and via a form somewhere else?
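
One simple first check for the first two questions is to compare each group’s share of the dataset with its share of a reference population. The Python sketch below is illustrative only: the group labels, counts, reference shares, and the 5-percentage-point tolerance are all hypothetical, and a comparison like this can surface gaps in who is represented but cannot detect every kind of bias.

```python
def representation_gaps(sample_counts, population_shares, tolerance=0.05):
    """Flag groups whose share of the sample differs from the reference share.

    Returns {group: sample_share - population_share} for gaps beyond tolerance.
    """
    total = sum(sample_counts.values())
    if total == 0:
        raise ValueError("sample is empty")
    gaps = {}
    for group, pop_share in population_shares.items():
        sample_share = sample_counts.get(group, 0) / total
        gap = sample_share - pop_share
        if abs(gap) > tolerance:
            gaps[group] = round(gap, 3)
    return gaps


# Invented numbers for illustration.
sample_counts = {"urban": 700, "suburban": 280, "rural": 20}
population_shares = {"urban": 0.5, "suburban": 0.3, "rural": 0.2}
gaps = representation_gaps(sample_counts, population_shares)
print(gaps)
```

A positive value means the group is over-represented in the dataset relative to the reference population; a negative value means it is under-represented.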

Consider possible sources of bias in your agency’s data carefully. If you do not identify possible bias, communicate it to data requesters, and work to reduce it, the decisions made based on your data may have serious unintended societal implications.

Work to eliminate possible sources of bias.

Data analysts are ultimately responsible for how they use your agency’s data; however, as the data owners and experts, you can help data analysts avoid biases in data that perpetuate racism, sexism, and other forms of discrimination.

First, be open about the limitations of the agency’s data to reduce the likelihood that it will be used in ways that have unintended consequences. Second, work towards systemic changes to data collection practices. Finally, require data requesters to demonstrate responsible use of your agency’s data.

Develop a data request process.

A clearly documented data request process can facilitate successful requests. This section covers some of the supporting documents to develop as part of a comprehensive data request process.

Remember that the data request process must abide by the regulations and laws that apply to each dataset. For more detailed information, refer to Establish a privacy policy and the report on Legal Issues in Interagency Data Sharing, including the appendices reviewing state and federal laws and regulations.

Request form

Ensure that the data requester answers the questions below in order to evaluate the benefits and mitigate the risks of sharing data.

  • What is the requester’s contact information and organization?
  • What is the purpose of the request?
  • How does the requester plan to use the data?
  • Who will have access to the data?
  • What are the specific data they are requesting, and what are the specific parameters, such as individual or aggregate data and over what time period?
  • How will the data be used? What methods will be used in the analysis of the data?
  • How will results be reported? With whom will they be shared? How will they be disseminated?
  • How frequently will this data be needed? For example, is this a one-time need or a recurring need?
  • How long is the requester seeking to keep the data? When and how will the data be destroyed? How is this reported or disseminated to the data owner?

Flow diagram or detailed narrative of the steps

It’s important to have a way to illustrate or describe the data sharing process from start to finish. Common approaches include using a flow diagram or descriptions for each step.

Data dictionary

A data dictionary describes the agency’s data. (See Create and publish a data dictionary.)

Data request fees

A request fee schedule communicates the cost of requesting data. Each agency may have unique procedures for enacting request fees. Consult your agency’s legal counsel for specific guidance on fee schedules.

Enabling data sharing as a member of P20 WIN

Connecticut state agencies can better leverage data for decision-making through P20 WIN’s data governance framework. P20 WIN uses an enterprise framework to facilitate data sharing across participating agencies. By participating in the data governance structure of P20 WIN, state agencies can enable the secure sharing of data to address critical policy questions in the state.

If your agency is not yet a participating agency in P20 WIN, contact Katie Breslin, Outreach and Engagement Coordinator, to learn about how your agency can join P20 WIN.

Responding to Data Requests

Having an established protocol for responding to a request for data will save you time and effort (see Develop a data request process). The following suggestions will help you respond smoothly to different types of requests.

Ask key questions up front.

Establish a process for how key questions are asked, answered, and documented. We’ve included a detailed list of questions to ask in the section on developing a request process, including:

  • What is the purpose of the request?
  • How does the requester plan to use the data?
  • Who will have access to the data?
  • What is the specific data they are requesting and what are the specific parameters?

If the requester is planning to combine data with another dataset, this will require careful review and consideration from both teams. This could be a complex process, and we’ve included some discussion of data linking in the Linking Datasets section.

Legal counsel should advise you on the specific type of legal agreement needed to share data. However, the information below can help frame productive conversations with your data-sharing partners.

The type of agreement you will need depends on factors like:

  • Whether the data contains personally identifiable information (PII)
  • The sensitivity of the data requested
  • The type of organization requesting the data
  • How the data will be used
  • The scope and duration of the request

There are multiple types of data sharing mechanisms available to state agencies. Each of them is governed by unique requirements and legal considerations.

But first: Do you even need an agreement?

Sharing data that is open to the public does not require an agreement. If the requesting party doesn’t need to identify specific individuals, it may be preferable to release the data to the public by publishing it on the Connecticut Open Data Portal (data.ct.gov). To publish data on the open data portal, refer to the publication guidelines developed by OPM.

Common types of agreements

This section briefly describes the following common types of agreements and when to use them:

  • Memorandum of Understanding (MOU)
  • Data Use Agreement (DUA)
  • Enterprise Memorandum of Understanding (E-MOU)
  • Data Sharing Agreement (DSA)
  • Business Associate Agreement (BAA)
  • Statement of Work (SOW)
  • Non-Disclosure Agreement (NDA)

While each of these agreements has a specific function and a context in which it is appropriate, an agency looking to share data cannot necessarily pick a single agreement and move forward. These agreements often work together to define the full terms of a data sharing arrangement (for example, the E-MOU, DSA, and DUA tend to work together rather than operating alone).

Memorandum of Understanding (MOU)

MOUs are best suited for ongoing data transfers that have consistent and formalized parameters. An MOU:

  • Identifies the roles and responsibilities of the involved groups
  • Describes why an agreement is required
  • Specifies the terms and conditions for the partnership

MOUs are especially important when the basis for a data sharing relationship is grant funding or a service contract. The process of establishing an “MOU enables potential partners to identify similarities and differences in their priorities and goals, available resources (time, money, and expertise), project timelines, and expected outcomes prior to collaboration.”1

Data Use Agreement (DUA) or Data Use Licenses (DUL)

Data Use Agreements (DUAs) or Data Use Licenses (DULs) are best suited for individual data sharing transactions. DUAs precisely specify the parameters for the data transfer, who will have access to the data, the intended use of the data, and how the requester should destroy data.

They may also “include specific time parameters for data use or provide special provisions for data disclosure or requirements for the data holding agency to review resulting research before its publication.”2

Enterprise Memorandum of Understanding (E-MOU)

An E-MOU is a long-term agreement signed by multiple parties in order to facilitate multiple and diverse data sharing requests. E-MOUs usually:

  • Describe involved parties
  • Set up governance boards
  • Define codified request procedures
  • Highlight the rights and responsibilities of data stewards and requesters

E-MOUs are mostly used to facilitate government agency to government agency data sharing and have been implemented in multiple states.3

Data Sharing Agreement (DSA)

Data Sharing Agreements are best suited for establishing long-term data sharing relationships that involve multiple transfers with different parameters. Data Sharing Agreements identify the involved parties and the terms and conditions for the partnership. They can stand independently or be an addendum to an MOU or E-MOU.

Since it defines an ongoing relationship for multiple transfers, a DSA may also define a process for authorizing data requests along with requirements for storing, protecting, and disposing of shared data.

Business Associate Agreement (BAA)

A Business Associate Agreement is a written arrangement that specifies each party’s responsibilities when it comes to protected health information (PHI). HIPAA requires covered entities to work only with business associates who assure complete protection of PHI.

Statement of Work (SOW)

The statement of work is a detailed overview of the project in all its dimensions. It’s also a way to share what the project entails with those who are working on the project, whether they are collaborating or contracted to work on the project. This includes vendors and contractors who are bidding to work on the project.

Non-Disclosure Agreement (NDA)

A non-disclosure agreement is a binding contract between two or more parties that prevents sensitive information from being shared with any others.

P20 WIN data sharing agreements

The P20 WIN Enterprise Memorandum of Understanding (Enterprise MOU) outlines the guidelines for data sharing, governance structure, and confidentiality and security requirements for P20 WIN. The Enterprise MOU is signed by all agencies participating in P20 WIN.

Data Sharing Agreements are signed by Participating Agencies, the Data Integration Hub, and Data Recipients for specific data requests. This agreement outlines the responsibilities of all parties, data users, cell suppression policies, fees, and Exhibits. Exhibits include:

  • Exhibit A - Data Sharing Request Form and requested variables;
  • Exhibit B - User Acknowledgement Form;
  • Exhibit C - Confidentiality and Non-Disclosure Agreement;
  • Exhibit D - IRB approval, if needed and appropriate;
  • Exhibit F - Data Destruction Certificate; and the
  • Enterprise MOU

Footnotes

Safeguarding Data

The steps below are best practices for protecting the security of data maintained by your agency.

Develop privacy and security compliance policies, standards, and controls.

Policies are high-level statements about how data should be handled, similar to a vision statement. Standards outline the rules that govern putting policies into action, and controls provide specific instructions about how to implement a standard.

In order to facilitate secure and compliant data sharing:

  • Data requesters must understand the privacy and security compliance standards of the data they are requesting
  • Data owners must ensure that they clearly define the privacy and security compliance standards that govern the data they own

Establish a privacy policy.

A privacy policy is an externally-facing document for the people from whom you might collect data. It explains how your agency uses personal information that may be collected when the public interacts with the agency. The privacy policy should include the types of information gathered, how the information is used, to whom the information is disclosed, and how the information is safeguarded.

Here are some of the questions to ask when you document a privacy policy:

  • Why do we collect personal information?
  • What information do we collect? (Review the data dictionary.)
  • When and how do we disclose/share information?
  • How do we protect personal information, including the administrative, technical, and physical strategies?
  • How do we protect the confidentiality, integrity, and availability of confidential information that is created, received, maintained, or transmitted?

Document critical data elements.

Confidential Information (CI) is any non-public information pertaining to the agency’s business. Personally identifiable information (PII) is any data that can be used to identify an individual. Examples of PII include a user’s name, address, phone number, and social security number.

Data owners should also document subsets of PII, such as:

  • Payment Card Industry (PCI) data — credit card information
  • Protected Health Information (PHI) — information about an individual’s health
  • Education records — data maintained by a school about students that includes information like test scores, special education records, courses taken, and attendance

Understand the laws that govern critical data elements.

State agencies need to understand the laws that govern each dataset based on its CI and PII. Understanding the standards and laws that govern data is critical in order to know:

  • How data should be stored
  • How data can be used
  • What data can be shared (e.g., individual rows or aggregate totals)
  • How data are transferred
  • How data are disposed of

For more information about applicable federal and state laws, refer to the Legal Issues in Interagency Data Sharing report and accompanying appendices.

Define acceptable use standards.

Define acceptable use standards based on the laws and regulations that govern the use of your agency’s data. These standards will help define the specific requirements in data sharing agreements for keeping data secure. For example, for sensitive data, the data owner may require that the requesting party dispose of the data after a specific amount of time.

Develop, implement, and maintain a comprehensive data-security program.

Your agency will need legal assistance creating a comprehensive data-security program that adequately protects CI. The program will need to be consistent with and comply with all applicable federal and state laws and written policies related to protecting CI.

The data-security program should cover considerations like:

  • A security policy for employees related to the storage, access, and transportation of data containing confidential information
  • Reasonable restrictions on access to records containing confidential information, including access to any locked storage where such records are kept
  • A process for reviewing policies and security measures at least annually
  • The creation of secure access controls to confidential information, including but not limited to passwords
  • Encryption of confidential information that is stored on laptops or portable devices or that is being transmitted electronically

Enforce compliance controls.

A control is a safeguard to avoid, detect, or minimize security risks that might compromise the confidentiality, integrity, and availability of data. For example, a data owner might require a quarterly review of all users with access to a database or require that people working with the data undergo compliance training.

Transferring Data

This section will give an overview of the steps involved in transferring data as part of a data request.

De-identify data as needed.

Depending on the data request, the data owner may need to de-identify data in order to protect the privacy and rights of the individuals represented in the data. There are a number of ways to de-identify data, and these are summarized below.

Removing PII and confidential data

One way to de-identify data is to remove all of the fields that could be used to identify a specific individual from the data. Examples include names, phone numbers, and birthdays. (For more information about confidential data, see the section Document critical data elements.)
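As a minimal sketch of this approach, the Python example below drops a list of identifying fields from each record. The field names (name, phone_number, birthday, program) are hypothetical; the actual list should come from your agency's documentation of critical data elements.

```python
# Hypothetical field names; replace with the PII fields in your data dictionary.
PII_FIELDS = {"name", "phone_number", "birthday"}


def remove_pii(records, pii_fields=PII_FIELDS):
    """Return copies of the records with the listed PII fields dropped."""
    return [
        {field: value for field, value in record.items() if field not in pii_fields}
        for record in records
    ]


# Made-up sample records for illustration only.
records = [
    {"name": "Ana Diaz", "phone_number": "555-0100", "birthday": "2001-04-09", "program": "A"},
    {"name": "Sam Lee", "phone_number": "555-0101", "birthday": "1999-11-30", "program": "B"},
]
deidentified = remove_pii(records)  # the original records are left unchanged
```

The function builds new records rather than editing the originals, so the source data stays intact for the data owner.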

Aggregating data

Data owners can also choose to aggregate data. This is accomplished by providing counts of specific fields for a dataset. For example, sensitive fields like birthday and address can be converted to age range and zip code in order to provide the counts of each age group living in a specific area.

When aggregating data, it’s important to ensure that groups aren’t split so finely that individuals can still be identified. For example, if you’re aggregating based on school, test scores, grade, and race and ethnicity, the counts in each group must not be so small that someone could identify individual students.
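The sketch below illustrates one way to aggregate records into counts by zip code and age range while suppressing small groups. The minimum cell size of 5, the ten-year age bands, and the field names are assumptions for illustration; the actual suppression threshold should come from the applicable data sharing agreement.

```python
from collections import Counter

MIN_CELL_SIZE = 5  # assumed threshold; use the value required by your agreement


def age_range(age):
    """Map an age in years to a ten-year band such as '20-29'."""
    low = (age // 10) * 10
    return f"{low}-{low + 9}"


def aggregate_counts(records, min_cell_size=MIN_CELL_SIZE):
    """Count records per (zip code, age range) and suppress small groups."""
    counts = Counter((r["zip_code"], age_range(r["age"])) for r in records)
    return {
        group: (count if count >= min_cell_size else None)  # None = suppressed
        for group, count in counts.items()
    }


# Made-up sample data: six people in their 20s and two in their 40s.
records = (
    [{"zip_code": "06106", "age": 23}] * 6
    + [{"zip_code": "06106", "age": 41}] * 2
)
summary = aggregate_counts(records)
```

In this example the group with six people is reported, while the group with two people is suppressed because it falls below the assumed threshold.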

Choose the right method for transferring data.

Once the parties have agreed to share data, it’s time to consider the logistics of transferring the data. The method will vary based on the sensitivity of the data.

Open and public data

Data that is open to the public doesn’t require a secure channel for data transfer. Some options that might be suited for file transfers are:

Technology | File size limits | Usage notes
Email (ct.gov and po.state.ct.us) | 20MB, 35MB | The limit is 35MB, but it also depends on the recipient’s size limit, which could be 20MB.
Microsoft Office 365 OneDrive (ct.gov and po.state.ct.us) | 100GB |
Approved external device | varies | Ask IT department for more information
Shared network drive | varies | Ask IT department for more information

Non-public data

All data that isn’t open to the public should be transferred through secure channels. These data include data governed by HIPAA, FERPA, or state laws and data that are confidential, subject to misuse, or simply not authorized for public consumption due to outstanding approval.

Failure to transfer non-public data securely may result in harm to citizens, lawsuits filed against the responsible government office, and severe professional consequences for the offending employee. It’s important to pay careful attention when sharing non-public data. Secure channels include:

Technology: Government-approved SFTP service
File size limits:
  • 1.5GB - using the web client
  • Unlimited - using an approved client
Usage notes:
  • The IT department/Delegated Admin will be needed to set up an SFTP connection and provide instructions on how to upload files.
  • Files will be removed from the system after 60 days of inactivity.
  • The IT department/Delegated Admin must be informed if a user is planning on uploading a file larger than 5GB.
  • FileZilla and WinSCP are both approved SFTP clients for uploading files larger than 1.5GB.

Technology: Government-approved Encrypted External Drive
File size limits: varies
Usage notes:
  • Doesn’t include personal encrypted drives, or even all government-approved flash drives. An encrypted external drive must be approved by the IT department and be password protected to prevent misuse of data if a non-authorized person accesses the drive.
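As a rough illustration of how the SFTP size limits listed above could be checked before an upload, the sketch below returns guidance for a given file size. The function name is hypothetical, and the code assumes decimal gigabytes (1GB = 1,000,000,000 bytes); confirm the exact limits with your IT department.

```python
GB = 10**9  # assumes decimal gigabytes; confirm the unit with your IT department
WEB_CLIENT_LIMIT = int(1.5 * GB)  # web client limit stated above
NOTIFY_IT_THRESHOLD = 5 * GB      # IT must be informed above this size


def sftp_upload_guidance(size_bytes):
    """Return a list of guidance notes for uploading a file of this size."""
    notes = []
    if size_bytes <= WEB_CLIENT_LIMIT:
        notes.append("web client or approved client")
    else:
        notes.append("approved client only")
    if size_bytes > NOTIFY_IT_THRESHOLD:
        notes.append("inform the IT department/Delegated Admin before uploading")
    return notes
```

For example, sftp_upload_guidance(2 * GB) indicates that an approved client such as FileZilla or WinSCP is needed.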

Zipping and encryption

To accelerate secure data transfer, zip and encrypt data files before initiating a data transfer.

To zip a file:

  1. Right-click on file or folder
  2. Navigate to “Send to” option
  3. Click on “Compressed (zipped) folder”

To encrypt a file:

  1. Right-click on the zipped folder and open Properties.
  2. Under the General tab, click Advanced.
  3. Check the “Encrypt contents to secure data” box.
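For agencies that prepare files with scripts, the zipping step can also be automated. The sketch below uses Python's standard zipfile module; the function name and file names are hypothetical. Note that the standard zipfile module cannot create encrypted archives, so this sketch covers compression only and encryption would still need to be applied separately.

```python
import zipfile
from pathlib import Path


def zip_file(source, archive):
    """Compress a single file into a new zip archive and return the archive path."""
    source, archive = Path(source), Path(archive)
    with zipfile.ZipFile(archive, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        # Store only the file name, not the full folder path, inside the archive.
        zf.write(source, arcname=source.name)
    return archive
```

A call such as zip_file("enrollment.csv", "enrollment.zip") would create the archive next to the original file, leaving the original in place.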

Enterprise Secure File Transport Services

The Department of Administrative Services Bureau of Information Technology Solutions (BITS) offers Executive Branch agencies an Enterprise Secure File Transport (SFT) Service for securely sharing sensitive content with other agencies or business partners.

More information can be found on the DAS website. The website states that this service offers the following:

  • Powered by the Axway Secure Transport solution, an industry leader in managed file transport solutions.
  • Provides for the exchange of file-based content with protection and encryption in transit and at rest.
  • Meets the federal government’s strict security compliance requirements.
  • Supports the use of a web-based file management environment as well as traditional secure file transport clients (e.g., FileZilla).
  • Is part of the state’s data center ecosystem, providing a secure and highly available environment.
  • Leverages the data center’s virtualization and storage services to further enhance reliability.
  • Is available 24 X 7 X 365, including critical incident response.
  • Customer support is available from 8:00 AM to 4:00 PM, Monday through Friday.

Conditions of Use:

Use of our Enterprise Secure File Transport Service has the following conditions:

  • Each agency that has enrolled in this service will need to appoint a primary and alternate SFT Liaison, with whom we will work on matters associated with your agency’s use of this system. The SFT Liaison will be granted an elevated level of permissions to provide basic support to your agency (e.g., password resets).
  • Content on the SFT environment is considered transitional and is automatically purged after 60 days.

See Appendix A for a step by step guide to using SFTP.

Linking Datasets

Data linking or data matching is the process of combining two or more datasets. It allows program administrators to provide more integrated and client-friendly government services.

Data linking also provides policy analysts and researchers a wider lens to draw insights and improve services. There are two ways to link datasets — deterministic and probabilistic.

Deterministic data linking

Deterministic data linking combines individual records only if the fields that are being compared match exactly. For example, two agencies could use social security numbers to combine their datasets. This type of data linking is most suitable when both datasets have a consistent, unique identifier.
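A minimal sketch of deterministic linking is shown below. It joins two lists of records only when the shared identifier matches exactly. The field names and the placeholder identifier values are hypothetical, and the sketch assumes the identifier is unique within the second dataset.

```python
def link_deterministic(left, right, key="ssn"):
    """Join two lists of records on an exact match of a shared unique identifier."""
    # Index the right-hand dataset by identifier, skipping blank or missing values.
    right_by_key = {rec[key]: rec for rec in right if rec.get(key)}
    linked = []
    for rec in left:
        match = right_by_key.get(rec.get(key))
        if match is not None:
            linked.append({**rec, **match})  # combine fields from both records
    return linked


# Made-up records with placeholder identifiers.
education = [
    {"ssn": "000-00-0001", "degree": "AS"},
    {"ssn": "000-00-0002", "degree": "BA"},
]
wages = [
    {"ssn": "000-00-0002", "quarterly_wage": 12000},
    {"ssn": "000-00-0003", "quarterly_wage": 9000},
]
linked = link_deterministic(education, wages)
```

Only the record whose identifier appears in both datasets is linked; records without an exact match are left out.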

Probabilistic data linking

Probabilistic data linking combines individual records using a special algorithm that compares multiple fields to determine if two records are the same entity. For example, P20 WIN’s data linking process uses identifiers such as name, birthday, and other fields present in both datasets to combine datasets. Probabilistic data linking is best suited for datasets that don’t have a unique identifier. It’s also best suited when two datasets have a unique identifier that’s inconsistently present or untrusted.
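To illustrate the general idea, the sketch below scores a pair of records in the style of the classic Fellegi-Sunter approach, where each field adds weight when it agrees and subtracts weight when it disagrees. This is not P20 WIN's algorithm. The field names, the m- and u-probabilities, and the threshold are all assumed values chosen for illustration, and the comparison is a simple case-insensitive exact match.

```python
import math

# Assumed illustrative probabilities for each field:
#   m = probability the field agrees when two records are a true match
#   u = probability the field agrees when two records are not a match
FIELD_PARAMS = {
    "first_name": (0.95, 0.01),
    "last_name": (0.95, 0.005),
    "birthday": (0.98, 0.003),
}
MATCH_THRESHOLD = 10.0  # assumed cutoff on the total weight


def match_weight(rec_a, rec_b, params=FIELD_PARAMS):
    """Sum log2 agreement and disagreement weights across the compared fields."""
    total = 0.0
    for field, (m, u) in params.items():
        a, b = rec_a.get(field), rec_b.get(field)
        if a is None or b is None:
            continue  # a missing value contributes no evidence either way
        if str(a).strip().lower() == str(b).strip().lower():
            total += math.log2(m / u)  # agreement raises the score
        else:
            total += math.log2((1 - m) / (1 - u))  # disagreement lowers it
    return total


def is_match(rec_a, rec_b, threshold=MATCH_THRESHOLD):
    """Treat the pair as the same entity when the total weight meets the threshold."""
    return match_weight(rec_a, rec_b) >= threshold
```

With these assumed values, two records that agree on all three fields score well above the threshold, while records that agree only on last name fall below it. In practice, the probabilities and threshold would be estimated from the data and reviewed against known matches.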

Blocking

When comparing two datasets, checking every single possible pair is computationally taxing. For example, two datasets each containing 100 records would require 10,000 pairwise comparisons. This computational reality quickly becomes unmanageable when linking larger administrative datasets.

Blocking solves this computational challenge by comparing only pairs that are likely to match according to particular fields. For example, when birth year is used for blocking, only records with the same birth year are compared to each other. A common strategy is to run multiple blocking passes on different fields, because a field may be missing or erroneous in some records. The Australian Government’s Open Data Toolkit provides a table of fields to consider when blocking.
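The sketch below shows how blocking on birth year reduces the number of comparisons for two datasets of 100 records each. The datasets are synthetic, the field names are hypothetical, and the records are assumed to be spread evenly across ten birth years.

```python
from collections import defaultdict


def candidate_pairs(left, right, block_key):
    """Yield only the (left, right) pairs whose blocking key values are equal."""
    blocks = defaultdict(list)
    for rec in right:
        blocks[block_key(rec)].append(rec)
    for rec in left:
        for other in blocks.get(block_key(rec), []):
            yield rec, other


def birth_year(record):
    """Blocking key: the year portion of a birthday stored as YYYY-MM-DD."""
    return record["birthday"][:4]


# Two synthetic datasets of 100 records each, spread evenly over 10 birth years.
left = [{"id": i, "birthday": f"{1990 + i % 10}-01-01"} for i in range(100)]
right = [{"id": i, "birthday": f"{1990 + i % 10}-06-15"} for i in range(100)]

all_pairs = len(left) * len(right)  # every possible pair
blocked_pairs = sum(1 for _ in candidate_pairs(left, right, birth_year))
```

Here all_pairs is 10,000, while blocked_pairs is 1,000, because each record is compared only with the ten records that share its birth year. The savings in real data depend on how evenly records are distributed across the blocking field.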

Measuring Accuracy

There are two types of matching errors that arise when linking records. The first is a false negative, which occurs when two records that are in fact a match fail to meet the set matching criteria. The second is a false positive, which occurs when two records meet the set matching criteria but are in fact not a match. There is a tradeoff between false negatives and false positives when determining the stringency of any matching criteria. Stricter matching criteria will decrease the number of false positives but increase the number of false negatives, while more lenient matching criteria will increase the number of false positives but decrease the number of false negatives. The context of the data match provides guidance on which type of error to minimize.
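When a trusted set of true matches is available (for example, a hand-linked sample), both error types can be counted directly. The sketch below is one way to do this; the function name and record identifiers are hypothetical, and precision and recall are included as common summary measures rather than measures named in this playbook.

```python
def linkage_errors(predicted_pairs, true_pairs):
    """Compare predicted links against known true links and count the errors."""
    predicted, truth = set(predicted_pairs), set(true_pairs)
    true_positives = len(predicted & truth)
    false_positives = len(predicted - truth)  # linked, but not a real match
    false_negatives = len(truth - predicted)  # real match that was missed
    return {
        "false_positives": false_positives,
        "false_negatives": false_negatives,
        # Share of predicted links that are correct.
        "precision": true_positives / len(predicted) if predicted else 0.0,
        # Share of true matches that were found.
        "recall": true_positives / len(truth) if truth else 0.0,
    }


# Made-up example: four true matches, three predicted links (one of them wrong).
true_pairs = [("a1", "b1"), ("a2", "b2"), ("a3", "b3"), ("a4", "b4")]
predicted_pairs = [("a1", "b1"), ("a2", "b2"), ("a3", "b9")]
report = linkage_errors(predicted_pairs, true_pairs)
```

In this example there is one false positive and there are two false negatives. Tightening the matching criteria would tend to lower the first count and raise the second.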

It is very difficult to capture the true number of false negatives and false positives in a data merge. Researchers have implemented creative methods for assessing the accuracy of record linkage algorithms. For example, researchers at the University of Michigan tested a supervised-learning record linkage algorithm by training it on a large, novel dataset that includes biometric identifiers (fingerprints) to construct unbiased measures of error. Other researchers have examined the performance of widely used record linking algorithms with hand-linked datasets and synthetic datasets, which have known errors introduced to fields of interest.

Transferring and linking data in P20 WIN

P20 WIN’s data linking process uses identifiers such as name, birthday, and other fields present in both datasets to combine datasets. This video provides more details about P20 WIN’s data linking process. Additional detail can be found in the P20 WIN Data Governance Manual key processes.

Appendix A: Steps to Use SFTP to Securely Transfer Files

DAS/BEST offers an Enterprise Secure File Transport (SFT) Service for executive branch agencies, as described in the Transferring Data section. The SFT service can be used to securely transfer data between state agencies or with other business partners. The steps for using the SFT service to transfer data are described below.

  1. Identify the SFT Liaison at your agency. SFTP service is available to the executive branch, constitutional offices, and quasi-public agencies and their business partners. Agencies that have enrolled in the service should have appointed an SFT Liaison, who is responsible for administering the SFT service at each agency. If you are not sure who the SFT Liaison is at your agency, your IT office might be able to help find them.

  2. Ask your SFT Liaison to set you up with SFTP access. They should be able to grant you access and tell you how to log on. Your username and password should be the same as the ones you use to log into Windows, but you should confirm this with your SFT Liaison.

  3. Log into the SFTP site. Once your account has been set up, go to https://sft.ct.gov and log in. Be aware that the username and password are case sensitive.

  4. Work with your SFT Liaison to set up folders to organize the files you wish to transfer. The SFT Liaison can set up folders for specific data transfers, and then grant access to the desired folders to the recipient of your data transfer.

  5. Make sure the person you want to share data with also has SFTP access. If the person with whom you want to share data is at another Connecticut state agency, their SFT Liaison must grant them access to the SFTP site. Once they have access, your SFT Liaison should be able to give them access to the folder with the files you wish to share. Your SFT Liaison should also be able to help you share files with a partner outside of state government.

  6. Ask your SFT Liaison to give the data recipient access to the folder on the SFTP site where your data is saved. Once you both have access to the shared folder, you should be able to use that folder to transfer files.

  7. Upload the data that you want to share into the designated folder. Click the “Upload” button and select the file that you want to share. The person you want to share data with should also now have access to the file via your shared folder.

  8. If you have difficulties using the SFTP site, ask your SFT Liaison for help, or call DAS/BEST at 860-622-2300 and select Option 9.

Appendix B: Resource Library

This page highlights resources from different organizations about data sharing and related topics.

Preparing a Successful Data Request

Data Aggregation

Linking Data

Metadata

Bias in Data

Safeguarding Data