We collect several kinds of potentially identifiable data. Katie has special permission from Penn’s IRB to collect these data and has promised to keep them safe. They include:
Personally Identifiable Information (PII): We collect full name, date of birth (sometimes), contact information, and demographic information from children who are interested in participating in our research studies.
Audio and video recordings: These files are considered potentially identifiable because they contain the participant’s voice, image, and any personal details they have disclosed during the study.
Coded data (transcriptions of audio or video recordings): We can’t control what people disclose during recording (their full name, for example), so we treat coded data as potentially identifiable.
Data sheets: We record (1) where a child was run (schools, daycares) and (2) what languages they speak. This is potentially identifiable via deductive disclosure (e.g. if we ran a child at a particular school and they are the only child at that school who speaks Italian, they could be identified by the data sheet).
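To make the deductive disclosure risk concrete, here is a minimal sketch of how one might flag data sheet rows whose combination of site and language appears only once. The field names (`subject_id`, `site`, `language`) and the sample rows are made up for illustration and are not the lab's actual data sheet format. Note that it only checks uniqueness within the data sheet itself; a child can also be identifiable if they are the only speaker of a language at their school, even when the sheet contains other speakers of that language from elsewhere.

```python
from collections import Counter


def find_unique_combinations(rows, keys=("site", "language")):
    """Return the rows whose combination of `keys` appears exactly once."""
    counts = Counter(tuple(row[k] for k in keys) for row in rows)
    return [row for row in rows if counts[tuple(row[k] for k in keys)] == 1]


# Hypothetical data sheet entries (invented values, not real participants).
data_sheet = [
    {"subject_id": "S001", "site": "School A", "language": "English"},
    {"subject_id": "S002", "site": "School A", "language": "English"},
    {"subject_id": "S003", "site": "School A", "language": "Italian"},
    {"subject_id": "S004", "site": "Daycare B", "language": "Spanish"},
    {"subject_id": "S005", "site": "Daycare B", "language": "Spanish"},
]

# Rows listed here could single out one child and deserve extra care.
at_risk = find_unique_combinations(data_sheet)
for row in at_risk:
    print(row["subject_id"], row["site"], row["language"])
```

In this made-up example, only the Italian speaker at School A is flagged, which mirrors the scenario described above.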
We keep these data safe by following our IRB-approved data management plan. We promised to store data in a specific place and back it up in (at least) two secure places. Here's an infographic to help illustrate how our data is handled:
PII: Entered into the Salesforce database, which is backed up by Penn.
Runtime data: Automatically collected on our experiment server (experiments.childlanglab.com) and stored in our database. The database is backed up by DigitalOcean every night and to our Google Drive every week.
Audio and video files: Audio and video recordings of participants are saved directly in our Google Drive and pushed to Databrary.
Coded data: Entered directly into our database.
Keep our computers safe: lock the door when the lab is empty and keep laptops and tablets locked up when not in use.
Store data in the right place, and only the right place. Always save the original file in its proper location (outlined in the data management plan above). Never save or copy a file to an intermediate or temporary location (e.g. the desktop), even if you plan to move it later. And never, never email data to yourself or anyone else.
Follow the lab’s data protocol for file naming. Always name files exactly as our lab protocols specify, so that we can find and identify them later.
Do not take photos or screenshots of data. Never take a screenshot or photograph that includes confidential data, and be extremely careful whenever you take any photograph or screenshot in the lab or on lab computers.
Report any breaches within one hour. If you notice mishandled data, report it within one hour of noticing it. This will help us minimize damage.
We follow the 3-2-1 Backup guidelines (infographic below) to keep our data safe from accidental loss:
Copy 1 | Primary - Original copies of our data are stored on the lab’s database or the Google Drive folder.
Copy 2 | Backup #1 - Data from the experiment server and the ChildLangLab-Data folder are uploaded to Google Drive into the project's raw-data folder.
Copy 3 | Backup #2 - Data uploaded to Google Drive are backed up on the Yellow Brick Road, our cloud backup drive. For access to this backup, email Katie.
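As an illustration of what "three copies" means in practice, the sketch below compares a primary file against its backups by checksum. It is not part of the lab's actual tooling: the function names are hypothetical, and it assumes each copy can be read as a local path (for example, through a synced folder). It only reads the files in place and never copies or moves them.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Compute the SHA-256 checksum of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copies_match(primary, backups):
    """Map each backup path to True if it exists and matches the primary."""
    expected = sha256_of(primary)
    return {
        str(backup): Path(backup).exists() and sha256_of(backup) == expected
        for backup in backups
    }
```

Any backup that maps to `False` is either missing or differs from the primary, which is a reason to stop and ask Katie rather than to fix it yourself.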
If you make a mistake, don’t worry. It’s likely we can restore a previous version, either within Google Drive or from the Yellow Brick Road cloud backup.
When you make a mistake, the first thing you should do is check the version history in Google Drive. Google Drive allows you to see previous versions of a file, name previous versions (for easy roll-back), and restore previous versions. If you are unable to see the version history, email Katie and she can probably access this for you.
While we take great care to protect our data, we are still vulnerable to data loss. This is because we rely on humans (you and me!) to collect, store, and manage data, and humans inevitably make mistakes. Here are some places we are most vulnerable to human error.
Laptops and iPads could be lost or stolen, or they could be “off the network”, so the data they contain is not being backed up according to plan.
Audio/video files and data sheets must be saved in the correct location (ChildLangLab-Data) by a human in order for them to be stored and backed up. Failure to save them properly could result in complete loss.
Files and folders must also be named according to our documentation protocol so that we can find them in the future and tell what they contain.
Experiment runtime files are named from data entered by a human; if you accidentally enter the wrong subject ID number or condition, the data will be misnamed (and potentially misrun).
The access privileges you need to do your work also make it possible for you to download or move files you shouldn’t. There is no way around this except to warn you not to and ask you to be vigilant about where and how you save lab files.
You can help protect against these mistakes by (1) doing careful science, (2) adhering to this data management plan, (3) asking questions when you aren’t sure where or how to save something (never just guess!), and (4) notifying Katie immediately when you notice a mistake.
You’ll never get in trouble for admitting a mistake -- only praised for your honesty and integrity!
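Because runtime files are named from what a human types in, a quick check before a session starts can catch some entry mistakes. The sketch below is only an illustration: the ID format (`S` followed by three digits), the condition names, and the `check_entry` function are all hypothetical, and the lab's real conventions live in its own protocols.

```python
import re

# Hypothetical conventions, for illustration only; the lab's real ID format
# and condition names are defined in its own protocols.
SUBJECT_ID_PATTERN = re.compile(r"S\d{3}")
VALID_CONDITIONS = {"control", "experimental"}


def check_entry(subject_id, condition, existing_ids=()):
    """Return a list of problems with the entry; an empty list means it looks OK."""
    problems = []
    if not SUBJECT_ID_PATTERN.fullmatch(subject_id):
        problems.append(f"subject ID {subject_id!r} does not match the expected format")
    elif subject_id in existing_ids:
        problems.append(f"subject ID {subject_id!r} has already been used")
    if condition not in VALID_CONDITIONS:
        problems.append(f"condition {condition!r} is not a known condition")
    return problems


# Example: a mistyped ID and a misspelled condition are both reported.
for problem in check_entry("12", "controll"):
    print(problem)
```

A check like this cannot tell whether a well-formed ID is the right one for the child in front of you, so it supplements careful entry rather than replacing it.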
Lab work and files are the intellectual property (IP) of me (Katie), you, and our collaborators. Because our work is sponsored by the University of Pennsylvania and government grants, it's also Penn's IP. Work we do in collaboration with other people is their IP, too, and usually their home universities'.
This means you need permission from everyone on a research team before you share lab files (IP) with anyone. Even Katie can't share without permission from all collaborators. It is important to emphasize that these rules aren’t just our lab’s or Penn’s; this is a legal obligation.