Data sharing

When sharing (or intending to share) data, scientists should keep certain things in mind to make the lives of their colleagues easier, as well as to protect privacy of the participants.

Metadata

To preserve as much metadata as possible, keep track of each step carried out during preprocessing and analysis, and also maintain detailed information about how the data were collected, such as the exact type of electrodes used, their diameter and depth, and the data acquisition system (for example, Neuralynx or Micromed, mentioned above). Detailed information about the programs and methods used to place electrodes, the channels that were discarded, and the placement of the reference electrode is also crucial for others to understand and evaluate your data.

The Brain Imaging Data Structure (BIDS) (Pernet et al., 2018) is a format for sharing neuroimaging data using agreed-upon standards created by the neuroimaging community. BIDS offers a systematic way to organize data into folders with dedicated names, accompanied by text files that store metadata, either as tab-separated values (.tsv) files or as JavaScript Object Notation (.json) files.
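As a rough illustration of what "folders with dedicated names" means in practice, the sketch below builds a BIDS-style path for one iEEG file. The helper name bids_ieeg_path, the dataset folder my_dataset and the task label memory are hypothetical and chosen only for this example; consult the BIDS specification for the full set of naming rules.

```python
from pathlib import Path


def bids_ieeg_path(root, subject, session, suffix, extension, task=None):
    """Build a BIDS-style path for one iEEG file (hypothetical helper)."""
    labels = [subject, session] + ([task] if task is not None else [])
    # BIDS labels may only contain letters and digits.
    for label in labels:
        if not label.isalnum():
            raise ValueError(f"label must be alphanumeric: {label!r}")

    entities = [f"sub-{subject}", f"ses-{session}"]
    if task is not None:
        entities.append(f"task-{task}")

    folder = Path(root) / f"sub-{subject}" / f"ses-{session}" / "ieeg"
    return folder / ("_".join(entities) + f"_{suffix}{extension}")


# A recording carries a task label; the electrode table does not.
recording = bids_ieeg_path("my_dataset", "01", "01", "ieeg", ".edf", task="memory")
electrodes = bids_ieeg_path("my_dataset", "01", "01", "electrodes", ".tsv")
print(recording.as_posix())
print(electrodes.as_posix())
```

Building names through one function, rather than typing them by hand, keeps every file in the dataset consistent and makes naming mistakes fail early.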

BIDS was originally created for the fMRI modality, but it has since been adapted to several other modalities, and more are in the works. One recently added specification is iEEG (Holdgraf et al., 2019). BIDS for iEEG requires an electrode coordinate system as well as the images that allow one to visualize the electrode placement (typically defaced MRI or CT scans). These images should be provided as NIfTI files that follow the BIDS naming scheme. Localization data is stored in the _electrodes.tsv and _coordsystem.json files. Event timing data is stored in an _events.tsv file. Because it was difficult for researchers to reach a consensus on a single gold standard, BIDS for iEEG allows two compliant neurophysiological data formats: the European Data Format (.edf) and the BrainVision Core Data Format (.vhdr, .vmrk, .eeg). For more information on the BrainVision format, see our guide Working with EEG Data. Unofficially supported formats include Neurodata Without Borders (.nwb), EEGLAB (.set), and Multiscale Electrophysiology Format (MEF3).
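The _electrodes.tsv and _coordsystem.json files described above are plain text and can be written with standard tools. The following is a minimal sketch, assuming the basic electrode columns name, x, y, z and size and the coordinate system fields iEEGCoordinateSystem and iEEGCoordinateUnits. The helper write_localization, the contact names and all coordinate values are placeholders invented for illustration, not real data.

```python
import csv
import json
import tempfile
from pathlib import Path

ELECTRODE_COLUMNS = ["name", "x", "y", "z", "size"]


def write_localization(folder, prefix, electrodes, coordsystem):
    """Write <prefix>_electrodes.tsv and <prefix>_coordsystem.json (hypothetical helper)."""
    folder = Path(folder)
    folder.mkdir(parents=True, exist_ok=True)
    tsv_path = folder / f"{prefix}_electrodes.tsv"
    json_path = folder / f"{prefix}_coordsystem.json"

    # Tab-separated table: one header row, then one row per contact.
    with open(tsv_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=ELECTRODE_COLUMNS,
                                delimiter="\t", lineterminator="\n")
        writer.writeheader()
        writer.writerows(electrodes)

    # JSON sidecar describing the coordinate system of the table above.
    with open(json_path, "w", encoding="utf-8") as f:
        json.dump(coordsystem, f, indent=2)

    return tsv_path, json_path


# Placeholder values for illustration only (x, y, z in mm; size in mm^2).
electrodes = [
    {"name": "LA1", "x": -22.1, "y": -4.5, "z": -18.3, "size": 5.0},
    {"name": "LA2", "x": -25.6, "y": -4.4, "z": -18.1, "size": 5.0},
]
coordsystem = {"iEEGCoordinateSystem": "ACPC", "iEEGCoordinateUnits": "mm"}

with tempfile.TemporaryDirectory() as tmp:
    tsv_path, json_path = write_localization(
        Path(tmp) / "sub-01" / "ieeg", "sub-01", electrodes, coordsystem)
    print(tsv_path.read_text(encoding="utf-8"))
    print(json_path.read_text(encoding="utf-8"))
```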

BIDS also has a JavaScript validator that can help researchers pinpoint where their dataset does not conform to the standard.

We encourage the iEEG community to share their data using this data structure, as it facilitates communication, increases reproducibility and makes it easier to develop data analysis pipelines. It also facilitates compliance with the FAIR principles of findability, accessibility, interoperability, and reusability. Incentives for using the structure increase continuously, and include the creation of toolboxes that automate working with BIDS-compliant data. See our handbook Structuring Data with BIDS for more information.

Research data archives

There are national, international, and domain-specific archives that meet international standards for archiving research data and making it accessible. UiO’s researchers can choose the archiving solutions that are most appropriate to their discipline and that meet the conditions of applicable legal frameworks. Depositing data resources within a trusted digital archive can ensure that they are curated and handled according to best practices in digital preservation.

Some archival resources are:

re3data.org (a global list of archives)
Zenodo (EU’s archive)
NSD (national archives)
NIRD/Sigma 2 (national archives)
DataverseNO (national archives)

For more information, see our guide Data Management: DMPs & Best Practices.

Anonymization

iEEG data on its own (as well as the related behavioral data) does not constitute sensitive data. However, the headers of the raw iEEG data files contain sensitive information that is recorded for clinical purposes, including the participant’s name. Because iEEG acquisition systems produce a large number of files, the removal of this sensitive information must be automated. You may contact the data manager for help with this task, or it may be performed for you before the data is placed in Lagringshotell.
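What such automation can look like depends on the file format. The sketch below assumes, purely for illustration, that the recordings have been exported to the European Data Format, whose fixed-width header stores the local patient identification in bytes 8 to 87 and the local recording identification in bytes 88 to 167. The function names are hypothetical, the start date and any annotations are left untouched, and files from Neuralynx or Micromed systems in their native formats would need format-specific tools instead. Always run this kind of script on copies, since it overwrites the files in place.

```python
from pathlib import Path

# Fixed-width ASCII fields at the start of an EDF header: (byte offset, length).
PATIENT_FIELD = (8, 80)      # local patient identification
RECORDING_FIELD = (88, 80)   # local recording identification


def scrub_edf_header(path):
    """Overwrite the patient and recording fields of one EDF file in place."""
    with open(path, "r+b") as f:
        for offset, length in (PATIENT_FIELD, RECORDING_FIELD):
            f.seek(offset)
            # Replace the field with "X" padded with spaces to its fixed width,
            # so the rest of the header keeps its position.
            f.write(b"X".ljust(length))


def scrub_folder(folder):
    """Scrub every .edf file below folder and return the files that were changed."""
    files = sorted(Path(folder).rglob("*.edf"))
    for file in files:
        scrub_edf_header(file)
    return files
```

Calling scrub_folder on the folder that holds the copied recordings processes all files in one pass; the returned list can be saved as a record of which files were handled.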

It is important to anonymize data from the start of data collection by assigning participant numbers and keeping any identifying information, such as name, address, phone number, birthday or national identification number, separate from the EEG data. Name, contact information and the participant’s ID must not be stored together. Documents that link ID and name should be stored on encrypted storage media, and those devices should be kept in locked cabinets separate from the data. These steps are of even greater importance when one is working with patient populations.
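One way to assign participant numbers is sketched below, under stated assumptions: IDs are drawn at random so that they do not reveal the order of enrollment, and the key linking names to IDs is written to its own file so that it can be placed on separate encrypted storage. The function names and the sub- prefix with four digits are hypothetical choices, the code does not perform any encryption itself, and it assumes every name in the list is unique.

```python
import csv
import secrets


def assign_participant_ids(names, prefix="sub-"):
    """Map each name to a random, unique participant ID (hypothetical scheme)."""
    ids = {}
    used = set()
    for name in names:
        # Draw random four-digit numbers until an unused one is found.
        while True:
            candidate = f"{prefix}{secrets.randbelow(10_000):04d}"
            if candidate not in used:
                break
        used.add(candidate)
        ids[name] = candidate
    return ids


def write_key_file(ids, key_path):
    """Write the name-to-ID key; keep this file apart from the recorded data."""
    with open(key_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["name", "participant_id"])
        writer.writerows(ids.items())
```

Only the participant ID should then appear in data file names and in the data itself, while the key file stays on the encrypted medium.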

MRI or fMRI data used in conjunction with EEG data is inherently sensitive and requires safe handling. Data is removed from the scanner on password-protected, encrypted hard drives. When transferring data from the scanner to the external encrypted hard drive, always remember to check the box for anonymization at the scanner console and designate a participant number or alias instead of the participant’s name. This is necessary because DICOM files have headers that contain identifying patient data. If anonymization is not performed at the source, the DICOM headers must be anonymized by hand, which can be time-consuming. Storage of neuroimaging data is permitted only in TSD or Lagringshotellet at UiO. Data from some patient groups may only be stored in TSD. Prior to data archival and sharing, neuroimaging data must be defaced.

Creating a Data Management Plan for EEG Research

Now that you are familiar with some of the data-related topics and concerns that should be considered when conducting an iEEG study, you may feel better equipped to complete a data management plan document. The following questions, specific to iEEG data, are meant to guide the creation of such a document. For a more in-depth exploration of how to create a data management plan, please see the HTD guide Data Management: DMPs & Best Practices.

Who is the study’s P.I.? What other team members will be involved in data collection and analysis? What medical personnel will be involved in the research and at what hospital will the research be performed?
What software will you use for stimuli/experiment presentation and behavioral data collection?
What software do you foresee using for analysis?
- Knowing what software will be used ahead of data collection will help you to make an appropriate data management plan from the start.
How will your data be handled in the curation phase, once the study is over? What data repository will you use? What requirements do they have for data management? Can you maintain this standard throughout the study?
- What will data curation cost? Many data repositories have requirements for file structure and naming conventions, as well as the file types that are preferred. If you are aware of these requirements beforehand, you can save time once the time for curation arrives by using those guidelines over the course of the study. You will need to know and plan for the costs that will be incurred for data sharing and curation when applying for funding.
What types of electrodes will be used? What programs will be used in their localization and what neuroimaging methods will be employed in this process?
What iEEG system will be used in data collection? What file types does it generate?
Who will be responsible for identifying epileptic activity in the data and which channels will be discarded? How will this process be carried out?
What file types do your behavioral data programs generate?
Will you share your data? Under what license will you share your data? What limitations will be placed upon access to your data?
- For a comprehensive description of the different licenses that can be applied to data sharing and usage see https://www.ucl.ac.uk/library/research-support/research-datamanagement/licenses-data-sharing-creative-commons
How will you pilot your project? What will be done with the data from the pilot study?
How many participants will your study have? How will you ensure the welfare of the patient group?
- Special considerations must be taken when handling data related to studies involving patient groups to ensure their welfare and anonymity.
How many runs or sessions will the study have per participant?
Will you collect sensitive classes of data?
- Sensitive classes of data include identification numbers, birthdates, neuroimaging, information related to health conditions, information about race/ethnicity, political affiliation, sexual orientation or in some cases gender identity/biological sex (for example when working with transgender or intersex populations).
What ethics committees will you need to apply to? What are their data management requirements?
- Most studies conducted under PSI will be required to submit applications to REK, NSD or both. Their data management requirements are detailed on the committees’ websites.
Who is funding the study? What are the funder’s requirements for data management plans?
- Most funding entities require at least a basic data management and data sharing plan to accompany funding applications.
What information will be provided to participants in advance of participation and in what form? What are your plans for collecting informed consent? How will you store the consent forms once they are collected?
How will you store the data key linking participants to their data?
If you will collect sensitive data, how will it be stored and analyzed? What protections will be put in place to ensure anonymity is maintained?
- At UiO, sensitive data can be stored and analyzed in TSD or on Lagringshotellet, depending on the degree of sensitivity of the data.
How will you store non-sensitive data? In what environment do you plan to analyze it?
Who will have access to the data? Who is responsible for transferring data to storage once it is collected?
What file naming conventions will be used for the various file types that make up your dataset?
Will you use in-house written code? What programming language will you use? Will this code be made available to the scientific community? If so, where will it be made available?
Will you maintain a lab notebook? How will the lab notebook be used and who will have access?
- Digital lab notebooks are now quite common, but can raise questions of data security.
Will you convert proprietary file types to more standardized file types prior to sharing your data?
How will the findings be disseminated?
How will you record metadata related to the study for future sharing?
- Metadata is data about your data that can help researchers who later want to access your data to understand how the study was conducted. It may also help members of your current team understand the data they are working with.
Who will be responsible for the data in the long term once the study is completed?
- This is important to consider for all studies, but especially important for longitudinal studies. Personnel and staff may leave your institution or be difficult to contact in the future. A primary contact person who is responsible for the data should be designated. If they leave the institution or retire and a new responsible party is designated, the repository should be updated with the new contact information.

By Elian E. Jentoft & Rene S. Skukies

Published Aug. 31, 2020 10:14 PM - Last modified Aug. 31, 2020 11:30 PM