Glossary

 

Analysis The methods and processes involved in making sense of data and other relevant information.
Archive files File formats intended to preserve multiple files, together with their metadata, for long-term storage. Some also compress the file collection so that it takes up less storage space but can still be "unpacked" for typical use when needed. Examples: .zip, .tar, .rar, etc.
Audit records Documentation recording the findings of quality control checks (audits) and changes made to data.
Bit rot A colloquial term for data degradation.
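Checksum A small piece of data computed from a file's contents and stored for later comparison, used to help detect data degradation in storage. A "checksum function" computes the checksum; recomputing it later and comparing the result with the stored value shows whether the file has changed.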
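As a rough illustration of the checksum idea, the sketch below computes a checksum for a file and later compares it with a stored value. It assumes Python's standard hashlib module and the SHA-256 algorithm; the function names file_checksum and is_unchanged are hypothetical and chosen only for this example.

```python
import hashlib


def file_checksum(path, algorithm="sha256", chunk_size=65536):
    """Return the checksum (hex digest) of the file at `path`."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        # Read in chunks so large files do not need to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_unchanged(path, stored_checksum, algorithm="sha256"):
    """Recompute the checksum and compare it with one recorded earlier."""
    return file_checksum(path, algorithm) == stored_checksum
```

A typical pattern would be to record the checksum when a file is archived and to run the comparison whenever the file is copied or restored.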
Coding Another word for computer programming, which involves the creation of programs for software, analysis, data management tasks, apps, websites, etc.
Commenting In programming, the act of writing explanations of what a portion of code is intended to do directly within the code.
Curation Processes involved in the organization, presentation and preservation of a collection of objects, such as those represented in a dataset.
Data Facts, figures, measurements, observations, images, recordings, etc. (often in digital format) collected for analysis.
Data dictionary An organized listing of metadata, meanings behind specific codes in the data, certain data's relationship to other data, etc.
(Data) Degradation Gradual decay in the collections of zeros and ones that make up digital data at its root, or of the storage media on which they are stored (hard drives, flash drives, floppy disks, etc.). This decay can corrupt files in such a way that they become unreadable over time.
(Data) Integrity Refers to a dataset's continued completeness, accuracy, reliability and freedom from degradation.
Data key A list of participants in a study which directly ties them to their subject number or alias.
Dataset A collection of data.
Dependencies In programming, other programs or libraries that are required to run a program.
File naming conventions Set standards for the naming of files.
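A naming convention is easiest to follow when it can be checked automatically. The sketch below tests file names against one made-up convention (sub-<three digits>_<measure>_<YYYYMMDD>.<extension>); the pattern and the name follows_convention are assumptions for illustration only, not a convention prescribed by this guide.

```python
import re

# Hypothetical convention: sub-<3 digits>_<measure>_<YYYYMMDD>.<extension>
FILENAME_PATTERN = re.compile(r"sub-\d{3}_[a-z0-9]+_\d{8}\.[a-z0-9]+")


def follows_convention(filename):
    """Return True if `filename` matches the example convention exactly."""
    return FILENAME_PATTERN.fullmatch(filename) is not None


print(follows_convention("sub-001_survey_20200520.csv"))  # True
print(follows_convention("Survey final (2).csv"))         # False
```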
Freeware Programs distributed free of cost to the user.
Function In computer programming, a block of command script which can later be "called" for reuse without needing to rewrite the entire block of code.
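For readers new to programming, a minimal Python sketch of a function that is written once and then "called" twice may help. The function mean and the sample values are invented for this example; the lines starting with # also show what commenting looks like in practice.

```python
def mean(values):
    """Return the arithmetic mean of a non-empty sequence of numbers."""
    # Add the values together, then divide by how many there are.
    return sum(values) / len(values)


# The same block of code is reused for two different lists.
reaction_times = [0.42, 0.38, 0.51]
heart_rates = [61, 72, 68]
print(mean(reaction_times))
print(mean(heart_rates))
```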
Hash function In the context of this guide, an algorithm that maps data of arbitrary size (such as the contents of a file) to a short, fixed-size value, which can be stored and later compared against a newly computed value.
Header Metadata for a file which is either contained in a separate file (often .hdr) or embedded within the file itself.
Licensing In the context of this guide, standard agreements which detail the conditions under which a dataset can be obtained and re-used, as well as how it should be cited.
Master files The main file copies in a dataset from which other file copies are made. These will not be changed or manipulated without higher level permissions.
Metadata Data about data, i.e. how it was collected, the format in which it was collected, information on how a particular piece of data fits into the larger dataset, details on its transformation via pre-processing, quality assurance and analysis, etc.
Non-proprietary files File types created in an open-access common or standard format, which can be created by freeware and are intended to be readable by many applications. Examples: .jpeg, .nii, .dcm, .tif, .mp4, .mp3, .tsv, .csv, etc.
Non-solid compression A method of compiling and compressing multiple files into an archive file. This method treats each individual file as its own block of data, compressing each file separately and then combining the results into one archive file.
Open Science A cultural movement with a goal of ensuring transparency and an understanding of research data as a valuable asset to be shared and reused.
Open Source A movement of creators of everything from art to computer software (freeware) and programming languages, which encourages creation for the common good.
Pre-processing The techniques deployed in transforming data so that it can be analysed.
Proprietary files Files that, for the most part, can be read or created only by certain (usually commercial) software. Examples: .docx, .xlsx, .m, .mat, .e, etc.
Quality control Routines that ensure accurate and uniformly formatted documentation in data collection.
Raw data Data in its unprocessed, pre-analysis form, exactly as it was at collection (or after minor correction of data entry errors).
Re-linkage A privacy breach that occurs when someone is able to tie data collected from a study participant to a particular person, often using other public data.
Scripts Another term for computer programs.
Solid compression A method of compiling a series of files into an archive file. This method treats all the files as a single block of data, which it then compresses. The .7z and .rar formats can use this method of compression.
Standards An established set of parameters (for example, of file formats, file organization schemes, data formatting, etc.) developed and agreed upon by members of a scientific community.
Two-pass verification In the context of this guide, a method of data entry which involves two persons entering the same data and then comparing their records for accuracy.
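The comparison step of two-pass verification can be automated. The sketch below assumes each person's entries are held as a dictionary keyed by record ID, with one dictionary of fields per record; that layout, the name find_mismatches and the sample values are all hypothetical.

```python
def find_mismatches(first_pass, second_pass):
    """List (record_id, field, first_value, second_value) for each disagreement."""
    mismatches = []
    # Include records that appear in only one of the two passes.
    for record_id in sorted(set(first_pass) | set(second_pass)):
        first = first_pass.get(record_id, {})
        second = second_pass.get(record_id, {})
        for field in sorted(set(first) | set(second)):
            if first.get(field) != second.get(field):
                mismatches.append(
                    (record_id, field, first.get(field), second.get(field))
                )
    return mismatches


entry_a = {"S01": {"age": "34", "score": "12"}, "S02": {"age": "29", "score": "17"}}
entry_b = {"S01": {"age": "34", "score": "21"}, "S02": {"age": "29", "score": "17"}}
print(find_mismatches(entry_a, entry_b))  # [('S01', 'score', '12', '21')]
```

Each reported mismatch would then be checked against the original source document before the data are corrected.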
Version control/versioning Methods of tracking changes made to data, individual files or code, including who made them and when. This makes it possible, for example, to know that one is working on the most recent version of a file, to return to an earlier version when necessary, or to aid communication and decision-making in teams.


By Elian Eve Jentoft
Published May 20, 2020 6:07 PM - Last modified May 21, 2020 12:50 PM