Data management is a broad subject, and when planning a study the process should be outlined at the earliest planning stages. Research is only as good as the data that feeds it. The garbage-in, garbage-out principle applies to data analysis and research too; data quality and relevance must be ensured at all stages of data generation, storage, analysis and dissemination.

We are fast moving towards an era where reproducible research won’t be optional. When you share results, people will want to be able to reproduce them with minimal effort, so you have to be in a position to share documentation that someone else can follow to replicate your results.

Data documentation is part of the data management process. Metadata is “data about data”. It is a set of data that describes and gives information about other data. The University of Oregon library lists three categories of metadata: descriptive, technical or structural, and administrative. All objects also have a unique identifier metadata element.

  1. Descriptive metadata elements consist of information about the content and context of an object. For example, descriptive metadata for an image may include: title, creator, subject (tags), and description (abstract).
  2. Technical/structural metadata elements describe the format, process, and inter-relatedness of objects. For example, technical/structural metadata for an image may include: camera, aperture, exposure, file format, and set (if in a series).
  3. Administrative metadata elements describe information needed to manage or use the object. For example, administrative metadata for an image may include: creation date, copyright permissions, required software, history, and file integrity checks.
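
As a rough illustration, the three categories and the unique identifier could be stored alongside a file as one small structured record. The Python sketch below mirrors the image example above. The field names, the sample values and the helper `build_image_metadata` are all hypothetical and do not follow any particular standard; the SHA-256 digest stands in for a file integrity check.

```python
import hashlib
import json
import uuid


def sha256_of_bytes(data: bytes) -> str:
    """Return the SHA-256 hex digest, usable as a file integrity check."""
    return hashlib.sha256(data).hexdigest()


def build_image_metadata(image_bytes: bytes) -> dict:
    """Group made-up metadata for an image into the three categories."""
    return {
        # Unique identifier element (a random UUID, purely for illustration).
        "identifier": str(uuid.uuid4()),
        "descriptive": {
            "title": "Market day",
            "creator": "A. Photographer",
            "subject": ["market", "street"],
            "description": "Street scene taken on market day.",
        },
        "technical": {
            "camera": "Example Camera X",
            "aperture": "f/2.8",
            "exposure": "1/250 s",
            "file_format": "JPEG",
            "set": None,  # not part of a series
        },
        "administrative": {
            "creation_date": "2020-01-15",
            "copyright": "CC BY 4.0",
            "required_software": "any JPEG viewer",
            "sha256": sha256_of_bytes(image_bytes),
        },
    }


# Placeholder bytes; in practice these would be read from the image file.
record = build_image_metadata(b"stand-in for the image bytes")
print(json.dumps(record, indent=2))
```

Recomputing the digest later and comparing it with the stored value is one simple way to check that the file has not changed.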

What should you document? These are some of the important details:

  1. Context of data collection
  2. Data collection methodology
  3. Structure and organization of data files
  4. Data validation and quality assurance
  5. Data manipulations from raw data through data analysis
  6. Data confidentiality, access and use conditions

At the data level, documentation should include, but is not limited to:

  1. Variable names and descriptions
  2. Definition of codes and classification schemes
  3. Codes of, and reasons for, missing values
  4. Definitions of specialty terminology and acronyms
  5. Algorithms used to transform data
  6. File format and software used
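
One way to capture several of these items (variable names and descriptions, code definitions, and missing-value codes with their reasons) is a small codebook kept next to the data file. The sketch below is only an assumption of how that might look in Python; the variables `age` and `sex`, their codes, and the helpers `decode` and `codebook_to_csv` are invented for illustration.

```python
import csv
import io

# Hypothetical codebook: one entry per variable in the data file.
CODEBOOK = {
    "age": {
        "description": "Age of respondent in completed years",
        "codes": {},  # numeric variable, no value labels
        "missing": {-99: "Refused to answer"},
    },
    "sex": {
        "description": "Sex of respondent",
        "codes": {1: "Male", 2: "Female"},
        "missing": {-98: "Not recorded"},
    },
}


def decode(variable: str, value: int) -> str:
    """Translate a stored value into its documented meaning."""
    entry = CODEBOOK[variable]
    if value in entry["missing"]:
        return f"missing ({entry['missing'][value]})"
    # Fall back to the raw value when no label is defined.
    return entry["codes"].get(value, str(value))


def codebook_to_csv() -> str:
    """Flatten the codebook into CSV text that can be shared with the data."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["variable", "description", "value", "label", "is_missing"])
    for name, entry in CODEBOOK.items():
        for value, label in entry["codes"].items():
            writer.writerow([name, entry["description"], value, label, False])
        for value, label in entry["missing"].items():
            writer.writerow([name, entry["description"], value, label, True])
    return buffer.getvalue()


print(decode("sex", 2))
print(decode("age", -99))
print(codebook_to_csv())
```

Keeping the codebook in a plain format such as CSV means it can be opened without the software that produced the data.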

There are a variety of metadata standards, usually tied to a particular file format or discipline. A more general-purpose metadata standard is the Dublin Core, but there are others out there.
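
To give a feel for Dublin Core, the sketch below lists the 15 elements of the Dublin Core Metadata Element Set and checks a record against them. The record's values and the helper `unknown_elements` are hypothetical, and representing the record as a plain Python dictionary is a simplification chosen here, not a format the standard prescribes.

```python
# The 15 elements of the Dublin Core Metadata Element Set.
DUBLIN_CORE_ELEMENTS = {
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
}


def unknown_elements(record: dict) -> set:
    """Return the keys of a record that are not Dublin Core element names."""
    return set(record) - DUBLIN_CORE_ELEMENTS


# Made-up description of a survey dataset; not every element has to be used.
dataset_record = {
    "title": "Household survey, wave 1",
    "creator": "Example Research Group",
    "subject": "household income",
    "description": "Cleaned responses from the first survey wave.",
    "date": "2020-01-15",
    "format": "text/csv",
    "identifier": "example-survey-wave-1",
    "language": "en",
    "rights": "CC BY 4.0",
}

print(unknown_elements(dataset_record))
print(unknown_elements({"titel": "misspelled element name"}))
```

A check like this catches misspelled element names before the record is shared.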

So, let’s nurture the habit of documenting our data; apart from being useful for effective data management, it is useful if we want to make significant strides towards reproducible research.

If you love your data, clothe it!