💽 Data Management

Author: Peter Kraus
Published: May 13, 2025

Basic principles

Reproducibility is one of the core values of our lab. In practice, this means that all research outputs should be documented and tracked, along with sufficient instructions and descriptions (i.e. metadata) to allow their independent reproduction. To facilitate this, at ConCat we follow these basic data management principles:

  • Always document your work. Whether you prefer a paper notebook, a digital one, or an ELN, you should always keep a record (including dates) of the work you do. This includes:
    • Planning of experiments (why, what, where, how).
    • Observations, oddities, or changes to instruments.
    • Ideas for new projects.
  • Always archive raw data. Without storing the raw data that comes out of your instrument, you won’t be able to verify your work, and neither will anyone else. Best practice is:
    • Upload everything into datalab.
    • Keep a consistent file structure on your local PC (one possible layout is sketched after this list).
    • Attach everything to supporting information archives. Peter will check this!
  • Always back up your work. Loss of instrument data is easily avoided by keeping a copy of raw or processed data elsewhere; the documents on your local PC are often the weak link. You should use the following tools to back up:
    • Windows has a “Files backup” feature, which lets you back up your files onto any drive. You can use the network drive at the FGKW.
    • Syncthing can be used to synchronise any number of folders over the network with any number of other computers. You can use the VPS that runs our server; ask Peter for an account.
    • TUB cloud can be used to synchronise a folder with your TUB cloud account.
  • Use Jupyter notebooks for data processing. It may be tempting to do your data (post-)processing in Excel, but this is a trap: the moment you import your data into a spreadsheet, any link back to the original raw data is lost. You should learn to use Jupyter notebooks (with Python or any other language) to do your data processing reproducibly (a minimal sketch is shown below).
    • The notebook documents, in executable code, exactly what data processing you do.
    • The notebook lets you re-run your analysis at any time.
    • You should generate figures directly from your notebook (e.g. using matplotlib, seaborn, or a similar library).
    • Never post-process your figures manually!
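
One possible layout for the consistent local file structure mentioned above is sketched here. This is purely illustrative; the folder and project names are made up, and any scheme works as long as you apply it consistently:

    data/
      2025-05-13_my-calcination-study/
        raw/          (instrument output, never modified)
        processed/    (tables and figures produced by the notebooks)
        notebooks/    (the Jupyter notebooks themselves)
      2025-05-20_my-gc-test/
        ...
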
Use of Jupyter notebooks is mandatory.

If you are a PhD student in our lab, you will be required to do your data analysis in Jupyter after the first year of your project. Use of our automation tool (tomato) and data processing tools (yadg and dgpost) is strongly encouraged.

Publications from our lab have to include all raw and processed data in the supporting information archive, together with the Jupyter notebook used for the processing.
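
As a minimal sketch of what a reproducible processing step in such a notebook might look like (the file name and column labels are made up for illustration and will not match any particular instrument):

    import pandas as pd
    import matplotlib.pyplot as plt

    # Read the archived raw data; the raw file is only ever read, never modified.
    df = pd.read_csv("raw/2025-05-13_reactor_outlet.csv")

    # Generate the figure directly from the data, so it can always be regenerated.
    fig, ax = plt.subplots()
    ax.plot(df["time_on_stream"], df["conversion"], marker="o")
    ax.set_xlabel("Time on stream / h")
    ax.set_ylabel("Conversion / %")
    fig.savefig("conversion_vs_tos.png", dpi=300)

Because every step is executable code, fixing a mistake and re-running the notebook regenerates all outputs from the raw data.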

Data Management Plan

The Data Management Plan (DMP) is a document in which you think about, assess, and discuss how you will manage your research data. The key questions you should consider for your DMP are:

  • What data, and in what formats, will you generate during the project?
  • What data, and in what formats, will you receive from project partners?
  • Where will you store the data?
  • How will you store it? How is the storage backed up?
  • What tools will you use to process your data?
  • How will you publish the data?
  • How will you tell Peter what your data is and where it is?
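
As a purely illustrative sketch (your answers will differ), the answers for a typical project in our lab might look like this:

    Data generated:   raw instrument files (e.g. GC data), processed tables, figures
    Data received:    characterisation results from project partners
    Storage:          datalab entries per sample, plus a consistent folder structure on the local PC
    Backup:           network drive at the FGKW, Syncthing, or TUB cloud
    Processing tools: Jupyter notebooks, optionally with yadg and dgpost
    Publication:      raw and processed data in the supporting information archive
    Findability:      samples and raw data tracked in datalab; discussed with Peter in Review Meetings
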

TU Berlin provides a short overview of what data management is and how to develop a DMP. A dedicated tool for designing DMPs, called TUB-DMP, is also available; feel free to check it out and use it.

Your project might require a DMP.

In some cases, designing a DMP is compulsory because it is a project requirement. This usually means the DMP has to be prepared in a particular, often external, tool. Peter will inform you of the details.

In any case, you should think about data management throughout your research, and design and discuss a DMP together with Peter in a Review Meeting during your first year, after your Research Proposal is ready.

datalab

Caution

The use of datalab at ConCat is a work in progress. Ideas and suggestions are welcome.

At ConCat, we use datalab as our electronic lab notebook (ELN). You are expected to use datalab to track:

  • any starting materials, i.e. the open inventory of chemicals or samples we receive from suppliers or partners,
  • all samples you create from any starting materials or other samples,
  • any raw data files related to any sample or starting material.

This way, we will be able to track which materials are used to prepare which samples, and keep a copy of all raw data associated with a certain sample in a semi-structured way.
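
For scripted or bulk work, datalab also provides a Python API client (the datalab-api package). The sketch below is illustrative only: the instance URL and item details are made up, and the method names and signatures reflect our reading of the datalab-api documentation, so verify them against the current docs before use:

    from datalab_api import DatalabClient

    # Placeholder URL; use the address of our datalab instance instead.
    # The client reads your API key from the DATALAB_API_KEY environment variable.
    with DatalabClient("https://datalab.example.org") as client:
        # Create an entry for a freshly prepared sample...
        client.create_item(item_id="ABCDEF", item_type="samples")
        # ...and attach the raw data file produced by the instrument.
        client.upload_file(item_id="ABCDEF", filepath="raw/2025-05-13_xrd.xrdml")
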

Use the QR code functionality in datalab.

All samples and starting materials that we use must be labelled with a QR code generated by datalab.

This ensures that anyone in possession of the sample can tell what it is by scanning the QR code, improving safety. It also means that when you hand your sample in for analysis, the person receiving it will know which sample it is, avoiding confusion. In cases where the sample vial is too small for a QR code label, you may use the refcode instead (e.g. concat:ABCDEF).

Permissions and sharing.

By default, you should see all starting materials. The inventory of starting materials is not private.

By default, you will only see samples that you created, or that were shared with you directly (by adding you to the creators) or via collections.

The administrators (i.e. Peter and Matt Evans) can see all samples, starting materials, and collections.

Starting Materials

The starting materials in datalab are used to track any chemicals or consumables that are purchased from outside the lab, e.g. things supplied by Sigma-Aldrich, Carl Roth, or Crystal GmbH.

The following conventions should be followed:
  • Add a photo of the container, if possible.
  • The name must match what is written on the bottle. If it’s in German, you write the name in German!
  • The supplier should be the company manufacturing/supplying the material, not the name of the person who gave you the chemical.
  • The location should be the room where the chemical can be found, e.g. 10a or Halle or 112b / Schrank XYZ. For starting materials sent away for analysis, write where they were sent.
  • The date acquired should be either the manufacture date of the item (preferred) or the delivery date of the item (if the manufacture date is not known).
  • The chemical formula is optional. Only enter it for substances with a simple, known composition (e.g. V2O5). For mixtures, solutions, and alloys, leave it empty.
  • The chemical purity should be used to track things like “>99% pure”, “single crystal”, or “alloy”. Be consistent in your labelling across a group of similar materials.
  • The GHS hazard codes field should contain the codes from the safety data sheet.
  • Enter the Lot Number / Batch Number of the item into the description.
  • Always upload a safety data sheet (SDS or MSDS).

Attach any characterisation data (e.g. XRD, XPS, surface area, elemental analysis…) carried out on the starting material to its entry on datalab.
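
As a purely illustrative example of an entry following these conventions (all values below are made up):

    Name:             Vanadium(V) oxide, 99.6%
    Supplier:         Sigma-Aldrich
    Location:         112b / Schrank XYZ
    Date acquired:    2025-01-15 (manufacture date)
    Chemical formula: V2O5
    Chemical purity:  >99% pure
    GHS hazard codes: H302, H341, H361, H372, H411 (always copy from the actual SDS)
    Description:      Lot # MKCQ1234
    Files:            SDS (PDF), photo of the bottle, XRD of the as-received powder
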

Samples

The samples in datalab are all chemicals, samples, or materials created from starting materials or other samples, either in our lab or by one of our project partners. By definition, each sample has to link back to one or more of the above starting materials.

Create new samples when treating materials!

When you perform a sample preparation (e.g. pressing, sieving, calcining…) or a catalytic study on an existing starting material or sample, you have to create a new sample for each such experiment!

The following conventions should be followed:
  • Every sample must have the original starting material or sample as a parent in the Synthesis Information section.
  • The Procedure section should describe the preparation method or catalytic test used to create the sample.
  • The name should be descriptive, as names can be searched.
  • The date created should correspond to the date of the experiment.

Attach any catalysis testing data (raw GC data, processed data) from the experiment used to create the sample. If any post-mortem characterisation is performed, attach the characterisation results to this sample as well.
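
Again as a purely illustrative example (all values are made up; note the link to the parent starting material):

    Name:                  V2O5, calcined at 500 °C for 4 h, sieved to 100-200 um
    Date created:          2025-02-01
    Synthesis Information: parent concat:ABCDEF (the as-received V2O5)
    Procedure:             Calcined in static air at 500 °C for 4 h, then sieved to 100-200 um.
    Files:                 raw GC data and processed data from the catalytic test, post-mortem XRD
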

Collections

Collections are an easy way to organise samples and starting materials and to share them with other team members. Use collections as tags for groups of samples or starting materials belonging to projects, papers, or ideas.

Have one main collection per project.

Each sample should belong to at least one collection, identifying the overall project. This way, your collections will be easy to share with new team members working on your projects.

AiiDA

Caution

This section is a stub.

To track large-scale computational tasks, the use of a workflow manager such as AiiDA is strongly recommended.
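
As a minimal sketch of the provenance-tracking idea, based on the calcfunction pattern from the AiiDA documentation (this assumes a working AiiDA installation with a configured profile; real workflows replace the toy addition with actual calculation jobs):

    from aiida import load_profile, orm
    from aiida.engine import calcfunction

    load_profile()  # connect to the configured AiiDA profile

    @calcfunction
    def add(x, y):
        # Inputs and outputs are stored as nodes in AiiDA's provenance graph,
        # so the calculation itself becomes part of the queryable record.
        return orm.Int(x.value + y.value)

    result = add(orm.Int(2), orm.Int(3))
    print(result.value)  # 5; the inputs, the function call, and the output are now linked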