Tools and workflows for data & metadata management of complex experiments : building a foundation for reproducible & collaborative analysis in the neurosciences

Sprenger, Julia; Grün, Sonja Annemarie (Thesis advisor); Kampa, Björn Michael (Thesis advisor)

Jülich : Forschungszentrum Jülich GmbH, Zentralbibliothek, Verlag (2020)
Book, Dissertation / PhD Thesis

In: Schriften des Forschungszentrums Jülich. Reihe Schlüsseltechnologien 222
Page(s)/Article No.: 1 online resource (X, 168 pages) : illustrations, diagrams

Dissertation, RWTH Aachen University, 2020


The scientific knowledge of mankind is based on the verification of hypotheses by carrying out experiments. As the construction and conduct of an experiment become increasingly complex, more and more scientists are involved in a single project. In order to make the generated data easily accessible to all scientists and, at best, to the entire scientific community, it is essential to comprehensively document the circumstances of the data generation, as these contain essential information for later analysis and interpretation. In this thesis, I present two complex neuroscience projects and the strategies, tools, and concepts that were used to comprehensively track, process, organize, and prepare the collected data for joint analysis. First, I describe the older of the two experiments and explain in detail the generation of data and metadata and the pipeline used for aggregating metadata. A hierarchical approach based on the open source software odML for metadata organization was implemented to capture the complex meta information of this project. I evaluate the design concepts and tools used and derive a general catalogue of requirements for scientific collaboration in complex projects. Also, I identify issues and requirements that were not yet addressed by this pipeline. These were, in particular, the difficulties in i) entering manual metadata and structuring the metadata collection, ii) combining metadata with the actual data, and iii) setting up the pipeline in a modular, generic, and transparent manner. Guided by this analysis, I describe concept and tool implementations to address these identified issues. I developed a complementary tool (odMLtables) to i) facilitate the capture of metadata in a structured way and to ii) convert these easily into the hierarchical, standardized metadata format odML.
odMLtables provides an interface between the easy-to-read tabular metadata representation in the formats commonly used in laboratory environments (csv/xls) and the hierarchically organized odML format based on XML, which is designed for a comprehensive collection of complex metadata records in an easily machine-readable manner. Supplementing the coordinated capture of metadata, I contributed to and shaped the Neo toolbox for the standardized representation of electrophysiological data. This toolbox is a key component for electrophysiological data analysis, as it integrates different proprietary and non-proprietary file formats and serves as a bridge between them. I emphasize new features that simplify the process of data and metadata handling in the data acquisition workflow. I introduce the concept of workflow management into the field of scientific data processing, based on the common Python-based Snakemake package. For the second, more recent electrophysiological experiment, I designed and implemented the workflow for capturing and packaging metadata and data in a comprehensive form. Here I used the generic Neuroscience Information Exchange format (NIX) for the user-friendly packaging of data sets including data and metadata in combined form. Finally, I evaluate the improved workflow against the requirements of collaborative scientific work in complex projects. I establish general guidelines for conducting such experiments and workflows in a scientific environment. In conclusion, I present the next development steps for the presented workflow and potential avenues for deploying this prototype as a production prototype to a wider scientific community.


  • Department of Biology [160000]
  • Theoretical Systems Neurobiology Teaching and Research Area (cooperation with FZ Jülich) [163110]