
Micro-data Infrastructure (MDI) Training

The MDI Training is a three-session program designed to equip researchers (NPBs) with the skills to effectively work with cross-country firm-level data provided by the MDI system. Through hands-on sessions, participants explore key features of the MDI system, practice writing efficient research modules in R, and learn how to consolidate and visualize data from multiple countries.

The MDI Manual serves as the comprehensive guide and primary documentation for the MDI. It provides an in-depth overview of the project, detailing its purpose, functionality, and implementation process for National Statistical Institutes (NSIs). The manual offers step-by-step guidance on setting up the MDI for staff and researchers, instructions for module writers, and preparation for launches, including infrastructure-related tasks. Together, the MDI Training and Manual provide researchers and NSIs with the tools and knowledge needed to maximize the potential of the MDI system.

Accessing the Training

The training is hosted on Nuvolos, our dedicated training server. This server replicates the environment of a national statistical institute and includes the MDI infrastructure.

The Nuvolos server provides all the necessary tools, scripts, and libraries required to develop research modules. It also contains a mock dataset—a simulated version of real data that preserves the structure and functionality of actual data while excluding any sensitive details.

While the primary purpose of the server is to host this training, it can also be used beyond the training sessions to write, test, and familiarize yourself with research modules and the MDI environment.

If you’re interested in participating in the training, please complete this form. Once submitted, you’ll receive an invitation to access the server environment.

The training is divided into three sessions, each accompanied by video guidance. Here you can find all the training videos. Below is an overview of each session:

Session 1: MDI Overview

The training begins with an introduction to the MDI, which is a centralized framework for harmonizing firm-level data across countries. You learn how to use the MDI manual and explore the MD catalogue for data selection.

• MDI Workflow: A detailed breakdown of the Countdown setup (configuring directories, setting flags, and defining disclosure criteria), followed by program selection: checking metadata, setting up the MDI environment, creating metadata, or finally launching the rocket (harmonization and output generation from research modules).

• Data Selection: Using the Shiny Catalog Viewer and MD_catalogue.csv to explore datasets and country-specific variables.

• Practical Exercises: Hands-on exploration of file structures, metadata, and the MDI manual, followed by running Countdown.
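To give a feel for the data selection step, here is a minimal sketch of how a catalogue could be explored in R once it is loaded as a table. The column names (country, dataset, variable) and the country codes are assumptions made for illustration; the actual layout of MD_catalogue.csv may differ, so check the file or the Shiny Catalog Viewer first.

```r
library(data.table)

# Stand-in for the catalogue. In practice you would load the real file,
# e.g. catalogue <- fread("MD_catalogue.csv").
# Column names and values below are hypothetical.
catalogue <- data.table(
  country  = c("AA", "AA", "BB", "BB"),
  dataset  = c("business_register", "business_register",
               "business_register", "trade_survey"),
  variable = c("employment", "turnover", "employment", "export_value")
)

# For each variable, list the countries that provide it
availability <- catalogue[
  ,
  .(countries   = paste(sort(unique(country)), collapse = ", "),
    n_countries = uniqueN(country)),
  by = variable
]

# Keep only the variables that are available in every country
n_all       <- uniqueN(catalogue$country)
common_vars <- availability[n_countries == n_all, variable]

print(availability)
print(common_vars)
```

A check like this helps you decide early whether a research question can be answered with the same variables in all countries of interest.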

This session lays the foundation for using MDI as a framework for cross-country research, ensuring that you understand the ecosystem, metadata, and tools required to set up the system and run data harmonization and research modules.

Session 2: Efficient Module Writing in R

In the second session, you delve into the practicalities of writing well-structured and reproducible research modules using R. The session emphasizes the use of data.table for efficient data manipulation and explores MDI-specific tools for consistent module development.

• Module Structure: Designing modules with clear workflows for importing, processing, and exporting data.

• Basic Operations with data.table: Hands-on practice with subsetting, grouping, summarizing, and joining datasets efficiently.

• Best Practices: Using tools like mdi_aggregate() and mdi_regress() for analysis and mdi_export() for disclosure-compliant outputs. The session also highlights the importance of consistent naming conventions (e.g., snake_case).

• Hands-On Exercises: Practice with core data.table operations and a tour of the MDI package, which provides the tools for standard analysis and disclosure checks.
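The import, process, and export structure described above, together with the basic data.table operations, might look roughly like the following sketch. It uses plain data.table only: the MDI functions (mdi_aggregate(), mdi_regress(), mdi_export()) are deliberately left out because their arguments are not described here. All function names, column names, and values are hypothetical, and the mock table stands in for harmonized data.

```r
library(data.table)

# Import step: a real module would read harmonized data;
# a small mock table stands in here (hypothetical columns).
import_firms <- function() {
  data.table(
    firm_id    = 1:6,
    sector     = c("C", "C", "G", "G", "G", "J"),
    employment = c(10, 250, 5, 40, 12, 80),
    turnover   = c(1.2, 60.0, 0.4, 5.5, 1.0, 9.3)
  )
}

# Process step: subset, join, then group and summarize
summarize_by_sector <- function(firms, sector_labels, min_employment = 10) {
  # Subsetting: keep firms at or above the employment threshold
  kept <- firms[employment >= min_employment]
  # Joining: attach sector names, keeping every row of `kept`
  labelled <- sector_labels[kept, on = "sector"]
  # Grouping and summarizing: one row per sector
  labelled[
    ,
    .(n_firms = .N, mean_turnover = mean(turnover)),
    by = .(sector, sector_name)
  ]
}

# Export step: plain CSV here; in an MDI module the export would go
# through the disclosure-compliant export tool instead.
export_result <- function(result, path) {
  fwrite(result, path)
  invisible(path)
}

sector_labels <- data.table(
  sector      = c("C", "G", "J"),
  sector_name = c("Manufacturing", "Trade", "Information")
)

firms    <- import_firms()
result   <- summarize_by_sector(firms, sector_labels)
out_path <- export_result(result, tempfile(fileext = ".csv"))
print(result)
```

Keeping each step in its own snake_case function makes the module easier to read, test, and document.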

By the end of Session 2, you can confidently design and document R modules for the MDI.

Session 3: Output Consolidation and Data Visualization

The final session focuses on consolidating outputs from multiple countries and creating visual representations of research results. You learn to stack and merge output results, generate detailed summaries, and explore insights through visualizations.

• Data Consolidation: Techniques for stacking outputs from different countries with tools like rbind() and for merging them with concordance tables using dplyr::left_join().

• Summary and Regression Outputs: Generating descriptive statistics and regression tables using tools like stargazer for LaTeX and text outputs.

• Data Visualization: Using ggplot2 and custom functions (e.g., create_viz()) to create histograms, bubble charts, and other visual tools to explore variable distributions and relationships.
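The consolidation step can be sketched as follows. To keep the example free of package dependencies it uses base R: rbind() for stacking and merge() with all.x = TRUE as the left join (dplyr::left_join(stacked, concordance, by = "sector_code") would be the dplyr equivalent). The country codes, column names, and values are made up for illustration and do not come from real MDI outputs.

```r
# Hypothetical per-country outputs with identical columns
output_aa <- data.frame(
  country       = "AA",
  sector_code   = c("C", "G"),
  mean_turnover = c(30.6, 3.2)
)
output_bb <- data.frame(
  country       = "BB",
  sector_code   = c("C", "J"),
  mean_turnover = c(28.1, 9.3)
)

# Stack the country outputs into one long table
stacked <- rbind(output_aa, output_bb)

# Hypothetical concordance table mapping sector codes to broader groups
concordance <- data.frame(
  sector_code  = c("C", "G"),
  sector_group = c("Manufacturing", "Trade")
)

# Left join: keep every stacked row, even if the concordance has no match
consolidated <- merge(stacked, concordance, by = "sector_code", all.x = TRUE)

# Rows without a match are worth inspecting before any further analysis
unmatched <- consolidated[is.na(consolidated$sector_group), ]

print(consolidated)
print(unmatched)
```

Checking the unmatched rows right after the join is a cheap way to catch codes that are missing from the concordance table.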

By the end of this session, you are equipped to consolidate multi-country research outputs from the MDI, generate summaries, and communicate findings effectively through visualizations.
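As an illustration of the visualization step, the sketch below draws a histogram per country with ggplot2. The helper plot_distribution() is a hypothetical stand-in written for this example; it is not the create_viz() function used in the training, whose interface is not described here. The data are made up.

```r
library(ggplot2)

# Hypothetical consolidated output: one row per country and sector
consolidated <- data.frame(
  country       = rep(c("AA", "BB"), each = 5),
  mean_turnover = c(1.2, 3.4, 2.8, 5.1, 4.0, 2.2, 2.9, 6.3, 1.7, 3.8)
)

# Hypothetical helper: histogram of one variable, one panel per country
plot_distribution <- function(data, variable, bins = 10) {
  ggplot(data, aes(x = .data[[variable]])) +
    geom_histogram(bins = bins) +
    facet_wrap(~ country) +
    labs(x = variable, y = "Count")
}

p <- plot_distribution(consolidated, "mean_turnover", bins = 5)
# print(p) renders the plot; ggsave("distribution.png", p) would save it
```

Passing the variable name as a string keeps the helper reusable across the different output variables you want to compare.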

MDI Training Outcomes

Upon completing the MDI Training, you will:

1. Understand the MDI framework, its stakeholders, and workflows for harmonizing firm-level data and running research modules across countries.

2. Develop well-structured research modules using R and MDI tools.

3. Consolidate and visualize outputs to explore cross-country insights.

This training prepares researchers to conduct cross-country analyses using the MDI while learning about NSI disclosure criteria.