DALG: The Data Aware Event Log Generator

Motivation

While working on my master's thesis, I was tasked with researching methods for mitigating data scarcity in process management research. During this research, I developed an approach that can generate fairly realistic synthetic process data, specifically event logs, with a focus on a semantically correct data perspective. To evaluate the approach, I wanted to implement it as software. Implementations that result from student theses rarely progress beyond the state of a research prototype. I did not want my implementation to suffer the same fate, so I decided to build not just a research prototype but a fully fledged standalone software solution for generating synthetic process data.

The Software

DALG: The Data Aware Event Log Generator is a standalone software application for generating synthetic event logs with a focus on realistic data perspectives. DALG is open source and available for Windows and Linux at github.com/DavidJilg/DALG. I will only describe the software briefly in this article, since my research colleagues and I have already published several papers on the approach and the software. All publications associated with DALG are listed in the Publications section below.

Process data typically consists of a series of events that occurred as part of a process. Additionally, it can contain data that is generated by an event or that can affect the occurrence of events. This part of a process dataset is usually called the data perspective, while the series of events is called the control flow perspective. Many tools can generate synthetic process data. However, most of them only generate synthetic control flow perspectives, and those that can also generate a data perspective cannot produce realistic or even semantically correct data.
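To make the two perspectives more concrete, here is a minimal sketch of a tiny event log. The `Event` class, the healthcare activities, and the attribute names are hypothetical and chosen purely for illustration; this is not DALG's internal data model. The ordered activity sequence of each case forms the control flow perspective, while the attributes attached to the events form the data perspective.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class Event:
    """Hypothetical event record: one activity that occurred in one case."""
    case_id: str
    activity: str
    timestamp: datetime
    # Data attributes written by the event (the data perspective)
    data: dict = field(default_factory=dict)


# A tiny hypothetical event log containing a single treatment case
log = [
    Event("case-1", "Admission", datetime(2022, 3, 1, 8, 0), {"age": 67, "sex": "female"}),
    Event("case-1", "Blood Test", datetime(2022, 3, 1, 9, 30), {"hemoglobin_g_dl": 11.2}),
    Event("case-1", "Discharge", datetime(2022, 3, 4, 12, 0)),
]


def control_flow(events):
    """Return the ordered activity sequence (trace) of each case."""
    traces = {}
    for e in sorted(events, key=lambda e: e.timestamp):
        traces.setdefault(e.case_id, []).append(e.activity)
    return traces


def data_perspective(events):
    """Return (case, activity, attributes) for every event that carries data."""
    return [(e.case_id, e.activity, e.data) for e in events if e.data]


print(control_flow(log))      # {'case-1': ['Admission', 'Blood Test', 'Discharge']}
print(data_perspective(log))
```

A generator that only produces the control flow perspective would output traces like the one returned by `control_flow`, but without any meaningful values in `data`.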

DALG innovates in this field by allowing users to generate synthetic process data with a fairly realistic data perspective. It achieves this by letting the user specify a wide range of semantic information about the process, so that the tool knows what kind of data each event should generate.
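The general idea of semantically guided data generation can be illustrated with a minimal sketch. It assumes a hypothetical `VariableSpec` format in which each activity lists the attributes it writes, together with their type and plausible value range or allowed values. This is neither DALG's configuration format nor the SAMPLE algorithm; it only shows how semantic information can constrain which values an event is allowed to produce.

```python
import random
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class VariableSpec:
    """Hypothetical semantic description of one data attribute."""
    name: str
    kind: str                                   # "int", "float" or "categorical"
    low: float = 0.0                            # lower bound for numeric kinds
    high: float = 1.0                           # upper bound for numeric kinds
    values: Sequence[str] = ()                  # allowed values for categorical kinds
    weights: Optional[Sequence[float]] = None   # relative frequencies of `values`


# Hypothetical semantic information: which attributes each activity writes
SEMANTICS = {
    "Admission": [
        VariableSpec("age", "int", low=18, high=95),
        VariableSpec("sex", "categorical", values=("female", "male"), weights=(0.5, 0.5)),
    ],
    "Blood Test": [
        VariableSpec("hemoglobin_g_dl", "float", low=7.0, high=18.0),
    ],
}


def generate_value(spec, rng):
    """Draw one value that respects the attribute's type and bounds."""
    if spec.kind == "int":
        return rng.randint(int(spec.low), int(spec.high))
    if spec.kind == "float":
        return round(rng.uniform(spec.low, spec.high), 1)
    if spec.kind == "categorical":
        return rng.choices(spec.values, weights=spec.weights, k=1)[0]
    raise ValueError(f"unknown kind: {spec.kind}")


def generate_event_data(activity, rng):
    """Generate values only for the attributes the activity is specified to write."""
    return {spec.name: generate_value(spec, rng) for spec in SEMANTICS.get(activity, [])}


rng = random.Random(42)  # fixed seed for reproducible output
for activity in ("Admission", "Blood Test", "Discharge"):
    print(activity, generate_event_data(activity, rng))
```

Because every value is drawn from its declared range or value set, an admitted patient never receives, for example, a negative age or a hemoglobin value far outside a plausible range. Activities without a specification, such as `Discharge` here, produce no data at all.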

Publications

The following three publications are associated with DALG.

Master's Thesis

Both DALG and its approach originate from my master's thesis, which you can find on ResearchGate.

Jilg, David. (2022). Generating Synthetic Procedural Multi-Perspective Electronic Healthcare Treatment Cases.

SAMPLE Approach

The following paper describes SAMPLE (semantic approach for multi-perspective event log generation), the approach DALG uses to generate synthetic process data. It does not go into much detail about DALG itself.

Grüger, Joscha & Geyer, Tobias & Jilg, David. (2022). SAMPLE: A semantic approach for multi-perspective event log generation.

DALG

The following paper focuses specifically on DALG and highlights its key features and innovations.

Jilg, David & Grüger, Joscha & Geyer, Tobias. (2023). DALG: The Data Aware Event Log Generator.