Biography
I am a doctoral researcher at INDElab, University of Amsterdam, advised by Dr. Victoria Degeler and Prof. Dr. Paul Groth.
My research focuses on interpretable decision-making through Neurosymbolic knowledge discovery and inference, applied to Digital Twins and tabular datasets.
I hold an MSc in Computer Science from TU Munich and a BSc in Computer Engineering from Yildiz Technical University, Istanbul. Previously, I worked as a research assistant, software engineer, and consultant.
I am open to academic and non-academic collaborations -- feel free to reach out: e.karabulut@uva.nl
Recent Activities
- December 2025 - On the 8th of December, I will be giving a talk at the VU Machine Learning seminar on "Scalable Knowledge Discovery with Neurosymbolic Rule Learning."
- November 2025 - The PyAerial documentation is now much more comprehensive and includes many examples. If you are interested in interpretable machine learning or knowledge discovery, check out PyAerial!
- November 2025 - I gave a seminar on Scalable Knowledge Discovery with Neurosymbolic Rule Learning at the Data Science Center of the University of Amsterdam, and wrote a blog post derived from the seminar content.
- November 2025 - I will be starting a three-month research visit at Translational AI Lab - Amsterdam UMC this November, applying Neurosymbolic knowledge discovery to medical datasets.
- October 2025 - On Sunday, October 26th, I will be presenting our work on Discovering Association Rules in High-Dimensional Small Tabular Data at the ANSyA workshop of the ECAI 2025 conference. See our preprint on arXiv.
Publications
- E. Karabulut, D. Daza, P. Groth, and V. Degeler. "Discovering Association Rules in High-Dimensional Small Tabular Data". In ANSyA'25: 1st International Workshop on Advanced Neuro-Symbolic Applications, co-located with the 28th European Conference on Artificial Intelligence (ECAI 2025).
Association Rule Mining (ARM) aims to discover patterns between features in datasets in the form of propositional rules, supporting both knowledge discovery and interpretable machine learning in high-stakes decision-making. However, in high-dimensional settings, rule explosion and computational overhead render popular algorithmic approaches impractical without effective search space reduction, challenges that propagate to downstream tasks. Neurosymbolic methods, such as Aerial+, have recently been proposed to address the rule explosion in ARM. While they tackle the high dimensionality of the data, they also inherit limitations of neural networks, particularly reduced performance in low-data regimes. This paper makes three key contributions to association rule discovery in high-dimensional tabular data. First, we empirically show that Aerial+ scales one to two orders of magnitude better than state-of-the-art algorithmic and neurosymbolic baselines across five real-world datasets. Second, we introduce the novel problem of ARM in high-dimensional, low-data settings, such as gene expression data from the biomedicine domain with around 18k features and 50 samples. Third, we propose two fine-tuning approaches to Aerial+ using tabular foundation models. Our proposed approaches are shown to significantly improve rule quality on five real-world datasets, demonstrating their effectiveness in low-data, high-dimensional scenarios.
- E. Karabulut, P. Groth, and V. Degeler. "PyAerial: Scalable association rule mining from tabular data". SoftwareX, 31:102341, 2025. ISSN 2352-7110.
Association Rule Mining (ARM) is a knowledge discovery technique that identifies frequent patterns as logical implications within transaction datasets and has been applied across domains such as e-commerce, healthcare, and cyber–physical systems. However, many state-of-the-art ARM methods, typically algorithmic or nature-inspired, suffer from rule explosion and long execution times. Aerial is a novel neurosymbolic ARM algorithm for tabular datasets that mitigates rule explosion using neural networks, while remaining compatible with existing approaches. Aerial transforms tables into transactions, uses an autoencoder to learn compact neural representations, and extracts logical rules from the neural representations. This paper presents PyAerial, a Python library that makes Aerial accessible and easy to use on generic tabular datasets for end users in a domain-independent way. Besides association rules, PyAerial can also be used to extract frequent itemsets, learn classification rules, apply item constraints to learn rules over the features of interest rather than all features, pre-discretize numerical data for ARM, and can be run on a GPU.
- E. Karabulut, P. Groth, and V. Degeler. "Neurosymbolic association rule mining from tabular data". In Proceedings of the 19th International Conference on Neurosymbolic Learning and Reasoning, volume 284 of Proceedings of Machine Learning Research, PMLR, 2025, pp. 565–588.
Association Rule Mining (ARM) is the task of mining patterns among data features in the form of logical rules, with applications across a myriad of domains. However, high-dimensional datasets often result in an excessive number of rules, increasing execution time and negatively impacting downstream task performance. Managing this rule explosion remains a central challenge in ARM research. To address this, we introduce Aerial+, a novel neurosymbolic ARM method. Aerial+ leverages an under-complete autoencoder to create a neural representation of the data, capturing associations between features. It extracts rules from this neural representation by exploiting the model's reconstruction mechanism. Extensive evaluations on five datasets against seven baselines demonstrate that Aerial+ achieves state-of-the-art results by learning more concise, high-quality rule sets with full data coverage. When integrated into rule-based interpretable machine learning models, Aerial+ significantly reduces execution time while maintaining or improving accuracy.
- E. Karabulut, P. Groth, and V. Degeler. "Learning Semantic Association Rules from Internet of Things Data". Neurosymbolic Artificial Intelligence, 2025:1. doi:10.1177/29498732251377518.
Association rule mining (ARM) is the task of discovering commonalities in data in the form of logical implications. ARM is used in the Internet of Things (IoT) for different tasks, including monitoring and decision-making. However, existing methods give limited consideration to IoT-specific requirements such as heterogeneity and volume. Furthermore, they do not utilize important static domain-specific description data about IoT systems, which is increasingly represented as knowledge graphs. In this paper, we propose a novel ARM pipeline for IoT data that utilizes both dynamic sensor data and static IoT system metadata. Furthermore, we propose an autoencoder-based neurosymbolic ARM method (Aerial) as part of the pipeline to address the high volume of IoT data and reduce the total number of rules that are resource-intensive to process. Aerial learns a neural representation of a given dataset and extracts association rules from this representation by exploiting the reconstruction (decoding) mechanism of an autoencoder. Extensive evaluations on three IoT datasets from two domains show that ARM on both static and dynamic IoT data results in more generically applicable rules while Aerial can learn a more concise set of high-quality association rules than the state-of-the-art, with full coverage over the datasets.
- E. Karabulut, P. Groth, and V. Degeler. "3K: Knowledge-Enriched Digital Twin Framework". In LongevIoT'24: 1st International Workshop on Longevity in IoT Systems, co-located with the 14th International Conference on Internet of Things, November 19–22, 2024, Oulu, Finland.
Digital Twins (DTs) are the digital equivalent of physical entities that facilitate, among others, monitoring and decision-making, thus helping extend the longevity of the twinned entity. DTs with automated decision-making capabilities require explainable inference mechanisms, especially for critical infrastructures such as water networks. Here we introduce 3K, a DT framework that aims for knowledge-enriched inference that is explainable and fast, by synthesizing knowledge representation (semantics) and knowledge discovery methods. 3K constructs a knowledge graph, which is becoming a mainstream way of metadata storage in DTs, and proposes a new method that can run on both sensor data and knowledge graphs to learn semantic association rules. The rules represent the expected working conditions of the DT and we argue that when combined with domain knowledge in the form of ontological axioms, semantic association rules can help perform downstream tasks in DTs, including extending the longevity of the twinned entities such as an Internet of Things (IoT) system. Furthermore, we demonstrate the 3K framework in a water distribution network use case and show how it can be used for downstream tasks.

