Abstract:
This paper presents a comparison of three popular open-source MLOps frameworks: MLflow, Metaflow, and ZenML, studied in three real-world machine learning scenarios: extractive text summarization using a BERT-based model, image analysis using ResNet, and tabular data classification using Random Forest. The comparison was carried out by developing MLOps-enhanced versions of the baseline code for each of the three models using each studied framework. Of the three frameworks, MLflow is notable for its low integration overhead: less than 1.2% additional runtime and fewer than 104 lines of additional code. ZenML requires about 208 additional lines and increases execution time by about 19.6%, but significantly improves traceability in exchange. Metaflow provides strong automatic artifact versioning, which adds approximately 195 lines of code and increases runtime by about 110.7%. Despite these variations, reproducibility was confirmed: all platforms maintained consistent model performance under the same conditions, within a margin of 0.1% (Table IV). Disk usage increased by about 220.4 MB for MLflow, 220 MB for ZenML, and 143.4 MB for Metaflow. These findings indicate that Metaflow provides thorough provenance at the cost of additional code and runtime overhead, ZenML strikes a reasonable balance between control and usability, and MLflow is best suited for fast, low-overhead experiment tracking.