Why WFCommons

Workflow research is evolving: heterogeneous nodes (CPU/GPU), data-intensive AI pipelines, and autonomous “agentic” control loops demand datasets and benchmarks that reflect reality.

HPC + data at scale

Study scheduling, throughput, and robustness for complex workflows with thousands of tasks and diverse I/O footprints—across clusters, clouds, and supercomputers.

AI-ready workflow data

Build training and evaluation corpora for anomaly detection, runtime prediction, and resource-aware optimization by working from a consistent, validated schema.

Agentic workflows & autonomy

Test LLM/agent planners that adapt DAGs on the fly (replanning, retries, provenance-aware decisions) using realistic workflow “digital twins” derived from production traces.

Ecosystem components

WFCommons is organized as interoperable building blocks—each useful on its own, stronger together.


WfFormat — a common schema

A JSON specification for representing workflow execution instances and synthetic workflows, enabling tools and simulators to consume data consistently.

WfFormat schema
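
As a rough illustration of what a shared JSON representation makes possible, the sketch below parses a small workflow document and derives a valid execution order from its dependencies. The field names (`tasks`, `name`, `parents`, `runtime_s`) are simplified placeholders chosen for this example and are not taken from the WfFormat schema; consult the schema itself for the actual structure.

```python
import json
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical, simplified workflow document (NOT the real WfFormat layout).
DOC = """
{
  "name": "toy-workflow",
  "tasks": [
    {"name": "split",   "parents": [],                     "runtime_s": 2.0},
    {"name": "align_1", "parents": ["split"],              "runtime_s": 10.5},
    {"name": "align_2", "parents": ["split"],              "runtime_s": 9.0},
    {"name": "merge",   "parents": ["align_1", "align_2"], "runtime_s": 3.5}
  ]
}
"""


def load_dag(text):
    """Return {task name: set of parent names} parsed from the JSON text."""
    doc = json.loads(text)
    return {task["name"]: set(task["parents"]) for task in doc["tasks"]}


def execution_order(dag):
    """Order tasks so that every task appears after all of its parents."""
    return list(TopologicalSorter(dag).static_order())


dag = load_dag(DOC)
order = execution_order(dag)
print(order)
```

Because every tool reads the same fields, the dependency graph built here could feed a scheduler, a simulator, or an analysis script without per-tool parsing code.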

WfInstances — open workflow executions

A curated collection of real workflow runs (instances), shared in WfFormat so they can be analyzed, simulated, or used to derive generators and benchmarks.

WfInstances browser, WfInstances repo

WfChef — derive recipes automatically

Analyze real instances to discover recurring dependency patterns and statistical distributions, producing “recipes” that capture workflow structure and behavior.
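
To make the "statistical distributions" part of this idea concrete, the sketch below pools observed task runtimes by task type and summarizes each group. It is a deliberately simplified stand-in and not WfChef's algorithm, which also discovers structural patterns; the task types, values, and the `summarize` helper are hypothetical.

```python
from collections import defaultdict
from statistics import mean, pstdev

# Hypothetical observations: (task type, runtime in seconds) pooled from several runs.
OBSERVATIONS = [
    ("align", 10.0), ("align", 12.0), ("align", 14.0),
    ("merge", 3.0), ("merge", 5.0),
]


def summarize(observations):
    """Group runtimes by task type and compute simple summary statistics."""
    groups = defaultdict(list)
    for task_type, runtime in observations:
        groups[task_type].append(runtime)
    return {
        task_type: {
            "count": len(values),
            "mean": mean(values),
            "stdev": pstdev(values),  # population standard deviation
        }
        for task_type, values in groups.items()
    }


recipe = summarize(OBSERVATIONS)
print(recipe)
```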

WfGen / WfBench / WfSim — generate & evaluate

Generate synthetic workflows from recipes, produce benchmark specs for repeatable experiments, and run simulations in compatible frameworks.
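
The generation step can be pictured with a minimal sketch: given per-type runtime parameters, build a fork-join workflow of a requested width and sample a runtime for each task. This does not use or reproduce the WFCommons generator API; the recipe layout, the `generate` function, and the fork-join shape are assumptions made for illustration. A fixed seed keeps the output repeatable, which is the property benchmark specs rely on.

```python
import random

# Hypothetical recipe: per-type runtime parameters (illustrative values only).
RECIPE = {
    "split": {"mean": 2.0, "stdev": 0.2},
    "align": {"mean": 12.0, "stdev": 1.6},
    "merge": {"mean": 4.0, "stdev": 1.0},
}


def generate(recipe, width, seed):
    """Build a fork-join workflow: one split, `width` align tasks, one merge."""
    rng = random.Random(seed)  # private generator, so results depend only on the seed

    def sample(task_type):
        params = recipe[task_type]
        # Clamp to a small positive value so that no sampled runtime is negative.
        return max(0.01, rng.gauss(params["mean"], params["stdev"]))

    tasks = [{"name": "split", "type": "split", "parents": [],
              "runtime_s": sample("split")}]
    workers = [f"align_{i}" for i in range(width)]
    for name in workers:
        tasks.append({"name": name, "type": "align", "parents": ["split"],
                      "runtime_s": sample("align")})
    tasks.append({"name": "merge", "type": "merge", "parents": workers,
                  "runtime_s": sample("merge")})
    return tasks


workflow = generate(RECIPE, width=4, seed=42)
print(len(workflow), "tasks")
```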

Cite WFCommons

Use these references in papers, reports, and benchmark artifacts.

Primary paper

Future Generation Computer Systems (FGCS), 2022.

@article{WFCommons,
  title   = {WFCommons: A Framework for Enabling Scientific Workflow Research and Development},
  author  = {Coleman, Taina and Casanova, Henri and Pottier, Loic and Kaushik, Manav and Deelman, Ewa and Ferreira da Silva, Rafael},
  journal = {Future Generation Computer Systems},
  volume  = {128},
  pages   = {16--27},
  year    = {2022},
  doi     = {10.1016/j.future.2021.09.043}
}


Docs & repositories

These are the canonical entry points for users and reviewers.

Documentation, WFCommons (Python package), WfInstances (datasets), WfFormat (JSON schema)

Contact: support@WFCommons.org

Research Outcomes Enabled by WFCommons

WFCommons has supported the work reported in 54 research articles, produced both by our own team and by other researchers in the workflows community.

Y. Semenov, O. Sukhoroslov, Bi-objective Workflow Scheduling in the Cloud: What is the Real State-of-the-Art?, Supercomputing, RuSCDays 2024, 2025

F. Lehmann, J. Bader, F. Tschirpke, N. de Mecquenem, A. Loser, S. Becker, K. E. Lewinska, L. Thamsen, U. Leser, WOW: Workflow-Aware Data Movement and Task Scheduling for Dynamic Scientific Workflows, 25th IEEE International Symposium on Cluster, Cloud and Internet Computing (CCGrid), 2025

M. Wilhelm, T. Pionteck, Static task mapping for heterogeneous systems based on series-parallel decompositions, 34th Heterogeneity in Computing Workshop (HCW 2025), 2025

L. C. R. Alvarenga, Y. Frota, D. de Oliveira, R. Coutinho, Optimizing Resource Estimation for Scientific Workflows in HPC Environments: A Layered-Bucket Heuristic Approach, Concurrency and Computation: Practice and Experience, 2025

S. Kulagina, A. Benoit, H. Meyerhenke, Memory-aware Adaptive Scheduling of Scientific Workflows on Heterogeneous Architectures, 25th IEEE International Symposium on Cluster, Cloud and Internet Computing (CCGrid), 2025

S. Wang, H. Zhang, T. Wu, Y. Zhang, W. E. Zhang, Q. Z. Sheng, Electricity Cost Minimization for Multi-Workflow Allocation in Geo-Distributed Data Centers, IEEE Transactions on Services Computing, 2025

J. Coleman, R. V. Agrawal, E. Hirani, B. Krishnamachari, Evaluating the Impact of Algorithmic Components on Task Graph Scheduling, 28th Workshop on Job Scheduling Strategies for Parallel Processing (JSSPP 2025), 2025

J. Coleman, B. Krishnamachari, PISA: An Adversarial Approach To Comparing Task Graph Scheduling Algorithms, 39th IEEE International Parallel & Distributed Processing Symposium (IPDPS 2025), 2025

M. Kroczek, J. Zawalska, K. Rycerz, Workflow decomposition algorithm for scheduling with quantum annealer-based hybrid solver, 2025

P. Barredo, J. Puente, Energy-aware cooperative multi-fitness evolutionary algorithm for workflow scheduling in cloud computing, Natural Computing, 2025

Q. Gao, L. Han, S. Hunold, Y. Robert, F. Vivien, Coping with Silent Errors for Workflows of Moldable Tasks, 2025

D. Schweisgut, A. Benoit, Y. Robert, H. Meyerhenke, Carbon-Aware Workflow Scheduling with Fixed Mapping and Deadline Constraint, 2025

H. Lee, J. Firoz, N. R. Tallent, L. Guo, M. Halappanavar, FlowForecaster: Automatically Inferring Detailed & Interpretable Workflow Scaling Models for Forecasts, 2025 IEEE International Parallel and Distributed Processing Symposium (IPDPS), 2025

M. Wang, S. Jahangiri, K. Maheshwari, C. Li, Bundled for Success: A WfBench-Driven Assessment of Texera, 54th International Conference on Parallel Processing Companion (ICPP Companion '25), 2025

J. McDonald, Y. C. Wong, K. Mehta, F. Suter, R. Ferreira da Silva, L. Pottier, E. Deelman, H. Casanova, Determining Levels of Detail for Simulators of Parallel and Distributed Computing Systems via Automated Calibration, Proceedings of the SC'25 Workshops of the International Conference for High Performance Computing, Networking, Storage and Analysis, 2025

J. Chamorro, G. Twigg-Ho, J. Coleman, T. Coleman, B. Krishnamachari, M. Khodabandehlou, Adapting Classic Scheduling Heuristics for Online Execution under Uncertainty, Proceedings of the SC'25 Workshops of the International Conference for High Performance Computing, Networking, Storage and Analysis, 2025

M. Fan, L. Ye, X. Zuo, X. Zhao, A bidirectional workflow scheduling approach with feedback mechanism in clouds, Expert Systems with Applications, 2024

K. Karmakar, A. Tarafdar, R. K. Das, S. Khatua, Cost-efficient Workflow as a Service using Containers, Journal of Grid Computing, 2024

P. Barredo, J. Puente, Cooperative Multi-fitness Evolutionary Algorithm for Scientific Workflows Scheduling, International Work-Conference on the Interplay Between Natural and Artificial Computation, 2024

J. McDonald, J. Dobbs, Y. C. Wong, R. Ferreira da Silva, H. Casanova, An exploration of online-simulation-driven portfolio scheduling in workflow management systems, Future Generation Computer Systems, 2024

S. Kulagina, H. Meyerhenke, A. Benoit, Mapping Large Memory-constrained Workflows onto Heterogeneous Platforms, 53rd International Conference on Parallel Processing (ICPP '24), 2024

J. R. Coleman, Dispersed Computing in Dynamic Environments, PhD Thesis, 2024

Y. Su, V. Anand, J. Yu, J. Tan, A. Wierman, Learning-Augmented Energy-Aware List Scheduling for Precedence-Constrained Tasks, ACM Transactions on Modeling and Performance Evaluation of Computing Systems, 2024

B. Lin, C. Lin, X. Chen, M. Lin, G. Huang, Z. Xu, Cost-Driven Scheduling for Workflow Decision Making Systems in Fuzzy Edge-Cloud Environments, IEEE Transactions on Automation Science and Engineering, 2024

M. Fan, X. Zhao, X. Zuo, L. Ye, A Budget-Constrained Workflow Scheduling Approach With Priority Adjustment and Critical Task Optimizing in Clouds, IEEE Transactions on Automation Science and Engineering, 2024

J. Shin, D. Arroyo, A. Tantawi, C. Wang, A. Youssef, R. Nagi, Cloud-native Workflow Scheduling using a Hybrid Priority Rule, Dynamic Resource Allocation, and Dynamic Task Partition, ACM Symposium on Cloud Computing (SoCC'24), 2024

L. F. D. Versluis, Reproducible Performance Analysis & Engineering of Large-Scale IT Infrastructures, 2024

A. A. Da Silva, R. P. Hong Enriquez, G. Rattihalli, V. Thurimella, R. Ferreira da Silva, D. Milojicic, Enabling HPC Scientific Workflows for Serverless, 2024

E. Saeedizade, M. Ashtiani, Scientific workflow scheduling algorithms in cloud environments: a comprehensive taxonomy, survey, and future directions, 2024

B. Qin, Q. Lei, X. Wang, DGCQN: a RL and GCN combined method for DAG scheduling in edge computing, The Journal of Supercomputing, 2024

K. Alam, B. Roy, A. Serebrenik, Reusability Challenges of Scientific Workflows: A Case Study for Galaxy, arXiv preprint, 2023

L. Yang, L. Ye, Y. Xia, Y. Zhan, Look-ahead workflow scheduling with width changing trend in clouds, Future Generation Computer Systems, 2023

T. Coleman, H. Casanova, R. Ferreira da Silva, Automated generation of scientific workflow generators with WfChef, Future Generation Computer Systems, 2023

T. Coleman, Scientific Workflow Generation and Benchmarking, PhD Thesis, 2023

J. Zhang, X. Li, L. Chen, R. Ruiz, Scheduling Workflows with Limited Budget to Cloud Server and Serverless Resources, IEEE Transactions on Services Computing, 2023

Q. Zhang, Q. Wu, M. Zhou, J. Wen, S. Yao, A Communication Contention-Cognizant Scheduling Approach for Workflow Execution Across Public and Private Clouds, IEEE Transactions on Automation Science and Engineering, 2023

O. Sukhoroslov, Scheduling of Workflows with Task Resource Requirements in Cluster Environments, International Conference on Parallel Computing Technologies, 2023

P. Barredo, J. Puente, Precise makespan optimization via hybrid genetic algorithm for scientific workflow scheduling problem, Natural Computing, 2023

H. Casanova, K. Berney, S. Chastel, R. Ferreira da Silva, WfCommons: Data Collection and Runtime Experiments using Multiple Workflow Systems, 2023 IEEE 47th Annual Computers, Software, and Applications Conference (COMPSAC), 2023

O. Sukhoroslov, M. Gorokhovskii, Benchmarking DAG Scheduling Algorithms on Scientific Workflow Instances, Russian Supercomputing Days, 2023

A. Benoit, L. Perotin, Y. Robert, H. Sun, Checkpointing Workflows à la Young/Daly Is Not Good Enough, ACM Transactions on Parallel Computing, 2022

Z. Li, Y. Liu, L. Guo, Q. Chen, J. Cheng, W. Zheng, M. Guo, FaaSFlow: enable efficient workflow execution for function-as-a-service, 27th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2022

L. Xing, M. Zhang, H. Li, M. Gong, J. Yang, K. Wang, Local search driven periodic scheduling for workflows with random task runtime in clouds, Computers & Industrial Engineering, 2022

H. Casanova, Y. C. Wong, L. Pottier, R. Ferreira da Silva, On the Feasibility of Simulation-driven Portfolio Scheduling for Cyberinfrastructure Runtime Systems, Workshop on Job Scheduling Strategies for Parallel Processing (JSSPP), 2022

Z. Zhang, Q-S. Hua, X. Zhang, H. Jin, X. Liao, DAG Scheduling with Communication Delays Based on Graph Convolutional Neural Network, Wireless Communications and Mobile Computing, 2022

T. Coleman, H. Casanova, L. Pottier, M. Kaushik, E. Deelman, R. Ferreira da Silva, WFCommons: A Framework for Enabling Scientific Workflow Research and Development, Future Generation Computer Systems, 2022

M. Kiamari, B. Krishnamachari, GCNScheduler: Scheduling Distributed Computing Applications using Graph Convolutional Networks, GNNet '22: Proceedings of the 1st International Workshop on Graph Neural Networking, 2022

P. Barredo, J. Puente, Robust Makespan Optimization via Genetic Algorithms on the Scientific Workflow Scheduling Problem, International Work-Conference on the Interplay Between Natural and Artificial Computation, 2022

W. Koch, An Approach for Automating the Calibration of Simulations of Parallel and Distributed Computing Systems, PhD Thesis, 2021

T. Coleman, H. Casanova, T. Gwartney, R. Ferreira da Silva, Evaluating Energy-Aware Scheduling Algorithms for I/O-Intensive Scientific Workflows, International Conference on Computational Science (ICCS), 2021

E. Saeedizade, M. Ashtiani, DDBWS: a dynamic deadline and budget-aware workflow scheduling algorithm in workflow-as-a-service environments, The Journal of Supercomputing, 2021

M. Orr, O. Sinnen, Optimal task scheduling for partially heterogeneous systems, Parallel Computing, 2021

S. Tuli, G. Casale, N. R. Jennings, MCDS: AI Augmented Workflow Scheduling in Mobile Edge Cloud Computing Systems, IEEE Transactions on Parallel and Distributed Systems, 2021

T. Coleman, H. Casanova, R. Ferreira da Silva, WfChef: Automated Generation of Accurate Scientific Workflow Generators, 2021 IEEE 17th International Conference on eScience (eScience), 2021