Build, test, and compare workflow systems on realistic data

WfCommons is an open-source ecosystem of workflow execution instances, synthetic workflow generators, and benchmark specifications. It helps the community study scheduling, performance, resilience, and emerging AI-driven workflow automation on modern distributed and HPC platforms.

   
Real instances: Workflow executions curated in a common JSON format (WfFormat).
Synthetic realism: Generate realistic workflows from real traces.
Benchmarks: Produce executable specs for repeatable experiments and fair comparisons.

Why WfCommons

Workflow research is evolving: heterogeneous nodes (CPU/GPU), data-intensive AI pipelines, and autonomous “agentic” control loops demand datasets and benchmarks that reflect reality.

HPC + data at scale

Study scheduling, throughput, and robustness for complex workflows with thousands of tasks and diverse I/O footprints—across clusters, clouds, and supercomputers.

AI-ready workflow data

Build training and evaluation corpora for anomaly detection, runtime prediction, and resource-aware optimization by working from a consistent, validated schema.

Agentic workflows & autonomy

Test LLM/agent planners that adapt DAGs on the fly (replanning, retries, provenance-aware decisions) using realistic workflow “digital twins” derived from production traces.

Ecosystem components

WfCommons is organized as interoperable building blocks, each useful on its own and stronger together.


WfFormat — a common schema

A JSON specification for representing workflow execution instances and synthetic workflows, enabling tools and simulators to consume data consistently.
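
To make the idea of a shared schema concrete, here is a minimal sketch of how a tool might consume a workflow instance stored as JSON. The layout used here (`tasks`, `id`, `parents`, `runtimeInSeconds`) is a simplified, hypothetical stand-in chosen for illustration; the actual WfFormat schema defines more fields and may name them differently, so consult the specification before relying on any of these keys.

```python
import json

# Hypothetical, simplified instance. The real WfFormat schema is richer
# and its field names may differ from the ones assumed here.
doc = json.loads("""
{
  "name": "toy-workflow",
  "tasks": [
    {"id": "split",   "parents": [],                     "runtimeInSeconds": 4.0},
    {"id": "align_1", "parents": ["split"],              "runtimeInSeconds": 12.5},
    {"id": "align_2", "parents": ["split"],              "runtimeInSeconds": 11.0},
    {"id": "merge",   "parents": ["align_1", "align_2"], "runtimeInSeconds": 3.5}
  ]
}
""")


def critical_path_seconds(tasks):
    """Return the longest runtime-weighted path through the task DAG."""
    by_id = {t["id"]: t for t in tasks}
    memo = {}

    def finish(task_id):
        # Earliest finish time = latest parent finish + own runtime.
        if task_id not in memo:
            task = by_id[task_id]
            start = max((finish(p) for p in task["parents"]), default=0.0)
            memo[task_id] = start + task["runtimeInSeconds"]
        return memo[task_id]

    return max(finish(task_id) for task_id in by_id)


total_work = sum(t["runtimeInSeconds"] for t in doc["tasks"])
critical_path = critical_path_seconds(doc["tasks"])
print(f"total work: {total_work}s, critical path: {critical_path}s")
```

Because every instance follows one layout, the same few lines of analysis code apply to any workflow in the collection, which is the practical benefit a common format is meant to provide.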

WfInstances — open workflow executions

A curated collection of real workflow runs (instances), shared in WfFormat so they can be analyzed, simulated, or used to derive generators and benchmarks.

WfChef — derive recipes automatically

Analyze real instances to discover recurring dependency patterns and statistical distributions, producing “recipes” that capture workflow structure and behavior.
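
As a rough illustration of the statistical half of a recipe, the sketch below groups observed task runtimes by category and summarizes each group. This is a toy under stated assumptions, not WfChef's actual algorithm: the `observed` records, the category names, and the `summarize` helper are all hypothetical, and real recipes also capture dependency patterns that this example ignores.

```python
from collections import defaultdict
from statistics import mean, pstdev

# Hypothetical (category, runtime in seconds) records pooled from several
# real instances. Values are made up for illustration.
observed = [
    ("align", 12.0), ("align", 10.0), ("align", 14.0),
    ("split", 4.0),
    ("merge", 3.0), ("merge", 5.0),
]


def summarize(records):
    """Group runtimes by task category and compute simple statistics."""
    groups = defaultdict(list)
    for category, runtime in records:
        groups[category].append(runtime)
    return {
        category: {
            "count": len(runtimes),
            "mean": mean(runtimes),
            "stdev": pstdev(runtimes),  # population standard deviation
        }
        for category, runtimes in groups.items()
    }


recipe_stats = summarize(observed)
for category, stats in sorted(recipe_stats.items()):
    print(category, stats)
```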

WfGen / WfBench / WfSim — generate & evaluate

Generate synthetic workflows from recipes, produce benchmark specs for repeatable experiments, and run simulations in compatible frameworks.
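
The generation step can be pictured as sampling a workflow of a requested size from a recipe. The sketch below assumes a hypothetical fork-join recipe with per-category runtime statistics and a hypothetical `generate` function; it does not use or reproduce the WfGen API, and real recipes encode far richer structure than a single fork-join.

```python
import random

# Hypothetical recipe: a fork-join pattern plus per-category runtime
# statistics (seconds). Not the format WfChef or WfGen actually use.
recipe = {
    "split": {"mean": 4.0, "stdev": 0.5},
    "align": {"mean": 12.0, "stdev": 1.6},
    "merge": {"mean": 4.0, "stdev": 1.0},
}


def generate(recipe, width, seed=0):
    """Build a synthetic fork-join workflow with `width` parallel tasks."""
    rng = random.Random(seed)  # seeded so experiments are repeatable

    def runtime(category):
        stats = recipe[category]
        # Clamp to a small positive value so no task has a negative runtime.
        return max(0.1, rng.gauss(stats["mean"], stats["stdev"]))

    tasks = [{"id": "split", "parents": [], "runtime": runtime("split")}]
    workers = [f"align_{i}" for i in range(width)]
    for worker in workers:
        tasks.append({"id": worker, "parents": ["split"], "runtime": runtime("align")})
    tasks.append({"id": "merge", "parents": workers, "runtime": runtime("merge")})
    return tasks


small = generate(recipe, width=3)
large = generate(recipe, width=50)
print(len(small), "tasks;", len(large), "tasks")
```

Fixing the seed means the same recipe and size always yield the same workflow, which is what makes synthetic workloads usable for repeatable comparisons.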

Cite WfCommons

Use these references in papers, reports, and benchmark artifacts.

Primary paper

Future Generation Computer Systems (FGCS), 2022.

@article{WFCommons,
  title   = {WfCommons: A Framework for Enabling Scientific Workflow Research and Development},
  author  = {Coleman, Taina and Casanova, Henri and Pottier, Loic and Kaushik, Manav and Deelman, Ewa and Ferreira da Silva, Rafael},
  journal = {Future Generation Computer Systems},
  volume  = {128},
  pages   = {16--27},
  year    = {2022},
  doi     = {10.1016/j.future.2021.09.043}
}

Docs & repositories

These are the canonical entry points for users and reviewers.

Research Outcomes Enabled by WfCommons

WfCommons has supported the work reported in 54 research articles, including results from our own team and from other researchers in the workflows community.