The Kubeflow 1.4 release lays several important building blocks for the use of advanced metadata workflows. A quick summary of 1.4’s top deliveries includes:

  • Advanced metadata workflows with improved metric visualization and pipeline step caching in Kubeflow Pipelines (KFP) via the KFP Software Development Kit (SDK)
  • A new KFServing model user interface that displays ML model status, configuration, YAML, logs, and metrics
  • New Optuna Suggestion Service with multivariate TPE algorithm and Sobol’s Quasirandom Sequence support for hyperparameter tuning
  • A new, unified training operator that supports all deep learning frameworks with a Python SDK, enhanced monitoring and advanced scheduling support

Kubeflow 1.4 enables the use of metadata in advanced machine learning (ML) workflows, especially in the Kubeflow Pipelines SDK. With the Pipelines SDK and its new V2-compatible mode, users can create advanced ML pipelines from Python functions that take ML Metadata (MLMD) artifacts as input and output arguments, which simplifies metrics visualization.

Another enhancement to Pipelines is the option to use the Emissary executor on clusters whose Kubernetes container runtime is not Docker. In addition, 1.4 can support metadata-based workflows to streamline the creation of TensorBoard visualizations and to serve ML models.

Core improvements to code, process, and documentation

For the Kubeflow Working Groups, 1.4 was primarily a maintenance release, which enabled the Community to concentrate on core improvements to code, process, and documentation. In the 2021 Kubeflow User Survey, users requested documentation improvements (please see the figure below). The Kubeflow 1.4 release cycle included the 1.4 Docs Sprint that generated nearly fifty (50) PRs. These PRs were tracked in this issue and this Kanban board, and we encourage more users to contribute by reading and improving the Kubeflow documentation.

[Figure: docs sprint]

The 1.4 release improvements simplify future feature development by reducing redundant code, expanding CI/CD, and automating testing. An important delivery was the new Unified Training Operator for TensorFlow, PyTorch, MXNet, and XGBoost (PR #1302). 1.4 also initiated the Community’s adoption of a defined release process in its new Kubeflow Release Handbook. The Handbook defines the stages of the release and contributors’ roles, which has helped to clarify responsibilities and improve quality.

Simplified installation

As shown in the Kubeflow User Survey (see the figure above), users have also asked for installation improvements. In Kubeflow 1.3, the Community refactored the Kubeflow deployment pattern to use manifest files (in YAML or JSON), which are stored in Git repositories and then deployed using the Kustomize installation tool. This flexible installation pattern simplifies customization by overlaying manifests, and 1.4 builds on it.

In 1.4, the Community provides an upstream set of base manifests in the Kubeflow manifest repo. Third parties have built custom installation guides or distributions with overlays that extend the base manifests. In 1.4, the third-party overlays were removed from the Kubeflow manifest repo and moved to repositories of the third parties' choosing. This pattern gives third parties more flexibility to upgrade and document their overlays. You can see a full set of installation guides and distributions here.

In addition, on-prem Kubeflow users can use the base installation manifests which utilize open source solutions like Istio, Dex, and AuthService for authentication. The Community and the Manifests Working Group are actively working to provide extra overlays and patches to accommodate more advanced use cases and installations. For example, we recently configured Knative to work with the AuthService and Dex.
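The overlay pattern described in this section can be sketched in a few lines of Python. This is only an illustration, not the layout of any particular distribution: the `../base` path and the `dex-config-patch.yaml` file name are hypothetical, and the helper functions are invented for this example. The sketch emits JSON, which Kustomize can read from a `kustomization.yaml` file because YAML is a superset of JSON.

```python
import json
from typing import Dict, List


def build_overlay(base_dir: str, patches: List[str]) -> Dict[str, object]:
    """Return a Kustomization that layers patch files on top of a base.

    `resources` pulls in the upstream base manifests, and
    `patchesStrategicMerge` lists the local files that modify them.
    """
    return {
        "apiVersion": "kustomize.config.k8s.io/v1beta1",
        "kind": "Kustomization",
        "resources": [base_dir],
        "patchesStrategicMerge": list(patches),
    }


def render_kustomization(kustomization: Dict[str, object]) -> str:
    """Serialize the Kustomization as JSON text (also valid YAML)."""
    return json.dumps(kustomization, indent=2) + "\n"


# Hypothetical overlay: extend the base manifests with one local patch.
overlay = build_overlay("../base", ["dex-config-patch.yaml"])

if __name__ == "__main__":
    # Save the printed text as kustomization.yaml inside the overlay directory.
    print(render_kustomization(overlay))
```

Because the base manifests are referenced rather than copied, an overlay like this can stay in its own repository and pick up upstream changes by pointing `resources` at a newer copy of the base.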

Dependencies, change logs, tracking issues and roadmaps

Kubeflow has many software dependencies. In 1.4, the top dependencies used in testing are defined below:

Kubeflow Dependency    Version
Kubernetes             1.19.0
Istio                  1.9.6
Knative                0.22.1
Kustomize              3.2.0
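As a rough aid, the versions used in testing can be compared against what is installed in a given environment. The sketch below is a hypothetical helper, not part of Kubeflow: the `installed` dictionary is sample data that you would fill in yourself, and the parser assumes plain `major.minor.patch` strings with an optional leading `v`, so vendor suffixes such as build tags are not handled.

```python
from typing import Dict, Optional, Tuple

# Versions used in Kubeflow 1.4 testing, as listed above.
TESTED_VERSIONS = {
    "Kubernetes": "1.19.0",
    "Istio": "1.9.6",
    "Knative": "0.22.1",
    "Kustomize": "3.2.0",
}


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn '1.19.0' or 'v1.19.0' into a tuple of integers."""
    return tuple(int(part) for part in text.lstrip("v").split("."))


def find_mismatches(
    installed: Dict[str, str]
) -> Dict[str, Tuple[Optional[str], str]]:
    """Map each dependency that is missing or differs to (found, tested)."""
    mismatches = {}
    for name, tested in TESTED_VERSIONS.items():
        found = installed.get(name)
        if found is None or parse_version(found) != parse_version(tested):
            mismatches[name] = (found, tested)
    return mismatches


# Sample data only: replace with the versions from your own environment.
installed = {
    "Kubernetes": "v1.19.0",
    "Istio": "1.9.6",
    "Knative": "0.22.1",
    "Kustomize": "4.0.5",
}
mismatches = find_mismatches(installed)

if __name__ == "__main__":
    for name, (found, tested) in mismatches.items():
        print(f"{name}: found {found}, tested with {tested}")
```

A mismatch does not necessarily mean the installation will fail; it only flags a combination that differs from the one listed as tested.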

The table below provides links to important details from the Working Groups, including their 1.4 tracking issues, change logs, and roadmaps. Please note that the Working Groups use version numbers that are specific to their projects. As a result, many Kubeflow components that have been incorporated and tested in Kubeflow 1.4 may have a version number other than 1.4.

Working Group             Changelog / Release Notes           Roadmap
Notebooks 1.4             -                                   -
Training Operators 1.3    Training Operator Changelog         Training Operators Roadmap
Katib v0.12               Katib Release Notes, PR for v0.12   Katib Roadmap
Kubeflow Pipelines v1.7   Release Notes, Changelog            Pipelines Roadmap
KFServing v0.6.1          -                                   KFServing Roadmap

Kubeflow 1.4 video update and tutorials

The Kubeflow Working Group representatives have recorded a presentation on Kubeflow 1.4’s new features, which you can find on the Kubeflow YouTube channel. Additionally, Kubeflow 1.4’s new features are easy to try in these tutorials:

  • AutoML Tutorial with metadata-based workflows to build TensorBoards and to serve models
  • Run Katib from your local laptop by following this example.
  • KFP Tutorial using Pipelines SDK v2 to orchestrate your ML workflow as a pipeline
  • KFServing Tutorial
  • Training Operator Tutorial

What’s coming

The Kubeflow Community is working on Kubeflow 1.5 planning and the Kubeflow Conformance Program proposal. Please watch for updates on these topics and more.

Join the community

We would like to thank everyone for their efforts on Kubeflow 1.4, especially the users, code contributors, and working group leads. As you can see from the extensive contributions to Kubeflow 1.4, the Kubeflow Community is vibrant and diverse, and it is solving real-world problems for organizations around the world.

Want to help? The Kubeflow Community Working Groups hold open meetings, maintain public lists, and are always looking for more volunteers and users to unlock the potential of machine learning. If you’re interested in becoming a Kubeflow contributor, please feel free to check out the resources below. We look forward to working with you!