Technology
Mar 13, 2026

Dave Smith-Uchida has spent more than three decades working across some of the most complex areas of computing, from embedded systems to supercomputing and modern cloud platforms. Today, as a Technical Leader at Veeam working on the Kasten product, he focuses on one of the most pressing challenges in modern infrastructure: protecting data in Kubernetes environments. His involvement in the Kubernetes ecosystem runs deep: Smith-Uchida is a founding member of the Kubernetes Data Protection Working Group and previously served as the architect of Velero, the open-source Kubernetes backup project. He also remains active in several open-source initiatives, including Kubernetes, Velero, Kanister and Astrolabe.
One of the most persistent myths about Kubernetes is that it is inherently “stateless”. According to this belief, a Kubernetes cluster can simply be deleted and recreated with little consequence beyond a brief interruption to service. While this may have reflected Kubernetes’ early design philosophy, the platform has evolved significantly beyond its original role.
Kubernetes began as an orchestration platform for stateless containers, where applications relied on external systems for data storage. Over time, however, it has grown into a platform capable of hosting stateful workloads directly within the cluster itself. Applications now commonly store data in Persistent Volumes (PVs), and many Kubernetes-native applications also rely on the Kubernetes API server to store operational state and working information.
This shift has fundamentally changed the way organisations must approach data protection. As Kubernetes environments increasingly host critical workloads, the need to protect data stored within these clusters has become essential. Data loss, corruption, accidental deletion and even ransomware attacks are real risks that must be addressed through robust protection strategies.
Protecting data in Kubernetes, however, is not as straightforward as applying traditional backup and recovery methods. Kubernetes environments are highly dynamic by design. Applications can allocate storage resources automatically, scale new services on demand and even create additional applications as part of their operations.
At the same time, Kubernetes clusters are inherently distributed systems. Workloads run across multiple worker nodes, and storage may be attached to several physical machines while moving dynamically between nodes as workloads are scheduled or rescheduled. These characteristics create additional complexity for backup, restore and replication processes.
To address these challenges, the Kubernetes community—led in part by the Kubernetes Data Protection Working Group—has been developing guidance and technical capabilities that support effective data protection within Kubernetes environments. A community white paper explores these protection workflows in depth and outlines when and how Kubernetes data should be protected.
Several Kubernetes features have already emerged to support data protection strategies.
Persistent Volume snapshots provide the ability to create point-in-time copies of storage volumes. These snapshots can then be used as the source for backup operations without requiring applications to be paused. While the underlying storage system ultimately performs the snapshot operation, Kubernetes provides a standardised interface to manage these snapshots, similar to how Persistent Volumes standardise storage allocation across different storage platforms.
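As a concrete illustration, a snapshot request is itself just a Kubernetes object. The minimal sketch below assumes a CSI driver with snapshot support; the `VolumeSnapshotClass` name `csi-snapclass` and the claim name `app-data` are placeholders, not names from any particular environment:

```yaml
# Request a point-in-time snapshot of an existing PVC.
# Assumes the cluster's CSI driver supports snapshots and that a
# VolumeSnapshotClass named "csi-snapclass" exists (placeholder name).
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: app-data-snap
spec:
  volumeSnapshotClassName: csi-snapclass
  source:
    persistentVolumeClaimName: app-data   # the volume to snapshot
```

Once the snapshot reports `readyToUse` in its status, a backup tool can copy data from it, or a new PVC can be provisioned from it via the claim's `dataSource` field, all while the application keeps running.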
Another feature, Volume Populator, provides a mechanism for creating Persistent Volumes and populating them with data before an application begins using them. This approach can simplify and accelerate data restoration processes while avoiding complications related to provisioning and managing storage during recovery.
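In manifest terms, a populated volume is an ordinary PVC whose `dataSourceRef` points at a custom resource understood by a populator controller. The API group and kind below are hypothetical stand-ins for whatever populator is actually installed:

```yaml
# PVC that a volume populator fills with data before the app mounts it.
# The dataSourceRef group/kind ("backups.example.com" / "BackupArchive")
# are hypothetical; a real populator defines its own custom resource.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: restored-data
spec:
  accessModes: ["ReadWriteOnce"]
  resources:
    requests:
      storage: 10Gi
  dataSourceRef:
    apiGroup: backups.example.com
    kind: BackupArchive
    name: nightly-backup
```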
Kubernetes is also expanding support for object storage through the Container Object Storage Interface (COSI). By introducing standard APIs for managing object storage, COSI simplifies how data protection tools interact with storage resources and enables operators and applications to implement consistent protection workflows.
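With COSI, requesting a bucket looks much like requesting a Persistent Volume. A rough sketch, bearing in mind that COSI is still an alpha API whose group or version may change, and that the class name is a placeholder:

```yaml
# Claim an object-storage bucket through the COSI alpha API.
# "backup-bucket-class" is a placeholder BucketClass name.
apiVersion: objectstorage.k8s.io/v1alpha1
kind: BucketClaim
metadata:
  name: backup-bucket
spec:
  bucketClassName: backup-bucket-class
  protocols: ["S3"]   # request S3-compatible access
```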
Performance improvements are also emerging. Changed Block Tracking (CBT), currently in beta, improves the efficiency of backup and replication processes by identifying which blocks of data have changed between snapshots. Rather than copying entire volumes during each backup, systems can copy only the modified blocks. While many storage systems already support CBT through proprietary APIs, Kubernetes is working to standardise this capability within its storage ecosystem.
Development continues in other areas as well. One important enhancement currently under development is Volume Group Snapshots. Many modern applications rely on multiple storage volumes simultaneously, and ensuring consistency across these volumes during backup operations can be challenging. Snapshotting volumes individually while an application is running may introduce inconsistencies between volumes. Traditionally, applications would need to be paused to ensure consistent snapshots, which can lead to downtime. Volume Group Snapshots aim to solve this by capturing consistent snapshots across multiple volumes simultaneously without requiring applications to be quiesced.
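A group snapshot is requested by selecting the application's PVCs with a label selector rather than naming each volume individually. A sketch against the beta API (the group/version may still change before general availability; the class and label names are placeholders):

```yaml
# Take a consistent snapshot across all PVCs labelled app=orders-db.
# Assumes a CSI driver with group-snapshot support and a
# VolumeGroupSnapshotClass named "csi-groupsnapclass" (placeholder).
apiVersion: groupsnapshot.storage.k8s.io/v1beta1
kind: VolumeGroupSnapshot
metadata:
  name: orders-db-groupsnap
spec:
  volumeGroupSnapshotClassName: csi-groupsnapclass
  source:
    selector:
      matchLabels:
        app: orders-db   # selects every PVC carrying this label
```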
Kubernetes operators introduce another dimension to data protection. Operators allow complex applications to be managed through the Kubernetes control plane using custom resources that define desired behaviour. They can automate deployment, monitor application health and perform maintenance tasks. Some operators even include built-in capabilities for backup, restore and replication.
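To make the custom-resource idea concrete, the fragment below sketches what an operator-managed database with built-in backup settings might look like. The API group, kind and fields are entirely illustrative and do not belong to any real operator's schema:

```yaml
# Hypothetical custom resource for an imaginary database operator.
# The operator watches objects of this kind and reconciles the cluster
# to match, including running the declared backup schedule itself.
apiVersion: demodb.example.com/v1
kind: DemoDBCluster
metadata:
  name: orders-db
spec:
  replicas: 3
  storage: 50Gi
  backup:
    schedule: "0 2 * * *"   # nightly at 02:00
    retentionDays: 7
```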
However, operator-based environments also introduce challenges for data protection. Backup systems must consider resources created dynamically by operators, ensure proper ordering of backup operations and coordinate with operator-native protection mechanisms. Achieving consistent configuration across operator-driven backup and replication processes remains an area of active discussion within the Kubernetes Data Protection Working Group.
Ultimately, forming a reliable data protection strategy for Kubernetes begins with understanding an application’s resiliency requirements. Two key metrics guide these decisions: the Recovery Time Objective (RTO), which defines how quickly systems must be restored after a disruption, and the Recovery Point Objective (RPO), which determines how much data loss is acceptable.
Different strategies—from GitOps-based redeployment to backup and restore or full replication—offer different trade-offs in cost, complexity and recovery performance.
Replication can significantly improve uptime and reduce RPO, potentially to zero with synchronous replication. However, replication alone cannot replace traditional backup. Software bugs may replicate corrupted data across systems, ransomware attacks require immutable recovery points, and synchronous replication can introduce performance overhead and infrastructure costs.
As Kubernetes continues to evolve into a platform for stateful, business-critical applications, data protection strategies must evolve alongside it. Organisations adopting Kubernetes must ensure they have the right mechanisms in place to safeguard the data that increasingly lives inside their clusters.