The CKA is unlike almost every other certification in this atlas, because there is nothing to recognise and nothing to guess. It is entirely performance-based: you sit at a live terminal, in real Kubernetes clusters, and you are scored on whether the tasks you were asked to do actually work afterwards. There are no multiple-choice questions. That single fact reshapes how you must prepare, because reading about Kubernetes earns you nothing on this exam; only doing it does. The two things that decide the result are a solid mental model of how a cluster fits together, and raw speed and accuracy with kubectl under a tight clock. This guide is a full self-study course built around that reality: it walks through all five curriculum domains in depth, explains the operational concepts behind each, and turns them into a hands-on plan that treats speed as a skill to train, not an afterthought. It is original teaching material and study guidance only. It contains no real or simulated exam tasks, and you should always confirm the current curriculum and exam environment version on the CNCF CKA page before you book.
Chapter 1: How the exam works and how to use this guide
The format is the difficulty
The CKA gives you two hours to solve a set of performance-based tasks from the command line across several clusters, and you need 66% to pass. You may consult the official Kubernetes documentation at kubernetes.io during the exam, which sounds generous until you realise the constraint is time, not knowledge. The exam is open-book in the narrowest sense: the docs are there for exact syntax and rarely-used flags, not for learning a topic you never practised. Candidates who fail rarely fail because they did not know the material; they fail because they were too slow, got lost in the documentation, or made small errors under pressure that broke the task. Internalise this from the start, because it changes every study decision that follows.
The five domains and their weights
The curriculum is organised into five domains with published weights, and those weights are your single most important planning fact. They are Troubleshooting at 30%, Cluster Architecture, Installation and Configuration at 25%, Services and Networking at 20%, Workloads and Scheduling at 15%, and Storage at 10%. Troubleshooting is the heaviest domain by a clear margin, which tells you that diagnosing and fixing broken things deserves the most practice, and it is also the domain that most rewards a deep mental model rather than memorised commands. The exam environment is refreshed to track recent Kubernetes releases, so confirm the current version in the CNCF FAQ and practise on a matching version.
How to use this course
Read the domain chapters in order, but understand that the order here is pedagogical, not the exam’s weighting. Cluster architecture comes early because you cannot troubleshoot a cluster you do not understand the shape of; troubleshooting comes last among the domains because it draws on all the others. Every concept in this guide should become something you have typed, broken and fixed in a real cluster, not just read. Throughout, the guide flags the imperative kubectl patterns and time-saving habits that turn correct-but-slow into correct-and-fast, because on this exam those are different outcomes. Short illustrations appear to make ideas concrete; none are exam tasks.
Chapter 2: Cluster Architecture, Installation and Configuration (25%)
This domain is about the cluster itself: how its parts fit together, how you build and secure it, and how you protect its state. It is the second-heaviest domain, and crucially its tasks are highly scorable once practised, because they are concrete and repeatable rather than open-ended. Master these and you bank a quarter of the marks reliably.
The mental model: control plane and nodes
Before any command, you need the shape of a cluster clearly in your head, because troubleshooting later depends on it. A Kubernetes cluster has a control plane and a set of worker nodes. The control plane runs the API server (the front door every command and component talks to), etcd (the key-value store that holds the entire cluster state), the scheduler (which decides which node a new pod runs on), and the controller manager (which drives the cluster toward its desired state). Each worker node runs the kubelet (the agent that starts and supervises containers and reports node health), a container runtime, and normally kube-proxy (which maintains the network rules that implement Services on that node). When you can name these components and say what each does, most troubleshooting becomes a question of “which of these is failing?”, which is exactly why this model is foundational.
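To make this model visible rather than abstract, here is a minimal sketch, assuming a kubeadm-built cluster and a working kubeconfig. On such clusters the control plane components usually run as static pods in the kube-system namespace, so listing them shows the model directly. The function name show_cluster_shape is hypothetical, and the manifest path in the comment is the kubeadm default, assumed rather than guaranteed.

```shell
# Hypothetical helper: nothing touches the cluster until you call it.
# Assumes a kubeadm-built cluster and a working kubeconfig.
show_cluster_shape() {
  # Nodes with their roles, kubelet versions and container runtime
  kubectl get nodes -o wide
  # API server, etcd, scheduler and controller manager usually show up
  # here as pods on a kubeadm cluster, alongside CoreDNS and kube-proxy
  kubectl get pods -n kube-system -o wide
}

# On a control plane node, the static pod manifests normally live in
# /etc/kubernetes/manifests (kubeadm default, assumed here):
#   ls /etc/kubernetes/manifests
```

Calling show_cluster_shape on a practice cluster and naming what each listed pod does is a quick self-test of the model.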
Installing a cluster with kubeadm
The exam expects you to work with clusters built using kubeadm, the standard bootstrapping tool, and to manage their lifecycle. You should be able to initialise a control plane, join worker nodes, and, importantly, upgrade a cluster from one version to the next following the correct order (control plane components first, then the kubelet on each node, draining nodes appropriately). Upgrades are a classic scorable task: the steps are precise and the same every time, so practise the full upgrade workflow until it is routine, including draining a node with kubectl drain and bringing it back with kubectl uncordon.
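The upgrade order can be sketched as a set of small drill helpers. This is an outline under stated assumptions, not a definitive procedure: the function names are hypothetical, the target version is a placeholder you pass in (take the real value from kubeadm upgrade plan), and the package-manager steps that install the new kubeadm and kubelet binaries are distribution-specific and left as comments. Always follow the upgrade page in the official documentation for the version you are on.

```shell
# Hypothetical drill helpers; nothing runs until you call them.

# On the first control plane node, as root, AFTER installing the new
# kubeadm package with the distribution's package manager:
upgrade_first_control_plane() {
  target="$1"                      # placeholder, e.g. the version printed by the plan
  kubeadm upgrade plan
  kubeadm upgrade apply "$target"
}

# On every other node (additional control planes and workers), as root,
# after installing the new kubeadm package there:
upgrade_other_node() {
  kubeadm upgrade node
}

# From a machine with kubectl access: empty the node before its kubelet changes.
drain_node() {
  kubectl drain "$1" --ignore-daemonsets
}

# On the node itself, as root, AFTER installing the new kubelet and kubectl packages:
restart_kubelet() {
  systemctl daemon-reload
  systemctl restart kubelet
}

# From a machine with kubectl access: allow scheduling on the node again.
uncordon_node() {
  kubectl uncordon "$1"
}
```

A drill run on one node is then: upgrade step, drain_node, package install, restart_kubelet, uncordon_node, finishing with kubectl get nodes to confirm the reported version.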
RBAC and securing access
Role-Based Access Control (RBAC) governs who and what can do what in the cluster. The four objects to know cold are Roles and ClusterRoles (which define a set of permissions, namespaced or cluster-wide) and RoleBindings and ClusterRoleBindings (which grant those permissions to a user, group or ServiceAccount). You should be able to create a Role that allows specific verbs on specific resources and bind it to a ServiceAccount, then verify the result with kubectl auth can-i. As a teaching example to anchor the relationship: a Role is the list of what is allowed, and a RoleBinding is the act of handing that list to someone, so a permission that “does not work” is almost always a binding that points at the wrong subject or a Role scoped to the wrong namespace. That diagnostic instinct is what the exam rewards.
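The Role, binding and verification steps described above might look like the following sketch. Every name in it (the namespace demo, the ServiceAccount app-sa, the Role pod-reader, the RoleBinding read-pods and the function rbac_drill) is hypothetical and chosen only for illustration.

```shell
# Hypothetical names throughout; nothing runs until you call the function.
rbac_drill() {
  ns="${1:-demo}"
  kubectl create namespace "$ns"
  kubectl create serviceaccount app-sa -n "$ns"
  # The Role: the list of what is allowed, scoped to one namespace
  kubectl create role pod-reader --verb=get,list,watch --resource=pods -n "$ns"
  # The RoleBinding: hands that list to the ServiceAccount
  kubectl create rolebinding read-pods --role=pod-reader \
    --serviceaccount="$ns:app-sa" -n "$ns"
  # Verify by impersonating the ServiceAccount.
  # The first check should print "yes", the second "no".
  kubectl auth can-i list pods -n "$ns" --as="system:serviceaccount:$ns:app-sa"
  kubectl auth can-i delete pods -n "$ns" --as="system:serviceaccount:$ns:app-sa"
}
```

To practise the diagnostic instinct, rerun the can-i checks against a different namespace and see the answer change: that is what a Role scoped to the wrong namespace looks like from the outside.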
etcd backup and restore
Because etcd holds the entire cluster state, backing it up and restoring it is a high-value, very examinable skill, and one with an exact procedure. You should be able to take a snapshot of etcd with etcdctl snapshot save, providing the right endpoint and certificate flags, and restore from a snapshot with etcdctl snapshot restore. The certificate and endpoint flags are the part people fumble under pressure, so practise the full command until you can produce it without hunting through the docs. This is the kind of task that is either right or wrong with little partial credit, which makes drilling it directly worthwhile.
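As a sketch of what the full commands look like, assuming a kubeadm-built cluster: the endpoint and certificate paths below are the usual kubeadm defaults and are assumptions, so read the real values from the etcd static pod manifest on the control plane node. The function names, snapshot path and restore directory are hypothetical. Recent etcd releases move the restore operation to a separate etcdutl tool, so check which one your environment ships.

```shell
# Hypothetical helpers, run as root on a control plane node.
# Endpoint and certificate paths are kubeadm defaults (assumed): confirm them
# in the etcd static pod manifest, normally /etc/kubernetes/manifests/etcd.yaml.
etcd_backup() {
  snapshot="${1:-/tmp/etcd-snapshot.db}"
  ETCDCTL_API=3 etcdctl \
    --endpoints=https://127.0.0.1:2379 \
    --cacert=/etc/kubernetes/pki/etcd/ca.crt \
    --cert=/etc/kubernetes/pki/etcd/server.crt \
    --key=/etc/kubernetes/pki/etcd/server.key \
    snapshot save "$snapshot"
}

etcd_restore() {
  snapshot="${1:-/tmp/etcd-snapshot.db}"
  data_dir="${2:-/var/lib/etcd-restored}"
  # Restores into a NEW data directory. etcd must then be pointed at it,
  # for example by editing the data volume path in the etcd static pod manifest.
  ETCDCTL_API=3 etcdctl snapshot restore "$snapshot" --data-dir="$data_dir"
}
```

The restore works on a local file, which is why it needs no endpoint or certificate flags; the step people forget is pointing etcd at the restored directory afterwards.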
Chapter 3: Workloads and Scheduling (15%)
This domain covers running applications on the cluster: deploying them, updating them safely, scaling them, configuring them, and controlling where they run. Although it is a smaller slice at 15%, the objects here, especially Deployments, appear constantly across other domains and in troubleshooting, so fluency pays back well beyond its weight.
Deployments and rolling updates
A Deployment manages a set of identical pods through a ReplicaSet and gives you safe, declarative updates. The key behaviour to understand is the rolling update: when you change a Deployment, Kubernetes gradually replaces old pods with new ones so the application stays available, and if something goes wrong you can roll back to the previous version with kubectl rollout undo. You should be able to create a Deployment, scale it, update its image, watch the rollout with kubectl rollout status, and roll it back. Generate the Deployment imperatively rather than writing the YAML by hand, because under time that is far faster, a point this guide returns to in the kubectl chapter.
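The create, scale, update, watch and roll back cycle can be drilled with a sequence like this one. It is a minimal sketch: the Deployment name web, the image tags and the function name rollout_drill are hypothetical, picked only so the update has something to change.

```shell
# Hypothetical names and image tags; nothing runs until you call the function.
rollout_drill() {
  kubectl create deployment web --image=nginx:1.25 --replicas=3
  kubectl scale deployment web --replicas=5
  # "nginx" is the container name kubectl derives from the image name
  kubectl set image deployment/web nginx=nginx:1.26
  kubectl rollout status deployment/web
  # Inspect the recorded revisions, then return to the previous one
  kubectl rollout history deployment/web
  kubectl rollout undo deployment/web
  kubectl rollout status deployment/web
}
```

After the undo, kubectl get deployment web -o wide should show the original image again, which is the verification a task would implicitly expect.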
Configuration with ConfigMaps and Secrets
Applications need configuration, and Kubernetes separates it from the container image so the same image can run in different environments. A ConfigMap holds non-sensitive configuration as key-value pairs, and a Secret holds sensitive data such as passwords and tokens (stored base64-encoded, which is an encoding rather than encryption, and handled with more care by the cluster). You should be able to create both and inject them into a pod either as environment variables or as mounted files, and understand the difference between the two delivery methods. As a teaching example of why the delivery method matters: a value injected as an environment variable is fixed when the container starts, whereas a value mounted from a ConfigMap as a file can update in the running pod, which is the kind of distinction a precisely worded task can hinge on.
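To see both delivery methods side by side, a pod can consume the same ConfigMap key as an environment variable and as a mounted file. The sketch below assumes the default namespace; the names app-config, app-secret, config-demo and the function config_drill are hypothetical, as are the literal values.

```shell
# Hypothetical names and values; nothing runs until you call the function.
config_drill() {
  kubectl create configmap app-config --from-literal=MODE=dev
  kubectl create secret generic app-secret --from-literal=PASSWORD=changeme
  kubectl apply -f - <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: config-demo
spec:
  containers:
  - name: app
    image: busybox:1.36
    command: ["sh", "-c", "sleep 3600"]
    env:
    - name: MODE                 # environment variable: fixed at container start
      valueFrom:
        configMapKeyRef:
          name: app-config
          key: MODE
    - name: PASSWORD
      valueFrom:
        secretKeyRef:
          name: app-secret
          key: PASSWORD
    volumeMounts:
    - name: config
      mountPath: /etc/app        # mounted file: can refresh when the ConfigMap changes
  volumes:
  - name: config
    configMap:
      name: app-config
EOF
  kubectl wait --for=condition=Ready pod/config-demo --timeout=60s
  # Print the value as delivered by each method
  kubectl exec config-demo -- sh -c 'echo "$MODE"; cat /etc/app/MODE'
}
```

Editing the ConfigMap afterwards and repeating the last command, after a short delay, is a direct way to observe the difference the paragraph above describes.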
Scaling, scheduling and resource limits
You should be comfortable scaling workloads manually and understand how the scheduler places pods. Scheduling controls include node selectors and node affinity (steering pods toward particular nodes), taints and tolerations (letting nodes repel pods unless the pod explicitly tolerates the taint), and the effect of resource requests and limits. A request is what a container is guaranteed and what the scheduler uses to find a node with room; a limit is the ceiling it cannot exceed. Understanding requests versus limits is essential both here and in troubleshooting, because a pod stuck in Pending is very often a scheduling or resource problem, and a container being killed and restarted is very often a memory limit problem (reported as OOMKilled), whereas exceeding a CPU limit only throttles it. Practise tainting a node and scheduling a pod that tolerates it, because the relationship between taints and tolerations is a common source of confusion that the lab quickly resolves.
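The taint and toleration exercise, combined with requests and limits, could be drilled as follows. This is a sketch under assumptions: the taint key and value (dedicated=batch), the pod name tolerant, the resource figures and the function name taint_drill are all hypothetical, and the node name must come from your own cluster.

```shell
# Hypothetical names and figures; nothing runs until you call the function.
taint_drill() {
  node="$1"   # a real node name taken from `kubectl get nodes`
  kubectl taint nodes "$node" dedicated=batch:NoSchedule
  kubectl apply -f - <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: tolerant
spec:
  tolerations:
  - key: dedicated
    operator: Equal
    value: batch
    effect: NoSchedule
  containers:
  - name: app
    image: nginx:1.25
    resources:
      requests:          # what the scheduler reserves when choosing a node
        cpu: 100m
        memory: 64Mi
      limits:            # the ceiling the container cannot exceed
        cpu: 250m
        memory: 128Mi
EOF
  kubectl get pod tolerant -o wide
  # Clean up: the same taint spec with a trailing "-" removes it
  kubectl taint nodes "$node" dedicated=batch:NoSchedule-
}
```

Note that a toleration only permits the pod onto the tainted node; it does not force it there. Steering the pod to that node takes a node selector or node affinity as well, which is the confusion the lab is meant to clear up.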
Chapter 4: Services and Networking (20%)
This domain covers how pods communicate, how workloads are exposed, how traffic is routed in, and how you secure it. It is a substantial slice at 20%, and the concepts trip people up more than the syntax does, so the goal is a clear model of how a packet reaches a pod.
The Service abstraction
Pods are ephemeral and their IP addresses change, so you almost never talk to a pod directly. A Service provides a stable address and load-balances across a set of pods selected by labels. Know the main types: ClusterIP (the default, reachable only inside the cluster), NodePort (exposes the Service on a port of every node, reachable from outside), and LoadBalancer (provisions an external load balancer in supported environments). The mechanism that connects a Service to its pods is the label selector plus the resulting endpoints, and this is where most Service problems live. As a teaching example of the most common failure: a Service with no endpoints is almost always a selector that does not match the pods’ labels, so when a Service does not work, your first move is to check whether its endpoints are populated, which immediately tells you whether the problem is the selector or something downstream.
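The endpoints-first check can be sketched like this, assuming a Deployment already exists with the same name as the Service you are about to create (web is a hypothetical default, as is the function name service_check).

```shell
# Hypothetical names; nothing runs until you call the function.
# Assumes a Deployment named like the Service already exists in the current namespace.
service_check() {
  svc="${1:-web}"
  # Creates a ClusterIP Service whose selector is copied from the Deployment
  kubectl expose deployment "$svc" --port=80 --target-port=80
  # An empty ENDPOINTS column means no ready pod matches the selector
  kubectl get endpoints "$svc"
  # Compare the Service selector with the labels the pods actually carry
  kubectl get service "$svc" -o jsonpath='{.spec.selector}'
  kubectl get pods --show-labels
}
```

To rehearse the failure, edit the Service selector to a value no pod carries and run the endpoints check again: the list empties, which is exactly how the problem presents in a troubleshooting task.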
DNS, Ingress and NetworkPolicies
CoreDNS provides cluster DNS, so a Service is reachable by a predictable name rather than an IP, and understanding this naming is essential for debugging connectivity. Ingress manages external HTTP and HTTPS access to Services through a single entry point with host- and path-based routing rules, fronted by an ingress controller; you should be able to create an Ingress resource that routes to a Service. NetworkPolicies control which pods may communicate with which, acting as a firewall at the pod level. The crucial concept is the default: in a namespace with no policies, all pod traffic is allowed, but as soon as a pod is selected by a NetworkPolicy, it becomes isolated in the direction that policy covers (ingress, egress or both), and everything not explicitly permitted in that direction is denied for that pod. Understanding this default-allow-then-default-deny shift is exactly what NetworkPolicy tasks test, so practise writing a policy that permits only specific ingress and confirm both the allowed and the blocked paths.
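A policy that permits only specific ingress, with a check of both paths, might be sketched as below. The assumptions are significant: a namespace demo containing a Service web in front of pods labelled app=web, a client label role=frontend, and a network plugin that actually enforces NetworkPolicy (not every plugin does). All of these names, and the function netpol_drill, are hypothetical.

```shell
# Hypothetical names; nothing runs until you call the function.
# Assumes namespace "demo", Service "web", pods labelled app=web,
# and a CNI plugin that enforces NetworkPolicy.
netpol_drill() {
  kubectl apply -f - <<'EOF'
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-frontend-to-web
  namespace: demo
spec:
  podSelector:
    matchLabels:
      app: web
  policyTypes:
  - Ingress
  ingress:
  - from:
    - podSelector:
        matchLabels:
          role: frontend
    ports:
    - protocol: TCP
      port: 80
EOF
  # Allowed path: a throwaway client pod carrying the role=frontend label
  kubectl run allowed -n demo --rm -i --restart=Never --image=busybox:1.36 \
    --labels=role=frontend -- wget -qO- -T 3 http://web
  # Blocked path: the same request without the label, expected to time out
  kubectl run blocked -n demo --rm -i --restart=Never --image=busybox:1.36 \
    -- wget -qO- -T 3 http://web
}
```

Because the policy lists only Ingress under policyTypes, the selected pods stay free to make outbound connections; adding Egress to the list is what would restrict that direction too.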
How to study this domain
Build a small application in your cluster and wire it up end to end: deploy pods, put a ClusterIP Service in front of them, confirm DNS resolution from another pod, expose it through an Ingress, then lock it down with a NetworkPolicy and verify that only the intended traffic gets through. Networking rewards seeing the whole path work and then deliberately breaking one link to learn how the failure presents, which is the exact diagnostic skill the troubleshooting domain will demand.
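The first links of that chain (pods, Service, DNS, Ingress) can be wired up with a sequence like the following. It is a minimal sketch: the namespace demo, the names web and dns-test, the host web.example.local and the function wire_up_drill are hypothetical; the DNS name assumes the default cluster domain cluster.local; and the Ingress only routes traffic if an ingress controller is installed, which may also require an ingress class to be named.

```shell
# Hypothetical names; nothing runs until you call the function.
wire_up_drill() {
  ns="${1:-demo}"
  kubectl create namespace "$ns"
  kubectl create deployment web --image=nginx:1.25 --replicas=2 -n "$ns"
  # ClusterIP Service in front of the pods
  kubectl expose deployment web --port=80 -n "$ns"
  # DNS: resolve the Service name from a throwaway pod
  # (assumes the default cluster domain, cluster.local)
  kubectl run dns-test -n "$ns" --rm -i --restart=Never --image=busybox:1.36 \
    -- nslookup "web.$ns.svc.cluster.local"
  # Ingress: route a hypothetical host to the Service
  kubectl create ingress web -n "$ns" --rule="web.example.local/*=web:80"
  kubectl get ingress web -n "$ns"
}
```

Once the path works, break one link on purpose, for example by scaling the Deployment to zero, and note how each layer reports the failure.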
Chapter 5: Storage (10%)
Storage is the smallest domain at 10%, but its concepts are clean and its tasks are concrete, so it is efficient marks if you do not skip it. The theme is giving pods durable storage that outlives the pod itself.
Volumes, persistent volumes and claims
A container’s filesystem is ephemeral, so for data that must survive a restart you use volumes. The durable model has two halves that you must keep straight: a PersistentVolume (PV) is a piece of storage in the cluster (provisioned by an administrator or dynamically), and a PersistentVolumeClaim (PVC) is a request for storage made by a user, which Kubernetes binds to a suitable PV. A pod then mounts the claim. The separation matters because it decouples the person who needs storage from the details of where it comes from. As a teaching example of the relationship: a PVC is like a request for “10 gigabytes of fast storage”, and the PV is the actual storage that gets matched to it, so a PVC stuck Pending usually means no PV satisfies the request, which is the first thing to check.
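The two halves can be seen binding to each other with a statically provisioned volume. The sketch below is suitable only for a practice cluster: it uses a hostPath volume, and the names demo-pv and demo-pvc, the class name manual, the path /mnt/demo-data and the function pv_pvc_drill are all hypothetical.

```shell
# Hypothetical names and path; nothing runs until you call the function.
pv_pvc_drill() {
  kubectl apply -f - <<'EOF'
apiVersion: v1
kind: PersistentVolume
metadata:
  name: demo-pv
spec:
  capacity:
    storage: 1Gi
  accessModes:
  - ReadWriteOnce
  storageClassName: manual
  hostPath:
    path: /mnt/demo-data
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: demo-pvc
spec:
  accessModes:
  - ReadWriteOnce
  storageClassName: manual
  resources:
    requests:
      storage: 1Gi
EOF
  # Both objects should report STATUS Bound once matched
  kubectl get pv,pvc
  # If the claim stays Pending, its events explain which requirement is unmet
  kubectl describe pvc demo-pvc
}
```

Changing the claim to request more than the volume offers, or a different access mode, reproduces the Pending case described above.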
StorageClasses, access modes and reclaim policies
A StorageClass enables dynamic provisioning, automatically creating a PV when a PVC asks for that class, which is how most modern clusters provide storage on demand. You should also understand access modes (such as ReadWriteOnce, which allows one node to mount the volume read-write, versus ReadWriteMany, which allows many) and reclaim policies (whether a PV is Retained or Deleted when its claim is removed), because tasks often specify these and they change the behaviour materially. Practise creating a PVC that uses a StorageClass, mounting it into a pod, and confirming the data persists across a pod restart, since that full cycle covers the bulk of what this domain asks.
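The full cycle of claim, mount, write, restart and read back could be drilled as follows. This is a sketch under assumptions: the cluster must already offer a StorageClass (pass its name in, taken from kubectl get storageclass), and the names data-claim, writer and the two functions are hypothetical.

```shell
# Hypothetical names; nothing runs until you call persistence_drill.
apply_claim_and_pod() {
  class="$1"   # an existing StorageClass name from `kubectl get storageclass`
  kubectl apply -f - <<EOF
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data-claim
spec:
  storageClassName: $class
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
---
apiVersion: v1
kind: Pod
metadata:
  name: writer
spec:
  containers:
  - name: app
    image: busybox:1.36
    command: ["sh", "-c", "sleep 3600"]
    volumeMounts:
    - name: data
      mountPath: /data
  volumes:
  - name: data
    persistentVolumeClaim:
      claimName: data-claim
EOF
}

persistence_drill() {
  apply_claim_and_pod "$1"
  kubectl wait --for=condition=Ready pod/writer --timeout=120s
  kubectl exec writer -- sh -c 'echo hello > /data/proof'
  # Remove only the pod; the claim and its volume stay in place
  kubectl delete pod writer
  apply_claim_and_pod "$1"
  kubectl wait --for=condition=Ready pod/writer --timeout=120s
  # The file written by the first pod should still be readable
  kubectl exec writer -- cat /data/proof
}
```

Deleting the claim afterwards and watching what happens to the PV is a simple way to observe the class's reclaim policy in action.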
Chapter 6: Troubleshooting (30%), the heaviest domain
Troubleshooting is the single largest domain on the CKA, and it is the one that most rewards a strong mental model over memorised commands, because every problem is a little different. The skill being tested is methodical diagnosis: given something that does not work, find the layer that is failing and fix it, quickly.
A method, not a memorised fix
The reliable approach is to work outward from the symptom using the cluster model from Chapter 2. Start with kubectl get to see the state of the relevant objects, then kubectl describe to read the events and conditions that explain why an object is in that state, then kubectl logs to read what a container is actually saying. For node-level and control-plane problems you also drop to the host: checking the kubelet’s status and its logs through the system journal, and inspecting the static pod manifests that run the control plane components. The discipline is to confirm each layer before moving deeper rather than guessing, because guessing wastes the time this exam punishes.
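The layers above can be written down as two short helpers, one for the cluster side and one for the host side. This is a sketch, not a fixed recipe: the function names are hypothetical, the host helper assumes a systemd-based node, and the manifest directory is the kubeadm default.

```shell
# Hypothetical helpers; nothing runs until you call them.

# Cluster-side layers, run wherever kubectl works
diagnose_pod() {
  pod="$1"; ns="${2:-default}"
  kubectl get pod "$pod" -n "$ns" -o wide      # 1. current state
  kubectl describe pod "$pod" -n "$ns"         # 2. events and conditions
  kubectl logs "$pod" -n "$ns" --tail=50       # 3. what the container says
}

# Host-side layers, run on the affected node (assumes systemd)
diagnose_node_host() {
  systemctl status kubelet --no-pager
  journalctl -u kubelet --no-pager -n 50
  # Static pod manifests for the control plane (kubeadm default path)
  ls -l /etc/kubernetes/manifests
}
```

The order of the three kubectl calls is the point: each one is only worth running once the previous layer has failed to explain the symptom.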
The common failure patterns to drill
A handful of patterns cover most troubleshooting tasks, and you should be able to recognise each instantly. A pod stuck in Pending is usually a scheduling problem: no node has the resources it requests, or a taint is repelling it, or a PVC it needs is unbound. A pod in CrashLoopBackOff is starting and failing repeatedly, so the answer is in its logs and often a bad configuration or a failing command. A pod in ImagePullBackOff cannot fetch its image, so the image name, tag or registry access is wrong. A NotReady node points at the kubelet or the container runtime on that node. A Service that does not respond points at its endpoints and therefore its selector, as Chapter 4 described. As a teaching example of the method in action: faced with a pod that will not start, you would describe it to read the events (which might reveal an image pull error), and only if the events are clean would you move to logs, rather than jumping straight to editing YAML, because the events usually name the problem for you.
Monitoring and logs
The domain also covers reading cluster and application telemetry: using kubectl top to see resource usage where metrics are available, reading container logs including previous-instance logs for a crashed pod with the appropriate flag, and checking cluster events. Practise these until they are reflexive, because in a timed exam the speed at which you can surface the relevant log or event is the speed at which you can solve the task. Weight your overall study toward this domain, and deliberately break things in your practice cluster, then fix them, because manufactured failures are the best possible preparation for the heaviest part of the exam.
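The telemetry commands mentioned above, gathered into one drill. The function name telemetry_drill is hypothetical, and kubectl top only returns data where a metrics API such as metrics-server is installed.

```shell
# Hypothetical helper; nothing runs until you call it.
telemetry_drill() {
  pod="$1"; ns="${2:-default}"
  # Resource usage (requires a metrics API, for example metrics-server)
  kubectl top nodes
  kubectl top pods -n "$ns"
  # Logs from the previous, crashed instance of the container
  kubectl logs "$pod" -n "$ns" --previous
  # Events in the namespace, oldest first
  kubectl get events -n "$ns" --sort-by=.metadata.creationTimestamp
}
```

For a pod in CrashLoopBackOff the --previous logs are usually the useful ones, because the current instance may not have run long enough to write anything.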
Chapter 7: Getting fast with kubectl, the hidden exam skill
Speed is not a bonus on the CKA; it is a graded skill, because the clock is the main adversary. This chapter is about the habits that turn correct work into fast correct work, and it deserves as much practice as any single domain.
Imperative commands and generators
The biggest time saver is to stop writing YAML from scratch. kubectl can generate most objects for you: creating a pod, deployment, service, configmap or secret imperatively is faster than typing a manifest, and where you do need YAML, the --dry-run=client -o yaml flags let kubectl scaffold a correct manifest that you then edit rather than author. Learn to create an object imperatively, redirect the generated YAML to a file, make the one change the task needs, and apply it. This pattern alone reclaims minutes across a two-hour exam, which is often the difference between finishing and not.
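The generate, edit and apply pattern looks roughly like this. It is a sketch: the Deployment name web, the file name web-deploy.yaml and the function scaffold_drill are hypothetical, and the scripted edit uses GNU sed purely so the example is repeatable. At the terminal you would normally open the file in your editor instead.

```shell
# Hypothetical names; nothing runs until you call the function.
scaffold_drill() {
  # 1. Generate the manifest without creating anything in the cluster
  kubectl create deployment web --image=nginx:1.25 \
    --dry-run=client -o yaml > web-deploy.yaml
  # 2. Make the one change the task needs (here: replicas 1 -> 3).
  #    GNU sed syntax; in practice an editor does this step.
  sed -i 's/replicas: 1/replicas: 3/' web-deploy.yaml
  # 3. Apply the edited file
  kubectl apply -f web-deploy.yaml
}
```

The same two flags work with kubectl run for pods and kubectl expose for Services, so one habit covers most of the objects a task can ask for.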
Aliases, autocompletion and knowing the docs
Set up the small efficiencies at the start of the exam: alias kubectl to k, enable shell autocompletion, and set an environment variable for the default editor if you have a preference. Learn a handful of patterns cold: kubectl explain for the structure of a resource, label selectors for filtering kubectl get output, and the -n and --all-namespaces flags so you never operate in the wrong namespace. Finally, rather than memorising every manifest, learn where in the kubernetes.io documentation the canonical examples live, because the exam allows the docs and a few seconds to copy a known-good example beats minutes spent recalling exact syntax. The goal is to need the docs only for details, never to learn.
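A sketch of that setup, assuming a bash shell: the helper writes the lines to a small file that you then source in your interactive session. The file name cka-setup.sh, the function names, the choice of vim and the example label and namespace are hypothetical; the alias and completion lines follow the pattern in the kubectl documentation. The exam environment may already have some of this configured.

```shell
# Hypothetical helper: writes a setup file for an interactive bash session.
write_cka_setup() {
  cat > "${1:-./cka-setup.sh}" <<'EOF'
alias k=kubectl
source <(kubectl completion bash)          # completion for kubectl
complete -o default -F __start_kubectl k   # the same completion for the alias
export KUBE_EDITOR=vim                     # editor used by kubectl edit
EOF
}
# Afterwards, in the interactive shell:  source ./cka-setup.sh

# The patterns worth knowing cold (label and namespace are placeholders)
pattern_examples() {
  kubectl explain pod.spec.containers.resources   # structure of a field
  kubectl get pods -l app=web -n demo             # label selector plus namespace
  kubectl get pods --all-namespaces               # everything, everywhere
}
```

Typing these from memory at the start of every practice session is what makes them take under a minute on the day.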
Chapter 8: Study plan, mock exams and exam day
With the domains and the kubectl habits understood, the remaining work is pacing them so that troubleshooting and timed practice, the two things that decide results, are never squeezed out. The plan is built around living in a real cluster from day one.
Set up a real cluster first
Before studying any domain, build a cluster you can break freely. A multi-node cluster built with kubeadm is ideal because it matches the exam, but a local cluster with minikube or kind is fine for most workload, configuration and networking practice and costs nothing. The non-negotiable point is that everything you learn in this course should be typed into a real cluster, not read, because the exam scores outcomes, not understanding. Pair the cluster with the CNCF curriculum so the official objectives, not a random course, define your scope, and match your practice cluster to the current exam Kubernetes version.
Choose a timeline weighted toward troubleshooting
If you already work with Kubernetes, a three-week intensive at around fifteen hours a week is realistic: drill all five domains hands-on with troubleshooting threaded throughout, then move to timed practice. A more common pace is a six-week plan at about eight hours a week, taking one domain area at a time in a real cluster, practising failures continuously, and finishing with mocks. If you are newer to Kubernetes and Linux, an eight-week plan at around five hours a week spends the early weeks building container and command-line fundamentals before the exam domains. Whichever you choose, weight your hours toward Troubleshooting and then the architecture and networking domains, while keeping storage and the smaller workload tasks genuinely practised, and treat kubectl speed as its own recurring drill. To turn a timeline into dated weeks for your own start date, use the free study-plan generator.
Mock exams: train pacing and stamina
In the final stretch, sit full-length, timed practice exams that mirror the hands-on format, because the format is what trips people up and the only cure is rehearsal. Each timed session does two things: it builds the stamina the real two hours demand, and it surfaces the specific tasks you are slow on so you can drill them. Aim to finish comfortably within time on fresh task sets before you book, and after each mock, review not only what you got wrong but what you got right slowly, because slowness is the failure mode this exam specialises in. If you are deciding between the administrator and developer tracks before committing, the CKA vs CKAD comparison covers who each is for and how they overlap.
Exam day and format
On the day, the exam is two hours of performance-based tasks in a live terminal across multiple clusters, online-proctored, and you may consult the official Kubernetes documentation. Manage the clock deliberately. Set up your alias and autocompletion in the first minute, read each task for exactly what it asks and which cluster and namespace it applies to, and confirm you are operating in the right context before you act, because a perfect solution applied to the wrong cluster scores nothing. If a task resists you, note it and move on rather than letting it consume time other tasks need, since you only need 66% and unattempted easy tasks cost more than one hard task left unfinished. Verify your work where you can with a quick kubectl get or describe, because the exam scores whether things actually work. Having lived in a cluster for weeks and trained for speed, the format will feel like ordinary operational work under a clock, which is precisely the advantage your hands-on preparation was built to give you.
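The habit of confirming cluster and namespace before acting can be reduced to a few commands. This sketch assumes the environment switches clusters through kubeconfig contexts; if a task instead tells you to connect to a particular host, follow the task's own instructions. The function name start_task and its arguments are hypothetical.

```shell
# Hypothetical helper; nothing runs until you call it.
# Assumes clusters are selected through kubeconfig contexts.
start_task() {
  ctx="$1"; ns="${2:-default}"
  # Switch to the cluster the task names
  kubectl config use-context "$ctx"
  # Make the task's namespace the default for this context
  kubectl config set-context --current --namespace="$ns"
  # Confirm both before touching anything
  kubectl config current-context
  kubectl config view --minify -o jsonpath='{..namespace}'
}
```

Setting the namespace on the context saves typing -n on every command, but remember to set it again at the start of the next task so the previous value does not carry over.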