Kubernetes: Up and Running 2e
Ch3 Deploying a Kubernetes Cluster
The Kubernetes Client
kubectl is the official client tool: a command-line tool for interacting with the Kubernetes API.
$ kubectl version
Client Version: version.Info{Major:"1", Minor:"16", GitVersion:"v1.16.15", GitCommit:"2adc8d7091e89b6e3ca8d048140618ec89b39369", GitTreeState:"clean", BuildDate:"2020-09-02T11:40:00Z", GoVersion:"go1.13.15", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"16", GitVersion:"v1.16.15", GitCommit:"2adc8d7091e89b6e3ca8d048140618ec89b39369", GitTreeState:"clean", BuildDate:"2020-09-02T11:31:21Z", GoVersion:"go1.13.15", Compiler:"gc", Platform:"linux/amd64"}
The output shows two versions: the local kubectl version and the version of the Kubernetes API server. The two are compatible within one minor version of each other, e.g. if the API server is 1.20, kubectl can be 1.21, 1.20, or 1.19.
Cluster Components
The Kubernetes proxy routes network traffic to load-balanced services in the cluster and must run on every node. Kubernetes uses an API object named DaemonSet to accomplish this.
The Kubernetes proxy is responsible for routing network traffic to load-balanced services in the Kubernetes cluster. To do its job, the proxy must be present on every node in the cluster. Kubernetes has an API object named DaemonSet, which you will learn about later in the book, that is used in many clusters to accomplish this.
$ k get daemonSets --all-namespaces
NAMESPACE NAME DESIRED CURRENT READY UP-TO-DATE AVAILABLE NODE SELECTOR AGE
kube-system kube-flannel-ds 1 1 1 1 1 <none> 19h
kube-system kube-proxy 1 1 1 1 1 beta.kubernetes.io/os=linux 19h
Kubernetes DNS also runs as a replicated service on the cluster. Depending on the size of the cluster, you may see one or more DNS servers running.
Kubernetes also runs a DNS server, which provides naming and discovery for the services that are defined in the cluster. This DNS server also runs as a replicated service on the cluster. Depending on the size of your cluster, you may see one or more DNS servers running in your cluster.
The DNS server runs as a Deployment, which manages its replicas.
$ k get deployments --namespace=kube-system coredns
NAME READY UP-TO-DATE AVAILABLE AGE
coredns 2/2 2 2 19h
There is also a Service that performs load balancing for the DNS server.
$ kubectl get svc --namespace=kube-system
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
kube-dns ClusterIP 10.96.0.10 <none> 53/UDP,53/TCP,9153/TCP 19h
The Kubernetes UI runs as a single replica, but it is still managed by a Kubernetes Deployment for reliability and easy upgrades.
Ch4 Common kubectl Commands
Viewing Kubernetes API Objects
Everything in Kubernetes is represented by a RESTful resource. These resources are called Kubernetes objects, and each object exists at a unique HTTP path.
kubectl accesses Kubernetes objects by making HTTP requests to those paths.
Everything contained in Kubernetes is represented by a RESTful resource. Through‐out this book, we refer to these resources as Kubernetes objects. Each Kubernetes object exists at a unique HTTP path; for example, https://your-k8s.com/api/v1/name‐spaces/default/pods/my-pod leads to the representation of a Pod in the default name‐space named my-pod. The kubectl command makes HTTP requests to these URLs to access the Kubernetes objects that reside at these paths.
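To see those HTTP calls for yourself you can raise kubectl's log verbosity. A minimal sketch (the Pod name my-pod is hypothetical):
$ kubectl get pod my-pod -v=8
# At this verbosity kubectl logs the HTTP requests it issues,
# e.g. a GET against /api/v1/namespaces/default/pods/my-pod.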
Debugging Commands
exec: run a command inside a container:
$ kubectl exec -it <pod-name> -- bash
attach: attach to the running process; this still lets you interact with the container even if its image has no bash or other shell:
$ kubectl attach -it <pod-name>
NOTE:
If you port-forward to a Kubernetes Service, requests are still forwarded to only a single Pod in that Service; they do not go through the service load balancer.
You can also use the port-forward command with services by specifying services/<service-name> instead of <pod-name>, but note that if you do port-forward to a service, the requests will only ever be forwarded to a single Pod in that service. They will not go through the service load balancer.
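A minimal sketch of port-forwarding to a service (the service name and ports are hypothetical); traffic still lands on one Pod backing the service:
$ kubectl port-forward services/my-service 8080:80
# Local port 8080 now tunnels to port 80 of a single Pod selected by my-service.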
Ch5 Pods
Pods in Kubernetes
Each container within a Pod runs in its own cgroup, but they share a number of Linux namespaces.
Applications running in the same Pod share the same IP address and port space (network namespace), have the same hostname (UTS namespace), and can communicate using native interprocess communication channels over System V IPC or POSIX message queues (IPC namespace).
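A minimal sketch of two containers sharing a Pod's network namespace (the names and images are assumptions, not the book's example): the sidecar reaches the web server over localhost because both containers share one IP and port space.
apiVersion: v1
kind: Pod
metadata:
  name: shared-namespace-demo
spec:
  containers:
  - name: web
    image: nginx:1.21
    ports:
    - containerPort: 80
  - name: sidecar
    image: busybox:1.35
    # Shares the Pod's network namespace, so localhost:80 is the nginx container.
    command: ["sh", "-c", "while true; do wget -qO- http://localhost:80 >/dev/null; sleep 10; done"]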
Thinking with Pods
Takeaways:
- Split containers based on whether their resources need to be isolated from each other.
- Split Pods based on scaling strategy.
- Split Pods based on whether the containers must land on the same machine to work together.
Running Pods
Deleting a Pod
A Pod is not killed immediately when deleted because of the termination grace period (30 seconds by default), which lets the Pod finish the requests it is currently handling.
When a Pod is deleted, it is not immediately killed. Instead, if you run kubectl get pods you will see that the Pod is in the Terminating state. All Pods have a termination grace period. By default, this is 30 seconds. When a Pod is transitioned to Terminating it no longer receives new requests. In a serving scenario, the grace period is important for reliability because it allows the Pod to finish any active requests that it may be in the middle of processing before it is terminated.
It’s important to note that when you delete a Pod, any data stored in the containers associated with that Pod will be deleted as well. If you want to persist data across multiple instances of a Pod, you need to use PersistentVolumes, described at the end of this chapter.
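The grace period can be tuned per Pod with terminationGracePeriodSeconds. A minimal sketch (the Pod name and image are placeholders, not from the book):
apiVersion: v1
kind: Pod
metadata:
  name: graceful-demo
spec:
  # Default is 30 seconds; raise it if requests take longer to drain.
  terminationGracePeriodSeconds: 60
  containers:
  - name: app
    image: gcr.io/kuar-demo/kuard-amd64:blue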
Ch7 Service Discovery
What Is Service Discovery?
DNS is the traditional service-discovery system, but it is a poor fit for the dynamic environment of Kubernetes: ordinary DNS caching means clients keep talking to stale IPs, and that is hard to correct quickly.
The Domain Name System (DNS) is the traditional system of service discovery on the internet. DNS is designed for relatively stable name resolution with wide and efficient caching. It is a great system for the internet but falls short in the dynamic world of Kubernetes.
The Service Object
kubectl run: create a Kubernetes deployment.
kubectl expose: create a Service for a resource.
Just as the kubectl run command is an easy way to create a Kubernetes deployment, we can use kubectl expose to create a service.
The Service is assigned a cluster IP, and the system load-balances across the Pods matched by the selector.
To access the service, you can port-forward to one of its Pods.
that service is assigned a new type of virtual IP called a cluster IP. This is a special IP address the system will load-balance across all of the Pods that are identified by the selector.
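A minimal sketch of exposing a deployment and inspecting the resulting cluster IP (the deployment name my-app and port are hypothetical):
$ kubectl expose deployment my-app --port=8080
$ kubectl get service my-app
# The CLUSTER-IP column shows the virtual IP described above; EXTERNAL-IP stays <none>.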
Service DNS
Advanced Details
Endpoints
NOTE: The IP assigned to a Service and the IPs assigned to Pods come from different address ranges! The Endpoints point at the Pod IPs.
The Service has an IP starting with 10.110:
$ k get -n kubeflow svc/centraldashboard
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
centraldashboard ClusterIP 10.110.103.203 <none> 80/TCP 23h
The Pods get IPs starting with 10.244:
$ k describe -n kubeflow endpoints/centraldashboard
Name: centraldashboard
Namespace: kubeflow
Labels: app=centraldashboard
app.kubernetes.io/component=centraldashboard
app.kubernetes.io/instance=centraldashboard-v1.0.0
app.kubernetes.io/managed-by=kfctl
app.kubernetes.io/name=centraldashboard
app.kubernetes.io/part-of=kubeflow
app.kubernetes.io/version=v1.0.0
kustomize.component=centraldashboard
Annotations: <none>
Subsets:
Addresses: 10.244.0.29
NotReadyAddresses: <none>
Ports:
Name Port Protocol
---- ---- --------
<unset> 8082 TCP
Events: <none>
Manual Service Discovery
Kubernetes Services are built on top of label selectors, so you can do rudimentary service discovery by hand without a Service object at all.
Kubernetes services are built on top of label selectors over Pods. That means that you can use the Kubernetes API to do rudimentary service discovery without using a Service object at all! Let's demonstrate.
$ k get po -n kube-system -o wide --selector=k8s-app=kube-dns
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
coredns-5644d7b6d9-9bh5g 1/1 Running 0 24h 10.244.0.3 tom-g75jw <none> <none>
coredns-5644d7b6d9-v5ll9 1/1 Running 0 24h 10.244.0.2 tom-g75jw <none> <none>
kube-proxy and Cluster IPs
Cluster IPs are stable virtual IPs that load-balance traffic across all of the endpoints in a service. This magic is performed by a component running on every node in the cluster called the kube-proxy.
kube-proxy learns about new Services from the API server and rewrites the iptables rules on its node so that packets are redirected to one of the Service's endpoints.
In Figure 7-1, the kube-proxy watches for new services in the cluster via the API server. It then programs a set of iptables rules in the kernel of that host to rewrite the destinations of packets so they are directed at one of the endpoints for that service.
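If kube-proxy is running in iptables mode, you can peek at those rewrite rules on a node; a hedged sketch (assumes root access on the node and the standard KUBE-* chain names that kube-proxy creates):
$ sudo iptables-save | grep KUBE-SVC | head
# Traffic sent to a service's cluster IP hits a KUBE-SVC-* chain and is
# DNATed to one of the KUBE-SEP-* (endpoint) chains.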
The cluster IP is usually assigned by the API server, but the user can specify one when creating the Service. Once set, the cluster IP cannot be changed without deleting and recreating the Service object.
The cluster IP itself is usually assigned by the API server as the service is created. However, when creating the service, the user can specify a specific cluster IP. Once set, the cluster IP cannot be modified without deleting and recreating the Service object.
NOTE:
Make sure the cluster IP range and subnet mask do not overlap with the subnets assigned to the Docker bridges or the Kubernetes nodes.
The Kubernetes service address range is configured using the --service-cluster-ip-range flag on the kube-apiserver binary. The service address range should not overlap with the IP subnets and ranges assigned to each Docker bridge or Kubernetes node.
In addition, any explicit cluster IP requested must come from that range and not already be in use.
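A minimal sketch of requesting an explicit cluster IP (the name, selector, and address are assumptions; the address must fall inside the configured service range and not already be in use):
apiVersion: v1
kind: Service
metadata:
  name: my-service
spec:
  selector:
    app: my-app
  clusterIP: 10.96.100.100   # must come from --service-cluster-ip-range and be unused
  ports:
  - port: 80
    targetPort: 8080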
Cluster IP Environment Variables
Besides DNS, there is an older mechanism for finding cluster IPs: injected environment variables.
While most users should be using the DNS services to find cluster IPs, there are some older mechanisms that may still be in use. One of these is injecting a set of environment variables into Pods as they start up.
The Service must be created before the Pods that reference it, and the approach scales poorly to larger applications, so the author recommends DNS as the better option.
A problem with the environment variable approach is that it requires resources to be created in a specific order. The services must be created before the Pods that reference them. This can introduce quite a bit of complexity when deploying a set of services that make up a larger application. In addition, using just environment variables seems strange to many users. For this reason, DNS is probably a better option.
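For reference, a sketch of what the injected variables look like (Pod and service names plus the IP are hypothetical); the naming pattern is <SERVICE_NAME>_SERVICE_HOST and <SERVICE_NAME>_SERVICE_PORT:
$ kubectl exec my-pod -- env | grep MY_SERVICE
# MY_SERVICE_SERVICE_HOST=10.96.100.100
# MY_SERVICE_SERVICE_PORT=80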
Ch8 HTTP Load Balancing with Ingress
The external exposure provided by the Service object operates only at OSI Layer 4: it forwards TCP and UDP connections but cannot look inside them.
As described in Chapter 7, Kubernetes has a set of capabilities to enable services to be exposed outside of the cluster. For many users and simple use cases these capabilities are sufficient. But the Service object operates at Layer 4 (according to the OSI model). This means that it only forwards TCP and UDP connections and doesn't look inside of those connections.
For HTTP-based (OSI Layer 7) services, we can do better than exposing them with Service type: NodePort or type: LoadBalancer.
Because of this, hosting many applications on a cluster uses many different exposed services. In the case where these services are type: NodePort, you’ll have to have clients connect to a unique port per service. In the case where these services are type: LoadBalancer, you’ll be allocating (often expensive or scarce) cloud resources for each service. But for HTTP (Layer 7)-based services, we can do better.
The traditional non-Kubernetes solution is virtual hosting: a load balancer or reverse proxy accepts incoming HTTP/HTTPS connections, parses the Host header and URL path, and proxies the request to the corresponding upstream server.
When solving a similar problem in non-Kubernetes situations, users often turn to the idea of “virtual hosting.” This is a mechanism to host many HTTP sites on a single IP address. Typically, the user uses a load balancer or reverse proxy to accept incoming connections on HTTP (80) and HTTPS (443) ports. That program then parses the HTTP connection and, based on the Host header and the URL path that is requested, proxies the HTTP call to some other program. In this way, that load balancer or reverse proxy plays “traffic cop” for decoding and directing incoming connections to the right “upstream” server.
In Kubernetes, the HTTP-based load-balancing system is called Ingress!
Kubernetes calls its HTTP-based load-balancing system Ingress. Ingress is a Kubernetes-native way to implement the “virtual hosting” pattern we just discussed.
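A minimal Ingress sketch doing host-based routing (hostnames and backend service names are hypothetical; the book's examples use the older extensions/v1beta1 API, while this uses networking.k8s.io/v1):
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: host-ingress
spec:
  rules:
  - host: alpaca.example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: alpaca
            port:
              number: 8080
  - host: bandicoot.example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: bandicoot
            port:
              number: 8080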
Ingress Spec Versus Ingress Controllers
Ch9 ReplicaSets [TBD]
The official documentation recommends managing ReplicaSets through Deployments.
Deployment is an object which can own ReplicaSets and update them and their Pods via declarative, server-side rolling updates. While ReplicaSets can be used independently, today they're mainly used by Deployments as a mechanism to orchestrate Pod creation, deletion and updates. When you use Deployments you don't have to worry about managing the ReplicaSets that they create. Deployments own and manage their ReplicaSets. As such, it is recommended to use Deployments when you want ReplicaSets.
https://kubernetes.io/docs/concepts/workloads/controllers/replicaset/#deployment-recommended
Ch10 Deployments
The Deployment object exists to manage the rollout of new versions.
The Deployment object exists to manage the release of new versions.
Rollouts are performed by the deployment controller running inside the cluster, so a deployment is not affected by the network conditions of the machine the user is on; it can proceed unattended.
Using deployments you can simply and reliably roll out new software versions without downtime or errors. The actual mechanics of the software rollout performed by a deployment is controlled by a deployment controller that runs in the Kubernetes cluster itself. This means you can let a deployment proceed unattended and it will still operate correctly and safely.
Deployment Internals
ReplicaSets manage Pods; Deployments manage ReplicaSets.
ReplicaSets manage Pods, deployments manage ReplicaSets.
Relationships between Kubernetes objects are defined by labels and label selectors.
As with all relationships in Kubernetes, this relationship is defined by labels and a label selector.
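One hedged way to see that relationship (the deployment name and label are hypothetical): print the Deployment's selector, then list the ReplicaSets that carry the matching label.
$ kubectl get deployment my-app -o jsonpath='{.spec.selector.matchLabels}'
$ kubectl get replicasets --selector=app=my-app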
Updating Deployments
Scaling a Deployment
To scale a Deployment you can run kubectl scale directly, but the best practice is to change the replica count in the YAML definition and then update it with kubectl apply (both commands are sketched below):
spec:
  replicas: 3
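Both approaches, sketched with a hypothetical deployment name and manifest file:
$ kubectl scale deployment my-app --replicas=3   # imperative, quick but easy to drift from the manifest
$ kubectl apply -f my-app-deployment.yaml        # declarative, after editing spec.replicas in the file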
Updating a Container Image
TBD
Rollout History
TBD
Deployment Strategies
TBD
Ch11 DaemonSets
Generally, the motivation for replicating a Pod to every node is to land some sort of agent or daemon on each node, and the Kubernetes object for achieving this is the DaemonSet.
Using labels again, a DaemonSet can restrict its Pods to run only on specific nodes.
You can use labels to run DaemonSet Pods on specific nodes; for example, you may want to run specialized intrusion-detection software on nodes that are exposed to the edge network.
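A minimal DaemonSet sketch restricted to labeled nodes (the ssd=true label, names, and image are assumptions in the spirit of the book's example):
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: nginx-fast-storage
spec:
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      nodeSelector:
        ssd: "true"          # only nodes labeled ssd=true get a Pod
      containers:
      - name: nginx
        image: nginx:1.21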
Deleting a DaemonSet
Deleting a DaemonSet also deletes the Pods it manages by default; setting --cascade to false leaves the Pods in place.
Deleting a DaemonSet will also delete all the Pods being managed by that DaemonSet. Set the --cascade flag to false to ensure only the DaemonSet is deleted and not the Pods.
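A sketch (the DaemonSet name is hypothetical; newer kubectl releases spell this --cascade=orphan):
$ kubectl delete daemonset fluentd --cascade=false
# The DaemonSet object is removed, but its Pods keep running.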
Ch12 Jobs
When you need to run short-lived or one-off tasks, you need the Job object.
While long-running processes make up the large majority of workloads that run on a Kubernetes cluster, there is often a need to run short-lived, one-off tasks. The Job object is made for handling these types of tasks.
A Job creates one or more Pods that run until they terminate successfully (exit 0).
A job creates Pods that run until successful termination (i.e., exit with 0).
Job Patterns
One Shot
One-shot jobs provide a way to run a single Pod once until successful termination.
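A minimal one-shot Job sketch (name, image, and command are placeholders, not the book's kuard example):
apiVersion: batch/v1
kind: Job
metadata:
  name: oneshot
spec:
  template:
    spec:
      restartPolicy: OnFailure   # Jobs may not use the default Always policy
      containers:
      - name: worker
        image: busybox:1.35
        command: ["sh", "-c", "echo doing one-off work && sleep 5"]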
A nonzero exit code is not the only failure mode: a worker can also get stuck and make no progress. In that case a liveness probe is needed to decide whether the Pod should be restarted or replaced.
But workers can fail in other ways. Specifically, they can get stuck and not make any forward progress. To help cover this case, you can use liveness probes with jobs. If the liveness probe policy determines that a Pod is dead, it'll be restarted/replaced for you.
Parallelism
Example: each Pod generates 10 keys, and running Pods in parallel speeds up generating 100 keys in total.
Parameters: completions sets how many successful completions are required; parallelism limits how many Pods run at the same time. A sketch follows below.
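A hedged sketch of that pattern: 10 completions, at most 5 Pods at a time (image and command are placeholders):
apiVersion: batch/v1
kind: Job
metadata:
  name: parallel-keygen
spec:
  completions: 10     # run until 10 Pods have succeeded
  parallelism: 5      # never more than 5 Pods at once
  template:
    spec:
      restartPolicy: OnFailure
      containers:
      - name: keygen
        image: busybox:1.35
        command: ["sh", "-c", "echo generating 10 keys"]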
Work Queues
The author's example:
- Create a ReplicaSet that runs the work queue.
- Create a Service to expose the work queue to the cluster.
- Use curl to simulate the producer.
- Create a consumer Job that runs until the queue is drained.
CronJob
Creates Jobs on a schedule.
Sometimes you want to schedule a job to be run at a certain interval. To achieve this you can declare a CronJob in Kubernetes, which is responsible for creating a new Job object at a particular interval.
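A minimal CronJob sketch (schedule, names, and image are assumptions; the book targets the older batch/v1beta1 API, while newer clusters use batch/v1):
apiVersion: batch/v1
kind: CronJob
metadata:
  name: example-cron
spec:
  schedule: "*/30 * * * *"       # standard cron syntax: every 30 minutes
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: OnFailure
          containers:
          - name: batch-job
            image: busybox:1.35
            command: ["sh", "-c", "echo periodic work"]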
Ch13 ConfigMaps and Secrets
The point is that a single image can be reused across stages, and even across applications, without rebuilding it.
It is a good practice to make container images as reusable as possible. The same image should be able to be used for development, staging, and production. It is even better if the same image is general-purpose enough to be used across applications and services.
ConfigMap
The ConfigMap is combined with the Pod right before it runs. That means the container image and the Pod definition can be reused across many applications and services just by changing the ConfigMap.
The key thing is that the ConfigMap is combined with the Pod right before it is run. This means that the container image and the Pod definition itself can be reused across many apps by just changing the ConfigMap that is used.
Creating ConfigMaps
It's just creating key-value pairs:
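A minimal sketch (the file and key names are illustrative):
$ kubectl create configmap my-config \
    --from-file=my-config.txt \
    --from-literal=extra-param=extra-value \
    --from-literal=another-param=another-value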
Using a ConfigMap
- Filesystem: mount the ConfigMap into the Pod as a directory; each key becomes a filename and the file's contents are the value.
- Environment variable: set environment variables dynamically.
- Command-line argument: build command-line arguments dynamically (a combined sketch of all three follows below).
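A combined sketch of all three consumption styles in one Pod (modeled loosely on the book's kuard example; the image, key names, and mount path are assumptions):
apiVersion: v1
kind: Pod
metadata:
  name: configmap-demo
spec:
  restartPolicy: Never
  containers:
  - name: app
    image: gcr.io/kuar-demo/kuard-amd64:blue
    command:
    - "/kuard"
    - "$(EXTRA_PARAM)"                # command-line argument built from an env var
    env:
    - name: EXTRA_PARAM               # environment variable filled from a ConfigMap key
      valueFrom:
        configMapKeyRef:
          name: my-config
          key: extra-param
    volumeMounts:
    - name: config-volume             # filesystem: keys appear as files under /config
      mountPath: /config
  volumes:
  - name: config-volume
    configMap:
      name: my-config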
Secret
NOTE
By default, Kubernetes stores secrets in plain text in etcd. Anyone with cluster-admin rights can read every secret in the cluster. Newer versions of Kubernetes support encrypting secrets with a user-supplied key, usually integrated with a cloud key store, which improves the security of sensitive data.
By default, Kubernetes secrets are stored in plain text in the etcd storage for the cluster. Depending on your requirements, this may not be sufficient security for you. In particular, anyone who has cluster administration rights in your cluster will be able to read all of the secrets in the cluster. In recent versions of Kubernetes, support has been added for encrypting the secrets with a user-supplied key, generally integrated into a cloud key store. Additionally, most cloud key stores have integration with Kubernetes flexible volumes, enabling you to skip Kubernetes secrets entirely and rely exclusively on the cloud provider's key store. All of these options should provide you with sufficient tools to craft a security profile that suits your needs.
Creating Secrets
$ kubectl create secret generic kuard-tls \
--from-file=kuard.crt \
--from-file=kuard.key
$ kubectl describe secrets kuard-tls
Name: kuard-tls
Namespace: default
Labels: <none>
Annotations: <none>
Type: Opaque
Data
====
kuard.crt: 1050 bytes
kuard.key: 1679 bytes
Consuming Secrets
Secrets volumes
Secrets volumes are managed by the kubelet and created at Pod creation time. Secrets are stored on tmpfs volumes (RAM disks), so they are never written to disk on the node.
Secret data can be exposed to Pods using the secrets volume type. Secrets volumes are managed by the kubelet and are created at Pod creation time. Secrets are stored on tmpfs volumes (aka RAM disks), and as such are not written to disk on nodes.
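A sketch of consuming the kuard-tls secret created above as a volume (the mount path and image are assumptions):
apiVersion: v1
kind: Pod
metadata:
  name: kuard-tls
spec:
  containers:
  - name: kuard-tls
    image: gcr.io/kuar-demo/kuard-amd64:blue
    volumeMounts:
    - name: tls-certs
      mountPath: /tls        # kuard.crt and kuard.key appear as files here
      readOnly: true
  volumes:
  - name: tls-certs
    secret:
      secretName: kuard-tls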
Private Docker Registries
Kubernetes provides an additional mechanism for handling the credentials needed to docker pull from a private registry: image pull secrets.
$ k create secret --help
Create a secret using specified subcommand.
Available Commands:
docker-registry Create a secret for use with a Docker registry
generic Create a secret from a local file, directory or literal value
tls Create a TLS secret
Usage:
kubectl create secret [flags] [options]
$ kubectl create secret docker-registry my-image-pull-secret \
--docker-username=<username> \
--docker-password=<password> \
--docker-email=<email-address>
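The secret is then referenced from the Pod spec via imagePullSecrets; a sketch (the registry and image are placeholders):
apiVersion: v1
kind: Pod
metadata:
  name: private-image-demo
spec:
  imagePullSecrets:
  - name: my-image-pull-secret   # the secret created above
  containers:
  - name: app
    image: <registry>/<image>:<tag>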
Naming Constraints
ConfigMap values are stored as UTF-8 text; as of Kubernetes 1.6, ConfigMaps cannot store binary data.
Secret values are stored base64-encoded (encoded, not encrypted), which makes it possible to store binary data.
The maximum size of a ConfigMap or Secret is 1 MB.
ConfigMap data values are simple UTF-8 text specified directly in the manifest. As of Kubernetes 1.6, ConfigMaps are unable to store binary data. Secret data values hold arbitrary data encoded using base64. The use of base64 encoding makes it possible to store binary data. This does, however, make it more difficult to manage secrets that are stored in YAML files as the base64-encoded value must be put in the YAML. Note that the maximum size for a ConfigMap or secret is 1 MB.
Managing ConfigMaps and Secrets
Once updated, new values are pushed automatically to every volume that uses that ConfigMap or Secret. However, Kubernetes currently has no built-in way to signal the application; it is up to the application's implementation to detect the changed config files and reload them.
Once a ConfigMap or secret is updated using the API, it’ll be automatically pushed to all volumes that use that ConfigMap or secret. It may take a few seconds, but the file listing and contents of the files, as seen by kuard, will be updated with these new values. Using this live update feature you can update the configuration of applications without restarting them.
Currently there is no built-in way to signal an application when a new version of a ConfigMap is deployed. It is up to the application (or some helper script) to look for the config files to change and reload them.
Ch14 Role-Based Access Control for Kubernetes
Role-Based Access Control
Identity in Kubernetes
Every request that comes to Kubernetes is associated with some identity. Even a request with no identity is associated with the system:unauthenticated group.
Kubernetes supports a number of different authentication providers, including:
- HTTP Basic Authentication (largely deprecated)
- x509 client certificates
- Static token files on the host
- Cloud authentication providers
- Authentication webhooks
Understanding Roles and Role Bindings
A role is a set of abstract capabilities. For example, the appdev role might represent the ability to create Pods and services. A role binding is an assignment of a role to one or more identities. Thus, binding the appdev role to the user identity alice indicates that Alice has the ability to create Pods and services.
Roles and Role Bindings in Kubernetes
The author's example: first create a Role, then create RoleBindings (a sketch is included below).
TBD
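A hedged sketch of that flow (the role name, namespace, and user are assumptions): a Role granting access to Pods and services, bound to the user alice.
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: pods-and-services
  namespace: default
rules:
- apiGroups: [""]
  resources: ["pods", "services"]
  verbs: ["create", "delete", "get", "list", "patch", "update", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: pods-and-services-binding
  namespace: default
subjects:
- apiGroup: rbac.authorization.k8s.io
  kind: User
  name: alice
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: pods-and-services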