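Installing Kubernetes 1.15 with kubeadm

1 Preparation

1.1 System configuration

Prepare two machines and turn off the firewall on both. In this walkthrough node1 is 192.168.110.130 and node2 is 192.168.110.131, and /etc/hosts on each machine contains both entries:
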
cat /etc/hosts
192.168.110.130 node1
192.168.110.131 node2

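Stop the firewall with systemctl stop firewalld (add systemctl disable firewalld so that it stays off after the reboot below), or leave it running and open the required ports as described at https://kubernetes.io/docs/setup/independent/install-kubeadm/.

Disable SELinux, then reboot:

setenforce 0                     # 0 = permissive, 1 = enforcing (lost on reboot)
vi /etc/selinux/config           # set the line below so the change survives the reboot
SELINUX=disabled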

Create the file /etc/sysctl.d/k8s.conf with the following content:

net.bridge.bridge-nf-call-ip6tables = 1
net.bridge.bridge-nf-call-iptables = 1
net.ipv4.ip_forward = 1

Run the following commands to apply the change:

modprobe br_netfilter             # load the br_netfilter kernel module
sysctl -p /etc/sysctl.d/k8s.conf  # load kernel parameters from the given file
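
Before moving on it can help to confirm that the file really enables all three settings. The following is only a minimal sketch: the function name missing_sysctl_keys is made up for this example, and it inspects the file rather than the live kernel values.

```shell
#!/bin/sh
# The three keys this section puts into /etc/sysctl.d/k8s.conf.
REQUIRED_SYSCTL_KEYS="net.bridge.bridge-nf-call-ip6tables net.bridge.bridge-nf-call-iptables net.ipv4.ip_forward"

# Hypothetical helper: print every required key that the given file
# does not set to 1. Prints nothing when all three are enabled.
missing_sysctl_keys() {
    for key in $REQUIRED_SYSCTL_KEYS; do
        # Accept "key = 1" with optional spaces around "=".
        grep -Eq "^[[:space:]]*${key}[[:space:]]*=[[:space:]]*1[[:space:]]*\$" "$1" || echo "$key"
    done
}

# Usage on a node (not executed here):
#   missing_sysctl_keys /etc/sysctl.d/k8s.conf
```

An empty result means the file is complete. The values the kernel is actually using can be read back with sysctl followed by the key name.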

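1.2 Prerequisites for ipvs mode in kube-proxy

kube-proxy is a key Kubernetes component. For each Service it exposes a virtual IP (VIP) that stays the same however the backends behind it (Pods, Endpoints) change, and it load-balances traffic across those backends.
kube-proxy has three modes: userspace, iptables, and ipvs. This installation uses ipvs (IP Virtual Server) mode, which needs the following kernel modules loaded:
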
ip_vs
ip_vs_rr
ip_vs_wrr
ip_vs_sh
nf_conntrack_ipv4

On every Kubernetes node (here just node1 and node2), run the following script:

cat > /etc/sysconfig/modules/ipvs.modules <<EOF
#!/bin/bash
modprobe -- ip_vs
modprobe -- ip_vs_rr
modprobe -- ip_vs_wrr
modprobe -- ip_vs_sh
modprobe -- nf_conntrack_ipv4
EOF
chmod 755 /etc/sysconfig/modules/ipvs.modules && bash /etc/sysconfig/modules/ipvs.modules && lsmod | grep -e ip_vs -e nf_conntrack_ipv4

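The script creates /etc/sysconfig/modules/ipvs.modules so that the required modules are loaded automatically after a reboot. Use lsmod | grep -e ip_vs -e nf_conntrack_ipv4 to check that the modules are loaded.
Also make sure the ipset package is installed on every node (it usually ships with the system): yum install ipset. To inspect the ipvs proxy rules it is worth installing the ipvsadm management tool as well: yum install ipvsadm.
If these prerequisites are not met, kube-proxy falls back to iptables mode even when its configuration enables ipvs.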
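
Since a missing module silently leads to that fallback, a small check can save time later. This is a rough sketch, not part of the original setup: missing_ipvs_modules is a name invented here, and it reads lsmod output from standard input so it can be tried on captured output as well.

```shell
#!/bin/sh
# Module names taken from the list earlier in this section.
REQUIRED_MODULES="ip_vs ip_vs_rr ip_vs_wrr ip_vs_sh nf_conntrack_ipv4"

# Hypothetical helper: read `lsmod` output on stdin and print each
# required module that does not appear in the first column.
missing_ipvs_modules() {
    loaded=$(awk '{ print $1 }')
    for mod in $REQUIRED_MODULES; do
        # -x matches the whole line, so ip_vs does not match ip_vs_rr.
        printf '%s\n' "$loaded" | grep -qx "$mod" || echo "$mod"
    done
}

# Usage on a node (not executed here):
#   lsmod | missing_ipvs_modules
```

No output means all five modules are loaded.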

1.3 Installing Docker

Add the Docker yum repository (skip this if a matching Docker version is already installed):

yum install -y yum-utils device-mapper-persistent-data lvm2
yum-config-manager \
--add-repo \
https://download.docker.com/linux/centos/docker-ce.repo

List the available Docker versions:

yum list docker-ce.x86_64  --showduplicates |sort -r

docker-ce.x86_64 3:18.09.7-3.el7 docker-ce-stable
docker-ce.x86_64 3:18.09.7-3.el7 @docker-ce-stable
docker-ce.x86_64 3:18.09.6-3.el7 docker-ce-stable
docker-ce.x86_64 3:18.09.5-3.el7 docker-ce-stable
docker-ce.x86_64 3:18.09.4-3.el7 docker-ce-stable
docker-ce.x86_64 3:18.09.3-3.el7 docker-ce-stable
docker-ce.x86_64 3:18.09.2-3.el7 docker-ce-stable
docker-ce.x86_64 3:18.09.1-3.el7 docker-ce-stable
docker-ce.x86_64 3:18.09.0-3.el7 docker-ce-stable
docker-ce.x86_64 18.06.3.ce-3.el7 docker-ce-stable

Kubernetes 1.15 currently lists Docker 1.13.1, 17.03, 17.06, 17.09, 18.06, and 18.09 as supported. Install Docker 18.09.7 on each node:

yum makecache fast

yum install -y --setopt=obsoletes=0 \
docker-ce-18.09.7-3.el7

systemctl start docker
systemctl enable docker

Confirm that the default policy of the FORWARD chain in the iptables filter table is ACCEPT:

iptables -nvL

Chain INPUT (policy ACCEPT 263 packets, 19209 bytes)
pkts bytes target prot opt in out source destination

Chain FORWARD (policy ACCEPT 0 packets, 0 bytes)
pkts bytes target prot opt in out source destination
0 0 DOCKER-USER all -- * * 0.0.0.0/0 0.0.0.0/0
0 0 DOCKER-ISOLATION-STAGE-1 all -- * * 0.0.0.0/0 0.0.0.0/0
0 0 ACCEPT all -- * docker0 0.0.0.0/0 0.0.0.0/0 ctstate RELATED,ESTABLISHED
0 0 DOCKER all -- * docker0 0.0.0.0/0 0.0.0.0/0
0 0 ACCEPT all -- docker0 !docker0 0.0.0.0/0 0.0.0.0/0
0 0 ACCEPT all -- docker0 docker0 0.0.0.0/0 0.0.0.0/0

If the policy is anything else, fix it with iptables -P FORWARD ACCEPT.
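
To avoid reading the listing by eye on every node, the policy can be pulled out of the output. A minimal sketch, assuming the "Chain FORWARD (policy ...)" header format shown above; forward_policy is a name chosen for this example.

```shell
#!/bin/sh
# Hypothetical helper: read `iptables -nvL` output on stdin and print
# the default policy of the FORWARD chain (for example ACCEPT or DROP).
forward_policy() {
    sed -n 's/^Chain FORWARD (policy \([A-Z]*\).*/\1/p'
}

# Usage on a node, as root (not executed here):
#   [ "$(iptables -nvL | forward_policy)" = "ACCEPT" ] || iptables -P FORWARD ACCEPT
```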

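1.4 Changing the Docker cgroup driver to systemd

According to the CRI installation document, on Linux distributions that use systemd as the init system, using systemd as Docker's cgroup driver keeps nodes more stable under resource pressure. So change Docker's cgroup driver to systemd on every node.

Create or edit /etc/docker/daemon.json (this step matters; a mistake here causes trouble later):
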
cat > /etc/docker/daemon.json <<EOF
{
"exec-opts": ["native.cgroupdriver=systemd"]
}
EOF

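Restart Docker and make sure it is enabled at boot (in my experience initialization fails if Docker is not enabled at boot), then check the cgroup driver:
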
systemctl restart docker

docker info | grep Cgroup
Cgroup Driver: systemd

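2 Deploying Kubernetes with kubeadm

2.1 Installing kubeadm and kubelet

Install kubeadm and kubelet on every node. The packages are served from Google, so you need a proxy if Google is not reachable from your network. (I looked for an Alibaba Cloud mirror address but could not get one to work.)
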
cat <<EOF > /etc/yum.repos.d/kubernetes.repo
[kubernetes]
name=Kubernetes
baseurl=https://packages.cloud.google.com/yum/repos/kubernetes-el7-x86_64
enabled=1
gpgcheck=1
repo_gpgcheck=1
gpgkey=https://packages.cloud.google.com/yum/doc/yum-key.gpg https://packages.cloud.google.com/yum/doc/rpm-package-key.gpg
EOF

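Install the packages. When this was written the command below pulled 1.15.0; to get the same version later, pin it in the package names, for example kubelet-1.15.0.
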
yum makecache fast
yum install -y kubelet kubeadm kubectl

...
Installed:
kubeadm.x86_64 0:1.15.0-0 kubectl.x86_64 0:1.15.0-0 kubelet.x86_64 0:1.15.0-0

Dependency Installed:
cri-tools.x86_64 0:1.13.0-0 kubernetes-cni.x86_64 0:0.7.5-0

Complete!

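The output shows that cri-tools and kubernetes-cni were installed as dependencies.

Since Kubernetes 1.8, swap must be turned off; with swap on, kubelet does not start under its default configuration. Turn swap off as follows:
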
swapoff -a

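Edit /etc/fstab and comment out the swap mount so that swap is not re-enabled at boot, then confirm with free -m that swap is off. To adjust swappiness, add the following line to /etc/sysctl.d/k8s.conf:
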
vm.swappiness=0

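Run sysctl -p /etc/sysctl.d/k8s.conf to apply the change.
The two test hosts used here also run other services, and turning swap off might affect them, so instead I lift the restriction in kubelet's configuration. The kubelet flag --fail-swap-on=false removes the requirement that swap be off. Add it to /etc/sysconfig/kubelet:
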
KUBELET_EXTRA_ARGS=--fail-swap-on=false
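
For hosts where swap is turned off for good, the /etc/fstab edit mentioned above can be scripted rather than done by hand. The following is a sketch under the assumption that swap entries have swap as their third (filesystem type) field; comment_out_swap is a made-up name, and it writes to standard output instead of editing the file in place.

```shell
#!/bin/sh
# Hypothetical helper: print the given fstab-format file with every
# active swap entry commented out. Other lines pass through unchanged.
comment_out_swap() {
    awk '$1 !~ /^#/ && $3 == "swap" { print "#" $0; next } { print }' "$1"
}

# Usage on a node, as root, keeping a backup (not executed here):
#   cp /etc/fstab /etc/fstab.bak
#   comment_out_swap /etc/fstab.bak > /etc/fstab
```

Writing from the backup copy back into /etc/fstab keeps the original around in case the result needs to be reverted.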

2.2 Initializing the cluster with kubeadm init

Enable the kubelet service at boot on every node:

systemctl enable kubelet.service

kubeadm config print init-defaults prints the default configuration used for cluster initialization:

apiVersion: kubeadm.k8s.io/v1beta2
bootstrapTokens:
- groups:
  - system:bootstrappers:kubeadm:default-node-token
  token: abcdef.0123456789abcdef
  ttl: 24h0m0s
  usages:
  - signing
  - authentication
kind: InitConfiguration
localAPIEndpoint:
  advertiseAddress: 1.2.3.4
  bindPort: 6443
nodeRegistration:
  criSocket: /var/run/dockershim.sock
  name: localhost.localdomain
  taints:
  - effect: NoSchedule
    key: node-role.kubernetes.io/master
---
apiServer:
  timeoutForControlPlane: 4m0s
apiVersion: kubeadm.k8s.io/v1beta2
certificatesDir: /etc/kubernetes/pki
clusterName: kubernetes
controllerManager: {}
dns:
  type: CoreDNS
etcd:
  local:
    dataDir: /var/lib/etcd
imageRepository: k8s.gcr.io
kind: ClusterConfiguration
kubernetesVersion: v1.14.0
networking:
  dnsDomain: cluster.local
  serviceSubnet: 10.96.0.0/12
scheduler: {}

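The defaults show that imageRepository can be used to choose where the images Kubernetes needs are pulled from during initialization. Starting from the defaults, write the configuration file for this installation, kubeadm.yaml (a new file; any location works; set advertiseAddress to your master's own IP):
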
apiVersion: kubeadm.k8s.io/v1beta2
kind: InitConfiguration
localAPIEndpoint:
  advertiseAddress: 192.168.110.128
  bindPort: 6443
nodeRegistration:
  taints:
  - effect: PreferNoSchedule
    key: node-role.kubernetes.io/master
---
apiVersion: kubeadm.k8s.io/v1beta2
kind: ClusterConfiguration
kubernetesVersion: v1.15.0
networking:
  podSubnet: 10.244.0.0/16

Before initializing, you can run kubeadm config images pull on each node to pull the Docker images Kubernetes needs in advance.

Now initialize the cluster with kubeadm. node1 is the master node, so run the following on node1:

kubeadm init --config kubeadm.yaml --ignore-preflight-errors=Swap,NumCPU


...
[init] Using Kubernetes version: v1.15.0
[preflight] Running pre-flight checks
[WARNING NumCPU]: the number of available CPUs 1 is less than the required 2
[WARNING IsDockerSystemdCheck]: detected "cgroupfs" as the Docker cgroup driver. The recommended driver is "systemd". Please follow the guide at https://kubernetes.io/docs/setup/cri/
[preflight] Pulling images required for setting up a Kubernetes cluster
[preflight] This might take a minute or two, depending on the speed of your internet connection
[preflight] You can also perform this action in beforehand using 'kubeadm config images pull'
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Activating the kubelet service
[certs] Using certificateDir folder "/etc/kubernetes/pki"
[certs] Generating "front-proxy-ca" certificate and key
[certs] Generating "front-proxy-client" certificate and key
[certs] Generating "etcd/ca" certificate and key
[certs] Generating "etcd/server" certificate and key
[certs] etcd/server serving cert is signed for DNS names [localhost.localdomain localhost] and IPs [192.168.110.128 127.0.0.1 ::1]
[certs] Generating "etcd/peer" certificate and key
[certs] etcd/peer serving cert is signed for DNS names [localhost.localdomain localhost] and IPs [192.168.110.128 127.0.0.1 ::1]
[certs] Generating "apiserver-etcd-client" certificate and key
[certs] Generating "etcd/healthcheck-client" certificate and key
[certs] Generating "ca" certificate and key
[certs] Generating "apiserver" certificate and key
[certs] apiserver serving cert is signed for DNS names [localhost.localdomain kubernetes kubernetes.default kubernetes.default.svc kubernetes.default.svc.cluster.local] and IPs [10.96.0.1 192.168.110.128]
[certs] Generating "apiserver-kubelet-client" certificate and key
[certs] Generating "sa" key and public key
[kubeconfig] Using kubeconfig folder "/etc/kubernetes"
[kubeconfig] Writing "admin.conf" kubeconfig file
[kubeconfig] Writing "kubelet.conf" kubeconfig file
[kubeconfig] Writing "controller-manager.conf" kubeconfig file
[kubeconfig] Writing "scheduler.conf" kubeconfig file
[control-plane] Using manifest folder "/etc/kubernetes/manifests"
[control-plane] Creating static Pod manifest for "kube-apiserver"
[control-plane] Creating static Pod manifest for "kube-controller-manager"
[control-plane] Creating static Pod manifest for "kube-scheduler"
[etcd] Creating static Pod manifest for local etcd in "/etc/kubernetes/manifests"
[wait-control-plane] Waiting for the kubelet to boot up the control plane as static Pods from directory "/etc/kubernetes/manifests". This can take up to 4m0s
[apiclient] All control plane components are healthy after 22.501928 seconds
[upload-config] Storing the configuration used in ConfigMap "kubeadm-config" in the "kube-system" Namespace
[kubelet] Creating a ConfigMap "kubelet-config-1.15" in namespace kube-system with the configuration for the kubelets in the cluster
[upload-certs] Skipping phase. Please see --upload-certs
[mark-control-plane] Marking the node localhost.localdomain as control-plane by adding the label "node-role.kubernetes.io/master=''"
[mark-control-plane] Marking the node localhost.localdomain as control-plane by adding the taints [node-role.kubernetes.io/master:PreferNoSchedule]
[bootstrap-token] Using token: fbd9w8.6j1yjer52w2po3q4
[bootstrap-token] Configuring bootstrap tokens, cluster-info ConfigMap, RBAC Roles
[bootstrap-token] configured RBAC rules to allow Node Bootstrap tokens to post CSRs in order for nodes to get long term certificate credentials
[bootstrap-token] configured RBAC rules to allow the csrapprover controller automatically approve CSRs from a Node Bootstrap Token
[bootstrap-token] configured RBAC rules to allow certificate rotation for all node client certificates in the cluster
[bootstrap-token] Creating the "cluster-info" ConfigMap in the "kube-public" namespace
[addons] Applied essential addon: CoreDNS
[addons] Applied essential addon: kube-proxy


Your Kubernetes control-plane has initialized successfully!

To start using your cluster, you need to run the following as a regular user:

mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config

You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
https://kubernetes.io/docs/concepts/cluster-administration/addons/

Then you can join any number of worker nodes by running the following on each as root:

kubeadm join 192.168.110.130:6443 --token luxlja.y8td78jdh7immmfd \
--discovery-token-ca-cert-hash sha256:ab640c9f8a8497c23bb91454e4734a6642aaa3c6b622c0158a94a8b5ab29bb85

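When this command runs, kubeadm checks that the environment meets its requirements, for example at least 2 CPU cores. My virtual machine has only one core, so I added NumCPU to --ignore-preflight-errors; without it initialization fails.
The complete initialization output is recorded above. It shows the key steps that a manual installation of a Kubernetes cluster would involve:

  • [kubelet-start] writes the kubelet configuration file "/var/lib/kubelet/config.yaml"
  • [certs] generates the various certificates
  • [kubeconfig] generates the kubeconfig files
  • [control-plane] creates the static Pods for apiserver, controller-manager, and scheduler from the YAML files in /etc/kubernetes/manifests
  • [bootstrap-token] generates the token; note it down, because kubeadm join needs it later when nodes are added to the cluster
  • The following commands set up kubectl access to the cluster for a regular user. (They must be run on every machine where kubectl is used; on a worker, admin.conf has to be copied over from the master first.)
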
    mkdir -p $HOME/.kube
    sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
    sudo chown $(id -u):$(id -g) $HOME/.kube/config
  • Finally, the output gives the command for joining nodes to the cluster:

    kubeadm join 192.168.110.130:6443 --token luxlja.y8td78jdh7immmfd \
    --discovery-token-ca-cert-hash sha256:ab640c9f8a8497c23bb91454e4734a6642aaa3c6b622c0158a94a8b5ab29bb85

Check the cluster status and confirm that every component is Healthy:

kubectl get cs

...
NAME STATUS MESSAGE ERROR
controller-manager Healthy ok
scheduler Healthy ok
etcd-0 Healthy {"health":"true"}

If initialization runs into problems, clean up with the following commands (skip this if everything worked):

kubeadm reset
ifconfig cni0 down
ip link delete cni0
ifconfig flannel.1 down
ip link delete flannel.1
rm -rf /var/lib/cni/

2.3 Installing the Pod network

Next, install the flannel network add-on:

mkdir -p ~/k8s/
cd ~/k8s
curl -O https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml
kubectl apply -f kube-flannel.yml

...
clusterrole.rbac.authorization.k8s.io/flannel created
clusterrolebinding.rbac.authorization.k8s.io/flannel created
serviceaccount/flannel created
configmap/kube-flannel-cfg created
daemonset.extensions/kube-flannel-ds-amd64 created
daemonset.extensions/kube-flannel-ds-arm64 created
daemonset.extensions/kube-flannel-ds-arm created
daemonset.extensions/kube-flannel-ds-ppc64le created
daemonset.extensions/kube-flannel-ds-s390x created

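(kubectl apply -f kube-flannel.yml installs flannel. Likewise, if the installation goes wrong and has to be redone, remove it with kubectl delete -f kube-flannel.yml and try again.)
If a node has more than one network interface, edit kube-flannel.yml and add --iface=<iface-name> to the flanneld arguments:
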
containers:
- name: kube-flannel
  image: quay.io/coreos/flannel:v0.11.0-amd64
  command:
  - /opt/bin/flanneld
  args:
  - --ip-masq
  - --kube-subnet-mgr
  - --iface=ens33
......

Check that the master is Ready:

kubectl get nodes

...
NAME STATUS ROLES AGE VERSION
localhost.localdomain Ready master 75m v1.15.0

Use kubectl get pod --all-namespaces -o wide to make sure every Pod is Running.
(If any Pod is in another state, check that the machine was rebooted after SELinux was disabled.)

kubectl get pod -n kube-system

...
NAME READY STATUS RESTARTS AGE
coredns-5c98db65d4-f9jrh 1/1 Running 0 79m
coredns-5c98db65d4-w6cpx 1/1 Running 0 79m
etcd-localhost.localdomain 1/1 Running 1 78m
kube-apiserver-localhost.localdomain 1/1 Running 1 78m
kube-controller-manager-localhost.localdomain 1/1 Running 1 78m
kube-flannel-ds-amd64-dqv67 1/1 Running 0 20m
kube-proxy-c7cwv 1/1 Running 1 79m
kube-scheduler-localhost.localdomain 1/1 Running 1 78m
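
With more Pods the listing gets long, and a filter makes the stragglers easier to spot. A minimal sketch, assuming the column layout of kubectl get pod -n kube-system shown above (NAME, READY, STATUS, ...); pods_not_running is a name invented for this example.

```shell
#!/bin/sh
# Hypothetical helper: read `kubectl get pod -n kube-system` output on
# stdin and print the name and status of every Pod that is not Running.
pods_not_running() {
    # Skip the header row, then compare the STATUS column.
    awk '$1 != "NAME" && $3 != "Running" { print $1, $3 }'
}

# Usage on the master (not executed here):
#   kubectl get pod -n kube-system | pods_not_running
```

With --all-namespaces the output gains a leading NAMESPACE column, so the field numbers would need to shift by one.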

2.4 Testing cluster DNS

kubectl run curl --image=radial/busyboxplus:curl -it

...
kubectl run --generator=deployment/apps.v1beta1 is DEPRECATED and will be removed in a future version. Use kubectl create instead.
If you don't see a command prompt, try pressing enter.
[ root@curl-5cc7b478b6-r997p:/ ]$

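The command above starts a Pod and drops you into a shell in its container. If it hangs, running docker exec against that container on its node gets you to the same place. Inside the container, run nslookup kubernetes.default to confirm that resolution works:
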
nslookup kubernetes.default

...
Server: 10.96.0.10
Address 1: 10.96.0.10 kube-dns.kube-system.svc.cluster.local

Name: kubernetes.default
Address 1: 10.96.0.1 kubernetes.default.svc.cluster.local

2.5 Adding a node to the Kubernetes cluster

Now add the host node2 to the cluster (node2 does not run kubeadm init). On node2, run:

kubeadm join 192.168.110.130:6443 --token luxlja.y8td78jdh7immmfd \
--discovery-token-ca-cert-hash sha256:ab640c9f8a8497c23bb91454e4734a6642aaa3c6b622c0158a94a8b5ab29bb85

Then run kubectl get nodes on node1 to confirm that node2 has joined:

kubectl get nodes

...
NAME STATUS ROLES AGE VERSION
node1 Ready master 4m36s v1.15.0
node2 Ready <none> 69s v1.15.0
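
A join that fails is often down to a truncated copy of the token or hash, or to an expired token (the default configuration printed in 2.2 shows ttl: 24h0m0s). As a rough sketch, the two values can be checked for shape before running kubeadm join. The function names below are made up; the patterns follow the bootstrap token format (6 characters, a dot, 16 characters) and a sha256: prefix with 64 hex digits.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the argument looks like a bootstrap token.
valid_join_token() {
    printf '%s\n' "$1" | grep -Eq '^[a-z0-9]{6}\.[a-z0-9]{16}$'
}

# Hypothetical helper: succeed if the argument looks like a CA cert hash.
valid_ca_cert_hash() {
    printf '%s\n' "$1" | grep -Eq '^sha256:[0-9a-f]{64}$'
}

# Usage (not executed here):
#   valid_join_token luxlja.y8td78jdh7immmfd || echo "token looks truncated"
# If the token has expired, a new join command can be printed on the master:
#   kubeadm token create --print-join-command
```

These checks only catch malformed values; whether a well-formed token is still valid is decided by the API server.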

2.6 Enabling ipvs in kube-proxy

kubectl edit cm kube-proxy -n kube-system

Find the following part of the configuration:

kind: KubeProxyConfiguration
metricsBindAddress: 127.0.0.1:10249
mode: "ipvs" # empty by default; change it to ipvs
nodePortAddresses: null
oomScoreAdj: -999
portRange: ""
resourceContainer: /kube-proxy

Run the following command to delete the existing kube-proxy Pods, which are then recreated:

kubectl get pod -n kube-system | grep kube-proxy | awk '{system("kubectl delete pod "$1" -n kube-system")}'

...
pod "kube-proxy-8xv5x" deleted
pod "kube-proxy-skcbw" deleted

Check the newly created kube-proxy Pods:

kubectl get pod -n kube-system | grep kube-proxy

...
kube-proxy-7fsrg 1/1 Running 0 3s
kube-proxy-k8vhm 1/1 Running 0 9s

kubectl logs kube-proxy-7fsrg  -n kube-system

...
I0703 04:42:33.308289 1 server_others.go:170] Using ipvs Proxier.
W0703 04:42:33.309074 1 proxier.go:401] IPVS scheduler not specified, use rr by default
I0703 04:42:33.309831 1 server.go:534] Version: v1.15.0
I0703 04:42:33.320088 1 conntrack.go:52] Setting nf_conntrack_max to 131072
I0703 04:42:33.320365 1 config.go:96] Starting endpoints config controller
I0703 04:42:33.320393 1 controller_utils.go:1029] Waiting for caches to sync for endpoints config controller
I0703 04:42:33.320455 1 config.go:187] Starting service config controller
I0703 04:42:33.320470 1 controller_utils.go:1029] Waiting for caches to sync for service config controller
I0703 04:42:33.420899 1 controller_utils.go:1036] Caches are synced for endpoints config controller
I0703 04:42:33.420969 1 controller_utils.go:1036] Caches are synced for service config controller

The log contains Using ipvs Proxier, which shows that ipvs mode is on.
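
With several kube-proxy Pods it is easier to extract the mode from each log than to read the logs in full. A minimal sketch, assuming the "Using ... Proxier" wording seen in the log above; proxier_mode is a name chosen for this example, and what other modes print is not shown in this walkthrough.

```shell
#!/bin/sh
# Hypothetical helper: read a kube-proxy log on stdin and print the word
# between "Using" and "Proxier" (ipvs in the log shown above).
proxier_mode() {
    sed -n 's/.*Using \([a-z]*\) Proxier.*/\1/p' | head -n 1
}

# Usage on the master (not executed here):
#   kubectl logs kube-proxy-7fsrg -n kube-system | proxier_mode
# The resulting virtual servers can be listed on a node with:
#   ipvsadm -Ln
```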

3 Removing a node

To remove node2 from the cluster, run the following commands.

On the master node:

kubectl drain node2 --delete-local-data --force --ignore-daemonsets
kubectl delete node node2

On node2:

kubeadm reset
ifconfig cni0 down
ip link delete cni0
ifconfig flannel.1 down
ip link delete flannel.1
rm -rf /var/lib/cni/

If node2 is still listed by kubectl get nodes afterwards, delete it on node1:

kubectl delete node node2