Installing Kubernetes with kubespray
www.huweihuang.com
All of the machines below are virtual machines.
Machine IP | Hostname | Role | OS version | Notes |
---|---|---|---|---|
The management machine is used to deploy the k8s cluster and needs the following software at the versions listed. For details, see:
https://github.com/kubernetes-incubator/kubespray#requirements
https://github.com/kubernetes-incubator/kubespray/blob/master/requirements.txt
ansible>=2.4.0
jinja2>=2.9.6
netaddr
pbr>=1.6
ansible-modules-hashivault>=3.9.4
hvac
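Before installing anything, you can compare the versions already on the management machine against the minimums above. This is a minimal sketch and is not part of kubespray. It assumes GNU coreutils (for sort -V). The function names version_ge and check_min are made up for this example.

```shell
#!/usr/bin/env bash
# Sketch: compare installed versions with kubespray's minimum requirements.
# version_ge and check_min are hypothetical helper names; needs GNU sort -V.

# version_ge A B: exit 0 if version A >= version B.
version_ge() {
  # sort -V puts the smaller version first; if that is B, then A >= B.
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# check_min NAME ACTUAL MINIMUM: report whether one requirement is met.
check_min() {
  if version_ge "$2" "$3"; then
    echo "OK: $1 $2 >= $3"
  else
    echo "TOO OLD: $1 $2 < $3"
    return 1
  fi
}

# Example (ansible 2.x prints "ansible X.Y.Z" on the first line of --version):
#   check_min ansible "$(ansible --version | head -n 1 | awk '{print $2}')" 2.4.0
#   check_min jinja2 "$(python -c 'import jinja2; print(jinja2.__version__)')" 2.9.6
```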
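1. Install and configure ansible
See the article on using ansible.
Set up passwordless SSH login to the deployment machines; for details, see the article on passwordless SSH login.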
2. Install python-netaddr
# Install pip
yum -y install epel-release
yum -y install python-pip
# Install python-netaddr
pip install netaddr
3. Upgrade Jinja
# Jinja 2.9 (or newer)
pip install --upgrade jinja2
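The ipaddr filter that kubespray uses runs on the ansible controller. So netaddr and jinja2 must be importable by the Python interpreter that ansible itself runs on. The sketch below checks this. It assumes that interpreter is python (the CentOS 7 default). PYTHON is a hypothetical variable you can set to point at a different interpreter.

```shell
# Sketch: check that ansible's Python can import the modules kubespray needs.
# PYTHON is a hypothetical override; the default "python" is assumed to be the
# interpreter ansible uses (the Python 2 system interpreter on CentOS 7).
PYTHON="${PYTHON:-python}"

# py_module_present MODULE: exit 0 if MODULE can be imported by $PYTHON.
py_module_present() {
  "$PYTHON" -c "import $1" >/dev/null 2>&1
}

for mod in netaddr jinja2; do
  if py_module_present "$mod"; then
    echo "found: $mod"
  else
    echo "missing: $mod (install it with pip for $PYTHON)"
  fi
done
```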
The deployment machines are the machines that run the k8s cluster, including the Master and the Nodes.
1. Check the OS version
This article uses CentOS 7. Upgrading the kernel to 4.x.x or later is recommended.
2. Disable the firewall
systemctl stop firewalld
systemctl disable firewalld
iptables -F
3. Disable swap
Kubespray v2.5.0 requires swap to be disabled. See:
https://github.com/kubernetes-incubator/kubespray/blob/02cd5418c22d51e40261775908d55bc562206023/roles/kubernetes/preinstall/tasks/verify-settings.yml#L75
- name: Stop if swap enabled
  assert:
    that: ansible_swaptotal_mb == 0
  when: kubelet_fail_swap_on|default(true)
  ignore_errors: "{{ ignore_assert_errors }}"
Kubespray v2.6.0 removed the swap check. See:
https://github.com/kubernetes-incubator/kubespray/commit/b902602d161f8c147f3d155d2ac5360244577127#diff-b92ae64dd18d34a96fbeb7f7e48a6a9b
Disable swap by running swapoff -a:
[root@master ~]#swapoff -a
[root@master ~]#
[root@master ~]# free -m
total used free shared buff/cache available
Mem: 976 366 135 6 474 393
Swap: 0 0 0
# Swap row shows 0, so swap is disabled
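swapoff -a only lasts until the next reboot. If /etc/fstab still lists a swap device, swap comes back after a restart and the kubespray v2.5.0 check fails again. The sketch below reads the Swap row of free -m, as in the output above. It also comments out swap entries in an fstab file and keeps a .bak copy. The helper names are hypothetical. The sketch assumes GNU sed and a standard fstab layout where the filesystem type field is swap.

```shell
# Sketch: verify swap is off and keep it off after a reboot.
# Helper names are hypothetical; disable_swap_in_fstab assumes GNU sed.

# swap_total_mb: print the "total" column of the Swap row from 'free -m' output on stdin.
swap_total_mb() {
  awk '/^Swap:/ { print $2 }'
}

# disable_swap_in_fstab FILE: comment out uncommented lines that contain a
# whitespace-separated "swap" field, keeping FILE.bak, so swap stays off at boot.
disable_swap_in_fstab() {
  sed -i.bak -E '/^[^#].*[[:space:]]swap[[:space:]]/ s/^/#/' "$1"
}

# Usage on each deployment machine (as root):
#   swapoff -a
#   disable_swap_in_fstab /etc/fstab
#   [ "$(free -m | swap_total_mb)" -eq 0 ] && echo "swap is off"
```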
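4. Check the memory of the deployment machines
This article deploys on virtual machines, which may not have enough memory, so the VM memory was raised to 3 GB or more. Physical machines usually do not have this problem. For details, see:
https://github.com/kubernetes-incubator/kubespray/blob/95f1e4634a1c50fa77312d058a2b713353f4307e/roles/kubernetes/preinstall/tasks/verify-settings.yml#L52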
- name: Stop if memory is too small for masters
  assert:
    that: ansible_memtotal_mb >= 1500
  ignore_errors: "{{ ignore_assert_errors }}"
  when: inventory_hostname in groups['kube-master']

- name: Stop if memory is too small for nodes
  assert:
    that: ansible_memtotal_mb >= 1024
  ignore_errors: "{{ ignore_assert_errors }}"
  when: inventory_hostname in groups['kube-node']
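You can check the same thresholds by hand before running the playbook. Ansible computes ansible_memtotal_mb from the MemTotal line of /proc/meminfo (kB divided by 1024), so reading that line gives the number the assertion sees. A VM with about 1 GB, like the one in the free -m output above (976 MB total), fails even the 1024 MB node check. mem_total_mb and mem_ok are hypothetical helpers for this sketch.

```shell
# Sketch: check memory against kubespray's preinstall thresholds
# (1500 MB for kube-master, 1024 MB for kube-node). Helper names are hypothetical.

# mem_total_mb: print MemTotal in MB from /proc/meminfo-style text on stdin
# (MemTotal is in kB; integer division by 1024).
mem_total_mb() {
  awk '/^MemTotal:/ { print int($2 / 1024) }'
}

# mem_ok ROLE MB: exit 0 if MB meets the minimum for ROLE (master or node).
mem_ok() {
  local min
  case "$1" in
    master) min=1500 ;;
    node) min=1024 ;;
    *) echo "unknown role: $1" >&2; return 2 ;;
  esac
  [ "$2" -ge "$min" ]
}

# Usage on a deployment machine:
#   mem_ok node "$(mem_total_mb < /proc/meminfo)" || echo "memory below the kube-node minimum"
```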
The Docker version is 17.03.2-ce.
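1. Master node
2. Node nodes
3. Notes
The images cannot be pulled directly from within mainland China (they are blocked by the firewall), and downloading all of them takes a long time. It is therefore recommended to download them to the deployment machines in advance.
The hyperkube image is mainly used to run the core k8s components (for example kube-apiserver).
The network plugin used here is calico.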
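Pre-pulling can be scripted once the image list (the Master and Node images) is written to a text file. The sketch below assumes a hypothetical file with one image reference per line. It also assumes a hypothetical DRY_RUN switch that prints the commands instead of running docker pull.

```shell
# Sketch: pre-pull a list of images on a deployment machine.
# The list file (hypothetical) holds one image reference per line;
# blank lines and lines starting with # are skipped.
# DRY_RUN=1 (hypothetical switch) prints the commands instead of running them.
prepull_images() {
  local list="$1" image
  # "|| [ -n ... ]" also handles a last line without a trailing newline.
  while IFS= read -r image || [ -n "$image" ]; do
    case "$image" in ''|'#'*) continue ;; esac
    if [ "${DRY_RUN:-0}" = 1 ]; then
      echo "docker pull $image"
    else
      docker pull "$image" || return 1
    fi
  done < "$list"
}

# Usage:
#   DRY_RUN=1 prepull_images images.txt   # review the commands first
#   prepull_images images.txt
```

If the registries are unreachable from the deployment machines, a common alternative is to pull on a machine that has access. Export the images with docker save -o images.tar IMAGE..., copy the archive over, and import it with docker load -i images.tar.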
git clone https://github.com/kubernetes-incubator/kubespray.git
hosts.ini is the file that describes the deployment machines. Its path is kubespray/inventory/sample/hosts.ini.
cd kubespray
# Copy the sample inventory and edit the copy
cp -rfp inventory/sample inventory/k8s
vi inventory/k8s/hosts.ini
In hosts.ini you can either fill in each deployment machine's login password or leave it out and use passwordless SSH instead. For example:
# Configure 'ip' variable to bind kubernetes services on a
# different ip than the default iface
# hostname, ssh login IP, ssh user, ssh password, machine IP, subnet mask
kube-master-0 ansible_ssh_host=172.16.94.140 ansible_ssh_user=root ansible_ssh_pass=123 ip=172.16.94.140 mask=/24
kube-node-41 ansible_ssh_host=172.16.94.141 ansible_ssh_user=root ansible_ssh_pass=123 ip=172.16.94.141 mask=/24
kube-node-42 ansible_ssh_host=172.16.94.142 ansible_ssh_user=root ansible_ssh_pass=123 ip=172.16.94.142 mask=/24
# configure a bastion host if your nodes are not directly reachable
# bastion ansible_ssh_host=x.x.x.x
[kube-master]
kube-master-0
[etcd]
kube-master-0
[kube-node]
kube-node-41
kube-node-42
[k8s-cluster:children]
kube-node
kube-master
[calico-rr]
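With many machines, you can generate the host lines instead of typing them. The sketch below prints one line in the format used above. It assumes passwordless SSH, so ansible_ssh_pass is left out. User and mask default to root and /24; override them with the hypothetical SSH_USER and MASK variables. inventory_line is also a hypothetical name.

```shell
# Sketch: print one inventory host line in this article's format
# (without ansible_ssh_pass, assuming passwordless SSH).
# inventory_line, SSH_USER and MASK are hypothetical names.
inventory_line() {
  local name="$1" ip="$2"
  printf '%s ansible_ssh_host=%s ansible_ssh_user=%s ip=%s mask=%s\n' \
    "$name" "$ip" "${SSH_USER:-root}" "$ip" "${MASK:-/24}"
}

# Example: the three hosts of this article.
#   inventory_line kube-master-0 172.16.94.140
#   inventory_line kube-node-41 172.16.94.141
#   inventory_line kube-node-42 172.16.94.142
```

Before deploying, ansible -i inventory/k8s/hosts.ini all -m ping is a quick way to confirm that ansible can log in to every host in the inventory.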
k8s-cluster.yml is the main configuration file of the k8s cluster. Its path is kubespray/inventory/k8s/group_vars/k8s-cluster.yml. The kube_version parameter in this file sets the k8s version to install, for example kube_version: v1.9.5. For details, see:
https://github.com/kubernetes-incubator/kubespray/blob/master/inventory/sample/group_vars/k8s-cluster.yml#L22
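To change the version without opening an editor, you can rewrite the kube_version line with sed. This minimal sketch assumes the file has a single unindented kube_version: line, as in the sample. set_kube_version is a hypothetical name. A .bak backup of the file is kept.

```shell
# Sketch: set kube_version in k8s-cluster.yml (hypothetical helper).
# Assumes one top-level line of the form "kube_version: vX.Y.Z".
set_kube_version() {
  local file="$1" version="$2"
  sed -i.bak -E "s/^kube_version:.*/kube_version: ${version}/" "$file"
  # Print the resulting line so the change can be checked.
  grep '^kube_version:' "$file"
}

# Usage:
#   set_kube_version inventory/k8s/group_vars/k8s-cluster.yml v1.9.5
```

Alternatively, pass the value at run time with -e kube_version=..., as the upgrade command in this article does.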
The playbook used is cluster.yml.
# Enter the kubespray root directory
cd kubespray
# Run the deployment
ansible-playbook -i inventory/k8s/hosts.ini cluster.yml -b -vvv
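-b runs the tasks with privilege escalation (become), and -vvv makes ansible print verbose logs while it runs.
To reset the cluster, run the following command. The playbook used is reset.yml.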
ansible-playbook -i inventory/k8s/hosts.ini reset.yml -b -vvv
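When the ansible command finishes, output like the following means the deployment succeeded. Otherwise, fix the problems according to the error messages.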
PLAY RECAP *****************************************************************************
kube-master-0 : ok=309 changed=30 unreachable=0 failed=0
kube-node-41 : ok=203 changed=8 unreachable=0 failed=0
kube-node-42 : ok=203 changed=8 unreachable=0 failed=0
localhost : ok=2 changed=0 unreachable=0 failed=0
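Checking the recap by eye works for three hosts but is error-prone in long -vvv logs. The sketch below scans a saved log for the PLAY RECAP section. It fails if any host reports unreachable or failed greater than 0. It assumes the recap format shown above, and recap_ok is a hypothetical name. ansible-playbook also returns a non-zero exit status when hosts fail, so this is mainly useful for logs saved earlier.

```shell
# Sketch: check the PLAY RECAP section of a saved ansible log (hypothetical helper).
# Exit 0 if every host has unreachable=0 and failed=0,
# 1 if any host failed, 2 if no PLAY RECAP was found.
recap_ok() {
  awk '
    /^PLAY RECAP/ { in_recap = 1; seen = 1; next }
    in_recap && /unreachable=/ {
      for (i = 1; i <= NF; i++) {
        split($i, kv, "=")
        if ((kv[1] == "unreachable" || kv[1] == "failed") && kv[2] + 0 > 0) bad = 1
      }
    }
    END {
      if (!seen) exit 2
      exit (bad ? 1 : 0)
    }
  '
}

# Usage:
#   ansible-playbook -i inventory/k8s/hosts.ini cluster.yml -b 2>&1 | tee deploy.log
#   recap_ok < deploy.log && echo "deployment succeeded"
```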
Part of the deployment log:
kubernetes/preinstall : Update package management cache (YUM) --------------------23.96s
/root/gopath/src/kubespray/roles/kubernetes/preinstall/tasks/main.yml:121
kubernetes/master : Master | wait for the apiserver to be running ----------------23.44s
/root/gopath/src/kubespray/roles/kubernetes/master/handlers/main.yml:79
kubernetes/preinstall : Install packages requirements ----------------------------20.20s
/root/gopath/src/kubespray/roles/kubernetes/preinstall/tasks/main.yml:203
kubernetes/secrets : Check certs | check if a cert already exists on node --------13.94s
/root/gopath/src/kubespray/roles/kubernetes/secrets/tasks/check-certs.yml:17
gather facts from all instances --------------------------------------------------9.98s
/root/gopath/src/kubespray/cluster.yml:25
kubernetes/node : install | Compare host kubelet with hyperkube container --------9.66s
/root/gopath/src/kubespray/roles/kubernetes/node/tasks/install_host.yml:2
kubernetes-apps/ansible : Kubernetes Apps | Start Resources -----------------------9.27s
/root/gopath/src/kubespray/roles/kubernetes-apps/ansible/tasks/main.yml:37
kubernetes-apps/ansible : Kubernetes Apps | Lay Down KubeDNS Template ------------8.47s
/root/gopath/src/kubespray/roles/kubernetes-apps/ansible/tasks/kubedns.yml:3
download : Sync container ---------------------------------------------------------8.23s
/root/gopath/src/kubespray/roles/download/tasks/main.yml:15
kubernetes-apps/network_plugin/calico : Start Calico resources --------------------7.82s
/root/gopath/src/kubespray/roles/kubernetes-apps/network_plugin/calico/tasks/main.yml:2
download : Download items ---------------------------------------------------------7.67s
/root/gopath/src/kubespray/roles/download/tasks/main.yml:6
download : Download items ---------------------------------------------------------7.48s
/root/gopath/src/kubespray/roles/download/tasks/main.yml:6
download : Sync container ---------------------------------------------------------7.35s
/root/gopath/src/kubespray/roles/download/tasks/main.yml:15
download : Download items ---------------------------------------------------------7.16s
/root/gopath/src/kubespray/roles/download/tasks/main.yml:6
network_plugin/calico : Calico | Copy cni plugins from calico/cni container -------7.10s
/root/gopath/src/kubespray/roles/network_plugin/calico/tasks/main.yml:62
download : Download items ---------------------------------------------------------7.04s
/root/gopath/src/kubespray/roles/download/tasks/main.yml:6
download : Download items ---------------------------------------------------------7.01s
/root/gopath/src/kubespray/roles/download/tasks/main.yml:6
download : Sync container ---------------------------------------------------------7.00s
/root/gopath/src/kubespray/roles/download/tasks/main.yml:15
download : Download items ---------------------------------------------------------6.98s
/root/gopath/src/kubespray/roles/download/tasks/main.yml:6
download : Download items ---------------------------------------------------------6.79s
/root/gopath/src/kubespray/roles/download/tasks/main.yml:6
1. k8s component information
# kubectl get all --namespace=kube-system
NAME DESIRED CURRENT READY UP-TO-DATE AVAILABLE NODE SELECTOR AGE
ds/calico-node 3 3 3 3 3 <none> 2h
NAME DESIRED CURRENT UP-TO-DATE AVAILABLE AGE
deploy/kube-dns 2 2 2 2 2h
deploy/kubedns-autoscaler 1 1 1 1 2h
deploy/kubernetes-dashboard 1 1 1 1 2h
NAME DESIRED CURRENT READY AGE
rs/kube-dns-79d99cdcd5 2 2 2 2h
rs/kubedns-autoscaler-5564b5585f 1 1 1 2h
rs/kubernetes-dashboard-69cb58d748 1 1 1 2h
NAME READY STATUS RESTARTS AGE
po/calico-node-22vsg 1/1 Running 0 2h
po/calico-node-t7zgw 1/1 Running 0 2h
po/calico-node-zqnx8 1/1 Running 0 2h
po/kube-apiserver-kube-master-0 1/1 Running 0 22h
po/kube-controller-manager-kube-master-0 1/1 Running 0 2h
po/kube-dns-79d99cdcd5-f2t6t 3/3 Running 0 2h
po/kube-dns-79d99cdcd5-gw944 3/3 Running 0 2h
po/kube-proxy-kube-master-0 1/1 Running 2 22h
po/kube-proxy-kube-node-41 1/1 Running 3 22h
po/kube-proxy-kube-node-42 1/1 Running 3 22h
po/kube-scheduler-kube-master-0 1/1 Running 0 2h
po/kubedns-autoscaler-5564b5585f-lt9bb 1/1 Running 0 2h
po/kubernetes-dashboard-69cb58d748-wmb9x 1/1 Running 0 2h
po/nginx-proxy-kube-node-41 1/1 Running 3 22h
po/nginx-proxy-kube-node-42 1/1 Running 3 22h
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
svc/kube-dns ClusterIP 10.233.0.3 <none> 53/UDP,53/TCP 2h
svc/kubernetes-dashboard ClusterIP 10.233.27.24 <none> 443/TCP 2h
2. k8s node information
# kubectl get nodes
NAME STATUS ROLES AGE VERSION
kube-master-0 Ready master 22h v1.9.5
kube-node-41 Ready node 22h v1.9.5
kube-node-42 Ready node 22h v1.9.5
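A scripted version of this check is handy after deploying or scaling. The sketch below reads kubectl get nodes output on stdin. It succeeds only if at least one node is listed and every STATUS is exactly Ready. all_nodes_ready is a hypothetical name. A cordoned node shows Ready,SchedulingDisabled, and this sketch deliberately treats it as not ready.

```shell
# Sketch: exit 0 only if every node in 'kubectl get nodes' output is Ready
# (hypothetical helper; the header line is skipped).
all_nodes_ready() {
  awk 'NR > 1 { total++; if ($2 != "Ready") notready++ }
       END { exit ((total > 0 && notready == 0) ? 0 : 1) }'
}

# Usage on the master:
#   kubectl get nodes | all_nodes_ready && echo "all nodes Ready"
```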
3. Component health information
# kubectl get cs
NAME STATUS MESSAGE ERROR
scheduler Healthy ok
controller-manager Healthy ok
etcd-0 Healthy {"health": "true"}
To add Node machines, edit hosts.ini and add the new machines. For example, to add the node kube-node-43 (IP 172.16.94.143), the modified file looks like this:
# Configure 'ip' variable to bind kubernetes services on a
# different ip than the default iface
# hostname, ssh login IP, ssh user, ssh password, machine IP, subnet mask
kube-master-0 ansible_ssh_host=172.16.94.140 ansible_ssh_user=root ansible_ssh_pass=123 ip=172.16.94.140 mask=/24
kube-node-41 ansible_ssh_host=172.16.94.141 ansible_ssh_user=root ansible_ssh_pass=123 ip=172.16.94.141 mask=/24
kube-node-42 ansible_ssh_host=172.16.94.142 ansible_ssh_user=root ansible_ssh_pass=123 ip=172.16.94.142 mask=/24
kube-node-43 ansible_ssh_host=172.16.94.143 ansible_ssh_user=root ansible_ssh_pass=123 ip=172.16.94.143 mask=/24
# configure a bastion host if your nodes are not directly reachable
# bastion ansible_ssh_host=x.x.x.x
[kube-master]
kube-master-0
[etcd]
kube-master-0
[kube-node]
kube-node-41
kube-node-42
kube-node-43
[k8s-cluster:children]
kube-node
kube-master
[calico-rr]
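Editing hosts.ini by hand works, but adding a node means changing two places: the host line and the [kube-node] group. The sketch below does both with awk. It assumes the layout used in this article: host lines before the first [section], and a [kube-node] section. It also assumes passwordless SSH as root and a /24 mask. add_node is a hypothetical name, and it does nothing if the host line already exists.

```shell
# Sketch: add a node to an inventory laid out like this article's hosts.ini
# (hypothetical helper; assumes passwordless SSH as root and a /24 mask).
add_node() {
  local file="$1" name="$2" ip="$3"
  # Skip if a host line for this name already exists.
  if grep -q "^${name}[[:space:]]" "$file"; then
    echo "$name already in $file"
    return 0
  fi
  awk -v name="$name" -v ip="$ip" '
    # Before the first [section] header: add the host definition line.
    /^\[/ && !host_done {
      print name " ansible_ssh_host=" ip " ansible_ssh_user=root ip=" ip " mask=/24"
      host_done = 1
    }
    # Leaving the [kube-node] section: append the node name to it.
    /^\[/ && in_node { print name; in_node = 0 }
    { print }
    /^\[kube-node\][[:space:]]*$/ { in_node = 1 }
    # Handle [kube-node] being the last section in the file.
    END { if (in_node) print name }
  ' "$file" > "$file.tmp" && mv "$file.tmp" "$file"
}

# Usage:
#   add_node inventory/k8s/hosts.ini kube-node-43 172.16.94.143
```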
The playbook used is scale.yml.
# Enter the kubespray root directory
cd kubespray
# Run the scale-out
ansible-playbook -i inventory/k8s/hosts.ini scale.yml -b -vvv
1. ansible run result
PLAY RECAP ***************************************
kube-node-41 : ok=228 changed=11 unreachable=0 failed=0
kube-node-42 : ok=197 changed=6 unreachable=0 failed=0
kube-node-43 : ok=227 changed=69 unreachable=0 failed=0 # newly added Node
localhost : ok=2 changed=0 unreachable=0 failed=0
2. k8s node information
# kubectl get nodes
NAME STATUS ROLES AGE VERSION
kube-master-0 Ready master 1d v1.9.5
kube-node-41 Ready node 1d v1.9.5
kube-node-42 Ready node 1d v1.9.5
kube-node-43 Ready node 1m v1.9.5 # newly added Node
The new node kube-node-43 has been added to the cluster.
3. k8s component information
# kubectl get po --namespace=kube-system -o wide
NAME READY STATUS RESTARTS AGE IP NODE
calico-node-22vsg 1/1 Running 0 10h 172.16.94.140 kube-master-0
calico-node-8fz9x 1/1 Running 2 27m 172.16.94.143 kube-node-43
calico-node-t7zgw 1/1 Running 0 10h 172.16.94.142 kube-node-42
calico-node-zqnx8 1/1 Running 0 10h 172.16.94.141 kube-node-41
kube-apiserver-kube-master-0 1/1 Running 0 1d 172.16.94.140 kube-master-0
kube-controller-manager-kube-master-0 1/1 Running 0 10h 172.16.94.140 kube-master-0
kube-dns-79d99cdcd5-f2t6t 3/3 Running 0 10h 10.233.100.194 kube-node-41
kube-dns-79d99cdcd5-gw944 3/3 Running 0 10h 10.233.107.1 kube-node-42
kube-proxy-kube-master-0 1/1 Running 2 1d 172.16.94.140 kube-master-0
kube-proxy-kube-node-41 1/1 Running 3 1d 172.16.94.141 kube-node-41
kube-proxy-kube-node-42 1/1 Running 3 1d 172.16.94.142 kube-node-42
kube-proxy-kube-node-43 1/1 Running 0 26m 172.16.94.143 kube-node-43
kube-scheduler-kube-master-0 1/1 Running 0 10h 172.16.94.140 kube-master-0
kubedns-autoscaler-5564b5585f-lt9bb 1/1 Running 0 10h 10.233.100.193 kube-node-41
kubernetes-dashboard-69cb58d748-wmb9x 1/1 Running 0 10h 10.233.107.2 kube-node-42
nginx-proxy-kube-node-41 1/1 Running 3 1d 172.16.94.141 kube-node-41
nginx-proxy-kube-node-42 1/1 Running 3 1d 172.16.94.142 kube-node-42
nginx-proxy-kube-node-43 1/1 Running 0 26m 172.16.94.143 kube-node-43
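To run several masters, add more machines to the [kube-master] and [etcd] groups in hosts.ini, then run the deployment command again:
ansible-playbook -i inventory/k8s/hosts.ini cluster.yml -b -vvv
For example: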
# Configure 'ip' variable to bind kubernetes services on a
# different ip than the default iface
# hostname, ssh login IP, ssh user, ssh password, machine IP, subnet mask
kube-master-0 ansible_ssh_host=172.16.94.140 ansible_ssh_user=root ansible_ssh_pass=123 ip=172.16.94.140 mask=/24
kube-master-1 ansible_ssh_host=172.16.94.144 ansible_ssh_user=root ansible_ssh_pass=123 ip=172.16.94.144 mask=/24
kube-master-2 ansible_ssh_host=172.16.94.145 ansible_ssh_user=root ansible_ssh_pass=123 ip=172.16.94.145 mask=/24
kube-node-41 ansible_ssh_host=172.16.94.141 ansible_ssh_user=root ansible_ssh_pass=123 ip=172.16.94.141 mask=/24
kube-node-42 ansible_ssh_host=172.16.94.142 ansible_ssh_user=root ansible_ssh_pass=123 ip=172.16.94.142 mask=/24
kube-node-43 ansible_ssh_host=172.16.94.143 ansible_ssh_user=root ansible_ssh_pass=123 ip=172.16.94.143 mask=/24
# configure a bastion host if your nodes are not directly reachable
# bastion ansible_ssh_host=x.x.x.x
[kube-master]
kube-master-0
kube-master-1
kube-master-2
[etcd]
kube-master-0
kube-master-1
kube-master-2
[kube-node]
kube-node-41
kube-node-42
kube-node-43
[k8s-cluster:children]
kube-node
kube-master
[calico-rr]
Choose the target k8s version and run the upgrade command. The playbook used is upgrade-cluster.yml.
ansible-playbook upgrade-cluster.yml -b -i inventory/k8s/hosts.ini -e kube_version=v1.10.4 -vvv
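Kubernetes supports upgrading the control plane only one minor version at a time (v1.9 to v1.10 here). A target that skips a minor version should therefore be split into several upgrade-cluster.yml runs. The guard below is a minimal sketch that only compares version strings and does not ask the cluster for its current version. upgrade_step_ok, minor_of and major_of are hypothetical names.

```shell
# Sketch: refuse an upgrade that skips a Kubernetes minor version
# (hypothetical helpers; compares vMAJOR.MINOR.PATCH strings only).
major_of() { echo "${1#v}" | cut -d. -f1; }
minor_of() { echo "${1#v}" | cut -d. -f2; }

# upgrade_step_ok CURRENT TARGET: exit 0 if TARGET has the same major version
# and is at most one minor version ahead of CURRENT.
upgrade_step_ok() {
  [ "$(major_of "$1")" = "$(major_of "$2")" ] || return 1
  local step=$(( $(minor_of "$2") - $(minor_of "$1") ))
  [ "$step" -ge 0 ] && [ "$step" -le 1 ]
}

# Usage:
#   upgrade_step_ok v1.9.5 v1.10.4 && \
#     ansible-playbook upgrade-cluster.yml -b -i inventory/k8s/hosts.ini -e kube_version=v1.10.4 -vvv
```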
The main errors encountered when deploying a k8s cluster with kubespray are listed below.
Error:
fatal: [node1]: FAILED! => {"failed": true, "msg": "The ipaddr filter requires python-netaddr be installed on the ansible controller"}
Solution:
Install python-netaddr; see the [Environment preparation] section above.
Error:
fatal: [kube-master-0]: FAILED! => {
"assertion": "ansible_swaptotal_mb == 0",
"changed": false,
"evaluated_to": false
}
fatal: [kube-node-41]: FAILED! => {
"assertion": "ansible_swaptotal_mb == 0",
"changed": false,
"evaluated_to": false
}
fatal: [kube-node-42]: FAILED! => {
"assertion": "ansible_swaptotal_mb == 0",
"changed": false,
"evaluated_to": false
}
Solution:
Run swapoff -a on all deployment machines to disable swap; see the [Environment preparation] section above.
Error:
TASK [kubernetes/preinstall : Stop if memory is too small for masters] *********************************************************************************************************************************************************************************************************
task path: /root/gopath/src/kubespray/roles/kubernetes/preinstall/tasks/verify-settings.yml:52
Friday 10 August 2018 21:50:26 +0800 (0:00:00.940) 0:01:14.088 *********
fatal: [kube-master-0]: FAILED! => {
"assertion": "ansible_memtotal_mb >= 1500",
"changed": false,
"evaluated_to": false
}
TASK [kubernetes/preinstall : Stop if memory is too small for nodes] ***********************************************************************************************************************************************************************************************************
task path: /root/gopath/src/kubespray/roles/kubernetes/preinstall/tasks/verify-settings.yml:58
Friday 10 August 2018 21:50:27 +0800 (0:00:00.570) 0:01:14.659 *********
fatal: [kube-node-41]: FAILED! => {
"assertion": "ansible_memtotal_mb >= 1024",
"changed": false,
"evaluated_to": false
}
fatal: [kube-node-42]: FAILED! => {
"assertion": "ansible_memtotal_mb >= 1024",
"changed": false,
"evaluated_to": false
}
to retry, use: --limit @/root/gopath/src/kubespray/cluster.retry
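Solution:
Increase the memory of all deployment machines; in this example it was raised to 3 GB or more.
kube-scheduler failed to run, so the health check at http://localhost:10251/healthz failed.
Error: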
FAILED - RETRYING: Master | wait for kube-scheduler (1 retries left).
FAILED - RETRYING: Master | wait for kube-scheduler (1 retries left).
fatal: [node1]: FAILED! => {"attempts": 60, "changed": false, "content": "", "failed": true, "msg": "Status code was not [200]: Request failed: <urlopen error [Errno 111] Connection refused>", "redirected": false, "status": -1, "url": "http://localhost:10251/healthz"}
Solution:
The likely cause is insufficient memory; in this example the memory of the deployment machines was increased.
Error:
failed: [k8s-node-1] (item={u'name': u'docker-engine-1.13.1-1.el7.centos'}) => {
"attempts": 4,
"changed": false,
...
"item": {
"name": "docker-engine-1.13.1-1.el7.centos"
},
"msg": "Error: docker-ce-selinux conflicts with 2:container-selinux-2.66-1.el7.noarch\n",
"rc": 1,
"results": [
"Loaded plugins: fastestmirror\nLoading mirror speeds from cached hostfile\n * elrepo: mirrors.tuna.tsinghua.edu.cn\n * epel: mirrors.tongji.edu.cn\nPackage docker-engine is obsoleted by docker-ce, trying to install docker-ce-17.03.2.ce-1.el7.centos.x86_64 instead\nResolving Dependencies\n--> Running transaction check\n---> Package docker-ce.x86_64 0:17.03.2.ce-1.el7.centos will be installed\n--> Processing Dependency: docker-ce-selinux >= 17.03.2.ce-1.el7.centos for package: docker-ce-17.03.2.ce-1.el7.centos.x86_64\n--> Processing Dependency: libltdl.so.7()(64bit) for package: docker-ce-17.03.2.ce-1.el7.centos.x86_64\n--> Running transaction check\n---> Package docker-ce-selinux.noarch 0:17.03.2.ce-1.el7.centos will be installed\n---> Package libtool-ltdl.x86_64 0:2.4.2-22.el7_3 will be installed\n--> Processing Conflict: docker-ce-selinux-17.03.2.ce-1.el7.centos.noarch conflicts docker-selinux\n--> Restarting Dependency Resolution with new changes.\n--> Running transaction check\n---> Package container-selinux.noarch 2:2.55-1.el7 will be updated\n---> Package container-selinux.noarch 2:2.66-1.el7 will be an update\n--> Processing Conflict: docker-ce-selinux-17.03.2.ce-1.el7.centos.noarch conflicts docker-selinux\n--> Finished Dependency Resolution\n You could try using --skip-broken to work around the problem\n You could try running: rpm -Va --nofiles --nodigest\n"
]
}
Solution:
Remove the old Docker packages and let kubespray install Docker automatically.
sudo yum remove -y docker \
docker-client \
docker-client-latest \
docker-common \
docker-latest \
docker-latest-logrotate \
docker-logrotate \
docker-selinux \
docker-engine-selinux \
docker-engine
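To see which of these packages are actually installed before removing anything, you can match the list against rpm output. This is a minimal sketch. conflicting_docker_pkgs is a hypothetical name, and the package list is copied from the yum remove command above.

```shell
# Sketch: print which of the old Docker packages listed above are installed.
# Reads package names (one per line) on stdin; hypothetical helper.
# grep exits 1 when none match, i.e. nothing needs to be removed.
conflicting_docker_pkgs() {
  grep -E -x 'docker|docker-client|docker-client-latest|docker-common|docker-latest|docker-latest-logrotate|docker-logrotate|docker-selinux|docker-engine-selinux|docker-engine'
}

# Usage on a deployment machine:
#   rpm -qa --qf '%{NAME}\n' | conflicting_docker_pkgs
#   rpm -qa --qf '%{NAME}\n' | conflicting_docker_pkgs | xargs -r sudo yum remove -y
```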
References:
https://github.com/kubernetes-incubator/kubespray
https://github.com/kubernetes-incubator/kubespray/blob/master/docs/upgrades.md
Image | Version | Size | Image ID | Notes |
---|---|---|---|---|
Image | Version | Size | Image ID | Notes |
---|---|---|---|---|