Container Orchestration in Docker
Container orchestration sounds like a lofty term, but in plain words it can be understood as "cluster management". There are quite a few container orchestration tools for Docker; the three best known are celebrated as the "Docker three musketeers": Compose, Machine, and Swarm. Compose and Machine ship as separate tools, while Swarm is Docker's official orchestration tool and is built directly into the Docker engine.
Swarm consists of three major parts:
swarm: cluster management
node: node management
service: service management
Cluster and Node Management
The docker swarm command creates or joins a cluster. Nodes in a Docker cluster come in two kinds: managers and workers. Both kinds can run containers, but only manager nodes have management capabilities.
A cluster consisting only of manager nodes also works perfectly well.
Creating and Joining a Cluster
My test environment has two machines, with IP addresses 192.168.1.220 and 192.168.1.116. First, create the cluster on 220:
# docker swarm init
Swarm initialized: current node (ppmurem8j7mdbmgpdhssjh0h9) is now a manager.

To add a worker to this swarm, run the following command:

    docker swarm join --token SWMTKN-1-3e4l8crbt04xlqfxwyw6nuf9gtcwpw72zggtayzy8clyqmvb5h-7o6ww4ftwm38dz7ydbolsz3kd 192.168.1.220:2377

To add a manager to this swarm, run 'docker swarm join-token manager' and follow the instructions.
After running docker swarm init, the cluster is created. The current machine automatically becomes a manager node, and the command for other machines to join the cluster is printed, namely: docker swarm join --token SWMTKN-1-3e4l8crbt04xlqfxwyw6nuf9gtcwpw72zggtayzy8clyqmvb5h-7o6ww4ftwm38dz7ydbolsz3kd 192.168.1.220:2377. A node that joins with this token becomes a worker. To add a new manager node instead, run docker swarm join-token manager, which prints a similar command; running that command joins the node as a manager. If you forget the join command, you can print it again with docker swarm join-token worker, as sketched below.
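For reference, a minimal sketch of retrieving both join commands on a manager node (the token values shown here are placeholders, not real output; real tokens look like the long SWMTKN-1-... strings above):

# docker swarm join-token worker
To add a worker to this swarm, run the following command:
    docker swarm join --token <worker-token> 192.168.1.220:2377
# docker swarm join-token manager
To add a manager to this swarm, run the following command:
    docker swarm join --token <manager-token> 192.168.1.220:2377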
Now run the join command on 116:
# docker swarm join --token SWMTKN-1-12dlq70adr3z38mlkltc288rdzevtjn73xse7d0qndnjmx45zs-b1kwenzmrsqb4o5nvni5rafcr 192.168.1.220:2377
This node joined a swarm as a worker.
A small hiccup occurred here: the two machines I used to build the cluster had inconsistent time zones, which caused an error when joining the worker node:
Error response from daemon: error while validating Root CA Certificate: x509: certificate has expired or is not yet valid
Even after fixing the time zone on 220, the node still could not join. So I deleted the cluster and recreated it, and then everything worked. I did not try whether docker swarm update would also have fixed it.
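For the record, a minimal sketch of the tear-down-and-recreate steps (run on 220; the --force flag is needed because the node is a manager):

# docker swarm leave --force
# docker swarm init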
Once the node has joined, you can list the cluster's nodes on the manager node:
# docker node ls
ID                            HOSTNAME                  STATUS    AVAILABILITY   MANAGER STATUS   ENGINE VERSION
9b4cmakc4hpc9ra4rruy5x5yo *   localhost.localdomain     Ready     Active         Leader           20.10.3
hz50cnwrbk4vxa7h0g23ccil9     zhangmh-virtual-machine   Ready     Active                          20.10.1
Leaving the Cluster
Run the following command on 116 to leave the cluster:
# docker swarm leave
Node left the swarm.
List the nodes again:
# docker node ls
ID                            HOSTNAME                  STATUS    AVAILABILITY   MANAGER STATUS   ENGINE VERSION
9b4cmakc4hpc9ra4rruy5x5yo *   localhost.localdomain     Ready     Active         Leader           20.10.3
hz50cnwrbk4vxa7h0g23ccil9     zhangmh-virtual-machine   Down      Active                          20.10.1
The node that just left is still listed; only its status has changed to Down. It has to be removed on the manager node:
# docker node rm hz50cnwrbk4vxa7h0g23ccil9
hz50cnwrbk4vxa7h0g23ccil9
# docker node ls
ID                            HOSTNAME                  STATUS    AVAILABILITY   MANAGER STATUS   ENGINE VERSION
9b4cmakc4hpc9ra4rruy5x5yo *   localhost.localdomain     Ready     Active         Leader           20.10.3
xby86ffkqw3axyfkwd4s7nubz     zhangmh-virtual-machine   Ready     Active                          20.10.1
Only then is the node really removed.
If the node leaving the cluster is a manager, it has to leave forcibly: docker swarm leave -f.
Promoting a Node to Manager
A cluster with a single manager is fragile: if the manager node crashes, the whole cluster is left leaderless. Docker recommends that a cluster have at least three manager nodes, and more than half of the managers must be reachable for the cluster to keep working properly. With only two manager nodes, the failure of either one still leaves the cluster in an unusable state.
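To make the arithmetic concrete: Raft requires more than half of the managers to be reachable, so a cluster of N managers tolerates the loss of at most floor((N-1)/2) of them. Three managers tolerate one failure and five tolerate two, while two managers tolerate none, which is why a second manager by itself adds no fault tolerance.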
For our tests, of course, that is not necessary; two manager nodes are enough to verify that leader failover works. The following command promotes a worker node directly to a manager:
# docker node promote xby86ffkqw3axyfkwd4s7nubz
Node xby86ffkqw3axyfkwd4s7nubz promoted to a manager in the swarm.
# docker node ls
ID                            HOSTNAME                  STATUS    AVAILABILITY   MANAGER STATUS   ENGINE VERSION
9b4cmakc4hpc9ra4rruy5x5yo *   localhost.localdomain     Ready     Active         Leader           20.10.3
xby86ffkqw3axyfkwd4s7nubz     zhangmh-virtual-machine   Ready     Active         Reachable        20.10.1
OK, there are now two manager nodes. 220's manager status is Leader, i.e. it is the current leader node, and 116's status is Reachable. Now stop the docker service on node 220:
# systemctl stop docker
Warning: Stopping docker.service, but it can still be activated by:
  docker.socket
Stopping it prints a warning, meaning that the docker service has been stopped but can still be reactivated by the docker.socket unit. Check the node status again:
# docker node ls
ID                            HOSTNAME                  STATUS    AVAILABILITY   MANAGER STATUS   ENGINE VERSION
9b4cmakc4hpc9ra4rruy5x5yo *   localhost.localdomain     Ready     Active         Reachable        20.10.3
xby86ffkqw3axyfkwd4s7nubz     zhangmh-virtual-machine   Ready     Active         Leader           20.10.1
116 has become the Leader, and 220 has already been woken up again (by docker.socket) and is Reachable. The stability of a Docker cluster looks quite good.
Service Management
Once all the nodes in the cluster are configured, services can be created. A Docker service is essentially a started container endowed with replication and load balancing. Using the ws:1.0 image built earlier as an example, create 5 replicas:
# docker service create --replicas 5 --name ws -p 80:8000 ws:1.0
image ws:1.0 could not be accessed on a registry to record its digest. Each node will access ws:1.0 independently, possibly leading to different nodes running different versions of the image.

1nj3o38slbo2zwt5p69l1qi5t
overall progress: 5 out of 5 tasks
1/5: running   [==================================================>]
2/5: running   [==================================================>]
3/5: running   [==================================================>]
4/5: running   [==================================================>]
5/5: running   [==================================================>]
verify: Service converged
The service has been created and is running; it can be reached from a browser on port 80 of both 220 and 116.
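This works because of Swarm's ingress routing mesh: the published port answers on every node in the cluster, whether or not a replica happens to run there. A quick check from the command line (curl standing in for the browser):

# curl http://192.168.1.220
# curl http://192.168.1.116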
The docker service ls command lists the ws service:
# docker service ls
ID             NAME      MODE         REPLICAS   IMAGE     PORTS
1nj3o38slbo2   ws        replicated   5/5        ws:1.0    *:80->8000/tcp
The docker service ps ws command lists the tasks (container instances) of the ws service:
# docker service ps ws
ID             NAME   IMAGE    NODE                      DESIRED STATE   CURRENT STATE           ERROR   PORTS
jpckj0mn24ae   ws.1   ws:1.0   zhangmh-virtual-machine   Running         Running 6 minutes ago
yrrdn4ntb089   ws.2   ws:1.0   localhost.localdomain     Running         Running 6 minutes ago
mdjxadbmlmhs   ws.3   ws:1.0   zhangmh-virtual-machine   Running         Running 6 minutes ago
kqdwfrddbaxd   ws.4   ws:1.0   localhost.localdomain     Running         Running 6 minutes ago
is2iimz1v4eb   ws.5   ws:1.0   zhangmh-virtual-machine   Running         Running 6 minutes ago
Two of the tasks are running on 220 and three on 116. After hitting the service a few times from the browser, I checked the service logs with docker service logs ws:
# docker service logs ws
ws.5.is2iimz1v4eb@zhangmh-virtual-machine    | [I 210219 01:57:23 web:2239] 200 GET / (10.0.0.2) 3.56ms
ws.5.is2iimz1v4eb@zhangmh-virtual-machine    | [W 210219 01:57:23 web:2239] 404 GET /favicon.ico (10.0.0.2) 0.97ms
ws.5.is2iimz1v4eb@zhangmh-virtual-machine    | [I 210219 01:57:28 web:2239] 200 GET / (10.0.0.4) 0.82ms
ws.5.is2iimz1v4eb@zhangmh-virtual-machine    | [W 210219 01:57:28 web:2239] 404 GET /favicon.ico (10.0.0.4) 0.79ms
ws.1.jpckj0mn24ae@zhangmh-virtual-machine    | [I 210219 02:01:45 web:2239] 304 GET / (10.0.0.2) 1.82ms
ws.1.jpckj0mn24ae@zhangmh-virtual-machine    | [I 210219 02:01:59 web:2239] 304 GET / (10.0.0.2) 0.49ms
ws.1.jpckj0mn24ae@zhangmh-virtual-machine    | [I 210219 02:02:01 web:2239] 304 GET / (10.0.0.2) 2.05ms
ws.1.jpckj0mn24ae@zhangmh-virtual-machine    | [I 210219 02:02:02 web:2239] 304 GET / (10.0.0.2) 0.89ms
ws.1.jpckj0mn24ae@zhangmh-virtual-machine    | [I 210219 02:02:02 web:2239] 304 GET / (10.0.0.2) 1.13ms
ws.1.jpckj0mn24ae@zhangmh-virtual-machine    | [I 210219 02:02:03 web:2239] 304 GET / (10.0.0.2) 0.92ms
ws.1.jpckj0mn24ae@zhangmh-virtual-machine    | [I 210219 02:02:03 web:2239] 304 GET / (10.0.0.2) 2.19ms
ws.1.jpckj0mn24ae@zhangmh-virtual-machine    | [I 210219 02:02:20 web:2239] 304 GET / (10.0.0.2) 1.00ms
Even though I was accessing 220, the requests were actually still served by tasks on 116.
If 116 is shut down, the tasks running on it will automatically be rescheduled onto node 220. But 116 is currently a manager node; stopping it would leave the cluster unusable, so it first has to be demoted back to a worker:
# docker node demote xby86ffkqw3axyfkwd4s7nubz
Manager xby86ffkqw3axyfkwd4s7nubz demoted in the swarm.
Then shut down 116.
# docker service ps ws
ID             NAME   IMAGE    NODE                    DESIRED STATE   CURRENT STATE               ERROR   PORTS
jrj9ben9vr5c   ws.1   ws:1.0   localhost.localdomain   Running         Running 57 minutes ago
yrrdn4ntb089   ws.2   ws:1.0   localhost.localdomain   Running         Running about an hour ago
opig9zrmp261   ws.3   ws:1.0   localhost.localdomain   Running         Running 57 minutes ago
kqdwfrddbaxd   ws.4   ws:1.0   localhost.localdomain   Running         Running about an hour ago
hiz8730pl3je   ws.5   ws:1.0   localhost.localdomain   Running         Running 57 minutes ago
All 5 tasks have been moved to 220.
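As an aside, if you only want to take a node out of service without powering it off, Swarm can drain it, which likewise reschedules its tasks elsewhere; a minimal sketch, run on a manager:

# docker node update --availability drain xby86ffkqw3axyfkwd4s7nubz
# docker node update --availability active xby86ffkqw3axyfkwd4s7nubz

The first command moves the node's tasks away; the second brings it back into scheduling.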
# docker ps
CONTAINER ID   IMAGE    COMMAND                  CREATED       STATUS       PORTS   NAMES
bc4c457ce769   ws:1.0   "/bin/sh -c 'python …"   3 hours ago   Up 3 hours           ws.5.hiz8730pl3je7qvo2lv6k554b
c846ac1c4d91   ws:1.0   "/bin/sh -c 'python …"   3 hours ago   Up 3 hours           ws.3.opig9zrmp2619t4e1o3ntnj2w
214daa36c138   ws:1.0   "/bin/sh -c 'python …"   3 hours ago   Up 3 hours           ws.1.jrj9ben9vr5c3biuc90xtoffh
17842db9dc47   ws:1.0   "/bin/sh -c 'python …"   3 hours ago   Up 3 hours           ws.4.kqdwfrddbaxd5z78uo3zsy5sd
47185ba9a4fd   ws:1.0   "/bin/sh -c 'python …"   3 hours ago   Up 3 hours           ws.2.yrrdn4ntb089t6i66w8xvq8r9
# docker kill bc4c457ce769
bc4c457ce769
After killing the container of the 5th task, wait a few seconds and look at the containers again:
# docker ps
CONTAINER ID   IMAGE    COMMAND                  CREATED              STATUS              PORTS   NAMES
416b55e8d174   ws:1.0   "/bin/sh -c 'python …"   About a minute ago   Up About a minute           ws.5.fvpm334t2zqbj5l50tyx5glr6
c846ac1c4d91   ws:1.0   "/bin/sh -c 'python …"   3 hours ago          Up 3 hours                  ws.3.opig9zrmp2619t4e1o3ntnj2w
214daa36c138   ws:1.0   "/bin/sh -c 'python …"   3 hours ago          Up 3 hours                  ws.1.jrj9ben9vr5c3biuc90xtoffh
17842db9dc47   ws:1.0   "/bin/sh -c 'python …"   3 hours ago          Up 3 hours                  ws.4.kqdwfrddbaxd5z78uo3zsy5sd
47185ba9a4fd   ws:1.0   "/bin/sh -c 'python …"   3 hours ago          Up 3 hours                  ws.2.yrrdn4ntb089t6i66w8xvq8r9
The 5th task has been restarted.
The number of replicas of a Docker service can be adjusted dynamically. For example, when the system load is too high and more replicas are needed, just run:
# docker service scale ws=6
ws scaled to 6
overall progress: 6 out of 6 tasks
1/6: running   [==================================================>]
2/6: running   [==================================================>]
3/6: running   [==================================================>]
4/6: running   [==================================================>]
5/6: running   [==================================================>]
6/6: running   [==================================================>]
verify: Service converged
This adds one more replica.
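The same adjustment can also be made with docker service update, which is handy when changing several service settings in one go; a sketch:

# docker service update --replicas 6 ws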
Once created, services start along with the Docker system service itself; just run:
systemctl enable docker
The cluster and services created above will then come up at boot, so there is no need to worry about a machine reboot breaking things.
Shared Data Volumes
First, create a data volume with the docker volume create command:
# docker volume create ws_volume
ws_volume
After it is created, list the existing volumes with docker volume ls:
# docker volume ls
DRIVER    VOLUME NAME
local     ws_volume
Use docker inspect to view the volume's details:
# docker inspect ws_volume
[
    {
        "CreatedAt": "2021-02-19T14:09:58+08:00",
        "Driver": "local",
        "Labels": {},
        "Mountpoint": "/var/lib/docker/volumes/ws_volume/_data",
        "Name": "ws_volume",
        "Options": {},
        "Scope": "local"
    }
]
When creating a service, the --mount parameter mounts the volume into the service:
# docker service create --replicas 2 --name ws -p 80:8000 --mount type=volume,src=ws_volume,dst=/volume ws:1.0
image ws:1.0 could not be accessed on a registry to record its digest. Each node will access ws:1.0 independently, possibly leading to different nodes running different versions of the image.

iiiit9slq9qqwcdwwi0w0mcz5
overall progress: 2 out of 2 tasks
1/2: running   [==================================================>]
2/2: running   [==================================================>]
verify: Service converged
--mount takes many sub-parameters, written as key=value pairs separated by commas. In the simplest case, only the type, src, and dst parameters need to be set.
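For illustration, two common variants beyond the minimal case (the host path /data/ws and the service names ws-bind and ws-ro are made-up examples): a bind mount of a host directory, and a read-only volume mount:

# docker service create --replicas 2 --name ws-bind -p 81:8000 --mount type=bind,src=/data/ws,dst=/volume ws:1.0
# docker service create --replicas 2 --name ws-ro -p 82:8000 --mount type=volume,src=ws_volume,dst=/volume,readonly ws:1.0

Note that with type=bind the source path must already exist on every node where a task may be scheduled.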