[MindSpore 6th Two-Day Bootcamp] MindElec Homework Notes
The MindSpore 6th two-day bootcamp ran on Bilibili on November 6-7, 2021. If you missed the livestream at https://live.bilibili.com/22127570, the recordings are available at the links below:
Day 1:
6th Two-Day Bootcamp | MindSpore AI electromagnetic simulation https://www.bilibili.com/video/BV1Y34y1Z7E8?spm_id_from=333.999.0.0
6th Two-Day Bootcamp | MindSpore parallelism enables large-model training https://www.bilibili.com/video/BV193411b7on?spm_id_from=333.999.0.0
6th Two-Day Bootcamp | MindSpore Boost: make your training fly https://www.bilibili.com/video/BV1c341187ML?spm_id_from=333.999.0.0
Day 2:
6th Two-Day Bootcamp | Overview of MindSpore control flow https://www.bilibili.com/video/BV1A34y1d7G7?spm_id_from=333.999.0.0
6th Two-Day Bootcamp | MindSpore Lite 1.5 feature release brings a brand-new on-device AI experience https://www.bilibili.com/video/BV1f34y1o7mR?spm_id_from=333.999.0.0
6th Two-Day Bootcamp | Visual cluster tuning released: tune anything from LeNet to the PanGu large model https://www.bilibili.com/video/BV1dg411K7Nb?spm_id_from=333.999.0.0
Let's start with the first session of Day 1: MindElec, the electromagnetic simulation suite in MindScience.
The homework for the first session is as follows:
Zhang Xiaobai has already tried MindSPONGE, the molecular simulation suite in MindScience:
Links:
Forum: https://bbs.huaweicloud.com/forum/forum.php?mod=viewthread&tid=159269
Blog: https://bbs.huaweicloud.com/blogs/302842
But since homework 2 requires MindElec electromagnetic simulation, homework 1 might as well be done with MindElec too.
1. Purchase an ECS GPU cloud server
We will use an ECS GPU cloud server for the MindElec part of this homework; for the MindSPONGE part, see the links above.
Go to the Huawei Cloud console -> ECS, switch to the CN North-Beijing4 region, and purchase as shown in the figure below:
Click "Buy Now":
Since it costs a little over 7 RMB per hour, Zhang Xiaobai logged in without wasting a minute.
First, check the memory and the CUDA version, which turns out to be 11.0:
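These checks are just the usual commands (a quick sketch):
free -h       # total and available memory
nvidia-smi    # driver version and the CUDA version it supports (11.0 here)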
2. Install the Anaconda environment
MindSpore has traditionally used a Python 3.7.5 environment (later releases also support Python 3.9), so set up conda first:
...
...
source ~/.bashrc
The version installed this way turned out to be too old, so download the latest Anaconda instead.
After downloading, upload it to the server and run:
bash ./Anaconda3-2021.05-Linux-x86_64.sh
During installation it naturally complains that the target directory already exists, so remove it first:
rm -rf /root/anaconda3
Then run the installer again:
bash ./Anaconda3-2021.05-Linux-x86_64.sh
3. Create a conda environment for MindSpore 1.5
conda create -n mindspore1.5 python=3.7.5
...
conda activate mindspore1.5
conda install -c conda-forge pythonocc-core=7.5.1 cudatoolkit=11.1
Press Y to continue:
The CUDA 11.1 package in the conda environment is fairly large (about 1.2 GB), so be patient while it downloads.
pythonocc is installed in the same transaction.
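To confirm that both packages actually landed in the environment, an optional quick check is:
conda list | grep -E "cudatoolkit|pythonocc-core"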
4. Install the GPU build of MindSpore 1.5
pip install https://ms-release.obs.cn-north-4.myhuaweicloud.com/1.5.0/MindSpore/gpu/x86_64/cuda-11.1/mindspore_gpu-1.5.0-cp37-cp37m-linux_x86_64.whl --trusted-host ms-release.obs.cn-north-4.myhuaweicloud.com -i https://pypi.tuna.tsinghua.edu.cn/simple
5. Install MindElec
Let's install MindElec straight from the package provided on the official site; although the file name says "ascend", the instructor said it works on GPU as well.
wget https://ms-release.obs.cn-north-4.myhuaweicloud.com/1.5.0/MindScience/x86_64/mindscience_mindelec_ascend-0.1.0-cp37-cp37m-linux_x86_64.whl
pip install ./mindscience_mindelec_ascend-0.1.0-cp37-cp37m-linux_x86_64.whl -i https://pypi.tuna.tsinghua.edu.cn/simple
Verify the installation:
It fails: the system CUDA is still 11.0, and cuDNN does not appear to be installed.
6. Install CUDA 11.1 and the matching cuDNN 8.0.5
wget https://developer.download.nvidia.com/compute/cuda/11.1.0/local_installers/cuda_11.1.0_455.23.05_linux.run
sh cuda_11.1.0_455.23.05_linux.run
Make the selections as shown below:
Modify ~/.bashrc as prompted in the figure:
Add /usr/local/cuda-11.1/bin to PATH
Add /usr/local/cuda-11.1/lib64 to LD_LIBRARY_PATH
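The corresponding lines in ~/.bashrc look roughly like this (a sketch; merge with any existing PATH settings), after which the file is reloaded:
export PATH=/usr/local/cuda-11.1/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda-11.1/lib64:$LD_LIBRARY_PATH
source ~/.bashrc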
Check the CUDA version again:
nvidia-smi
It now shows 11.1.
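Note that nvidia-smi reports the CUDA version supported by the driver; to double-check the freshly installed toolkit itself, nvcc can be queried as well:
/usr/local/cuda-11.1/bin/nvcc --version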
Download the cuDNN 8.0.5 build for CUDA 11.1 (other cuDNN versions work too, as long as they match CUDA 11.1) and upload it to the server:
Extract it:
tar -zxvf cudnn-11.1-linux-x64-v8.0.5.39.tgz
Copy the files into the corresponding CUDA directories:
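The copy follows the standard cuDNN layout (a sketch, assuming the archive extracts into a cuda/ directory as the 8.0.5 tgz normally does):
cp cuda/include/cudnn*.h /usr/local/cuda-11.1/include/
cp cuda/lib64/libcudnn* /usr/local/cuda-11.1/lib64/
chmod a+r /usr/local/cuda-11.1/include/cudnn*.h /usr/local/cuda-11.1/lib64/libcudnn*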
7. Verify the MindSpore 1.5 and MindElec installation
python -c "import mindspore;mindspore.run_check()"
Alternatively, create a test.py with vi and run it:
python test.py
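The contents of test.py are not shown in the original; a minimal version that exercises a GPU operator could look like this (a sketch):
# test.py: minimal MindSpore GPU smoke test (illustrative)
import numpy as np
from mindspore import Tensor, context
import mindspore.ops as ops

context.set_context(mode=context.GRAPH_MODE, device_target="GPU")
add = ops.Add()
x = Tensor(np.ones([1, 3, 3, 4]).astype(np.float32))
y = Tensor(np.ones([1, 3, 3, 4]).astype(np.float32))
print(add(x, y))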
Verify the MindElec installation:
python -c 'import mindelec'
Everything seems to be in place.
But can the MindElec examples actually run?
8. Clone the MindElec repository
git clone https://gitee.com/mindspore/mindscience.git
9. Install dependencies
1) Install easydict (a command sketch follows this list)
2) Install opencv
pip install opencv-python -i https://pypi.tuna.tsinghua.edu.cn/simple
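The easydict step is the same kind of pip install (a sketch, assuming the same Tsinghua mirror):
pip install easydict -i https://pypi.tuna.tsinghua.edu.cn/simple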
10. Run the examples
1) Try the data-driven parameterized electromagnetic simulation:
https://gitee.com/mindspore/mindscience/tree/master/MindElec/examples/data_driven/parameterization
For every experiment below, change "Ascend" to "GPU" in the relevant code before running; this will not be repeated each time.
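Concretely, this means editing the context setup near the top of each example's training or solving script; the typical change is sketched below (the exact file and surrounding options vary per example):
# before: context.set_context(..., device_target="Ascend")
from mindspore import context
context.set_context(mode=context.GRAPH_MODE, device_target="GPU")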
...
The training finally finishes.
The results are as follows:
epoch: 9966 step: 55, loss is 1.067301e-06 epoch time: 156.272 ms, per step time: 2.841 ms
epoch: 9967 step: 55, loss is 1.6718128e-06 epoch time: 161.586 ms, per step time: 2.938 ms
epoch: 9968 step: 55, loss is 1.9428162e-06 epoch time: 165.269 ms, per step time: 3.005 ms
epoch: 9969 step: 55, loss is 1.1494253e-06 epoch time: 160.396 ms, per step time: 2.916 ms
epoch: 9970 step: 55, loss is 1.2750754e-06 epoch time: 154.781 ms, per step time: 2.814 ms
epoch: 9971 step: 55, loss is 1.2550026e-06 epoch time: 160.627 ms, per step time: 2.920 ms
epoch: 9972 step: 55, loss is 1.4948789e-06 epoch time: 159.846 ms, per step time: 2.906 ms
epoch: 9973 step: 55, loss is 1.8957531e-06 epoch time: 164.061 ms, per step time: 2.983 ms
epoch: 9974 step: 55, loss is 1.8941449e-06 epoch time: 164.542 ms, per step time: 2.992 ms
epoch: 9975 step: 55, loss is 2.340197e-06 epoch time: 166.823 ms, per step time: 3.033 ms
epoch: 9976 step: 55, loss is 1.5545256e-06 epoch time: 152.811 ms, per step time: 2.778 ms
epoch: 9977 step: 55, loss is 9.994957e-07 epoch time: 171.435 ms, per step time: 3.117 ms
epoch: 9978 step: 55, loss is 2.12672e-06 epoch time: 154.989 ms, per step time: 2.818 ms
epoch: 9979 step: 55, loss is 1.5981371e-06 epoch time: 159.917 ms, per step time: 2.908 ms
epoch: 9980 step: 55, loss is 1.6546201e-06 epoch time: 151.021 ms, per step time: 2.746 ms
epoch: 9981 step: 55, loss is 1.5869264e-06 epoch time: 162.313 ms, per step time: 2.951 ms
epoch: 9982 step: 55, loss is 1.1969032e-06 epoch time: 168.984 ms, per step time: 3.072 ms
epoch: 9983 step: 55, loss is 1.1927513e-06 epoch time: 163.749 ms, per step time: 2.977 ms
epoch: 9984 step: 55, loss is 1.0608298e-06 epoch time: 160.595 ms, per step time: 2.920 ms
epoch: 9985 step: 55, loss is 1.964669e-06 epoch time: 155.398 ms, per step time: 2.825 ms
epoch: 9986 step: 55, loss is 1.5706166e-06 epoch time: 165.935 ms, per step time: 3.017 ms
epoch: 9987 step: 55, loss is 1.3382705e-06 epoch time: 163.523 ms, per step time: 2.973 ms
epoch: 9988 step: 55, loss is 1.2119517e-06 epoch time: 168.339 ms, per step time: 3.061 ms
epoch: 9989 step: 55, loss is 1.7882771e-06 epoch time: 159.096 ms, per step time: 2.893 ms
epoch: 9990 step: 55, loss is 1.1589409e-06 epoch time: 160.459 ms, per step time: 2.917 ms
epoch: 9991 step: 55, loss is 8.78855e-07 epoch time: 156.461 ms, per step time: 2.845 ms
epoch: 9992 step: 55, loss is 1.3546548e-06 epoch time: 157.824 ms, per step time: 2.870 ms
epoch: 9993 step: 55, loss is 3.1089023e-06 epoch time: 158.035 ms, per step time: 2.873 ms
epoch: 9994 step: 55, loss is 1.4939134e-06 epoch time: 160.428 ms, per step time: 2.917 ms
epoch: 9995 step: 55, loss is 2.164372e-06 epoch time: 155.159 ms, per step time: 2.821 ms
epoch: 9996 step: 55, loss is 9.635824e-07 epoch time: 156.919 ms, per step time: 2.853 ms
epoch: 9997 step: 55, loss is 1.0471658e-06 epoch time: 160.262 ms, per step time: 2.914 ms
epoch: 9998 step: 55, loss is 1.4574234e-06 epoch time: 160.660 ms, per step time: 2.921 ms
epoch: 9999 step: 55, loss is 2.0352143e-06 epoch time: 150.130 ms, per step time: 2.730 ms
epoch: 10000 step: 55, loss is 9.816508e-07 epoch time: 156.031 ms, per step time: 2.837 ms
Eval current epoch: 10000 loss: 0.0002412886533234922 l2_s11: 0.0030976369803562306
The trained model should be under ckpt:
There are 49 images under eval_res:
Download them and take a look:
2) Try the physics-driven AI solver for the frequency-domain Maxwell equations:
https://gitee.com/mindspore/mindscience/tree/master/MindElec/examples/physics_driven/frequency_domain_maxwell
cd ~/mindscience/MindElec/examples/physics_driven/frequency_domain_maxwell
python solve.py
...
The results are as follows:
(mindspore1.5) root@ecs-zhanghui-gpu:~/mindscience/MindElec/examples/physics_driven/frequency_domain_maxwell# python solve.py
pid: 2676
check test dataset shape: (10201, 2), (10201, 1)
[WARNING] OPTIMIZER(2676,7feb1bba3740,python):2021-11-09-00:05:06.369.176 [mindspore/ccsrc/frontend/optimizer/ad/dfunctor.cc:803] GetPrimalUser] J operation has no relevant primal call in the same graph. Func graph: 679_75_construct.92, J user: 679_75_construct.92:construct{[0]: [CNode]93, [1]: x0, [2]: u}
[WARNING] OPTIMIZER(2676,7feb1bba3740,python):2021-11-09-00:05:06.382.175 [mindspore/ccsrc/frontend/optimizer/ad/dfunctor.cc:803] GetPrimalUser] J operation has no relevant primal call in the same graph. Func graph: 622_132_construct.94, J user: 622_132_construct.94:construct{[0]: [CNode]95, [1]: x0, [2]: u}
[WARNING] OPTIMIZER(2676,7feb1bba3740,python):2021-11-09-00:05:06.595.722 [mindspore/ccsrc/frontend/optimizer/ad/dfunctor.cc:803] GetPrimalUser] J operation has no relevant primal call in the same graph. Func graph: 894_465_7_construct.116, J user: 894_465_7_construct.116:construct{[0]: [CNode]117, [1]: [CNode]118, [2]: [CNode]119}
[WARNING] OPTIMIZER(2676,7feb1bba3740,python):2021-11-09-00:05:06.614.336 [mindspore/ccsrc/frontend/optimizer/ad/dfunctor.cc:803] GetPrimalUser] J operation has no relevant primal call in the same graph. Func graph: 894_465_7_construct.116, J user: 894_465_7_construct.116:construct{[0]: [CNode]120, [1]: [CNode]118, [2]: [CNode]121}
[WARNING] CORE(2676,7feb1bba3740,python):2021-11-09-00:05:07.738.476 [mindspore/core/ir/anf_extends.cc:65] fullname_with_scope] Input 0 of cnode is not a value node, its type is CNode.
epoch: 1 step: 78, loss is 600.0
epoch time: 11268.853 ms, per step time: 144.472 ms
epoch: 2 step: 78, loss is 225.4
epoch time: 1389.687 ms, per step time: 17.816 ms
epoch: 3 step: 78, loss is 199.9
================================Start Evaluation================================
Total prediction time: 0.19255661964416504 s
l2_error: 0.20626515080160301
=================================End Evaluation=================================
epoch time: 1610.614 ms, per step time: 20.649 ms
epoch: 4 step: 78, loss is 10.19
epoch time: 1730.271 ms, per step time: 22.183 ms
epoch: 5 step: 78, loss is 2.803
epoch time: 1429.185 ms, per step time: 18.323 ms
epoch: 6 step: 78, loss is 2.316
================================Start Evaluation================================
Total prediction time: 0.0025403499603271484 s
l2_error: 0.019291123630052236
=================================End Evaluation=================================
epoch time: 1420.687 ms, per step time: 18.214 ms
epoch: 7 step: 78, loss is 2.2
epoch time: 1844.602 ms, per step time: 23.649 ms
epoch: 8 step: 78, loss is 1.953
epoch time: 1408.553 ms, per step time: 18.058 ms
epoch: 9 step: 78, loss is 1.856
================================Start Evaluation================================
Total prediction time: 0.0025916099548339844 s
l2_error: 0.015916268073532643
=================================End Evaluation=================================
epoch time: 1404.208 ms, per step time: 18.003 ms
epoch: 10 step: 78, loss is 1.33
epoch time: 1459.013 ms, per step time: 18.705 ms
l2 error: 0.0159162681
per step time: 18.7052916258
3) Try the physics-driven incremental-learning approach to solving Maxwell's equations:
https://gitee.com/mindspore/mindscience/tree/master/MindElec/examples/physics_driven/incremental_learning
cd ~/mindscience/MindElec/examples/physics_driven/incremental_learning
After changing the device to GPU, run:
python piad.py --mode=pretrain
...
Wait patiently:
Then I noticed that the pretrain phase runs for 3000 epochs:
Since Zhang Xiaobai's wallet is thin, the training was decisively stopped:
But the MindSpore team presumably chose that number for a reason: it probably takes the full 3000 epochs to push the loss below 0.1. At this point the loss was converging, but still fairly high.
4) Try the physics-driven AI solver for the point-source Maxwell equations:
https://gitee.com/mindspore/mindscience/tree/master/MindElec/examples/physics_driven/time_domain_maxwell
cd ~/mindscience/MindElec/examples/physics_driven/time_domain_maxwell
Change the device to GPU.
Learning from the previous experiment, decisively edit the configuration and cut down the epochs:
Reduce the epochs from 6000 to 100.
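As an illustration only (the actual config file and key name in the time_domain_maxwell example may differ, so check its own config before editing), the change amounts to something like:
# hypothetical sketch: drop the training epochs from 6000 to 100
sed -i 's/6000/100/' src/config.py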
Start training:
100 epochs goes by quite fast.
Again, even with fewer epochs the loss is clearly converging; presumably after the full 6000 epochs of "cultivation" it really would come out with the full six-item build.
But Zhang Xiaobai is not going to burn his own hard-earned money to find out, so shutting down the server and walking away was the best way out.
With that, the MindElec homework for MindScience is essentially complete.
(The end. Thanks for reading.)