Reinforcement Learning Notes 1 - Python/OpenAI/TensorFlow/ROS - Fundamentals
Concepts:
Reinforcement learning is a branch of machine learning in which learning takes place through interaction with the environment; it is a goal-oriented approach.
The learner is not told which actions to take; instead, its actions lead to rewards or penalties, and it learns from the consequences of its behaviour.
Robot obstacle-avoidance example:
Moving closer to an obstacle scores -10; moving away from it scores +10.
The agent explores on its own to discover the actions that earn good rewards. The process involves the following steps:
The agent interacts with the environment by performing an action.
After the action is performed, the agent moves from one state to another.
Depending on the action, the agent receives a reward or a penalty.
The agent learns which actions had positive effects and which had negative ones.
To collect more reward and avoid penalties, the agent adjusts its policy through trial-and-error learning; a minimal sketch of this loop follows below.
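As a rough illustration only (not from the original text), the obstacle-avoidance loop above could look like the following sketch; the step function, the distance state, and the ±10 rewards are hypothetical placeholders:

import random

def step(state, action):
    # Hypothetical environment: the state is the distance to the obstacle.
    # Moving away earns +10; moving closer (or staying at the obstacle) earns -10.
    next_state = max(0.0, state + (1.0 if action == 'away' else -1.0))
    reward = 10 if next_state > state else -10
    return next_state, reward

state = 5.0                                      # initial distance (hypothetical units)
total_reward = 0
for _ in range(20):
    action = random.choice(['toward', 'away'])   # the agent performs an action
    state, reward = step(state, action)          # it transitions to a new state
    total_reward += reward                       # and receives a reward or penalty
print(total_reward)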
It is worth comparing reinforcement learning with the other branches of machine learning to understand their differences, and to appreciate its application prospects in robotics.
Elements of reinforcement learning: agent, policy function, value function, model, etc.
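For intuition only, the policy and the value function can be pictured as simple lookup tables from states to actions and to expected returns; the states and numbers below are made up:

# Hypothetical 3-state example: the policy maps each state to an action,
# the value function maps each state to an estimate of long-term reward.
policy = {'s0': 'right', 's1': 'right', 's2': 'stay'}
value = {'s0': 0.5, 's1': 0.8, 's2': 1.0}

state = 's0'
print(policy[state], value[state])   # the action the agent would take, and its value estimate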
Environment types: deterministic, stochastic, fully observable, partially observable, discrete, continuous, episodic, non-episodic, single-agent, multi-agent.
Reinforcement learning platforms: OpenAI Gym / Universe / DeepMind Lab / RL-Glue / Project Malmo / ViZDoom, etc.
Reinforcement learning applications: education, medical care and health, manufacturing, management, finance, and specific areas such as natural language processing and computer vision.
References:
https://www.cs.ubc.ca/~murphyk/Bayes/pomdp.html
https://morvanzhou.github.io/
https://github.com/sudharsan13296/Hands-On-Reinforcement-Learning-With-Python
Setup:
Install and configure Anaconda, Docker, OpenAI Gym, and TensorFlow.
Since system environments and version requirements vary, look up the configuration steps for your own setup.
Commonly used commands include:
bash, conda create, source activate, apt install, docker, pip3 install gym / universe, etc.
Once all of the above is configured, test OpenAI Gym and OpenAI Universe.
To view *.ipynb documents, use ipython notebook or jupyter notebook.
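A minimal sanity check, assuming gym was installed with pip3 as above; it only verifies that an environment can be created and inspected:

import gym

env = gym.make('CartPole-v0')     # classic-control env, needs no extra dependencies
print(env.action_space)           # Discrete(2): push the cart left or right
print(env.observation_space)      # Box(4,): cart position/velocity, pole angle/velocity
env.reset()
env.close()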
Gym examples:
Inverted pendulum (CartPole) example:
Sample code:
import gym

env = gym.make('CartPole-v0')
env.reset()
for _ in range(1000):
    env.render()
    env.step(env.action_space.sample())  # take a random action
env.close()
For more details about this code, see:
https://blog.csdn.net/ZhangRelay/article/details/89325679
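A slightly fuller variant of the same loop, written against the same old-gym API and shown only as a sketch, which also inspects the step results and resets whenever an episode ends:

import gym

env = gym.make('CartPole-v0')
observation = env.reset()
for _ in range(1000):
    env.render()
    action = env.action_space.sample()                  # random action
    observation, reward, done, info = env.step(action)  # old-gym step returns a 4-tuple
    if done:                                            # the pole fell or the time limit was hit
        observation = env.reset()
env.close()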
List all environments supported by Gym:
from gym import envs
print(envs.registry.all())
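Assuming the same (older) gym version, each entry returned by envs.registry.all() is an environment spec whose id can be used for filtering, for example:

from gym import envs

print(len(list(envs.registry.all())))    # total number of registered environments
cartpole_ids = [spec.id for spec in envs.registry.all() if 'CartPole' in spec.id]
print(cartpole_ids)                      # e.g. the registered CartPole variants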
Car racing example:
import gym

env = gym.make('CarRacing-v0')
env.reset()
for _ in range(1000):
    env.render()
    env.step(env.action_space.sample())  # take a random action
env.close()
Legged robot (bipedal walker) example:
import gym

env = gym.make('BipedalWalker-v2')
for episode in range(100):
    observation = env.reset()
    # Render the environment on each step
    for i in range(10000):
        env.render()
        # Choose an action by sampling a random action from the environment's
        # action space, which contains all possible valid actions
        action = env.action_space.sample()
        # For each step, record the observation, reward, done flag, and info
        observation, reward, done, info = env.step(action)
        # When done is True, print the time steps taken for the episode and end it
        if done:
            print("{} timesteps taken for the Episode".format(i + 1))
            break
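A small variation of the loop above (a sketch against the same environment) that also sums the rewards so each episode reports its total return:

import gym

env = gym.make('BipedalWalker-v2')
for episode in range(5):
    observation = env.reset()
    episode_return = 0.0
    for i in range(10000):
        env.render()
        action = env.action_space.sample()
        observation, reward, done, info = env.step(action)
        episode_return += reward          # accumulate the per-step reward
        if done:
            print("Episode {}: {} timesteps, return {:.2f}".format(episode, i + 1, episode_return))
            break
env.close()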
Flash game environment example (OpenAI Universe):
import gym
import universe  # importing universe registers its environments
import random

env = gym.make('flashgames.NeonRace-v0')
env.configure(remotes=1)  # create one local Docker-based remote
observation_n = env.reset()

# Move left
left = [('KeyEvent', 'ArrowUp', True), ('KeyEvent', 'ArrowLeft', True),
        ('KeyEvent', 'ArrowRight', False)]
# Move right
right = [('KeyEvent', 'ArrowUp', True), ('KeyEvent', 'ArrowLeft', False),
         ('KeyEvent', 'ArrowRight', True)]
# Move forward
forward = [('KeyEvent', 'ArrowUp', True), ('KeyEvent', 'ArrowRight', False),
           ('KeyEvent', 'ArrowLeft', False), ('KeyEvent', 'n', True)]

# The turn variable decides whether to turn or not
turn = 0
# Store all the rewards in the rewards list
rewards = []
# The buffer size acts as a threshold for when to evaluate the rewards
buffer_size = 100
# Set the initial action to forward, i.e. the car drives straight without turning
action = forward

while True:
    turn -= 1
    # Initially we take no turn and move forward.
    # If turn is not positive, there is no need to keep turning, so just drive forward
    if turn <= 0:
        action = forward
        turn = 0
    action_n = [action for ob in observation_n]
    # Use env.step() to perform the action (moving forward for now) for one time step
    observation_n, reward_n, done_n, info = env.step(action_n)
    # Store the reward in the rewards list
    rewards += [reward_n[0]]
    # Once enough rewards are collected, compute their mean. If the mean is 0 the car
    # is probably stuck, so turn for the next 20 steps, choosing left or right at random;
    # over many timesteps the collected rewards show which direction works best.
    if len(rewards) >= buffer_size:
        mean = sum(rewards) / len(rewards)
        if mean == 0:
            turn = 20
            if random.random() < 0.5:
                action = right
            else:
                action = left
        rewards = []
    env.render()
Partial test results (from multiple runs):