Reinforcement Learning Notes 1 - Python/OpenAI/TensorFlow/ROS - Fundamentals


Concepts:

Reinforcement learning, a branch of machine learning, is a goal-directed approach in which learning happens through interaction with the environment.

The learner is not told which actions to take; instead, its actions lead to rewards or penalties, and it learns from the consequences of its behavior.

Example: a robot avoiding obstacles:

Moving closer to an obstacle: -10 points; moving away from it: +10 points.
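As a rough illustration, that reward signal could be written as a small Python function. This is only a sketch; the names distance_to_obstacle and safe_distance are made up for the example and not taken from any library.

def obstacle_reward(distance_to_obstacle, safe_distance=1.0):
    # Penalize getting close to an obstacle, reward keeping a distance
    # (illustrative values matching the -10/+10 scheme above).
    if distance_to_obstacle < safe_distance:
        return -10.0
    return 10.0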

The agent explores on its own to find the actions that earn good rewards, following these steps (a code sketch of this loop appears after the list):

The agent interacts with the environment by performing an action.

After the action is performed, the agent moves from one state to another.

Depending on the action, it receives a reward or a penalty.

The agent comes to understand the positive and negative effects of its actions.

To gain more reward and avoid penalties, it adjusts its policy through trial-and-error learning.
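Here is a minimal sketch of that interaction loop written against the Gym API; CartPole-v0 is just a convenient stand-in environment, and a random policy is used in place of a learned one.

import gym

env = gym.make('CartPole-v0')
state = env.reset()
total_reward = 0.0

for t in range(200):
    action = env.action_space.sample()                  # the agent picks an action (random here)
    next_state, reward, done, info = env.step(action)   # the environment returns the next state and a reward
    total_reward += reward                               # accumulate rewards and penalties
    state = next_state                                   # the agent moves to the new state
    if done:
        break

print('Return for this episode:', total_reward)
env.close()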

It is worth comparing reinforcement learning with the other branches of machine learning in order to understand the differences, as well as its application prospects in robotics.

Elements of reinforcement learning: agent, policy function, value function, model, etc.
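A toy sketch of these elements for a five-state corridor world; all names here are illustrative and not taken from any particular library.

import random

states = list(range(5))        # environment states 0..4
actions = ['left', 'right']    # available actions

V = {s: 0.0 for s in states}   # value function: estimated return from each state

def policy(state, epsilon=0.1):
    # Policy function: epsilon-greedy choice between exploring and a fixed preference.
    if random.random() < epsilon:
        return random.choice(actions)
    return 'right'

def model(state, action):
    # Model: predicts the next state and reward (used for planning; optional in model-free RL).
    next_state = min(state + 1, 4) if action == 'right' else max(state - 1, 0)
    reward = 1.0 if next_state == 4 else 0.0
    return next_state, reward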

Environment types: deterministic, stochastic, fully observable, partially observable, discrete, continuous, episodic, non-episodic, single-agent, multi-agent.
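For instance, Gym exposes whether actions are discrete or continuous through the environment's action space; a quick sketch (the exact printed shapes depend on the installed gym version):

import gym

discrete_env = gym.make('CartPole-v0')     # discrete actions: push the cart left or right
continuous_env = gym.make('CarRacing-v0')  # continuous actions: steering, gas, brake

print(discrete_env.action_space)    # e.g. Discrete(2)
print(continuous_env.action_space)  # e.g. Box(3,)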

Reinforcement learning platforms: OpenAI Gym/Universe, DeepMind Lab, RL-Glue, Project Malmo, VizDoom, etc.

Applications of reinforcement learning: education, medicine and healthcare, manufacturing, management, finance, and subfields such as natural language processing and computer vision.

References:

      https://www.cs.ubc.ca/~murphyk/Bayes/pomdp.html

      https://morvanzhou.github.io/

      https://github.com/sudharsan13296/Hands-On-Reinforcement-Learning-With-Python

Setup:


Install and configure Anaconda, Docker, OpenAI Gym, and TensorFlow.

Since the exact steps depend on your operating system and the versions involved, consult the relevant documentation for your own environment.

Commonly used commands include:

bash, conda create, source activate, apt install, docker, pip3 install gym, pip3 install universe, etc.

Once all of the above is configured, test OpenAI Gym and OpenAI Universe.
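A quick sanity check that the main packages import correctly (version numbers will of course differ between setups):

import gym
import tensorflow as tf

print('gym version:', gym.__version__)
print('tensorflow version:', tf.__version__)

env = gym.make('CartPole-v0')  # creating and resetting an environment confirms gym works
env.reset()
env.close()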

To view *.ipynb documents, use ipython notebook or jupyter notebook.

Gym examples:

CartPole (inverted pendulum) example:

Sample code:

import gym

env = gym.make('CartPole-v0')
env.reset()
for _ in range(1000):
    env.render()
    env.step(env.action_space.sample())

For more on this code, see:

      https://blog.csdn.net/ZhangRelay/article/details/89325679

To list all environments supported by gym:

      from gym import envs

      print(envs.registry.all())
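As a follow-up, the registry entries can be filtered down to just the environment ids (a minimal sketch; the spec.id attribute matches older gym releases):

env_ids = [spec.id for spec in envs.registry.all()]
print(len(env_ids), 'environments registered, e.g.:', env_ids[:5])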

Car racing example:

import gym

env = gym.make('CarRacing-v0')
env.reset()
for _ in range(1000):
    env.render()
    env.step(env.action_space.sample())

Bipedal walker (legged robot) example:

import gym

env = gym.make('BipedalWalker-v2')

for episode in range(100):
    observation = env.reset()
    # Render the environment on each step
    for i in range(10000):
        env.render()
        # Choose an action by sampling a random action from the environment's
        # action space, which contains all possible valid actions.
        action = env.action_space.sample()
        # For each step, record the observation, reward, done flag, and info
        observation, reward, done, info = env.step(action)
        # When done is True, print the time steps taken and end the episode
        if done:
            print("{} timesteps taken for the Episode".format(i + 1))
            break

Flash game environment example (requires OpenAI Universe):

import gym
import universe  # importing universe registers its environments with gym
import random

env = gym.make('flashgames.NeonRace-v0')
env.configure(remotes=1)  # create/connect to one local Docker-based remote environment
observation_n = env.reset()

# Move left
left = [('KeyEvent', 'ArrowUp', True), ('KeyEvent', 'ArrowLeft', True),
        ('KeyEvent', 'ArrowRight', False)]

# Move right
right = [('KeyEvent', 'ArrowUp', True), ('KeyEvent', 'ArrowLeft', False),
         ('KeyEvent', 'ArrowRight', True)]

# Move forward
forward = [('KeyEvent', 'ArrowUp', True), ('KeyEvent', 'ArrowRight', False),
           ('KeyEvent', 'ArrowLeft', False), ('KeyEvent', 'n', True)]

# The turn variable decides whether to turn or not
turn = 0

# All rewards are stored in the rewards list
rewards = []

# buffer_size acts as a threshold for when to evaluate recent rewards
buffer_size = 100

# The initial action is forward, i.e. the car just moves forward without turning
action = forward

while True:
    turn -= 1

    # Initially we take no turn and move forward.
    # If turn is not positive, there is no need to turn and we just move forward.
    if turn <= 0:
        action = forward
        turn = 0

    action_n = [action for ob in observation_n]

    # Use env.step() to perform the action (moving forward for now) for one time step
    observation_n, reward_n, done_n, info = env.step(action_n)

    # Store the reward in the rewards list
    rewards += [reward_n[0]]

    # Once the buffer is full, compute the mean reward. If it is 0 (the car is stuck),
    # turn for the next 20 steps, choosing left or right at random; over several
    # timesteps the recorded rewards indicate which direction works best.
    if len(rewards) >= buffer_size:
        mean = sum(rewards) / len(rewards)
        if mean == 0:
            turn = 20
            if random.random() < 0.5:
                action = right
            else:
                action = left
        rewards = []

    env.render()

Partial test results (from multiple runs) were captured as screenshots, not reproduced here.


