TAILIEUCHUNG - Robot Learning 2010 Part 7

Reference document 'Robot Learning 2010 Part 7', in the category engineering and technology, mechanical engineering and machine building, for effective study, research, and work.

Uncertainty in Reinforcement Learning: Awareness, Quantisation, and Control

While the full-matrix UP is the more fundamental and theoretically sounder method, its computational cost is considerable (see table 3). If used with care, however, DUIPI and DUIPI-QM are valuable alternatives that have proven themselves in practice. Although our experiments are rather small, we expect DUIPI and DUIPI-QM to perform well on larger problems too.

Increasing the expected performance

Incorporating uncertainty in RL can even improve the expected performance for concrete MDPs in many practical and industrial environments, where exploration is expensive and only allowed within a small range. The available amount of data is hence small, and exploration takes place in a way that is, in part, extremely asymmetric: data is collected mainly in areas where the operation is already preferable. Many of the insufficiently explored, so-called on-border states are undesirable in expectation but might, by chance, give a high reward in a single case. If the border is sufficiently large, this might happen at least a few times, and such an outlier might suggest a high expected reward. Note that, in general, the size of the border region increases with the dimensionality of the problem. Carefully incorporating uncertainty prevents the agent from preferring those outliers in its final operation.

We applied the joint iteration to a simple artificial archery benchmark that exhibits the border phenomenon. The state space represents an archer's target (figure 7). Starting in the middle of the target, the archer can move the arrowhead in all four directions and shoot the arrow. The exploration was performed randomly with short episodes. The dynamics were simulated with two different underlying MDPs: the arrowhead's moves are either stochastic (25 percent chance of choosing another action) or deterministic. The event of making a hit after shooting the arrow is stochastic in both settings. The highest ...
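
The border phenomenon described above can be illustrated with a small Python sketch. It is not the chapter's uncertainty propagation (neither full-matrix UP nor DUIPI); it swaps in a much simpler rule, the sample mean minus `xi` times the standard error of the mean, only to show why a sparsely explored state with a lucky outcome stops looking attractive once uncertainty is penalised. The state names, the hit/miss samples, and the functions `mean_and_stderr` and `pick_state` are all hypothetical.

```python
import math

def mean_and_stderr(rewards):
    """Return the sample mean and the standard error of the mean."""
    n = len(rewards)
    mean = sum(rewards) / n
    if n < 2:
        # A single observation carries no information about the spread.
        return mean, float("inf")
    var = sum((r - mean) ** 2 for r in rewards) / (n - 1)
    return mean, math.sqrt(var / n)

def pick_state(samples, xi):
    """Pick the state maximising mean - xi * stderr (xi = 0 ignores uncertainty)."""
    def score(name):
        mean, err = mean_and_stderr(samples[name])
        return mean if xi == 0 else mean - xi * err
    return max(samples, key=score)

# Hypothetical observations: 1 = hit, 0 = miss.
samples = {
    # Well-explored state near the centre: 12 hits out of 20 shots.
    "centre": [1] * 12 + [0] * 8,
    # Barely explored on-border state: 2 lucky hits out of 3 shots.
    "border": [1, 1, 0],
}

greedy_choice = pick_state(samples, xi=0.0)    # plain sample means
cautious_choice = pick_state(samples, xi=1.0)  # uncertainty-penalised scores
print(greedy_choice, cautious_choice)
```

With these made-up numbers the plain mean favours the border state (2/3 against 0.6), while the penalised score favours the centre, because three shots leave a much larger standard error than twenty. The value `xi = 1.0` is an arbitrary choice for the illustration.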
