Artis Zelmenis's question on Reward Modeling
From Stampy's Wiki
Question id: UgxxG92UOJ7wO76Ud254AaABAg
What if, instead of a standard NN that mainly relies on feedforward passes and backpropagation, we tried an NN that is an actual >net<: a self-contained system of several sub-NNs, each doing the thing described in the video, mirrored many times over and feeding their results into one another? Something like brain areas, each with a task, where the area that produces the best result becomes the main 'executor', subduing the others non-permanently. Like an ever-changing (plasticity?) brain on a small scale.
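The architecture the question gestures at resembles a mixture-of-experts with hard gating: several sub-networks all attempt the task, and whichever performs best temporarily becomes the "executor". A minimal sketch of that competition idea, using plain Python functions as stand-in "sub-NNs" (all names here are hypothetical, not from the video or any library):

```python
import random

random.seed(0)

# Hypothetical sketch: several competing "sub-networks" (here just simple
# functions) are scored on the same task; the one with the lowest accumulated
# error "subdues" the others and acts as the executor. This loosely resembles
# hard-gated mixture-of-experts routing, not any specific published method.

def expert_double(x):   # sub-NN 1: doubles its input
    return 2 * x

def expert_square(x):   # sub-NN 2: squares its input
    return x * x

def expert_negate(x):   # sub-NN 3: negates its input
    return -x

experts = [expert_double, expert_square, expert_negate]
errors = [0.0] * len(experts)   # running error per expert

def target(x):          # the task to be learned: here, doubling the input
    return 2 * x

# Evaluation phase: every expert sees every input; errors accumulate.
for _ in range(20):
    x = random.uniform(-1.0, 1.0)
    for i, expert in enumerate(experts):
        errors[i] += abs(expert(x) - target(x))

# The best-scoring expert becomes the executor (non-permanently: re-running
# the evaluation on a different task could promote a different expert,
# which is the "plasticity" the question alludes to).
executor = experts[min(range(len(experts)), key=lambda i: errors[i])]
```

In this toy run `expert_double` matches the task exactly, so it wins the executor role; swapping in a different `target` would promote a different sub-network, which is the ever-changing assignment the question describes.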
Asked by: Artis Zelmenis
Origin (where this question was originally asked): YouTube (comment link)
On video: Training AI Without Writing A Reward Function, with Reward Modelling
Date: 2019-12-13T18:05
Asked on Discord? No
Discussion