Artis Zelmenis's question on Reward Modeling
What if, instead of going with a linear NN that mainly does feedforward and backpropagation, we try a NN as an actual *net*: a self-contained system of several sub-NNs, each doing the thing described in the video, mirrored many times and feeding their results into each other? Something like brain areas, each with a task, where the area that gets the best result becomes the main 'executor', temporarily subduing the others. Like an ever-changing (plasticity?) brain on a small scale.
|Asked by:|Artis Zelmenis|
|Origin:|YouTube (comment link)|
|On video:|Training AI Without Writing A Reward Function, with Reward Modelling|
|Asked on Discord?|No|
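The architecture the question describes, several competing sub-networks with the best-scoring one temporarily dominating, resembles what the ML literature calls a mixture of experts: a gating function scores each sub-network on the current input and weights their outputs accordingly. This is not the method from the video, just a minimal toy sketch of that idea; all names, sizes, and the random weights are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Three tiny "expert" sub-networks (one linear layer each) plus a gate.
n_in, n_out, n_experts = 4, 2, 3
experts = [rng.normal(size=(n_in, n_out)) for _ in range(n_experts)]
gate = rng.normal(size=(n_in, n_experts))  # scores expert relevance per input

def softmax(z):
    z = z - z.max()  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def forward(x):
    # Gate decides how much each expert "wins" on this input.
    weights = softmax(x @ gate)                    # shape (n_experts,)
    outputs = np.stack([x @ W for W in experts])   # shape (n_experts, n_out)
    # The highest-weighted expert dominates, but non-permanently:
    # a different input can hand control to a different expert.
    return weights @ outputs, int(weights.argmax())

x = rng.normal(size=n_in)
y, winner = forward(x)
print(y.shape, winner)
```

In this sketch the "subduing" is soft (a weighted blend) rather than a hard switch, which keeps everything differentiable so the whole system can still be trained end to end with backpropagation.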