But don't all these examples just come down to the quality of training? Sure, if you added more correct and incorrect examples, you would get the expected behavior. I guess a possible caveat of machine learning is that, to get desirable results, you need to generate a potentially infinite number of permutations of correct/incorrect examples (or at least enough of them to cover the behavior you expect), and maybe that's not feasible for advanced models.

I wonder if there have been any papers on synthesizing training data for machine learning.

