Reinforcement learning techniques are provided that generate initial training data to refine a machine-learning model (e.g., a neural network). The techniques allow a machine-learning system to correlate inputs with outputs and to analyze generated outputs to produce new training data that yields better outputs. The techniques start by profiling a system (e.g., an operating system) and a mock model of the machine-learning model, where the mock model provides random outputs or outputs according to a simple heuristic. The techniques can then use those outputs to adjust heuristics of the system so as to obtain a wide variety of performance reactions of the system. Evaluation of the profiling data of the system can be performed to distinguish go...
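
The profiling loop described above can be illustrated with a minimal Python sketch. It is not the claimed implementation: `toy_system`, the `load` and `setting` features, and the performance formula are hypothetical stand-ins for a real system and its heuristics. The text shown here does not specify how evaluation separates the profiling records, so the threshold rule in `select_training_data` is an assumption made only for illustration.

```python
import random
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class ProfileRecord:
    load: float         # input observed from the system (hypothetical feature)
    setting: float      # mock-model output, applied as a heuristic value
    performance: float  # measured reaction of the system (higher is better)


def mock_model(load: float, rng: random.Random) -> float:
    """Stand-in for the real model: ignores its input and returns a random output."""
    return rng.uniform(0.0, 1.0)


def toy_system(load: float, setting: float) -> float:
    """Hypothetical system: performance peaks when the setting matches the load."""
    return 1.0 - abs(setting - load)


def profile(num_runs: int, seed: int = 0) -> List[ProfileRecord]:
    """Drive the system with mock-model outputs and record how it reacts."""
    rng = random.Random(seed)
    records = []
    for _ in range(num_runs):
        load = rng.uniform(0.0, 1.0)
        setting = mock_model(load, rng)
        records.append(ProfileRecord(load, setting, toy_system(load, setting)))
    return records


def select_training_data(
    records: List[ProfileRecord], threshold: float = 0.9
) -> List[Tuple[float, float]]:
    """Assumed evaluation rule: keep (input, output) pairs that meet a performance threshold."""
    return [(r.load, r.setting) for r in records if r.performance >= threshold]


records = profile(num_runs=1000)
training_data = select_training_data(records)
```

Under these assumptions, `training_data` holds the input/output pairs that could serve as initial examples for fitting the real model in place of the mock one. Random outputs are used so that the system is exercised across a wide range of heuristic values before any trained model exists.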