This Viewpoint discusses the type and amount of data needed for machine learning models to accurately predict diagnoses and treatment outcomes at the individual patient level.