Data preparation for machine learning