This study addresses the self-selection and heterogeneity issues inherent in measuring the efficacy of voluntary training programs. We exploit data collected from Indiana University’s introductory microeconomics course. Alongside the course, undergraduates could choose to participate in a voluntary training program called Collaborative Learning (CL), which is designed to encourage a self-discovery learning style. To address self-selection and heterogeneity in the effectiveness of CL, we use program evaluation methods to measure student performance. We find, among other things, that CL produces heterogeneous results: for example, the bottom 40 percent of CL participants improved their performance the most, while students at the higher end of the grade distribution achieved greater improvement in topic understanding. The latter improvement is greater than can be attributed to superior innate ability alone. Finally, parametric and non-parametric sensitivity analyses confirm that the sign of the estimated treatment effects is robust to potential violations of the underlying assumptions.