Designing optimal training sets for genomic prediction using adversarial validation with probit regression |
| |
Authors: | Osval A Montesinos-López; Kismiantini; Abelardo Montesinos-López |
| |
Institution: | 1. Facultad de Telemática, Universidad de Colima, Colima, Colima, Mexico; 2. Statistics Study Program, Universitas Negeri Yogyakarta, Yogyakarta, Indonesia; 3. Centro Universitario de Ciencias Exactas e Ingenierías (CUCEI), Universidad de Guadalajara, Guadalajara, Jalisco, Mexico |
| |
Abstract: | Genomic selection (GS) is a disruptive methodology that is revolutionizing animal and plant breeding. However, its practical implementation is challenging because the distributions of the training and testing sets often do not match. Adversarial validation is an approach popular in machine learning for detecting and addressing differences between the training and testing distributions. In this research, adversarial validation was therefore implemented using probit regression, both to detect the mismatch in distributions and to select an optimal training set. We evaluated the proposed method on 14 datasets and benchmarked the results against using the whole reference population and against simple random samples. We found that the proposed method is effective for detecting the mismatch in distributions and outperformed in prediction accuracy by 11.67% (in terms of mean square error) and by 5.35% (in terms of normalized mean square error) when the whole reference population was used as training sets. Also, in general, this outperformed some existing methods for optimal training designs in the context of GS. |
| |
Keywords: | adversarial validation; genomic prediction; mismatch in distributions; optimal training set selection; plant breeding; probit regression |
|
|
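
The adversarial validation idea summarized in the abstract can be illustrated with a short sketch. It is a minimal example under stated assumptions, not the authors' implementation: the function names (`fit_probit`, `adversarial_validation`), the parameters (`ridge`, `n_select`), and the synthetic marker data are all hypothetical. The sketch assumes a ridge-penalized maximum-likelihood probit fit and assumes that the training lines kept are those the classifier scores as most similar to the testing set, which is a common adversarial validation convention. The AUC is computed in-sample for brevity; a cross-validated AUC would be a more honest measure of mismatch.

```python
import math
import numpy as np

_erf = np.vectorize(math.erf)

def norm_cdf(z):
    """Standard normal CDF, the inverse of the probit link."""
    return 0.5 * (1.0 + _erf(z / math.sqrt(2.0)))

def norm_pdf(z):
    """Standard normal density."""
    return np.exp(-0.5 * z ** 2) / math.sqrt(2.0 * math.pi)

def fit_probit(X, y, ridge=1.0, n_iter=50):
    """Fisher scoring for probit regression with a ridge penalty.
    The intercept is not penalized. Assumed estimator for illustration."""
    n, p = X.shape
    Xd = np.hstack([np.ones((n, 1)), X])
    beta = np.zeros(p + 1)
    penalty = ridge * np.eye(p + 1)
    penalty[0, 0] = 0.0
    for _ in range(n_iter):
        eta = np.clip(Xd @ beta, -8.0, 8.0)          # clip for numerical stability
        mu = np.clip(norm_cdf(eta), 1e-6, 1 - 1e-6)
        phi = norm_pdf(eta)
        var = mu * (1.0 - mu)
        score = Xd.T @ (phi * (y - mu) / var) - penalty @ beta
        info = Xd.T @ (Xd * (phi ** 2 / var)[:, None]) + penalty
        step = np.linalg.solve(info, score)
        beta = beta + step
        if np.max(np.abs(step)) < 1e-8:
            break
    return beta

def auc(y, score):
    """Rank-based (Mann-Whitney) AUC; ties are not averaged."""
    order = np.argsort(score)
    ranks = np.empty(len(score))
    ranks[order] = np.arange(1, len(score) + 1)
    n1 = y.sum()
    n0 = len(y) - n1
    return (ranks[y == 1].sum() - n1 * (n1 + 1) / 2.0) / (n1 * n0)

def adversarial_validation(X_train, X_test, n_select, ridge=1.0):
    """Label training rows 0 and testing rows 1, fit a probit classifier,
    and return (in-sample AUC, indices of the n_select training rows with
    the highest estimated probability of belonging to the testing set)."""
    X = np.vstack([X_train, X_test])
    y = np.r_[np.zeros(len(X_train)), np.ones(len(X_test))]
    beta = fit_probit(X, y, ridge=ridge)
    p_test = norm_cdf(np.hstack([np.ones((len(X), 1)), X]) @ beta)
    train_scores = p_test[: len(X_train)]
    selected = np.argsort(train_scores)[::-1][:n_select]
    return auc(y, p_test), selected

# Synthetic example: the testing set is shifted relative to the training set.
rng = np.random.default_rng(0)
X_train = rng.normal(0.0, 1.0, size=(200, 5))
X_test = rng.normal(0.8, 1.0, size=(50, 5))

mismatch_auc, selected = adversarial_validation(X_train, X_test, n_select=40)
print("AUC (train vs. test):", round(float(mismatch_auc), 3))
print("first selected training rows:", selected[:5])
```

An AUC close to 0.5 suggests the classifier cannot tell the two sets apart, so no mismatch is detected. An AUC well above 0.5 signals a mismatch, in which case restricting training to the selected rows is one candidate design to compare against the whole reference population and against random samples.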