Designing optimal training sets for genomic prediction using adversarial validation with probit regression
Authors: Osval A Montesinos-López, Kismiantini, Abelardo Montesinos-López
Institution: 1. Facultad de Telemática, Universidad de Colima, Colima, Colima, Mexico; 2. Statistics Study Program, Universitas Negeri Yogyakarta, Yogyakarta, Indonesia; 3. Centro Universitario de Ciencias Exactas e Ingenierías (CUCEI), Universidad de Guadalajara, Guadalajara, Jalisco, Mexico
Abstract: Genomic selection (GS) is a disruptive methodology that is revolutionizing animal and plant breeding. However, its practical implementation is challenging because the distributions of the training and testing sets often do not match. Adversarial validation is an approach popular in machine learning for detecting and addressing differences between the training and testing distributions. In this research, we therefore implemented adversarial validation with probit regression, both to detect the mismatch in distributions and to select an optimal training set. We evaluated the proposed method on 14 datasets and benchmarked the results against using the whole reference population and against simple random samples. We found that the proposed method is effective at detecting the mismatch in distributions and that it improved prediction accuracy by 11.67% (in terms of mean square error) and by 5.35% (in terms of normalized mean square error) relative to using the whole reference population as the training set. In general, it also outperformed some existing methods for optimal training set design in the context of GS.
Keywords: adversarial validation; genomic prediction; mismatch in distributions; optimal training set selection; plant breeding; probit regression
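
The adversarial validation idea summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state the exact selection rule, so the sketch assumes the common recipe of labeling reference lines 0 and testing lines 1, fitting a probit classifier, reading an AUC well above 0.5 as evidence of mismatch, and keeping the reference lines that look most like the testing set. The data are synthetic, and the function names (`fit_probit`, `predict_proba`, `auc`) and tuning values are hypothetical.

```python
import math
import random

def norm_cdf(z):
    # Standard normal CDF, the probit link
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def norm_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def fit_probit(X, y, lr=0.05, n_iter=400):
    """Fit P(y=1|x) = Phi(b0 + x.b) by gradient ascent on the log-likelihood."""
    n, p = len(X), len(X[0])
    beta = [0.0] * (p + 1)
    for _ in range(n_iter):
        grad = [0.0] * (p + 1)
        for xi, yi in zip(X, y):
            eta = beta[0] + sum(b * v for b, v in zip(beta[1:], xi))
            prob = min(max(norm_cdf(eta), 1e-9), 1.0 - 1e-9)  # clip for stability
            w = norm_pdf(eta) * (yi - prob) / (prob * (1.0 - prob))
            grad[0] += w
            for j, v in enumerate(xi):
                grad[j + 1] += w * v
        beta = [b + lr * g / n for b, g in zip(beta, grad)]
    return beta

def predict_proba(beta, X):
    return [norm_cdf(beta[0] + sum(b * v for b, v in zip(beta[1:], xi))) for xi in X]

def auc(scores, labels):
    # Probability that a random testing row scores above a random reference row
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum((a > b) + 0.5 * (a == b) for a in pos for b in neg)
    return wins / (len(pos) * len(neg))

# Synthetic stand-in for marker-derived features (assumption, not real data):
# 40 reference lines unlike the testing set, 20 reference lines similar to it.
rng = random.Random(0)
ref = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(40)]
ref += [[rng.gauss(3, 1), rng.gauss(3, 1)] for _ in range(20)]
test = [[rng.gauss(3, 1), rng.gauss(3, 1)] for _ in range(20)]

# Adversarial labels: 0 = reference population, 1 = testing set
X = ref + test
y = [0] * len(ref) + [1] * len(test)

beta = fit_probit(X, y)
adv_auc = auc(predict_proba(beta, X), y)  # near 0.5 would suggest no mismatch

# Keep the reference lines with the highest estimated probability of
# belonging to the testing set as the candidate training set.
ref_scores = predict_proba(beta, ref)
k = 20
top = sorted(range(len(ref)), key=lambda i: ref_scores[i], reverse=True)[:k]
selected = [ref[i] for i in top]

print(f"adversarial AUC: {adv_auc:.3f}; selected {len(selected)} of {len(ref)} lines")
```

In practice the selected lines would then be used to train the genomic prediction model, and the training set size `k` would be tuned rather than fixed in advance.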