Similar articles
1.
A simulation study was conducted to assess the influence of differences in the length of individual testing periods on estimates of (co)variance components of a random regression model for daily feed intake of growing pigs performance tested between 30 and 100 kg live weight. A quadratic polynomial in days on test, with fixed regressions for sex, random regressions for additive genetic and permanent environmental effects and a constant residual variance, was used for a bivariate simulation of feed intake and daily gain. (Co)variance components were estimated for feed intake only, both by a Bayesian analysis using Gibbs sampling and by restricted maximum likelihood (REML). A single‐trait random regression model analogous to the one used for data simulation was used to analyse two versions of the data: full data sets with 18 weekly means of feed intake per animal, and reduced data sets in which each animal's testing period ended when it reached 100 kg live weight. Only one significant difference between estimates from full and reduced data (the REML estimate of the genetic covariance between linear and quadratic regression parameters) and two significant differences from expected values (Gibbs estimates of the permanent environmental variance of quadratic regression parameters) occurred. These differences are believed to be negligible, as their number lies within the expected range of type I errors when testing at the 5% level. The course of test‐day variances calculated from estimates of the additive genetic and permanent environmental covariance matrices also supports the conclusion that the individual length of testing periods of performance‐tested growing pigs does not bias estimates of (co)variance components. A lower number of records per tested animal results only in greater variation among estimates of (co)variance components from reduced compared with full data sets.
Compared with the full data, the effective sample size of Gibbs samples from the reduced data decreased to 18% for the residual variance and increased up to fivefold for the other (co)variances. The data structure thus seems to influence the mixing of Gibbs chains.
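The effective sample size (ESS) mentioned above can be estimated from the autocorrelation of a chain. The sketch below uses one simple estimator, ESS = n / (1 + 2 × sum of positive autocorrelations), truncated at the first non-positive lag; it is an illustrative assumption, not necessarily the estimator used in the study, and an AR(1) series stands in for a sticky Gibbs chain.

```python
import numpy as np


def effective_sample_size(chain):
    """ESS = n / tau, with tau = 1 + 2 * (sum of lag-k autocorrelations).

    The sum stops at the first non-positive autocorrelation, one simple
    truncation rule among several in use.
    """
    x = np.asarray(chain, dtype=float)
    n = x.size
    x = x - x.mean()
    var = x @ x / n
    tau = 1.0                                   # integrated autocorrelation time
    for lag in range(1, n):
        rho = (x[:-lag] @ x[lag:]) / (n * var)  # lag-k sample autocorrelation
        if rho <= 0.0:
            break
        tau += 2.0 * rho
    return n / tau


def ar1_chain(phi, n, rng):
    """Stand-in for a sticky Gibbs chain: AR(1) with lag-1 autocorrelation phi."""
    noise = rng.normal(size=n)
    x = np.empty(n)
    x[0] = noise[0]
    for t in range(1, n):
        x[t] = phi * x[t - 1] + noise[t]
    return x


rng = np.random.default_rng(42)
n, phi = 50_000, 0.9
ess_iid = effective_sample_size(rng.normal(size=n))
ess_sticky = effective_sample_size(ar1_chain(phi, n, rng))
ess_theory = n * (1 - phi) / (1 + phi)          # asymptotic ESS of the mean for AR(1)
print(round(ess_iid), round(ess_sticky), round(ess_theory))
```

Independent draws give an ESS close to n, while the autocorrelated chain yields only a few thousand effective draws out of 50 000.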

2.
A hierarchical model is presented for inferring the parameters of the joint distribution of a trait measured longitudinally and another assessed cross-sectionally, when selection has been applied to the cross-sectional trait. Distributions and methods for a Bayesian implementation via Markov Chain Monte Carlo procedures are discussed for the case where information about the selection criterion is available for all individuals but longitudinal records are available only in later generations. Alternative specifications of the residual covariance structure are suggested. The procedure is illustrated with an analysis of correlated responses in growth curve parameters in a population of rabbits selected for increased growth rate. Results agree with those obtained in a previous study using both selected and control populations. The high autocorrelation between successive samples indicates slow mixing, resulting in small effective sample sizes and high Monte Carlo standard errors.
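The link between slow mixing and high Monte Carlo standard errors (MCSE) can be shown with the batch-means estimator, one common way to compute an MCSE; the abstract does not say which method was used. In this sketch the chains are synthetic: independent draws versus an AR(1) chain with unit variance and lag-1 autocorrelation 0.95.

```python
import numpy as np


def batch_means_mcse(chain, n_batches=50):
    """Monte Carlo standard error of the chain mean from non-overlapping batch means."""
    x = np.asarray(chain, dtype=float)
    m = x.size // n_batches                      # batch length; leftover draws are dropped
    means = x[: m * n_batches].reshape(n_batches, m).mean(axis=1)
    return means.std(ddof=1) / np.sqrt(n_batches)


rng = np.random.default_rng(7)
n, phi = 100_000, 0.95
iid = rng.normal(size=n)                         # ideal sampler: independent draws
noise = rng.normal(scale=np.sqrt(1.0 - phi**2), size=n)
slow = np.empty(n)                               # slowly mixing chain with unit variance
slow[0] = rng.normal()
for t in range(1, n):
    slow[t] = phi * slow[t - 1] + noise[t]

naive = 1.0 / np.sqrt(n)                         # iid formula sd / sqrt(n) for unit variance
mcse_iid = batch_means_mcse(iid)
mcse_slow = batch_means_mcse(slow)
print(naive, mcse_iid, mcse_slow)
```

Both chains have the same marginal variance, yet the slowly mixing one has a Monte Carlo standard error several times larger, as the abstract describes.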

3.
Multiple‐trait and random regression models have multiplied the number of equations needed for the estimation of variance components. To avoid inversion or decomposition of a large coefficient matrix, we propose estimating variance components by Monte Carlo expectation maximization restricted maximum likelihood (MC EM REML) for multiple‐trait linear mixed models. The implementation is based on full‐model sampling for calculating the prediction error variances required for EM REML. The performance of the analytical and the MC EM REML algorithms was compared using a simulated and a field data set. For the field data, results from both algorithms corresponded well even with one MC sample within an MC EM REML round. The magnitude of the standard errors of estimated prediction error variances depended on the formula used to calculate them and on the MC sample size within an MC EM REML round. Sampling variation in MC EM REML did not impair the convergence behaviour of the solutions compared with analytical EM REML. A convergence criterion that takes the sampling variation into account was developed to monitor convergence of the MC EM REML algorithm. For the field data set, MC EM REML proved far superior to analytical EM REML in both computing time and memory requirements.
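The idea of full-model sampling for prediction error variances (PEV) can be sketched on a toy one-way random model: simulate true effects and data from the model, solve the mixed model equations, and average (u − û)². Only linear solves are needed, no inverse of the coefficient matrix. The design, the variances and the identity covariance among random effects are hypothetical, and this uses only one of several possible PEV formulas; the inverse is computed here solely to check the result.

```python
import numpy as np


def mme_lhs(X, Z, alpha):
    """Left-hand side of Henderson's mixed model equations, assuming u ~ N(0, I * s2u)."""
    q = Z.shape[1]
    return np.block([[X.T @ X, X.T @ Z],
                     [Z.T @ X, Z.T @ Z + alpha * np.eye(q)]])


def mc_pev(X, Z, s2u, s2e, n_samples, rng):
    """Monte Carlo PEV: average (u - u_hat)**2 over data sets simulated from the full model."""
    n, q = Z.shape
    p = X.shape[1]
    lhs = mme_lhs(X, Z, s2e / s2u)
    sq_err = np.zeros(q)
    for _ in range(n_samples):
        u = rng.normal(0.0, np.sqrt(s2u), q)     # true random effects
        e = rng.normal(0.0, np.sqrt(s2e), n)
        y = Z @ u + e                            # fixed effects set to 0: u - u_hat does not depend on them
        rhs = np.concatenate([X.T @ y, Z.T @ y])
        u_hat = np.linalg.solve(lhs, rhs)[p:]    # only a linear solve is needed
        sq_err += (u - u_hat) ** 2
    return sq_err / n_samples


rng = np.random.default_rng(3)
q = 20
groups = np.repeat(np.arange(q), np.arange(1, q + 1))   # hypothetical design: group k has k + 1 records
n = groups.size
X = np.ones((n, 1))
Z = np.zeros((n, q))
Z[np.arange(n), groups] = 1.0
s2u, s2e = 0.5, 2.0

pev_mc = mc_pev(X, Z, s2u, s2e, n_samples=2000, rng=rng)
# Analytical PEV from the inverse of the coefficient matrix, for comparison only
pev_exact = np.diag(np.linalg.inv(mme_lhs(X, Z, s2e / s2u)))[X.shape[1]:] * s2e
print(np.round(pev_mc[:3], 3), np.round(pev_exact[:3], 3))
```

In an MC EM REML round, such sampled PEVs would replace the trace terms that otherwise require the inverse.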

4.
The accessibility of Markov Chain Monte Carlo (MCMC) methods for statistical inference has improved with the advent of general-purpose software. This enables researchers with limited statistical skills to perform Bayesian analysis. Using MCMC sampling for statistical inference requires convergence of the MCMC chain to its stationary distribution. There is no certain way to prove convergence; it is only possible to ascertain when convergence definitely has not been achieved. The available diagnostics are rather subjective and are not implemented as automatic safeguards in general MCMC software. This paper considers a pragmatic approach to assessing the convergence of MCMC methods, illustrated by a Bayesian analysis of the Hui–Walter model for evaluating diagnostic tests in the absence of a gold standard. The Hui–Walter model has two optimal solutions, a property that causes convergence problems when the solutions are sufficiently close in the parameter space. Using simulated data, we demonstrate tools for assessing the convergence and mixing of MCMC chains, with examples both with and without convergence. Suggestions are given for remedying the situation when the MCMC sampler fails to converge. The epidemiological implications of the two solutions of the Hui–Walter model are discussed.
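One widely used convergence tool is the Gelman-Rubin potential scale reduction factor (R-hat), which compares between-chain and within-chain variance across several chains. The abstract does not list the exact tools used; this sketch is an assumption for illustration. The chain values are made up: one set explores a single region, the other has chains stuck near two different solutions, as can happen with a bimodal posterior.

```python
import numpy as np


def gelman_rubin_rhat(chains):
    """Potential scale reduction factor for m chains of equal length n (rows = chains)."""
    chains = np.asarray(chains, dtype=float)
    m, n = chains.shape
    w = chains.var(axis=1, ddof=1).mean()          # mean within-chain variance
    b = n * chains.mean(axis=1).var(ddof=1)        # between-chain variance
    var_plus = (n - 1) / n * w + b / n             # pooled estimate of posterior variance
    return np.sqrt(var_plus / w)


rng = np.random.default_rng(0)
# Four chains exploring the same region of a hypothetical sensitivity parameter
mixed = rng.normal(0.85, 0.03, size=(4, 1000))
# Two chains near one solution and two near a second, distant solution
stuck = np.vstack([rng.normal(0.85, 0.03, size=(2, 1000)),
                   rng.normal(0.15, 0.03, size=(2, 1000))])
rhat_mixed = gelman_rubin_rhat(mixed)
rhat_stuck = gelman_rubin_rhat(stuck)
print(round(rhat_mixed, 3), round(rhat_stuck, 1))
```

Values near 1 are consistent with, but do not prove, convergence; values well above 1 show that the chains have definitely not converged to a common distribution.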

5.
The Markov chain Monte Carlo (MCMC) strategy provides remarkable flexibility for fitting complex hierarchical models. However, when parameters are highly correlated in their posterior distributions and their number is large, a particular MCMC algorithm may perform poorly and the resulting inferences may be affected. The objective of this study was to compare the efficiency (in terms of the asymptotic variance of features of the posterior distributions of chosen parameters, and in terms of computing cost) of six MCMC strategies for sampling parameters, using simulated data generated with a reaction norm model with unknown covariates as an example. The six strategies are single-site Gibbs updates (SG), a single-site Gibbs sampler updating transformed (a priori independent) additive genetic values (TSG), pairwise Gibbs updates (PG), blocked Gibbs updates in which all location parameters are updated jointly (BG), Langevin-Hastings (LH) proposals, and Langevin-Hastings proposals updating transformed additive genetic values (TLH). The ranking of the methods in terms of asymptotic variance is affected by the strength of the correlation structure of the data and by the true values of the parameters, and no method comes out as an overall winner across all parameters. TSG and BG perform very well in terms of asymptotic variance, especially when the posterior correlation between genetic effects is high. In terms of computing cost, TSG performs best, except for dispersion parameters in the low-correlation scenario, where SG was the best strategy. The two LH proposals could not compete with any of the Gibbs sampling algorithms. In this study it was not possible to find an MCMC strategy that performs optimally across the range of target distributions and across all possible parameter values. However, when the posterior correlation between parameters is high, TSG, BG and even PG show better mixing than SG.
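Why blocking helps when posterior correlation is high can be seen on a toy target: a standard bivariate normal with correlation 0.95. This is not the reaction norm model from the study. Single-site Gibbs (as in SG) updates one coordinate at a time and produces a chain whose lag-1 autocorrelation is about rho², while a blocked update (as in BG) draws both coordinates jointly and gives nearly independent draws.

```python
import numpy as np


def single_site_gibbs(rho, n_iter, rng):
    """Update x | y, then y | x, for a standard bivariate normal with correlation rho."""
    s = np.sqrt(1.0 - rho**2)
    x = y = 0.0
    draws = np.empty((n_iter, 2))
    for t in range(n_iter):
        x = rng.normal(rho * y, s)
        y = rng.normal(rho * x, s)
        draws[t] = x, y
    return draws


def blocked_gibbs(rho, n_iter, rng):
    """Draw (x, y) jointly from the same bivariate normal in one block."""
    cov = np.array([[1.0, rho], [rho, 1.0]])
    return rng.multivariate_normal(np.zeros(2), cov, size=n_iter)


def lag1_autocorr(v):
    v = v - v.mean()
    return (v[:-1] @ v[1:]) / (v @ v)


rng = np.random.default_rng(11)
rho, n_iter = 0.95, 20_000
r_sg = lag1_autocorr(single_site_gibbs(rho, n_iter, rng)[:, 0])
r_bg = lag1_autocorr(blocked_gibbs(rho, n_iter, rng)[:, 0])
print(round(r_sg, 2), round(r_bg, 2))  # single-site: near rho**2; blocked: near 0
```

Transforming to a priori independent variables (as in TSG) pursues the same goal by removing correlation before single-site updating, at lower cost per iteration than a full block.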

6.
Hamiltonian Monte Carlo (HMC) is a Markov chain Monte Carlo method with the potential to improve the efficiency of parameter estimation. HMC is based on Hamiltonian dynamics, which follow Hamilton's equations, a pair of differential equations for position and momentum. In the sampling process of HMC, a numerical integration method called leapfrog integration is used to solve Hamilton's equations approximately, and this integration requires setting the number of discrete time steps and the integration step size. These two parameters require some tuning and calibration for effective sampling. In this study, we applied HMC to animal breeding data and identified the optimal leapfrog tuning for normal and inverse chi-square distributions. Then, using real pig data, we examined the properties of HMC with this optimal tuning by fitting models with variance explained by either pedigree or genomic information. HMC outperformed Gibbs sampling under both models. The source code, written in Fortran, is available at https://github.com/A-ARAKAWA/HMC .
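A minimal univariate HMC sampler shows where the two tuning parameters enter. With unit mass, Hamilton's equations are dq/dt = p and dp/dt = ∇log π(q), and the leapfrog integrator approximates them with `n_steps` steps of size `eps`. The normal target and the values `eps=0.2`, `n_steps=10` are arbitrary assumptions, not the optimal tunings from the study, and this is not the authors' Fortran code.

```python
import math
import numpy as np


def leapfrog(q, p, grad_log_p, eps, n_steps):
    """n_steps leapfrog steps of size eps for H(q, p) = -log p(q) + p**2 / 2 (unit mass)."""
    p = p + 0.5 * eps * grad_log_p(q)           # initial half step for momentum
    for step in range(n_steps):
        q = q + eps * p                          # full step for position
        if step < n_steps - 1:
            p = p + eps * grad_log_p(q)          # full step for momentum
    p = p + 0.5 * eps * grad_log_p(q)           # final half step for momentum
    return q, p


def hmc(log_p, grad_log_p, q0, eps, n_steps, n_iter, rng):
    """Univariate HMC: fresh N(0, 1) momentum each iteration, Metropolis accept/reject on H."""
    q, n_accepted = q0, 0
    samples = np.empty(n_iter)
    for t in range(n_iter):
        p0 = rng.normal()
        q_new, p_new = leapfrog(q, p0, grad_log_p, eps, n_steps)
        h_old = -log_p(q) + 0.5 * p0**2
        h_new = -log_p(q_new) + 0.5 * p_new**2
        if rng.uniform() < math.exp(min(0.0, h_old - h_new)):
            q, n_accepted = q_new, n_accepted + 1
        samples[t] = q
    return samples, n_accepted / n_iter


mu, sigma = 3.0, 2.0                             # hypothetical normal target


def log_p(q):
    return -0.5 * ((q - mu) / sigma) ** 2


def grad_log_p(q):
    return -(q - mu) / sigma**2


rng = np.random.default_rng(5)
draws, acc = hmc(log_p, grad_log_p, q0=0.0, eps=0.2, n_steps=10, n_iter=5500, rng=rng)
draws = draws[500:]                              # discard warm-up iterations
print(draws.mean(), draws.std(), acc)
```

A smaller `eps` raises acceptance but needs more steps per trajectory; a trajectory length near half a period of the target's oscillation can make successive draws strongly anti-correlated, which is one reason tuning matters.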

7.
First‐parity calving difficulty scores from Italian Piemontese cattle were analysed using a threshold mixed effects model. The model included the fixed effects of age of dam, sex of calf and their interaction, and the random effects of sire, maternal grandsire and herd‐year‐season. Covariances between sire and maternal grandsire effects were modelled using a numerator relationship matrix based on male ancestors. Field data consisted of 23 953 records collected between 1989 and 1998 from 4741 herd‐year‐seasons. Variance and covariance components were estimated using two alternative approximate marginal maximum likelihood (MML) methods, one based on expectation‐maximization (EM) and the other based on Laplacian integration. Inferences were compared with those based on three separate runs (sequences) of Markov Chain Monte Carlo (MCMC) sampling in order to assess the validity of approximate MML estimates derived from data of similar size and design structure. Point estimates of direct heritability were 0.24, 0.25 and 0.26 for EM, Laplacian and MCMC (posterior mean), respectively, whereas the corresponding maternal heritability estimates were 0.10, 0.11 and 0.12. The covariance between additive direct and maternal effects was not found to differ from zero based on MCMC‐derived confidence sets. The conventional joint modal estimates of sire effects and associated standard errors based on MML estimates of variance and covariance components differed little from the respective posterior means and standard deviations derived from MCMC. Therefore, there may be little need to pursue computation‐intensive MCMC methods for inference on genetic parameters and genetic merit using conventional threshold sire and maternal grandsire models for large calving ease datasets.
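In a threshold (ordered probit) model, each ordered score arises from an unobserved normal liability crossing fixed thresholds, so P(score = k) = Φ(t_k − η) − Φ(t_{k−1} − η), where η is the linear predictor on the liability scale. The sketch assumes a residual variance of 1 and uses hypothetical thresholds for a three-category scale; it is not the model fitted in the study.

```python
from statistics import NormalDist


def category_probabilities(eta, thresholds):
    """P(score = k) = Phi(t_k - eta) - Phi(t_{k-1} - eta) for an ordered probit (threshold) model.

    eta: linear predictor on the liability scale (residual variance 1).
    thresholds: increasing cut-points t_1 < ... < t_{K-1} separating K categories.
    """
    phi = NormalDist().cdf
    cuts = [float("-inf"), *thresholds, float("inf")]
    return [phi(cuts[k + 1] - eta) - phi(cuts[k] - eta) for k in range(len(cuts) - 1)]


# Hypothetical three-category calving-difficulty scale
probs = category_probabilities(eta=0.0, thresholds=[-1.0, 1.0])
print([round(p, 4) for p in probs])  # [0.1587, 0.6827, 0.1587]

# A larger liability (e.g. a hypothetical sire effect of +0.5) shifts mass to harder calvings
harder = category_probabilities(eta=0.5, thresholds=[-1.0, 1.0])
```

Sire and maternal grandsire effects enter through η, which is why their variances are estimated on the liability scale rather than on the observed score scale.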

8.
Random regression models were applied to eight conformation traits (i.e. stature, rump angle, thurl width, rear leg set, rear udder width, rear udder height, udder depth and fore udder attachment) of Holstein cows from the northeastern United States. Covariates for fixed and random regressions included age and age‐squared for six of the traits, and two additional covariates were included for rear udder width and rear udder height. Other effects in the model were herd‐year‐classifier and months in milk. Fixed covariates were nested within the cow's year of birth. Variance components were estimated using Bayesian methods with a Gibbs sampling procedure. Estimated breeding values from the random regression models were compared with those from two single‐trait models. The first model used only the cow's first classification record in first lactation, and the second used all classifications of the cow in a simple repeatability model. Additive genetic merit for conformation traits changed with the age of the animal, and some traits were affected by age more than others. The single‐trait, single‐record model and the simple repeatability model were not appropriate for predicting breeding values at mature ages for rear udder width and rear udder height.

9.
Genetic evaluation of Icelandic horses is currently based on results from breeding field tests in which the riding ability and conformation of the horses are evaluated over the course of 1-2 days. Only a small proportion of registered horses attend these field tests, and it can be assumed that they are not a random sample of the population. In this study, the trait test status was introduced, describing whether a horse was assessed in a breeding field test. This trait was analysed to determine whether it shows genetic variation and how it is genetically correlated with other traits in the breeding goal. Breeding field test data included 39,443 mares born in Iceland in 1990-2001, of which 7431 were assessed in the period 1994-2007. The trait was defined in relation to the age, gender and stud of the horses. Variance and covariance components were estimated with the Markov Chain Monte Carlo method, applying the Gibbs sampler procedure in the DMU program. Three multivariate analyses were performed in which the test status trait was analysed together with breeding field test traits. Animal models and sire models were applied. Based on estimated heritabilities (0.51-0.67) and genetic correlations (0.00-0.87), the test status trait showed significant genetic variation and was strongly correlated with some traits. The test status trait reflects preselection in the breeding field test traits and should be included in the genetic evaluation to improve the procedure, reduce selection bias and increase the accuracy of estimation.

10.
A two‐dimensional random regression model with regressions on days in milk (DIM) and parity number was applied to lactational milk yields in Chinese Simmental cattle. Random regressions were fitted for additive genetic and permanent environmental effects using a two‐dimensional polynomial on DIM and parity number. A total of 4340 lactational milk yields from Chinese Simmental cattle that calved between 1980 and early 2000 were used in this study. Variance components were estimated using Bayesian methodology via Gibbs sampling. Variances of random regression coefficients associated with all terms of the polynomials were significant. A covariance function showed that heritabilities of lactational milk yields between 200 and 400 DIM across parities varied between 0.25 and 0.45. Heritabilities of 305‐day milk yields from the 1st to the 6th–8th parities were 0.28, 0.30, 0.32, 0.32, 0.32 and 0.31, respectively. Ratios of permanent environmental variance to total variance at each DIM were greater than the corresponding heritabilities. Generally, genetic correlations were higher between lactational milk yields with similar DIM and parity number.
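A heritability surface like the one described follows from the covariance function: with a basis vector z(d, p) of polynomial terms in standardized DIM and parity, the genetic variance is z′Gz, the permanent environmental variance is z′Pz, and h² = z′Gz / (z′Gz + z′Pz + σ²_e). The polynomial orders, covariate ranges and all matrix values below are hypothetical; the abstract does not report them.

```python
import numpy as np


def standardize(x, lo, hi):
    """Map x from [lo, hi] onto [-1, 1]."""
    return -1.0 + 2.0 * (x - lo) / (hi - lo)


def basis_2d(dim, parity, k_dim=2, k_par=2, dim_range=(5.0, 400.0), par_range=(1.0, 8.0)):
    """Terms d**i * p**j of a two-dimensional polynomial in standardized DIM (d) and parity (p)."""
    d = standardize(dim, *dim_range)
    p = standardize(parity, *par_range)
    return np.array([d**i * p**j for i in range(k_dim) for j in range(k_par)])


def heritability(dim, parity, G, P, s2e):
    z = basis_2d(dim, parity)
    var_a = z @ G @ z      # additive genetic variance at this (DIM, parity)
    var_pe = z @ P @ z     # permanent environmental variance
    return var_a / (var_a + var_pe + s2e)


# Hypothetical covariance matrices of the 4 regression coefficients (1, p, d, d*p)
G = np.diag([0.30, 0.02, 0.02, 0.01])
P = np.diag([0.40, 0.03, 0.03, 0.01])
s2e = 0.30
h305 = [heritability(305.0, parity, G, P, s2e) for parity in range(1, 9)]
print([round(h, 3) for h in h305])
```

With real estimates, G and P would be full (non-diagonal) matrices, and the same function evaluated over a grid of DIM and parity would trace the surface summarized in the abstract.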

11.
Markov chain Monte Carlo (MCMC) enables fitting complex hierarchical models that may adequately reflect the process of data generation. Some of these models may contain more parameters than can be uniquely inferred from the distribution of the data, causing non‐identifiability. The reaction norm model with unknown covariates (RNUC) is a model in which unknown environmental effects can be inferred jointly with the remaining parameters. The problem of identifiability of parameters at the level of the likelihood and the associated behaviour of MCMC chains were discussed using the RNUC as an example. It was shown theoretically that when environmental effects (covariates) are considered as random effects, estimable functions of the fixed effects, (co)variance components and genetic effects are identifiable, as are the environmental effects. When the environmental effects are treated as fixed and there are other fixed factors in the model, the contrasts involving environmental effects, the variance of environmental sensitivities (genetic slopes) and the residual variance are the only identifiable parameters. These different identifiability scenarios were generated by changing the formulation of the model and the structure of the data, and the models were then implemented via MCMC. The output of the MCMC sampling schemes was interpreted in the light of the theoretical findings. The erratic behaviour of the MCMC chains was shown to be associated with identifiability problems in the likelihood, despite the propriety of the posterior distributions achieved by arbitrarily chosen uniform (bounded) priors. In some cases, very long chains were needed before the behaviour of the chain could signal the existence of problems. The paper serves as a warning concerning the implementation of complex models in which identifiability problems can be difficult to detect a priori.
We conclude that it would be good practice to experiment with a proposed model and to understand its features before embarking on a full MCMC implementation.
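The behaviour described above can be reproduced on a deliberately tiny toy model, which is not the RNUC: observations y_i ~ N(a + b, 1) identify only the sum a + b. Bounded uniform priors on a and b make the posterior proper, yet a Gibbs sampler lets a and b drift across the whole prior range while their sum stays tightly determined. The data summary (ȳ = 0, n = 100) and the bounds are hypothetical.

```python
import numpy as np


def truncated_normal(mean, sd, lo, hi, rng):
    """Rejection sampler; adequate here because the mean always lies inside [lo, hi]."""
    while True:
        x = rng.normal(mean, sd)
        if lo <= x <= hi:
            return x


def gibbs_nonidentified(y_bar, n, n_iter, bound, rng):
    """Gibbs sampler for y_i ~ N(a + b, 1) with Uniform(-bound, bound) priors on a and b.

    Only a + b is identified by the likelihood; the bounded priors make the posterior
    proper, but a and b individually are constrained only by those priors.
    """
    sd = 1.0 / np.sqrt(n)                   # conditional sd of a | b (and of b | a)
    a = b = 0.0
    draws = np.empty((n_iter, 2))
    for t in range(n_iter):
        a = truncated_normal(y_bar - b, sd, -bound, bound, rng)
        b = truncated_normal(y_bar - a, sd, -bound, bound, rng)
        draws[t] = a, b
    return draws


rng = np.random.default_rng(2024)
draws = gibbs_nonidentified(y_bar=0.0, n=100, n_iter=20_000, bound=5.0, rng=rng)
sd_a = draws[:, 0].std()
sd_sum = draws.sum(axis=1).std()
print(round(sd_a, 2), round(sd_sum, 2))  # a wanders over the prior range; a + b stays tight
```

A short run of this chain can look like slow but ordinary mixing; only a long run, or a trace plot of a against its prior bounds, reveals that the likelihood carries no information about a alone.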

12.
The aims of this study were to investigate the presence of genetic variation in susceptibility to pathogen-specific mastitis and to examine whether haplotypes of an identified quantitative trait locus affecting unspecific mastitis resistance had different effects on specific mastitis pathogens. Bacteriological data on mastitis pathogens were obtained from the diagnostic laboratory at the Swedish National Veterinary Institute. The data were mainly from subclinical cases of mastitis, but clinical cases were also included. Variance components were estimated for the incidence of the six most frequent pathogens using Markov Chain Monte Carlo methodology via Gibbs sampling. Genetic variation in susceptibility to pathogen-specific mastitis was higher than the estimates of general resistance to clinical mastitis reported in most other studies. However, because of the non-random nature of data collection, comparisons with other studies should be made with caution. The effect of haplotype on the risk of being infected by a given mastitis pathogen, relative to other pathogens, was studied using an allele substitution model. Although there were no significant haplotype substitution effects on resistance to any of the six mastitis pathogens, there was a significant difference between the effects of two of the haplotypes on the risk of acquiring a Streptococcus dysgalactiae infection.

13.
Using spline functions (segmented polynomials) in regression models requires knowledge of the location of the knots. Knots are the points at which the independent linear segments are connected. In this study, optimal knot positions for linear splines of different orders were determined for different scenarios, using existing estimates of covariance functions and an optimization algorithm. The traits considered were test‐day milk, fat and protein yields and somatic cell score (SCS) in the first three lactations of Canadian Holsteins. Two ranges of days in milk (5 to 305 and 5 to 365) were considered. In addition, four different populations of Holstein cows, from Australia, Canada, Italy and New Zealand, were examined for first‐lactation (305‐day) milk yield only. The estimates of genetic and permanent environmental covariance functions were based on single‐ and multiple‐trait test‐day models with Legendre polynomials of order 4 as random regressions. A differential evolution algorithm was applied to find the best knot locations for splines of orders 4 to 7, and the optimization criterion was the goodness of fit of the spline covariance function. Results indicated that the optimal knot positions for linear splines differed between genetic and permanent environmental effects, as well as between traits and lactations. Different populations also exhibited different patterns of optimal knot locations. With linear splines, different knot positions should therefore be used for different effects and traits in random regression test‐day models when analysing milk production traits.
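A linear spline in a random regression model can be written with hat-function basis columns, one per knot. Each random coefficient is then the value of the curve at its knot, and the covariance function between days is K = ΦCΦ′, interpolated linearly between knots. The knot positions and the coefficient covariance matrix C below are hypothetical; an optimizer such as differential evolution would search over `knots` to maximize a goodness-of-fit criterion for K.

```python
import numpy as np


def linear_spline_basis(t, knots):
    """Evaluate the linear-spline (hat-function) basis at points t.

    Column k equals 1 at knots[k], falls linearly to 0 at the neighbouring knots
    and is 0 elsewhere (np.interp holds end values outside the knot range).
    Knots must be strictly increasing.
    """
    t = np.atleast_1d(np.asarray(t, dtype=float))
    knots = np.asarray(knots, dtype=float)
    basis = np.zeros((t.size, knots.size))
    for k in range(knots.size):
        unit = np.zeros(knots.size)
        unit[k] = 1.0
        basis[:, k] = np.interp(t, knots, unit)   # piecewise-linear interpolation of e_k
    return basis


knots = np.array([5.0, 50.0, 150.0, 305.0])       # hypothetical knot positions (DIM)
days = np.arange(5, 306)
Phi = linear_spline_basis(days, knots)
# Hypothetical covariance among the random coefficients (values at the knots),
# built from an exponential kernel so that it is positive definite.
C = 3.0 * np.exp(-np.abs(np.subtract.outer(knots, knots)) / 150.0)
K = Phi @ C @ Phi.T                               # covariance function over days 5..305
print(Phi.shape, K.shape)
```

Because the basis is piecewise linear, moving a knot changes where the covariance function can bend, which is why optimal knot positions can differ between effects and traits.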

14.
This work focuses on the effects of a variable amount of genomic information on the Bayesian estimation of unknown variance components in single‐step genomic prediction. We propose a quantitative criterion for the amount of genomic information included in the model and use it to study the relative effect of genomic data on the efficiency of sampling from the posterior distribution of the parameters of the single‐step model in a Bayesian analysis that estimates unknown variances. The rate of change of estimated variances depended on the amount of genomic information involved in the analysis, but not on the Gibbs updating scheme used to sample from the posterior distribution. Simulation revealed a gradual deterioration of convergence rates for the location parameters as new genomic data were gradually added to the analysis. In contrast, the convergence of variance components improved continuously under the same conditions. Sampling efficiency increased proportionally to the amount of genomic information. In addition, the amount of genomic information in the variance–covariance matrix that guarantees the most (computationally) efficient analysis was found to correspond to a proportion of animals genotyped ***0.8. The proposed criterion characterizes the expected performance of the Gibbs sampler when the amount of genomic data in the analysis is adjusted, and it can guide researchers on how large a proportion of animals should be genotyped to attain an efficient analysis.

15.
The purpose of this study is to present guidelines for selecting statistical and computing algorithms for variance component estimation when computing involves software packages. Two major methods are considered: residual maximum likelihood (REML) and Bayesian estimation via Gibbs sampling. Expectation‐Maximization (EM) REML is regarded as a very stable algorithm that is able to converge when covariance matrices are close to singular; however, it is slow. Moreover, convergence problems can occur with random regression models, especially if the starting values are much lower than those at convergence. Average Information (AI) REML is much faster for common problems, but it relies on heuristics for convergence and may be very slow or even diverge for complex models. REML algorithms for general models become unstable with a larger number of traits. REML by canonical transformation is stable in such cases but supports only a limited class of models. In general, REML algorithms are difficult to program. Bayesian methods via Gibbs sampling are much easier to program than REML, especially for complex models, and they can support much larger datasets; however, the termination criterion can be hard to determine, and the quality of estimates depends on a number of details. Computing speed varies with computing optimizations, which allow some large data sets and complex models to be handled in a reasonable time; however, optimizations increase programming complexity and restrict the types of models that can be fitted. Several examples from past research are discussed to illustrate that different problems require different methods.
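To make the EM REML description concrete, here is a minimal sketch for a one-way random model y = 1μ + Zu + e with u ~ N(0, Iσ²_u). It uses the commonly taught updates σ²_u ← (û′û + σ²_e tr(C^uu)) / q and σ²_e ← (y′y − sol′rhs) / (n − rank(X)), where C is the inverse of the mixed-model coefficient matrix. The explicit inverse is exactly what makes analytical EM REML expensive for large models, and the toy data are simulated. For a balanced design, REML equals the ANOVA estimates when both are positive, which serves as a check.

```python
import numpy as np


def em_reml_one_way(y, groups, n_groups, s2u=1.0, s2e=1.0, tol=1e-10, max_iter=10_000):
    """EM REML for y = 1*mu + Z u + e, u ~ N(0, I s2u), e ~ N(0, I s2e)."""
    n = y.size
    X = np.ones((n, 1))
    Z = np.zeros((n, n_groups))
    Z[np.arange(n), groups] = 1.0
    rank_x = 1
    for _ in range(max_iter):
        alpha = s2e / s2u
        lhs = np.block([[X.T @ X, X.T @ Z],
                        [Z.T @ X, Z.T @ Z + alpha * np.eye(n_groups)]])
        rhs = np.concatenate([X.T @ y, Z.T @ y])
        C = np.linalg.inv(lhs)                  # explicit inverse: feasible only for small models
        sol = C @ rhs
        u_hat, C_uu = sol[1:], C[1:, 1:]
        new_s2u = (u_hat @ u_hat + s2e * np.trace(C_uu)) / n_groups
        new_s2e = (y @ y - sol @ rhs) / (n - rank_x)
        done = abs(new_s2u - s2u) + abs(new_s2e - s2e) < tol
        s2u, s2e = new_s2u, new_s2e
        if done:
            break
    return s2u, s2e


# Balanced toy data: 30 groups with 10 records each, true s2u = s2e = 1
rng = np.random.default_rng(1)
q, m = 30, 10
groups = np.repeat(np.arange(q), m)
y = 5.0 + rng.normal(0.0, 1.0, q)[groups] + rng.normal(0.0, 1.0, q * m)
s2u_em, s2e_em = em_reml_one_way(y, groups, q)

# ANOVA estimates for comparison
group_means = y.reshape(q, m).mean(axis=1)
msw = ((y.reshape(q, m) - group_means[:, None]) ** 2).sum() / (q * (m - 1))
msb = m * ((group_means - y.mean()) ** 2).sum() / (q - 1)
s2u_anova, s2e_anova = (msb - msw) / m, msw
print(s2u_em, s2u_anova, s2e_em, s2e_anova)
```

Each iteration here is cheap and stable, but with many random effects the inverse (or its trace) dominates the cost, which is the trade-off the guidelines weigh against AI REML and Gibbs sampling.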

16.
This study was designed to: (i) estimate genetic parameters and breeding values for conception rates (CR) using the repeatability threshold model (RP‐THM) and random regression threshold models (RR‐THM); and (ii) compare covariance functions for modeling the additive genetic (AG) and permanent environmental (PE) effects in the RR‐THM. CR was defined as the outcome of an insemination. A data set of 130 592 first‐lactation insemination records of 55 789 Thai dairy cows, calving between 1996 and 2011, was used in the analyses. All models included fixed effects of year × month of insemination, breed × day in milk to insemination class and age at calving. The random effects consisted of herd × year interaction, service sire, PE, AG and residual. Variance components were estimated using a Bayesian method via Gibbs sampling. Heritability estimates of CR ranged from 0.032 to 0.067, 0.037 to 0.165 and 0.045 to 0.218 for RR‐THM with second‐, third‐ and fourth‐order Legendre polynomials, respectively. The heritability estimated from RP‐THM was 0.056. Model comparisons based on goodness of fit, predictive ability, predicted service results of animals and the pattern of genetic parameter estimates indicated that the model that best fit the outcome of insemination was the RR‐THM with two regression coefficients.
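Random regression models commonly use normalized Legendre polynomials, φ_k(x) = √((2k+1)/2) P_k(x), with the covariate rescaled to [−1, 1]. The sketch assumes that "order" counts coefficients, so a second-order fit has two coefficients (intercept and linear term), consistent with the selected model above. The covariate values and range are hypothetical, because the abstract does not state them.

```python
import numpy as np
from numpy.polynomial import legendre


def normalized_legendre(x, n_coef):
    """Columns sqrt((2k+1)/2) * P_k(x), k = 0..n_coef-1, for x already scaled to [-1, 1]."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    cols = []
    for k in range(n_coef):
        c = np.zeros(k + 1)
        c[k] = 1.0                               # coefficient vector selecting P_k
        cols.append(np.sqrt((2 * k + 1) / 2.0) * legendre.legval(x, c))
    return np.column_stack(cols)


def to_unit_interval(t, t_min, t_max):
    """Rescale a covariate from [t_min, t_max] to [-1, 1]."""
    return -1.0 + 2.0 * (t - t_min) / (t_max - t_min)


# Hypothetical covariate values and range; n_coef=2 means two regression coefficients
x = to_unit_interval(np.array([30.0, 100.0, 250.0]), 20.0, 300.0)
Phi2 = normalized_legendre(x, n_coef=2)
print(np.round(Phi2, 3))
```

Each row of `Phi2` multiplies the animal's random regression coefficients, so higher orders add flexibility, and estimation noise, which may explain the wider heritability ranges reported for higher-order fits.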

17.
18.
Robust threshold models with multivariate Student's t or multivariate Slash link functions were employed to infer genetic parameters of clinical mastitis at different stages of lactation, with each cow defining a cluster of records. The robust fits were compared with that of a multivariate probit model via a pseudo‐Bayes factor and an analysis of residuals. Clinical mastitis records on 36 178 first‐lactation Norwegian Red cows from 5286 herds, daughters of 245 sires, were analysed. The opportunity-for-infection interval, from 30 days pre‐calving to 300 days postpartum, was divided into four periods: (i) −30 to 0 days pre‐calving; (ii) 1–30 days; (iii) 31–120 days; and (iv) 121–300 days of lactation. Within each period, absence or presence of clinical mastitis was scored as 0 or 1, respectively. Markov chain Monte Carlo methods were used to draw samples from the posterior distributions of interest. Pseudo‐Bayes factors strongly favoured the multivariate Slash and Student's t models over the probit model. The posterior mean of the degrees of freedom parameter for the Slash model was 2.2, indicating heavy tails of the liability distribution. The posterior mean of the degrees of freedom for the Student's t model was 8.5, also pointing away from a normal liability for clinical mastitis. A residual was defined as the observed phenotype (0 or 1) minus the posterior mean of the probability of mastitis. The Slash and Student's t models tended to have smaller residuals than the probit model in cows that contracted mastitis. Heritability of liability to clinical mastitis was 0.13–0.14 before calving and ranged from 0.05 to 0.08 after calving in the robust models. Genetic correlations were between 0.50 and 0.73, suggesting that clinical mastitis resistance is not the same trait across periods and corroborating earlier findings with probit models.
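Both robust links can be viewed as normal scale mixtures of the liability: Student's t as Z / √(W/ν) with W ~ χ²_ν, and Slash as Z / U^(1/ν) with U ~ Uniform(0, 1). The simulation below compares tail mass beyond an arbitrary cut-off of 3 for the probit (normal) liability and for the two robust links at the posterior-mean degrees of freedom reported above. A fixed cut-off mixes differences in scale with differences in tail shape, so this is only a rough illustration, not the study's model.

```python
import numpy as np


def student_t_liability(nu, size, rng):
    """Student's t draws as a normal scale mixture: Z / sqrt(W / nu), W ~ chi-square(nu)."""
    z = rng.standard_normal(size)
    w = rng.chisquare(nu, size)
    return z / np.sqrt(w / nu)


def slash_liability(nu, size, rng):
    """Slash draws as a normal scale mixture: Z / U**(1 / nu), U ~ Uniform(0, 1]."""
    z = rng.standard_normal(size)
    u = 1.0 - rng.uniform(size=size)          # in (0, 1], avoids division by zero
    return z / u ** (1.0 / nu)


rng = np.random.default_rng(8)
size, cut = 200_000, 3.0
tail_normal = np.mean(np.abs(rng.standard_normal(size)) > cut)           # probit liability
tail_t = np.mean(np.abs(student_t_liability(8.5, size, rng)) > cut)      # posterior mean nu = 8.5
tail_slash = np.mean(np.abs(slash_liability(2.2, size, rng)) > cut)      # posterior mean nu = 2.2
print(tail_normal, tail_t, tail_slash)
```

Heavier tails let extreme liabilities occur without forcing large shifts in genetic effects, one plausible reason the robust models gave smaller residuals for cows that contracted mastitis.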

19.
A multiple‐trait (MT) finite mixture random regression (MIX) model was applied using Bayesian methods to first‐lactation test‐day (TD) milk yield and somatic cell score (SCS) of Canadian Holsteins, allowing for heterogeneity of distributions with respect to days in milk (DIM). The assumption was that the associations between patterns of variation in these traits and mastitis would reveal the hidden structure in the data distribution arising from the unknown health status of cows. The MIX model assumed separate means and residual covariance structures for two components in four intervals of lactation, in addition to fitting the fixed effect of herd‐test‐day and fixed and random regressions with Legendre polynomials. Results indicated that the mixture model was superior to the standard MT model, as supported by the Bayes factor. Approximately 20% of TD records were classified as originating from cows with a putative subclinical form of mastitis. The proportion of records from mastitic cows was largest at the beginning of lactation. The MIX model exhibited different distributions of data from healthy and infected cows in different parts of lactation. Records from sick cows were characterized by larger (smaller) means for SCS (milk) and larger variances. Residual correlations, and daily genetic and environmental correlations, between milk and SCS were smaller from the MIX model than the MT estimates. Heritabilities of both traits differed significantly among estimates for records from healthy cows, records from sick cows and the MT model. Both models fitted milk records from healthy cows relatively well. The ability of the MT model to handle SCS records, measured by model residuals, was low, but it improved substantially when the data were allowed to be separated into two components in the MIX parameterization.
Correlations between estimated breeding values (EBV) for sires from both models were very high for cumulative milk yield (>0.99) and slightly lower (0.95 in the interval from 5 to 45 DIM) for daily SCS. EBV for SCS from MT and MIX models were weakly correlated with posterior probability of sub‐clinical mastitis on the phenotypic scale.
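Classifying a record as coming from a putatively mastitic cow rests on Bayes' rule for a two-component mixture: P(sick | y) = π f_sick(y) / (π f_sick(y) + (1 − π) f_healthy(y)). The univariate sketch below uses π = 0.2, echoing the roughly 20% of records classified above, and hypothetical normal components for SCS in which the sick component has the larger mean and variance. The study's model is multivariate with random regressions, which this does not attempt to reproduce.

```python
from statistics import NormalDist


def prob_mastitic(scs, pi_sick, healthy, sick):
    """Posterior probability that a record comes from the 'sick' component of a two-component mixture."""
    f_sick = pi_sick * sick.pdf(scs)
    f_healthy = (1.0 - pi_sick) * healthy.pdf(scs)
    return f_sick / (f_sick + f_healthy)


# Hypothetical component distributions for SCS in one interval of lactation
healthy = NormalDist(mu=2.0, sigma=1.0)
sick = NormalDist(mu=4.0, sigma=1.5)    # larger mean and variance, as reported for sick cows
post = {scs: prob_mastitic(scs, 0.2, healthy, sick) for scs in (1.0, 3.0, 6.0)}
print({scs: round(p, 3) for scs, p in post.items()})
```

Because the sick component has the larger variance, very low SCS values can also receive non-trivial posterior probability of coming from the sick component, a property of unequal-variance mixtures worth keeping in mind when interpreting such classifications.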

20.
A Bayesian method was developed to handle QTL analyses of data from multiple experiments on outbred populations with heterogeneity of variance between sexes for all random effects. The method employed a scaled reduced animal model with random polygenic and QTL allelic effects. A parsimonious model specification was applied by choosing assumptions regarding the covariance structure that limit the number of parameters to estimate. Markov chain Monte Carlo algorithms were applied to obtain marginal posterior densities. Simulation demonstrated that joint analysis of multiple environments is more powerful than separate single‐trait analyses of each environment. Measurements of broiler BW obtained from 2 experiments concerning growth efficiency and carcass traits were used to illustrate the method. The population consisted of 10 full-sib families from a cross between 2 broiler lines. Microsatellite genotypes were determined for generations 1 and 2, and phenotypes were collected on groups of generation 3 animals. The model included a polygenic correlation, which had a posterior mean of 0.70 in the analyses. The reanalysis supported the presence of a QTL in marker bracket MCW0058-LEI0071 accounting for 34% of the genetic variation in males and 24% in females in the growth efficiency experiment. In the carcass experiment, this QTL accounted for 19% of the genetic variation in males and 6% in females.
