Similar literature
 20 similar articles found (search time: 46 ms)
1.
A major transition has occurred in recent years in statistical methods for the analysis of linear mixed model data, from analysis of variance (ANOVA) to likelihood-based methods. Prior to the early 1990s, most applications used some version of analysis of variance because computer software for likelihood-based methods was either not available or not easy to use. ANOVA is based on ordinary least squares computations, with adaptations for mixed models. Computer programs for such methodology were plagued with technical problems of estimability, weighting, and handling missing data. Likelihood-based methods mainly use a combination of residual maximum likelihood (REML) estimation of covariance parameters and generalized least squares (GLS) estimation of mean parameters. Software for REML/GLS methods became readily available early in the 1990s, but the methodology is still not universally embraced. Although many of the computational inadequacies have been overcome, conceptual problems remain. Also, technical problems with REML/GLS have emerged, such as the need for adjustments for effects due to estimating covariance parameters. This article attempts to identify the major problems with ANOVA, describe the problems which remain with REML/GLS, and discuss new problems with REML/GLS.
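
As an illustration of the REML/GLS combination described in this abstract, the mean parameters are typically estimated by plugging the REML covariance estimate into the GLS formula. The notation below (y, X, β, V, θ) is assumed here for illustration and is not taken from the article.

```latex
% Linear mixed model with mean parameters \beta and covariance parameters \theta
y = X\beta + e, \qquad \operatorname{Var}(y) = V(\theta)

% GLS estimate of the mean parameters, with V evaluated at the REML estimate of \theta
\hat{\beta} = \left( X^{\top} \hat{V}^{-1} X \right)^{-1} X^{\top} \hat{V}^{-1} y,
\qquad \hat{V} = V(\hat{\theta}_{\mathrm{REML}})
```

Because the estimated covariance matrix is treated as known in this formula, standard errors of the mean parameters can be too small, which is one reading of the "adjustments for effects due to estimating covariance parameters" the abstract mentions.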

2.
Geostatistical estimates of a soil property by kriging are equivalent to the best linear unbiased predictions (BLUPs). Universal kriging is BLUP with a fixed‐effect model that is some linear function of spatial co‐ordinates, or more generally a linear function of some other secondary predictor variable when it is called kriging with external drift. A problem in universal kriging is to find a spatial variance model for the random variation, since empirical variograms estimated from the data by method‐of‐moments will be affected by both the random variation and that variation represented by the fixed effects. The geostatistical model of spatial variation is a special case of the linear mixed model where our data are modelled as the additive combination of fixed effects (e.g. the unknown mean, coefficients of a trend model), random effects (the spatially dependent random variation in the geostatistical context) and independent random error (nugget variation in geostatistics). Statisticians use residual maximum likelihood (REML) to estimate variance parameters, i.e. to obtain the variogram in a geostatistical context. REML estimates are consistent (they converge in probability to the parameters that are estimated) with less bias than both maximum likelihood estimates and method‐of‐moment estimates obtained from residuals of a fitted trend. If the estimate of the random effects variance model is inserted into the BLUP we have the empirical BLUP or E‐BLUP. Despite representing the state of the art for prediction from a linear mixed model in statistics, the REML–E‐BLUP has not been widely used in soil science, and in most studies reported in the soils literature the variogram is estimated with methods that are seriously biased if the fixed‐effect structure is more complex than just an unknown constant mean (ordinary kriging). In this paper we describe the REML–E‐BLUP and illustrate the method with some data on soil water content that exhibit a pronounced spatial trend.  

3.
Soil scientists often use prediction models to obtain values at unsampled locations. The spatial variation in the soil is best captured by using the empirical best linear unbiased predictor (EBLUP) based on a restricted maximum likelihood (REML) approach that efficiently exploits available data on both mean trends and correlation structures. We proposed a practical two‐step implementation of the REML approach for model‐based kriging, exemplified by predicting soil organic carbon (SOC) concentrations in mineral soils in Estonia from the large‐scale digital soil map information and a previously established prediction model. The prediction model was a linear mixed model with soil type, physical clay content (particle size < 0.01 mm) and A‐horizon thickness as fixed effects and site, transect, plot, year, year‐transect random intercepts and site‐specific random slopes for clay content. We used only the site‐specific intercept EBLUPs for estimating spatial correlation parameters as they described most of the variation in the random effects (86.8%). Fitting an exponential correlation model to these EBLUPs resulted in an estimated range of 10.5 km and the estimated proportion of the variance from the nugget effect was 0.23. The results of a simulation study showed a downwards bias that decreased with sample size. The results were validated through an external dataset, resulting in root mean square errors (RMSE) of 1.06 and 1.07% for the two‐step approach for kriging and the model with only fixed effects (no kriging), respectively. These results indicate that using the two‐step approach for kriging may improve prediction.  相似文献   

4.
Spatially nested sampling and the associated nested analysis of variance by spatial scale is a well-established methodology for the exploratory investigation of soil variation over multiple, disparate scales. The variance components that can be estimated this way can be accumulated to approximate the variogram. This allows us to identify the important scales of variation, and the general form of the spatial dependence, in order to plan more detailed sampling by design-based or model-based methods. Implicit in the standard analyses of nested sample data is the assumption of homogeneity in the variance, i.e. that all variations from sub-station means at some scale represent a random variable of uniform variance. If this assumption fails then the comparable assumption of stationarity in the variance, which is an important assumption in geostatistics, will also be implausible. However, data from nested sampling may be analysed with a linear mixed model in which the variance components are parameters which can be estimated by residual maximum likelihood (REML). Within this framework it is possible to propose an alternative variance parameterization in which the variance depends on some auxiliary variable, and so is not generally homogeneous. In this paper we demonstrate this approach, using data from nested sampling of chemical and biogeochemical soil properties across a region in central England, and use land use as our auxiliary variable to model non-homogeneous variance components. We show how the REML analysis allows us to make inferences about the need for a non-homogeneous model. Variances of soil pH and cation exchange capacity at different scales differ between these land uses, but a homogeneous variance model is preferable to such non-homogeneous models for the variance of soil urease activity at standard concentrations of urea.  相似文献   

5.
Variograms of soil properties are usually obtained by estimating the variogram for distinct lag classes by the method‐of‐moments and fitting an appropriate model to the estimates. An alternative is to fit a model by maximum likelihood to data on the assumption that they are a realization of a multivariate Gaussian process. This paper compares the two using both simulation and real data. The method‐of‐moments and maximum likelihood were used to estimate the variograms of data simulated from stationary Gaussian processes. In one example, where the simulated field was sampled at different intensities, maximum likelihood estimation was consistently more efficient than the method‐of‐moments, but this result was not general and the relative performance of the methods depends on the form of the variogram. Where the nugget variance was relatively small and the correlation range of the data was large the method‐of‐moments was at an advantage and likewise in the presence of data from a contaminating distribution. When fields were simulated with positive skew this affected the results of both the method‐of‐moments and maximum likelihood. The two methods were used to estimate variograms from actual metal concentrations in topsoil in the Swiss Jura, and the variograms were used for kriging. Both estimators were susceptible to sampling problems which resulted in over‐ or underestimation of the variance of three of the metals by kriging. For four other metals the results for kriging using the variogram obtained by maximum likelihood were consistently closer to the theoretical expectation than the results for kriging with the variogram obtained by the method‐of‐moments, although the differences between the results using the two approaches were not significantly different from each other or from expectation. Soil scientists should use both procedures in their analysis and compare the results.  相似文献   
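
The method-of-moments estimator mentioned in this abstract can be sketched for the simplest case of regularly spaced observations on a transect. This is a minimal illustration, assuming unit spacing and integer lags; the function name `mom_variogram` is hypothetical and the code is not the procedure used in the paper.

```python
def mom_variogram(z, max_lag):
    """Method-of-moments semivariance for lags 1..max_lag on a regular transect.

    For each lag h, half the mean squared difference between all pairs of
    observations separated by h sampling intervals is returned.
    """
    gamma = {}
    for h in range(1, max_lag + 1):
        sq_diffs = [(z[i + h] - z[i]) ** 2 for i in range(len(z) - h)]
        if not sq_diffs:
            raise ValueError("max_lag must be smaller than the number of observations")
        gamma[h] = sum(sq_diffs) / (2 * len(sq_diffs))
    return gamma


if __name__ == "__main__":
    # A short transect with a steady trend: the semivariance grows with lag.
    print(mom_variogram([1.0, 2.0, 3.0, 4.0], max_lag=2))
```

A variogram model would then be fitted to these lag-class estimates, whereas the maximum likelihood alternative fits the model parameters to the data directly.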

6.
In a spatial regression context, scientists are often interested in a physical interpretation of components of the parametric covariance function. For example, spatial covariance parameter estimates in ecological settings have been interpreted to describe spatial heterogeneity or “patchiness” in a landscape that cannot be explained by measured covariates. In this article, we investigate the influence of the strength of spatial dependence on maximum likelihood (ML) and restricted maximum likelihood (REML) estimates of covariance parameters in an exponential-with-nugget model, and we also examine these influences under different sampling designs—specifically, lattice designs and more realistic random and cluster designs—at differing intensities of sampling (n=144 and 361). We find that neither ML nor REML estimates perform well when the range parameter and/or the nugget-to-sill ratio is large—ML tends to underestimate the autocorrelation function and REML produces highly variable estimates of the autocorrelation function. The best estimates of both the covariance parameters and the autocorrelation function come under the cluster sampling design and large sample sizes. As a motivating example, we consider a spatial model for stream sulfate concentration.  相似文献   
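
To make the exponential-with-nugget model in this abstract concrete, the sketch below evaluates one common parameterization of the covariance and autocorrelation functions. The parameter names (`nugget`, `partial_sill`, `range_param`) and the form exp(-h / range) are assumptions; other texts scale the range differently, and the article may do so too.

```python
import math


def exp_nugget_covariance(h, nugget, partial_sill, range_param):
    """Covariance at separation distance h under an exponential-with-nugget model.

    At h == 0 the covariance equals the sill (nugget + partial sill); for h > 0
    only the spatially correlated part contributes and it decays exponentially.
    """
    if h < 0:
        raise ValueError("distance must be non-negative")
    if h == 0:
        return nugget + partial_sill
    return partial_sill * math.exp(-h / range_param)


def autocorrelation(h, nugget, partial_sill, range_param):
    """Autocorrelation: covariance at distance h divided by the sill."""
    sill = nugget + partial_sill
    return exp_nugget_covariance(h, nugget, partial_sill, range_param) / sill


if __name__ == "__main__":
    # Nugget-to-sill ratio of 0.25: the correlation can never exceed 0.75 for h > 0.
    print(autocorrelation(2.0, nugget=1.0, partial_sill=3.0, range_param=2.0))
```

Under this parameterization a large nugget-to-sill ratio caps the autocorrelation at short distances, which gives some intuition for why estimation becomes harder in that setting.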

7.
A gene-by-gene mixed model analysis is a useful statistical method for assessing significance for microarray gene differential expression. While a large amount of data on thousands of genes are collected in a microarray experiment, the sample size for each gene is usually small, which could limit the statistical power of this analysis. In this report, we introduce an empirical Bayes (EB) approach for general variance component models applied to microarray data. Within a linear mixed model framework, the restricted maximum likelihood (REML) estimates of variance components of each gene are adjusted by integrating information on variance components estimated from all genes. The approach starts with a series of single-gene analyses. The estimated variance components from each gene are transformed to the “ANOVA components”. This transformation makes it possible to independently estimate the marginal distribution of each “ANOVA component.” The modes of the posterior distributions are estimated and inversely transformed to compute the posterior estimates of the variance components. The EB statistic is constructed by replacing the REML variance estimates with the EB variance estimates in the usual t statistic. The EB approach is illustrated with a real data example which compares the effects of five different genotypes of male flies on post-mating gene expression in female flies. In a simulation study, the ROC curves are applied to compare the EB statistic and two other statistics. The EB statistic was found to be the most powerful of the three. Though the null distribution of the EB statistic is unknown, a t distribution may be used to provide conservative control of the false positive rate.  相似文献   

8.
Kriging is a standard tool in the environmental sciences for spatial prediction from limited sample data, subject to the assumption of intrinsic stationarity, made about the underlying spatially correlated random function. It is generally well understood how the assumption of stationarity in the mean can be relaxed within the linear mixed model framework, using residual maximum likelihood to estimate variance parameters for the random effects. The Best Linear Unbiased Predictor (BLUP) is equivalent to the kriging predictor in these circumstances. However, nonstationarity in the variance is a harder problem to solve. Stationarity assumptions are necessary if the spatial covariance of a random process is to be estimated from the single realization which nature provides. However, they are not always plausible for variables arising from processes in complex landscapes across contrasting topography and geology.  相似文献   

9.
R. Kerry, M.A. Oliver. Geoderma, 2007, 140(4): 383-396
It has been generally accepted that the method of moments (MoM) variogram, which has been widely applied in soil science, requires about 100 sites at an appropriate interval apart to describe the variation adequately. This sample size is often larger than can be afforded for soil surveys of agricultural fields or contaminated sites. Furthermore, it might be a much larger sample size than is needed where the scale of variation is large. A possible alternative in such situations is the residual maximum likelihood (REML) variogram because fewer data appear to be required. The REML method is parametric and is considered reliable where there is trend in the data because it is based on generalized increments that filter trend out and only the covariance parameters are estimated. Previous research has suggested that fewer data are needed to compute a reliable variogram using a maximum likelihood approach such as REML; however, the results can vary according to the nature of the spatial variation. There remain issues to examine: how many fewer data can be used, how should the sampling sites be distributed over the site of interest, and how do different degrees of spatial variation affect the data requirements? The soil of four field sites of different size, physiography, parent material and soil type was sampled intensively, and MoM and REML variograms were calculated for clay content. The data were then sub-sampled to give different sample sizes and distributions of sites and the variograms were computed again. The model parameters for the sets of variograms for each site were used for cross-validation. Predictions based on REML variograms were generally more accurate than those from MoM variograms with fewer than 100 sampling sites. A sample size of around 50 sites at an appropriate distance apart, possibly determined from variograms of ancillary data, appears adequate to compute REML variograms for kriging soil properties for precision agriculture and contaminated sites.

10.
This paper provides a framework for estimating the effective sample size in a spatial regression model context when the data have been sampled using a line transect scheme and there is an evident serial correlation due to the chronological order in which the observations were collected. We propose a linear regression model with a partially linear covariance structure to address the computation of the effective sample size when spatial and serial correlations are present. A recursive algorithm is described to separately estimate the linear and nonlinear parameters involved in the covariance structure. The kriging equations are also presented to explore the kriging variance between our proposal and a typical spatial regression model. An application in the context of marine macroalgae, which motivated the present work, is also presented.  相似文献   

11.
The Bayesian maximum entropy (BME) method is a valuable tool, with rigorous theoretical underpinnings, with which to predict with soft (imprecise) data. The methodology uses a general knowledge base to derive a joint prior distribution of the data and the prediction by the criterion of maximum entropy; the hard (precise) and soft data are then processed using this prior distribution to yield a posterior distribution that provides the BME prediction. The general knowledge base commonly consists of the mean and covariance functions, which may be extracted from the data. The common method for extracting the mean function from the data is a generalized least squares (GLS) approach. However, when the soft data take the form of intervals of plausible values, this method can result in errors in the BME predictions. This paper suggests a maximum likelihood (ML) method for fitting the local mean. The two methods are compared in terms of their predictions, firstly on simulated random fields and then on a case study to predict the depth of soil using some censored data. The results show that the ML method can result in more accurate BME predictions; the degree of improvement over the GLS method depends on the parameters of the spatial covariance model.  相似文献   

12.
Heritability quantifies the extent to which a physical characteristic is passed from one generation to the next. From a statistical perspective, heritability is the proportion of the phenotypic variance attributable to (additive) genetic effects and is equal to a function of variance components in linear mixed models. Relying on normal distribution assumptions, one can compute exact confidence intervals for heritability using a pivotal quantity procedure. Alternatively, large-sample properties of the restricted maximum likelihood (REML) estimator can be used to construct asymptotic confidence intervals for heritability. Exact and asymptotic intervals are compared using loineye muscle area measurements and balanced one-way random effects models having groups of correlated responses. In some cases the two interval methods yield vastly different results and the REML-based confidence interval does not maintain the nominal coverage value even for seemingly large sample sizes. For finite sample size applications, the validity of the REML-based procedure depends on the correlation structure of the data.
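
The definition of heritability as a function of variance components can be written out for the simplest case. The symbols below are assumed for illustration (the abstract does not give notation), and the decomposition with only two components is a simplifying assumption; the exact function depends on the model and breeding design.

```latex
% Narrow-sense heritability: share of phenotypic variance due to additive genetic effects
h^{2} = \frac{\sigma_{a}^{2}}{\sigma_{p}^{2}},
\qquad \sigma_{p}^{2} = \sigma_{a}^{2} + \sigma_{e}^{2}
```

Here \sigma_{a}^{2} denotes the additive genetic variance, \sigma_{e}^{2} the residual variance and \sigma_{p}^{2} the phenotypic variance. A confidence interval for h^{2} is then an interval for a ratio of variance components, which is why both the exact (pivotal quantity) and asymptotic (REML-based) constructions apply.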

13.
If we wish to describe the coregionalization of two or more soil properties for estimation by cokriging then we must estimate and model their auto‐ and cross‐variogram(s). The conventional estimates of these variograms, obtained by the method‐of‐moments, are unduly affected by outlying data which inflate the variograms and so also the estimates of the error variance of cokriging predictions. Robust estimators are less affected. Robust estimators of the auto‐variogram and the pseudo cross‐variogram have previously been proposed and used successfully, but the multivariate problem of estimating the cross‐variogram robustly has not yet been tackled. Two robust estimators of the cross‐variogram are proposed. These use covariance estimators with good robustness properties. The robust estimators of the cross‐variogram proved more resistant to outliers than did the method‐of‐moments estimator when applied to simulated fields which were then contaminated. Organic carbon and water content of the soil were measured at 256 sites on a transect and the method‐of‐moments estimator, and the two robust estimators, were used to estimate the auto‐variograms and cross‐variogram from a prediction subset of 156 sites. The data on organic carbon included a few outliers. The method‐of‐moments estimator returned larger values of the auto‐ and cross‐variograms than did either robust estimator. The organic carbon content at the 100 validation sites on the transect was estimated by cokriging from the prediction data plus a set of variograms fitted to the method‐of‐moments estimates and two sets of variograms fitted to the robust estimates. The ratio of the actual squared prediction error to the cokriging estimate of the error variance was computed at each validation site. These results showed that cokriging using variograms obtained by the method‐of‐moments estimator overestimated the error variance of the predictions. By contrast, cokriging with the robustly estimated variograms gave reliable estimates of the error variance of the predictions.

14.
The value of nested sampling for exploring the spatial structure of univariate variation of the soil has been demonstrated in several studies and applied to practical problems. This paper shows how the method can be extended to the multivariate case. While the extension is simple in theory, in practice the direct estimation of covariance components by equating mean‐square matrices with their expectation will often lead to estimates that are not positive semidefinite. This paper discusses solutions to this problem for balanced and unbalanced sample designs. In the balanced case there is a residual maximum likelihood (REML) estimator that will find estimates of covariance components that maximize an overall likelihood on the condition that all components are positive semidefinite (p.s.d.). This is possible because the condition is met if the differences of successive mean‐square matrices are positive semidefinite, and this constraint can be incorporated into an algorithm. This does not hold for unbalanced designs. In this paper the problem was solved for unbalanced designs by scaling covariance components that were not p.s.d. to the nearest p.s.d. matrix according to a Euclidean distance. These methods were applied to data from three surveys, two with balanced and one with unbalanced sampling. Different patterns of scale‐dependence of the correlation of soil properties were found. For example, at Ginninderra Experimental Station in Australia the soil water content and bulk density were correlated significantly, with the correlation increasing with distance to 56 m, but at longer distances the properties were not significantly correlated. By contrast, the pH of the soil and the available P content showed correlation that increased with distance. The implications of these results for planning more detailed sampling, both for prediction and for investigation of processes, are discussed.  相似文献   
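
The idea of moving a non-p.s.d. covariance component to the nearest p.s.d. matrix in Euclidean distance can be sketched with the standard eigenvalue-clipping construction: negative eigenvalues of the symmetric matrix are set to zero and the matrix is rebuilt. The abstract does not say which algorithm the authors used, so this is an assumed technique, shown for a 2 × 2 matrix only so that it stays self-contained; the function name `nearest_psd_2x2` is hypothetical.

```python
import math


def nearest_psd_2x2(a, b, d):
    """Nearest p.s.d. matrix (Euclidean/Frobenius distance) to [[a, b], [b, d]].

    The symmetric matrix is decomposed into eigenvalues and an orthonormal
    eigenvector basis, negative eigenvalues are clipped to zero, and the
    matrix is reassembled from the clipped spectrum.
    """
    mean = (a + d) / 2.0
    radius = math.hypot((a - d) / 2.0, b)
    lam_max, lam_min = mean + radius, mean - radius

    # Eigenvector of lam_max is (cos t, sin t); that of lam_min is (-sin t, cos t).
    t = 0.5 * math.atan2(2.0 * b, a - d)
    c, s = math.cos(t), math.sin(t)

    # Clip the spectrum at zero, then rebuild the matrix.
    l1, l2 = max(lam_max, 0.0), max(lam_min, 0.0)
    off_diagonal = (l1 - l2) * c * s
    return [
        [l1 * c * c + l2 * s * s, off_diagonal],
        [off_diagonal, l1 * s * s + l2 * c * c],
    ]


if __name__ == "__main__":
    # [[1, 2], [2, 1]] has eigenvalues 3 and -1, so it is not p.s.d.
    print(nearest_psd_2x2(1.0, 2.0, 1.0))
```

For matrices with more than two variables the same clipping would normally be done with a numerical eigendecomposition routine rather than a closed form.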

15.
Bayesian methods seem well adapted to dynamic system models in general and to crop models in particular, because there is in general prior information about parameter values. The usefulness of a Bayesian approach has often been pointed out, but actual applications are rather rare. A major difficulty is including the elements of the covariance matrix of model errors in the treatment. We treat the specific case of balanced data and an unstructured covariance matrix. In our particular case this is a 3 × 3 matrix. We illustrate two methods for deriving a sample from the joint posterior density for the crop model parameters and the error covariance matrix parameters. The first method is based on importance sampling, the second on Metropolis within Gibbs sampling. We derive an instrumental density for the former and a proposal density for the latter which are adapted to this type of model and data. Both algorithms work well and they give very similar results. The example concerns a model for sunflowers during rapid leaf growth. The ultimate goal is to use the model as a decision aid in predicting disease risk.  相似文献   

16.
The Matérn correlation function provides great flexibility for modeling spatially correlated random processes in two dimensions, in particular via a smoothness parameter, whose estimation allows data to determine the degree of smoothness of a spatial process. The extension to include anisotropy provides a very general and flexible class of spatial covariance functions that can be used in a model-based approach to geostatistics, in which parameter estimation is achieved via REML and prediction is within the E-BLUP framework. In this article we develop a general class of linear mixed models using an anisotropic Matérn class with an extended metric. The approach is illustrated by application to soil salinity data in a rice-growing field in Australia, and to fine-scale soil pH data. It is found that anisotropy is an important aspect of both datasets, emphasizing the value of a straightforward and accessible approach to modeling anisotropy.  相似文献   

17.
The AMMI/GGE model can be used to describe a two-way table of genotype–environment means. When the genotype–environment means are independent and homoscedastic, ordinary least squares (OLS) gives optimal estimates of the model. In plant breeding, the assumption of independence and homoscedasticity of the genotype–environment means is frequently violated, however, such that generalized least squares (GLS) estimation is more appropriate. This paper introduces three different GLS algorithms that use a weighting matrix to take the correlation between the genotype–environment means as well as heteroscedasticity into account. To investigate the effectiveness of the GLS estimation, the proposed algorithms were implemented using three different weighting matrices, including (i) an identity matrix (OLS estimation), (ii) an approximation of the complete inverse covariance matrix of the genotype–environment means, and (iii) the complete inverse covariance matrix of the genotype–environment means. Using simulated data modeled on real experiments, the different weighting methods were compared in terms of the mean-squared error of the genotype–environment means, interaction effects, and singular vectors. The results show that weighted estimation generally outperformed unweighted estimation in terms of the mean-squared error. Furthermore, the effectiveness of the weighted estimation increased when the heterogeneity of the variances of the genotype–environment means increased.  相似文献   

18.
The cation exchange capacity (CEC) of soil is widely used for agricultural assessment as a measure of fertility and an indicator of structural stability; however, its measurement is time‐consuming. Although geostatistical methods have been used, a large number of samples must be collected. Using pedometric methods and incorporating easy‐to‐measure ancillary data into models have improved the efficiency of spatial prediction of soil CEC. However, mapping uncertainty has not been evaluated. In this study, we use an error budget procedure to quantify the relative contributions that model, input and covariate error make to prediction error of a digital map of CEC using gamma‐ray (γ‐ray) spectrometry and apparent electrical conductivity (ECa) data. The error budget uses empirical best linear unbiased prediction (E‐BLUP) and conditional simulation to produce numerous realizations of the data and their underlying errors. Linear mixed models (LMMs) estimated by residual maximum likelihood (REML) are used to create the prediction models. The combined error of model [5.07 cmol(+)/kg] and input error [12.88 cmol(+)/kg] is ~12.93 cmol(+)/kg, which is twice as large as the standard deviation of CEC [6.8 cmol(+)/kg]. The individual covariate errors caused by the γ‐ray [9.64 cmol(+)/kg] and EM error [8.55 cmol(+)/kg] were large. Preprocessing techniques to improve the quality of the γ‐ray data could be considered, whereas the EM error could be reduced by the use of a smaller sampling interval in particular near the edges of the study area and at pedoderm boundaries.  相似文献   

19.
As environmental monitoring data are collected successively in time, the data are suitable for sequential analysis. An earlier article proposed a refined sequential probability ratio test (SPRT) to test against a minimal relevant trend, assuming no serial correlations and without modeling the spatial covariance matrix. As the model parameters are unknown in advance, a minimal number of observations (n_min) is required for estimation prior to analysis. Leaving the spatial covariance matrix unstructured, n_min increases if the number of sampling locations increases. Therefore, assumptions on the spatial covariance matrix are proposed, thereby reducing the number of nuisance parameters, thus reducing n_min. This article studies three simple types of spatial covariance matrix structures and derives an adjusted SPRT for each of these types. Furthermore, we examine the robustness against deviations from the assumed spatial covariance matrix structure. Simulation studies show that adjusted SPRTs can be derived rather easily and that they are in general robust against deviations from the assumed type of spatial covariance matrix. Sequential analysis of simulated data, which are based on monitoring data of bats in the Netherlands, illustrates the use of one of the derived SPRTs.

20.
Exact confidence intervals for variance components in linear mixed models rely heavily on normal distribution assumptions. If the random effects in the model are not normally distributed, then the true coverage probabilities of these conventional intervals may be erratic. In this paper we examine the performance of nonparametric bootstrap confidence intervals based on restricted maximum likelihood (REML) estimators. Asymptotic theory suggests that these intervals will achieve the nominal coverage value as the sample size increases. Incorporating a small-sample adjustment term in the bootstrap confidence interval construction process improves the performance of these intervals for small to intermediate sample sizes. Simulation studies suggest that the bootstrap standard method (with a transformation) and the bootstrap bias-corrected and accelerated (BCa) method produce confidence intervals that have good coverage probabilities under a variety of distribution assumptions. For an interlaboratory comparison of mercury concentration in oyster tissue, a balanced one-way random effects model is used to quantify the proportion of the variation in mercury concentration that can be attributed to the laboratories. In this application the exact confidence interval using normal distribution theory produces misleading results and inferences based on nonparametric bootstrap procedures are more appropriate.
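
The quantity of interest in the interlaboratory example, the proportion of variation attributable to groups in a balanced one-way random effects model, can be sketched as follows. The sketch uses the ANOVA estimators truncated at zero, which coincide with the REML estimates for balanced data; the function names and the toy data are hypothetical, and the bootstrap step discussed in the abstract is not shown.

```python
def one_way_variance_components(groups):
    """Between-group and within-group variance estimates for balanced data.

    `groups` is a list of equally sized lists, one per group (e.g. laboratory).
    The between-group estimate is truncated at zero when the between-group
    mean square falls below the within-group mean square.
    """
    k = len(groups)
    n = len(groups[0])
    if any(len(g) != n for g in groups):
        raise ValueError("balanced data required: all groups must have equal size")

    means = [sum(g) / n for g in groups]
    grand_mean = sum(means) / k

    ms_between = n * sum((m - grand_mean) ** 2 for m in means) / (k - 1)
    ms_within = sum(
        (x - m) ** 2 for g, m in zip(groups, means) for x in g
    ) / (k * (n - 1))

    var_between = max((ms_between - ms_within) / n, 0.0)
    return var_between, ms_within


def proportion_between(groups):
    """Share of the total variance attributed to differences between groups."""
    var_between, var_within = one_way_variance_components(groups)
    return var_between / (var_between + var_within)


if __name__ == "__main__":
    toy_labs = [[1.0, 3.0], [5.0, 7.0]]  # two hypothetical labs, two replicates each
    print(proportion_between(toy_labs))
```

A nonparametric bootstrap interval would resample the data, recompute this proportion for each resample, and summarize the resulting distribution.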
