Beyond ‘lognormal versus gamma’: discrimination among error distributions for generalized linear models |
| |
Authors: | EJ Dick |
| |
Institution: | (a) National Marine Fisheries Service, Southwest Fisheries Science Center, 110 Shaffer Road, Santa Cruz, CA 95060, USA; (b) Department of Applied Mathematics and Statistics, Center for Stock Assessment Research, Jack Baskin School of Engineering, University of California, Santa Cruz, CA 95064, USA |
| |
Abstract: | The process of model selection includes making an assumption about the distribution of ‘errors’ about the mean response. Generalized linear models (GLMs) offer considerable flexibility in this regard. However, graphical methods for identifying potential error distributions can fail to discriminate among sets of candidate error distributions. I examine an information-theoretic approach to this issue, which ranks candidate models (error distributions) using Akaike's information criterion (AIC). I evaluate the effectiveness of this technique using Monte Carlo simulation by generating pseudorandom data from five skewed distributions: lognormal, gamma, Weibull, log-logistic, and inverse Gaussian. I then fit each data set under all five distributional assumptions, and examine how well AIC identifies the distribution that generated the data. On the basis of the simulations, I suggest that AIC is effective at identifying the data-generating distribution, given moderate to large sample sizes. I then fit four candidate models to data drawn from a mixture of four distributions with common expectations and coefficients of variation (CVs). AIC did not show strong support for a particular candidate model given small samples of ‘mixed’ data, although larger samples selected the gamma distribution for CVs of 0.5 and 1.0, and the Weibull distribution for CVs of 1.5 and 2.0. Finally, I apply this technique in a GLM setting to several fisheries-independent and -dependent data sets to select the error distribution that is best supported by the data. Twenty-one of the 24 fisheries data sets examined showed strong support for one of the five candidate error distributions, and the remaining three showed moderate support for two. |
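The simulate, fit, and rank procedure described in the abstract can be sketched roughly as follows. This is only an illustration under stated assumptions, not the author's implementation: it assumes Python with NumPy and SciPy (the abstract does not name any software), it fits each distribution to a single sample with no covariates rather than in a full GLM, and the gamma parameters, sample size, seed, and the function names `aic_table` and `akaike_weights` are hypothetical choices made for the example.

```python
import numpy as np
from scipy import stats

# Candidate error distributions named in the abstract, mapped to SciPy
# families (fisk is SciPy's name for the log-logistic distribution).
CANDIDATES = {
    "lognormal": stats.lognorm,
    "gamma": stats.gamma,
    "Weibull": stats.weibull_min,
    "log-logistic": stats.fisk,
    "inverse Gaussian": stats.invgauss,
}


def aic_table(y):
    """Fit every candidate to positive data y by maximum likelihood
    and return a dict mapping distribution name to AIC."""
    table = {}
    for name, dist in CANDIDATES.items():
        # Assumption: location fixed at zero, so each model has two
        # free parameters (one shape, one scale).
        params = dist.fit(y, floc=0)
        loglik = np.sum(dist.logpdf(y, *params))
        n_free = 2
        table[name] = 2 * n_free - 2 * loglik
    return table


def akaike_weights(table):
    """Convert AIC values to Akaike weights (relative support)."""
    names = list(table)
    values = np.array([table[n] for n in names])
    rel = np.exp(-0.5 * (values - values.min()))
    return dict(zip(names, rel / rel.sum()))


# Illustrative data: gamma with mean 1 and CV 0.5
# (shape = 1 / CV**2 = 4, scale = mean / shape = 0.25).
rng = np.random.default_rng(1)
y = rng.gamma(shape=4.0, scale=0.25, size=500)

aic = aic_table(y)
weights = akaike_weights(aic)

# Print candidates from best supported (lowest AIC) to least.
for name in sorted(aic, key=aic.get):
    print(f"{name:18s} AIC = {aic[name]:9.2f}  weight = {weights[name]:.3f}")
```

Repeating the simulation over many pseudorandom data sets and several sample sizes, and counting how often the lowest AIC belongs to the generating distribution, would approximate the kind of Monte Carlo evaluation the abstract describes.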
| |
Keywords: | Generalized linear model; Likelihood function; Akaike's information criterion; Lognormal, gamma, Weibull, log-logistic, and inverse Gaussian distributions |
This article is indexed in ScienceDirect and other databases. |
|