The API is consistent with the GLM package. To fit a robust model, use the function `rlm(formula, data, estimator; initial_scale=:mad)`, where:
- `formula`: uses column symbols from the `DataFrame` `data`; for example, if `propertynames(data) == [:Y, :X1, :X2]`, then a valid formula is `@formula(Y ~ X1 + X2)`. An intercept is included by default.
- `data`: a `DataFrame`, which may contain missing values.
- `estimator`: a robust estimator, constructed from an estimator type and a loss function (described below).
Supported loss functions are:
- `ρ(r) = ½ r²`: the least-squares (L2) loss, like ordinary OLS.
- `ρ(r) = |r|`: the least absolute deviations (L1) loss, a non-differentiable estimator. Prefer the Huber loss instead.
- `ρ(r) = if (|r|<c); ½(r/c)² else |r|/c - ½ end`: the Huber loss, a convex estimator that behaves like an L2 cost for small residuals and like an L1 cost for large residuals and outliers.
- `ρ(r) = √(1 + (r/c)²) - 1`: a smooth version of the Huber loss.
- `ρ(r) = |r|/c - log(1 + |r|/c)`: a smooth version of the Huber loss.
- `ρ(r) = log(cosh(r/c))`: a smooth version of the Huber loss.
- `ρ(r) = r/c * atan(r/c) - ½ log(1 + (r/c)²)`: a smooth version of the Huber loss.
- `ρ(r) = log(1 + (r/c)²)`: a non-convex estimator that corresponds to a Student's-t distribution (with fixed degrees of freedom). It suppresses outliers more strongly, but it is not guaranteed to converge.
- `ρ(r) = ½ (r/c)²/(1 + (r/c)²)`: a non-convex and bounded estimator; it suppresses outliers more strongly.
- `ρ(r) = ½ (1 - exp(-(r/c)²))`: a non-convex and bounded estimator; it suppresses outliers more strongly.
- `ρ(r) = if (|r|<c); ⅙(1 - (1 - (r/c)²)³) else ⅙ end`: a non-convex and bounded estimator; it suppresses outliers more strongly and is the preferred estimator for most cases.
- `ρ(r)` is quadratic for `r/c < 2/3` and is bounded by 1: a non-convex estimator, optimized to have the lowest bias for a given efficiency.
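To make the piecewise definitions concrete, here is a small plain-Julia sketch of the Huber and Tukey (biweight) ρ functions; the function names are illustrative and not the package's internal API:

```julia
# Huber loss: quadratic (L2-like) below the threshold c, linear (L1-like) above it.
huber_rho(r, c) = abs(r) < c ? (r / c)^2 / 2 : abs(r) / c - 1 / 2

# Tukey (biweight) loss: non-convex and bounded; constant (1/6) for |r| >= c,
# so gross outliers stop contributing to the fit.
tukey_rho(r, c) = abs(r) < c ? (1 - (1 - (r / c)^2)^3) / 6 : 1 / 6

huber_rho(0.5, 1.345)   # quadratic regime for a small residual
tukey_rho(10.0, 4.685)  # saturates at 1/6 for a gross outlier
```

The tuning constants 1.345 (Huber) and 4.685 (Tukey) are the usual choices giving 95% asymptotic efficiency for Gaussian errors.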
An estimator is constructed from an estimator type and a loss, e.g. `MEstimator{TukeyLoss}()`. For a `GeneralizedQuantileEstimator`, the quantile should be specified with `τ` (0.5 by default), e.g. `GeneralizedQuantileEstimator{HuberLoss}(0.2)`.
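Putting the pieces together, a minimal fit might look like the following sketch, assuming the RobustModels and DataFrames packages are loaded; `MEstimator{TukeyLoss}()` pairs the plain M-estimator type with the Tukey loss described above:

```julia
using DataFrames, RobustModels

# Synthetic data: Y = 1 + 2*X1, with one gross outlier.
data = DataFrame(X1 = 1.0:10.0)
data.Y = 1.0 .+ 2.0 .* data.X1
data.Y[10] = 100.0  # corrupt one observation

# Fit with the preferred Tukey loss; the scale is initialized from the
# median absolute deviation (MAD) of the residuals.
m = rlm(@formula(Y ~ X1), data, MEstimator{TukeyLoss}(); initial_scale=:mad)
coef(m)  # robust estimates, close to [1.0, 2.0] despite the outlier
```

An OLS fit on the same data would be pulled strongly toward the corrupted point; the bounded Tukey loss essentially ignores it.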
Many of the methods are consistent with the GLM package:
- `nobs`: number of observations.
- `dof_residual`: degrees of freedom for residuals.
- `dof`: degrees of freedom of the model, defined by `nobs(m) - dof_residual(m)`.
- `coef`: estimate of the coefficients in the model.
- `predict`: obtain predicted values of the dependent variable from the fitted model.
- `deviance`, `nulldeviance`: measure of the fit of the model (null model, respectively).
- `stderror`: standard errors of the coefficients.
- `confint`: confidence intervals for the fitted coefficients.
- `scale`: the scale estimate from the model.
- `workingweights`: the weights for each observation from the robust estimate; outliers have low weights.
- `leverage`: the vector of leverage scores for each observation.
- `vcov`: estimated variance-covariance matrix of the coefficient estimates.
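As a sketch of how these accessors combine in practice, the robust working weights can be used to flag outliers; the helper name and the 0.1 cutoff below are illustrative choices, not part of the package:

```julia
using RobustModels

# Flag observations that the robust fit strongly downweights.
# `m` should be a model returned by `rlm`; outliers receive
# working weights near zero, inliers weights near one.
flag_outliers(m; cutoff = 0.1) = findall(workingweights(m) .< cutoff)
```

On the Tukey fit sketched earlier, such a helper would return the indices of the corrupted observations.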
Building upon GLM's separation of the response and predictor objects, this package implements a new `RobustLinResp` object to compute the residuals. There are currently two available predictor objects: `SparsePredChol` (imported from GLM) and `SparsePredCG`, which uses the iterative conjugate gradient method (`lsqr!` from the IterativeSolvers package) and is faster and more accurate than the Cholesky method for very large matrices. The predictor that is used depends on the model matrix type and the
`method` argument of the