Manual

Installation

The package can be installed with the Julia package manager. From the Julia REPL, type ] to enter the Pkg REPL mode and run:

pkg> add RobustModels

Or, equivalently, via the Pkg API:

julia> import Pkg; Pkg.add("RobustModels")

Fitting robust models

The API is consistent with the GLM package. To fit a robust model, call rlm(formula, data, estimator; initial_scale=:mad), where:

formula: uses column symbols from the DataFrame data. For example, if propertynames(data) = [:Y, :X1, :X2], then a valid formula is @formula(Y ~ X1 + X2). An intercept is included by default.
data: a DataFrame which may contain missing values
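estimator: a robust estimator object, such as an MEstimator or a GeneralizedQuantileEstimator, parametrized by one of the loss functions listed below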

Supported loss functions are:

L2Loss: ρ(r) = ½ r², equivalent to ordinary least squares (OLS).
L1Loss: ρ(r) = |r|, non-differentiable estimator also known as least absolute deviations. Prefer the QuantileRegression solver.
HuberLoss: ρ(r) = if (|r|<c); ½(r/c)² else |r|/c - ½ end, convex estimator that behaves as an L2 cost for small residuals and as an L1 cost for large residuals and outliers.
L1L2Loss: ρ(r) = √(1 + (r/c)²) - 1, smooth version of HuberLoss.
FairLoss: ρ(r) = |r|/c - log(1 + |r|/c), smooth version of HuberLoss.
LogcoshLoss: ρ(r) = log(cosh(r/c)), smooth version of HuberLoss.
ArctanLoss: ρ(r) = r/c * atan(r/c) - ½ log(1+(r/c)²), smooth version of HuberLoss.
CauchyLoss: ρ(r) = log(1+(r/c)²), non-convex estimator that also corresponds to a Student's t-distribution (with fixed degrees of freedom). It suppresses outliers more strongly, but convergence is not guaranteed.
GemanLoss: ρ(r) = ½ (r/c)²/(1 + (r/c)²), non-convex and bounded estimator that suppresses outliers more strongly.
WelschLoss: ρ(r) = ½ (1 - exp(-(r/c)²)), non-convex and bounded estimator that suppresses outliers more strongly.
TukeyLoss: ρ(r) = if |r|<c; ⅙(1 - (1-(r/c)²)³) else ⅙ end, non-convex and bounded estimator that suppresses outliers more strongly. It is the preferred estimator for most cases.
YohaiZamarLoss: ρ(r) is quadratic for |r|/c < 2/3 and is bounded by 1; non-convex estimator optimized to have the lowest bias for a given efficiency.
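
To make the difference between a convex, unbounded loss and a bounded one concrete, the following sketch evaluates the Huber and Tukey formulas from the list directly. It does not call the package: huber_rho and tukey_rho are hypothetical helper names, the value c = 1.0 is arbitrary, and the branch condition is taken on |r| so that both losses are symmetric in r.

```julia
# Hypothetical helpers written from the formulas in the list above;
# they are not part of the RobustModels API.
huber_rho(r, c) = abs(r) < c ? 0.5 * (r / c)^2 : abs(r) / c - 0.5
tukey_rho(r, c) = abs(r) < c ? (1 - (1 - (r / c)^2)^3) / 6 : 1 / 6

c = 1.0  # arbitrary tuning constant chosen for the illustration

for r in (0.5, 1.0, 3.0, 30.0)
    println("r = ", r,
            "  huber = ", huber_rho(r, c),
            "  tukey = ", tukey_rho(r, c))
end
```

The Huber value keeps growing linearly with |r|, whereas the Tukey value stays at ⅙ once |r| ≥ c, which is why a bounded loss ignores gross outliers altogether.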

An estimator is constructed from an estimator type and a loss, e.g. MEstimator{TukeyLoss}().

For GeneralizedQuantileEstimator, the quantile should be specified with τ (0.5 by default), e.g. GeneralizedQuantileEstimator{HuberLoss}(0.2).
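
As a rough illustration of the call described in this section, the sketch below fits two models on synthetic data. The data, the seed and the variable names are made up for the example. It assumes that DataFrames and StatsModels are installed alongside RobustModels, and it only uses the rlm signature and the estimator constructors quoted above.

```julia
using Random
using DataFrames
using StatsModels      # assumed to provide the @formula macro
using RobustModels

# Synthetic data (hypothetical): Y = 1 + 2*X1 - X2 + noise, plus a few gross outliers
rng = MersenneTwister(42)
n = 200
X1 = randn(rng, n)
X2 = randn(rng, n)
Y = 1 .+ 2 .* X1 .- X2 .+ 0.5 .* randn(rng, n)
Y[1:5] .+= 50.0        # contaminate the first five responses
data = DataFrame(Y=Y, X1=X1, X2=X2)

form = @formula(Y ~ X1 + X2)   # an intercept is included by default

# M-estimator with the Tukey loss, initial scale estimated from the MAD
m_tukey = rlm(form, data, MEstimator{TukeyLoss}(); initial_scale=:mad)

# Generalized quantile estimator with the Huber loss at τ = 0.2
m_q20 = rlm(form, data, GeneralizedQuantileEstimator{HuberLoss}(0.2))

println("Tukey M-estimate:       ", coef(m_tukey))
println("Huber τ = 0.2 estimate: ", coef(m_q20))
```

With only a small fraction of contaminated responses, the Tukey fit would be expected to recover coefficients close to the generating values, while the τ = 0.2 fit targets a lower part of the conditional distribution and so mainly shifts the intercept.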

Methods applied to fitted models

Many of the methods are consistent with GLM.

nobs: number of observations
dof_residual: degrees of freedom for residuals
dof: degrees of freedom of the model, defined by nobs(m) - dof_residual(m)
coef: estimate of the coefficients in the model
predict: obtain predicted values of the dependent variable from the fitted model
deviance/nulldeviance: measure of the fit of the model (respectively, of the null model)
stderror: standard errors of the coefficients
confint: confidence intervals for the fitted coefficients
scale: the scale estimate from the model
workingweights: the weights for each observation from the robust estimate; outliers have low weights
leverage: the vector of leverage scores, one for each observation
vcov: estimated variance-covariance matrix of the coefficient estimates
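
The sketch below applies several of these methods to a model fitted on synthetic data. The data and seed are made up, DataFrames and StatsModels are assumed to be installed, and the methods are assumed to be callable on the object returned by rlm exactly as named in the list. The scale method is called with the module prefix to avoid relying on it being exported.

```julia
using Random
using Statistics
using DataFrames
using StatsModels      # assumed to provide the @formula macro
using RobustModels

# Synthetic data (hypothetical): Y = 3 + 1.5*X1 + noise, first three rows contaminated
rng = MersenneTwister(7)
n = 100
X1 = randn(rng, n)
Y = 3 .+ 1.5 .* X1 .+ 0.3 .* randn(rng, n)
Y[1:3] .+= 30.0
data = DataFrame(Y=Y, X1=X1)

m = rlm(@formula(Y ~ X1), data, MEstimator{TukeyLoss}(); initial_scale=:mad)

println("nobs         = ", nobs(m))
println("dof_residual = ", dof_residual(m))
println("dof          = ", dof(m))
println("coef         = ", coef(m))
println("stderror     = ", stderror(m))
println("confint      = ", confint(m))
println("scale        = ", RobustModels.scale(m))

# Outliers should receive low weights compared with the bulk of the data
w = workingweights(m)
println("weights of the contaminated rows: ", w[1:3])
println("median weight of the other rows:  ", median(w[4:end]))
```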

Separation of response object and predictor object

Building upon GLM's separation of the response and predictor objects, this package implements a new RobustLinResp object to compute the residuals. There are currently two families of predictor objects: DensePredChol/SparsePredChol (imported from GLM), which use the Cholesky method, and DensePredCG/SparsePredCG, which use the iterative Conjugate Gradient methods cg! and lsqr! from the IterativeSolvers package. The iterative methods are faster and more accurate than the Cholesky method for very large matrices. The predictor that is used depends on the model matrix type and on the method argument of the fit/fit!/rlm methods.
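
A minimal sketch of selecting the predictor through the method argument follows. It assumes that the keyword accepts the values :chol and :cg, which this section does not spell out, and that DataFrames and StatsModels are installed. The synthetic data and the seed are made up for the example.

```julia
using Random
using DataFrames
using StatsModels      # assumed to provide the @formula macro
using RobustModels

# Synthetic data (hypothetical): Y = 0.5 + X1 - 2*X2 + noise, with ten outliers
rng = MersenneTwister(3)
n = 500
X1 = randn(rng, n)
X2 = randn(rng, n)
Y = 0.5 .+ X1 .- 2 .* X2 .+ 0.2 .* randn(rng, n)
Y[1:10] .-= 20.0
data = DataFrame(Y=Y, X1=X1, X2=X2)
form = @formula(Y ~ X1 + X2)

# Same estimator and data, two different predictor backends
# (:chol and :cg are assumed values of the method keyword)
m_chol = rlm(form, data, MEstimator{HuberLoss}(); method=:chol, initial_scale=:mad)
m_cg   = rlm(form, data, MEstimator{HuberLoss}(); method=:cg, initial_scale=:mad)

println("largest coefficient difference: ",
        maximum(abs.(coef(m_chol) .- coef(m_cg))))
```

On a small dense problem like this one, both backends would be expected to agree closely; the choice matters mostly for very large model matrices.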