# Manual

## Installation
The package can be installed with the Julia package manager. From the Julia REPL, type `]` to enter the Pkg REPL mode and run:

```julia
pkg> add RobustModels
```

Or, equivalently, via the `Pkg` API:

```julia
julia> import Pkg; Pkg.add("RobustModels")
```
## Fitting robust models
The API is consistent with the GLM package. To fit a robust model, use the function `rlm(formula, data, estimator; initial_scale=:mad)`, where:

- `formula`: uses column symbols from the DataFrame `data`; for example, if `propertynames(data) == [:Y, :X1, :X2]`, then a valid formula is `@formula(Y ~ X1 + X2)`. An intercept is included by default.
- `data`: a DataFrame, which may contain missing values.
- `estimator`: an estimator object, built from one of the supported loss functions below.
Supported loss functions are:

- `L2Loss`: ρ(r) = ½ r², like ordinary least squares (OLS).
- `L1Loss`: ρ(r) = |r|, a non-differentiable estimator also known as Least Absolute Deviations. Prefer the `QuantileRegression` solver.
- `HuberLoss`: ρ(r) = if |r|<c; ½(r/c)² else |r|/c - ½ end, a convex estimator that behaves like an L2 cost for small residuals and like L1 for large residuals and outliers (see the sketch after this list).
- `L1L2Loss`: ρ(r) = √(1 + (r/c)²) - 1, a smooth version of `HuberLoss`.
- `FairLoss`: ρ(r) = |r|/c - log(1 + |r|/c), a smooth version of `HuberLoss`.
- `LogcoshLoss`: ρ(r) = log(cosh(r/c)), a smooth version of `HuberLoss`.
- `ArctanLoss`: ρ(r) = r/c * atan(r/c) - ½ log(1 + (r/c)²), a smooth version of `HuberLoss`.
- `CauchyLoss`: ρ(r) = log(1 + (r/c)²), a non-convex estimator that also corresponds to a Student's t distribution (with fixed degrees of freedom). It suppresses outliers more strongly, but convergence is not guaranteed.
- `GemanLoss`: ρ(r) = ½ (r/c)² / (1 + (r/c)²), a non-convex and bounded estimator; it suppresses outliers more strongly.
- `WelschLoss`: ρ(r) = ½ (1 - exp(-(r/c)²)), a non-convex and bounded estimator; it suppresses outliers more strongly.
- `TukeyLoss`: ρ(r) = if |r|<c; ⅙(1 - (1 - (r/c)²)³) else ⅙ end, a non-convex and bounded estimator; it suppresses outliers more strongly and is the preferred estimator for most cases.
- `YohaiZamarLoss`: ρ(r) is quadratic for r/c < 2/3 and is bounded by 1; a non-convex estimator optimized to have the lowest bias for a given efficiency.
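To make the two regimes of `HuberLoss` concrete, here is a standalone sketch of the formula above (not the package's internal implementation; `c = 1.345` is the classic Huber tuning constant for 95% asymptotic efficiency under normal errors):

```julia
# Standalone illustration of the Huber ρ defined above.
huber_rho(r; c=1.345) = abs(r) < c ? (r / c)^2 / 2 : abs(r) / c - 1 / 2

huber_rho(0.1)   # ≈ 0.0028: quadratic (L2-like) for small residuals
huber_rho(10.0)  # ≈ 6.9349: linear (L1-like) for large residuals and outliers
```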
An estimator is constructed from an estimator type and a loss, e.g. `MEstimator{TukeyLoss}()`.

For `GeneralizedQuantileEstimator`, the quantile should be specified with `τ` (0.5 by default), e.g. `GeneralizedQuantileEstimator{HuberLoss}(0.2)`.
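Putting this together, a complete fit might look like the following sketch. The DataFrame is invented toy data; if `@formula` is not re-exported by RobustModels in your version, add `using StatsModels`.

```julia
using DataFrames, RobustModels

# Invented toy data with two gross outliers.
data = DataFrame(
    X1 = 1.0:10.0,
    Y  = [1.1, 2.0, 2.9, 4.2, 5.0, 5.9, 7.1, 8.0, 30.0, -20.0],
)

# M-estimation with Tukey's bisquare loss; the scale is initialized
# from the median absolute deviation (MAD) of the residuals.
m = rlm(@formula(Y ~ X1), data, MEstimator{TukeyLoss}(); initial_scale=:mad)

# An asymmetric fit at τ = 0.2 with a generalized quantile estimator.
mq = rlm(@formula(Y ~ X1), data, GeneralizedQuantileEstimator{HuberLoss}(0.2))
```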
## Methods applied to fitted models
Many of the methods are consistent with the GLM package.

- `nobs`: number of observations.
- `dof_residual`: degrees of freedom of the residuals.
- `dof`: degrees of freedom of the model, defined by `nobs(m) - dof_residual(m)`.
- `coef`: estimate of the coefficients of the model.
- `predict`: obtain predicted values of the dependent variable from the fitted model.
- `deviance`/`nulldeviance`: measure of the fit of the model (of the null model, respectively).
- `stderror`: standard errors of the coefficients.
- `confint`: confidence intervals for the fitted coefficients.
- `scale`: the scale estimate from the model.
- `workingweights`: the weights of each observation from the robust estimate; outliers have low weights.
- `leverage`: the vector of leverage scores for each observation.
- `vcov`: estimated variance-covariance matrix of the coefficient estimates.
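Continuing with the hypothetical model `m` from the sketch above, these accessors are called directly on the fitted model:

```julia
nobs(m)            # number of observations
dof(m)             # degrees of freedom of the model
coef(m)            # coefficient estimates
stderror(m)        # standard errors of the coefficients
confint(m)         # confidence intervals for the coefficients
scale(m)           # robust scale estimate
workingweights(m)  # robust weights; the two outliers should get low weights
leverage(m)        # leverage score of each observation
vcov(m)            # variance-covariance matrix of the coefficient estimates
```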
## Separation of response object and predictor object
Building upon the GLM package's separation of the response and predictor objects, this package implements a new `RobustLinResp` object to compute the residuals. There are currently two families of predictor objects: `DensePredChol`/`SparsePredChol` (imported from GLM) and `DensePredCG`/`SparsePredCG`, which use the iterative conjugate gradient methods `cg!` and `lsqr!` from the IterativeSolvers package and can be faster and more accurate than the Cholesky method for very large matrices. The predictor that is used depends on the model matrix type and on the `method` argument of the `fit`/`fit!`/`rlm` methods.
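As a sketch of how this selection works with a dense model matrix: the `method` keyword chooses between the Cholesky and conjugate gradient predictors. Here `method=:cg` appears in the package's examples, while `:chol` is assumed to select the Cholesky-based predictor.

```julia
using DataFrames, RobustModels

# Hypothetical data; the model matrix is dense, so the choice below is
# between DensePredChol and DensePredCG.
data = DataFrame(X1 = randn(1000), Y = randn(1000))
est = MEstimator{HuberLoss}()

# Cholesky factorization of the normal equations (DensePredChol).
m_chol = rlm(@formula(Y ~ X1), data, est; method=:chol)

# Iterative conjugate gradient solver (DensePredCG); can be faster and
# more accurate for very large problems.
m_cg = rlm(@formula(Y ~ X1), data, est; method=:cg)
```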