# Manual

## Installation

using Pkg; Pkg.add("RobustModels")

## Fitting robust models

The API is consistent with the GLM package. To fit a robust model, use the function rlm(formula, data, estimator; initial_scale=:mad), where:

• formula: uses column symbols from the DataFrame data. For example, if propertynames(data) = [:Y, :X1, :X2], then a valid formula is @formula(Y ~ X1 + X2). An intercept is included by default.

• data: a DataFrame which may contain missing values

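• estimator: an estimator built from an estimator type (for example MEstimator or GeneralizedQuantileEstimator) and a loss function.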

Supported loss functions are:

• L2Loss: ρ(r) = ½ r², as in ordinary least squares (OLS).
• L1Loss: ρ(r) = |r|, non-differentiable estimator also known as least absolute deviations. Prefer the QuantileRegression solver.
• HuberLoss: ρ(r) = if |r|<c; ½(r/c)² else |r|/c - ½ end, convex estimator that behaves as the L2 cost for small residuals and as the L1 cost for large residuals and outliers.
• L1L2Loss: ρ(r) = √(1 + (r/c)²) - 1, smooth version of HuberLoss.
• FairLoss: ρ(r) = |r|/c - log(1 + |r|/c), smooth version of HuberLoss.
• LogcoshLoss: ρ(r) = log(cosh(r/c)), smooth version of HuberLoss.
• ArctanLoss: ρ(r) = r/c * atan(r/c) - ½ log(1+(r/c)²), smooth version of HuberLoss.
• CauchyLoss: ρ(r) = log(1+(r/c)²), non-convex estimator that also corresponds to a Student's t-distribution (with fixed degrees of freedom). It suppresses outliers more strongly, but it is not guaranteed to converge.
• GemanLoss: ρ(r) = ½ (r/c)²/(1 + (r/c)²), non-convex and bounded estimator that suppresses outliers more strongly.
• WelschLoss: ρ(r) = ½ (1 - exp(-(r/c)²)), non-convex and bounded estimator that suppresses outliers more strongly.
• TukeyLoss: ρ(r) = if |r|<c; ⅙(1 - (1-(r/c)²)³) else ⅙ end, non-convex and bounded estimator that suppresses outliers more strongly; it is the preferred estimator for most cases.
• YohaiZamarLoss: ρ(r) is quadratic for |r|/c < 2/3 and is bounded by 1; non-convex estimator, optimized to have the lowest bias for a given efficiency.
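
To make the difference between convex, unbounded and bounded losses concrete, the following plain-Julia sketch evaluates three of the formulas above for growing residuals. The helper names (huber_rho, cauchy_rho, tukey_rho) and the value c = 1.0 are chosen for illustration only; they are not part of RobustModels, and the package's own tuning constants are not reproduced here.

```julia
# Plain-Julia transcriptions of three losses from the list above.
# c is the tuning constant; these helpers are hypothetical, not package API.

# Huber: quadratic near zero, linear in the tails (convex, unbounded).
huber_rho(r, c) = abs(r) < c ? 0.5 * (r / c)^2 : abs(r) / c - 0.5

# Cauchy: grows only logarithmically (non-convex, unbounded).
cauchy_rho(r, c) = log(1 + (r / c)^2)

# Tukey: constant at 1/6 once |r| >= c (non-convex, bounded).
tukey_rho(r, c) = abs(r) < c ? (1 - (1 - (r / c)^2)^3) / 6 : 1 / 6

c = 1.0  # arbitrary value, for illustration only
for r in (0.1, 1.0, 10.0, 100.0)
    println("r = ", r,
            "  huber = ", huber_rho(r, c),
            "  cauchy = ", cauchy_rho(r, c),
            "  tukey = ", tukey_rho(r, c))
end
```

For large residuals the Huber value keeps growing linearly, the Cauchy value grows slowly, and the Tukey value stays at ⅙, which is why bounded losses suppress outliers more strongly.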

An estimator is constructed from an estimator type and a loss, e.g. MEstimator{TukeyLoss}().

For GeneralizedQuantileEstimator, the quantile should be specified with τ (0.5 by default), e.g. GeneralizedQuantileEstimator{HuberLoss}(0.2).
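
Putting the pieces together, a fit might look like the sketch below. It assumes that DataFrames, StatsModels and RobustModels are installed, that @formula is available from StatsModels, and that coef works on the fitted model as it does in GLM. The data set is synthetic and made up for this example.

```julia
using DataFrames
using StatsModels: @formula
using RobustModels

# Synthetic data (invented for this example): a linear trend with small
# deterministic noise and one gross outlier in the last row.
data = DataFrame(
    X1 = collect(1.0:10.0),
    X2 = [0.3, 1.1, 0.2, 0.9, 0.5, 1.4, 0.1, 0.8, 0.6, 1.2],
)
data.Y = 1.0 .+ 2.0 .* data.X1 .- data.X2 .+ 0.1 .* sin.(1:10)
data.Y[end] += 50.0  # contaminate the last observation

# M-estimator with the Tukey loss, using the signature given above.
m = rlm(@formula(Y ~ X1 + X2), data, MEstimator{TukeyLoss}(); initial_scale=:mad)

# Generalized quantile estimator with the Huber loss at quantile τ = 0.2.
q = rlm(@formula(Y ~ X1 + X2), data, GeneralizedQuantileEstimator{HuberLoss}(0.2); initial_scale=:mad)

# coef is assumed to be available, as for GLM models.
println(coef(m))
println(coef(q))
```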

## Methods applied to fitted models

Many of the methods are consistent with GLM.

## Separation of response object and predictor object

Building upon GLM's separation of the response and predictor objects, this package implements a new RobustLinResp object to compute the residuals. There are currently two kinds of predictor objects available: DensePredChol/SparsePredChol (imported from GLM), and DensePredCG/SparsePredCG, which use the iterative Conjugate Gradient methods cg! and lsqr! from the IterativeSolvers package; these are faster and more accurate than the Cholesky method for very large matrices. The predictor that is used depends on the model matrix type and on the method argument of the fit/fit!/rlm methods.
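
The two predictor families solve the same weighted least-squares problem, once directly and once iteratively. The sketch below illustrates that idea on a tiny system using only the LinearAlgebra standard library. It is not the package's implementation: conjgrad is a hand-written conjugate gradient routine standing in for cg! from IterativeSolvers, and the matrix, response and weights are made up.

```julia
using LinearAlgebra

# Hand-written conjugate gradient for a symmetric positive definite system
# A x = b (a hypothetical stand-in for IterativeSolvers.cg!).
function conjgrad(A, b; tol = 1e-10, maxiter = 1000)
    x = zeros(length(b))
    r = b - A * x        # residual of the linear system
    p = copy(r)          # current search direction
    rs = dot(r, r)
    for _ in 1:maxiter
        sqrt(rs) < tol && break
        Ap = A * p
        α = rs / dot(p, Ap)
        x .+= α .* p
        r .-= α .* Ap
        rs_new = dot(r, r)
        p .= r .+ (rs_new / rs) .* p
        rs = rs_new
    end
    return x
end

# Small model matrix with an intercept column, and made-up data.
X = [ones(6) collect(1.0:6.0) [0.5, -1.0, 2.0, 0.0, 1.5, -0.5]]
y = [1.2, 2.9, 5.1, 7.2, 8.8, 11.1]
w = [1.0, 1.0, 0.5, 1.0, 0.2, 1.0]  # example robustness weights

# Weighted normal equations (X' W X) β = X' W y.
A = Symmetric(X' * Diagonal(w) * X)
b = X' * (w .* y)

β_chol = cholesky(A) \ b   # direct solve, as a Cholesky predictor would
β_cg = conjgrad(A, b)      # iterative solve, as a CG predictor would

println(β_chol)
println(β_cg)
```

On a problem this small both approaches agree to numerical precision; the trade-off described above only matters for very large model matrices.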