# Manual

## Installation

`using Pkg; Pkg.add("RobustModels")`

## Fitting robust models

The API is consistent with the GLM package. To fit a robust model, use the function `rlm(formula, data, estimator; initial_scale=:mad)`, where:

- `formula`: uses column symbols from the DataFrame `data`. For example, if `propertynames(data) = [:Y, :X1, :X2]`, then a valid formula is `@formula(Y ~ X1 + X2)`. An intercept is included by default.
- `data`: a DataFrame, which may contain missing values.
- `estimator`: the robust estimator to use, built from an estimator type (such as `MEstimator` or `GeneralizedQuantileEstimator`) and a loss function.

Supported loss functions are listed below, where `r` is the residual and `c` is the tuning constant of the loss:

- `L2Loss`: `ρ(r) = ½ r²`, equivalent to ordinary least squares (OLS).
- `L1Loss`: `ρ(r) = |r|`, non-differentiable estimator also known as *least absolute deviations*. Prefer the `QuantileRegression` solver.
- `HuberLoss`: `ρ(r) = if (|r|<c); ½(r/c)² else |r|/c - ½ end`, convex estimator that behaves like the `L2` cost for small residuals and like the `L1` cost for large residuals and outliers.
- `L1L2Loss`: `ρ(r) = √(1 + (r/c)²) - 1`, smooth version of `HuberLoss`.
- `FairLoss`: `ρ(r) = |r|/c - log(1 + |r|/c)`, smooth version of `HuberLoss`.
- `LogcoshLoss`: `ρ(r) = log(cosh(r/c))`, smooth version of `HuberLoss`.
- `ArctanLoss`: `ρ(r) = r/c * atan(r/c) - ½ log(1+(r/c)²)`, smooth version of `HuberLoss`.
- `CauchyLoss`: `ρ(r) = log(1+(r/c)²)`, non-convex estimator that also corresponds to a Student's t-distribution (with fixed degrees of freedom). It suppresses outliers more strongly, but convergence is not guaranteed.
- `GemanLoss`: `ρ(r) = ½ (r/c)²/(1 + (r/c)²)`, non-convex and bounded estimator that suppresses outliers more strongly.
- `WelschLoss`: `ρ(r) = ½ (1 - exp(-(r/c)²))`, non-convex and bounded estimator that suppresses outliers more strongly.
- `TukeyLoss`: `ρ(r) = if (|r|<c); ⅙(1 - (1-(r/c)²)³) else ⅙ end`, non-convex and bounded estimator that suppresses outliers more strongly. It is the preferred estimator for most cases.
- `YohaiZamarLoss`: `ρ(r)` is quadratic for `|r|/c < 2/3` and is bounded by 1; non-convex estimator optimized to have the lowest bias for a given efficiency.
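To make the difference between a convex, unbounded loss and a bounded one concrete, here is a minimal sketch that writes the Huber and Tukey formulas as plain Julia functions. The names `huber_rho` and `tukey_rho` are hypothetical helpers for illustration and are not part of the package; the tuning constant `c = 1.0` is an arbitrary choice.

```julia
# Hypothetical helpers (not part of RobustModels): the Huber and Tukey
# loss formulas written out directly, with c as the tuning constant.

# Huber: quadratic for |r| < c, linear beyond that, so it keeps growing.
huber_rho(r, c) = abs(r) < c ? 0.5 * (r / c)^2 : abs(r) / c - 0.5

# Tukey: levels off at the constant 1/6 once |r| >= c.
tukey_rho(r, c) = abs(r) < c ? (1 - (1 - (r / c)^2)^3) / 6 : 1 / 6

c = 1.0  # arbitrary tuning constant, for illustration only
for r in (0.5, 1.0, 10.0, -10.0)
    println("r = ", r, "  huber = ", huber_rho(r, c), "  tukey = ", tukey_rho(r, c))
end
```

For a residual far beyond `c`, the Huber loss still increases with `|r|`, while the Tukey loss stays at its ceiling of ⅙. This is why a gross outlier has no further influence on a fit that uses a bounded loss.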

An estimator is constructed from an estimator type and a loss, e.g. `MEstimator{TukeyLoss}()`.

For `GeneralizedQuantileEstimator`, the quantile should be specified with `τ` (0.5 by default), e.g. `GeneralizedQuantileEstimator{HuberLoss}(0.2)`.
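As a sketch of how these pieces fit together, the example below fits two models with `rlm` on a small synthetic data set. The data, the column names `X1` and `Y`, and the variable names are made up for illustration. It assumes that DataFrames and StatsModels are installed alongside RobustModels, and that `coef` is exported by the package.

```julia
using DataFrames
using StatsModels: @formula
using RobustModels

# Synthetic data (made up for illustration): a linear trend with small
# deterministic wiggles, plus two gross outliers in the response.
x = collect(1.0:20.0)
y = 2.0 .+ 3.0 .* x .+ 0.1 .* sin.(x)
y[5] += 40.0
y[15] -= 40.0
data = DataFrame(X1 = x, Y = y)

form = @formula(Y ~ X1)

# M-estimator with the Tukey loss; the initial scale comes from the MAD.
m_tukey = rlm(form, data, MEstimator{TukeyLoss}(); initial_scale=:mad)

# Generalized quantile estimator with the Huber loss at τ = 0.2.
m_q20 = rlm(form, data, GeneralizedQuantileEstimator{HuberLoss}(0.2); initial_scale=:mad)

println("Tukey M-estimate:    ", coef(m_tukey))
println("Huber, τ = 0.2:      ", coef(m_q20))
```

Both calls share the same formula and data; only the estimator argument changes.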

## Methods applied to fitted models

Many of the methods are consistent with GLM.

- `nobs`: number of observations
- `dof_residual`: degrees of freedom for residuals
- `dof`: degrees of freedom of the model, defined by `nobs(m) - dof_residual(m)`
- `coef`: estimates of the coefficients in the model
- `predict`: obtain predicted values of the dependent variable from the fitted model
- `deviance`/`nulldeviance`: measure of the fit of the model (null model, respectively)
- `stderror`: standard errors of the coefficients
- `confint`: confidence intervals for the fitted coefficients
- `scale`: the scale estimate from the model
- `workingweights`: the weights for each observation from the robust estimate. Outliers have low weights.
- `leverage`: the vector of leverage scores for each observation
- `vcov`: estimated variance-covariance matrix of the coefficient estimates
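The following sketch calls a few of these methods on a fitted model. The data are synthetic and made up for illustration. It assumes the GLM-style accessors (`nobs`, `dof`, `coef`, and so on) are exported by the package. The robust-specific functions are called with the module prefix so that the example does not depend on the export list.

```julia
using DataFrames
using StatsModels: @formula
using RobustModels

# Synthetic data (made up for illustration) with one gross outlier at row 10.
x = collect(1.0:20.0)
y = 2.0 .+ 3.0 .* x .+ 0.1 .* sin.(x)
y[10] += 40.0
data = DataFrame(X1 = x, Y = y)

m = rlm(@formula(Y ~ X1), data, MEstimator{HuberLoss}(); initial_scale=:mad)

# GLM-style accessors.
println("nobs         = ", nobs(m))
println("dof          = ", dof(m))
println("dof_residual = ", dof_residual(m))
println("coef         = ", coef(m))
println("stderror     = ", stderror(m))

# Robust-specific accessors, qualified with the module name.
println("scale        = ", RobustModels.scale(m))
w = RobustModels.workingweights(m)
println("observation with the lowest weight: ", argmin(w))
```

Inspecting the working weights is a quick way to see which observations the robust fit treated as outliers.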

## Separation of response object and predictor object

Building upon GLM's separation of the response and predictor objects, this package implements a new `RobustLinResp` object to compute the residuals. There are currently two kinds of predictor objects available: `DensePredChol`/`SparsePredChol` (imported from GLM) and `DensePredCG`/`SparsePredCG`, which use the iterative conjugate gradient methods `cg!` and `lsqr!` from the IterativeSolvers package and are faster and more accurate than the Cholesky method for very large matrices. The predictor that is used depends on the model matrix type and the `method` argument of the `fit`/`fit!`/`rlm` methods.
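A minimal sketch of selecting the predictor through the `method` keyword is shown below. The symbol values `:chol` and `:cg` are assumptions about the accepted options and should be checked against the package's docstrings. The data are synthetic and made up for illustration.

```julia
using DataFrames
using StatsModels: @formula
using RobustModels

# Synthetic data (made up for illustration) with one gross outlier.
x = collect(1.0:20.0)
y = 2.0 .+ 3.0 .* x .+ 0.1 .* sin.(x)
y[10] += 40.0
data = DataFrame(X1 = x, Y = y)
form = @formula(Y ~ X1)

# Assumed option values: :chol for the Cholesky predictor,
# :cg for the conjugate gradient predictor.
m_chol = rlm(form, data, MEstimator{HuberLoss}(); method=:chol, initial_scale=:mad)
m_cg = rlm(form, data, MEstimator{HuberLoss}(); method=:cg, initial_scale=:mad)

println("Cholesky:           ", coef(m_chol))
println("Conjugate gradient: ", coef(m_cg))
```

On a problem this small the choice of predictor should not matter in practice; the iterative predictors are aimed at very large model matrices.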