Title: | SuperLearner Method for MICE |
---|---|
Description: | Adds a Super Learner ensemble model method (using the 'SuperLearner' package) to the 'mice' package. Laqueur, H. S., Shev, A. B., Kagawa, R. M. C. (2021) <doi:10.1093/aje/kwab271>. |
Authors: | Aaron B. Shev |
Maintainer: | Aaron B. Shev <[email protected]> |
License: | GPL-3 |
Version: | 1.1.1 |
Built: | 2025-03-05 04:18:32 UTC |
Source: | https://github.com/abshev/supermice |
Function to generate imputations using SuperLearner for data with a binary outcome.
binarySuperLearner(y, x, wy, SL.library, ...)
Argument | Description |
---|---|
y | Vector of observed values of the variable to be imputed. |
x | Numeric matrix of variables to be used as predictors in SuperLearner methods, with rows corresponding to values in y. |
wy | Logical vector of length length(y). A TRUE value indicates locations in y for which imputations are created. |
SL.library | Either a character vector of prediction algorithms or a list containing character vectors. A list of functions included in the SuperLearner package can be found with SuperLearner::listWrappers(). |
... | Further arguments passed to SuperLearner(). |
Binary vector of randomly drawn imputed values.
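The kind of draw this method produces can be illustrated directly with the SuperLearner package. The sketch below is only an illustration, assuming (as the binary-case note in the mice.impute.SuperLearner details suggests) that imputations are Bernoulli draws from the ensemble's predicted probabilities; the data and object names are made up, and this is not the package's internal code.

## Minimal sketch (not the package's internal code): Bernoulli draws from
## SuperLearner predicted probabilities for the cases to be imputed.
library(SuperLearner)

set.seed(1)
n  <- 100
x  <- data.frame(x1 = rnorm(n))
y  <- rbinom(n, 1, plogis(x$x1))   # binary variable
wy <- runif(n) < 0.2               # TRUE where an imputation is needed

fit <- SuperLearner(Y = y[!wy], X = x[!wy, , drop = FALSE],
                    newX = x[wy, , drop = FALSE],
                    family = binomial(), SL.library = c("SL.mean", "SL.glm"))

p.miss  <- fit$SL.predict[, 1]          # ensemble predicted probabilities
imputed <- rbinom(sum(wy), 1, p.miss)   # random draws for the missing cases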
Function to generate imputations using SuperLearner for data with a continuous outcome.
continuousSuperLearner(y, x, wy, SL.library, kernel, bw, bw.update, ...)
Argument | Description |
---|---|
y | Vector of observed and missing/imputed values of the variable to be imputed. |
x | Numeric matrix of variables to be used as predictors in SuperLearner models, with rows corresponding to observed values of the variable to be imputed and columns corresponding to individual predictor variables. |
wy | Logical vector. A TRUE value indicates locations in y for which imputations are created. |
SL.library | Either a character vector of prediction algorithms or a list containing character vectors. A list of functions included in the SuperLearner package can be found with SuperLearner::listWrappers(). |
kernel | One of "gaussian", "uniform", or "triangular". Kernel function used to compute weights. |
bw | NULL or numeric vector of candidate bandwidths for the kernel function (as standard deviations of the kernel). |
bw.update | Logical indicating whether bandwidths should be computed at every iteration or only at the first iteration. Default is TRUE. |
... | Further arguments passed to SuperLearner(). |
numeric vector of randomly drawn imputed values.
Kernel functions used for local imputation
gaussianKernel(x, xcenter, bw = 1, lambda = NULL)
Argument | Description |
---|---|
x | Numeric vector of values to weight. |
xcenter | Numeric value at which to center the kernel. |
bw | Bandwidth of the kernel. |
lambda | Kernel radius, a function of bw. |
Kernel values for x centered at xcenter.
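For reference, a common unnormalized form of the Gaussian kernel weight is sketched below; it illustrates the weighting idea only and is not necessarily the exact expression used internally (in particular, the role of lambda is not shown). The helper name gaussian_weight is hypothetical.

## Common (unnormalized) Gaussian kernel weight: values of x close to xcenter
## receive weight near 1, values far away receive weight near 0.
gaussian_weight <- function(x, xcenter, bw = 1) {
  exp(-((x - xcenter)^2) / (2 * bw^2))
}

gaussian_weight(c(-1, 0, 0.5, 2), xcenter = 0, bw = 1)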
Jackknife method for bandwidth selection
jackknifeBandwidthSelection(i, bwGrid, preds, y, delta, kernel)
Argument | Description |
---|---|
i | Integer referring to the index of the missing value to be imputed. |
bwGrid | Numeric vector of candidate bandwidth values. |
preds | Numeric vector of predicted values for missing observations. |
y | Numeric vector of observed and missing values of the variable to be imputed. |
delta | Binary vector of the same length as y indicating which values of y are observed and which are missing. |
kernel | One of "gaussian", "uniform", or "triangular". Kernel function used to compute weights. |
The selected bandwidth.
Computes jackknife variance
jackknifeVariance(j, kernMatrix, delta, y)
Argument | Description |
---|---|
j | Integer index for the deleted observation in the jackknife procedure. |
kernMatrix | Matrix of kernel weights. |
delta | Binary vector of the same length as y indicating which values of y are observed and which are missing. |
y | Numeric vector of observed and missing values of the variable to be imputed. |
returns a single numeric value for the estimate of the jackknife variance.
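As a point of reference, the standard leave-one-out jackknife variance estimator is sketched below with the sample mean as the estimator; this is the textbook form of the technique, shown for orientation only, and the package's implementation operates on kernel-weighted quantities rather than this plain example. The helper name jackknife_var is hypothetical.

## Textbook leave-one-out jackknife variance of an estimator (here the sample
## mean), shown only to illustrate the general technique.
jackknife_var <- function(y, estimator = mean) {
  n         <- length(y)
  theta.loo <- vapply(seq_len(n), function(j) estimator(y[-j]), numeric(1))
  (n - 1) / n * sum((theta.loo - mean(theta.loo))^2)
}

jackknife_var(rnorm(50))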
Function to generate imputations using non-parametric and semi-parametric local imputation methods.
localImputation(i, preds, y, delta, bw = NULL, kernel = c("gaussian", "uniform", "triangular"))
Argument | Description |
---|---|
i | Integer referring to the index of the missing value to be imputed. |
preds | Numeric vector of predictions of missing values from SuperLearner. |
y | Numeric vector of the variable to be imputed. |
delta | Binary vector of the same length as y indicating which values of y are observed and which are missing. |
bw | NULL or numeric value for the bandwidth of the kernel function. |
kernel | One of "gaussian", "uniform", or "triangular". Kernel function used to compute weights. |
numeric vector of randomly drawn imputed values.
SuperLearner method for the mice package. Method for the mice package that uses SuperLearner as the predictive algorithm. Model fitting is done using the SuperLearner package.
mice.impute.SuperLearner(y, ry, x, wy = NULL, SL.library, kernel = c("gaussian", "uniform", "triangular"), bw = c(0.1, 0.2, 0.25, 0.3, 0.5, 1, 2.5, 5, 10, 20), bw.update = TRUE, ...)
Argument | Description |
---|---|
y | Vector to be imputed. |
ry | Logical vector of length length(y) indicating the observed (TRUE) and missing (FALSE) values in y. |
x | Numeric design matrix with length(y) rows containing the predictors for y. |
wy | Logical vector of length length(y). A TRUE value indicates locations in y for which imputations are created. |
SL.library | For SuperLearner: either a character vector of prediction algorithms or a list containing character vectors, as specified by the SuperLearner package. See details below. |
kernel | One of "gaussian", "uniform", "triangular". Kernel function used to compute weights. |
bw | NULL or numeric vector of candidate bandwidths for the kernel function (as standard deviations of the kernel). |
bw.update | Logical indicating whether bandwidths should be computed at every iteration or only at the first iteration. Default is TRUE. |
... | Further arguments passed to SuperLearner(). |
mice.impute.SuperLearner() is a method for use with the mice() function that incorporates the ensemble predictive model SuperLearner (van der Laan, 2011) into the mice (van Buuren, 2011) multiple imputation procedure. This function is never called directly; instead, a user who wishes to use SuperLearner in MICE simply needs to set the argument method = "SuperLearner" in the call to mice(). Arguments for the SuperLearner() function are passed from mice as extra arguments in the mice() call.
All MICE methods randomly generate imputed values for a number of data sets. The approach of SuperMICE is to estimate parameters for a normal distribution centered at the point estimate for an imputed value predicted by a SuperLearner model. The point estimates are obtained by fitting a selection of different predictive models on complete cases and determining an optimal weighted average of candidate models to predict the missing cases. SuperMICE uses the implementation of SuperLearner found in the SuperLearner package.

The models to be used with SuperLearner() are supplied by the user as a character vector. For a full list of available methods see listWrappers().
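To make the "optimal weighted average of candidate models" concrete, the sketch below fits a small SuperLearner ensemble and inspects the weights assigned to each candidate algorithm; the data and library choice are made up for illustration.

## Illustration of the ensemble step: fit candidate models and inspect the
## weights SuperLearner gives each one.
library(SuperLearner)

set.seed(2)
n <- 200
X <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
y <- X$x1 + X$x2^2 + rnorm(n)

fit <- SuperLearner(Y = y, X = X, family = gaussian(),
                    SL.library = c("SL.mean", "SL.glm"))

fit$coef              # weight given to each candidate algorithm
head(fit$SL.predict)  # ensemble (weighted average) predictions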
SuperLearner models do not produce standard errors for estimates, so instead we use a kernel-based estimate of the local variance around each point estimate as the variance parameter in the normal distribution used to randomly sample values. The kernel can be set by the user with the kernel argument as either a Gaussian, uniform, or triangular kernel. The user must also supply a list of candidate bandwidths in the bw argument as a numeric vector. For more information on the variance and bandwidth selection see Laqueur et al. (2021). In every iteration of the mice procedure, the optimal bandwidth is reselected. To speed up the total run time of the imputation, the bandwidth may instead be selected only on the first iteration by setting bw.update to FALSE; however, this may bias the results. Note that this only applies to continuous response variables. In the binary case the variance is a function of the SuperLearner estimate.
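A compact sketch of the continuous-case idea is given below, assuming a Gaussian kernel and a kernel-weighted variance of observed values around each prediction; the helper impute_one and the data are made up, and the jackknife bandwidth selection over the bw grid is deliberately omitted.

## Sketch of the continuous-case draw: centre a normal at the SuperLearner
## prediction and use a kernel-weighted local variance of observed values
## (illustration only; bandwidth selection is omitted).
set.seed(3)
obs.pred  <- rnorm(100)                        # predictions for observed cases
obs.y     <- obs.pred + rnorm(100, sd = 0.5)   # observed outcome values
miss.pred <- c(-0.3, 0.8)                      # predictions for two missing cases
bw        <- 0.5

impute_one <- function(p, obs.pred, obs.y, bw) {
  w        <- exp(-((obs.pred - p)^2) / (2 * bw^2))   # Gaussian kernel weights
  mu.local <- sum(w * obs.y) / sum(w)                 # kernel-weighted local mean
  v.local  <- sum(w * (obs.y - mu.local)^2) / sum(w)  # kernel-weighted local variance
  rnorm(1, mean = p, sd = sqrt(v.local))              # draw centred at the prediction
}

sapply(miss.pred, impute_one, obs.pred = obs.pred, obs.y = obs.y, bw = bw)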
Vector with imputed data, same type as y, and of length sum(wy).
Laqueur, H. S., Shev, A. B., Kagawa, R. M. C. (2021). SuperMICE: An Ensemble Machine Learning Approach to Multiple Imputation by Chained Equations. American Journal of Epidemiology, kwab271, doi:10.1093/aje/kwab271.
Van Buuren, S., Groothuis-Oudshoorn, K. (2011). mice: Multivariate Imputation by Chained Equations in R. Journal of Statistical Software, 45(3), 1-67. doi:10.18637/jss.v045.i03.

van der Laan, M. J., Polley, E. C., and Hubbard, A. E. (2007). Super Learner. Statistical Applications in Genetics and Molecular Biology, 6(1), Article 25.
#Multiple imputation with missingness on a continuous variable.
#Randomly generated data with missingness in x2. The probability of x2
# being missing increases with the value of x1.
n <- 20
pmissing <- 0.10
x1 <- runif(n, min = -3, max = 3)
x2 <- x1^2 + rnorm(n, mean = 0, sd = 1)
error <- rnorm(n, mean = 0, sd = 1)
y <- x1 + x2 + error
f <- ecdf(x1)
x2 <- ifelse(runif(n) < (f(x1) * 2 * pmissing), NA, x2)
dat <- data.frame(y, x1, x2)

#Create vector of SuperLearner method names
# Note: see SuperLearner::listWrappers() for a full list of methods
# available.
SL.lib <- c("SL.mean", "SL.glm")

#Run mice().
# Note 1: m >= 30 and maxit >= 10 are recommended outside of this
# toy example
# Note 2: a denser bandwidth grid is recommended, see default for bw
# argument for example.
imp.SL <- mice::mice(dat, m = 2, maxit = 2, method = "SuperLearner",
                     print = TRUE, SL.library = SL.lib, kernel = "gaussian",
                     bw = c(0.25, 1, 5))