library(amp)
Here, we will go through one of the examples from the paper for which we have already built an influence curve (IC). For a more formal introduction of influence curves, see this article. In this example, \(Y\) is the outcome of interest, and \(X_1\),…,\(X_d\) are \(d\) covariates measured on each (independent) individual. The parameter in this example is the pearson correlation coefficient between the outcome of interest and each covariate, that is \(\psi_i = \text{corr}(Y, X_1)\). The null hypothesis we wish to test is that \(\psi_1 = \psi_2 =\)…\(=\psi_d=0\).
<- matrix(rnorm(5000), ncol = 5)
x_data <- rnorm(1000) + 0.02 * x_data[, 2]
y_data <- data.frame(y_data, x_data) obs_data
Here, we have already written a function that calculates both the parameter estimate and the IC for this parameter (we also choose to make no assumptions about the data generating mechanism).
::ic.pearson(obs_data, what = "est")
amp#> $est
#> X1 X2 X3 X4 X5
#> [1,] -0.02897102 -0.02868844 -0.06009732 0.05895252 -0.007182061
cor(y_data, x_data)
#> [,1] [,2] [,3] [,4] [,5]
#> [1,] -0.02897102 -0.02868844 -0.06009732 0.05895252 -0.007182061
We are now ready to test the multivariate point null.
<- amp::mv_pn_test(obs_data, param_est = amp::ic.pearson, control = amp::test.control())
test_res hist(test_res$test_st_eld)
abline(v = test_res$pvalue, col = "red")
The derivation of influence functions for parameter estimators is quite technically challenging. However, once the IC and the parameter estimators have been derived, they can be used by this package.
As an example, we will consider the influence function for the sample mean which is just \(x - \bar{x}\). The function that is passed to the mv_pn_test will always take three arguments.
observ
) should be the observed
datawhat
) will specify if the function
should return the estimate (“est”), the influence curve (“ic”), or both
(“both”).control
takes any other arguments
used to control the function. Even if your function does not use other
arguments, the function definition should still contain
control = NULL
.The object returned by the function should be a list which contains
"est"
if what
is either "est"
or "both"
"ic"
if what
is either "ic"
or "both"
<- function(observ, what = "both", control = NULL){
ic.mean if (!(what %in% c("ic", "est", "both"))) {
stop("what must be one of ic (influence curve), est (estimate), or both")
}<- list()
ret if (what %in% c("ic", "both")) {
<- colMeans(x = observ)
col_means <- sweep(x = observ, MARGIN = 2,
infl STATS = col_means, FUN = "-")
$ic <- infl
ret
}if (what %in% c("est", "both")) {
$est <- colMeans(x = observ)
ret
}return(ret)
}
Now that we have our newly defined IC, we can pass it to the testing function. We first create a new dataset for which we are interested in testing if each covariate has mean zero:
set.seed(100)
<- matrix(rnorm(n = 500) +
obs_data_mean1 rep(c(0, 0, 0, 0, 0.01), each = 100),
ncol = 5, byrow = FALSE)
<- amp::mv_pn_test(obs_data_mean1, param_est = ic.mean,
res_1 control = amp::test.control())
<- matrix(rnorm(n = 500) +
obs_data_mean2 rep(c(0, 0, 0.32, 0, 0.07), each = 100),
ncol = 5, byrow = FALSE)
<- amp::mv_pn_test(obs_data_mean2, param_est = ic.mean,
res_2 control = amp::test.control())
print(c(res_1$pvalue, res_2$pvalue))
#> [1] 0.788 0.080
In cases where control will modify the behavior of the IC or
parameter estimation, these arguments should be appended to the list of
already created arguments in amp::test.control()
and passed
to the control
argument of the mv_pn_test
.
When passing control arguments to the param_est
function, it is recommended that if needed, these functions explicitly
pull these arguments out of control before passing them to a nested
function:
## Correct usage
<- function(x) return(x)
nested_fun1 <- function(observ, what = "both", control = NULL){
ic.mean1 if (!(what %in% c("ic", "est", "both"))) {
stop("what must be one of ic (influence curve), est (estimate), or both")
}<- list()
ret if (what %in% c("ic", "both")) {
<- nested_fun1(control$extra_arg)
mult <- mult * colMeans(x = observ)
col_means <- sweep(x = observ, MARGIN = 2,
infl STATS = col_means, FUN = "-")
$ic <- infl
ret
}if (what %in% c("est", "both")) {
$est <- colMeans(x = observ)
ret
}return(ret)
}
<- amp::mv_pn_test(obs_data_mean1, param_est = ic.mean1,
test1 control = c(amp::test.control(),
"extra_arg" = 2))
## Incorrect usage
<- function(x) return(x$extra_arg)
nested_fun2 <- function(observ, what = "both", control = NULL){
ic.mean2 if (!(what %in% c("ic", "est", "both"))) {
stop("what must be one of ic (influence curve), est (estimate), or both")
}<- list()
ret if (what %in% c("ic", "both")) {
<- nested_fun2(control)
mult <- mult * colMeans(x = observ)
col_means <- sweep(x = observ, MARGIN = 2,
infl STATS = col_means, FUN = "-")
$ic <- infl
ret
}if (what %in% c("est", "both")) {
$est <- colMeans(x = observ)
ret
}return(ret)
}<- amp::mv_pn_test(obs_data_mean1, param_est = ic.mean2,
test2 control = c(amp::test.control(),
"extra_arg" = 2))
#> Warning in .checkargs(obs_data, param_est, control): Some arguments appear not to be used by the test. These are:
#> extra_arg
#> To avoid this warning, explicitly call these arguments in your param_est function (rather than just passing all control arguments to another function.