| Title: | Argmin Inference over a Discrete Candidate Set |
|---|---|
| Description: | Provides methods to construct frequentist confidence sets with valid marginal coverage for identifying the population-level argmin or argmax based on IID data. For instance, given an n by p loss matrix—where n is the sample size and p is the number of models—the CS.argmin() method produces a discrete confidence set that contains the model with the minimal (best) expected risk with desired probability. The argmin.HT() method helps check if a specific model should be included in such a confidence set. The main implemented method is proposed by Tianyu Zhang, Hao Lee and Jing Lei (2024) "Winners with confidence: Discrete argmin inference with an application to model selection". |
| Authors: | Tianyu Zhang [aut], Hao Lee [aut, cre, cph], Jing Lei [aut] |
| Maintainer: | Hao Lee <[email protected]> |
| License: | MIT + file LICENSE |
| Version: | 1.1.0 |
| Built: | 2026-05-20 08:55:29 UTC |
| Source: | https://github.com/xu3cl4/argmincs |
This function performs a hypothesis test to evaluate whether a given dimension may be the argmax.
It internally negates the data and reuses the implementation from argmin.HT.
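The negation trick can be checked on toy data. The following is a minimal base-R sketch, not the package's internal code; the variable names are illustrative assumptions.

```r
# Toy data: three dimensions with increasing means (illustrative only).
set.seed(1)
n <- 500
mu <- c(0.1, 0.5, 0.9)
x <- sapply(mu, function(m) rnorm(n, mean = m, sd = 1))

# The argmax of the column means is the argmin of the negated column means,
# which is why an argmin routine can be reused on -x.
idx.max <- which.max(colMeans(x))
idx.min.neg <- which.min(colMeans(-x))
```

Because the two indices always coincide, any argmin test applied to the negated data answers the corresponding argmax question.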
argmax.HT(data, r = NULL, method = "softmin.LOO", ...)
data |
(1) An n by p matrix of raw samples (for GTA), or (2) an n by (p-1) difference matrix (for SML, HML, NS, MT). Each row is a sample. |
r |
The dimension of interest for testing; defaults to NULL. Required for GTA. |
method |
A string indicating the method to use. Defaults to 'softmin.LOO'. See **Details** for supported methods and abbreviations. |
... |
Additional arguments passed to argmin.HT. |
The supported methods include:
softmin.LOO (SML) |
Leave-one-out algorithm using exponential weighting. |
argmin.LOO (HML) |
A variant of SML that uses hard argmin instead of exponential weighting. Not recommended. |
nonsplit (NS) |
Variant of SML without data splitting. Requires a fixed lambda value. Not recommended. |
Bonferroni (MT) |
Multiple testing using Bonferroni correction. |
Gupta (GTA) |
The method from Gupta SS (1965). “On Some Multiple Decision (Selection and Ranking) Rules.” Technometrics, 7(2), 225–245. doi:10.1080/00401706.1965.10490251. |
A character string: 'Accept' or 'Reject', indicating whether the dimension could be an argmax, and relevant statistics.
Chernozhukov V, Chetverikov D, Kato K (2013). “Testing many moment inequalities.” RePEc. IDEAS Working Paper Series.
Gupta SS (1965). “On Some Multiple Decision (Selection and Ranking) Rules.” Technometrics, 7(2), 225–245. doi:10.1080/00401706.1965.10490251.
Futschik A, Pflug G (1995). “Confidence Sets for Discrete Stochastic Optimization.” Annals of Operations Research, 56(1), 95–108. doi:10.1007/BF02031702.
set.seed(108)
n <- 200
p <- 20
mu <- (1:p)/p
cov <- diag(p)
data <- MASS::mvrnorm(n, mu, cov)

## Define the dimension of interest
r <- 4

## Construct difference matrix for dimension r
difference.matrix.r <- matrix(rep(data[, r], p - 1), ncol = p - 1, byrow = FALSE) - data[, -r]

## softmin.LOO (SML)
argmax.HT(difference.matrix.r)

## use seed
argmax.HT(difference.matrix.r, seed = 19)

## With known true difference
true.mean.diff <- mu[r] - mu[-r]
argmax.HT(difference.matrix.r, true.mean = true.mean.diff)

## Without scaling
argmax.HT(difference.matrix.r, scale.input = FALSE)

## With a user-specified lambda
argmax.HT(difference.matrix.r, lambda = sqrt(n) / 2.5)

## Add a seed for reproducibility
argmax.HT(difference.matrix.r, seed = 17)

## argmin.LOO (HML)
argmax.HT(difference.matrix.r, method = "HML")

## nonsplit method
argmax.HT(difference.matrix.r, method = "NS", lambda = sqrt(n)/2.5)

## Bonferroni method (choose t test for normal data)
argmax.HT(difference.matrix.r, method = "MT", test = "t")

## Gupta method (pass full data matrix)
critical.val <- get.quantile.gupta.selection(p = length(mu))
argmax.HT(data, r, method = "GTA", critical.val = critical.val)
This is a wrapper that performs a hypothesis test of whether a given dimension may be an argmin. Multiple methods are supported.
argmin.HT(data, r = NULL, method = "softmin.LOO", ...)
data |
(1) An n by p data matrix (for GTA); each row is a p-dimensional sample, or (2) an n by (p-1) difference matrix (for SML, HML, NS, MT); each row is a (p-1)-dimensional vector of sample differences. |
r |
The dimension of interest for hypothesis test; defaults to NULL. (Only needed for GTA) |
method |
A string indicating the method for hypothesis test; defaults to 'softmin.LOO'. Passing an abbreviation is allowed. For the list of supported methods and their abbreviations, see Details. |
... |
Additional arguments to argmin.HT.LOO, lambda.adaptive.enlarge, is.lambda.feasible.LOO, argmin.HT.MT, argmin.HT.gupta. A correct argument name needs to be specified if it is used. |
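As a small illustration of the difference-matrix input described above, the following base-R sketch builds the n by (p-1) matrix "dimension r minus the rest" using vector recycling. The toy sizes and variable names are assumptions chosen for illustration.

```r
set.seed(2)
n <- 5
p <- 4
r <- 2
data <- matrix(rnorm(n * p), nrow = n, ncol = p)

# Column r minus each remaining column. data[, r] has length n, so R recycles
# it down every column of the n by (p-1) matrix on the right-hand side.
difference.matrix.r <- data[, r] - data[, -r, drop = FALSE]

# Equivalent explicit construction with a replicated reference column.
explicit <- matrix(rep(data[, r], p - 1), ncol = p - 1, byrow = FALSE) - data[, -r]
```

Both constructions give the same matrix; the recycled form is simply shorter.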
The supported methods include:
softmin.LOO (SML) |
LOO (leave-one-out) algorithm, using exponential weighting. Proposed by Zhang T, Lee H, Lei J (2024). “Winners with confidence: Discrete argmin inference with an application to model selection.” arXiv preprint arXiv:2408.02060. |
argmin.LOO (HML) |
A variant of SML, but it uses (hard) argmin rather than exponential weighting. The method is not recommended because its type 1 error is not controlled. |
nonsplit (NS) |
A variant of SML, but no splitting is involved. One needs to pass a fixed lambda value as a required additional argument. The method is not recommended because its type 1 error is not controlled. |
Bonferroni (MT) |
Multiple testing with Bonferroni's correction. |
Gupta (GTA) |
The method in Gupta SS (1965). “On Some Multiple Decision (Selection and Ranking) Rules.” Technometrics, 7(2), 225–245. doi:10.1080/00401706.1965.10490251. |
'Accept' or 'Reject'. A string indicating whether the given dimension could be an argmin (Accept) or not (Reject), and relevant statistics.
Zhang T, Lee H, Lei J (2024). “Winners with confidence: Discrete argmin inference with an application to model selection.” arXiv preprint arXiv:2408.02060.
Chernozhukov V, Chetverikov D, Kato K (2013). “Testing many moment inequalities.” RePEc. IDEAS Working Paper Series.
Gupta SS (1965). “On Some Multiple Decision (Selection and Ranking) Rules.” Technometrics, 7(2), 225–245. doi:10.1080/00401706.1965.10490251.
Futschik A, Pflug G (1995). “Confidence Sets for Discrete Stochastic Optimization.” Annals of Operations Research, 56(1), 95–108. doi:10.1007/BF02031702.
r <- 4
n <- 200
p <- 20
mu <- (1:p)/p
cov <- diag(length(mu))
set.seed(108)
data <- MASS::mvrnorm(n, mu, cov)
sample.mean <- colMeans(data)

## softmin.LOO
difference.matrix.r <- matrix(rep(data[,r], p-1), ncol=p-1, byrow=FALSE) - data[,-r]
argmin.HT(difference.matrix.r)

## use seed
argmin.HT(difference.matrix.r, seed=19)

# provide centered test statistic (to simulate asymptotic normality)
true.mean.difference.r <- mu[r] - mu[-r]
argmin.HT(difference.matrix.r, true.mean=true.mean.difference.r)

# keep the data unstandardized
argmin.HT(difference.matrix.r, scale.input=FALSE)

# use a user-specified lambda
argmin.HT(difference.matrix.r, lambda=sqrt(n)/2.5)

# add a seed
argmin.HT(difference.matrix.r, seed=19)

## argmin.LOO/hard min
argmin.HT(difference.matrix.r, method='HML')

## nonsplit
argmin.HT(difference.matrix.r, method='NS', lambda=sqrt(n)/2.5)

## Bonferroni (choose t test because of normal data)
argmin.HT(difference.matrix.r, method='MT', test='t')
## z test
argmin.HT(difference.matrix.r, method='MT', test='z')

## Gupta
critical.val <- get.quantile.gupta.selection(p=length(mu))
argmin.HT(data, r, method='GTA', critical.val=critical.val)
Test whether a dimension is the argmin, using the method in (Gupta 1965).
argmin.HT.gupta(
  data,
  r,
  sample.mean = NULL,
  stds = NULL,
  critical.val = NULL,
  alpha = 0.05,
  ...
)
data |
An n by p data matrix; each row is a p-dimensional sample. |
r |
The dimension of interest for hypothesis test. |
sample.mean |
The sample mean of the n samples in data; defaults to NULL. It can be calculated via colMeans(data). If performing multiple tests across dimensions, pre-computing sample.mean avoids redundant computation. |
stds |
A vector of the same (population) standard deviations for all dimensions; defaults to a vector of 1's. These are used to standardize the sample means. |
critical.val |
The quantile for the hypothesis test; defaults to NULL. It can be calculated via get.quantile.gupta.selection. If your experiment involves hypothesis testing over more than one dimension, pass a quantile to speed up computation. |
alpha |
The significance level of the hypothesis test; defaults to 0.05. |
... |
Additional argument to get.quantile.gupta.selection. A correct argument name needs to be specified if it is used. |
A list containing:
test.stat |
The test statistic. |
critical.value |
The critical value for the hypothesis test. Being greater than it leads to a rejection. |
ans |
'Reject' or 'Accept' |
This method requires independence among the dimensions.
Gupta SS (1965). “On Some Multiple Decision (Selection and Ranking) Rules.” Technometrics, 7(2), 225–245. doi:10.1080/00401706.1965.10490251.
Futschik A, Pflug G (1995). “Confidence Sets for Discrete Stochastic Optimization.” Annals of Operations Research, 56(1), 95–108. doi:10.1007/BF02031702.
Test if a dimension may be argmin, using the LOO (leave-one-out) algorithm in Zhang et al 2024.
argmin.HT.LOO(
  difference.matrix,
  sample.mean = NULL,
  min.algor = "softmin",
  lambda = NULL,
  const = 2.5,
  enlarge = TRUE,
  alpha = 0.05,
  true.mean.difference = NULL,
  output.weights = FALSE,
  scale.input = TRUE,
  seed = NULL,
  ...
)
difference.matrix |
An n by (p-1) difference data matrix (reference dimension - the rest); each row is a (p-1)-dimensional vector of differences. |
sample.mean |
The sample mean of differences; defaults to NULL. It can be calculated via colMeans(difference.matrix). |
min.algor |
The algorithm to compute the test statistic by weighting across dimensions; 'softmin' uses exponential weighting, while 'argmin' picks the largest mean coordinate directly. Defaults to 'softmin'. |
lambda |
The real-valued tuning parameter for exponential weightings (the calculation of softmin); defaults to NULL. If lambda=NULL (recommended), the function would determine a lambda value in a data-driven way. |
const |
The scaling constant for the initial data-driven lambda; defaults to 2.5. |
enlarge |
A boolean value indicating if the data-driven lambda should be determined via an iterative enlarging algorithm; defaults to TRUE. |
alpha |
The significance level of the hypothesis test; defaults to 0.05. |
true.mean.difference |
The population mean of the differences. (Optional); used to compute a centered test statistic for simulation or diagnostic purposes. |
output.weights |
A boolean variable specifying whether the exponential weights should be outputted; defaults to FALSE. |
scale.input |
A boolean variable specifying whether the input difference matrix should be standardized; defaults to TRUE. |
seed |
(Optional) If provided, used to seed the random sampling (for reproducibility). |
... |
Additional arguments to lambda.adaptive.enlarge, is.lambda.feasible.LOO. |
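To make the role of lambda in exponential weighting concrete, here is a minimal base-R sketch of softmin weights on a plain numeric vector. The helper name softmin.weights is hypothetical and the sign convention is an assumption; the package applies its weighting to difference matrices and may differ in detail.

```r
# Hypothetical helper (not part of the package API): weights proportional to
# exp(-lambda * x), so smaller entries of x receive larger weights.
softmin.weights <- function(x, lambda) {
  # Subtracting min(x) leaves the weights unchanged but keeps exp() stable.
  w <- exp(-lambda * (x - min(x)))
  w / sum(w)
}

x <- c(0.30, 0.10, 0.25)
w.small <- softmin.weights(x, lambda = 1)    # close to uniform
w.large <- softmin.weights(x, lambda = 200)  # concentrates on the minimum
```

A small lambda spreads the weight across coordinates, while a large lambda approaches the hard argmin used by the 'argmin' option.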
A list containing:
test.stat.scale |
The scaled test statistic |
critical.value |
The critical value for the hypothesis test. Being greater than it leads to a rejection. |
std |
The standard deviation estimate. |
ans |
A character string: either 'Reject' or 'Accept', depending on the test outcome. |
lambda |
The lambda used in the hypothesis testing. |
lambda.capped |
A boolean variable indicating whether the data-driven lambda has reached the upper threshold n^5. |
residual.slepian |
The final approximate first order stability term for the data-driven lambda. |
variance.bound |
The final variance bound for the data-driven lambda. |
test.stat.centered |
(Optional) The centered test statistic, computed only if true.mean.difference is provided. |
exponential.weights |
(Optional) A (n by p-1) matrix storing the exponential weightings in the test statistic. |
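The returned quantities can be related to one another with a simplified leave-one-out sketch. This is an illustration under stated assumptions, not the package implementation: the helper loo.softmin.test is hypothetical, lambda is fixed rather than data-driven, the input is not standardized, and a one-sided normal critical value is assumed.

```r
# Hypothetical helper illustrating a leave-one-out softmin statistic.
loo.softmin.test <- function(difference.matrix, lambda, alpha = 0.05) {
  n <- nrow(difference.matrix)
  col.sums <- colSums(difference.matrix)
  q <- numeric(n)
  for (i in seq_len(n)) {
    # Column means computed without sample i.
    loo.mean <- (col.sums - difference.matrix[i, ]) / (n - 1)
    # Exponential weights favouring the largest leave-one-out mean difference;
    # subtracting the maximum keeps exp() numerically stable.
    w <- exp(lambda * (loo.mean - max(loo.mean)))
    w <- w / sum(w)
    # Weighted combination of the held-out sample's differences.
    q[i] <- sum(w * difference.matrix[i, ])
  }
  test.stat <- sqrt(n) * mean(q) / sd(q)
  critical.value <- qnorm(1 - alpha)
  list(test.stat = test.stat,
       critical.value = critical.value,
       ans = if (test.stat > critical.value) "Reject" else "Accept")
}

set.seed(108)
n <- 200
mu <- (1:5) / 5
data <- sapply(mu, function(m) rnorm(n, mean = m, sd = 1))
lambda <- sqrt(n) / 2.5

# Dimension 1 has the smallest mean; dimension 5 has the largest.
res.best <- loo.softmin.test(data[, 1] - data[, -1], lambda)
res.worst <- loo.softmin.test(data[, 5] - data[, -5], lambda)
```

In this toy setting the true argmin is accepted and the worst dimension is rejected, matching the rule that a statistic above the critical value leads to rejection.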
Test if a dimension may be argmin, using multiple testing with Bonferroni's correction.
argmin.HT.MT(difference.matrix, sample.mean = NULL, test = "z", alpha = 0.05)
difference.matrix |
An n by (p-1) difference data matrix (reference dimension - the rest); each row is a (p-1)-dimensional vector of differences. |
sample.mean |
The sample mean of differences; defaults to NULL. It can be calculated via colMeans(difference.matrix). |
test |
The test to perform: 't' or 'z'; defaults to 'z'. If the data are assumed normally distributed, use 't'; otherwise 'z'. |
alpha |
The significance level of the hypothesis test; defaults to 0.05. |
A list containing:
p.val |
p value without Bonferroni's correction. |
critical.value |
The critical value for the hypothesis test. Being less than it leads to a rejection. |
ans |
'Reject' or 'Accept' |
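The relationship between the uncorrected p value and the Bonferroni critical value can be sketched in base R. The helper below is hypothetical and assumes one-sided t tests per column with the smallest p value compared against alpha divided by the number of comparisons; the package's exact computation may differ.

```r
# Hypothetical helper: Bonferroni-corrected one-sided t tests on the columns
# of a difference matrix (reference dimension minus the rest).
bonferroni.argmin.test <- function(difference.matrix, alpha = 0.05) {
  n <- nrow(difference.matrix)
  m <- ncol(difference.matrix)
  # H0 for each column: mean difference <= 0 (the reference is no larger).
  t.stat <- sqrt(n) * colMeans(difference.matrix) / apply(difference.matrix, 2, sd)
  p.vals <- pt(t.stat, df = n - 1, lower.tail = FALSE)
  p.val <- min(p.vals)           # smallest uncorrected p value
  critical.value <- alpha / m    # Bonferroni threshold
  list(p.val = p.val,
       critical.value = critical.value,
       ans = if (p.val < critical.value) "Reject" else "Accept")
}

set.seed(108)
n <- 200
mu <- (1:5) / 5
data <- sapply(mu, function(m) rnorm(n, mean = m, sd = 1))

res.best <- bonferroni.argmin.test(data[, 1] - data[, -1])
res.worst <- bonferroni.argmin.test(data[, 5] - data[, -5])
```

Here a p value below the critical value leads to rejection, which is the opposite direction from the test-statistic-based methods.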
Test if a dimension may be argmin without any splitting.
argmin.HT.nonsplit(
  difference.matrix,
  lambda,
  sample.mean = NULL,
  alpha = 0.05,
  scale.input = TRUE
)
difference.matrix |
An n by (p-1) difference data matrix (reference dimension - the rest); each row is a (p-1)-dimensional vector of differences. |
lambda |
The real-valued tuning parameter for exponential weightings (the calculation of softmin). |
sample.mean |
The sample mean of differences; defaults to NULL. It can be calculated via colMeans(difference.matrix). |
alpha |
The significance level of the hypothesis test; defaults to 0.05. |
scale.input |
A boolean variable specifying whether the input difference matrix should be standardized; defaults to TRUE. |
This method is not recommended, given its poor performance when p is small.
A list containing:
test.stat.scale |
The scaled test statistic. |
critical.value |
The critical value for the hypothesis test. Being greater than it leads to a rejection. |
std |
The standard deviation estimate. |
ans |
'Reject' or 'Accept' |
This is a wrapper to construct a confidence set for the argmax by negating the input and reusing CS.argmin.
CS.argmax(data, method = "softmin.LOO", alpha = 0.05, ...)
data |
An n by p data matrix; each row is a p-dimensional sample. |
method |
A string indicating the method to use; defaults to 'softmin.LOO'. Can be abbreviated (e.g., 'SML' for 'softmin.LOO'). See Details for full list. |
alpha |
Significance level. The function returns a (1 - alpha) confidence set. |
... |
Additional arguments passed to corresponding testing functions. |
The supported methods include:
softmin.LOO (SML) |
Leave-one-out algorithm using exponential weighting. |
argmin.LOO (HML) |
Variant of SML that uses hard argmin instead of soft weighting. Not recommended. |
nonsplit (NS) |
Variant of SML without data splitting. Requires a fixed lambda value. Not recommended. |
Bonferroni (MT) |
Multiple testing using Bonferroni correction. |
Gupta (GTA) |
The method of Gupta SS (1965). “On Some Multiple Decision (Selection and Ranking) Rules.” Technometrics, 7(2), 225–245. doi:10.1080/00401706.1965.10490251. |
Futschik (FCHK) |
A two-step method from Futschik A, Pflug G (1995). “Confidence Sets for Discrete Stochastic Optimization.” Annals of Operations Research, 56(1), 95–108. doi:10.1007/BF02031702. |
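For the two-step FCHK method, the package examples choose alpha.2 from alpha.1 so that the two steps jointly give the target level. The arithmetic below is a small sketch of that relationship; reading it as (1 - alpha.1)(1 - alpha.2) = 1 - alpha is an inference from the example formula, not a statement about the package defaults.

```r
alpha <- 0.05      # overall significance level
alpha.1 <- 0.001   # level spent on the first step (user choice)

# Solve (1 - alpha.1) * (1 - alpha.2) = 1 - alpha for alpha.2.
alpha.2 <- 1 - (1 - alpha) / (1 - alpha.1)

# The product of the two step-wise coverages recovers the target coverage.
joint.level <- (1 - alpha.1) * (1 - alpha.2)
```

A smaller alpha.1 leaves alpha.2 closer to the overall alpha.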
A vector of indices (1-based) representing the confidence set for the argmax.
Zhang T, Lee H, Lei J (2024). “Winners with confidence: Discrete argmin inference with an application to model selection.” arXiv preprint arXiv:2408.02060.
Gupta SS (1965). “On Some Multiple Decision (Selection and Ranking) Rules.” Technometrics, 7(2), 225–245. doi:10.1080/00401706.1965.10490251.
Futschik A, Pflug G (1995). “Confidence Sets for Discrete Stochastic Optimization.” Annals of Operations Research, 56(1), 95–108. doi:10.1007/BF02031702.
Chernozhukov V, Chetverikov D, Kato K (2013). “Testing many moment inequalities.” RePEc. IDEAS Working Paper Series.
set.seed(108)
n <- 200
p <- 20
mu <- (1:p)/p
cov <- diag(p)
data <- MASS::mvrnorm(n, mu, cov)

## softmin.LOO (SML)
CS.argmax(data)

## argmin.LOO (HML)
CS.argmax(data, method = "HML")

## nonsplit (NS) - requires lambda
CS.argmax(data, method = "NS", lambda = sqrt(n)/2.5)

## Bonferroni (MT) with t test
CS.argmax(data, method = "MT", test = "t")

## Gupta (GTA)
CS.argmax(data, method = "GTA")

## Futschik (FCHK) with default alpha.1 and alpha.2
CS.argmax(data, method = "FCHK")

## Futschik (FCHK) with user-specified alpha.1 and alpha.2
alpha.1 <- 0.001
alpha.2 <- 1 - (0.95 / (1 - alpha.1))
CS.argmax(data, method = "FCHK", alpha.1 = alpha.1, alpha.2 = alpha.2)
This is a wrapper to construct a discrete confidence set for argmin. Multiple methods are supported.
CS.argmin(data, method = "softmin.LOO", alpha = 0.05, ...)
data |
A n by p data matrix; each row is a p-dimensional sample. |
method |
A string indicating the method used to construct the confidence set. Defaults to 'softmin.LOO'. Can be abbreviated (e.g., 'SML' for 'softmin.LOO'). See **Details** for available methods and abbreviations. |
alpha |
The significance level; defaults to 0.05. The function produces a (1 - alpha) confidence set. |
... |
Additional arguments to argmin.HT.LOO, lambda.adaptive.enlarge, is.lambda.feasible.LOO, argmin.HT.MT, argmin.HT.gupta. A correct argument name needs to be specified if it is used. |
The supported methods include:
softmin.LOO (SML) |
Leave-one-out algorithm using exponential weighting. Proposed by Zhang T, Lee H, Lei J (2024). “Winners with confidence: Discrete argmin inference with an application to model selection.” arXiv preprint arXiv:2408.02060. |
argmin.LOO (HML) |
A variant of SML that uses hard argmin instead of exponential weighting. Not recommended. |
nonsplit (NS) |
A variant of SML without data splitting. Requires a fixed lambda value as an additional argument. Not recommended. |
Bonferroni (MT) |
Multiple testing using Bonferroni correction. |
Gupta (GTA) |
The method proposed by Gupta (1965). Requires independence and the same population standard deviation for all dimensions. |
Futschik (FCHK) |
A two-step method from Futschik and Pflug (1995). Requires independence and the same population standard deviation for all dimensions. |
A vector of indices (1-based) representing the (1 - alpha) confidence set.
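A confidence set of this kind can be viewed as the set of dimensions that a per-dimension test does not reject. The sketch below illustrates that inversion idea in base R with a simple Bonferroni z test standing in for the per-dimension test; the helper name confidence.set is hypothetical and this is not the package's implementation.

```r
# Hypothetical helper: keep every dimension r whose "r is the argmin"
# hypothesis is not rejected by a Bonferroni-corrected one-sided z test.
confidence.set <- function(data, alpha = 0.05) {
  n <- nrow(data)
  p <- ncol(data)
  keep <- logical(p)
  for (r in seq_len(p)) {
    # Differences: dimension r minus each of the others.
    diff.r <- data[, r] - data[, -r, drop = FALSE]
    z <- sqrt(n) * colMeans(diff.r) / apply(diff.r, 2, sd)
    p.val <- min(pnorm(z, lower.tail = FALSE))
    keep[r] <- p.val >= alpha / (p - 1)
  }
  which(keep)
}

set.seed(108)
n <- 400
mu <- c(0.00, 0.05, 1.00, 1.20)
data <- sapply(mu, function(m) rnorm(n, mean = m, sd = 1))
cs <- confidence.set(data)
```

In this sketch the sample argmin is always retained, because all of its mean differences are non-positive and its smallest p value is therefore at least 0.5.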
Zhang T, Lee H, Lei J (2024). “Winners with confidence: Discrete argmin inference with an application to model selection.” arXiv preprint arXiv:2408.02060.
Chernozhukov V, Chetverikov D, Kato K (2013). “Testing many moment inequalities.” RePEc. IDEAS Working Paper Series.
Gupta SS (1965). “On Some Multiple Decision (Selection and Ranking) Rules.” Technometrics, 7(2), 225–245. doi:10.1080/00401706.1965.10490251.
Futschik A, Pflug G (1995). “Confidence Sets for Discrete Stochastic Optimization.” Annals of Operations Research, 56(1), 95–108. doi:10.1007/BF02031702.
r <- 4
n <- 200
mu <- (1:20)/20
cov <- diag(length(mu))
set.seed(108)
data <- MASS::mvrnorm(n, mu, cov)
sample.mean <- colMeans(data)

## softmin.LOO
CS.argmin(data)

## use seed
CS.argmin(data, seed=13)

## argmin.LOO
CS.argmin(data, method='HML')

## nonsplit
CS.argmin(data, method='NS', lambda=sqrt(n)/2.5)

## Bonferroni (choose t test because of normal data)
CS.argmin(data, method='MT', test='t')

## Gupta
CS.argmin(data, method='GTA')

## Futschik two-step method
# default alpha.1, alpha.2
CS.argmin(data, method='FCHK')

alpha.1 <- 0.0005
alpha.2 <- 1 - (0.95/(1 - alpha.1))
CS.argmin(data, method='FCHK', alpha.1=0.0005, alpha.2=alpha.2)
Get the index of the smallest entry of a vector, excluding a given index.
find.sub.argmin(nums, idx, seed = NULL)
nums |
A vector of numbers |
idx |
An index to be excluded |
seed |
(Optional) If provided, used to seed the random sampling (for reproducibility). |
The index of the second smallest dimension (as an integer).
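The behaviour described above can be sketched in a few lines of base R. The helper sub.argmin is hypothetical, and random tie-breaking is an assumption suggested by the seed argument rather than something the source states explicitly.

```r
# Hypothetical helper: index of the smallest entry of nums, ignoring idx.
sub.argmin <- function(nums, idx, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)
  candidates <- setdiff(seq_along(nums), idx)
  ties <- candidates[nums[candidates] == min(nums[candidates])]
  # sample.int() avoids the sample(x, 1) pitfall when a single index remains.
  ties[sample.int(length(ties), 1)]
}

a <- sub.argmin(c(1, 3, 2), 1)
b <- sub.argmin(c(1, 1, 2), 1)
tie <- sub.argmin(c(2, 1, 1), 1, seed = 1)
```

The returned value is an index into the original vector, so the excluded position is skipped without shifting the remaining indices.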
nums <- c(1,3,2)
find.sub.argmin(nums,1) ## return 3

nums <- c(1,1,2)
find.sub.argmin(nums,1) ## return 2
Generate the quantile used for the selection procedure in (Gupta 1965) by Monte Carlo estimation.
get.quantile.gupta.selection(p, alpha = 0.05, N = 1e+05)
p |
The number of dimensions in your data matrix. |
alpha |
The level of the upper quantile; defaults to 0.05 (the 95th percentile). |
N |
The number of Monte Carlo repetitions; defaults to 100000. |
A list containing:
critica.val |
The 1 - alpha upper quantile. |
The quantile is pre-calculated for some common configurations of (p, alpha).
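A Monte Carlo estimate of a selection quantile can be sketched as follows. The statistic used here, the gap between one standard normal coordinate and the smallest of the remaining p - 1 independent ones, is an assumption based on the usual equal-variance subset-selection setup; the helper gupta.quantile.mc is hypothetical and may not match the package's exact definition.

```r
# Hypothetical helper: Monte Carlo estimate of the (1 - alpha) quantile of
# Z_1 - min(Z_2, ..., Z_p) for independent standard normal Z_1, ..., Z_p.
gupta.quantile.mc <- function(p, alpha = 0.05, N = 1e5) {
  z <- matrix(rnorm(N * p), nrow = N, ncol = p)
  # Smallest of the other p - 1 coordinates in each simulated draw.
  rest.min <- apply(z[, -1, drop = FALSE], 1, min)
  gap <- z[, 1] - rest.min
  unname(quantile(gap, probs = 1 - alpha))
}

set.seed(7)
q2 <- gupta.quantile.mc(p = 2)
q10 <- gupta.quantile.mc(p = 10, N = 2e4)
```

For p = 2 the gap is normal with variance 2, so the estimate can be compared against qnorm(0.95) * sqrt(2); larger p yields a larger quantile because more competitors are involved.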
Gupta SS (1965). “On Some Multiple Decision (Selection and Ranking) Rules.” Technometrics, 7(2), 225–245. doi:10.1080/00401706.1965.10490251.
Futschik A, Pflug G (1995). “Confidence Sets for Discrete Stochastic Optimization.” Annals of Operations Research, 56(1), 95–108. doi:10.1007/BF02031702.
get.quantile.gupta.selection(p=10)
get.quantile.gupta.selection(p=100)
Check the feasibility of a tuning parameter lambda for the LOO algorithm by examining whether the resulting first order stability term is less than a threshold value, i.e., whether the first order stability is likely achieved.
For further details, see Zhang et al. (2024).
is.lambda.feasible.LOO(
  lambda,
  scaled.difference.matrix,
  sample.mean = NULL,
  threshold = 0.08,
  n.pairs = 100,
  seed = NULL
)
lambda |
The real-valued tuning parameter for exponential weightings (the calculation of softmin). |
scaled.difference.matrix |
An n by (p-1) difference matrix after column-wise scaling (reference dimension - the rest); each row is a (p-1)-dimensional vector of differences. |
sample.mean |
The sample mean of the n samples in scaled.difference.matrix; defaults to NULL. It can be calculated via colMeans(scaled.difference.matrix). If your experiment involves hypothesis testing over more than one dimension, pass sample.mean=colMeans(scaled.difference.matrix) to speed up computation. |
threshold |
A threshold value to examine if the first order stability is likely achieved; defaults to 0.08. As its value gets smaller, the first order stability tends to increase while power might decrease. |
n.pairs |
The number of pairs used in the feasibility check; defaults to 100. |
seed |
(Optional) An integer-valued seed for subsampling. |
A boolean value indicating whether the given lambda is likely to yield the first order stability.
Iteratively enlarge a tuning parameter lambda in a data-driven way to enhance the power of hypothesis testing.
The iterative algorithm ends when an enlarged lambda is unlikely to yield the first order stability.
lambda.adaptive.enlarge(
  lambda,
  scaled.difference.matrix,
  sample.mean = NULL,
  mult.factor = 2,
  verbose = FALSE,
  seed = NULL,
  ...
)
lambda |
The real-valued tuning parameter for exponential weightings (the calculation of softmin). |
scaled.difference.matrix |
An n by (p-1) difference matrix after column-wise scaling (reference dimension - the rest); each row is a (p-1)-dimensional vector of differences. |
sample.mean |
The sample mean of the n samples in scaled.difference.matrix; defaults to NULL. It can be calculated via colMeans(scaled.difference.matrix). If your experiment involves hypothesis testing over more than one dimension, pass sample.mean=colMeans(scaled.difference.matrix) to speed up computation. |
mult.factor |
In each iteration, lambda is multiplied by mult.factor to yield an enlarged lambda; defaults to 2. |
verbose |
A boolean value indicating if the number of iterations should be printed to console; defaults to FALSE. |
seed |
(Optional) If provided, used to seed the tie-breaking (for reproducibility). |
... |
Additional arguments to is.lambda.feasible.LOO. |
A list containing:
lambda |
The final (enlarged) lambda that is still feasible. |
capped |
Logical, TRUE if the enlargement was capped due to reaching the threshold. |
residual.slepian |
Residual value from the feasibility check at the final lambda. |
variance.bound |
Variance bound used in the final feasibility check. |
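The enlargement loop can be sketched generically in base R. The helper enlarge.lambda and the toy feasibility predicate are hypothetical stand-ins for the package's feasibility check, and the cap argument is an assumed placeholder for the upper threshold on lambda.

```r
# Hypothetical helper: keep multiplying lambda while the enlarged value
# remains feasible, and return the last feasible value.
enlarge.lambda <- function(lambda, is.feasible, mult.factor = 2, cap = Inf) {
  capped <- FALSE
  repeat {
    candidate <- lambda * mult.factor
    if (candidate > cap) {
      # Stop once the next enlargement would exceed the cap.
      capped <- TRUE
      break
    }
    if (!is.feasible(candidate)) break
    lambda <- candidate
  }
  list(lambda = lambda, capped = capped)
}

# Toy predicate standing in for a data-driven feasibility check.
toy.feasible <- function(lambda) lambda <= 10

res <- enlarge.lambda(1, toy.feasible)
res.capped <- enlarge.lambda(1, function(lambda) TRUE, cap = 100)
```

With the toy predicate the loop visits 2, 4, 8 and stops before 16, so the last feasible value is returned; with an always-feasible predicate the cap ends the loop instead.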
# Simulate data
set.seed(123)
r <- 4
n <- 200
mu <- (1:20)/20
cov <- diag(length(mu))
set.seed(108)
data <- MASS::mvrnorm(n, mu, cov)
sample.mean <- colMeans(data)
diff.mat <- get.difference.matrix(data, r)
sample.mean.r <- get.sample.mean.r(sample.mean, r)
lambda <- lambda.adaptive.LOO(diff.mat, sample.mean=sample.mean.r)

# Run the enlargement algorithm
res <- lambda.adaptive.enlarge(lambda, diff.mat, sample.mean=sample.mean.r)
res

# with a seed
res <- lambda.adaptive.enlarge(lambda, diff.mat, sample.mean=sample.mean.r, seed=3)
res
Generate a data-driven lambda for the LOO algorithm, motivated by the derivation of the first order stability.
For its precise definition, see Zhang et al. (2024).
lambda.adaptive.LOO(
  scaled.difference.matrix,
  sample.mean = NULL,
  const = 2.5,
  seed = NULL
)
scaled.difference.matrix |
An n by (p-1) difference matrix after column-wise scaling (reference dimension - the rest); each row is a (p-1)-dimensional vector of differences. |
sample.mean |
The sample mean of the n samples in scaled.difference.matrix; defaults to NULL. It can be calculated via colMeans(scaled.difference.matrix). |
const |
A scaling constant for the data-driven lambda; defaults to 2.5. |
seed |
(Optional) If provided, used to seed the tie-breaking (for reproducibility). |
A data-driven lambda for the LOO algorithm.
# Simulate data
set.seed(123)
r <- 4
n <- 200
mu <- (1:20)/20
cov <- diag(length(mu))
set.seed(108)
data <- MASS::mvrnorm(n, mu, cov)
sample.mean <- colMeans(data)
diff.mat <- get.difference.matrix(data, r)
sample.mean.r <- get.sample.mean.r(sample.mean, r)
lambda <- lambda.adaptive.LOO(diff.mat, sample.mean=sample.mean.r)