Title: | Global Sensitivity Analysis of Model Outputs and Importance Measures |
---|---|
Description: | A collection of functions for sensitivity analysis of model outputs (factor screening, global sensitivity analysis and robustness analysis), for variable importance measures of data, as well as for interpretability of machine learning models. Most of the functions have to be applied on scalar output, but several functions support multi-dimensional outputs. |
Authors: | Bertrand Iooss [aut, cre], Sebastien Da Veiga [aut], Alexandre Janon [aut], Gilles Pujol [aut] |
Maintainer: | Bertrand Iooss <[email protected]> |
License: | GPL-2 |
Version: | 1.30.1 |
Built: | 2024-12-27 02:42:58 UTC |
Source: | https://github.com/cran/sensitivity |
Methods and functions for global sensitivity analysis of model outputs, importance measures and machine learning model interpretability
The sensitivity package implements some global sensitivity analysis methods and importance measures:
Linear regression importance measures in regression or classification (logistic regression) contexts (Iooss et al., 2022; Clouvel et al., 2024):
Bettonvil's sequential bifurcations (Bettonvil and Kleijnen, 1996) (sb);
Morris's "OAT" elementary effects screening method (morris);
Derivative-based Global Sensitivity Measures:
Poincare constants for Derivative-based Global Sensitivity Measures (DGSM) (Lamboni et al., 2013; Roustant et al., 2017) (PoincareConstant) and (PoincareOptimal),
Squared coefficients computation in generalized chaos via Poincare differential operators (Roustant et al., 2019) (PoincareChaosSqCoef),
Distributed Evaluation of Local Sensitivity Analysis (DELSA) (Rakovec et al., 2014) (delsa);
Variance-based sensitivity indices (Sobol' indices) for independent inputs:
Estimation of the Sobol' first order indices with B-spline Smoothing (Ratto and Pagano, 2010) (sobolSmthSpl),
Monte Carlo estimation of Sobol' indices with independent inputs (also called pick-freeze method):
Sobol' scheme (Sobol, 1993) to compute the indices given by the variance decomposition up to a specified order (sobol),
Saltelli's scheme (Saltelli, 2002) to compute first order, second order and total indices (sobolSalt),
Saltelli's scheme (Saltelli, 2002) to compute first order and total indices (sobol2002),
Mauntz-Kucherenko's scheme (Sobol et al., 2007) to compute first order and total indices using improved formulas for small indices (sobol2007),
Jansen-Sobol's scheme (Jansen, 1999) to compute first order and total indices using improved formulas (soboljansen),
Martinez's scheme using correlation coefficient-based formulas (Martinez, 2011; Touati, 2016) to compute first order and total indices, associated with theoretical confidence intervals (sobolmartinez and soboltouati),
Janon-Monod's scheme (Monod et al., 2006; Janon et al., 2013) to compute first order indices with optimal asymptotic variance (sobolEff),
Mara's scheme (Mara and Joseph, 2008) to compute first order indices with a cost independent of the dimension, via permutations on a single matrix (sobolmara),
Mighty estimator of first-order sensitivity indices based on rank statistics (correlation coefficient of Chatterjee, 2019; Gamboa et al., 2020) (sobolrank),
Owen's scheme (Owen, 2013) to compute first order and total indices using improved formulas (via 3 input independent matrices) for small indices (sobolowen),
Total Interaction Indices using Liu-Owen's scheme (Liu and Owen, 2006) (sobolTIIlo) and pick-freeze scheme (Fruth et al., 2014) (sobolTIIpf),
Replication-based procedures:
Estimation of the Sobol' first order and closed second order indices using replicated orthogonal array-based Latin hypercube sample (Tissot and Prieur, 2015) (sobolroalhs),
Recursive estimation of the Sobol' first order and closed second order indices using replicated orthogonal array-based Latin hypercube sample (Gilquin et al., 2016) (sobolrec),
Estimation of the Sobol' first order, second order and total indices using the generalized method with replicated orthogonal array-based Latin hypercube sample (Tissot and Prieur, 2015) (sobolrep),
Sobol' indices estimation under inequality constraints (Gilquin et al., 2015) by extension of the replication procedure (Tissot and Prieur, 2015) (sobolroauc),
Estimation of the Sobol' first order and total indices with Saltelli's so-called "extended-FAST" method (Saltelli et al., 1999) (fast99),
Estimation of the Sobol' first order and total indices with kriging-based global sensitivity analysis (Le Gratiet et al., 2014) (sobolGP);
Variance-based sensitivity indices valid for dependent inputs:
Exact computation of Shapley effects in the linear Gaussian framework (Broto et al., 2019) (shapleyLinearGaussian),
Computation of Shapley effects in the Gaussian linear framework with an unknown block-diagonal covariance matrix (Broto et al., 2020) (shapleyBlockEstimation),
Johnson-Shapley indices (Iooss and Clouvel, 2024) (johnsonshap),
Estimation of Shapley effects by examining all permutations of inputs (Song et al., 2016) (shapleyPermEx),
Estimation of Shapley effects by randomly sampling permutations of inputs (Song et al., 2016) (shapleyPermRand),
Estimation of Shapley effects from data using nearest neighbors method (Broto et al., 2018) (shapleySubsetMc),
Estimation of Shapley effects and all Sobol indices from data using nearest neighbors (Broto et al., 2018) (using a fast approximate algorithm) or ranking (Gamboa et al., 2020) (shapleysobol_knn) and (sobolshap_knn),
Estimation of Shapley effects from data using nearest neighbors method (Broto et al., 2018) with optimized/parallelized computations and bootstrap confidence interval estimation (shapleysobol_knn),
Estimation of Proportional Marginal Effects (PME) (Herin et al., 2024) (pme_knn);
Support index functions (support) of Fruth et al. (2016);
Sensitivity Indices based on Csiszar f-divergence (sensiFdiv) (particular cases: Borgonovo's indices and mutual-information based indices) and Hilbert-Schmidt Independence Criterion (sensiHSIC and testHSIC) (Da Veiga, 2015; De Lozzo and Marrel, 2016; Meynaoui et al., 2019);
Non-parametric variable significance test based on the empirical process (EPtest) of Klein and Rochet (2022);
First-order quantile-oriented sensitivity indices as defined in Fort et al. (2016) via a kernel-based estimator (Maume-Deschamps and Niang, 2018) (qosa);
Target Sensitivity Analysis via Hilbert-Schmidt Independence Criterion (sensiHSIC) (Spagnol et al., 2019);
Robustness analysis by the Perturbed-Law based Indices (PLI) of Lemaitre et al. (2015), (PLIquantile) of Sueur et al. (2017), (PLIsuperquantile) of Iooss et al. (2021), and extension as (PLIquantile_multivar) and (PLIsuperquantile_multivar);
Extensions to multidimensional outputs for:
Sobol' indices (sobolMultOut): Aggregated Sobol' indices (Lamboni et al., 2011; Gamboa et al., 2014) and functional (1D) Sobol' indices,
Shapley effects and Sobol' indices (shapleysobol_knn) and (sobolshap_knn): Functional (1D) indices,
HSIC indices (sensiHSIC) (Da Veiga, 2015): Aggregated HSIC, potentially via a PCA step (Da Veiga, 2015),
Morris method (morrisMultOut).
Moreover, some utilities are provided: standard test-cases (testmodels), weight transformation function of the output sample (weightTSA) to perform Target Sensitivity Analysis, normal and Gumbel truncated distributions (truncateddistrib), squared integral estimate (squaredIntEstim), Addelman and Kempthorne construction of orthogonal arrays of strength two (addelman_const), discrepancy criteria (discrepancyCriteria_cplus), maximin criteria (maximin_cplus) and template file generation (template.replace).
The sensitivity package has been designed to work with models written in R as well as with external models such as heavy computational codes. This is achieved with the input argument model, present in all functions of this package.
The argument model is expected to be either a function or a predictor (i.e. an object with a predict function, such as lm).
If model = m where m is a function, it will be invoked once by y <- m(X).
If model = m where m is a predictor, it will be invoked once by y <- predict(m, X).
X is the design of experiments, i.e. a data.frame with p columns (the input factors) and n rows (each one an experiment), and y is the vector of length n of the model responses.
The model is invoked once for the whole design of experiments.
The argument model can be left to NULL. This is referred to as the decoupled approach and is used with external computational codes that rarely run on the statistician's computer. See decoupling.
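The two calling conventions can be illustrated with a minimal base R sketch. The helper name evaluate_model is hypothetical (it is not a function of the package); it only applies the rule stated above: a function is called once on the whole design, and any other object is passed once to predict.

```r
# Hypothetical helper (not part of the package) applying the dispatch rule
# described above: function -> m(X), predictor -> predict(m, X).
evaluate_model <- function(model, X) {
  if (is.function(model)) {
    model(X)            # one call on the whole design of experiments
  } else {
    predict(model, X)   # one call to predict() on the whole design
  }
}

set.seed(1)
# Design of experiments: n = 50 rows (experiments), p = 2 columns (factors)
X <- data.frame(X1 = runif(50), X2 = runif(50))

# Case 1: the model is an R function of the design
f <- function(X) X$X1 + 2 * X$X2
y_fun <- evaluate_model(f, X)

# Case 2: the model is a predictor, here a linear model fitted with lm()
fit <- lm(y ~ X1 + X2, data = cbind(X, y = y_fun))
y_pred <- as.numeric(evaluate_model(fit, X))
```

In both cases the result is a vector of length n, one response per row of X, which is the shape the text above describes for y.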
Bertrand Iooss, Sebastien Da Veiga, Alexandre Janon and Gilles Pujol with contributions from Paul Lemaitre for PLI, Thibault Delage and Roman Sueur for PLIquantile, Vanessa Verges for PLIquantile, PLIsuperquantile, PLIquantile_multivar and PLIsuperquantile_multivar, Laurent Gilquin for sobolroalhs, sobolroauc, sobolSalt, sobolrep, sobolrec, as well as addelman_const, discrepancyCriteria_cplus and maximin_cplus, Loic le Gratiet for sobolGP, Khalid Boumhaout, Taieb Touati and Bernardo Ramos for sobolowen and soboltouati, Jana Fruth for PoincareConstant, sobolTIIlo and sobolTIIpf, Gabriel Sarazin, Amandine Marrel, Anouar Meynaoui and Reda El Amri for their contributions to sensiHSIC and testHSIC, Joseph Guillaume and Oldrich Rakovec for delsa and parameterSets, Olivier Roustant for PoincareOptimal, PoincareChaosSqCoef, squaredIntEstim and support, Eunhye Song, Barry L. Nelson and Jeremy Staum for shapleyPermEx and shapleyPermRand, Baptiste Broto for shapleySubsetMc, shapleyLinearGaussian and shapleyBlockEstimation, Filippo Monari for (sobolSmthSpl) and (morrisMultOut), Marouane Il Idrissi for lmg, pmvd and shapleysobol_knn, associated to Margot Herin for pme_knn, Laura Clouvel for johnson, Paul Rochet for EPtest, Frank Weber and Roelof Oomen for other contributions.
(maintainer: Bertrand Iooss [email protected])
S. Da Veiga, F. Gamboa, B. Iooss and C. Prieur, Basics and trends in sensitivity analysis, Theory and practice in R, SIAM, 2021.
R. Faivre, B. Iooss, S. Mahevas, D. Makowski, H. Monod, editors, 2013, Analyse de sensibilite et exploration de modeles. Applications aux modeles environnementaux, Editions Quae.
L. Clouvel, B. Iooss, V. Chabridon, M. Il Idrissi and F. Robin, 2024, An overview of variance-based importance measures in the linear regression context: comparative analyses and numerical tests, Preprint. https://hal.science/hal-04102053
B. Iooss, V. Chabridon and V. Thouvenot, Variance-based importance measures for machine learning model interpretability, Congres lambda-mu23, Saclay, France, 10-13 octobre 2022. https://hal.science/hal-03741384
B. Iooss, R. Kennet and P. Secchi, 2022, Different views of interpretability, In: Interpretability for Industry 4.0: Statistical and Machine Learning Approaches, A. Lepore, B. Palumbo and J-M. Poggi (Eds), Springer.
B. Iooss and A. Saltelli, 2017, Introduction: Sensitivity analysis. In: Springer Handbook on Uncertainty Quantification, R. Ghanem, D. Higdon and H. Owhadi (Eds), Springer.
A. Saltelli, K. Chan and E. M. Scott eds, 2000, Sensitivity Analysis, Wiley.
addelman_const implements the Addelman and Kempthorne construction of orthogonal arrays of strength two.
addelman_const(dimension, levels, choice="U")
dimension |
The number of columns of the orthogonal array. |
levels |
The number of levels of the orthogonal array. Either a prime number or a prime power number. |
choice |
A character from the list ("U","V","W","X") specifying which orthogonal array to construct (see "Details"). |
The method of Addelman and Kempthorne makes it possible to construct up to four orthogonal arrays. choice specifies which orthogonal array is to be constructed. Note that the four orthogonal arrays depend on each other through linear equations.
A matrix corresponding to the orthogonal array constructed.
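As an illustration of what "strength two" means for such a matrix, the base R sketch below checks that, for every pair of columns, every pair of levels appears the same number of times. The function name is_strength_two and the small hand-written array are hypothetical and only serve the illustration; they are not part of the package.

```r
# Hypothetical checker (not part of the package): an array has strength two
# when, for each pair of columns, all pairs of levels occur equally often.
is_strength_two <- function(OA) {
  OA <- as.matrix(OA)
  lev <- sort(unique(as.vector(OA)))   # common set of levels
  p <- ncol(OA)
  for (i in seq_len(p - 1)) {
    for (j in (i + 1):p) {
      counts <- table(factor(OA[, i], levels = lev),
                      factor(OA[, j], levels = lev))
      if (length(unique(as.vector(counts))) != 1) return(FALSE)
    }
  }
  TRUE
}

# A small two-level orthogonal array with 4 runs and 3 columns
# (third column = sum of the first two, modulo 2)
oa_small <- matrix(c(0, 0, 0,
                     0, 1, 1,
                     1, 0, 1,
                     1, 1, 0), ncol = 3, byrow = TRUE)

# Counter-example: the third column duplicates the first one
not_oa <- oa_small
not_oa[, 3] <- not_oa[, 1]

check_oa     <- is_strength_two(oa_small)
check_not_oa <- is_strength_two(not_oa)
```

Under the assumption that the returned matrix stores one level code per cell, the same check could be applied to the output of addelman_const.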
Laurent Gilquin
A.S. Hedayat, N.J.A. Sloane and J. Stufken, 1999, Orthogonal Arrays: Theory and Applications, Springer Series in Statistics.
dimension <- 6
levels <- 7
OA <- addelman_const(dimension,levels,choice="U")
correlRatio computes the correlation ratio between a quantitative variable and a qualitative variable.
correlRatio(X, y)
X |
a vector containing the quantitative variable. |
y |
a vector containing the qualitative variable (e.g. a factor). |
The value of the correlation ratio
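For intuition, the textbook squared correlation ratio (between-group sum of squares divided by total sum of squares) can be computed by hand in base R. This is a sketch under stated assumptions: the function name correlation_ratio_sq is hypothetical, and this page does not say whether correlRatio returns this squared quantity or its square root.

```r
# Hypothetical helper (not part of the package): squared correlation ratio
# eta^2 = between-group sum of squares / total sum of squares.
correlation_ratio_sq <- function(x, g) {
  g <- as.factor(g)
  group_means <- tapply(x, g, mean)     # mean of x within each category
  group_sizes <- tapply(x, g, length)   # number of observations per category
  between <- sum(group_sizes * (group_means - mean(x))^2)
  total   <- sum((x - mean(x))^2)
  between / total
}

set.seed(42)
x <- runif(100)   # quantitative variable
g <- round(x)     # qualitative variable with two categories (0 and 1)
eta2 <- correlation_ratio_sq(x, g)

# The same quantity is the R^2 of a one-way ANOVA of x on g
r2_anova <- summary(lm(x ~ factor(g)))$r.squared
```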
Bertrand Iooss
L. Clouvel, B. Iooss, V. Chabridon, M. Il Idrissi and F. Robin, 2024, An overview of variance-based importance measures in the linear regression context: comparative analyses and numerical tests, Preprint. https://hal.science/hal-04102053
x <- runif(100)
y <- round(x)
correlRatio(x,y)
tell and ask are S3 generic methods for decoupling simulations and sensitivity measures estimations. In general, they are not used by the end-user for a simple R model, but rather for an external computational code. Most of the sensitivity analyses objects of this package overload tell, whereas ask is overloaded for iterative methods only.
extract is used as a post-treatment of a sobolshap_knn object.
tell(x, y = NULL, ...)
ask(x, ...)
extract(x, ...)
x |
a typed list storing the state of the sensitivity study
(parameters, data, estimates), as returned by sensitivity analyses
objects constructors, such as |
y |
a vector of model responses. |
... |
additional arguments, depending on the method used. |
When a sensitivity analysis method is called with no model (i.e. argument model = NULL), it generates an incomplete object x that stores the design of experiments (field X), allowing the user to launch "by hand" the corresponding simulations. The method tell passes these simulation results to the incomplete object x, and then estimates the sensitivity measures.
The extract method is useful if, in a first step, the Shapley effects have been computed and thus sensitivity indices for all possible subsets are available. The resulting sobolshap_knn object can be post-treated by extract to get first-order and total Sobol indices very easily.
When the method is iterative, the data to simulate are not stored in the sensitivity analysis object x, but generated at each iteration with the ask method; see for example sb.
tell doesn't return anything. It computes the sensitivity measures, and stores them in the list x. Side effect: tell modifies its argument x.
ask returns the set of data to simulate.
extract returns an object, from a sobolshap_knn object, containing first-order and total Sobol indices.
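The side effect of tell (modifying its argument in place) is unusual in R, where arguments are normally copied. The base R sketch below shows one way such a pattern can be written; toy_tell and the class toy_study are hypothetical names, and the package's own implementation may differ.

```r
# Toy generic with the same calling pattern as tell(): NOT the package's code.
toy_tell <- function(x, y = NULL, ...) UseMethod("toy_tell")

toy_tell.toy_study <- function(x, y = NULL, ...) {
  id <- deparse(substitute(x))    # name of the variable in the caller
  x$y <- y                        # store the model responses
  x$estimate <- mean(y)           # stand-in for the sensitivity estimates
  assign(id, x, parent.frame())   # overwrite the caller's object
  invisible(NULL)                 # nothing is returned
}

# An "incomplete" study: a design is stored, but no responses yet
study <- structure(list(X = data.frame(a = 1:4)), class = "toy_study")

# Responses computed elsewhere (e.g. by an external code), then passed back
toy_tell(study, y = c(1, 2, 3, 6))
```

After the call, study contains the responses and the estimate even though no assignment was written by the user, which mirrors the behaviour described for tell.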
Gilles Pujol and Bertrand Iooss
# Example of use of fast99 with "model = NULL"
x <- fast99(model = NULL, factors = 3, n = 1000,
            q = "qunif", q.arg = list(min = -pi, max = pi))
y <- ishigami.fun(x$X)
tell(x, y)
print(x)
plot(x)
delsa implements Distributed Evaluation of Local Sensitivity Analysis to calculate first order parameter sensitivity at multiple locations in parameter space. The locations in parameter space can either be obtained by a call to parameterSets or by specifying X0 directly, in which case the prior variance of each parameter varprior also needs to be specified. Via plot (which uses functions of the package ggplot2 and reshape2), the indices can be visualized.
delsa(model = NULL, perturb=1.01, par.ranges, samples, method, X0, varprior, varoutput, ...)
## S3 method for class 'delsa'
tell(x, y = NULL,...)
## S3 method for class 'delsa'
print(x, ...)
## S3 method for class 'delsa'
plot(x, which=1:3, ask = dev.interactive(), ...)
model |
a function, or a model with a |
perturb |
Perturbation used to calculate sensitivity at each evaluation location |
par.ranges |
A named list of minimum and maximum parameter values |
samples |
Number of samples to generate. For the |
method |
Sampling scheme. See |
X0 |
Parameter values at which to evaluate sensitivity indices.
Can be used instead of specifying sampling |
varprior |
Prior variance. If |
varoutput |
Output variance. If |
... |
any other arguments for |
x |
a list of class |
y |
a vector of model responses. |
which |
if a subset of the plots is required, specify a subset of the numbers 1:3 |
ask |
logical; if TRUE, the user is asked before each plot, see |
print shows a summary of the first order indices across parameter space.
plot shows: (1) the cumulative distribution function of first order sensitivity across parameter space, (2) variation of first order sensitivity in relation to model response, and (3) sensitivity in relation to parameter value.
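To make the idea of a first order index at one location concrete, here is a minimal base R sketch following the definition in Rakovec et al. (2014): each squared derivative is weighted by the prior variance of its parameter and normalised by the sum over all parameters. The function name delsa_first_order, the toy model, the forward finite difference and the multiplicative use of perturb are assumptions made for illustration, not the package's code.

```r
# Hypothetical sketch (not the package's implementation) of a DELSA first
# order index at a single location x0 of the parameter space.
delsa_first_order <- function(model, x0, varprior, perturb = 1.01) {
  y0 <- model(x0)
  deriv <- sapply(seq_along(x0), function(j) {
    x1 <- x0
    x1[j] <- x0[j] * perturb              # multiplicative perturbation (assumption)
    (model(x1) - y0) / (x1[j] - x0[j])    # forward finite difference
  })
  contrib <- deriv^2 * varprior           # local variance contributions
  contrib / sum(contrib)                  # normalised first order indices
}

# Toy model with two parameters, both with range [0, 1]
toy_model <- function(x) 2 * x[1] + x[2]
varprior <- rep((1 - 0)^2 / 12, 2)        # variance of a uniform prior on [0, 1]

idx <- delsa_first_order(toy_model, x0 = c(0.3, 0.7), varprior = varprior)
```

For this linear toy model the derivatives are 2 and 1 with equal prior variances, so the indices are 4/5 and 1/5 at every location; for a nonlinear model they would vary across parameter space, which is what the plots summarise.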
delsa returns a list of class "delsa", containing all the input arguments detailed before, plus the following components:
call |
the matched call. |
X |
a |
y |
a vector of model responses. |
delsafirst |
the first order indices for each location in |
deriv |
the values of derivatives for each location in |
Conversion for sensitivity package by Joseph Guillaume, based on original R code by Oldrich Rakovec. Addition of the varoutput argument by Bertrand Iooss (2020).
Rakovec, O., M. C. Hill, M. P. Clark, A. H. Weerts, A. J. Teuling, R. Uijlenhoet (2014), Distributed Evaluation of Local Sensitivity Analysis (DELSA), with application to hydrologic models, Water Resour. Res., 50, 1-18
parameterSets, which is used to generate points; sensitivity for other methods in the package
# Test case : the non-monotonic Sobol g-function
# (there are 8 factors, all following the uniform distribution on [0,1])
library(randtoolbox)
x <- delsa(model=sobol.fun,
           par.ranges=replicate(8,c(0,1),simplify=FALSE),
           samples=100,method="sobol")
# Summary of sensitivity indices of each parameter across parameter space
print(x)
library(ggplot2)
library(reshape2)
plot(x)
Compute discrepancy criteria. This function uses a C++ implementation of the function discrepancyCriteria from package DiceDesign.
discrepancyCriteria_cplus(design,type='all')
design |
a matrix corresponding to the design of experiments.
The discrepancy criteria are computed for a design in the unit cube $[0,1]^d$. |
type |
type of discrepancies (single value or vector) to be computed:
|
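The discrepancy measures how far a given distribution of points deviates from a perfectly uniform one. Different discrepancies are available. For example, if we denote by $Vol(J)$ the volume of a subset $J$ of $[0,1]^d$ and $A(X;J)$ the number of points of $X$ falling in $J$, the $L_2$ discrepancy is:
$$D_{L_2}(X) = \left[ \int_{[0,1]^{2d}} \left( \frac{A(X;J_{a,b})}{n} - Vol(J_{a,b}) \right)^2 da \, db \right]^{1/2}$$
where $a = (a_1, \dots, a_d)'$, $b = (b_1, \dots, b_d)'$ and $J_{a,b} = [a_1, b_1) \times \dots \times [a_d, b_d)$. The other L2-discrepancies are defined according to the same principle with different forms for the subset $J$.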
Among all the possibilities, discrepancyCriteria_cplus implements only the L2 discrepancies because they can be expressed analytically even in high dimension.
The centered L2-discrepancy is computed using the analytical expression given by Hickernell (1998). See Pleming and Manteufel (2005) for more details about the wrap-around discrepancy.
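As an illustration of such an analytical expression, the base R sketch below evaluates the centered L2-discrepancy with the closed-form formula of Hickernell (1998). The function name centered_L2 is hypothetical; the sketch is written for readability rather than speed and is not the C++ code used by the package.

```r
# Hypothetical sketch (not the package's C++ code): centered L2-discrepancy
# of a design X in [0,1]^d, using the closed form of Hickernell (1998).
centered_L2 <- function(X) {
  X <- as.matrix(X)
  n <- nrow(X)
  d <- ncol(X)
  A <- abs(X - 0.5)                       # distances to the centre of the cube
  term1 <- (13 / 12)^d
  term2 <- (2 / n) * sum(apply(1 + 0.5 * A - 0.5 * A^2, 1, prod))
  term3 <- 0
  for (i in seq_len(n)) {
    for (j in seq_len(n)) {
      term3 <- term3 +
        prod(1 + 0.5 * A[i, ] + 0.5 * A[j, ] - 0.5 * abs(X[i, ] - X[j, ]))
    }
  }
  sqrt(term1 - term2 + term3 / n^2)
}

# One point at the centre of [0,1]: the squared discrepancy is 13/12 - 2 + 1
cd_centre <- centered_L2(matrix(0.5, nrow = 1, ncol = 1))

# A random design with 40 points in dimension 2
set.seed(123)
X <- matrix(runif(40 * 2), 40, 2)
cd_random <- centered_L2(X)
```

The double loop makes the cost quadratic in the number of points, which is one reason a compiled implementation is attractive for large designs.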
A list containing the L2-discrepancies of the design.
Laurent Gilquin
Fang K.T, Li R. and Sudjianto A. (2006) Design and Modeling for Computer Experiments, Chapman & Hall.
Hickernell F.J. (1998) A generalized discrepancy and quadrature error bound. Mathematics of Computation, 67, 299-322.
Pleming J.B. and Manteufel R.D. (2005) Replicated Latin Hypercube Sampling, 46th Structures, Structural Dynamics & Materials Conference, 16-21 April 2005, Austin (Texas) – AIAA 2005-1819.
The distance criterion provided by maximin_cplus
dimension <- 2
n <- 40
X <- matrix(runif(n*dimension),n,dimension)
discrepancyCriteria_cplus(X)
EPtest builds the non-parametric variable significance test from Klein and Rochet (2022) for the null hypothesis $S^u = S$, where $S^u$ is the Sobol index for the inputs $X_i$, $i \in u$, and $S$ is the Sobol index for all the inputs in $X$.
EPtest(X, y, u = NULL, doe = NULL, Kdoe = 10, tau = 0.1)
X |
a matrix or data.frame that contains the numerical inputs as columns. |
y |
a vector of output. |
u |
the vector of indices of the columns of X for which we want to test the significance. |
doe |
the design of experiment on which the empirical process is to be evaluated. It should be independent from X. |
Kdoe |
if doe is null and Kdoe is specified, the design of experiment is taken as Kdoe points drawn uniformly independently on intervals delimited by the range of each input. |
tau |
a regularization parameter to approximate the limit chi2 distribution of the test statistics under H0. |
EPtest returns a list containing:
statistics |
The test statistic, which follows a chi-squared distribution under the null hypothesis. |
ddl |
The number of degrees of freedom used in the limit chi-square distribution for the test. |
p-value |
The test p-value. |
Paul Rochet
T. Klein and P. Rochet, Test comparison for Sobol Indices over nested sets of variables, SIAM/ASA Journal on Uncertainty Quantification 10.4 (2022): 1586-1600.
# Model: Ishigami
n = 100
X = matrix(runif(3*n, -pi, pi), ncol = 3)
y = ishigami.fun(X)

# Test the significance of X1, H0: S1 = 0
EPtest(X[, 1], y, u = NULL)

# Test if X1 is sufficient to explain Y, H0: S1 = S123
EPtest(X, y, u = 1)

# Test if X3 is significant in presence of X2, H0: S2 = S23
EPtest(X[, 2:3], y, u = 1)
fast99 implements the so-called "extended-FAST" method (Saltelli et al. 1999). This method allows the estimation of first order and total Sobol' indices for all the factors (altogether $2p$ indices, where $p$ is the number of factors) at a total cost of $n \times p$ simulations.
fast99(model = NULL, factors, n, M = 4, omega = NULL, q = NULL, q.arg = NULL, ...)
## S3 method for class 'fast99'
tell(x, y = NULL, ...)
## S3 method for class 'fast99'
print(x, ...)
## S3 method for class 'fast99'
plot(x, ylim = c(0, 1), ...)
model |
a function, or a model with a |
factors |
an integer giving the number of factors, or a vector of character strings giving their names. |
n |
an integer giving the sample size, i.e. the length of the discretization of the s-space (see Cukier et al.). |
M |
an integer specifying the interference parameter, i.e. the number of harmonics to sum in the Fourier series decomposition (see Cukier et al.). |
omega |
a vector giving the set of frequencies, one frequency for each factor (see details below). |
q |
a vector of quantile functions names corresponding to wanted factors distributions (see details below). |
q.arg |
a list of quantile functions parameters (see details below). |
x |
a list of class |
y |
a vector of model responses. |
ylim |
y-coordinate plotting limits. |
... |
any other arguments for |
If not given, the set of frequencies omega is taken from Saltelli et al. The first frequency of the vector omega is assigned to each factor $X_i$ in turn (corresponding to the estimation of Sobol' indices $S_i$ and $S_{T_i}$), other frequencies being assigned to the remaining factors.
If the arguments q and q.arg are not given, the factors are taken uniformly distributed on $[0,1]$. The argument q must be a list of character strings, giving the names of the quantile functions (one for each factor), such as qunif, qnorm... It can also be a single character string, meaning the same distribution for all factors. The argument q.arg must be a list of lists, each one being additional parameters for the corresponding quantile function. For example, the parameters of the quantile function qunif could be list(min=1, max=2), giving a uniform distribution on $[1,2]$. If q is a single character string, then q.arg must be a single list (rather than a list of one list).
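The role of q and q.arg can be sketched in base R: a quantile function named by a character string, together with its list of parameters, maps values in (0,1) to the desired distribution. The code below is only an illustration of that mechanism with do.call; the variable names are hypothetical and the way fast99 applies the transformation internally is not shown on this page.

```r
# Sketch of how quantile-function names and parameter lists can be combined.
# One entry of q and one entry of q.arg per factor (here: two factors).
q     <- c("qunif", "qnorm")
q.arg <- list(list(min = 1, max = 2),     # factor 1: uniform on [1, 2]
              list(mean = 0, sd = 1))     # factor 2: standard normal

# Values in (0, 1) to be transformed (19 points: 0.05, 0.10, ..., 0.95)
u <- seq(0.05, 0.95, by = 0.05)

# Apply each quantile function, with its own parameters, to u
X <- sapply(seq_along(q), function(i) do.call(q[i], c(list(u), q.arg[[i]])))

# Same distribution for all factors: one string and one single list
q_single     <- "qunif"
q.arg_single <- list(min = -pi, max = pi)
x_single <- do.call(q_single, c(list(u), q.arg_single))
```

The first column of X lies in [1, 2], as stated above for list(min=1, max=2), and the second column contains standard normal quantiles.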
fast99 returns a list of class "fast99", containing all the input arguments detailed before, plus the following components:
call |
the matched call. |
X |
a |
y |
a vector of model responses. |
V |
the estimation of variance. |
D1 |
the estimations of Variances of the Conditional Expectations (VCE) with respect to each factor. |
Dt |
the estimations of VCE with respect to each factor
complementary set of factors ("all but |
Gilles Pujol
A. Saltelli, S. Tarantola and K. Chan, 1999, A quantitative, model independent method for global sensitivity analysis of model output, Technometrics, 41, 39–56.
R. I. Cukier, H. B. Levine and K. E. Schuler, 1978, Nonlinear sensitivity analysis of multiparameter model systems. J. Comput. Phys., 26, 1–42.
# Test case : the non-monotonic Ishigami function
x <- fast99(model = ishigami.fun, factors = 3, n = 1000,
            q = "qunif", q.arg = list(min = -pi, max = pi))
print(x)
plot(x)
johnson computes the Johnson indices for correlated input relative importance by $R^2$ decomposition for linear and logistic regression models. These indices allocate a share of $R^2$ to each input based on the relative weight allocation (RWA) system, in the case of dependent or correlated inputs.
johnson(X, y, rank = FALSE, logistic = FALSE, nboot = 0, conf = 0.95)
## S3 method for class 'johnson'
print(x, ...)
## S3 method for class 'johnson'
plot(x, ylim = c(0,1), ...)
## S3 method for class 'johnson'
ggplot(data, mapping = aes(), ylim = c(0, 1), ..., environment = parent.frame())
X |
a data frame (or object coercible by |
y |
a vector containing the responses corresponding to the design of experiments (model output variables). |
rank |
logical. If |
logistic |
logical. If |
nboot |
the number of bootstrap replicates. |
conf |
the confidence level of the bootstrap confidence intervals. |
x |
the object returned by |
data |
the object returned by |
ylim |
the y-coordinate limits of the plot. |
mapping |
Default list of aesthetic mappings to use for plot. If not specified, must be supplied in each layer added to the plot. |
environment |
[Deprecated] Used prior to tidy evaluation. |
... |
arguments to be passed to methods, such as graphical
parameters (see |
Logistic regression model (logistic = TRUE) and rank-based indices (rank = TRUE) are incompatible.
johnson returns a list of class "johnson", containing the following components:
call |
the matched call. |
johnson |
a data frame containing the estimations of the johnson indices, bias and confidence intervals. |
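For the linear case, the relative weights of Johnson (2000) can be sketched in a few lines of base R: the inputs are replaced by their closest orthogonal counterparts through the symmetric square root of the input correlation matrix, and the squared regression weights are mapped back to the original inputs. The function name johnson_weights is hypothetical, and the sketch omits what the package adds on top (ranks, logistic regression, bootstrap).

```r
# Hypothetical sketch (not the package's code) of Johnson's relative weights
# for a linear regression of y on the columns of X.
johnson_weights <- function(X, y) {
  X <- as.matrix(X)
  Rxx <- cor(X)                           # input correlation matrix
  rxy <- cor(X, y)                        # input/output correlations
  e <- eigen(Rxx, symmetric = TRUE)
  # Symmetric square root of Rxx: links inputs to orthogonal variables
  Lambda <- e$vectors %*%
    diag(sqrt(e$values), nrow = length(e$values)) %*% t(e$vectors)
  beta <- solve(Lambda, rxy)              # weights of the orthogonal variables
  as.numeric(Lambda^2 %*% beta^2)         # share of R^2 given to each input
}

set.seed(2024)
n <- 500
X1 <- rnorm(n)
X2 <- 0.8 * X1 + 0.6 * rnorm(n)           # X2 is correlated with X1
X3 <- rnorm(n)
X <- cbind(X1 = X1, X2 = X2, X3 = X3)
y <- X1 - X2 + 0.5 * X3 + rnorm(n)

w  <- johnson_weights(X, y)
r2 <- summary(lm(y ~ X))$r.squared        # R^2 of the full linear model
```

The weights are non-negative and sum to the R2 of the linear regression, which is the decomposition property described above.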
Bertrand Iooss and Laura Clouvel
L. Clouvel, B. Iooss, V. Chabridon, M. Il Idrissi and F. Robin, 2024, An overview of variance-based importance measures in the linear regression context: comparative analyses and numerical tests, Preprint. https://hal.science/hal-04102053
B. Iooss, V. Chabridon and V. Thouvenot, Variance-based importance measures for machine learning model interpretability, Congres lambda-mu23, Saclay, France, 10-13 octobre 2022 https://hal.science/hal-03741384
J.W. Johnson, 2000, A heuristic method for estimating the relative weight of predictor variables in multiple regression, Multivariate Behavioral Research, 35:1-19.
J.W. Johnson and J.M. LeBreton, 2004, History and use of relative importance indices in organizational research, Organizational Research Methods, 7:238-257.
src, lmg, pmvd, johnsonshap
##################################
# Same example as the one in src()

# a 100-sample with X1 ~ U(0.5, 1.5)
#                   X2 ~ U(1.5, 4.5)
#                   X3 ~ U(4.5, 13.5)

library(boot)
n <- 100
X <- data.frame(X1 = runif(n, 0.5, 1.5),
                X2 = runif(n, 1.5, 4.5),
                X3 = runif(n, 4.5, 13.5))

# linear model : Y = X1 + X2 + X3
y <- with(X, X1 + X2 + X3)

# sensitivity analysis
x <- johnson(X, y, nboot = 100)
print(x)
plot(x)

library(ggplot2)
ggplot(x)

#################################
# Same examples as the ones in lmg()

library(boot)
library(mvtnorm)

set.seed(1234)
n <- 1000
beta<-c(1,-1,0.5)
sigma<-matrix(c(1,0,0,
                0,1,-0.8,
                0,-0.8,1),
              nrow=3,
              ncol=3)

##########
# Gaussian correlated inputs

X <-rmvnorm(n, rep(0,3), sigma)
colnames(X)<-c("X1","X2", "X3")

#########
# Linear Model

y <- X%*%beta + rnorm(n,0,2)

# Without Bootstrap confidence intervals
x<-johnson(X, y)
print(x)
plot(x)

# With Bootstrap confidence intervals
x<-johnson(X, y, nboot=100, conf=0.95)
print(x)
plot(x)

# Rank-based analysis
x<-johnson(X, y, rank=TRUE, nboot=100, conf=0.95)
print(x)
plot(x)

#######
# Logistic Regression

y<-as.numeric(X%*%beta + rnorm(n)>0)
x<-johnson(X,y, logistic = TRUE)
plot(x)
print(x)

#################################
# Test on a modified Linkletter fct with:
# - multivariate normal inputs (all multicollinear)
# - in dimension 50 (there are 42 dummy inputs)
# - large-size sample (1e4)

library(mvtnorm)

n <- 1e4
d <- 50
sigma <- matrix(0.5,ncol=d,nrow=d)
diag(sigma) <- 1
X <- rmvnorm(n, rep(0,d), sigma)
y <- linkletter.fun(X)
joh <- johnson(X,y)
sum(joh$johnson) # gives the R2
plot(joh)
johnsonshap
computes the Johnson-Shapley indices for correlated input
relative importance. These indices allocate a share of the output variance to
each input based on the relative weight allocation system,
in the case of dependent or correlated inputs.
johnsonshap(model = NULL, X1, N, nboot = 0, conf = 0.95)

## S3 method for class 'johnsonshap'
print(x, ...)

## S3 method for class 'johnsonshap'
plot(x, ylim = c(0,1), ...)

## S3 method for class 'johnsonshap'
ggplot(data, mapping = aes(), ylim = c(0, 1), ..., environment = parent.frame())
model |
a function, or a model with a |
X1 |
a data frame (or object coercible by |
N |
an integer giving the size of each replicated design for the Sobol' indices computations via the sobolrep() function. |
nboot |
the number of bootstrap replicates. |
conf |
the confidence level of the bootstrap confidence intervals. |
x |
the object returned by |
data |
the object returned by |
ylim |
the y-coordinate limits of the plot. |
mapping |
Default list of aesthetic mappings to use for plot. If not specified, must be supplied in each layer added to the plot. |
environment |
[Deprecated] Used prior to tidy evaluation. |
... |
arguments to be passed to methods, such as graphical
parameters (see |
X1 is not used to run the model but just to perform the SVD; the model is run on a specific design which is internally generated.
When bootstrap is used, the values in the 'bias' and 'std. error' columns are arbitrarily set to 0 because they cannot be computed; the values in the 'original', 'min c.i.' and 'max c.i.' columns are correctly computed.
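The relative weight allocation and the SVD step mentioned above can be illustrated on a plain linear model. The following is a minimal base R sketch of Johnson's relative weights, not the internal code of johnsonshap (which, according to its returned components, combines a standardized weight matrix with Sobol' indices of the transformed inputs). The toy data and all variable names are assumptions made for illustration.

```r
# Johnson's relative weights for a linear model (illustrative sketch, base R only)
set.seed(123)
n <- 500
X <- matrix(rnorm(n * 3), n, 3, dimnames = list(NULL, c("X1", "X2", "X3")))
X[, "X3"] <- -0.8 * X[, "X2"] + 0.6 * X[, "X3"]   # make X2 and X3 correlated
y <- as.vector(X %*% c(1, -1, 0.5) + rnorm(n, 0, 2))

# Standardize so that crossprod(Xs) is the correlation matrix of the inputs
Xs <- scale(X) / sqrt(n - 1)
ys <- scale(y) / sqrt(n - 1)

# SVD of the standardized inputs: Xs = P diag(delta) Q'
sv <- svd(Xs)
Z <- sv$u %*% t(sv$v)                     # orthonormal variables closest to Xs
W <- sv$v %*% diag(sv$d) %*% t(sv$v)      # correlations between Xs and Z
beta <- crossprod(Z, ys)                  # regression coefficients of ys on Z

# Relative weights: share of R2 allocated to each input
rel_weights <- as.vector(W^2 %*% beta^2)
names(rel_weights) <- colnames(X)
print(rel_weights)
print(sum(rel_weights))                   # equals the R2 of lm(y ~ X)
```

Because the columns of Z are orthonormal, the weights are non-negative and sum to the R2 of the full linear model, which is the property that makes them usable as importance shares.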
johnsonshap
returns a list of class "johnsonshap"
, containing
all the input arguments detailed before, plus the following components:
call |
the matched call. |
X |
a matrix containing the design of experiments. |
sobrepZ |
the Sobol' indices of the transformed inputs (independent) |
Wstar |
the standardized weight matrix. |
johnsonshap |
a data frame containing the estimations of the Johnson-Shapley indices, bias and confidence intervals. |
Bertrand Iooss
B. Iooss and L. Clouvel, Une methode d'approximation des effets de Shapley en grande dimension, 54emes Journees de Statistique, Bruxelles, Belgique, July 3-7, 2023
library(ggplot2) library(boot) ##################################################### # Test case: the non-monotonic Sobol g-function (with independent inputs) n <- 1000 X <- data.frame(matrix(runif(8 * n), nrow = n)) x <- johnsonshap(model = sobol.fun, X1 = X, N = n) print(x) plot(x) ggplot(x) ############################################# # 3D analytical toy functions described in Iooss & Clouvel (2023) library(mvtnorm) Xall <- function(n) mvtnorm::rmvnorm(n,mu,Covmat) # 2 correlated inputs Cov3d2 <- function(rho){ # correl (X1,X2) Cormat <- matrix(c(1,rho,0,rho,1,0,0,0,1),3,3) return( ( sig %*% t(sig) ) * Cormat) } mu3d <- c(1,0,0) ; sig3d <- c(0.25,1,1) d <- 3 ; mu <- mu3d ; sig <- sig3d ; Covm <- Cov3d2 Xvec <- c("X1","X2","X3") n <- 1e4 # initial sample size N <- 1e4 # cost to estimate indices rho <- 0.9 # correlation coef for dependent inputs' case ################ # Linear model + a strong 2nd order interaction toy3d <- function(x) return(x[,1]*(1+x[,1]*(cos(x[,2]+x[,3])^2))) # interaction X2X3 toy <- toy3d # Independent case Covmat <- Covm(0) X <- as.data.frame(Xall(n)) Y <- toy(X) joh <- johnson(X, Y, nboot=100) print(joh) johshap <- johnsonshap(model = toy, X1 = X, N = N, nboot=100) print(johshap) ggplot(johshap) # Dependent case Covmat <- Covm(rho) Xdep <- as.data.frame(Xall(n)) Ydep <- toy(Xdep) joh <- johnson(Xdep, Ydep, nboot=0) print(joh) johshap <- johnsonshap(model = toy, X1 = Xdep, N = N, nboot=100) print(johshap) ggplot(johshap) ################ # Strongly non-inear model + a strong 2nd order interaction toy3dNL <- function(x) return(sin(x[,1]*pi/2)*(1+x[,1]*(cos(x[,2]+x[,3])^2))) # non linearity in X1 toy <- toy3dNL # Independent case Covmat <- Covm(0) X <- as.data.frame(Xall(n)) Y <- toy(X) joh <- johnson(X, Y, nboot=100) print(joh) johshap <- johnsonshap(model = toy, X1 = X, N = N, nboot=100) print(johshap) ggplot(johshap) # Dependent case Covmat <- Covm(rho) Xdep <- as.data.frame(Xall(n)) Ydep <- toy(Xdep) joh <- johnson(Xdep, Ydep, nboot=0) 
print(joh) johshap <- johnsonshap(model = NULL, X1 = Xdep, N = N, nboot=100) y <- toy(johshap$X) tell(johshap, y) print(johshap) ggplot(johshap)
LMG $R^2$ decomposition for linear and logistic regression models
lmg computes the Lindeman, Merenda and Gold (LMG) indices for correlated input relative importance by $R^2$ decomposition for linear and logistic regression models. These indices allocate a share of $R^2$ to each input based on the Shapley attribution system, in the case of dependent or correlated inputs.
lmg(X, y, logistic = FALSE, rank = FALSE, nboot = 0, conf = 0.95, max.iter = 1000, parl = NULL)

## S3 method for class 'lmg'
print(x, ...)

## S3 method for class 'lmg'
plot(x, ylim = c(0,1), ...)
X |
a matrix or data frame containing the observed covariates (i.e., features, input variables...). |
y |
a numeric vector containing the observed outcomes (i.e.,
dependent variable). If |
logistic |
logical. If |
rank |
logical. If |
nboot |
the number of bootstrap replicates for the computation of confidence intervals. |
conf |
the confidence level of the bootstrap confidence intervals. |
max.iter |
if |
parl |
number of cores on which to parallelize the computation. If
|
x |
the object returned by |
ylim |
the y-coordinate limits of the plot. |
... |
arguments to be passed to methods, such as graphical
parameters (see |
The computation is done using the subset procedure defined in Broto, Bachoc and Depecker (2020): the $R^2$ values of all possible sub-models are computed first, and the Shapley weights are then applied according to the Lindeman, Merenda and Gold (1980) definition.
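The subset procedure described above can be sketched by brute force for a small number of inputs. This is a minimal base R illustration under assumed toy data, not the implementation used by lmg; the helper r2_subset and the variable names are hypothetical.

```r
# Brute-force LMG indices: R2 of every sub-model, then Shapley weighting
set.seed(1)
n <- 500
X <- cbind(X1 = rnorm(n), X2 = rnorm(n), X3 = rnorm(n))
X[, "X3"] <- 0.8 * X[, "X2"] + 0.6 * X[, "X3"]   # make X2 and X3 correlated
y <- as.vector(X %*% c(1, -1, 0.5) + rnorm(n))

# R2 of the linear sub-model built on the columns in idx (0 for the empty model)
r2_subset <- function(idx) {
  if (length(idx) == 0) return(0)
  summary(lm(y ~ X[, idx, drop = FALSE]))$r.squared
}

d <- ncol(X)   # d = 3 here, so every "others" vector below has length 2
lmg_manual <- sapply(seq_len(d), function(j) {
  others <- setdiff(seq_len(d), j)
  total <- 0
  for (k in 0:length(others)) {
    # all subsets of size k that do not contain input j
    subsets <- if (k == 0) list(integer(0)) else combn(others, k, simplify = FALSE)
    w <- factorial(k) * factorial(d - k - 1) / factorial(d)   # Shapley weight
    for (s in subsets) {
      total <- total + w * (r2_subset(c(s, j)) - r2_subset(s))
    }
  }
  total
})
names(lmg_manual) <- colnames(X)
print(lmg_manual)
print(sum(lmg_manual))   # equals the R2 of the full model
```

The number of sub-models grows as 2^d, which is why this direct enumeration is only practical for a moderate number of inputs.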
For logistic regression (logistic = TRUE), the $R^2$ value is equal to:
$$R^2 = 1 - \frac{\text{model deviance}}{\text{null deviance}}$$
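As an illustration of an R2-type value for a logistic model, the sketch below computes a deviance-based pseudo-R2 with glm in base R. It assumes that the quantity meant here is one minus the ratio of the model deviance to the null deviance; the toy data and variable names are hypothetical.

```r
# Deviance-based pseudo-R2 for a logistic regression (illustrative sketch)
set.seed(1234)
n <- 1000
X <- matrix(rnorm(n * 3), n, 3, dimnames = list(NULL, c("X1", "X2", "X3")))
y <- as.numeric(X %*% c(1, -1, 0.5) + rnorm(n) > 0)   # binary outcome

fit <- glm(y ~ X, family = binomial)

# Assumed definition: 1 - (model deviance) / (null deviance)
r2_logistic <- 1 - fit$deviance / fit$null.deviance
print(r2_logistic)
```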
If either a logistic regression model is used (logistic = TRUE) or any column of X is categorical (i.e., of class factor), then the rank-based indices cannot be computed. In both cases, rank = FALSE is forced by default (with a warning).
If the parl argument requests more cores than the machine has, the number of cores defaults to the number of available cores minus one.
lmg
returns a list of class "lmg"
, containing the following
components:
call |
the matched call. |
lmg |
a data frame containing the estimations of the LMG indices. |
R2s |
the estimations of the |
indices |
list of all subsets corresponding to the structure of R2s. |
w |
the Shapley weights. |
conf_int |
a matrix containing the estimations, bias and confidence
intervals by bootstrap (if |
X |
the observed covariates. |
y |
the observed outcomes. |
logistic |
logical. |
boot |
logical. |
nboot |
number of bootstrap replicates. |
rank |
logical. |
parl |
number of chosen cores for the computation. |
conf |
level for the confidence intervals by bootstrap. |
Marouane Il Idrissi
Broto B., Bachoc F. and Depecker M. (2020) Variance Reduction for Estimation of Shapley Effects and Adaptation to Unknown Input Distribution. SIAM/ASA Journal on Uncertainty Quantification, 8(2).
D.V. Budescu (1993). Dominance analysis: A new approach to the problem of relative importance of predictors in multiple regression. Psychological Bulletin, 114:542-551.
L. Clouvel, B. Iooss, V. Chabridon, M. Il Idrissi and F. Robin, 2024, An overview of variance-based importance measures in the linear regression context: comparative analyses and numerical tests, Preprint. https://hal.science/hal-04102053
U. Gromping (2006). Relative importance for linear regression in R: the Package relaimpo. Journal of Statistical Software, 17:1-27.
M. Il Idrissi, V. Chabridon and B. Iooss (2021). Developments and applications of Shapley effects to reliability-oriented sensitivity analysis with correlated inputs, Environmental Modelling & Software, 143, 105115, 2021
M. Il Idrissi, V. Chabridon and B. Iooss (2021). Mesures d'importance relative par decompositions de la performance de modeles de regression, Actes des 52emes Journees de Statistiques de la Societe Francaise de Statistique (SFdS), pp 497-502, Nice, France, Juin 2021
B. Iooss, V. Chabridon and V. Thouvenot, Variance-based importance measures for machine learning model interpretability, Congres lambda-mu23, Saclay, France, 10-13 octobre 2022 https://hal.science/hal-03741384
Lindeman RH, Merenda PF, Gold RZ (1980). Introduction to Bivariate and Multivariate Analysis. Scott, Foresman, Glenview, IL.
pcc
, src
, johnson
, shapleyPermEx
, shapleysobol_knn
, pmvd
, pme_knn
library(parallel)
library(doParallel)
library(foreach)
library(gtools)
library(boot)
library(mvtnorm)

set.seed(1234)
n <- 1000
beta <- c(1, -1, 0.5)
sigma <- matrix(c(1, 0, 0,
                  0, 1, -0.8,
                  0, -0.8, 1), nrow = 3, ncol = 3)

############################
# Gaussian correlated inputs
X <- rmvnorm(n, rep(0, 3), sigma)
colnames(X) <- c("X1", "X2", "X3")

#############################
# Linear Model
y <- X %*% beta + rnorm(n, 0, 2)

# Without bootstrap confidence intervals
x <- lmg(X, y)
print(x)
plot(x)

# With bootstrap confidence intervals
x <- lmg(X, y, nboot = 100, conf = 0.95)
print(x)
plot(x)

# Rank-based analysis
x <- lmg(X, y, rank = TRUE, nboot = 100, conf = 0.95)
print(x)
plot(x)

############################
# Logistic Regression
y <- as.numeric(X %*% beta + rnorm(n) > 0)
x <- lmg(X, y, logistic = TRUE)
plot(x)
print(x)

# Parallel computing
#x <- lmg(X, y, logistic = TRUE, parl = 2)
#plot(x)
#print(x)
Computes the maximin criterion (also called mindist). This function uses a C++ implementation of the function mindist from the package DiceDesign.
maximin_cplus(design)
design |
a matrix representing the design of experiments in the unit cube $[0,1]^d$ |
The maximin criterion is defined by:
$$\text{maximin} = \min_{x_i \in X} \left( \gamma_i \right)$$
where $\gamma_i$ is the minimal distance between the point $x_i$ and the other points $x_k$ of the design.
A higher value corresponds to a more regular scattering of design points.
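The definition above can be checked with a few lines of base R. This sketch assumes Euclidean distances and does not call maximin_cplus; the variable names are chosen for illustration only.

```r
# Maximin (mindist) criterion computed directly from pairwise distances
set.seed(42)
design <- matrix(runif(40 * 2), nrow = 40, ncol = 2)   # 40 points in [0,1]^2

D <- as.matrix(dist(design))   # Euclidean inter-point distances
diag(D) <- Inf                 # ignore the distance of a point to itself

gamma <- apply(D, 1, min)      # gamma_i: distance from x_i to its nearest neighbour
maximin_manual <- min(gamma)   # criterion: smallest of the gamma_i
print(maximin_manual)
```

The result is simply the smallest inter-point distance of the design, so it is also equal to min(dist(design)).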
A real number equal to the value of the maximin criterion for the design.
Laurent Gilquin
Gunzburger M., Burkardt J. (2004) Uniformity measures for point samples in hypercubes https://people.sc.fsu.edu/~jburkardt/.
Johnson M.E., Moore L.M. and Ylvisaker D. (1990) Minimax and maximin distance designs, J. of Statist. Planning and Inference, 26, 131-148.
Chen V.C.P., Tsui K.L., Barton R.R. and Allen J.K. (2003) A review of design and modeling in computer experiments, Handbook of Statistics, 22, 231-261.
discrepancy measures provided by discrepancyCriteria_cplus
.
dimension <- 2
n <- 40
X <- matrix(runif(n*dimension), n, dimension)
maximin_cplus(X)
morris implements Morris's elementary effects screening method (Morris, 1991). This method, based on design of experiments, makes it possible to identify the few important factors at a cost of $r \times (p+1)$ simulations (where $p$ is the number of factors and $r$ the number of repetitions of the design). This implementation includes some improvements of the original method: space-filling optimization of the design (Campolongo et al. 2007) and simplex-based design (Pujol 2009).
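To make the simulation cost and the notion of elementary effect concrete, here is a simplified one-at-a-time sketch in base R. It uses random starting points instead of the grid levels of the actual "oat" design, and the toy model f, the step delta and all variable names are assumptions made for illustration; it is not the design generated by morris.

```r
# Simplified one-at-a-time trajectories and elementary effects (illustrative sketch)
set.seed(7)
p <- 4          # number of factors
r <- 10         # number of trajectories (repetitions)
delta <- 0.25   # step applied to one factor at a time

# Hypothetical toy model: linear in x1, nonlinear in x2, interaction x3:x4
f <- function(x) x[1] + 2 * x[2]^2 + x[3] * x[4]

n_eval <- 0
ee <- matrix(NA_real_, nrow = r, ncol = p)
for (i in seq_len(r)) {
  x <- runif(p, 0, 1 - delta)        # base point, so that x + delta stays in [0, 1]
  y_prev <- f(x); n_eval <- n_eval + 1
  for (j in sample(p)) {             # move each factor once, in random order
    x[j] <- x[j] + delta
    y_new <- f(x); n_eval <- n_eval + 1
    ee[i, j] <- (y_new - y_prev) / delta
    y_prev <- y_new
  }
}

mu <- colMeans(ee)              # mean of the elementary effects
mu.star <- colMeans(abs(ee))    # mean of their absolute values
sigma <- apply(ee, 2, sd)       # spread: nonlinearity and/or interactions
print(n_eval)                   # r * (p + 1) model evaluations
print(rbind(mu, mu.star, sigma))
```

In this toy case the first factor has a constant elementary effect (sigma close to 0), while the other factors show a non-zero sigma because of the quadratic term and the interaction.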
morris(model = NULL, factors, r, design, binf = 0, bsup = 1, scale = TRUE, ...)

## S3 method for class 'morris'
tell(x, y = NULL, ...)

## S3 method for class 'morris'
print(x, ...)

## S3 method for class 'morris'
plot(x, identify = FALSE, atpen = FALSE, y_col = NULL, y_dim3 = NULL, ...)

## S3 method for class 'morris'
plot3d(x, alpha = c(0.2, 0), sphere.size = 1, y_col = NULL, y_dim3 = NULL)
model |
a function, or a model with a |
factors |
an integer giving the number of factors, or a vector of character strings giving their names. |
r |
either an integer giving the number of repetitions of the design,
i.e. the number of elementary effect computed per factor, or a
vector of two integers |
design |
a list specifying the design type and its parameters:
|
binf |
either an integer, specifying the minimum value for the factors, or a vector for different values for each factor. |
bsup |
either an integer, specifying the maximum value for the factors, or a vector for different values for each factor. |
scale |
logical. If |
x |
a list of class |
y |
a vector of model responses. |
identify |
logical. If |
atpen |
logical. If |
y_col |
an integer defining the index of the column of |
y_dim3 |
an integer defining the index in the third dimension of
|
alpha |
a vector of three values between 0.0 (fully transparent) and 1.0
(opaque) (see |
sphere.size |
a numeric value, the scale factor for displaying the spheres. |
... |
for |
plot.morris draws the $(\mu^*, \sigma)$ graph.
plot3d.morris draws the $(\mu, \mu^*, \sigma)$ graph (requires the rgl package). On this graph, the points are in a domain bounded by a cone and two planes (application of the Cauchy-Schwarz inequality).
When using the space-filling improvement (Campolongo et al. 2007) of the Morris design, we recommend installing the "pracma" R package beforehand: its "distmat" function makes running the function with a large number of initial estimates (r2) significantly faster (by accelerating the inter-point distance calculations).
This version of morris
also supports matrices and three-dimensional
arrays as output of model
.
morris returns a list of class "morris", containing all the input arguments detailed before, plus the following components:
call |
the matched call. |
X |
a |
y |
either a vector, a matrix or a three-dimensional array of model
responses (depends on the output of |
ee |
|
Notice that the statistics of interest ($\mu$, $\mu^*$ and $\sigma$) are not stored. They can be printed by the print method, but to extract numerical values, one has to compute them with the following instructions:
If x$y is a vector:
mu <- apply(x$ee, 2, mean)
mu.star <- apply(x$ee, 2, function(x) mean(abs(x)))
sigma <- apply(x$ee, 2, sd)

If x$y is a matrix:
mu <- apply(x$ee, 3, function(M){
  apply(M, 2, mean)
})
mu.star <- apply(abs(x$ee), 3, function(M){
  apply(M, 2, mean)
})
sigma <- apply(x$ee, 3, function(M){
  apply(M, 2, sd)
})

If x$y is a three-dimensional array:
mu <- sapply(1:dim(x$ee)[4], function(i){
  apply(x$ee[, , , i, drop = FALSE], 3, function(M){
    apply(M, 2, mean)
  })
}, simplify = "array")
mu.star <- sapply(1:dim(x$ee)[4], function(i){
  apply(abs(x$ee)[, , , i, drop = FALSE], 3, function(M){
    apply(M, 2, mean)
  })
}, simplify = "array")
sigma <- sapply(1:dim(x$ee)[4], function(i){
  apply(x$ee[, , , i, drop = FALSE], 3, function(M){
    apply(M, 2, sd)
  })
}, simplify = "array")
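The shape of the results in the matrix-output case can be checked on a synthetic array. The sketch below assumes, as the instructions above suggest, that the elementary effects are stored as an array with repetitions in rows, factors in columns and model outputs in the third dimension; the array ee is random stand-in data, not the output of morris.

```r
# Morris statistics for a matrix-valued output, on a synthetic array of effects
set.seed(11)
r <- 6   # repetitions
p <- 4   # factors
m <- 2   # output columns

# Hypothetical stand-in for x$ee with dimensions (r, p, m)
ee <- array(rnorm(r * p * m), dim = c(r, p, m))

mu      <- apply(ee,      3, function(M) apply(M, 2, mean))
mu.star <- apply(abs(ee), 3, function(M) apply(M, 2, mean))
sigma   <- apply(ee,      3, function(M) apply(M, 2, sd))

print(dim(mu))   # one row per factor, one column per model output
```

Each statistic is therefore a p-by-m matrix, so mu.star[j, k] is the mean absolute elementary effect of factor j on output column k.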
It is highly recommended to use the function with the argument scale = TRUE to avoid an incorrect interpretation of factors that would have different orders of magnitude.
When generating the design of experiments, identical repetitions are removed, leading to a lower number of repetitions than requested.
Gilles Pujol, with contributions from Frank Weber (2016)
M. D. Morris, 1991, Factorial sampling plans for preliminary computational experiments, Technometrics, 33, 161–174.
F. Campolongo, J. Cariboni and A. Saltelli, 2007, An effective screening design for sensitivity, Environmental Modelling and Software, 22, 1509–1518.
G. Pujol, 2009, Simplex-based screening designs for estimating metamodels, Reliability Engineering and System Safety 94, 1156–1160.
# Test case : the non-monotonic function of Morris
x <- morris(model = morris.fun, factors = 20, r = 4,
            design = list(type = "oat", levels = 5, grid.jump = 3))
print(x)
plot(x)

library(rgl)
plot3d.morris(x) # (requires the package 'rgl')

# Only for demonstration purposes: a model function returning a matrix
morris.fun_matrix <- function(X){
  res_vector <- morris.fun(X)
  cbind(res_vector, 2 * res_vector)
}
x <- morris(model = morris.fun_matrix, factors = 20, r = 4,
            design = list(type = "oat", levels = 5, grid.jump = 3))
plot(x, y_col = 2)
title(main = "y_col = 2")

# Also only for demonstration purposes: a model function returning a
# three-dimensional array
morris.fun_array <- function(X){
  res_vector <- morris.fun(X)
  res_matrix <- cbind(res_vector, 2 * res_vector)
  array(data = c(res_matrix, 5 * res_matrix),
        dim = c(length(res_vector), 2, 2))
}
x <- morris(model = morris.fun_array, factors = 20, r = 4,
            design = list(type = "simplex", scale.factor = 1))
plot(x, y_col = 2, y_dim3 = 2)
title(main = "y_col = 2, y_dim3 = 2")
morrisMultOut extends Morris's elementary effects screening method (Morris 1991) to models with multidimensional outputs.
morrisMultOut(model = NULL, factors, r, design, binf = 0, bsup = 1, scale = TRUE, ...)

## S3 method for class 'morrisMultOut'
tell(x, y = NULL, ...)
model |
NULL, or a function returning a matrix whose columns are the model outputs. |
factors |
an integer giving the number of factors, or a vector of character strings giving their names. |
r |
either an integer giving the number of repetitions of the design,
i.e. the number of elementary effect computed per factor, or a
vector of two integers |
design |
a list specifying the design type and its parameters:
|
binf |
either an integer, specifying the minimum value for the factors, or a vector for different values for each factor. |
bsup |
either an integer, specifying the maximum value for the factors, or a vector for different values for each factor. |
scale |
logical. If |
x |
a list of class |
y |
a vector of model responses. |
... |
for |
All the methods available for objects of class "morris" are also available for objects of class "morrisMultOut". See the documentation of the function morris for more details.
morrisMultOut returns a list of class "c(morrisMultOut, morris)", containing all the input arguments detailed before, plus the following components:
call |
the matched call. |
X |
a |
y |
a matrix having as columns the model responses. |
ee |
a vector of aggregated elementary effects. |
Filippo Monari
Monari F. and P. Strachan, 2017. Characterization of an airflow network model by sensitivity analysis: parameter screening, fixing, prioritizing and mapping. Journal of Building Performance Simulation, 2017, 10, 17-36.
mdl <- function (X) t(atantemp.fun(X))

x = morrisMultOut(model = mdl, factors = 4, r = 50,
                  design = list(type = "oat", levels = 5, grid.jump = 3),
                  binf = -1, bsup = 5, scale = FALSE)
print(x)
plot(x)

x = morrisMultOut(model = NULL, factors = 4, r = 50,
                  design = list(type = "oat", levels = 5, grid.jump = 3),
                  binf = -1, bsup = 5, scale = FALSE)
Y = mdl(x[['X']])
tell(x, Y)
print(x)
plot(x)
Generate parameter sets from given ranges, with chosen sampling scheme
parameterSets(par.ranges, samples, method = c("sobol", "innergrid", "grid"))
par.ranges |
A named list of minimum and maximum parameter values |
samples |
Number of samples to generate. For the |
method |
the sampling scheme; see Details |
Method "sobol" generates uniformly distributed Sobol low discrepancy numbers, using the sobol function in the randtoolbox package.
Method "grid" generates a grid within the parameter ranges, including its extremes, with the number of points determined by samples.
Method "innergrid" generates a grid within the parameter ranges, with the edges of the grid offset from the extremes. The offset is calculated as half of the resolution of the grid: diff(par.ranges)/samples/2.
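The difference between the "grid" and "innergrid" schemes can be sketched for a single parameter in base R. This is an illustration of the offset rule stated above, not the code of parameterSets; the variable names are assumptions.

```r
# One-parameter illustration of the "grid" and "innergrid" schemes
par.range <- c(1, 4)   # minimum and maximum of the parameter
samples <- 10          # number of grid points

# "grid": points include both extremes of the range
grid <- seq(par.range[1], par.range[2], length.out = samples)

# "innergrid": edges offset by half of the grid resolution
offset <- diff(par.range) / samples / 2
innergrid <- seq(par.range[1] + offset, par.range[2] - offset,
                 length.out = samples)

print(range(grid))
print(range(innergrid))

# A full two-parameter grid is the Cartesian product of the per-parameter grids
two_d <- expand.grid(V1 = seq(1, 1000, length.out = 10), V2 = grid)
print(nrow(two_d))
```

With this offset, the inner grid points are spaced by exactly diff(par.range)/samples, so each point sits at the centre of one of `samples` equal-width cells.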
The result is a matrix, with named columns for each parameter in par.ranges. Each row represents one parameter set.
Joseph Guillaume, based on similar function by Felix Andrews
delsa
, which uses this function
X.grid <- parameterSets(par.ranges = list(V1 = c(1,1000), V2 = c(1,4)),
                        samples = c(10,10), method = "grid")
plot(X.grid)

X.innergrid <- parameterSets(par.ranges = list(V1 = c(1,1000), V2 = c(1,4)),
                             samples = c(10,10), method = "innergrid")
points(X.innergrid, col = "red")

library(randtoolbox)
X.sobol <- parameterSets(par.ranges = list(V1 = c(1,1000), V2 = c(1,4)),
                         samples = 100, method = "sobol")
plot(X.sobol)
pcc
computes the Partial Correlation Coefficients (PCC),
Semi-Partial Correlation Coefficients (SPCC), Partial Rank Correlation
Coefficients (PRCC) or Semi-Partial Rank Correlation Coefficients (SPRCC),
which are variance-based measures based on linear (resp. monotonic)
assumptions, in the case of (linearly) correlated factors.
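For intuition, partial and semi-partial correlation coefficients can be computed by hand from regression residuals. The following base R sketch relies on the textbook definitions (correlation between residuals) and on hypothetical toy data; it does not reproduce the internals of pcc.

```r
# Partial (PCC) and semi-partial (SPCC) correlations from regression residuals
set.seed(3)
n <- 200
X <- data.frame(X1 = runif(n, 0.5, 1.5),
                X2 = runif(n, 1.5, 4.5),
                X3 = runif(n, 4.5, 13.5))
y <- with(X, X1 + X2 + X3) + rnorm(n, sd = 0.5)

pcc_manual <- sapply(names(X), function(v) {
  Z <- as.matrix(X[setdiff(names(X), v)])   # all the other inputs
  res_y <- resid(lm(y ~ Z))                 # part of y not explained by the others
  res_x <- resid(lm(X[[v]] ~ Z))            # part of X_v not explained by the others
  cor(res_x, res_y)
})

spcc_manual <- sapply(names(X), function(v) {
  Z <- as.matrix(X[setdiff(names(X), v)])
  res_x <- resid(lm(X[[v]] ~ Z))
  cor(res_x, y)                             # y itself is not residualized
})

print(rbind(PCC = pcc_manual, SPCC = spcc_manual))
```

Applying the same computation to rank(y) and to the ranks of each input would give the rank-based counterparts under the same assumptions.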
pcc(X, y, rank = FALSE, semi = FALSE, logistic = FALSE, nboot = 0, conf = 0.95)

## S3 method for class 'pcc'
print(x, ...)

## S3 method for class 'pcc'
plot(x, ylim = c(-1,1), ...)

## S3 method for class 'pcc'
ggplot(data, mapping = aes(), ..., environment = parent.frame(), ylim = c(-1,1))
X |
a data frame (or object coercible by |
y |
a vector containing the responses corresponding to the design of experiments (model output variables). |
rank |
logical. If |
semi |
logical. If |
logistic |
logical. If |
nboot |
the number of bootstrap replicates. |
conf |
the confidence level of the bootstrap confidence intervals. |
x |
the object returned by |
data |
the object returned by |
ylim |
the y-coordinate limits of the plot. |
mapping |
Default list of aesthetic mappings to use for plot. If not specified, must be supplied in each layer added to the plot. |
environment |
[Deprecated] Used prior to tidy evaluation. |
... |
arguments to be passed to methods, such as graphical
parameters (see |
Logistic regression model (logistic = TRUE
) and rank-based indices
(rank = TRUE
) are incompatible.
pcc
returns a list of class "pcc"
, containing the following
components:
call |
the matched call. |
PCC |
a data frame containing the estimations of the PCC
indices, bias and confidence intervals (if |
PRCC |
a data frame containing the estimations of the PRCC
indices, bias and confidence intervals (if |
SPCC |
a data frame containing the estimations of the SPCC
indices, bias and confidence intervals (if |
SPRCC |
a data frame containing the estimations of the SPRCC
indices, bias and confidence intervals (if |
Gilles Pujol and Bertrand Iooss
L. Clouvel, B. Iooss, V. Chabridon, M. Il Idrissi and F. Robin, 2023, An overview of variance-based importance measures in the linear regression context: comparative analyses and numerical tests, Preprint. https://hal.science/hal-04102053
B. Iooss, V. Chabridon and V. Thouvenot, Variance-based importance measures for machine learning model interpretability, Congres lambda-mu23, Saclay, France, 10-13 octobre 2022 https://hal.science/hal-03741384
J.W. Johnson and J.M. LeBreton, 2004, History and use of relative importance indices in organizational research, Organizational Research Methods, 7:238-257.
A. Saltelli, K. Chan and E. M. Scott eds, 2000, Sensitivity Analysis, Wiley.
# a 100-sample with X1 ~ U(0.5, 1.5)
#                   X2 ~ U(1.5, 4.5)
#                   X3 ~ U(4.5, 13.5)
library(boot)
n <- 100
X <- data.frame(X1 = runif(n, 0.5, 1.5),
                X2 = runif(n, 1.5, 4.5),
                X3 = runif(n, 4.5, 13.5))

# model (nonlinear in X1) : Y = X1^2 + X2 + X3
y <- with(X, X1^2 + X2 + X3)

# sensitivity analysis
x <- pcc(X, y, nboot = 100)
print(x)
plot(x)

library(ggplot2)
ggplot(x)
ggplot(x, ylim = c(-1.5,1.5))

x <- pcc(X, y, semi = TRUE, nboot = 100)
print(x)
plot(x)
PLI computes the Perturbed-Law based Indices (PLI), also known as the Density Modification Based Reliability Sensitivity Indices (DMBRSI), which are robustness indices related to a probability of exceedance of a model output (i.e. a failure probability), estimated by a Monte Carlo method. See Lemaitre et al. (2015).
PLI(failurepoints,failureprobabilityhat,samplesize,deltasvector, InputDistributions,type="MOY",samedelta=TRUE)
failurepoints |
a matrix of failure points coordinates, one column per variable. |
failureprobabilityhat |
the estimate of the failure probability P obtained with a crude Monte Carlo method. |
samplesize |
the size of the sample used to estimate P. One must have failureprobabilityhat = dim(failurepoints)[1]/samplesize |
deltasvector |
a vector containing the values of delta for which the indices will be computed. |
InputDistributions |
a list of lists. Each list contains, as a list, the name of the distribution to be used and the parameters. Implemented cases so far:
|
type |
a character string in which the user specifies the type of perturbation wanted. The meaning of "deltasvector" varies according to the type of perturbation:
|
samedelta |
a boolean used with the value "MOY" for type.
|
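The relationship between failurepoints, failureprobabilityhat and samplesize can be illustrated with a crude Monte Carlo run in base R. The function ishigami below is a hypothetical stand-in written for this sketch (with the usual coefficients 7 and 0.1), so that the block does not depend on the package; the threshold of -7 follows the example of this page.

```r
# Crude Monte Carlo estimate of a failure probability and extraction of failure points
ishigami <- function(X) {   # hypothetical stand-in for ishigami.fun
  sin(X[, 1]) + 7 * sin(X[, 2])^2 + 0.1 * X[, 3]^4 * sin(X[, 1])
}

set.seed(2024)
N <- 100000                                      # samplesize
X <- matrix(runif(3 * N, -pi, pi), nrow = N, ncol = 3)
y <- ishigami(X)

is_failure <- y < -7                             # failure event: output below -7
ptsdef <- X[is_failure, , drop = FALSE]          # failurepoints, one column per input
pdefchap <- nrow(ptsdef) / N                     # failureprobabilityhat

print(nrow(ptsdef))
print(pdefchap)
```

Keeping drop = FALSE guarantees that ptsdef remains a matrix even if only one failure point is found, which matters because the number of rows enters the probability estimate.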
PLI
returns a list of matrices, containing:
A matrix where the PLI are stored. Each column corresponds to an input, each row corresponds to a twist of amplitude delta.
A matrix where their standard deviations are stored.
Paul Lemaitre and Bertrand Iooss
C. Gauchy and J. Stenger and R. Sueur and B. Iooss, An information geometry approach for robustness analysis in uncertainty quantification of computer codes, Technometrics, 64:80-91, 2022.
P. Lemaitre, E. Sergienko, A. Arnaud, N. Bousquet, F. Gamboa and B. Iooss, Density modification based reliability sensitivity analysis, Journal of Statistical Computation and Simulation, 85:1200-1223, 2015.
E. Borgonovo and B. Iooss, 2017, Moment independent importance measures and a common rationale, In: Springer Handbook on UQ, R. Ghanem, D. Higdon and H. Owhadi (Eds).
PLIquantile, PLIquantile_multivar, PLIsuperquantile,
PLIsuperquantile_multivar
# Model: Ishigami function with a threshold at -7
# Failure points are those < -7

distributionIshigami = list()
for (i in 1:3){
  distributionIshigami[[i]]=list("unif",c(-pi,pi))
  distributionIshigami[[i]]$r=("runif")
}

# Monte Carlo sampling to obtain failure points
N = 100000
X = matrix(0,ncol=3,nrow=N)
for( i in 1:3) X[,i] = runif(N,-pi,pi)
T = ishigami.fun(X)
s = sum(as.numeric(T < -7)) # Number of failures
pdefchap = s/N # Failure probability
ptsdef = X[T < -7,] # Failure points

# sensitivity indices with perturbation of the mean
v_delta = seq(-3,3,1/20)
Toto = PLI(failurepoints=ptsdef,failureprobabilityhat=pdefchap,samplesize=N,
  deltasvector=v_delta,InputDistributions=distributionIshigami,type="MOY",
  samedelta=TRUE)
BIshm = Toto[[1]]
SIshm = Toto[[2]]

par(mfrow=c(1,1),mar=c(4,5,1,1))
plot(v_delta,BIshm[,2],ylim=c(-4,4),xlab=expression(delta),
  ylab=expression(hat(PLI[i*delta])),pch=19,cex=1.5)
points(v_delta,BIshm[,1],col="darkgreen",pch=15,cex=1.5)
points(v_delta,BIshm[,3],col="red",pch=17,cex=1.5)
lines(v_delta,BIshm[,2]+1.96*SIshm[,2],col="black")
lines(v_delta,BIshm[,2]-1.96*SIshm[,2],col="black")
lines(v_delta,BIshm[,1]+1.96*SIshm[,1],col="darkgreen")
lines(v_delta,BIshm[,1]-1.96*SIshm[,1],col="darkgreen")
lines(v_delta,BIshm[,3]+1.96*SIshm[,3],col="red")
lines(v_delta,BIshm[,3]-1.96*SIshm[,3],col="red")
abline(h=0,lty=2)
legend(0,3,legend=c("X1","X2","X3"),
  col=c("darkgreen","black","red"),pch=c(15,19,17),cex=1.5)

# sensitivity indices with perturbation of the variance
v_delta = seq(1,5,1/4) # user parameter (the true variance is 3.29)
Toto = PLI(failurepoints=ptsdef,failureprobabilityhat=pdefchap,samplesize=N,
  deltasvector=v_delta,InputDistributions=distributionIshigami,type="VAR",
  samedelta=TRUE)
BIshv=Toto[[1]]
SIshv=Toto[[2]]

par(mfrow=c(2,1),mar=c(1,5,1,1)+0.1)
plot(v_delta,BIshv[,2],ylim=c(-.5,.5),xlab=expression(V_f),
  ylab=expression(hat(PLI[i*delta])),pch=19,cex=1.5)
points(v_delta,BIshv[,1],col="darkgreen",pch=15,cex=1.5)
points(v_delta,BIshv[,3],col="red",pch=17,cex=1.5)
lines(v_delta,BIshv[,2]+1.96*SIshv[,2],col="black")
lines(v_delta,BIshv[,2]-1.96*SIshv[,2],col="black")
lines(v_delta,BIshv[,1]+1.96*SIshv[,1],col="darkgreen")
lines(v_delta,BIshv[,1]-1.96*SIshv[,1],col="darkgreen")
lines(v_delta,BIshv[,3]+1.96*SIshv[,3],col="red")
lines(v_delta,BIshv[,3]-1.96*SIshv[,3],col="red")

par(mar=c(4,5.1,1.1,1.1))
plot(v_delta,BIshv[,2],ylim=c(-30,.7),xlab=expression(V[f]),
  ylab=expression(hat(PLI[i*delta])),pch=19,cex=1.5)
points(v_delta,BIshv[,1],col="darkgreen",pch=15,cex=1.5)
points(v_delta,BIshv[,3],col="red",pch=17,cex=1.5)
lines(v_delta,BIshv[,2]+1.96*SIshv[,2],col="black")
lines(v_delta,BIshv[,2]-1.96*SIshv[,2],col="black")
lines(v_delta,BIshv[,1]+1.96*SIshv[,1],col="darkgreen")
lines(v_delta,BIshv[,1]-1.96*SIshv[,1],col="darkgreen")
lines(v_delta,BIshv[,3]+1.96*SIshv[,3],col="red")
lines(v_delta,BIshv[,3]-1.96*SIshv[,3],col="red")
legend(2.5,-10,legend=c("X1","X2","X3"),col=c("darkgreen","black","red"),
  pch=c(15,19,17),cex=1.5)

##############################################################
# Example with an inverse probability transform
# (to obtain Gaussian inputs from Uniform ones)

# Monte Carlo sampling (the inputs are Uniform)
N = 100000
X = matrix(0,ncol=3,nrow=N)
for( i in 1:3) X[,i] = runif(N,-pi,pi)
T = ishigami.fun(X)
s = sum(as.numeric(T < -7)) # Number of failures
pdefchap = s/N # Failure probability

# Empirical transform (applied on the sample)
Xn <- matrix(0,nrow=N,ncol=3)
for (i in 1:3){
  ecdfx <- ecdf(X[,i])
  q <- ecdfx(X[,i])
  Xn[,i] <- qnorm(q) # Gaussian anamorphosis
  # infinite max values => putting the symmetrical values of min values
  Xn[which(Xn[,i]==Inf),i] <- - Xn[which.min(Xn[,i]),i]
}

# Visualization of a perturbed density (the one of X1 perturbed on the mean)
delta_mean_gauss <- 1 # perturbed value on the mean of the Gaussian transform
# backtransform with the ecdf of X1 (after the loop, ecdfx holds the ecdf of X3)
Xtr <- quantile(ecdf(X[,1]),pnorm(Xn[,1] + delta_mean_gauss))
par(mfrow=c(1,1))
plot(density(Xtr), col="red") ; lines(density(X[,1]))

# sensitivity indices with perturbation of the mean
distributionIshigami = list()
for (i in 1:3){
  distributionIshigami[[i]]=list("norm",c(0,1))
  distributionIshigami[[i]]$r=("rnorm")
}
ptsdef = Xn[T < -7,] # Failure points with Gaussian distribution

v_delta = seq(-1.5,1.5,1/20)
Toto = PLI(failurepoints=ptsdef,failureprobabilityhat=pdefchap,samplesize=N,
  deltasvector=v_delta,InputDistributions=distributionIshigami,type="MOY",
  samedelta=TRUE)
BIshm = Toto[[1]]
SIshm = Toto[[2]]

par(mfrow=c(1,1),mar=c(4,5,1,1))
plot(v_delta,BIshm[,2],ylim=c(-4,4),xlab=expression(delta),
  ylab=expression(hat(PLI[i*delta])),pch=19,cex=1.5)
points(v_delta,BIshm[,1],col="darkgreen",pch=15,cex=1.5)
points(v_delta,BIshm[,3],col="red",pch=17,cex=1.5)
lines(v_delta,BIshm[,2]+1.96*SIshm[,2],col="black")
lines(v_delta,BIshm[,2]-1.96*SIshm[,2],col="black")
lines(v_delta,BIshm[,1]+1.96*SIshm[,1],col="darkgreen")
lines(v_delta,BIshm[,1]-1.96*SIshm[,1],col="darkgreen")
lines(v_delta,BIshm[,3]+1.96*SIshm[,3],col="red")
lines(v_delta,BIshm[,3]-1.96*SIshm[,3],col="red")
abline(h=0,lty=2)
legend(0,3,legend=c("X1","X2","X3"),
  col=c("darkgreen","black","red"),pch=c(15,19,17),cex=1.5)
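The PLI example starts from a crude Monte Carlo estimate of the failure probability. As a rough illustration of how precise that starting point is, the sketch below recomputes it in base R together with a binomial standard error. The function ishigami_sketch is a hypothetical stand-in written out here so that the block runs without the package; it assumes the usual Ishigami coefficients (7 and 0.1), and the normal-approximation interval is an added illustration, not something the package returns.

```r
# Hypothetical stand-in for ishigami.fun (assumption: coefficients a = 7, b = 0.1)
ishigami_sketch <- function(X, a = 7, b = 0.1) {
  sin(X[, 1]) + a * sin(X[, 2])^2 + b * X[, 3]^4 * sin(X[, 1])
}

set.seed(1)
N <- 100000
X <- matrix(runif(3 * N, -pi, pi), ncol = 3)   # three uniform inputs on [-pi, pi]
Y <- ishigami_sketch(X)

fail   <- Y < -7                               # failure indicator
p_hat  <- mean(fail)                           # Monte Carlo failure probability
se_hat <- sqrt(p_hat * (1 - p_hat) / N)        # binomial standard error
ci     <- p_hat + c(-1, 1) * qnorm(0.975) * se_hat  # approximate 95% interval

print(c(p_hat = p_hat, se = se_hat, lower = ci[1], upper = ci[2]))
```

A small number of failure points makes every downstream index noisy, so checking this standard error before calling PLI is a cheap sanity test.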
PLIquantile computes the Perturbed-Law based Indices (PLI) for quantile, which are robustness indices related to a quantile of a model output, estimated by a Monte Carlo method. See Sueur et al. (2017) and Iooss et al. (2022).
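To make the idea of a perturbed quantile concrete, here is a minimal base R sketch. It assumes that a mean perturbation of a standard Gaussian input amounts to shifting its mean by delta, and it estimates the perturbed quantile by reweighting the existing sample with a likelihood ratio. The helper weighted_quantile is hypothetical and this is not the package's internal code. The linear model is the one used in the examples of this page, for which the perturbed quantile is known in closed form.

```r
set.seed(42)
N <- 100000
X <- matrix(rnorm(3 * N), ncol = 3)
Y <- 2 * X[, 1] + X[, 2] + X[, 3] / 2
alpha <- 0.95
delta <- 0.5                                   # shift applied to the mean of X1

# Likelihood ratio between the perturbed law N(delta, 1) and the initial law N(0, 1)
w <- dnorm(X[, 1], mean = delta, sd = 1) / dnorm(X[, 1], mean = 0, sd = 1)

# Hypothetical helper: quantile of y under normalized weights w
weighted_quantile <- function(y, w, prob) {
  o  <- order(y)
  cw <- cumsum(w[o]) / sum(w)                  # weighted empirical cdf
  y[o][which(cw >= prob)[1]]
}

q_ref    <- unname(quantile(Y, alpha))         # quantile under the initial law
q_pert   <- weighted_quantile(Y, w, alpha)     # reweighted (perturbed) quantile
q_theory <- qnorm(alpha, mean = 2 * delta, sd = sqrt(5.25))  # exact value here

print(c(reference = q_ref, perturbed = q_pert, theory = q_theory))
```

No new model evaluation is needed: the same sample (x, y) is reused for every delta, which is why the function only asks for x, y and the input distributions.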
PLIquantile(order,x,y,deltasvector,InputDistributions,type="MOY",samedelta=TRUE, percentage=TRUE,nboot=0,conf=0.95,bootsample=TRUE)
order |
the order of the quantile to estimate. |
x |
the matrix of simulation points coordinates, one column per variable. |
y |
the vector of model outputs. |
deltasvector |
a vector containing the values of delta for which the indices will be computed. |
InputDistributions |
a list of lists. Each inner list contains the name of the distribution to be used and its parameters. Implemented cases so far:
|
type |
a character string specifying the type of perturbation wanted. The meaning of "deltasvector" varies according to the type of perturbation:
|
samedelta |
a boolean used with the value "MOY" for type.
|
percentage |
a boolean that defines the formula used for the PLI.
|
nboot |
the number of bootstrap replicates. |
conf |
the confidence level for bootstrap confidence intervals. |
bootsample |
If TRUE, the uncertainty about the original quantile estimation is taken into account in the PLI confidence intervals (see Iooss et al., 2022). If FALSE, standard confidence intervals are computed for the PLI. It mainly changes the CI at small delta values. |
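The InputDistributions argument is easiest to understand from a concrete object. The sketch below builds two such lists in the form used by the examples of this page: a distribution name followed by a parameter vector. The variable names dist_gauss and dist_unif are illustrative only, and the parameter conventions for the other implemented distributions are not shown because they are not spelled out here.

```r
# Three standard Gaussian inputs: name "norm", parameters c(mean, sd)
dist_gauss <- lapply(1:3, function(i) list("norm", c(0, 1)))

# Three uniform inputs on [-pi, pi]: name "unif", parameters c(min, max)
dist_unif <- lapply(1:3, function(i) list("unif", c(-pi, pi)))

# One element per column of x, each element being a two-item list
str(dist_gauss[[1]])
str(dist_unif[[1]])
```

The list must have one entry per column of x, in the same order as the columns.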
PLIquantile
returns a list of matrices (each column corresponds to an input,
each row corresponds to a twist of amplitude delta)
containing the following components:
PLI |
the PLI. |
PLICIinf |
the bootstrap lower confidence interval values of the PLI. |
PLICIsup |
the bootstrap upper confidence interval values of the PLI. |
quantile |
the perturbed quantile. |
quantileCIinf |
the bootstrap lower confidence interval values of the perturbed quantile. |
quantileCIsup |
the bootstrap upper confidence interval values of the perturbed quantile. |
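The confidence bounds listed above come from bootstrap replicates. As a rough picture of what nboot and conf control, the following base R sketch computes a percentile bootstrap interval for an unperturbed quantile. The percentile scheme is an assumption made for illustration; the resampling actually performed by the package, in particular the effect of bootsample, may differ.

```r
set.seed(123)
N <- 5000
X <- matrix(rnorm(3 * N), ncol = 3)
Y <- 2 * X[, 1] + X[, 2] + X[, 3] / 2

alpha <- 0.95   # quantile order
nboot <- 200    # number of bootstrap replicates
conf  <- 0.95   # confidence level

q_hat <- unname(quantile(Y, alpha))

# Resample the outputs with replacement and recompute the quantile each time
q_boot <- replicate(nboot, {
  idx <- sample.int(N, N, replace = TRUE)
  unname(quantile(Y[idx], alpha))
})

# Percentile interval at level conf
ci <- unname(quantile(q_boot, c((1 - conf) / 2, (1 + conf) / 2)))
print(c(estimate = q_hat, lower = ci[1], upper = ci[2]))
```

With nboot as small as 20, as in the examples, the interval bounds themselves are noisy, which is why the example comments suggest a larger value.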
Paul Lemaitre, Bertrand Iooss, Thibault Delage and Roman Sueur
T. Delage, R. Sueur and B. Iooss, 2018, Robustness analysis of epistemic uncertainties propagation studies in LOCA assessment thermal-hydraulic model, ANS Best Estimate Plus Uncertainty International Conference (BEPU 2018), Lucca, Italy, May 13-19, 2018.
C. Gauchy, J. Stenger, R. Sueur and B. Iooss, 2022, An information geometry approach for robustness analysis in uncertainty quantification of computer codes, Technometrics, 64:80-91.
B. Iooss, V. Verges and V. Larget, 2022, BEPU robustness analysis via perturbed law-based sensitivity indices, Proceedings of the Institution of Mechanical Engineers, Part O: Journal of Risk and Reliability, 236:855-865.
P. Lemaitre, E. Sergienko, A. Arnaud, N. Bousquet, F. Gamboa and B. Iooss, 2015, Density modification based reliability sensitivity analysis, Journal of Statistical Computation and Simulation, 85:1200-1223.
R. Sueur, N. Bousquet, B. Iooss and J. Bect, 2016, Perturbed-Law based sensitivity Indices for sensitivity analysis in structural reliability, Proceedings of the SAMO 2016 Conference, Reunion Island, France, December 2016.
R. Sueur, B. Iooss and T. Delage, 2017, Sensitivity analysis using perturbed-law based indices for quantiles and application to an industrial case, 10th International Conference on Mathematical Methods in Reliability (MMR 2017), Grenoble, France, July 2017.
PLI, PLIsuperquantile, PLIquantile_multivar,
PLIsuperquantile_multivar
# Model: 3D function

distribution = list()
for (i in 1:3) distribution[[i]]=list("norm",c(0,1))

# Monte Carlo sampling
N = 5000
X = matrix(0,ncol=3,nrow=N)
for(i in 1:3) X[,i] = rnorm(N,0,1)
Y = 2 * X[,1] + X[,2] + X[,3]/2

alpha <- 0.95 # quantile order
q95 = quantile(Y,alpha)
nboot=20 # put nboot=200 for consistency

# sensitivity indices with perturbation of the mean
v_delta = seq(-1,1,1/10)
toto = PLIquantile(alpha,X,Y,deltasvector=v_delta,
  InputDistributions=distribution,type="MOY",samedelta=TRUE,
  percentage=FALSE,nboot=nboot)

# Plotting the PLI
par(mar=c(4,5,1,1))
plot(v_delta,toto$PLI[,2],ylim=c(-1.5,1.5),xlab=expression(delta),
  ylab=expression(hat(PLI[i*delta])),pch=19,cex=1.5)
points(v_delta,toto$PLI[,1],col="darkgreen",pch=15,cex=1.5)
points(v_delta,toto$PLI[,3],col="red",pch=17,cex=1.5)
lines(v_delta,toto$PLICIinf[,2],col="black")
lines(v_delta,toto$PLICIsup[,2],col="black")
lines(v_delta,toto$PLICIinf[,1],col="darkgreen")
lines(v_delta,toto$PLICIsup[,1],col="darkgreen")
lines(v_delta,toto$PLICIinf[,3],col="red")
lines(v_delta,toto$PLICIsup[,3],col="red")
abline(h=0,lty=2)
legend(0.8,1.5,legend=c("X1","X2","X3"),
  col=c("darkgreen","black","red"),pch=c(15,19,17),cex=1.5)

# Plotting the perturbed quantiles
par(mar=c(4,5,1,1))
plot(v_delta,toto$quantile[,2],ylim=c(1.5,6.5),xlab=expression(delta),
  ylab=expression(hat(q[i*delta])),pch=19,cex=1.5)
points(v_delta,toto$quantile[,1],col="darkgreen",pch=15,cex=1.5)
points(v_delta,toto$quantile[,3],col="red",pch=17,cex=1.5)
lines(v_delta,toto$quantileCIinf[,2],col="black")
lines(v_delta,toto$quantileCIsup[,2],col="black")
lines(v_delta,toto$quantileCIinf[,1],col="darkgreen")
lines(v_delta,toto$quantileCIsup[,1],col="darkgreen")
lines(v_delta,toto$quantileCIinf[,3],col="red")
lines(v_delta,toto$quantileCIsup[,3],col="red")
abline(h=q95,lty=2)
legend(0.5,2.4,legend=c("X1","X2","X3"),
  col=c("darkgreen","black","red"),pch=c(15,19,17),cex=1.5)

###########################################################
# Plotting the PLI in percentage with refined confidence intervals
toto = PLIquantile(alpha,X,Y,deltasvector=v_delta,
  InputDistributions=distribution,type="MOY",samedelta=TRUE,
  percentage=TRUE,nboot=nboot,bootsample=FALSE)

par(mar=c(4,5,1,1))
plot(v_delta,toto$PLI[,2],ylim=c(-0.6,0.6),xlab=expression(delta),
  ylab=expression(hat(PLI[i*delta])),pch=19,cex=1.5)
points(v_delta,toto$PLI[,1],col="darkgreen",pch=15,cex=1.5)
points(v_delta,toto$PLI[,3],col="red",pch=17,cex=1.5)
lines(v_delta,toto$PLICIinf[,2],col="black")
lines(v_delta,toto$PLICIsup[,2],col="black")
lines(v_delta,toto$PLICIinf[,1],col="darkgreen")
lines(v_delta,toto$PLICIsup[,1],col="darkgreen")
lines(v_delta,toto$PLICIinf[,3],col="red")
lines(v_delta,toto$PLICIsup[,3],col="red")
abline(h=0,lty=2)
legend(0,0.6,legend=c("X1","X2","X3"),
  col=c("darkgreen","black","red"),pch=c(15,19,17),cex=1.5)

###################################################
# another visualization by using the plotCI() fct
# (from plotrix package) for the CI plotting (from Vanessa Verges)
library(plotrix)
parameters = list(colors=c("darkgreen","black","red"),
  symbols=c(15,19,17),overlay=c(FALSE,TRUE,TRUE))
par(mar=c(4,5,1,1),xpd=TRUE)
for (i in 1:3){
  plotCI(v_delta,toto$PLI[,i],ui=toto$PLICIsup[,i],li=toto$PLICIinf[,i],
    cex=1.5,col=parameters$colors[i],pch=parameters$symbols[i],
    add=parameters$overlay[i], xlab="", ylab="")
}
title(xlab=expression(delta),ylab=expression(hat(PLI[i*delta])),
  main=bquote("PLI-quantile (N ="~.(N) ~ ","~alpha~"="~.(alpha)~
  ") of Y="~2*X[1] + X[2] + X[3]/2))
abline(h=0,lty=2)
legend("topleft",legend=c("X1","X2","X3"),col=parameters$colors,
  pch=parameters$symbols,cex=1.5)
PLIquantile_multivar computes the Perturbed-Law based Indices (PLI) for quantile and simultaneous perturbations of the means of 2 inputs, estimated by a Monte Carlo method.
PLIquantile_multivar(order,x,y,inputs,deltasvector,InputDistributions,samedelta=TRUE, percentage=TRUE,nboot=0,conf=0.95,bootsample=TRUE)
order |
the order of the quantile to estimate. |
x |
the matrix of simulation points coordinates, one column per variable. |
y |
the vector of model outputs. |
inputs |
the vector of the two inputs' indices for which the indices will be computed. |
deltasvector |
a vector containing the values of the perturbed means for which the indices will be computed. Warning: if samedelta=FALSE, deltasvector has to be the vector of deltas (mean perturbations) |
InputDistributions |
a list of lists. Each inner list contains
the name of the distribution to be used and its parameters.
Implemented cases so far (for a mean perturbation):
Gaussian, Uniform, Triangle, Left Truncated Gaussian,
Left Truncated Gumbel. Using Gumbel requires the package |
samedelta |
a boolean used with the value "MOY" for type.
|
percentage |
a boolean that defines the formula used for the PLI.
|
nboot |
the number of bootstrap replicates. |
conf |
the confidence level for bootstrap confidence intervals. |
bootsample |
If TRUE, the uncertainty about the original quantile estimation is taken into account in the PLI confidence intervals (see Iooss et al., 2022). If FALSE, standard confidence intervals are computed for the PLI. It mainly changes the CI at small delta values. |
This function does not allow perturbations on the variance of the inputs' distributions.
PLIquantile_multivar
returns a list of matrices
(delta twist of input 1 (in rows) vs. delta twist of input 2 (in columns))
containing the following components:
PLI |
the PLI. |
PLICIinf |
the bootstrap lower confidence interval values of the PLI. |
PLICIsup |
the bootstrap upper confidence interval values of the PLI. |
quantile |
the perturbed quantile. |
quantileCIinf |
the bootstrap lower confidence interval values of the perturbed quantile. |
quantileCIsup |
the bootstrap upper confidence interval values of the perturbed quantile. |
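The row and column layout of the returned matrices can be pictured with the linear Gaussian model used in the examples, for which the perturbed quantile has a closed form. The sketch below is an illustration in base R under the assumption that a mean perturbation is a plain mean shift; it builds the exact quantile for every pair of shifts and does not call the package. Rows index the shift of input 1 and columns the shift of input 2.

```r
alpha   <- 0.95
v_delta <- seq(-1, 1, 1/10)
sd_y    <- sqrt(2^2 + 1^2 + 0.5^2)   # sd of Y = 2*X1 + X2 + X3/2 with N(0,1) inputs

# Exact perturbed quantile when the means of X1 and X2 are shifted by d1 and d2
q_grid <- outer(v_delta, v_delta,
                function(d1, d2) qnorm(alpha, mean = 2 * d1 + d2, sd = sd_y))
dimnames(q_grid) <- list(delta1 = round(v_delta, 2), delta2 = round(v_delta, 2))

# The diagonal corresponds to the same shift applied to both inputs
q_same <- diag(q_grid)

print(dim(q_grid))
print(round(q_same[c(1, 11, 21)], 3))
```

This is why the example plots diag(toto12$PLI) against v_delta: the diagonal is the slice where both inputs receive the same perturbation.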
Bertrand Iooss
T. Delage, R. Sueur and B. Iooss, 2018, Robustness analysis of epistemic uncertainties propagation studies in LOCA assessment thermal-hydraulic model, ANS Best Estimate Plus Uncertainty International Conference (BEPU 2018), Lucca, Italy, May 13-19, 2018.
B. Iooss, V. Verges and V. Larget, 2022, BEPU robustness analysis via perturbed law-based sensitivity indices, Proceedings of the Institution of Mechanical Engineers, Part O: Journal of Risk and Reliability, 236:855-865.
P. Lemaitre, E. Sergienko, A. Arnaud, N. Bousquet, F. Gamboa and B. Iooss, 2015, Density modification based reliability sensitivity analysis, Journal of Statistical Computation and Simulation, 85:1200-1223.
R. Sueur, N. Bousquet, B. Iooss and J. Bect, 2016, Perturbed-Law based sensitivity Indices for sensitivity analysis in structural reliability, Proceedings of the SAMO 2016 Conference, Reunion Island, France, December 2016.
R. Sueur, B. Iooss and T. Delage, 2017, Sensitivity analysis using perturbed-law based indices for quantiles and application to an industrial case, 10th International Conference on Mathematical Methods in Reliability (MMR 2017), Grenoble, France, July 2017.
PLI, PLIquantile, PLIsuperquantile, PLIsuperquantile_multivar
# Model: 3D function

distribution = list()
for (i in 1:3) distribution[[i]]=list("norm",c(0,1))
N = 5000
X = matrix(0,ncol=3,nrow=N)
for(i in 1:3) X[,i] = rnorm(N,0,1)
Y = 2 * X[,1] + X[,2] + X[,3]/2

alpha <- 0.95
nboot <- 20 # put nboot=200 for consistency
q95 = quantile(Y,alpha)

v_delta = seq(-1,1,1/10)
toto12 = PLIquantile_multivar(alpha,X,Y,c(1,2),deltasvector=v_delta,
  InputDistributions=distribution,samedelta=TRUE)
toto = PLIquantile(alpha,X,Y,deltasvector=v_delta,InputDistributions=distribution,
  type="MOY",samedelta=TRUE,nboot=0)

par(mar=c(4,5,1,1))
plot(v_delta,diag(toto12$PLI),ylim=c(-1,1),xlab=expression(delta),
  ylab=expression(hat(PLI[i*delta])),pch=16,cex=1.5,col="blue")
points(v_delta,toto$PLI[,1],col="darkgreen",pch=15,cex=1.5)
points(v_delta,toto$PLI[,2],col="black",pch=19,cex=1.5)
points(v_delta,toto$PLI[,3],col="red",pch=17,cex=1.5)
abline(h=0,lty=2)
legend(-1,1.,legend=c("X1","X2","X3","X1X2"),col=c("darkgreen","black","red","blue"),
  pch=c(15,19,17,16),cex=1.5)

# with bootstrap
v_delta = seq(-1,1,2/10)
toto12 = PLIquantile_multivar(alpha,X,Y,c(1,2),deltasvector=v_delta,
  InputDistributions=distribution,samedelta=TRUE,nboot=nboot,bootsample=FALSE)
toto = PLIquantile(alpha,X,Y,deltasvector=v_delta,InputDistributions=distribution,
  type="MOY",samedelta=TRUE,nboot=nboot,bootsample=FALSE)

par(mar=c(4,5,1,1))
plot(v_delta,diag(toto12$PLI),ylim=c(-1,1),xlab=expression(delta),
  ylab=expression(hat(PLI[i*delta])),pch=16,cex=1.5,col="blue")
points(v_delta,toto$PLI[,1],col="darkgreen",pch=15,cex=1.5)
points(v_delta,toto$PLI[,2],col="black",pch=19,cex=1.5)
points(v_delta,toto$PLI[,3],col="red",pch=17,cex=1.5)
lines(v_delta,diag(toto12$PLICIinf),col="blue")
lines(v_delta,diag(toto12$PLICIsup),col="blue")
lines(v_delta,toto$PLICIinf[,2],col="black")
lines(v_delta,toto$PLICIsup[,2],col="black")
lines(v_delta,toto$PLICIinf[,1],col="darkgreen")
lines(v_delta,toto$PLICIsup[,1],col="darkgreen")
lines(v_delta,toto$PLICIinf[,3],col="red")
lines(v_delta,toto$PLICIsup[,3],col="red")
abline(h=0,lty=2)
legend(-1,1,legend=c("X1","X2","X3","X1X2"),col=c("darkgreen","black","red","blue"),
  pch=c(15,19,17,16),cex=1.5)

###################################################
# other visualizations by using the plotrix,
# viridisLite, lattice and grid packages (from Vanessa Verges)
library(plotrix)
parameters = list(colors=c("darkgreen","black","red"),symbols=c(15,19,17))
par(mar=c(4,5,1,1),xpd=TRUE)
plotCI(v_delta,diag(toto12$PLI),ui=diag(toto12$PLICIsup),li=diag(toto12$PLICIinf),
  xlab=expression(delta),ylab=expression(hat(PLI[i*delta])),
  main=bquote("PLI-quantile (N ="~.(N) ~ ","~alpha~"="~.(alpha)~
  ") on "~X[1]~"and"~X[2]~"of Y="~2*X[1] + X[2] + X[3]/2),
  cex=1.5,col="blue",pch=16)
for (i in 1:3){
  plotCI(v_delta,toto$PLI[,i],ui=toto$PLICIsup[,i],li=toto$PLICIinf[,i],
    cex=1.5,col=parameters$colors[i],pch=parameters$symbols[i],
    add=TRUE)
}
abline(h=0,lty=2)
legend("topleft",legend=c("X1","X2","X3","X1X2"),
  col=c(parameters$colors,"blue"),pch=c(parameters$symbols,16),cex=1.5)

# Visualization of all the PLIs (at any paired combinations of deltas)
library(viridisLite)
library(lattice)
library(grid)
colnames(toto12$PLI) = round(v_delta,2)
rownames(toto12$PLI) = round(v_delta,2)
coul = viridis(100)
levelplot(toto12$PLI, col.regions = coul,
  xlab=bquote(delta[X~.(1)]), ylab=bquote(delta[X~.(2)]),
  main=bquote(hat(PLI)[quantile[~X[1]~X[2]]]))
PLIsuperquantile computes the Perturbed-Law based Indices (PLI) for superquantile, which are robustness indices related to a superquantile of a model output, estimated by a Monte Carlo method. See Iooss et al. (2022).
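For readers unfamiliar with the term, the superquantile of order alpha is the mean of the output beyond its alpha-quantile. The following base R sketch, which does not call the package, computes it in two equivalent empirical ways for the linear Gaussian model used in the examples of this page and compares the result with the Gaussian closed form. The closed form holds only because Y is centered Gaussian here.

```r
set.seed(7)
N <- 100000
X <- matrix(rnorm(3 * N), ncol = 3)
Y <- 2 * X[, 1] + X[, 2] + X[, 3] / 2
alpha <- 0.95

q <- unname(quantile(Y, alpha))          # empirical quantile of order alpha

# Two empirical estimates of the superquantile
sq_a <- mean(Y * (Y > q)) / (1 - alpha)  # tail integral divided by 1 - alpha
sq_b <- mean(Y[Y > q])                   # mean of the outputs above the quantile

# Closed form for a centered Gaussian output with standard deviation sd_y
sd_y      <- sqrt(2^2 + 1^2 + 0.5^2)
sq_theory <- sd_y * dnorm(qnorm(alpha)) / (1 - alpha)

print(c(sq_a = sq_a, sq_b = sq_b, theory = sq_theory, quantile = q))
```

The superquantile is always at least as large as the quantile of the same order, so it is a more conservative summary of the upper tail.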
PLIsuperquantile(order,x,y,deltasvector,InputDistributions,type="MOY", samedelta=TRUE, percentage=TRUE,nboot=0,conf=0.95,bootsample=TRUE,bias=TRUE)
order |
the order of the superquantile to estimate. |
x |
the matrix of simulation points coordinates, one column per variable. |
y |
the vector of model outputs. |
deltasvector |
a vector containing the values of delta for which the indices will be computed. |
InputDistributions |
a list of lists. Each inner list contains the name of the distribution to be used and its parameters. Implemented cases so far:
|
type |
a character string specifying the type of perturbation wanted. The meaning of "deltasvector" varies according to the type of perturbation:
|
samedelta |
a boolean used with the value "MOY" for type.
|
percentage |
a boolean that defines the formula used for the PLI.
|
nboot |
the number of bootstrap replicates. |
conf |
the confidence level for bootstrap confidence intervals. |
bootsample |
If TRUE, the uncertainty about the original quantile estimation is taken into account in the PLI confidence intervals (see Iooss et al., 2022). If FALSE, standard confidence intervals are computed for the PLI. It mainly changes the CI at small delta values. |
bias |
defines the version of PLI-superquantile:
|
PLIsuperquantile
returns a list of matrices (each column corresponds to an input,
each row corresponds to a twist of amplitude delta)
containing the following components:
PLI |
the PLI. |
PLICIinf |
the bootstrap lower confidence interval values of the PLI. |
PLICIsup |
the bootstrap upper confidence interval values of the PLI. |
superquantile |
the perturbed superquantile. |
superquantileCIinf |
the bootstrap lower confidence interval values of the perturbed superquantile. |
superquantileCIsup |
the bootstrap upper confidence interval values of the perturbed superquantile. |
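As an illustration of what a "perturbed superquantile" can mean, the sketch below reweights an existing Monte Carlo sample by a likelihood ratio so that one standard Gaussian input behaves as if its mean had been shifted by delta, then recomputes the superquantile from the weighted sample. This is a minimal sketch under stated assumptions (standard Gaussian input, mean shift, self-normalized weights); perturbed_superquantile is a hypothetical helper, and nothing here claims to reproduce the internal formulas of PLIsuperquantile or the effect of its bias argument.

```r
# Sketch only: hypothetical helper, not a function of the sensitivity package.
# Assumes x_i is a sample from N(0, 1) and the perturbed law is N(delta, 1).
perturbed_superquantile <- function(x_i, y, alpha, delta) {
  w <- dnorm(x_i, mean = delta) / dnorm(x_i)   # likelihood ratio at each point
  w <- w / sum(w)                              # self-normalized weights
  ord <- order(y)
  cum_w <- cumsum(w[ord])
  # weighted alpha-quantile: smallest y whose cumulative weight reaches alpha
  q <- y[ord][which(cum_w >= alpha)[1]]
  in_tail <- y > q
  sum(w[in_tail] * y[in_tail]) / sum(w[in_tail])  # weighted mean of the tail
}

set.seed(42)
N <- 100000
X <- matrix(rnorm(3 * N), ncol = 3)
Y <- 2 * X[, 1] + X[, 2] + X[, 3] / 2

sq0 <- perturbed_superquantile(X[, 1], Y, 0.95, 0)  # no perturbation
sq1 <- perturbed_superquantile(X[, 1], Y, 0.95, 1)  # mean of X1 shifted by +1
print(c(unperturbed = sq0, perturbed = sq1))
```

No new model evaluations are needed: the same sample (X, Y) is reused for every delta, which is why the outputs can be stored as matrices with one row per delta and one column per input.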
Bertrand Iooss
B. Iooss, V. Verges and V. Larget, 2022, BEPU robustness analysis via perturbed law-based sensitivity indices, Proceedings of the Institution of Mechanical Engineers, Part O: Journal of Risk and Reliability, 236:855-865.
P. Lemaitre, E. Sergienko, A. Arnaud, N. Bousquet, F. Gamboa and B. Iooss, 2015, Density modification based reliability sensitivity analysis, Journal of Statistical Computation and Simulation, 85:1200-1223.
PLI, PLIquantile, PLIsuperquantile_multivar
# Model: 3D function
distribution = list()
for (i in 1:3) distribution[[i]]=list("norm",c(0,1))

# Monte Carlo sampling
N = 10000
X = matrix(0,ncol=3,nrow=N)
for(i in 1:3) X[,i] = rnorm(N,0,1)
Y = 2 * X[,1] + X[,2] + X[,3]/2
alpha <- 0.95
q95 = quantile(Y,alpha)
sq95a <- mean(Y*(Y>q95)/(1-alpha)) ; sq95b <- mean(Y[Y>q95])
nboot=20 # change to nboot=200 for consistency

# sensitivity indices with perturbation of the mean
v_delta = seq(-1,1,1/10)
toto = PLIsuperquantile(alpha,X,Y,deltasvector=v_delta,
  InputDistributions=distribution,type="MOY",samedelta=TRUE,
  percentage=FALSE,nboot=nboot,bias=TRUE)

# Plotting the PLI
par(mar=c(4,5,1,1))
plot(v_delta,toto$PLI[,2],ylim=c(-0.5,0.5),xlab=expression(delta),
  ylab=expression(hat(PLI[i*delta])),pch=19,cex=1.5)
points(v_delta,toto$PLI[,1],col="darkgreen",pch=15,cex=1.5)
points(v_delta,toto$PLI[,3],col="red",pch=17,cex=1.5)
lines(v_delta,toto$PLICIinf[,2],col="black")
lines(v_delta,toto$PLICIsup[,2],col="black")
lines(v_delta,toto$PLICIinf[,1],col="darkgreen")
lines(v_delta,toto$PLICIsup[,1],col="darkgreen")
lines(v_delta,toto$PLICIinf[,3],col="red")
lines(v_delta,toto$PLICIsup[,3],col="red")
abline(h=0,lty=2)
legend(-1,0.5,legend=c("X1","X2","X3"),
  col=c("darkgreen","black","red"),pch=c(15,19,17),cex=1.5)

# Plotting the perturbed superquantiles
par(mar=c(4,5,1,1))
plot(v_delta,toto$superquantile[,2],ylim=c(3,7),xlab=expression(delta),
  ylab=expression(hat(q[i*delta])),pch=19,cex=1.5)
points(v_delta,toto$superquantile[,1],col="darkgreen",pch=15,cex=1.5)
points(v_delta,toto$superquantile[,3],col="red",pch=17,cex=1.5)
lines(v_delta,toto$superquantileCIinf[,2],col="black")
lines(v_delta,toto$superquantileCIsup[,2],col="black")
lines(v_delta,toto$superquantileCIinf[,1],col="darkgreen")
lines(v_delta,toto$superquantileCIsup[,1],col="darkgreen")
lines(v_delta,toto$superquantileCIinf[,3],col="red")
lines(v_delta,toto$superquantileCIsup[,3],col="red")
abline(h=q95,lty=2)
legend(-1,7,legend=c("X1","X2","X3"),
  col=c("darkgreen","black","red"),pch=c(15,19,17),cex=1.5)

# Plotting the unbiased PLI in percentage with refined confidence intervals
toto = PLIsuperquantile(alpha,X,Y,deltasvector=v_delta,
  InputDistributions=distribution,type="MOY",samedelta=TRUE,percentage=TRUE,
  nboot=nboot,bootsample=FALSE,bias=FALSE)
par(mar=c(4,5,1,1))
plot(v_delta,toto$PLI[,2],ylim=c(-0.4,0.5),xlab=expression(delta),
  ylab=expression(hat(PLI[i*delta])),pch=19,cex=1.5)
points(v_delta,toto$PLI[,1],col="darkgreen",pch=15,cex=1.5)
points(v_delta,toto$PLI[,3],col="red",pch=17,cex=1.5)
lines(v_delta,toto$PLICIinf[,2],col="black")
lines(v_delta,toto$PLICIsup[,2],col="black")
lines(v_delta,toto$PLICIinf[,1],col="darkgreen")
lines(v_delta,toto$PLICIsup[,1],col="darkgreen")
lines(v_delta,toto$PLICIinf[,3],col="red")
lines(v_delta,toto$PLICIsup[,3],col="red")
abline(h=0,lty=2)
legend(-1,0.5,legend=c("X1","X2","X3"),
  col=c("darkgreen","black","red"),pch=c(15,19,17),cex=1.5)

##################################################
# another visualization by using the plotCI() fct
# (from plotrix package) for the CI plotting (from Vanessa Verges)
library(plotrix)
parameters = list(colors=c("darkgreen","black","red"),symbols=c(15,19,17),
  overlay=c(FALSE,TRUE,TRUE))
par(mar=c(4,5,1,1),xpd=TRUE)
for (i in 1:3){
  plotCI(v_delta,toto$PLI[,i],ui=toto$PLICIsup[,i],li=toto$PLICIinf[,i],
    cex=1.5,col=parameters$colors[i],pch=parameters$symbols[i],
    add=parameters$overlay[i], xlab="", ylab="")
}
title(xlab=expression(delta),ylab=expression(hat(PLI[i*delta])),
  main=bquote("PLI-superquantile (N ="~.(N) ~ ","~alpha~"="~.(alpha)~
  ") of Y="~2*X[1] + X[2] + X[3]/2))
abline(h=0,lty=2)
legend("topleft",legend=c("X1","X2","X3"),
  col=parameters$colors,pch=parameters$symbols,cex=1.5)
PLIsuperquantile_multivar computes the Perturbed-Law based Indices (PLI) for superquantile and simultaneous perturbations of the means of 2 inputs, estimated by a Monte Carlo method.
PLIsuperquantile_multivar(order,x,y,inputs,deltasvector,InputDistributions, samedelta=TRUE, percentage=TRUE,nboot=0,conf=0.95,bootsample=TRUE,bias=TRUE)
order |
the order of the superquantile to estimate. |
x |
the matrix of simulation points coordinates, one column per variable. |
y |
the vector of model outputs. |
inputs |
the vector of the two inputs' indices for which the indices will be computed. |
deltasvector |
a vector containing the values of the perturbed means for which the indices will be computed. Warning: if samedelta=FALSE, deltasvector has to be the vector of deltas (mean perturbations) |
InputDistributions |
a list of list. Each list contains, as a list,
the name of the distribution to be used and the parameters.
Implemented cases so far (for a mean perturbation):
Gaussian, Uniform, Triangle, Left Truncated Gaussian,
Left Truncated Gumbel. Using Gumbel requires the package |
samedelta |
a boolean used with the value "MOY" for type.
|
percentage |
a boolean that defines the formula used for the PLI.
|
nboot |
the number of bootstrap replicates. |
conf |
the confidence level for bootstrap confidence intervals. |
bootsample |
If TRUE, the uncertainty about the original quantile estimation is taken into account in the PLI confidence intervals (see Iooss et al., 2021). If FALSE, standard confidence intervals are computed for the PLI. It mainly changes the CI at small delta values. |
bias |
defines the version of PLI-superquantile:
|
This function does not allow perturbations on the variance of the inputs' distributions.
PLIsuperquantile_multivar
returns a list of matrices
(delta twist of input 1 (in rows) vs. delta twist of input 2 (in columns))
containing the following components:
PLI |
the PLI. |
PLICIinf |
the bootstrap lower confidence interval values of the PLI. |
PLICIsup |
the bootstrap upper confidence interval values of the PLI. |
quantile |
the perturbed quantile. |
quantileCIinf |
the bootstrap lower confidence interval values of the perturbed superquantile. |
quantileCIsup |
the bootstrap upper confidence interval values of the perturbed superquantile. |
Bertrand Iooss
T. Delage, R. Sueur and B. Iooss, 2018, Robustness analysis of epistemic uncertainties propagation studies in LOCA assessment thermal-hydraulic model, ANS Best Estimate Plus Uncertainty International Conference (BEPU 2018), Lucca, Italy, May 13-19, 2018.
B. Iooss, V. Verges and V. Larget, 2022, BEPU robustness analysis via perturbed law-based sensitivity indices, Proceedings of the Institution of Mechanical Engineers, Part O: Journal of Risk and Reliability, 236:855-865.
P. Lemaitre, E. Sergienko, A. Arnaud, N. Bousquet, F. Gamboa and B. Iooss, 2015, Density modification based reliability sensitivity analysis, Journal of Statistical Computation and Simulation, 85:1200-1223.
R. Sueur, N. Bousquet, B. Iooss and J. Bect, 2016, Perturbed-Law based sensitivity Indices for sensitivity analysis in structural reliability, Proceedings of the SAMO 2016 Conference, Reunion Island, France, December 2016.
R. Sueur, B. Iooss and T. Delage, 2017, Sensitivity analysis using perturbed-law based indices for quantiles and application to an industrial case, 10th International Conference on Mathematical Methods in Reliability (MMR 2017), Grenoble, France, July 2017.
PLI, PLIquantile, PLIsuperquantile, PLIquantile_multivar
# Model: 3D function
distribution = list()
for (i in 1:3) distribution[[i]]=list("norm",c(0,1))
N = 10000
X = matrix(0,ncol=3,nrow=N)
for(i in 1:3) X[,i] = rnorm(N,0,1)
Y = 2 * X[,1] + X[,2] + X[,3]/2
alpha <- 0.95
nboot <- 20 # put nboot=200 for consistency
q95 = quantile(Y,alpha)
sq95a <- mean(Y*(Y>q95)/(1-alpha)) ; sq95b <- mean(Y[Y>q95])
v_delta = seq(-1,1,1/10)
toto12 = PLIsuperquantile_multivar(alpha,X,Y,c(1,2),deltasvector=v_delta,
  InputDistributions=distribution,samedelta=TRUE,bias=FALSE)
toto = PLIsuperquantile(alpha,X,Y,deltasvector=v_delta,InputDistributions=distribution,
  type="MOY",samedelta=TRUE,nboot=0,bias=FALSE)
par(mar=c(4,5,1,1))
plot(v_delta,diag(toto12$PLI),ylim=c(-1,1),xlab=expression(delta),
  ylab=expression(hat(PLI[i*delta])),pch=16,cex=1.5,col="blue")
points(v_delta,toto$PLI[,1],col="darkgreen",pch=15,cex=1.5)
points(v_delta,toto$PLI[,2],col="black",pch=19,cex=1.5)
points(v_delta,toto$PLI[,3],col="red",pch=17,cex=1.5)
abline(h=0,lty=2)
legend(-1,1.,legend=c("X1","X2","X3","X1X2"),col=c("darkgreen","black","red","blue"),
  pch=c(15,19,17,16),cex=1.5)

# with bootstrap (put in comment because too long for the CRAN tests)
v_delta = seq(-1,1,2/10)
toto12 = PLIsuperquantile_multivar(alpha,X,Y,c(1,2),deltasvector=v_delta,
  InputDistributions=distribution,samedelta=TRUE,nboot=nboot,bootsample=FALSE,bias=FALSE)
toto = PLIsuperquantile(alpha,X,Y,deltasvector=v_delta,InputDistributions=distribution,
  type="MOY",samedelta=TRUE,nboot=nboot,bootsample=FALSE,bias=FALSE)
par(mar=c(4,5,1,1))
plot(v_delta,diag(toto12$PLI),ylim=c(-1,1),xlab=expression(delta),
  ylab=expression(hat(PLI[i*delta])),pch=16,cex=1.5,col="blue")
points(v_delta,toto$PLI[,1],col="darkgreen",pch=15,cex=1.5)
points(v_delta,toto$PLI[,2],col="black",pch=19,cex=1.5)
points(v_delta,toto$PLI[,3],col="red",pch=17,cex=1.5)
lines(v_delta,diag(toto12$PLICIinf),col="blue")
lines(v_delta,diag(toto12$PLICIsup),col="blue")
lines(v_delta,toto$PLICIinf[,2],col="black")
lines(v_delta,toto$PLICIsup[,2],col="black")
lines(v_delta,toto$PLICIinf[,1],col="darkgreen")
lines(v_delta,toto$PLICIsup[,1],col="darkgreen")
lines(v_delta,toto$PLICIinf[,3],col="red")
lines(v_delta,toto$PLICIsup[,3],col="red")
abline(h=0,lty=2)
legend(-1,1,legend=c("X1","X2","X3","X1X2"),col=c("darkgreen","black","red","blue"),
  pch=c(15,19,17,16),cex=1.5)

###################################################
# other visualizations by using the plotrix,
# viridisLite, lattice and grid packages (from Vanessa Verges)
library(plotrix)
parameters = list(colors=c("darkgreen","black","red"),symbols=c(15,19,17))
par(mar=c(4,5,1,1),xpd=TRUE)
plotCI(v_delta,diag(toto12$PLI),ui=diag(toto12$PLICIsup),li=diag(toto12$PLICIinf),
  xlab=expression(delta),ylab=expression(hat(PLI[i*delta])),
  main=bquote("PLI-superquantile (N ="~.(N) ~ ","~alpha~"="~.(alpha)~
  ") on "~X[1]~"and"~X[2]~"of Y="~2*X[1] + X[2] + X[3]/2),
  cex=1.5,col="blue",pch=16)
for (i in 1:3){
  plotCI(v_delta,toto$PLI[,i],ui=toto$PLICIsup[,i],li=toto$PLICIinf[,i],
    cex=1.5,col=parameters$colors[i],pch=parameters$symbols[i],
    add=TRUE)
}
abline(h=0,lty=2)
legend("topleft",legend=c("X1","X2","X3","X1X2"),
  col=c(parameters$colors,"blue"),pch=c(parameters$symbols,16),cex=1.5)

# Visu of all the PLIs (at any paired combinations of deltas)
library(viridisLite)
library(lattice)
library(grid)
colnames(toto12$PLI) = round(v_delta,2)
rownames(toto12$PLI) = round(v_delta,2)
coul = viridis(100)
levelplot(toto12$PLI,col.regions=coul,main=bquote(hat(PLI)[superquantile[~X[1]~X[2]]]),
  xlab=bquote(delta[X~.(1)]),ylab=bquote(delta[X~.(2)]))
Methods to plot the normalized support index functions (Fruth et al., 2016).
## S3 method for class 'support'
plot(x, i = 1:ncol(x$X), xprob = FALSE, p = NULL, p.arg = NULL, ylim = NULL, col = 1:3, lty = 1:3, lwd = c(2,2,1), cex = 1, ...)
## S3 method for class 'support'
scatterplot(x, i = 1:ncol(x$X), xprob = FALSE, p = NULL, p.arg = NULL, cex = 1, cex.lab = 1, ...)
x |
an object of class support. |
i |
an optional vector of integers indicating the subset of input variables |
xprob |
an optional boolean indicating whether the inputs should be plotted in probability scale. |
p |
, |
p.arg |
list of probability names and parameters for the input distribution. |
ylim |
, |
col |
, |
lty |
, |
lwd |
, |
cex |
, |
cex.lab |
usual graphical parameters. |
... |
additional graphical parameters to be passed to |
If xprob = TRUE, the input variable X_i is plotted in probability scale according to the information provided in the arguments p, p.arg: the x-axis is thus F(x), where F is the cdf of X_i. If these are not provided, the empirical distribution is used for rescaling: the x-axis is thus Fn(x), where Fn is the empirical cdf of X_i.
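The two rescalings described above can be sketched in base R as follows. The sketch calls pnorm directly for the known-distribution case; how p and p.arg encode the distribution inside the package is not shown here, so the Gaussian choice and its parameters are assumptions made for illustration.

```r
set.seed(3)
x <- rnorm(500, mean = 2, sd = 0.5)

# Known input distribution: x-axis becomes F(x), with F the cdf of X_i
u_known <- pnorm(x, mean = 2, sd = 0.5)

# No distribution supplied: x-axis becomes Fn(x), the empirical cdf of X_i
u_emp <- ecdf(x)(x)

# Both rescalings map the input onto [0, 1] and preserve its ordering
range(u_known)
range(u_emp)
```

When the assumed distribution is correct, F(x) and Fn(x) stay close to each other, so the two plots look nearly the same.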
Legend details:
zeta*T : normalized total support index function
zeta* : normalized 1st-order support index function
nu* : normalized DGSM
Notice that the sum of (normalized) DGSM (nu*) over all input variables is equal to 1. Furthermore, the expectation of the total support index function (zeta*T) is equal to the (normalized) DGSM (nu*).
O. Roustant
Estimation of support index functions: support
pme_knn computes the proportional marginal effects (PME), from Herin et al. (2024), via a nearest neighbour estimation. Parallelized computations are possible to accelerate the estimation process. It can be used with categorical inputs (which are transformed with one-hot encoding before computing the nearest neighbours), dependent inputs and multiple outputs. For large sample sizes, the nearest neighbour algorithm can be significantly accelerated by using approximate nearest neighbour search.
pme_knn(model=NULL, X, method = "knn", tol = NULL, marg = T, n.knn = 2, n.limit = 2000, noise = F, rescale = F, nboot = NULL, boot.level = 0.8, conf=0.95, parl=NULL, ...)
## S3 method for class 'pme_knn'
tell(x, y, ...)
## S3 method for class 'pme_knn'
print(x, ...)
## S3 method for class 'pme_knn'
plot(x, ylim = c(0,1), ...)
## S3 method for class 'pme_knn'
ggplot(data, mapping = aes(), ylim = c(0, 1), ..., environment = parent.frame())
model |
a function defining the model to analyze, taking X as an argument. |
X |
a matrix or data frame containing the observed inputs. |
method |
the algorithm to be used for estimation, either "rank" or "knn",
see details. Default is |
tol |
tolerance under which an input is considered as being a zero input. See details. |
marg |
whether to choose the closed Sobol' ( |
n.knn |
the number of nearest neighbours used for estimation. |
n.limit |
sample size limit above which approximate nearest neighbour search is activated. |
noise |
a logical which is TRUE if the model or the output sample is noisy. See details. |
rescale |
a logical indicating if continuous inputs must be rescaled before distance computations.
If TRUE, continuous inputs are first whitened with the ZCA-cor whitening procedure
(cf. whiten() function in package |
nboot |
the number of bootstrap resamples for the bootstrap estimate of confidence intervals. See details. |
boot.level |
a numeric between 0 and 1 for the proportion of the bootstrap sample size. |
conf |
the confidence level of the bootstrap confidence intervals. |
parl |
number of cores on which to parallelize the computation. If
|
x |
the object returned by |
data |
the object returned by |
y |
a numeric univariate vector containing the observed outputs. |
ylim |
the y-coordinate limits for plotting. |
mapping |
Default list of aesthetic mappings to use for plot. If not specified, must be supplied in each layer added to the plot. |
environment |
[Deprecated] Used prior to tidy evaluation. |
... |
additional arguments to be passed to |
For method="rank", the estimator is defined in Gamboa et al. (2020) following Chatterjee (2019). For first-order indices it is based on an input ranking (same algorithm as in sobolrank) while for higher orders, it uses an approximate heuristic solution of the traveling salesman problem applied to the input sample distances (cf. TSP() function in package TSP). For method="knn", ranking and TSP are replaced by a nearest neighbour search as proposed in Broto et al. (2020) and in Azadkia & Chatterjee (2020) for a similar coefficient.
The computation is done using the subset procedure, defined in Broto, Bachoc and Depecker (2020), that is computing all the Sobol' closed indices for all possible sub-models first, and then computing the proportional values recursively, as detailed in Feldman (2005), but using an extension to non strictly positive games (Herin et al., 2024).
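The first step of the subset procedure, listing every sub-model, can be sketched in base R as below. This only enumerates the non-empty subsets of input indices; the ordering used here, and the choice to leave out the empty set, are assumptions and may differ from the layout of the indices component returned by pme_knn.

```r
d <- 3  # number of inputs

# All non-empty subsets of {1, ..., d}, grouped by increasing size
subsets <- unlist(
  lapply(1:d, function(k) combn(d, k, simplify = FALSE)),
  recursive = FALSE
)

length(subsets)   # 2^d - 1 sub-models
for (s in subsets) cat("{", paste(s, collapse = ","), "}\n")
```

The number of sub-models doubles with every added input, which is why this procedure is costly beyond a moderate number of inputs.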
Since bootstrap creates ties which are not accounted for in the algorithm, confidence intervals are obtained by sampling without replacement a proportion boot.level of the total sample size, drawn uniformly.
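A minimal base R sketch of this resampling scheme is given below. The use of floor() to turn the proportion into a subsample size is an assumption; the package may round differently.

```r
set.seed(7)
n <- 1000          # total sample size
boot.level <- 0.8  # proportion of the sample kept in each replicate
nboot <- 10        # number of replicates

# Each replicate is a uniform draw WITHOUT replacement, so it contains no ties
idx_list <- replicate(
  nboot,
  sample(n, size = floor(boot.level * n), replace = FALSE),
  simplify = FALSE
)

sapply(idx_list, length)          # subsample size of each replicate
sapply(idx_list, anyDuplicated)   # 0 means no repeated observation
```

Each element of idx_list would then be used to subset the rows of X and y before re-estimating the indices.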
If the outputs are noisy, the argument noise can be used: it only has an impact on the estimation of one specific sensitivity index, namely the closed Sobol' index of the full set of inputs, Var(E[Y | X]) / Var(Y). If there is no noise this index is equal to 1, while in the presence of noise it must be estimated.
The distance used for subsets with mixed inputs (continuous and categorical) is the Euclidean distance, thanks to a one-hot encoding of categorical inputs.
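The idea can be illustrated in base R: the categorical column is expanded into indicator columns, after which an ordinary Euclidean distance applies to the whole row. The data frame and column names below are made up for illustration, and the package's exact encoding may differ from model.matrix().

```r
# Toy data: one continuous input and one categorical input (hypothetical names)
df <- data.frame(
  x1 = c(0.1, 0.5, 0.9),
  colour = factor(c("red", "blue", "red"))
)

# One-hot encoding: one 0/1 column per level, no intercept
onehot <- model.matrix(~ colour - 1, data = df)
Z <- cbind(x1 = df$x1, onehot)

# Euclidean distances between rows of the encoded matrix
D <- as.matrix(dist(Z))
print(Z)
print(round(D, 3))
```

Rows 1 and 3 share the same level, so their distance is just the gap in x1; rows 1 and 2 differ in level, which adds a contribution of 2 under the square root.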
If too many cores for the machine are passed on to the parl argument, the chosen number of cores is defaulted to the available cores minus one.
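The fallback rule can be sketched with the parallel package shipped with R. The helper choose_cores is hypothetical, and the guards for an undetectable core count and for a minimum of one core are additions made here, not behaviour stated in this page.

```r
library(parallel)

# Hypothetical helper illustrating the "available cores minus one" fallback
choose_cores <- function(parl, available = detectCores()) {
  if (is.na(available)) available <- 1L   # detectCores() may return NA
  if (parl > available) max(1L, available - 1L) else parl
}

choose_cores(4, available = 8)    # request fits: kept as is
choose_cores(64, available = 8)   # request too large: falls back to 7
```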
If marg = TRUE (default), the value function used to compute the proportional values is given by the total Sobol' indices (dual of the underlying cooperative game). If marg = FALSE, then the closed Sobol' indices are used instead. Differences may appear between the two.
Zero inputs are defined by the tol
argument. If null
,
then inputs with:
are considered as zero input in the detection of spurious variables. If provided, zero inputs are detected when:
pme_knn
returns a list of class "pme_knn"
:
call |
the matched call. |
PME |
the estimations of the PME indices. |
VE |
the estimations of the closed Sobol' indices for all possible sub-models. |
indices |
list of all subsets corresponding to the structure of VE. |
method |
which estimation method has been used. |
conf_int |
a matrix containing the estimations, bias and confidence
intervals by bootstrap (if |
X |
the observed covariates. |
y |
the observed outcomes. |
n.knn |
value of the |
rescale |
whether the design matrix has been rescaled. |
n.limit |
value of the |
boot.level |
value of the |
noise |
whether the PME must sum up to one or not. |
boot |
logical, whether bootstrap confidence interval estimates have been performed. |
nboot |
value of the |
parl |
value of the |
conf |
value of the |
marg |
value of the |
tol |
value of the |
Marouane Il Idrissi, Margot Herin
Azadkia M., Chatterjee S., 2021, A simple measure of conditional dependence, Ann. Statist. 49(6):3070-3102.
Chatterjee, S., 2021, A new coefficient of correlation, Journal of the American Statistical Association, 116:2009-2022.
Gamboa, F., Gremaud, P., Klein, T., & Lagnoux, A., 2022, Global Sensitivity Analysis: a novel generation of mighty estimators based on rank statistics, Bernoulli 28: 2345-2374.
Broto B., Bachoc F. and Depecker M. (2020) Variance Reduction for Estimation of Shapley Effects and Adaptation to Unknown Input Distribution. SIAM/ASA Journal on Uncertainty Quantification, 8(2).
M. Herin, M. Il Idrissi, V. Chabridon and B. Iooss, Proportional marginal effects for sensitivity analysis with correlated inputs, Proceedings of the 10th International Conference on Sensitivity Analysis of Model Output (SAMO 2022), p 42-43, Tallahassee, Florida, March 2022.
M. Herin, M. Il Idrissi, V. Chabridon and B. Iooss, 2024, Proportional marginal effects for global sensitivity analysis, SIAM/ASA Journal of Uncertainty Quantification, 12:667-692.
M. Il Idrissi, V. Chabridon and B. Iooss (2021). Developments and applications of Shapley effects to reliability-oriented sensitivity analysis with correlated inputs. Environmental Modelling & Software, 143, 105115.
B. Iooss, V. Chabridon and V. Thouvenot, Variance-based importance measures for machine learning model interpretability, Congres lambda-mu23, Saclay, France, 10-13 octobre 2022 https://hal.science/hal-03741384
Feldman, B. (2005) Relative Importance and Value. SSRN Electronic Journal.
sobolrank, shapleysobol_knn, shapleyPermEx, shapleySubsetMc, lmg, pmvd
library(parallel)
library(doParallel)
library(foreach)
library(gtools)
library(boot)
library(RANN)

###########################################################
# Linear Model with Gaussian correlated inputs
library(mvtnorm)
set.seed(1234)
n <- 1000
beta<-c(1,-1,0.5)
sigma<-matrix(c(1,0,0,
                0,1,-0.8,
                0,-0.8,1),
              nrow=3, ncol=3)
X <-rmvnorm(n, rep(0,3), sigma)
colnames(X)<-c("X1","X2", "X3")
y <- X%*%beta + rnorm(n,0,2)

# Without Bootstrap confidence intervals
x<-pme_knn(model=NULL, X=X, n.knn=3, noise=TRUE)
tell(x,y)
print(x)
plot(x)

# With Bootstrap confidence intervals
x<-pme_knn(model=NULL, X=X, nboot=10, n.knn=3, noise=TRUE,
  boot.level=0.7, conf=0.95)
tell(x,y)
print(x)
plot(x)

#####################################################
# Test case: the Ishigami function
# Example with given data and the use of approximate nearest neighbour search
n <- 5000
X <- data.frame(matrix(-pi+2*pi*runif(3 * n), nrow = n))
Y <- ishigami.fun(X)
x <- pme_knn(model = NULL, X = X, method = "knn", n.knn = 5, n.limit = 2000)
tell(x,Y)
plot(x)
library(ggplot2) ; ggplot(x)

######################################################
# Test case : Linear model (3 Gaussian inputs including 2 dependent) with scaling
# See Iooss and Prieur (2019)
library(mvtnorm) # Multivariate Gaussian variables
library(whitening) # For scaling
modlin <- function(X) apply(X,1,sum)
d <- 3
n <- 10000
mu <- rep(0,d)
sig <- c(1,1,2)
ro <- 0.9
Cormat <- matrix(c(1,0,0,0,1,ro,0,ro,1),d,d)
Covmat <- ( sig %*% t(sig) ) * Cormat
Xall <- function(n) mvtnorm::rmvnorm(n,mu,Covmat)
X <- Xall(n)
x <- pme_knn(model = modlin, X = X, method = "knn", n.knn = 5,
  rescale = TRUE, n.limit = 2000)
print(x)
plot(x)
pmvd computes the PMVD indices derived from Feldman (2005) applied to the explained variance (R^2) as a performance metric. They allow for relative importance indices by R^2 decomposition for linear and logistic regression models. These indices allocate a share of R^2 to each input based on a proportional attribution system, allowing for covariates with null regression coefficients to have indices equal to 0, despite their potential dependence with other covariates (exclusion principle).
pmvd(X, y, logistic = FALSE, tol = NULL, rank = FALSE, nboot = 0,
     conf = 0.95, max.iter = 1000, parl = NULL)

## S3 method for class 'pmvd'
print(x, ...)

## S3 method for class 'pmvd'
plot(x, ylim = c(0,1), ...)
X |
a matrix or data frame containing the observed covariates (i.e., features, input variables...). |
y |
a numeric vector containing the observed outcomes (i.e.,
dependent variable). If |
logistic |
logical. If |
tol |
covariates with absolute marginal contributions less or equal to
|
rank |
logical. If |
nboot |
the number of bootstrap replicates for the computation of confidence intervals. |
conf |
the confidence level of the bootstrap confidence intervals. |
max.iter |
if |
parl |
number of cores on which to parallelize the computation. If
|
x |
the object returned by |
ylim |
the y-coordinate limits of the plot. |
... |
arguments to be passed to methods, such as graphical
parameters (see |
The computation of the PMVD is done using the recursive method defined in Feldman (2005), but using the subset procedure defined in Broto, Bachoc and Depecker (2020), that is computing all the $R^2$ for all possible sub-models first, and then computing $P(\cdot)$ recursively for all subsets of covariates. See Il Idrissi et al. (2021).
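As a rough illustration of the first step of this subset procedure, the following base-R sketch fits every linear sub-model and stores its $R^2$. The simulated data and the names `subsets` and `r2` are hypothetical and are not taken from the package; the recursive computation of the PMVD itself is not reproduced here.

```r
# Illustrative data (assumed, not from the package examples)
set.seed(1)
n <- 200
X <- data.frame(X1 = rnorm(n), X2 = rnorm(n), X3 = rnorm(n))
y <- X$X1 - 2 * X$X2 + rnorm(n)
d <- ncol(X)

# Enumerate all non-empty subsets of covariate indices
subsets <- unlist(lapply(seq_len(d), function(k) combn(d, k, simplify = FALSE)),
                  recursive = FALSE)

# Fit one linear sub-model per subset and keep its R^2
r2 <- vapply(subsets, function(s) {
  dat <- data.frame(y = y, X[, s, drop = FALSE])
  summary(lm(y ~ ., data = dat))$r.squared
}, numeric(1))
names(r2) <- vapply(subsets, paste, character(1), collapse = ",")

print(round(r2, 3))
```

With d covariates this requires 2^d - 1 model fits, which is why the number of covariates has to stay moderate for this kind of procedure.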
For logistic regression (logistic=TRUE), the $R^2$ value is equal to:
$$R^2 = 1 - \frac{\text{model deviance}}{\text{null deviance}}$$
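A minimal sketch of such a quantity, assuming the usual deviance-based definition of $R^2$ for a binomial GLM (one minus the ratio of the model deviance to the null deviance). The simulated data and variable names are hypothetical.

```r
# Illustrative binary outcome (assumed data)
set.seed(42)
n <- 300
x1 <- rnorm(n)
x2 <- rnorm(n)
y <- as.numeric(x1 - x2 + rnorm(n) > 0)

# Logistic regression fitted as a binomial GLM
fit <- glm(y ~ x1 + x2, family = binomial)

# Deviance-based R^2
r2_dev <- 1 - fit$deviance / fit$null.deviance
print(r2_dev)
```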
If either a logistic regression model (logistic = TRUE), or any column of X is categorical (i.e., of class factor), then the rank-based indices cannot be computed. In both those cases, rank = FALSE is forced by default (with a warning).
If too many cores for the machine are passed on to the parl argument, the chosen number of cores is defaulted to the available cores minus one.
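The capping rule described above can be sketched as follows. The helper `cap_cores` is hypothetical and is not part of the package; it only mimics the stated behaviour using `parallel::detectCores()`.

```r
library(parallel)

# Hypothetical helper (not a package function): cap the requested number of cores
cap_cores <- function(parl) {
  available <- detectCores()
  if (is.na(available)) return(1L)   # detectCores() may return NA on some platforms
  if (parl > available) max(1L, available - 1L) else parl
}

cap_cores(1)      # a feasible request is returned unchanged
cap_cores(10000)  # an excessive request falls back to (available cores - 1)
```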
Spurious covariates are defined by the tol argument. If NULL, then covariates with:
are omitted, and their pmvd index is set to zero. In other cases, the spurious covariates are detected by:
pmvd returns a list of class "pmvd", containing the following components:
call |
the matched call. |
pmvd |
a data frame containing the estimations of the PMVD indices. |
R2s |
the estimations of the |
indices |
list of all subsets corresponding to the structure of R2s. |
P |
the values of |
conf_int |
a matrix containing the estimates, bias and confidence
intervals by bootstrap (if |
X |
the observed covariates. |
y |
the observed outcomes. |
logistic |
logical. |
boot |
logical. |
nboot |
number of bootstrap replicates. |
rank |
logical. |
parl |
number of chosen cores for the computation. |
conf |
level for the confidence intervals by bootstrap. |
Marouane Il Idrissi
Broto B., Bachoc F. and Depecker M. (2020) Variance Reduction for Estimation of Shapley Effects and Adaptation to Unknown Input Distribution. SIAM/ASA Journal on Uncertainty Quantification, 8(2).
D.V. Budescu (1993). Dominance analysis: A new approach to the problem of relative importance of predictors in multiple regression. Psychological Bulletin, 114:542-551.
L. Clouvel, B. Iooss, V. Chabridon, M. Il Idrissi and F. Robin, 2024, An overview of variance-based importance measures in the linear regression context: comparative analyses and numerical tests, Preprint. https://hal.science/hal-04102053
Feldman, B. (2005). Relative Importance and Value. SSRN Electronic Journal.
U. Gromping (2006). Relative importance for linear regression in R: the Package relaimpo. Journal of Statistical Software, 17:1-27.
M. Il Idrissi, V. Chabridon and B. Iooss (2021). Mesures d'importance relative par decompositions de la performance de modeles de regression, Actes des 52emes Journees de Statistiques de la Societe Francaise de Statistique (SFdS), pp 497-502, Nice, France, Juin 2021
B. Iooss, V. Chabridon and V. Thouvenot, Variance-based importance measures for machine learning model interpretability, Congres lambda-mu23, Saclay, France, 10-13 octobre 2022 https://hal.science/hal-03741384
library(parallel)
library(gtools)
library(boot)
library(mvtnorm)

set.seed(1234)
n <- 100
beta <- c(1,-2,3)
sigma <- matrix(c(1,0,0,
                  0,1,-0.8,
                  0,-0.8,1),
                nrow=3, ncol=3)

############################
# Gaussian correlated inputs
X <- rmvnorm(n, rep(0,3), sigma)

#############################
# Linear Model
y <- X%*%beta + rnorm(n)

# Without bootstrap confidence intervals
x <- pmvd(X, y)
print(x)
plot(x)

# With bootstrap confidence intervals
x <- pmvd(X, y, nboot=100, conf=0.95)
print(x)
plot(x)

# Rank-based analysis
x <- pmvd(X, y, rank=TRUE, nboot=100, conf=0.95)
print(x)
plot(x)

############################
# Logistic Regression
y <- as.numeric(X%*%beta + rnorm(n)>0)
x <- pmvd(X,y, logistic = TRUE)
plot(x)
print(x)

# Parallel computing
#x<-pmvd(X,y, logistic = TRUE, parl=2)
#plot(x)
#print(x)
This program computes the squared coefficient of the function decomposition in the tensor basis formed by eigenfunctions of Poincare differential operators. After division by the variance of the model output, it provides lower bounds of first-order and total Sobol' indices.
PoincareChaosSqCoef(PoincareEigen, multiIndex, design, output, outputGrad = NULL, inputIndex = 1, der = FALSE, method = "unbiased")
PoincareEigen |
output list from PoincareOptimal() function |
multiIndex |
vector of indices (l1, ..., ld). A coordinate equal to 0 corresponds to the constant basis function 1 |
design |
design of experiments (matrix of size n x d) with d the number of inputs and n the number of observations |
output |
vector of length n (y1, ..., yn) of output values at |
outputGrad |
matrix n x d whose columns contain the output partial derivatives at |
inputIndex |
index of the input variable (between 1 and d) |
der |
logical (default=FALSE): should we use the formula with derivatives to compute the squared coefficient? |
method |
"biased" or "unbiased" formula when estimating the squared integral. See |
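Similarly to polynomial chaos, where tensors of polynomials are used, we consider here the tensor basis formed by eigenfunctions of Poincare differential operators. This basis is also orthonormal, and the Parseval formula leads to lower bounds for (unnormalized) Sobol' indices, total Sobol' indices, and any variance-based index.
Denoting by $e_{1, i_1}(x_1) \cdots e_{d, i_d}(x_d)$ one tensor basis function, the corresponding coefficient is equal to
$$c_{i_1, \dots, i_d} = \int f(x_1, \dots, x_d)\, e_{1, i_1}(x_1) \cdots e_{d, i_d}(x_d)\, \mu(x_1, \dots, x_d)\, dx_1 \cdots dx_d.$$
For a given input variable (say $x_1$ to simplify notations), it can be rewritten with derivatives as:
$$c_{i_1, \dots, i_d} = \frac{1}{\lambda_{1, i_1}} \int \frac{\partial f}{\partial x_1}(x_1, \dots, x_d)\, e'_{1, i_1}(x_1)\, e_{2, i_2}(x_2) \cdots e_{d, i_d}(x_d)\, \mu(x_1, \dots, x_d)\, dx_1 \cdots dx_d,$$
where $\lambda_{1, i_1}$ is the eigenvalue associated with the eigenfunction $e_{1, i_1}$.
The function returns an estimate of $c_{i_1, \dots, i_d}^2$, corresponding to one of these two forms (derivative-free, or derivative-based).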
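The two forms can be compared by hand in a case where the eigenfunctions are known in closed form. The sketch below does not call the package: it assumes a uniform input on [-1/2, 1/2], whose first non-constant Neumann eigenfunction is sqrt(2) cos(pi (x + 1/2)) with eigenvalue pi^2, and uses the plain squared Monte Carlo mean rather than the package's "biased" or "unbiased" estimators. The test function is the one used in the package example, for which the exact squared coefficient of multi-index (1, 0) is 8/pi^4.

```r
set.seed(0)
n <- 1e5
a <- 3
X <- matrix(runif(2 * n, min = -1/2, max = 1/2), ncol = 2)

# Test function f = x1 + a*x1*x2 and its partial derivative with respect to x1
f   <- X[, 1] + a * X[, 1] * X[, 2]
df1 <- 1 + a * X[, 2]

# First non-constant eigenfunction of the uniform law on [-1/2, 1/2] (closed form)
lambda1 <- pi^2
e1  <- function(x)  sqrt(2) * cos(pi * (x + 1/2))
de1 <- function(x) -sqrt(2) * pi * sin(pi * (x + 1/2))

# Multi-index (1, 0): the basis function is e1(x1) times the constant function 1
c_free <- mean(f * e1(X[, 1]))                 # derivative-free form
c_der  <- mean(df1 * de1(X[, 1])) / lambda1    # derivative-based form

print(c(free = c_free^2, der = c_der^2, exact = 8 / pi^4))
```

Both estimates approach 8/pi^4, which is slightly below the exact unnormalized first-order index 1/12, as expected from a lower bound.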
An estimate of the squared coefficient.
Olivier Roustant and Bertrand Iooss
O. Roustant, F. Gamboa and B. Iooss, Parseval inequalities and lower bounds for variance-based sensitivity indices, Electronic Journal of Statistics, 14:386-412, 2020
# A simple example
g <- function(x, a){
  res <- x[, 1] + a*x[, 1]*x[, 2]
  attr(res, "grad") <- cbind(1 + a * x[, 2], a * x[, 1])
  return(res)
}

n <- 1e3
set.seed(0)
X <- matrix(runif(2*n, min = -1/2, max = 1/2), nrow = n, ncol = 2)
a <- 3
fX <- g(X, a = a)

out_1 <- out_2 <- PoincareOptimal(distr = list("unif", -1/2, 1/2),
                                  only.values = FALSE, der = TRUE,
                                  method = "quad")
out <- list(out_1, out_2)

# Lower bounds for X1
c2_10 <- PoincareChaosSqCoef(PoincareEigen = out, multiIndex = c(1, 0),
                             design = X, output = fX,
                             outputGrad = attr(fX, "grad"),
                             inputIndex = 1, der = FALSE)
c2_11 <- PoincareChaosSqCoef(PoincareEigen = out, multiIndex = c(1, 1),
                             design = X, output = fX,
                             outputGrad = attr(fX, "grad"),
                             inputIndex = 1, der = FALSE)
c2_10_der <- PoincareChaosSqCoef(PoincareEigen = out, multiIndex = c(1, 0),
                                 design = X, output = fX,
                                 outputGrad = attr(fX, "grad"),
                                 inputIndex = 1, der = TRUE)
c2_11_der <- PoincareChaosSqCoef(PoincareEigen = out, multiIndex = c(1, 1),
                                 design = X, output = fX,
                                 outputGrad = attr(fX, "grad"),
                                 inputIndex = 1, der = TRUE)

LB1 <- c(8/pi^4, c2_10, c2_10_der)
LB1tot <- LB1 + c(64/pi^8 * a^2, c2_11, c2_11_der)
LB <- cbind(LB1, LB1tot)
rownames(LB) <- c("True lower bound value",
                  "Estimated, no derivatives",
                  "Estimated, with derivatives")
colnames(LB) <- c("D1", "D1tot")

cat("True values of D1 and D1tot:", c(1/12, 1/12 + a^2 / 144),"\n")
cat("Sample size: ", n, "\n")
cat("Lower bounds computed with the first Poincare eigenvalue:\n")
print(LB)
cat("\nN.B. Increase the sample size to see the convergence to true lower bound values.\n")

############################################################
# Flood model example (see Roustant et al., 2017, 2019)

library(evd)      # Gumbel law
library(triangle) # Triangular law

# Flood model
Fcrues_full2 = function(X, ans=0){
  # ans=1 gives Overflow output; ans=2 gives Cost output; ans=0 gives both
  mat = matrix(X, ncol=8)
  if (ans==0){
    reponse = matrix(NA, nrow(mat), 2)
  } else {
    reponse = rep(NA, nrow(mat))
  }
  for (i in 1:nrow(mat)) {
    H = (mat[i,1] / (mat[i,2]*mat[i,8]*sqrt((mat[i,4] - mat[i,3])/mat[i,7])))^(0.6)
    S = mat[i,3] + H - mat[i,5] - mat[i,6]
    if (S > 0){
      Cp = 1
    } else {
      Cp = 0.2 + 0.8 * (1 - exp(-1000 / S^4))
    }
    if (mat[i,5]>8){
      Cp = Cp + mat[i,5]/20
    } else {
      Cp = Cp + 8/20
    }
    if (ans==0){
      reponse[i,1] = S
      reponse[i,2] = Cp
    }
    if (ans==1){ reponse[i] = S }
    if (ans==2){ reponse[i] = Cp }
  }
  return(RES=reponse)
}

# Flood model derivatives (by finite-differences)
dFcrues_full2 <- function(X, i, ans, eps){
  der = X
  X1 = X
  X1[,i] = X[,i]+eps
  der = (Fcrues_full2(X1,ans) - Fcrues_full2(X,ans))/(eps)
  return(der)
}

# Function for flood model inputs sampling
EchantFcrues_full2 <- function(taille){
  X = matrix(NA,taille,8)
  X[,1] = rgumbel.trunc(taille,loc=1013.0,scale=558.0,min=500,max=3000)
  X[,2] = rnorm.trunc(taille,mean=30.0,sd=8,min=15.)
  X[,3] = rtriangle(taille,a=49,b=51,c=50)
  X[,4] = rtriangle(taille,a=54,b=56,c=55)
  X[,5] = runif(taille,min=7,max=9)
  X[,6] = rtriangle(taille,a=55,b=56,c=55.5)
  X[,7] = rtriangle(taille,a=4990,b=5010,c=5000)
  X[,8] = rtriangle(taille,a=295,b=305,c=300)
  return(X)
}

d <- 8
n <- 1e3
eps <- 1e-7 # finite-differences for derivatives
x <- EchantFcrues_full2(n)
yy <- Fcrues_full2(x, ans=2)
y <- scale(yy, center = TRUE, scale = FALSE)[,1]
dy <- NULL
for (i in 1:d) dy <- cbind(dy, dFcrues_full2(x, i, ans=2, eps))

method <- "quad"
out_1 <- PoincareOptimal(distr = list("gumbel", 1013, 558), min=500,max=3000,
                         only.values = FALSE, der = TRUE, method = method)
out_2 <- PoincareOptimal(distr = list("norm", 30, 8), min=15, max=200,
                         only.values = FALSE, der = TRUE, method = method)
out_3 <- PoincareOptimal(distr = list("triangle", 49, 51, 50),
                         only.values = FALSE, der = TRUE, method = method)
out_4 <- PoincareOptimal(distr = list("triangle", 54, 56, 55),
                         only.values = FALSE, der = TRUE, method = method)
out_5 <- PoincareOptimal(distr = list("unif", 7, 9),
                         only.values = FALSE, der = TRUE, method = method)
out_6 <- PoincareOptimal(distr = list("triangle", 55, 56, 55.5),
                         only.values = FALSE, der = TRUE, method = method)
out_7 <- PoincareOptimal(distr = list("triangle", 4990, 5010, 5000),
                         only.values = FALSE, der = TRUE, method = method)
out_8 <- PoincareOptimal(distr = list("triangle", 295, 305, 300),
                         only.values = FALSE, der = TRUE, method = method)
out_ <- list(out_1,out_2,out_3,out_4,out_5,out_6,out_7,out_8)

c2 <- c2der <- c2tot <- c2totder <- rep(0,d)
for (i in 1:d){
  m <- diag(1,d,d) ; m[,i] <- 1
  for (j in 1:d){
    cc <- PoincareChaosSqCoef(PoincareEigen = out_, multiIndex = m[j,],
                              design = x, output = y, outputGrad = NULL,
                              inputIndex = i, der = FALSE)
    c2tot[i] <- c2tot[i] + cc
    if (j == i) c2[i] <- cc
    cc <- PoincareChaosSqCoef(PoincareEigen = out_, multiIndex = m[j,],
                              design = x, output = y, outputGrad = dy,
                              inputIndex = i, der = TRUE)
    c2totder[i] <- c2totder[i] + cc
    if (j == i) c2der[i] <- cc
  }
}

print("Lower bounds of first-order Sobol' indices without derivatives:")
print(c2/var(y))
print("Lower bounds of first-order Sobol' indices with derivatives:")
print(c2der/var(y))
print("Lower bounds of total Sobol' indices without derivatives:")
print(c2tot/var(y))
print("Lower bounds of total Sobol' indices with derivatives:")
print(c2totder/var(y))
A DGSM is a sensitivity index relying on the integral (over the space domain of the input variables) of the squared derivatives of a model output with respect to one model input variable. The product between a DGSM and a Poincare constant (Roustant et al., 2014; Roustant et al., 2017) gives an upper bound of the total Sobol' index corresponding to the same input (Lamboni et al., 2013; Kucherenko and Iooss, 2016).
This Poincare constant depends on the type of probability distribution of the input variable. In the particular case of log-concave distributions, analytical formulas based on double-exponential transport are available by way of the median value (Lamboni et al., 2013). For truncated log-concave distributions, different formulas are available (Roustant et al., 2014). For general distributions (truncated or not), some Poincare constants can be computed via a relatively simple optimization process using different formulas derived from transport inequalities (Roustant et al., 2017).
Note that the analytical formula for the log-concave case is a special case of the double-exponential transport. In all cases, with this function, the smallest constant is obtained using the logistic transport formula. PoincareOptimal gives the best (optimal) constant using another (spectral) method.
IMPORTANT: This program is useless for the two following input variable distributions:
uniform on the interval $[a, b]$: The optimal Poincare constant is $\frac{(b-a)^2}{\pi^2}$.
normal with a standard deviation $\sigma$: The optimal Poincare constant is $\sigma^2$.
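To illustrate how such a constant is used, the sketch below bounds an unnormalized total Sobol' index by the product of a Poincare constant and a DGSM. It does not call the package: it assumes uniform inputs on [-1/2, 1/2], the closed-form constant (b-a)^2/pi^2, and the test function f = x1 + a*x1*x2 from the package examples, whose exact unnormalized total index for x1 is 1/12 + a^2/144.

```r
set.seed(0)
n <- 1e5
a <- 3
X <- matrix(runif(2 * n, min = -1/2, max = 1/2), ncol = 2)

# Partial derivative of f = x1 + a*x1*x2 with respect to x1
df1 <- 1 + a * X[, 2]

# DGSM of x1: Monte Carlo estimate of the mean squared derivative
dgsm <- mean(df1^2)

# Optimal Poincare constant for the uniform law on [-1/2, 1/2]
C_unif <- (1/2 - (-1/2))^2 / pi^2

upper <- C_unif * dgsm        # upper bound of the unnormalized total index
D1tot <- 1/12 + a^2 / 144     # exact unnormalized total index of x1

print(c(upper_bound = upper, D1tot = D1tot))
```

Dividing both quantities by the output variance gives the bound on the normalized total Sobol' index.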
PoincareConstant(dfct=dnorm, qfct=qnorm, pfct=pnorm, logconcave=FALSE, transport="logistic", optimize.interval=c(-100, 100), truncated=FALSE, min=0, max=1, ...)
dfct |
the probability density function of the input variable |
qfct |
the quantile function of the input variable |
pfct |
the distribution function of the input variable |
logconcave |
logical value: TRUE for a log-concave distribution (analytical formula will be used). Requires arguments 'dfct' and 'qfct'. FALSE (default value) means that the calculations will be performed using transport-based formulas (applicable for log-concave and non-log-concave cases) |
transport |
If logconcave=FALSE, choice of the transport inequalities to be used: "double_exp" for double exponential transport and "logistic" (default value) for logistic transport. Requires arguments 'dfct' and 'pfct' |
optimize.interval |
In the transport-based case (logconcave=FALSE), a vector containing the end-points of the interval to be searched for the maximum of the function to be optimized |
truncated |
logical value: TRUE for a truncated distribution. Default value is FALSE |
min |
the minimal bound in the case of a truncated distribution |
max |
the maximal bound in the case of a truncated distribution |
... |
additional arguments |
In the case of truncated distributions (truncated=TRUE), in addition to the min and max arguments:
- the truncated distribution name has to be passed in the 'dfct' and 'pfct' arguments if logconcave=FALSE,
- the non-truncated distribution name has to be passed in the 'dfct' and 'qfct' arguments if logconcave=TRUE.
Moreover, if min and max are finite, optimize.interval is required to be defined as c(min,max).
PoincareConstant returns the value of the Poincare constant.
Jana Fruth, Bertrand Iooss and Olivier Roustant
S. Kucherenko and B. Iooss, Derivative-based global sensitivity measures, In: R. Ghanem, D. Higdon and H. Owhadi (eds.), Handbook of Uncertainty Quantification, 2016.
M. Lamboni, B. Iooss, A-L. Popelin and F. Gamboa, Derivative-based global sensitivity measures: General links with Sobol' indices and numerical tests, Mathematics and Computers in Simulation, 87:45-54, 2013.
O. Roustant, F. Barthe and B. Iooss, Poincare inequalities on intervals - application to sensitivity analysis, Electronic Journal of Statistics, Vol. 11, No. 2, 3081-3119, 2017.
O. Roustant, J. Fruth, B. Iooss and S. Kuhnt, Crossed-derivative-based sensitivity measures for interaction screening, Mathematics and Computers in Simulation, 105:105-118, 2014.
# Exponential law (log-concave)
PoincareConstant(dfct=dexp,qfct=qexp,pfct=NULL,rate=1,
  logconcave=TRUE) # log-concave assumption
PoincareConstant(dfct=dexp,qfct=NULL,pfct=pexp,rate=1,
  optimize.interval=c(0, 15)) # logistic transport approach

# Weibull law (log-concave)
PoincareConstant(dfct=dweibull,qfct=NULL,pfct=pweibull,
  optimize.interval=c(0, 15),shape=1,scale=1) # logistic transport approach

# Triangular law (log-concave)
library(triangle)
PoincareConstant(dfct=dtriangle, qfct=qtriangle, pfct=NULL, a=-1, b=1, c=0,
  logconcave=TRUE) # log-concave assumption
PoincareConstant(dfct=dtriangle, qfct=NULL, pfct=ptriangle, a=-1, b=1, c=0,
  transport="double_exp", optimize.interval=c(-1,1)) # Double-exp transport
PoincareConstant(dfct=dtriangle, qfct=NULL, pfct=ptriangle, a=-1, b=1, c=0,
  optimize.interval=c(-1,1)) # Logistic transport calculation

# Normal N(0,1) law truncated on [-1.87,+infty]
PoincareConstant(dfct=dnorm,qfct=qnorm,pfct=pnorm,mean=0,sd=1,logconcave=TRUE,
  transport="double_exp", truncated=TRUE, min=-1.87, max=999) # log-concave hyp
# Double-exponential transport approach
PoincareConstant(dfct=dnorm.trunc, qfct=qnorm.trunc, pfct=pnorm.trunc,
  mean=0, sd=1, truncated=TRUE, min=-1.87, max=999, transport="double_exp",
  optimize.interval=c(-1.87,20))
# Logistic transport approach
PoincareConstant(dfct=dnorm.trunc, qfct=qnorm.trunc, pfct=pnorm.trunc,
  mean=0, sd=1, truncated=TRUE, min=-1.87, max=999,
  optimize.interval=c(-1.87,20))

# Gumbel law (log-concave)
library(evd)
PoincareConstant(dfct=dgumbel, qfct=qgumbel, pfct=NULL, loc=0, scale=1,
  logconcave=TRUE, transport="double_exp") # log-concave assumption
PoincareConstant(dfct=dgumbel, qfct=NULL, pfct=pgumbel, loc=0, scale=1,
  transport="double_exp", optimize.interval=c(-3,20)) # Double-exp transport
PoincareConstant(dfct=dgumbel, qfct=qgumbel, pfct=pgumbel, loc=0, scale=1,
  optimize.interval=c(-3,20)) # Logistic transport approach

# Truncated Gumbel law (log-concave)
# Double-exponential transport approach
PoincareConstant(dfct=dgumbel, qfct=qgumbel, pfct=pgumbel, loc=0, scale=1,
  logconcave=TRUE, transport="double_exp", truncated=TRUE,
  min=-0.92, max=3.56) # log-concave assumption
PoincareConstant(dfct=dgumbel.trunc, qfct=NULL, pfct=pgumbel.trunc, loc=0, scale=1,
  truncated=TRUE, min=-0.92, max=3.56, transport="double_exp",
  optimize.interval=c(-0.92,3.56))
# Logistic transport approach
PoincareConstant(dfct=dgumbel.trunc, qfct=qgumbel.trunc, pfct=pgumbel.trunc,
  loc=0, scale=1, truncated=TRUE, min=-0.92, max=3.56,
  optimize.interval=c(-0.92,3.56))
A DGSM is a sensitivity index relying on the integral (over the space domain of the input variables) of the squared derivatives of a model output with respect to one model input variable. The product between a DGSM and a Poincare constant (Roustant et al., 2014; Roustant et al., 2017), which depends on the type of probability distribution of the input variable, gives an upper bound of the total Sobol' index corresponding to the same input (Lamboni et al., 2013; Kucherenko and Iooss, 2016).
This function provides the optimal Poincare constant as explained in Roustant et al. (2017). It numerically solves the spectral problem corresponding to the Poincare inequality, with Neumann conditions. The differential equation is f'' - V'f' = -lambda f with f'(a) = f'(b) = 0. In addition, the whole spectral decomposition can be returned by the function. The eigenvalues are sorted in ascending order, starting from zero. The information corresponding to the optimal constant is thus given in the second column.
IMPORTANT: This program is useless for the two following input variable distributions:
uniform on the interval $[a, b]$: The optimal Poincare constant is $\frac{(b-a)^2}{\pi^2}$.
normal with a standard deviation $\sigma$: The optimal Poincare constant is $\sigma^2$.
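The spectral problem can be checked by hand in the uniform case, where the density is constant, so that V' = 0 and the equation reduces to f'' = -lambda f with Neumann conditions. The finite-difference sketch below is only an illustration under that assumption; it is not the discretization used by PoincareOptimal, and the grid size and variable names are hypothetical.

```r
a <- -1; b <- 1
n <- 200            # number of grid cells
h <- (b - a) / n

# Cell-centred finite-difference matrix of -f'' with Neumann boundary conditions
A <- diag(2, n)
A[cbind(1:(n - 1), 2:n)] <- -1
A[cbind(2:n, 1:(n - 1))] <- -1
A[1, 1] <- 1
A[n, n] <- 1
A <- A / h^2

# Eigenvalues in ascending order: the first is zero (constant eigenfunction)
ev <- sort(eigen(A, symmetric = TRUE, only.values = TRUE)$values)
gap <- ev[2]        # spectral gap = first non-zero eigenvalue

# The optimal Poincare constant is the inverse of the spectral gap
print(c(estimated = 1 / gap, theory = (b - a)^2 / pi^2))
```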
PoincareOptimal(distr=list("unif",c(0,1)), min=NULL, max=NULL, n = 500, method = c("quadrature", "integral"), only.values = TRUE, der = FALSE, plot = FALSE, ...)
distr |
a list or a function corresponding to the probability distribution.
|
min |
see below |
max |
[min,max]: interval on which the distribution is truncated. Choose low and high quantiles in case of unbounded distribution. Choose NULL for uniform and triangular distributions |
n |
number of discretization steps |
method |
method of integration: "quadrature" (default value) uses trapezoidal quadrature (close and quicker); "integral" is slower but does not make any approximation |
only.values |
if TRUE, only eigenvalues are computed and returned, otherwise both eigenvalues and eigenvectors are returned (default value is TRUE) |
der |
if TRUE, compute the eigenfunction derivatives (default value is FALSE) |
plot |
logical: if TRUE and only.values=FALSE, plots a minimizer of the Rayleigh ratio (default value is FALSE) |
... |
additional arguments |
For the uniform, normal, triangular and Gumbel distributions, the optimal constants are computed on the corresponding standardized distributions (for better numerical efficiency). In these cases, the returned optimal constant and eigenvalues correspond to the original distributions.
PoincareOptimal returns a list containing:
opt |
the optimal Poincare constant |
values |
the eigenvalues in increasing order, starting from 0. Thus, the second one is the spectral gap, equal to the inverse of the Poincare constant |
vectors |
the values of eigenfunctions at |
der |
the values of eigenfunction derivatives at |
knots |
a sequence of length |
Olivier Roustant and Bertrand Iooss
O. Roustant, F. Barthe and B. Iooss, Poincare inequalities on intervals - application to sensitivity analysis, Electronic Journal of Statistics, Vol. 11, No. 2, 3081-3119, 2017.
O. Roustant, F. Gamboa and B. Iooss, Parseval inequalities and lower bounds for variance-based sensitivity indices, 2019. hal-02140127
PoincareConstant, PoincareChaosSqCoef
# uniform on [a, b]
a <- -1 ; b <- 1
out <- PoincareOptimal(distr = list("unif", a, b))
cat("Poincare constant (theory -- estimated):", (b-a)^2/pi^2, "--", out$opt, "\n")

# truncated standard normal on [-1, 1]
# the optimal Poincare constant is then equal to 1/3,
# as -1 and 1 are consecutive roots of the 2nd Hermite polynomial X*X - 1.
out <- PoincareOptimal(distr = dnorm, min = -1, max = 1,
                       plot = TRUE, only.values = FALSE)
cat("Poincare constant (theory -- estimated):", 1/3, "--", out$opt, "\n")

# truncated standard normal on [-1.87, +infty]
out <- PoincareOptimal(distr = list("norm", 0, 1), min = -1.87, max = 5,
                       method = "integral", n = 500)
print(out$opt)

# truncated Gumbel(0,1) on [-0.92, 3.56]
library(evd)
out <- PoincareOptimal(distr = list("gumbel", 0, 1), min = -0.92, max = 3.56,
                       method = "integral", n = 500)
print(out$opt)

# symmetric triangular [-1,1]
library(triangle)
out <- PoincareOptimal(distr = list("triangle", -1, 1, 0), min = NULL, max = NULL)
cat("Poincare constant (theory -- estimated):", 0.1729, "--", out$opt, "\n")

# Lognormal distribution
out <- PoincareOptimal(distr = list("lognorm", 1, 2), min = 3, max = 10,
                       only.values = FALSE, plot = TRUE, method = "integral")
print(out$opt)

## -------------------------------
## Illustration for eigenfunctions on the uniform distribution
## (corresponds to Fourier series)
b <- 1
a <- -b
out <- PoincareOptimal(distr = list("unif", a, b),
                       only.values = FALSE, der = TRUE, method = "quad")

# Illustration for 3 eigenvalues
par(mfrow = c(3,2))
eigenNumber <- 1:3 # eigenvalue number
for (k in eigenNumber[1:3]){ # keep the 3 first ones (for graphics)
  plot(out$knots, out$vectors[, k + 1], type = "l", ylab = "",
       main = paste("Eigenfunction", k),
       xlab = paste("Eigenvalue:", round(out$values[k+1], digits = 3)))
  sgn <- sign(out$vectors[1, k + 1])
  lines(out$knots, sgn * sqrt(2) * cos(pi * k * (out$knots/(b-a) + 0.5)),
        col = "red", lty = "dotted")

  plot(out$knots, out$der[, k + 1], type = "l", ylab = "",
       main = paste("Eigenfunction derivative", k), xlab = "")
  sgn <- sign(out$vectors[1, k + 1])
  lines(out$knots,
        - sgn * sqrt(2) / (b-a) * pi * k * sin(pi * k * (out$knots/(b-a) + 0.5)),
        col = "red", lty = "dotted")
}

# how to create a function for one eigenfunction and eigenvalue,
# given N values
eigenFun <- approxfun(x = out$knots, y = out$vectors[, 2])
eigenDerFun <- approxfun(x = out$knots, y = out$der[, 2])
x <- runif(n = 3, min = -1/2, max = 1/2)
eigenFun(x)
eigenDerFun(x)
qosa
implements the estimation of first-order quantile-oriented sensitivity indices
as defined in Fort et al. (2016) with a kernel-based estimator of conditional probability density functions
closely related to the one proposed by Maume-Deschamps and Niang (2018).
qosa
also supports a kernel-based estimation of Sobol first-order indices (i.e. Nadaraya-Watson).
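To make the Nadaraya-Watson idea concrete, here is a minimal base R sketch of a plug-in estimate of first-order Sobol' indices. It is not the code used inside qosa: the helper name `nw_mean`, the fixed bandwidth `h = 0.1` and the toy linear model are all assumptions chosen for illustration.

```r
# Nadaraya-Watson estimate of E[Y | X = x0] with a Gaussian kernel.
# 'nw_mean' is a hypothetical helper, not a function of the sensitivity package.
nw_mean <- function(x0, x, y, h) {
  w <- dnorm((x0 - x) / h)      # kernel weights centred at x0
  sum(w * y) / sum(w)
}

set.seed(7)
n <- 1000
X <- matrix(runif(2 * n, -1, 1), nrow = n)
Y <- 2 * X[, 1] + 0.5 * X[, 2]  # toy model: the first input dominates

# First-order index of input i: Var(E[Y | X_i]) / Var(Y)
S <- sapply(1:2, function(i) {
  m <- sapply(X[, i], nw_mean, x = X[, i], y = Y, h = 0.1)
  var(m) / var(Y)
})
print(S)
```

The conditional mean is evaluated at the sample points themselves, and its empirical variance is divided by the output variance. A fixed bandwidth is used only to keep the sketch short.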
qosa(model = NULL, X1, X2 = NULL, type = "quantile", alpha = 0.1,
     split.sample = 2/3, nsample = 1e4, nboot = 0, conf = 0.95, ...)

## S3 method for class 'qosa'
tell(x, y = NULL, ...)

## S3 method for class 'qosa'
print(x, ...)

## S3 method for class 'qosa'
plot(x, ylim = c(0, 1), ...)

## S3 method for class 'qosa'
ggplot(data, mapping = aes(), ylim = c(0, 1), ...,
       environment = parent.frame())
model |
a function, or a model with a |
X1 |
a random sample of the inputs used for the estimation of conditional probability density functions.
If |
X2 |
a random sample of the inputs used to evaluate the conditional probability density functions.
If NULL, it is constructed with the last |
type |
a string specifying which first-order sensitivity indices must be estimated: quantile-oriented indices ( |
alpha |
if |
split.sample |
if |
nsample |
the number of samples from the conditional probability density functions used
to estimate the conditional quantiles (if |
nboot |
the number of bootstrap replicates. |
conf |
the confidence level for confidence intervals. |
x |
a list of class |
data |
a list of class |
y |
a vector of model responses. |
ylim |
y-coordinate plotting limits. |
mapping |
Default list of aesthetic mappings to use for plot. If not specified, must be supplied in each layer added to the plot. |
environment |
[Deprecated] Used prior to tidy evaluation. |
... |
any other arguments for |
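Quantile-oriented sensitivity indices were defined as a special case of sensitivity indices based on contrast functions in Fort et al. (2016).
The estimator used by qosa follows closely the one proposed by Maume-Deschamps and Niang (2018).
The only difference is that Maume-Deschamps and Niang (2018) use the following kernel-based estimate of the conditional cumulative distribution function:
$$\hat{F}(y \mid X = x) = \frac{\sum_{i=1}^{n} K_{h_x}(x - X_i)\, \mathbf{1}\{Y_i < y\}}{\sum_{i=1}^{n} K_{h_x}(x - X_i)},$$
whereas we use
$$\hat{F}(y \mid X = x) = \frac{\sum_{i=1}^{n} K_{h_x}(x - X_i) \int_{-\infty}^{y} K_{h_y}(t - Y_i)\, dt}{\sum_{i=1}^{n} K_{h_x}(x - X_i)},$$
meaning that $\mathbf{1}\{Y_i < y\}$ is replaced by $\int_{-\infty}^{y} K_{h_y}(t - Y_i)\, dt = \Phi\left(\frac{y - Y_i}{h_y}\right)$, where $\Phi$ is the cumulative distribution function of the standard normal distribution (since kernel $K$ is Gaussian).
The two definitions thus coincide when $h_y \to 0$. Our formula arises from a kernel density estimator of the joint pdf with a diagonal bandwidth.
In a future version, it will be generalized to a general bandwidth matrix for improved performance.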
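The relation between the two conditional CDF estimates can be checked numerically. The base R sketch below is only an illustration under stated assumptions: `cond_cdf` is a hypothetical helper, the bandwidths are picked by hand, and the data mimic the "difference of two exponentials" test case. Setting `hy = 0` uses the indicator, while `hy > 0` uses the Gaussian CDF smoothing.

```r
# Kernel-based estimate of F(y | X = x) with a Gaussian kernel in x.
# hy = 0: indicator version; hy > 0: smoothed version with pnorm().
# 'cond_cdf' is a hypothetical helper, not part of the sensitivity package.
cond_cdf <- function(y, x, X, Y, hx, hy = 0) {
  w <- dnorm((x - X) / hx)                        # weights in the input direction
  g <- if (hy > 0) pnorm((y - Y) / hy) else as.numeric(Y < y)
  sum(w * g) / sum(w)
}

set.seed(1)
n <- 2000
X <- rexp(n)
Y <- X - rexp(n)

F_indicator <- cond_cdf(0.5, 1, X, Y, hx = 0.2)
F_smooth    <- cond_cdf(0.5, 1, X, Y, hx = 0.2, hy = 0.1)
F_tiny      <- cond_cdf(0.5, 1, X, Y, hx = 0.2, hy = 1e-8)
print(c(F_indicator, F_smooth, F_tiny))
```

With a very small output bandwidth the smoothed estimate becomes numerically indistinguishable from the indicator-based one, which is the limiting behaviour described above.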
qosa
returns a list of class "qosa"
, containing all
the input arguments detailed before, plus the following components:
call |
the matched call. |
X |
a |
X1 |
a |
X |
a |
y |
a vector of model responses. |
S |
the estimations of the Sobol' sensitivity indices. |
Sebastien Da Veiga
Fort, J. C., Klein, T., and Rachdi, N. (2016). New sensitivity analysis subordinated to a contrast. Communications in Statistics-Theory and Methods, 45(15), 4349-4364.
Maume-Deschamps, V., and Niang, I. (2018). Estimation of quantile oriented sensitivity indices. Statistics & Probability Letters, 134, 122-127.
library(sensitivity)
library(ks)
library(ggplot2)
library(boot)

# Test case : difference of two exponential distributions (Fort et al. (2016))
# We use two samples with different sizes
n1 <- 5000
X1 <- data.frame(matrix(rexp(2 * n1,1), nrow = n1))
n2 <- 1000
X2 <- data.frame(matrix(rexp(2 * n2,1), nrow = n2))
Y1 <- X1[,1] - X1[,2]
Y2 <- X2[,1] - X2[,2]
x <- qosa(model = NULL, X1, X2, type = "quantile", alpha = 0.1)
tell(x,c(Y1,Y2))
print(x)
ggplot(x)

# Test case : difference of two exponential distributions (Fort et al. (2016))
# We use only one sample
n <- 1000 # put n=10000 for more consistency
X <- data.frame(matrix(rexp(2 * n,1), nrow = n))
Y <- X[,1] - X[,2]
x <- qosa(model = NULL, X1 = X, type = "quantile", alpha = 0.7)
tell(x,Y)
print(x)
ggplot(x)

# Test case : the Ishigami function
# We estimate first-order Sobol' indices (by specifying 'mean')
# Next lines are put in comment because too long for CRAN tests
#n <- 5000
#nboot <- 50
#X <- data.frame(matrix(-pi+2*pi*runif(3 * n), nrow = n))
#x <- qosa(model = ishigami.fun, X1 = X, type = "mean", nboot = nboot)
#print(x)
#ggplot(x)
sb
implements the Sequential Bifurcations screening
method (Bettonvil and Kleijnen 1996).
sb(p, sign = rep("+", p), interaction = FALSE)

## S3 method for class 'sb'
ask(x, i = NULL, ...)

## S3 method for class 'sb'
tell(x, y, ...)

## S3 method for class 'sb'
print(x, ...)

## S3 method for class 'sb'
plot(x, ...)
p |
number of factors. |
sign |
a vector of length |
interaction |
a boolean, |
x |
a list of class |
y |
a vector of model responses. |
i |
an integer, used to force a wanted bifurcation instead of that proposed by the algorithm. |
... |
not used. |
The model without interaction is
$$Y = \beta_0 + \sum_{i=1}^{p} \beta_i X_i$$
while the model with interactions is
$$Y = \beta_0 + \sum_{i=1}^{p} \beta_i X_i + \sum_{1 \le i < j \le p} \gamma_{ij} X_i X_j$$
In both cases, the factors are assumed to be uniformly distributed on $[-1, 1]$. This differs from Bettonvil and Kleijnen (1996), where the factors vary across $[0, 1]$ in the former case and across $[-1, 1]$ in the latter.
Another difference with Bettonvil and Kleijnen (1996) is that in the current implementation, the groups are split right in the middle.
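The principle of splitting groups in the middle can be sketched in a few lines of base R. This is a simplified recursive illustration, not the step-by-step ask/tell algorithm implemented in sb. It assumes a first-order model with known non-negative effects, factor levels -1 and +1, and hypothetical helper names (`y_level`, `group_effect`, `bifurcate`) together with an arbitrary importance threshold.

```r
# Toy first-order model with two important factors (3 and 7)
beta <- c(0, 0, 7, 0, 0, 0, 3, 0)
p <- length(beta)
f <- function(x) sum(beta * x)

# Output when the first j factors are at +1 and the remaining ones at -1
y_level <- function(j) f(c(rep(1, j), rep(-1, p - j)))

# Aggregated effect of factors (lo + 1):hi, i.e. the sum of their coefficients
group_effect <- function(lo, hi) (y_level(hi) - y_level(lo)) / 2

# Recursively split every important group right in the middle
bifurcate <- function(lo, hi, threshold = 1) {
  if (group_effect(lo, hi) <= threshold) return(integer(0))  # discard the group
  if (hi - lo == 1) return(hi)                               # single factor left
  mid <- lo + (hi - lo) %/% 2
  c(bifurcate(lo, mid, threshold), bifurcate(mid, hi, threshold))
}

important <- bifurcate(0, p)
print(important)
```

Groups whose aggregated effect falls below the threshold are discarded as a whole, which is why the method needs far fewer runs than one run per factor when only a few factors matter.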
sb
returns a list of class "sb"
, containing all
the input arguments detailed before, plus the following components:
i |
the vector of bifurcations. |
y |
the vector of observations. |
ym |
the vector of mirror observations (model with interactions only). |
The group effects can be displayed with the print
method.
Gilles Pujol
B. Bettonvil and J. P. C. Kleijnen, 1996, Searching for important factors in simulation models with many factors: sequential bifurcations, European Journal of Operational Research, 96, 180–194.
library(sensitivity)

# a model with interactions
p <- 50
beta <- numeric(length = p)
beta[1:5] <- runif(n = 5, min = 10, max = 50)
beta[6:p] <- runif(n = p - 5, min = 0, max = 0.3)
beta <- sample(beta)
gamma <- matrix(data = runif(n = p^2, min = 0, max = 0.1), nrow = p, ncol = p)
gamma[lower.tri(gamma, diag = TRUE)] <- 0
gamma[1,2] <- 5
gamma[5,9] <- 12
f <- function(x) { return(sum(x * beta) + (x %*% gamma %*% x))}

# 10 iterations of SB
sa <- sb(p, interaction = TRUE)
for (i in 1 : 10) {
  x <- ask(sa)
  y <- list()
  for (nm in names(x)) { # 'nm' avoids shadowing the outer loop index 'i'
    y[[nm]] <- f(x[[nm]])
  }
  tell(sa, y)
}
print(sa)
plot(sa)
sensiFdiv
conducts a density-based sensitivity
analysis where the impact of an input variable is defined
in terms of dissimilarity between the original output density function
and the output density function when the input variable is fixed.
The dissimilarity between density functions is measured with Csiszar f-divergences.
Estimation is performed through kernel density estimation and
the function kde
of the package ks
.
sensiFdiv(model = NULL, X, fdiv = "TV", nboot = 0, conf = 0.95, ...)

## S3 method for class 'sensiFdiv'
tell(x, y = NULL, ...)

## S3 method for class 'sensiFdiv'
print(x, ...)

## S3 method for class 'sensiFdiv'
plot(x, ylim = c(0, 1), ...)

## S3 method for class 'sensiFdiv'
ggplot(data, mapping = aes(), ylim = c(0, 1), ...,
       environment = parent.frame())
model |
a function, or a model with a |
X |
a matrix or |
fdiv |
a string or a list of strings specifying the Csiszar f-divergence to be used. Available choices are "TV" (Total-Variation), "KL" (Kullback-Leibler), "Hellinger" and "Chi2" (Neyman chi-squared). |
nboot |
the number of bootstrap replicates |
conf |
the confidence level for confidence intervals. |
x |
a list of class |
data |
a list of class |
y |
a vector of model responses. |
ylim |
y-coordinate plotting limits. |
mapping |
Default list of aesthetic mappings to use for plot. If not specified, must be supplied in each layer added to the plot. |
environment |
[Deprecated] Used prior to tidy evaluation. |
... |
any other arguments for |
Some of the Csiszar f-divergences produce sensitivity indices that have already been studied in the context of sensitivity analysis. In particular, "TV" leads to the importance measure proposed by Borgonovo (2007) (up to a constant), "KL" corresponds to the mutual information (Krzykacz-Hausmann 2001) and "Chi2" produces the squared-loss mutual information. See Da Veiga (2015) for details.
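For readers unfamiliar with Csiszar f-divergences, the base R sketch below evaluates the four divergences on two small discrete distributions. It only illustrates the divergence itself, not the kernel density estimation carried out by sensiFdiv. The convention $D_f(P \| Q) = \sum q \, f(p/q)$, the scaling of each generator function and the helper names `f_list` and `fdiv` are assumptions; the package may use different normalizations.

```r
# Generator functions f of the Csiszar divergences (conventions assumed here)
f_list <- list(
  TV        = function(t) abs(t - 1) / 2,               # total variation
  KL        = function(t) ifelse(t > 0, t * log(t), 0), # Kullback-Leibler
  Hellinger = function(t) (sqrt(t) - 1)^2,              # squared Hellinger
  Chi2      = function(t) (t - 1)^2 / t                 # Neyman chi-squared (t > 0)
)

# D_f(P || Q) = sum_k q_k * f(p_k / q_k) for discrete distributions with q_k > 0
fdiv <- function(p, q, f) sum(q * f(p / q))

p <- c(0.5, 0.5)
q <- c(0.25, 0.75)
d      <- sapply(f_list, function(f) fdiv(p, q, f))
d_self <- sapply(f_list, function(f) fdiv(p, p, f))
print(d)
```

Every divergence vanishes when the two distributions are identical, which is the property that makes them usable as dissimilarity measures between the unconditional and conditional output densities.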
sensiFdiv
returns a list of class "sensiFdiv"
, containing all
the input arguments detailed before, plus the following components:
call |
the matched call. |
X |
a |
y |
a vector of model responses. |
S |
the estimations of the Csiszar f-divergence sensitivity indices. If several divergences have been selected, S is a list where each element encompasses the estimations of the sensitivity indices for one of the divergences. |
Sebastien Da Veiga, Snecma
Borgonovo E. (2007), A new uncertainty importance measure, Reliability Engineering and System Safety 92(6), 771–784.
Da Veiga S. (2015), Global sensitivity analysis with dependence measures, Journal of Statistical Computation and Simulation, 85(7), 1283–1305.
Krzykacz-Hausmann B. (2001), Epistemic sensitivity analysis based on the concept of entropy, Proceedings of SAMO2001, 53–57.
library(sensitivity)
library(ks)

# Test case : the non-monotonic Sobol g-function
n <- 100
X <- data.frame(matrix(runif(8 * n), nrow = n))

# Density-based sensitivity analysis
# the next lines are put in comment because too long for CRAN tests
#x <- sensiFdiv(model = sobol.fun, X = X, fdiv = c("TV","KL"), nboot=30)
#print(x)
#library(ggplot2)
#ggplot(x)
sensiHSIC
allows conducting global sensitivity analysis (GSA) in many different contexts thanks to several sensitivity measures based on the Hilbert-Schmidt independence criterion (HSIC). The so-called HSIC sensitivity indices depend on the kernels assigned to the input variables $X_i$ as well as on the kernel assigned to the output object $Y$. For each random entity, a reproducing kernel Hilbert space (RKHS) is associated with the chosen kernel and makes it possible to represent probability distributions in an appropriate function space. The influence of $X_i$ on $Y$ is then measured through the distance between the joint probability distribution (true impact of $X_i$ on $Y$ in the numerical model) and the product of marginal distributions (no impact of $X_i$ on $Y$) after embedding those distributions into a bivariate RKHS. Such a GSA approach has three main advantages:
The input variables may be correlated.
Any kind of mathematical object is supported (provided that a kernel function is available).
Accurate estimation is possible even with very few data (no more than a hundred input-output samples).
In sensiHSIC
, each input variable is expected to be scalar (either discrete or continuous). In contrast, a much wider collection of mathematical objects is supported for the output variable $Y$. In particular, $Y$ may be:
A scalar output (either discrete or continuous). If so, one single kernel family is selected among the kernel collection.
A low-dimensional vector output. If so, a kernel is selected for each output variable and the final output kernel is built by tensorization.
A high-dimensional vector output or a functional output. In this case, the output data must be seen as time series observations. Three different methods are proposed.
A preliminary dimension reduction may be performed. In order to achieve this, a principal component analysis (PCA) based on the empirical covariance matrix helps identify the first terms of the Karhunen-Loeve expansion. The final output kernel is then built in the reduced subspace where the functional data are projected.
The dynamic time warping (DTW) algorithm may be combined with a translation-invariant kernel. The resulting DTW-based output kernel is well-adapted to measure similarity between two given time series.
The global alignment kernel (GAK) may be directly applied on the functional data. Unlike the DTW kernel, it was specifically designed to deal with time series and to measure the impact of one input variable on the shape of the output curve.
Many variants of the original HSIC indices are now available in sensiHSIC
.
Normalized HSIC indices (R2-HSIC)
The original HSIC indices defined in Gretton et al. (2005) may be hard to interpret because they do not admit a universal upper bound. A first step to overcome this difficulty was enabled by Da Veiga (2015) with the definition of the R2-HSIC indices. The resulting sensitivity indices can no longer be greater than $1$.
Target HSIC indices (T-HSIC)
They were designed by Marrel and Chabridon (2021) to meet the needs of target sensitivity analysis (TSA). The idea is to measure the impact of each input variable on a specific part of the output distribution (for example the upper tail). To achieve this, a weight function $w$ is applied on $Y$ before computing HSIC indices.
Conditional HSIC indices (C-HSIC)
They were designed by Marrel and Chabridon (2021) to meet the needs of conditional sensitivity analysis (CSA). The idea is to measure the impact of each input variable on $Y$ when a specific event occurs. This conditioning event is detected on the output variable $Y$ and its amplitude is taken into account thanks to a weight function $w$.
HSIC-ANOVA indices
To improve the interpretability of HSIC indices, Da Veiga (2021) came up with an ANOVA-like decomposition that establishes a strict separation of main effects and interaction effects in the HSIC paradigm. The first-order and total-order HSIC-ANOVA indices are then defined just as the first-order and total-order Sobol' indices. However, this framework only holds if two major assumptions are verified: the input variables must be mutually independent and all input kernels must belong to the very restricted class of ANOVA kernels.
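Among the variants listed above, the R2-HSIC normalization is the simplest to illustrate. The base R sketch below assumes the usual correlation-like normalization, that is HSIC(X, Y) divided by the square root of HSIC(X, X) times HSIC(Y, Y), with Gaussian kernels and a V-statistic estimator. The helper names `rbf_gram`, `hsic_v` and `r2_hsic` are hypothetical and this is not the code used inside sensiHSIC.

```r
# Gaussian Gram matrix; the bandwidth defaults to the empirical standard deviation
rbf_gram <- function(z, sigma = sd(z)) exp(-0.5 * (outer(z, z, "-") / sigma)^2)

# V-statistic estimator of HSIC from two Gram matrices
hsic_v <- function(K, L) {
  n <- nrow(K)
  H <- diag(n) - 1 / n            # centering matrix I - (1/n) 11'
  sum((H %*% K %*% H) * L) / n^2  # equals trace(K H L H) / n^2
}

# Normalized index, bounded above by 1 (Cauchy-Schwarz)
r2_hsic <- function(x, y) {
  K <- rbf_gram(x)
  L <- rbf_gram(y)
  hsic_v(K, L) / sqrt(hsic_v(K, K) * hsic_v(L, L))
}

set.seed(3)
x <- runif(100)
r2_dep  <- r2_hsic(x, x^2)          # strongly dependent pair
r2_ind  <- r2_hsic(x, runif(100))   # independent pair
r2_self <- r2_hsic(x, x)            # upper bound reached
print(c(r2_dep, r2_ind, r2_self))
```

The normalized index behaves like a squared correlation: it reaches its upper bound when a variable is compared with itself and stays close to zero for independent samples.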
Like most sensitivity measures, HSIC indices make it possible to rank the input variables according to the influence they have on the output variable $Y$. They can also be used for a screening purpose, that is to distinguish between truly influential input variables and non-influential input variables. The user who is interested in this topic is invited to consult the documentation of the function
testHSIC
.
sensiHSIC(model = NULL, X, target = NULL, cond = NULL,
          kernelX = "rbf", paramX = NA,
          kernelY = "rbf", paramY = NA,
          estimator.type = "V-stat",
          nboot = 0, conf = 0.95,
          anova = list(obj = "no", is.uniform = TRUE),
          sensi = NULL,
          save.GM = list(KX = TRUE, KY = TRUE), ...)

## S3 method for class 'sensiHSIC'
tell(x, y = NULL, ...)

## S3 method for class 'sensiHSIC'
print(x, ...)

## S3 method for class 'sensiHSIC'
plot(x, ylim = c(0, 1), ...)
model |
A function, or a statistical model with a |
X |
A
|
target |
A list of options to perform TSA. At least,
|
cond |
A list of options to perform CSA. At least,
|
kernelX |
A string or a vector of
For each input variable, available choices include In addition, let us assume that all input variables are uniformly distributed on
|
paramX |
A scalar value or a vector of
|
kernelY |
A string, a vector of To deal with a scalar output or a low-dimensional vector output, it is advised to select one kernel per output dimension and to tensorize all selected kernels. In this case,
Have a look at To deal with a high-dimensional vector output or a functional output, it is advised to reduce dimension or to use a dedicated kernel. In this case,
|
paramY |
A scalar value or a vector of values with output kernel parameters.
In other cases, Case 1:
Case 2:
Case 3:
Case 4:
|
estimator.type |
A string specifying the kind of estimator used for HSIC indices. Available choices include |
nboot |
Number of bootstrap replicates. |
conf |
A scalar value (between |
anova |
A list of parameters to achieve an ANOVA-like decomposition based on HSIC indices. At least,
|
sensi |
An object of class
|
save.GM |
A list of parameters indicating whether Gram matrices have to be saved. The list
|
x |
An object of class |
y |
A |
ylim |
A vector of two values specifying the |
... |
Any other arguments for |
Let $(X_i, Y)$ be an input-output pair. The kernels assigned to $X_i$ and $Y$ are respectively denoted by $K_{X_i}$ and $K_Y$.
For many global sensitivity measures, the influence of $X_i$ on $Y$ is measured in the light of the probabilistic dependence that exists within the input-output pair $(X_i, Y)$. For this, a dissimilarity measure is applied between the joint probability distribution (true impact of $X_i$ on $Y$ in the numerical model) and the product of marginal distributions (no impact of $X_i$ on $Y$). For instance, Borgonovo's sensitivity measure is built upon the total variation distance between those two probability distributions. See Borgonovo and Plischke (2016) for further details.
The HSIC-based sensitivity measure can be understood in this way since the index $\mathrm{HSIC}(X_i, Y)$ results from the application of the Hilbert-Schmidt independence criterion (HSIC) on the pair $(X_i, Y)$. This criterion is nothing but a special kind of dissimilarity measure between the joint probability distribution and the product of marginal distributions. This dissimilarity measure is called the maximum mean discrepancy (MMD) and its definition relies on the selected kernels $K_{X_i}$ and $K_Y$. According to the theory of reproducing kernels, every kernel $K$ is related to a reproducing kernel Hilbert space (RKHS). Then, if $K$ is assigned to a random variable $Z$, any probability distribution describing the random behavior of $Z$ may be represented within the induced RKHS. In this setup, the dissimilarity between the joint probability distribution and the product of marginal distributions is then measured through the squared norm of their images into the bivariate RKHS. The user is referred to Gretton et al. (2006) for additional details on the mathematical construction of HSIC indices.
In practice, it may be difficult to understand how $\mathrm{HSIC}(X_i, Y)$ measures dependence within $(X_i, Y)$. An alternative definition relies on the concept of feature map. Let us recall that the value taken by a kernel function can always be seen as the scalar product of two feature functions lying in a feature space. Definition 1 in Gretton et al. (2005) introduces $\mathrm{HSIC}(X_i, Y)$ as the Hilbert-Schmidt norm of a covariance-like operator between random features. For this reason, having access to the input and output feature maps may help identify the dependence patterns captured by $\mathrm{HSIC}(X_i, Y)$.
Kernels must be chosen very carefully. There exists a wide variety of kernels but only a few of them meet the needs of GSA. As $\mathrm{HSIC}(X_i, Y)$ is supposed to be a dependence measure, it must be equal to $0$ if and only if $X_i$ and $Y$ are independent. A sufficient condition to enable this equivalence is to take two characteristic kernels. The reader is referred to Fukumizu et al. (2004) for the mathematical definition of a characteristic kernel and to Sriperumbudur et al. (2010) for an overview of the major related results. In particular:
The Gaussian kernel, the Laplace kernel, the Matern $3/2$ kernel and the Matern $5/2$ kernel (all defined on $\mathbb{R}^2$) are characteristic.
The transformed versions of the four abovementioned kernels (all defined on $[0,1]^2$) are characteristic.
All Sobolev kernels (defined on $[0,1]^2$) are characteristic.
The categorical kernel (defined on any discrete probability space) is characteristic.
Lemma 1 in Gretton et al. (2005) provides a third way of defining $\mathrm{HSIC}(X_i, Y)$. Since the associated formula is only based on three expectation terms, the corresponding estimation procedures are very simple and they do not require a large number of input-output samples to be accurate. Two kinds of estimators may be used for $\mathrm{HSIC}(X_i, Y)$: the V-statistic estimator (which is non-negative, biased and asymptotically unbiased) or the U-statistic estimator (unbiased). For both estimators, the computational complexity is $O(n^2)$ where $n$ is the sample size.
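As an illustration of the two estimators, here is a minimal base R sketch. It is not the code used inside sensiHSIC: it assumes Gaussian kernels whose bandwidth is the empirical standard deviation, and the helper names `rbf_gram`, `hsic_v` and `hsic_u` are hypothetical. The U-statistic uses the standard closed form based on Gram matrices with zeroed diagonals.

```r
# Gaussian Gram matrix; the bandwidth defaults to the empirical standard deviation
rbf_gram <- function(z, sigma = sd(z)) exp(-0.5 * (outer(z, z, "-") / sigma)^2)

# V-statistic: trace(K H L H) / n^2, with H the centering matrix
hsic_v <- function(K, L) {
  n <- nrow(K)
  H <- diag(n) - 1 / n
  sum((H %*% K %*% H) * L) / n^2
}

# U-statistic: unbiased estimator built from Gram matrices with zero diagonal
hsic_u <- function(K, L) {
  n <- nrow(K)
  diag(K) <- 0
  diag(L) <- 0
  term1 <- sum(K * L)                                   # trace(K L)
  term2 <- sum(K) * sum(L) / ((n - 1) * (n - 2))
  term3 <- 2 * sum(rowSums(K) * rowSums(L)) / (n - 2)   # 2 * 1' K L 1 / (n - 2)
  (term1 + term2 - term3) / (n * (n - 3))
}

set.seed(1)
n <- 200
x <- runif(n)
y_dep <- sin(2 * pi * x) + rnorm(n, sd = 0.1)  # depends on x
y_ind <- runif(n)                              # independent of x

Kx <- rbf_gram(x)
v_dep <- hsic_v(Kx, rbf_gram(y_dep)); v_ind <- hsic_v(Kx, rbf_gram(y_ind))
u_dep <- hsic_u(Kx, rbf_gram(y_dep)); u_ind <- hsic_u(Kx, rbf_gram(y_ind))
print(c(v_dep, v_ind, u_dep, u_ind))
```

Both functions work on full n x n Gram matrices, which is where the quadratic cost in the sample size comes from. The U-statistic may take slightly negative values for independent samples, whereas the V-statistic cannot.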
The user must always keep in mind the key steps leading to the estimation of $\mathrm{HSIC}(X_i, Y)$:
Input samples are simulated and the corresponding output samples are computed with the numerical model.
An input kernel $K_{X_i}$ and an output kernel $K_Y$ are selected.
In case of target sensitivity analysis: output samples are transformed by means of a weight function $w$.
The input and output Gram matrices are constructed.
In case of conditional sensitivity analysis: conditioning weights are computed by means of a weight function $w$.
The final estimate is computed. It depends on the selected estimator type (either a U-statistic or a V-statistic).
All that follows is written for a scalar output $Y$ but the same is true for any scalar input $X_i$.
Let $\mathcal{D}$ denote the support of the output probability distribution. A kernel is a symmetric and positive definite function defined on the domain $\mathcal{D} \times \mathcal{D}$. Different kernel families are available in
sensiHSIC
.
To deal with continuous probability distributions on $\mathbb{R}$, one can use:
The covariance kernel of the fractional Brownian motion ("dcov"), the inverse multiquadratic kernel ("invmultiquad"), the exponential kernel ("laplace"), the dot-product kernel ("linear"), the Matern $3/2$ kernel ("matern3"), the Matern $5/2$ kernel ("matern5"), the rational quadratic kernel ("raquad") and the Gaussian kernel ("rbf").
To deal with continuous probability distributions on $[0,1]$, one can use:
Any of the abovementioned kernels (restricted to $[0,1]$).
The transformed exponential kernel ("laplace_anova"), the transformed Matern $3/2$ kernel ("matern3_anova"), the transformed Matern $5/2$ kernel ("matern5_anova"), the transformed Gaussian kernel ("rbf_anova"), the Sobolev kernel with smoothness parameter $r = 1$ ("sobolev1") and the Sobolev kernel with smoothness parameter $r = 2$ ("sobolev2").
To deal with any discrete probability distribution, the categorical kernel ("categ"
) must be used.
Two kinds of kernels must be distinguished:
Parameter-free kernels: the dot-product kernel ("linear"), the Sobolev kernel with smoothness parameter $r = 1$ ("sobolev1") and the Sobolev kernel with smoothness parameter $r = 2$ ("sobolev2").
One-parameter kernels: the categorical kernel ("categ"), the covariance kernel of the fractional Brownian motion ("dcov"), the inverse multiquadratic kernel ("invmultiquad"), the exponential kernel ("laplace"), the transformed exponential kernel ("laplace_anova"), the Matern $3/2$ kernel ("matern3"), the transformed Matern $3/2$ kernel ("matern3_anova"), the Matern $5/2$ kernel ("matern5"), the transformed Matern $5/2$ kernel ("matern5_anova"), the rational quadratic kernel ("raquad"), the Gaussian kernel ("rbf") and the transformed Gaussian kernel ("rbf_anova").
A major issue related to one-parameter kernels is how to set the parameter. It mainly depends on the role played by the parameter in the kernel expression.
For translation-invariant kernels and their ANOVA variants (that is all one-parameter kernels except "categ" and "dcov"), the parameter may be interpreted as a correlation length (or a scale parameter). The rule of thumb is to compute the empirical standard deviation of the provided samples.
For the covariance kernel of the fractional Brownian motion ("dcov"), the parameter is an exponent. Default value is $1$.
For the categorical kernel ("categ"), the parameter has no physical sense. It is just a kind of binary encoding.
$0$ means the user wants to use the basic categorical kernel.
$1$ means the user wants to use the weighted variant of the categorical kernel.
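The rule of thumb for scale parameters can be illustrated with a few translation-invariant kernels written in base R. The parametrizations below (in particular the factor 0.5 in the Gaussian kernel and the factor sqrt(3) in the Matern 3/2 kernel) are common textbook forms and are assumptions; sensiHSIC may scale its kernels differently. The names `kern` and `gram` are hypothetical.

```r
# Translation-invariant kernels as functions of the difference d and a scale s
kern <- list(
  rbf     = function(d, s) exp(-0.5 * (d / s)^2),
  laplace = function(d, s) exp(-abs(d) / s),
  matern3 = function(d, s) { r <- sqrt(3) * abs(d) / s; (1 + r) * exp(-r) }
)

# Gram matrix; the scale defaults to the empirical standard deviation (rule of thumb)
gram <- function(z, name, param = sd(z)) kern[[name]](outer(z, z, "-"), param)

set.seed(10)
y <- rnorm(30)
G <- lapply(names(kern), function(k) gram(y, k))
names(G) <- names(kern)

# Smallest eigenvalue of each Gram matrix (non-negative up to rounding errors)
min_eig <- sapply(G, function(K)
  min(eigen(K, symmetric = TRUE, only.values = TRUE)$values))
print(min_eig)
```

Tying the scale to the empirical standard deviation makes the Gram matrix invariant to a rescaling of the samples, which is one reason why this default is convenient.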
Let us assume that the output vector $Y$ is composed of $q$ random variables $Y_1, \ldots, Y_q$.
A kernel $K_{Y_j}$ is assigned to each output variable $Y_j$, which leads to embedding the $j$-th output probability distribution in a RKHS denoted by $\mathcal{H}_j$. Then, the tensorization of $\mathcal{H}_1, \ldots, \mathcal{H}_q$ yields the final RKHS, that is the RKHS where the $q$-variate output probability distribution describing the overall random behavior of $Y$ will be embedded. In this situation:
The final output kernel is the tensor product of all output kernels.
The final output Gram matrix is the Hadamard product of all output Gram matrices.
Once the final output Gram matrix is built, HSIC indices can be estimated, just as in the case of a scalar output.
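The link between the tensor product of kernels and the Hadamard product of Gram matrices can be verified directly. The base R sketch below assumes a two-dimensional output and Gaussian kernels with bandwidths equal to the empirical standard deviations; `rbf_gram` is a hypothetical helper.

```r
# Gaussian Gram matrix; the bandwidth defaults to the empirical standard deviation
rbf_gram <- function(z, sigma = sd(z)) exp(-0.5 * (outer(z, z, "-") / sigma)^2)

set.seed(42)
n <- 50
Y <- cbind(rnorm(n), runif(n))   # two output variables

K1 <- rbf_gram(Y[, 1])
K2 <- rbf_gram(Y[, 2])
K_final <- K1 * K2               # Hadamard (entrywise) product

# Direct evaluation of the tensor product kernel on observations 1 and 2
k12 <- exp(-0.5 * ((Y[1, 1] - Y[2, 1]) / sd(Y[, 1]))^2) *
       exp(-0.5 * ((Y[1, 2] - Y[2, 2]) / sd(Y[, 2]))^2)
print(c(K_final[1, 2], k12))
```

Note that `*` is the entrywise product in R, whereas `%*%` would compute the ordinary matrix product, which is not what tensorization requires.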
In sensiHSIC
, three different methods are proposed in order to compute HSIC-based sensitivity indices in presence of functional outputs.
Dimension reduction
This approach was initially proposed by Da Veiga (2015). The key idea is to approximate the random functional output by the first terms of its Karhunen-Loeve expansion. This can be achieved with a principal component analysis (PCA) that is carried out on the empirical covariance matrix.
The eigenvectors (or principal directions) make it possible to approximate the (deterministic) functional terms involved in the Karhunen-Loeve decomposition.
The eigenvalues determine how many principal directions are sufficient in order to accurately represent the random function by means of its truncated Karhunen-Loeve expansion. The key idea behind dimension reduction is to keep as few principal directions as possible while preserving a prescribed level of explained variance.
The principal components are the coordinates of the functional output in the low-dimensional subspace resulting from PCA. They are computed for all output samples (time series observations). See Le Maitre and Knio (2010) for more detailed explanations.
The last step consists in constructing a kernel in the reduced subspace. One single kernel family is selected and affected to all principal directions. Moreover, all kernel parameters are computed automatically (with appropriate rules of thumb). Then, several strategies may be considered.
The initial method described in Da Veiga (2015) is based on a direct tensorization. One can also decide to sum kernels.
This approach was improved by El Amri and Marrel (2021). For each principal direction, a weight coefficient (equal to the ratio between the eigenvalue and the sum of all selected eigenvalues) is computed.
The principal components are multiplied by their respective weight coefficients before summing kernels or tensorizing kernels.
The kernels can also be directly applied on the principal components before being linearly combined according to the weight coefficients.
In sensiHSIC, all these strategies correspond to the following specifications in kernelY:
Direct tensorization:
kernelY=list(method="PCA", combi="prod", position="nowhere")
Direct sum:
kernelY=list(method="PCA", combi="sum", position="nowhere")
Rescaled tensorization:
kernelY=list(method="PCA", combi="prod", position="intern")
Rescaled sum:
kernelY=list(method="PCA", combi="sum", position="intern")
Weighted linear combination:
kernelY=list(method="PCA", combi="sum", position="extern")
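The five strategies can be sketched with Gram matrices in base R. This is only an illustration under stated assumptions: the Gaussian kernel with a fixed bandwidth of 1, the toy principal components and the helper names gram_rbf and grams are all hypothetical, and sensiHSIC computes its kernel parameters with its own rules of thumb.

```r
set.seed(1)
n  <- 100
PC <- cbind(rnorm(n, sd = 3), rnorm(n, sd = 1))  # toy principal components (n x 2)
lambda <- apply(PC, 2, var)                      # eigenvalues of the selected directions
w <- lambda / sum(lambda)                        # weight coefficients

# Gaussian Gram matrix of a scalar sample (fixed bandwidth: an assumption)
gram_rbf <- function(z, sigma = 1) {
  exp(-as.matrix(dist(z))^2 / (2 * sigma^2))
}
# One Gram matrix per column (principal direction)
grams <- function(M) lapply(seq_len(ncol(M)), function(j) gram_rbf(M[, j]))

PCw <- sweep(PC, 2, w, `*`)                      # rescaled principal components

K_prod_nowhere <- Reduce(`*`, grams(PC))         # direct tensorization
K_sum_nowhere  <- Reduce(`+`, grams(PC))         # direct sum
K_prod_intern  <- Reduce(`*`, grams(PCw))        # rescaled tensorization
K_sum_intern   <- Reduce(`+`, grams(PCw))        # rescaled sum
K_sum_extern   <- Reduce(`+`, Map(`*`, w, grams(PC)))  # weighted linear combination
```

With position="intern" the weights act on the principal components before the kernels are evaluated, whereas with position="extern" they act on the kernels themselves.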
Dynamic Time Warping (DTW)
The DTW algorithm developed by Sakoe and Chiba (1978) can be combined with a translation-invariant kernel in order to create a kernel function for time series. The resulting DTW-based output kernel is well suited to measuring similarity between two given time series.
Suitable translation-invariant kernels include the inverse multiquadric kernel ("invmultiquad"), the exponential kernel ("laplace"), the Matern 3/2 kernel ("matern3"), the Matern 5/2 kernel ("matern5"), the rational quadratic kernel ("raquad") and the Gaussian kernel ("rbf").
Users should be aware that DTW-based kernels are not positive definite functions. As a consequence, many theoretical properties of HSIC indices no longer hold.
For faster computations, sensiHSIC uses the function dtw_dismat from the package IncDTW.
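To make the construction concrete, here is a minimal base R sketch of a DTW distance computed by dynamic programming and plugged into a Gaussian kernel. The function names dtw_dist and dtw_rbf_kernel, the absolute-difference local cost and the default bandwidth are assumptions made for this example; sensiHSIC relies on IncDTW instead, and its exact conventions may differ.

```r
# DTW distance between two numeric series (classical dynamic programming)
dtw_dist <- function(a, b) {
  n <- length(a)
  m <- length(b)
  D <- matrix(Inf, n + 1, m + 1)   # cumulative cost matrix with a border
  D[1, 1] <- 0
  for (i in seq_len(n)) {
    for (j in seq_len(m)) {
      cost <- abs(a[i] - b[j])     # local cost (assumed: absolute difference)
      D[i + 1, j + 1] <- cost + min(D[i, j + 1], D[i + 1, j], D[i, j])
    }
  }
  D[n + 1, m + 1]
}

# Gaussian kernel evaluated on the DTW distance (hypothetical helper)
dtw_rbf_kernel <- function(a, b, sigma = 1) {
  exp(-dtw_dist(a, b)^2 / (2 * sigma^2))
}

s1 <- c(0, 1, 2)
s2 <- c(0, 1, 1, 2)                # same shape, one repeated value
s3 <- c(2, 1, 0)
k12 <- dtw_rbf_kernel(s1, s2)
k13 <- dtw_rbf_kernel(s1, s3)
```

Series of different lengths are handled naturally, which is one reason DTW is attractive for time series outputs.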
Global Alignment Kernel (GAK)
Unlike DTW-based kernels, the GAK is a positive definite function. This time-series kernel was originally introduced in Cuturi et al. (2007) and further investigated in Cuturi (2011). It was used to compute HSIC indices on a simplified compartmental epidemiological model in Da Veiga (2021).
For faster computations, sensiHSIC uses the function gak from the package dtwclust.
In sensiHSIC, two GAK-related parameters may be tuned by the user with paramY. They correspond exactly to the arguments sigma and window.size in the function gak.
Interpretability is arguably the major drawback of HSIC indices. This shortcoming led Da Veiga (2021) to introduce a normalized version of $HSIC(X_i, Y)$. The so-called R2-HSIC index is thus defined as the ratio between $HSIC(X_i, Y)$ and the square root of a normalizing constant equal to $HSIC(X_i, X_i) \times HSIC(Y, Y)$.
This normalized sensitivity measure is inspired by the distance correlation measure proposed by Szekely et al. (2007) and the resulting sensitivity indices are easier to interpret since they all fall in the interval [0,1].
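The normalization can be illustrated with a small base R sketch that estimates HSIC with a V-statistic and then forms the ratio described above. The Gaussian kernel, the bandwidth rule (sample standard deviation), the toy model and the helper names gram_rbf, hsic_v and r2_hsic are assumptions made for this example, not the package implementation.

```r
set.seed(123)
n  <- 200
x1 <- runif(n)                           # influential input
x2 <- runif(n)                           # dummy input
y  <- sin(2 * pi * x1) + 0.1 * rnorm(n)

# Gaussian Gram matrix of a scalar sample (bandwidth rule is an assumption)
gram_rbf <- function(z, sigma = sd(z)) {
  exp(-as.matrix(dist(z))^2 / (2 * sigma^2))
}

# V-statistic estimator: trace(K H L H) / n^2 with H the centering matrix
hsic_v <- function(K, L) {
  n <- nrow(K)
  H <- diag(n) - 1 / n
  sum((H %*% K %*% H) * L) / n^2
}

# Normalized index: HSIC(X, Y) / sqrt(HSIC(X, X) * HSIC(Y, Y))
r2_hsic <- function(K, L) hsic_v(K, L) / sqrt(hsic_v(K, K) * hsic_v(L, L))

KY <- gram_rbf(y)
r2 <- c(X1 = r2_hsic(gram_rbf(x1), KY),
        X2 = r2_hsic(gram_rbf(x2), KY))
print(r2)
```

With V-statistics both terms under the square root are non-negative, so the ratio stays within [0,1].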
T-HSIC indices were designed by Marrel and Chabridon (2021) for TSA. They are only defined for a scalar output. Vector and functional outputs are not supported. The main idea of TSA is to measure the influence of each input variable on a modified version of $Y$. To do so, a preliminary mathematical transform $w$ (called the weight function) is applied on $Y$. The collection of HSIC indices is then estimated with respect to $w(Y)$. Here are two examples of situations where TSA is particularly relevant:
How to measure the impact of $X_i$ on the upper values taken by $Y$ (for example the values above a given threshold $T$)?
To answer this question, one may take $w(Y) = Y \, 1_{\{Y > T\}}$ (zero-thresholding).
This can be specified in sensiHSIC with target=list(c=T, type="zeroTh", upper=TRUE).
How to measure the influence of $X_i$ on the occurrence of the event $\{Y < T\}$?
To answer this question, one may take $w(Y) = 1_{\{Y < T\}}$ (indicator-thresholding).
This can be specified in sensiHSIC with target=list(c=T, type="indicTh", upper=FALSE).
In Marrel and Chabridon (2021), the two situations described above are referred to as "hard thresholding". To avoid using discontinuous weight functions, "smooth thresholding" may be used instead.
Spagnol et al. (2019): logistic transformation on both sides of the threshold $T$.
Marrel and Chabridon (2021): exponential transformation above or below the threshold $T$.
These two smooth relaxation functions depend on a tuning parameter that helps control smoothness. For further details, the user is invited to consult the documentation of the function weightTSA.
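The two hard-thresholding transforms can be sketched in a few lines of base R. The helper names zero_threshold and indicator_threshold, the argument name thr and the use of strict inequalities are assumptions made for this illustration; the weight functions actually used by the package are those of weightTSA.

```r
# Zero-thresholding: keep the values beyond the threshold, set the others to 0
zero_threshold <- function(y, thr, upper = TRUE) {
  keep <- if (upper) y > thr else y < thr
  y * keep
}

# Indicator-thresholding: 1 if the value lies beyond the threshold, 0 otherwise
indicator_threshold <- function(y, thr, upper = TRUE) {
  as.numeric(if (upper) y > thr else y < thr)
}

y <- c(-1.2, 0.3, 0.8, 2.5)
w_zero  <- zero_threshold(y, thr = 0.5, upper = TRUE)        # focus on values above 0.5
w_indic <- indicator_threshold(y, thr = 0.5, upper = FALSE)  # event "y below 0.5"
```

The transformed sample then replaces the raw output when HSIC indices are estimated; with indicator-thresholding it only takes the values 0 and 1.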
Remarks:
When type="indicTh" (indicator-thresholding), $w(Y)$ becomes a binary random variable. Accordingly, the output kernel selected in kernelY must be the categorical kernel.
In the spirit of R2-HSIC indices, T-HSIC indices can be normalized. The associated normalizing constant is equal to the square root of $HSIC(X_i, X_i) \times HSIC(w(Y), w(Y))$.
T-HSIC indices can be very naturally combined with the HSIC-ANOVA decomposition proposed by Da Veiga (2021). As a consequence, the arguments target and anova in sensiHSIC can be enabled simultaneously. Compared with basic HSIC indices, there are three main differences: the input variables must be mutually independent, ANOVA kernels must be used for all input variables and the output of interest is $w(Y)$.
T-HSIC indices can be very naturally combined with the tests of independence proposed in testHSIC. In this context, the null hypothesis is $H_0$: "$X_i$ and $w(Y)$ are independent".
C-HSIC indices were designed by Marrel and Chabridon (2021) for CSA. They are only defined for a scalar output. Vector and functional outputs are not supported. The idea is to measure the impact of each input variable on $Y$ when a specific event occurs. This conditioning event is defined on $Y$ thanks to a weight function $w$. In order to compute the conditioning weights, $w$ is applied on the output samples and an empirical normalization of these weights is carried out. The conditioning weights are then combined with the simulated Gram matrices in order to estimate C-HSIC indices. All formulas can be found in Marrel and Chabridon (2021). Here is an example of a situation where CSA is particularly relevant:
Let us imagine that the event $\{Y > T\}$ coincides with a system failure.
How to measure the influence of $X_i$ on $Y$ when failure occurs?
To answer this question, one may take $w(Y) = 1_{\{Y > T\}}$ (indicator-thresholding).
This can be specified in sensiHSIC with cond=list(c=T, type="indicTh", upper=TRUE).
The three other weight functions proposed for TSA (namely "zeroTh", "logistic" and "exp1side") can also be used but the role they play is less intuitive to understand. See Marrel and Chabridon (2021) for a more detailed discussion.
Remarks:
Unlike what is pointed out for TSA, when type="indicTh", the output of interest remains a continuous random variable. The categorical kernel is thus inappropriate. A continuous kernel must be used instead.
In the spirit of R2-HSIC indices, C-HSIC indices can be normalized. The associated normalizing constant is equal to the square root of $\text{C-HSIC}(X_i, X_i) \times \text{C-HSIC}(Y, Y)$.
Only V-statistics are supported to estimate C-HSIC indices. The reason is that the normalized version of C-HSIC indices cannot always be estimated with U-statistics. In particular, the estimates of the terms under the square root may be negative.
C-HSIC indices cannot be combined with the HSIC-ANOVA decomposition proposed in Da Veiga (2021). In fact, the conditioning operation is feared to introduce statistical dependence among input variables, which forbids using HSIC-ANOVA indices. As a consequence, the arguments cond and anova in sensiHSIC cannot be enabled simultaneously.
C-HSIC indices can hardly be combined with the tests of independence proposed in testHSIC. This is only possible if type="indicTh". In this context, the null hypothesis is $H_0$: "$X_i$ and $Y$ are independent if the event described in cond occurs".
In comparison with HSIC indices, R2-HSIC indices are easier to interpret. However, in terms of interpretability, Sobol' indices remain much more convenient since they can be understood as shares of the total output variance. Such an interpretation is made possible by the Hoeffding decomposition, also known as ANOVA decomposition.
It was proved in Da Veiga (2021) that an ANOVA-like decomposition can be achieved for HSIC indices under certain conditions:
The input variables must be mutually independent (which was not required to compute all other kinds of HSIC indices).
ANOVA kernels must be assigned to all input variables.
This ANOVA setup makes it possible to establish a strict separation between main effects and interaction effects in the HSIC sense. The first-order and total-order HSIC-ANOVA indices are then defined in the same fashion as first-order and total-order Sobol' indices. It is worth noting that the HSIC-ANOVA normalizing constant is equal to $HSIC(X, Y)$, where $X = (X_1, \ldots, X_p)$ is the whole input vector, and is thus different from the one used for R2-HSIC indices.
For a given probability measure $P$, an ANOVA kernel $K$ is a kernel that can be rewritten $K(x, x') = 1 + k(x, x')$ where $k$ is an orthogonal kernel with respect to $P$. Among the well-known parametric families of probability distributions and kernel functions, there are very few examples of orthogonal kernels. One example is given by Sobolev kernels when they are matched with the uniform probability measure on [0,1]. See Wahba et al. (1995) for further details on Sobolev kernels.
Moreover, several strategies to construct orthogonal kernels from non-orthogonal kernels are recalled in Da Veiga (2021). One of them consists in translating the feature map so that the resulting kernel becomes centered at the prescribed probability measure $P$. This can be done analytically for some basic kernels (Gaussian, exponential, Matern 3/2 and Matern 5/2) when $P$ is the uniform measure on [0,1]. See Section 9 in Ginsbourger et al. (2016) for the corresponding formulas.
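The orthogonality property can be checked numerically. The sketch below uses the Sobolev kernel with r = 1 written with Bernoulli polynomials, which is its usual form in the literature; whether the "sobolev1" option of sensiHSIC uses exactly this parametrization is an assumption, and the helper names are hypothetical.

```r
# Bernoulli polynomials of degree 1 and 2
B1 <- function(x) x - 1 / 2
B2 <- function(x) x^2 - x + 1 / 6

# Orthogonal part k and ANOVA kernel K = 1 + k (assumed parametrization, r = 1)
k_sobolev1 <- function(x, xp) B1(x) * B1(xp) + B2(abs(x - xp)) / 2
K_anova    <- function(x, xp) 1 + k_sobolev1(x, xp)

# Integral of f(., xp) against the uniform measure on [0,1],
# split at x = xp where the integrand has a kink
integral_unif <- function(f, xp) {
  g <- function(x) f(x, xp)
  integrate(g, 0, xp)$value + integrate(g, xp, 1)$value
}

xp_grid <- c(0.1, 0.5, 0.9)
orth_check  <- sapply(xp_grid, function(xp) integral_unif(k_sobolev1, xp))
anova_check <- sapply(xp_grid, function(xp) integral_unif(K_anova, xp))
```

The first vector is numerically zero for every x', which is the orthogonality condition, and consequently the ANOVA kernel integrates to 1.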
In sensiHSIC, ANOVA kernels are only available for the uniform probability measure on [0,1]. This includes the Sobolev kernel with parameter $r=1$ ("sobolev1"), the Sobolev kernel with parameter $r=2$ ("sobolev2"), the transformed Gaussian kernel ("rbf_anova"), the transformed exponential kernel ("laplace_anova"), the transformed Matern 3/2 kernel ("matern3_anova") and the transformed Matern 5/2 kernel ("matern5_anova").
As explained above, the HSIC-ANOVA indices can only be computed if all input variables are uniformly distributed on [0,1]. Because of this limitation, a preliminary reformulation is needed if the GSA problem includes other kinds of input probability distributions. The probability integral transform (PIT) must be applied on each input variable $X_i$. In addition, all quantile functions must be encapsulated in the numerical model, which may require reconsidering the way model is specified. In sensiHSIC, if check=TRUE is selected in anova, it is checked that all input samples lie in [0,1]. If this is not the case, a non-parametric rescaling (based on empirical distribution functions) is applied.
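The reformulation can be sketched as follows in base R. The toy model, the input distributions and all names are hypothetical, and the rank-based rescaling shown at the end is only one possible non-parametric choice; the rescaling performed inside sensiHSIC may differ.

```r
set.seed(7)
n <- 500
# Non-uniform inputs: one Gaussian and one exponential variable
X <- cbind(rnorm(n, mean = 1, sd = 2), rexp(n, rate = 3))

# Parametric PIT: apply the known distribution function of each input
U <- cbind(pnorm(X[, 1], mean = 1, sd = 2), pexp(X[, 2], rate = 3))

# Hypothetical model defined on the original inputs
toy_model <- function(X) X[, 1] + X[, 2]^2

# Reformulated model: the quantile functions are encapsulated
# so that the model takes uniform inputs
toy_model_unif <- function(U) {
  toy_model(cbind(qnorm(U[, 1], mean = 1, sd = 2), qexp(U[, 2], rate = 3)))
}

# Non-parametric alternative based on empirical distribution functions
U_emp <- apply(X, 2, function(x) rank(x) / (length(x) + 1))
```

The pair (toy_model_unif, U) describes the same GSA problem as (toy_model, X) but with inputs uniformly distributed on [0,1].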
HSIC-ANOVA indices can be used for TSA. The only difference with GSA is the use of a weight function $w$. On the contrary, CSA cannot be conducted with HSIC-ANOVA indices. Indeed, the conditioning operation is feared to introduce statistical dependence among the input variables, which prevents using the HSIC-ANOVA approach.
sensiHSIC returns a list of class "sensiHSIC". It contains all the input arguments detailed before, except sensi which is not kept. It must be noted that some of them might have been altered, corrected or completed.
kernelX | A vector of |
paramX | A vector of |
kernelY | A vector of |
paramY | A vector of values with output kernel parameters. Case 1: Case 2: Case 3: Case 4: |
More importantly, the list of class "sensiHSIC" contains all expected results (output samples, sensitivity measures and conditioning weights).
call | The matched call. |
y | A |
HSICXY | The estimated HSIC indices. |
S | The estimated R2-HSIC indices (also called normalized HSIC indices). |
weights | Only if |
Depending on what is specified in anova, the list of class "sensiHSIC" may also contain the following objects:
FO | The estimated first-order HSIC-ANOVA indices. |
TO | The estimated total-order HSIC-ANOVA indices. |
TO.num | The estimated numerators of total-order HSIC-ANOVA indices. |
denom | The estimated common denominator of HSIC-ANOVA indices. |
Sebastien Da Veiga, Amandine Marrel, Anouar Meynaoui, Reda El Amri and Gabriel Sarazin.
Borgonovo, E. and Plischke, E. (2016), Sensitivity analysis: a review of recent advances, European Journal of Operational Research, 248(3), 869-887.
Cuturi, M., Vert, J. P., Birkenes, O. and Matsui, T. (2007), A kernel for time series based on global alignments, 2007 IEEE International Conference on Acoustics, Speech and Signal Processing-ICASSP'07 (Vol. 2, pp. II-413), IEEE.
Cuturi, M. (2011), Fast global alignment kernels, Proceedings of the 28th International Conference on Machine Learning (ICML-11) (pp. 929-936).
Da Veiga, S. (2015), Global sensitivity analysis with dependence measures, Journal of Statistical Computation and Simulation, 85(7), 1283-1305.
Da Veiga, S. (2021), Kernel-based ANOVA decomposition and Shapley effects: application to global sensitivity analysis, arXiv preprint arXiv:2101.05487.
El Amri, M. R. and Marrel, A. (2021), More powerful HSIC-based independence tests, extension to space-filling designs and functional data. https://cea.hal.science/cea-03406956/
Fukumizu, K., Bach, F. R. and Jordan, M. I. (2004), Dimensionality reduction for supervised learning with reproducing kernel Hilbert spaces, Journal of Machine Learning Research, 5(Jan), 73-99.
Ginsbourger, D., Roustant, O., Schuhmacher, D., Durrande, N. and Lenz, N. (2016), On ANOVA decompositions of kernels and Gaussian random field paths, Monte Carlo and Quasi-Monte Carlo Methods (pp. 315-330), Springer, Cham.
Gretton, A., Bousquet, O., Smola, A., and Scholkopf, B. (2005), Measuring statistical dependence with Hilbert-Schmidt norms, International Conference on Algorithmic Learning Theory (pp. 63-77), Springer, Berlin, Heidelberg.
Gretton, A., Borgwardt, K., Rasch, M., Scholkopf, B. and Smola, A. (2006), A kernel method for the two-sample-problem, Advances in Neural Information Processing Systems, 19.
Le Maitre, O. and Knio, O. M. (2010), Spectral methods for uncertainty quantification with applications to computational fluid dynamics, Springer Science & Business Media.
Marrel, A. and Chabridon, V. (2021), Statistical developments for target and conditional sensitivity analysis: application on safety studies for nuclear reactor, Reliability Engineering & System Safety, 214, 107711.
Sakoe, H. and Chiba, S. (1978), Dynamic programming algorithm optimization for spoken word recognition, IEEE Transactions on Acoustics, Speech, and Signal Processing, 26(1), 43-49.
Spagnol, A., Le Riche, R. and Da Veiga, S. (2019), Global sensitivity analysis for optimization with variable selection, SIAM/ASA Journal on Uncertainty Quantification, 7(2), 417-443.
Sriperumbudur, B., Fukumizu, K. and Lanckriet, G. (2010), On the relation between universality, characteristic kernels and RKHS embedding of measures, Proceedings of the 13th International Conference on Artificial Intelligence and Statistics (pp. 773-780). JMLR Workshop and Conference Proceedings.
Szekely, G. J., Rizzo, M. L. and Bakirov, N. K. (2007), Measuring and testing dependence by correlation of distances, The Annals of Statistics, 35(6), 2769-2794.
Wahba, G., Wang, Y., Gu, C., Klein, R. and Klein, B. (1995), Smoothing spline ANOVA for exponential families, with application to the Wisconsin Epidemiological Study of Diabetic Retinopathy: the 1994 Neyman Memorial Lecture, The Annals of Statistics, 23(6), 1865-1895.
library(sensitivity)

############################
### HSIC indices for GSA ###
############################

# Test case 1: the Friedman function
# --> 5 input variables

### GSA with a given model ###

n <- 800
p <- 5
X <- matrix(runif(n*p), n, p)

kernelX <- c("rbf", "rbf", "laplace", "laplace", "sobolev1")
paramX <- c(0.2, 0.3, 0.4, NA, NA)

# kernel for X1: Gaussian kernel with given parameter 0.2
# kernel for X2: Gaussian kernel with given parameter 0.3
# kernel for X3: exponential kernel with given parameter 0.4
# kernel for X4: exponential kernel with automatic computation of the parameter
# kernel for X5: Sobolev kernel (r=1) with no parameter

kernelY <- "raquad"
paramY <- NA

sensi <- sensiHSIC(model=friedman.fun, X,
                   kernelX=kernelX, paramX=paramX,
                   kernelY=kernelY, paramY=paramY)

print(sensi)
plot(sensi)
title("GSA for the Friedman function")

### GSA with given data ###

Y <- friedman.fun(X)

sensi <- sensiHSIC(model=NULL, X,
                   kernelX=kernelX, paramX=paramX,
                   kernelY=kernelY, paramY=paramY)

tell(sensi, y=Y)

print(sensi)

### GSA from a prior object of class "sensiHSIC" ###

new.sensi <- sensiHSIC(model=friedman.fun, X,
                       kernelX=kernelX, paramX=paramX,
                       kernelY=kernelY, paramY=paramY,
                       estimator.type="U-stat",
                       sensi=sensi,
                       save.GM=list(KX=FALSE, KY=FALSE))

print(new.sensi)

# U-statistics are computed without rebuilding all Gram matrices.
# Those Gram matrices are not saved a second time.

##################################
### HSIC-ANOVA indices for GSA ###
##################################

# Test case 2: the Matyas function with Gaussian input variables
# --> 2 input variables

n <- 10^3
p <- 2
X <- matrix(rnorm(n*p), n, p)

# The Sobolev kernel (with r=1) is used to achieve the HSIC-ANOVA decomposition.
# Both first-order and total-order HSIC-ANOVA indices are expected.

### AUTOMATIC RESCALING ###

kernelX <- "sobolev1"
anova <- list(obj="both", is.uniform=FALSE)

sensi.A <- sensiHSIC(model=matyas.fun, X, kernelX=kernelX, anova=anova)

print(sensi.A)
plot(sensi.A)
title("GSA for the Matyas function")

### PROBLEM REFORMULATION ###

U <- matrix(runif(n*p), n, p)
new.matyas.fun <- function(U){ matyas.fun(qnorm(U)) }

kernelX <- "sobolev1"
anova <- list(obj="both", is.uniform=TRUE)

sensi.B <- sensiHSIC(model=new.matyas.fun, U, kernelX=kernelX, anova=anova)

print(sensi.B)

####################################
### T-HSIC indices for target SA ###
####################################

# Test case 3: the Sobol function
# --> 8 input variables

n <- 10^3
p <- 8
X <- matrix(runif(n*p), n, p)

kernelY <- "categ"
target <- list(c=0.4, type="indicTh")

sensi <- sensiHSIC(model=sobol.fun, X, kernelY=kernelY, target=target)

print(sensi)
plot(sensi)
title("TSA for the Sobol function")

#########################################
### C-HSIC indices for conditional SA ###
#########################################

# Test case 3: the Sobol function
# --> 8 input variables

n <- 10^3
p <- 8
X <- matrix(runif(n*p), n, p)

cond <- list(c=0.2, type="exp1side", upper=FALSE)

sensi <- sensiHSIC(model=sobol.fun, X, cond=cond)

print(sensi)
plot(sensi)
title("CSA for the Sobol function")

##########################################
### How to deal with discrete outputs? ###
##########################################

# Test case 4: classification of the Ishigami output
# --> 3 input variables
# --> 3 categories

classif <- function(X){
  Ytemp <- ishigami.fun(X)
  Y <- rep(NA, nrow(X))    # one label per row of X
  Y[Ytemp<0] <- 0
  Y[Ytemp>=0 & Ytemp<10] <- 1
  Y[Ytemp>=10] <- 2
  return(Y)
}

###

n <- 10^3
p <- 3
X <- matrix(runif(n*p, -pi, pi), n, p)

kernelY <- "categ"
paramY <- 0

sensi <- sensiHSIC(model=classif, X, kernelY=kernelY, paramY=paramY)

print(sensi)
plot(sensi)
title("GSA for the classified Ishigami function")

############################################
### How to deal with functional outputs? ###
############################################

# Test case 5: the arctangent temporal function
# --> 3 input variables (including 1 dummy variable)

n <- 500
p <- 3
X <- matrix(runif(n*p,-7,7), n, p)

### with a preliminary dimension reduction by PCA ###

kernelY <- list(method="PCA",
                data.centering=TRUE, data.scaling=TRUE,
                fam="rbf", expl.var=0.95, combi="sum", position="extern")

sensi <- sensiHSIC(model=atantemp.fun, X, kernelY=kernelY)

print(sensi)
plot(sensi)
title("PCA-based GSA for the arctangent temporal function")

### with a kernel based on dynamic time warping ###

kernelY <- list(method="DTW", fam="rbf")

sensi <- sensiHSIC(model=atantemp.fun, X, kernelY=kernelY)

print(sensi)
plot(sensi)
title("DTW-based GSA for the arctangent temporal function")

### with the global alignment kernel ###

kernelY <- list(method="GAK")

sensi <- sensiHSIC(model=atantemp.fun, X, kernelY=kernelY)

print(sensi)
plot(sensi)
title("GAK-based GSA for the arctangent temporal function")
shapleyBlockEstimation estimates the Shapley effects of a Gaussian linear model when the parameters are unknown and when the number of inputs is large, choosing the most likely block-diagonal structure of the covariance matrix.
shapleyBlockEstimationS(Beta, S, kappa=0, M=20, tol=10^(-6))
shapleyBlockEstimationX(X, Y, delta=NULL, M=20, tol=10^(-6))
Beta | A vector containing the (estimated) coefficients of the linear model. |
S | Empirical covariance matrix of the inputs. It has to be a positive semi-definite matrix with the same size as Beta. |
X | Matrix containing an i.i.d. sample of the inputs. |
Y | Vector containing the corresponding i.i.d. sample of the (noisy) output. |
kappa | The positive penalization coefficient that promotes block-diagonal matrices. It is advised to choose |
delta | Positive number that fixes the positive penalization coefficient |
M | Maximal size of the estimate of the block-diagonal structure. The computation time grows exponentially with M. |
tol | A relative tolerance to detect zero singular values of Sigma. |
If kappa = 0 or if delta = NULL, there is no penalization.
It is advised to choose M smaller than or equal to 20. For M larger than or equal to 25, the computation is very long.
shapleyBlockEstimationS and shapleyBlockEstimationX return a list containing the following components:
label | a vector containing the label of the group of each input variable. |
S_B | the block-diagonal estimated covariance matrix of the inputs. |
Shapley | a vector containing all the estimated Shapley effects. |
Baptiste Broto, CEA LIST
B. Broto, F. Bachoc, L. Clouvel and J-M. Martinez, 2022, Block-diagonal covariance estimation and application to the Shapley effects in sensitivity analysis, SIAM/ASA Journal on Uncertainty Quantification, 10, 379–403.
B. Broto, F. Bachoc, M. Depecker, and J-M. Martinez, 2019, Sensitivity indices for independent groups of variables, Mathematics and Computers in Simulation, 163, 19–31.
B. Iooss and C. Prieur, 2019, Shapley effects for sensitivity analysis with correlated inputs: comparisons with Sobol' indices, numerical estimation and applications, International Journal for Uncertainty Quantification, 9, 493–514.
A.B. Owen and C. Prieur, 2016, On Shapley value for measuring importance of dependent inputs, SIAM/ASA Journal on Uncertainty Quantification, 5, 986–1002.
shapleyLinearGaussian, shapleyPermEx, shapleyPermRand, shapleySubsetMc
library(sensitivity)

# packages for the plots of the matrices
library(gplots)
library(graphics)

# the following function improves the plots of the matrices
sig=function(x,alpha=0.4)
{
  return(1/(1+exp(-x/alpha)))
}

# 1) we generate the parameters by groups in order
K=4 # number of groups
pk=rep(0,K)
for(k in 1:K)
{
  pk[k]=round(6+4*runif(1))
}
p=sum(pk)
Sigma_ord=matrix(0,nrow=p, ncol=p)
ind_min=0
L=5
for(k in 1:K)
{
  p_k=pk[k]
  ind=ind_min+(1:p_k)
  ind_min=ind_min+p_k
  A=2*matrix(runif(p_k*L),nrow=L,ncol=p_k)-1
  Sigma_ord[ind,ind]=t(A)%*%A + 0.2*diag(rep(1,p_k))
}

image((0:p)+0.5,(0:p)+0.5,z=sig(Sigma_ord),col=cm.colors(100), zlim=c(0,1),
      ylim=c(p+0.5,0.5), main=expression(Sigma["order"]),
      cex.main=3,ylab = "", xlab = "",axes=FALSE)
box()

Beta_ord=3*runif(p)+1
eta_ord=shapleyLinearGaussian(Beta=Beta_ord, Sigma=Sigma_ord)
barplot(eta_ord,main=expression(eta["order"]),cex.axis = 2,cex.main=3)

# 2) We permute the input variables to get a more general input vector
samp=sample(1:p)
Sigma=Sigma_ord[samp,samp]

image((0:p)+0.5,(0:p)+0.5,z=sig(Sigma),col=cm.colors(100), zlim=c(0,1),
      ylim=c(p+0.5,0.5), main=expression(Sigma),
      cex.main=3,ylab = "",xlab = "",axes=FALSE)
box()

Beta=Beta_ord[samp]
eta=shapleyLinearGaussian(Beta=Beta, Sigma=Sigma)
barplot(eta,main=expression(eta),cex.axis = 2,cex.main=3)

# 3) We generate the observations with these parameters
n=5*p # sample size
C=chol(Sigma)
X0=matrix(rnorm(p*n),ncol=p)
X=X0%*%C
S=var(X) # empirical covariance matrix

image((0:p)+0.5,(0:p)+0.5,z=sig(S),col=cm.colors(100), zlim=c(0,1),
      ylim=c(p+0.5,0.5), main=expression(S),
      cex.main=3,ylab = "", xlab = "",axes=FALSE)
box()

beta0=rnorm(1)
# one noise term per observation (n rows in X)
Y=X%*%as.matrix(Beta)+beta0+0.2*rnorm(n)

# 4) We estimate the block-diagonal covariance matrix
# and the Shapley effects using the observations
# We assume that we know that the groups are smaller than 15
Estim=shapleyBlockEstimationX(X,Y,delta=3/4, M=15, tol=10^(-6))

eta_hat=Estim$Shapley
S_B=Estim$S_B

image((0:p)+0.5,(0:p)+0.5,z=sig(S_B),col=cm.colors(100), zlim=c(0,1),
      ylim=c(p+0.5,0.5), main=expression(S[hat(B)]),
      cex.main=3,ylab = "",xlab = "",axes=FALSE)
box()

barplot(eta_hat,main=expression(hat(eta)),cex.axis = 2,cex.main=3)

sum(abs(eta_hat-eta))
shapleyLinearGaussian implements the computation of the Shapley effects in the linear Gaussian framework, using the linear model (without the value at zero) and the covariance matrix of the inputs. It uses the block-diagonal covariance trick of Broto et al. (2019), which makes it possible to handle high-dimensional cases (number of inputs > 25). It gives a warning in case of dim(block) > 25.
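For intuition, the Shapley effects of a linear Gaussian model can be computed by brute force when the number of inputs is small. The sketch below enumerates all subsets and uses the Gaussian conditional covariance (Schur complement) to evaluate the closed Sobol' index of each subset. The function name shapley_bruteforce is hypothetical, and this enumeration is not the algorithm used by shapleyLinearGaussian, which exploits the block-diagonal structure instead.

```r
# Brute-force Shapley effects for Y = sum(beta * X) with X ~ N(0, Sigma).
# Hypothetical helper: cost grows exponentially with p, so keep p small.
shapley_bruteforce <- function(beta, Sigma) {
  p <- length(beta)
  var_y <- drop(t(beta) %*% Sigma %*% beta)

  # Closed Sobol' index of subset u: Var(E[Y | X_u]) / Var(Y)
  closed_index <- function(u) {
    if (length(u) == 0) return(0)
    if (length(u) == p) return(1)
    v <- setdiff(seq_len(p), u)
    # Covariance of X_v given X_u (Schur complement)
    cond_cov <- Sigma[v, v, drop = FALSE] -
      Sigma[v, u, drop = FALSE] %*% solve(Sigma[u, u, drop = FALSE]) %*%
      Sigma[u, v, drop = FALSE]
    1 - drop(t(beta[v]) %*% cond_cov %*% beta[v]) / var_y
  }

  effects <- numeric(p)
  for (i in seq_len(p)) {
    others <- setdiff(seq_len(p), i)
    for (mask in 0:(2^(p - 1) - 1)) {
      # Subset of the other inputs encoded by the bits of mask
      u <- others[bitwAnd(mask, 2^(seq_along(others) - 1)) != 0]
      w <- factorial(length(u)) * factorial(p - length(u) - 1) / factorial(p)
      effects[i] <- effects[i] + w * (closed_index(c(u, i)) - closed_index(u))
    }
  }
  effects
}

# Independent inputs: each effect reduces to beta_i^2 * sigma_i^2 / Var(Y)
eta_indep <- shapley_bruteforce(c(1, 1, 1), diag(c(1, 4, 9)))

# Correlated inputs: the effects still sum to 1
Sigma_corr <- matrix(c(1.0, 0.5, 0.0,
                       0.5, 2.0, 0.3,
                       0.0, 0.3, 1.5), 3, 3)
eta_corr <- shapley_bruteforce(c(1, -2, 0.5), Sigma_corr)
```

On such small cases, the result can be compared with the output of shapleyLinearGaussian(Beta, Sigma).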
shapleyLinearGaussian(Beta, Sigma, tol=10^(-6))
Beta | a vector containing the coefficients of the linear model (without the value at zero). |
Sigma | covariance matrix of the inputs. It has to be a positive semi-definite matrix with the same size as Beta. |
tol | a relative tolerance to detect zero singular values of Sigma. |
shapleyLinearGaussian
returns a numeric vector containing all the Shapley effects.
Baptiste Broto
B. Broto, F. Bachoc, M. Depecker, and J-M. Martinez, 2019, Sensitivity indices for independent groups of variables, Mathematics and Computers in Simulation, 163, 19–31.
B. Broto, F. Bachoc, L. Clouvel and J-M Martinez, 2022, Block-diagonal covariance estimation and application to the Shapley effects in sensitivity analysis, SIAM/ASA Journal on Uncertainty Quantification, 10, 379–403.
B. Iooss and C. Prieur, 2019, Shapley effects for sensitivity analysis with correlated inputs: comparisons with Sobol' indices, numerical estimation and applications, International Journal for Uncertainty Quantification, 9, 493–514.
A.B. Owen and C. Prieur, 2016, On Shapley value for measuring importance of dependent inputs, SIAM/ASA Journal on Uncertainty Quantification, 5, 986–1002.
shapleyBlockEstimation, shapleyPermEx, shapleyPermRand, shapleySubsetMc, shapleysobol_knn, johnsonshap
library(MASS)
library(igraph)

# First example:
p=5 #dimension
A=matrix(rnorm(p^2),nrow=p,ncol=p)
Sigma=t(A)%*%A
Beta=runif(p)
Shapley=shapleyLinearGaussian(Beta,Sigma)
plot(Shapley)

# Second Example, block-diagonal:
K=5 #number of groups
m=5 # number of variables in each group
p=K*m
Sigma=matrix(0,ncol=p,nrow=p)
for(k in 1:K)
{
  A=matrix(rnorm(m^2),nrow=m,ncol=m)
  Sigma[(m*(k-1)+1):(m*k),(m*(k-1)+1):(m*k)]=t(A)%*%A
}
# we mix the variables:
samp=sample(1:p,p)
Sigma=Sigma[samp,samp]
Beta=runif(p)
Shapley=shapleyLinearGaussian(Beta,Sigma)
plot(Shapley)
shapleyPermEx implements the Monte Carlo estimation of the Shapley effects (Owen, 2014) and their standard errors by examining all permutations of inputs (Song et al., 2016; Iooss and Prieur, 2019). It also estimates full first-order and independent total Sobol' indices (Mara et al., 2015). The function also allows the estimation of all these sensitivity indices in the case of dependent inputs. The total cost of this algorithm is Nv + d! * (d - 1) * No * Ni model evaluations.
shapleyPermEx(model = NULL, Xall, Xset, d, Nv, No, Ni = 3, colnames = NULL, ...)

## S3 method for class 'shapleyPermEx'
tell(x, y = NULL, return.var = NULL, ...)

## S3 method for class 'shapleyPermEx'
print(x, ...)

## S3 method for class 'shapleyPermEx'
plot(x, ylim = c(0, 1), ...)

## S3 method for class 'shapleyPermEx'
ggplot(data, mapping = aes(), ylim = c(0, 1), title = NULL,
       ..., environment = parent.frame())
model |
a function, or a model with a |
Xall |
Xall(n) is a function to generate an n-sample of a d-dimensional input vector (following the required joint distribution). |
Xset |
Xset(n, Sj, Sjc, xjc) is a function to generate an n-sample of a d-dimensional input vector corresponding to the indices in Sj conditional on the input values xjc with the index set Sjc (following the required joint distribution). |
d |
number of inputs. |
Nv |
Monte Carlo sample size to estimate the output variance. |
No |
Outer Monte Carlo sample size to estimate the expectation of the conditional variance of the model output. |
Ni |
Inner Monte Carlo sample size to estimate the conditional variance of the model output. |
colnames |
Optional: A vector containing the names of the inputs. |
x |
a list of class |
data |
a list of class |
y |
a vector of model responses. |
return.var |
a vector of character strings giving further internal variable names to store in the output object |
ylim |
y-coordinate plotting limits. |
title |
a title of the plot with ggplot() function. |
mapping |
Default list of aesthetic mappings to use for plot. If not specified, must be supplied in each layer added to the plot. |
environment |
[Deprecated] Used prior to tidy evaluation. |
... |
any other arguments for |
This function requires R package "gtools".
The default value Ni = 3 is the optimal one obtained by the theoretical analysis of Song et al. (2016).
The computations of the standard errors (and then the confidence intervals) come from Iooss and Prieur (2019). Based on the outer Monte Carlo loop (calculation of the expectation of the conditional variance), the variance of the Monte Carlo estimate is divided by No. The standard error is then averaged over the exact permutation loop. The 95% confidence intervals correspond to plus or minus 1.96 standard errors.
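The outer-loop part of this computation can be sketched on a toy problem. The code below is an illustration under stated assumptions, not the package's internals: the model Y = X1 + X2 with independent standard Gaussian inputs is chosen so that E[Var(Y | X1)] = 1 is known, and the variable names are hypothetical.

```r
set.seed(1)

# Toy model (assumption for illustration): Y = X1 + X2, X1 and X2 iid N(0, 1),
# so that E[Var(Y | X1)] = Var(X2) = 1.
No <- 2000   # outer Monte Carlo sample size
Ni <- 3      # inner Monte Carlo sample size

cond_var <- numeric(No)
for (k in seq_len(No)) {
  x1 <- rnorm(1)                # outer draw of the conditioning input
  x2 <- rnorm(Ni)               # inner draws of the remaining input given x1
  cond_var[k] <- var(x1 + x2)   # unbiased estimate of Var(Y | X1 = x1)
}

est <- mean(cond_var)                  # estimate of E[Var(Y | X1)]
se  <- sd(cond_var) / sqrt(No)         # variance of the estimate is var(cond_var) / No
ci  <- est + c(-1.96, 1.96) * se       # 95% confidence interval
```

Only the outer sample size No enters the standard error, which is consistent with keeping Ni small and spending the budget on No.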
shapleyPermEx returns a list of class "shapleyPermEx", containing all the input arguments detailed before, plus the following components:
call |
the matched call. |
X |
a |
y |
the response used. |
E |
the estimation of the output mean. |
V |
the estimation of the output variance. |
Shapley |
the estimations of the Shapley effects. |
SobolS |
the estimations of the full first-order Sobol' indices. |
SobolT |
the estimations of the independent total sensitivity Sobol' indices. |
Users can request more output variables with the argument return.var (for example, the list of permutations perms).
Bertrand Iooss, Eunhye Song, Barry L. Nelson, Jeremy Staum
B. Iooss and C. Prieur, 2019, Shapley effects for sensitivity analysis with correlated inputs: comparisons with Sobol' indices, numerical estimation and applications, International Journal for Uncertainty Quantification, 9, 493–514.
T. Mara, S. Tarantola, P. Annoni, 2015, Non-parametric methods for global sensitivity analysis of model output with dependent inputs, Environmental Modelling & Software, 72, 173–183.
A.B. Owen, 2014, Sobol' indices and Shapley value, SIAM/ASA Journal on Uncertainty Quantification, 2, 245–251.
A.B. Owen and C. Prieur, 2016, On Shapley value for measuring importance of dependent inputs, SIAM/ASA Journal on Uncertainty Quantification, 5, 986–1002.
E. Song, B.L. Nelson, and J. Staum, 2016, Shapley effects for global sensitivity analysis: Theory and computation, SIAM/ASA Journal on Uncertainty Quantification, 4, 1060–1083.
shapleyPermRand, shapleyLinearGaussian, shapleySubsetMc, shapleysobol_knn, lmg
##################################
# Test case : the Ishigami function (3 uniform independent inputs)
# See Iooss and Prieur (2019)

library(gtools)

d <- 3
Xall <- function(n) matrix(runif(d*n,-pi,pi),ncol=d)
Xset <- function(n, Sj, Sjc, xjc) matrix(runif(n*length(Sj),-pi,pi),ncol=length(Sj))

x <- shapleyPermEx(model = ishigami.fun, Xall=Xall, Xset=Xset, d=d, Nv=1e4,
                   No = 1e3, Ni = 3)
print(x)
plot(x)

library(ggplot2)
ggplot(x)

##################################
# Test case : Linear model (3 Gaussian inputs including 2 dependent)
# See Iooss and Prieur (2019)

library(ggplot2)
library(gtools)
library(mvtnorm) # Multivariate Gaussian variables
library(condMVNorm) # Conditional multivariate Gaussian variables

modlin <- function(X) apply(X,1,sum)

d <- 3
mu <- rep(0,d)
sig <- c(1,1,2)
ro <- 0.9
Cormat <- matrix(c(1,0,0,0,1,ro,0,ro,1),d,d)
Covmat <- ( sig %*% t(sig) ) * Cormat

Xall <- function(n) mvtnorm::rmvnorm(n,mu,Covmat)

Xset <- function(n, Sj, Sjc, xjc){
  if (is.null(Sjc)){
    if (length(Sj) == 1){
      rnorm(n,mu[Sj],sqrt(Covmat[Sj,Sj]))
    } else{
      mvtnorm::rmvnorm(n,mu[Sj],Covmat[Sj,Sj])
    }
  } else{
    condMVNorm::rcmvnorm(n, mu, Covmat, dependent.ind=Sj, given.ind=Sjc,
                         X.given=xjc)
  }
}

x <- shapleyPermEx(model = modlin, Xall=Xall, Xset=Xset, d=d, Nv=1e4,
                   No = 1e3, Ni = 3)
print(x)
ggplot(x)
shapleyPermRand implements the Monte Carlo estimation of the Shapley effects (Owen, 2014) and their standard errors by randomly sampling permutations of inputs (Song et al., 2016). It also estimates full first-order and independent total Sobol' indices (Mara et al., 2015), and their standard errors.
The function also allows the estimation of all these sensitivity indices in the case of dependent inputs.
The total cost of this algorithm is Nv + m * (d - 1) * No * Ni model evaluations.
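A small calculation shows why random permutations matter as d grows. The sketch assumes the cost formulas Nv + d! * (d - 1) * No * Ni for the exact-permutation algorithm (shapleyPermEx) and Nv + m * (d - 1) * No * Ni for the random-permutation algorithm (shapleyPermRand); the helper names cost_perm_exact and cost_perm_rand are hypothetical.

```r
# Number of model evaluations under the assumed cost formulas
cost_perm_exact <- function(d, Nv, No, Ni = 3) {
  Nv + factorial(d) * (d - 1) * No * Ni
}
cost_perm_rand <- function(d, Nv, m, No = 1, Ni = 3) {
  Nv + m * (d - 1) * No * Ni
}

# Settings used in the Ishigami examples (d = 3)
c_exact_3 <- cost_perm_exact(d = 3, Nv = 1e4, No = 1e3)   # 46000
c_rand_3  <- cost_perm_rand(d = 3, Nv = 1e4, m = 1e4)     # 70000

# With d = 10 the exact loop visits 10! permutations, the random one only m
c_exact_10 <- cost_perm_exact(d = 10, Nv = 1e4, No = 1e3)
c_rand_10  <- cost_perm_rand(d = 10, Nv = 1e4, m = 1e4)   # 280000
```

For very small d the exact algorithm can be the cheaper one; the factorial term takes over quickly as d increases.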
shapleyPermRand(model = NULL, Xall, Xset, d, Nv, m, No = 1, Ni = 3,
                colnames = NULL, ...)

## S3 method for class 'shapleyPermRand'
tell(x, y = NULL, return.var = NULL, ...)

## S3 method for class 'shapleyPermRand'
print(x, ...)

## S3 method for class 'shapleyPermRand'
plot(x, ylim = c(0, 1), ...)

## S3 method for class 'shapleyPermRand'
ggplot(data, mapping = aes(), ylim = c(0, 1), title = NULL,
       ..., environment = parent.frame())
model |
a function, or a model with a |
Xall |
Xall(n) is a function to generate an n-sample of a d-dimensional input vector (following the required joint distribution). |
Xset |
Xset(n, Sj, Sjc, xjc) is a function to generate an n-sample of a d-dimensional input vector corresponding to the indices in Sj conditional on the input values xjc with the index set Sjc (following the required joint distribution). |
d |
number of inputs. |
Nv |
Monte Carlo sample size to estimate the output variance. |
m |
Number of randomly sampled permutations. |
No |
Outer Monte Carlo sample size to estimate the expectation of the conditional variance of the model output. |
Ni |
Inner Monte Carlo sample size to estimate the conditional variance of the model output. |
colnames |
Optional: A vector containing the names of the inputs. |
x |
a list of class |
data |
a list of class |
y |
a vector of model responses. |
return.var |
a vector of character strings giving further internal variable names to store in the output object |
ylim |
y-coordinate plotting limits. |
title |
a title of the plot with ggplot() function. |
mapping |
Default list of aesthetic mappings to use for plot. If not specified, must be supplied in each layer added to the plot. |
environment |
[Deprecated] Used prior to tidy evaluation. |
... |
any other arguments for |
This function requires R package "gtools".
The default values No = 1 and Ni = 3 are the optimal ones obtained by the theoretical analysis of Song et al., 2016.
The computations of the standard errors do not account for the samples used to estimate the expectations of conditional variances. They only reflect the random permutations and are based on the variance of the Monte Carlo estimates divided by m. The 95% confidence intervals correspond to plus or minus 1.96 standard errors.
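The role of m in these standard errors can be illustrated with a toy value function that is known exactly, so that the only randomness comes from the permutations. Everything in the sketch is an assumption made for illustration (the value function, the names value, incr, sh_hat and se), and it does not reproduce the package's code.

```r
set.seed(42)

# Toy closed indices (assumption): additive variances v plus an interaction
# of size 2 between inputs 2 and 3. Exact Shapley effects: c(1, 5, 10) / 16.
v <- c(1, 4, 9)
d <- length(v)
value <- function(u) sum(v[u]) + 2 * all(c(2, 3) %in% u)
total <- value(seq_len(d))

m <- 500                                   # number of random permutations
incr <- matrix(0, nrow = m, ncol = d)      # marginal contributions per permutation
for (l in seq_len(m)) {
  perm <- sample(d)
  for (j in seq_len(d)) {
    before <- perm[seq_len(j - 1)]         # inputs preceding perm[j]
    incr[l, perm[j]] <- value(c(before, perm[j])) - value(before)
  }
}

sh_hat <- colMeans(incr) / total               # Shapley effect estimates
se     <- apply(incr, 2, sd) / sqrt(m) / total # variance over permutations / m
ci_low  <- sh_hat - 1.96 * se
ci_high <- sh_hat + 1.96 * se
```

Input 1 has no interaction, so its marginal contribution is the same in every permutation and its standard error is zero, whereas inputs 2 and 3 inherit permutation noise from the shared interaction term.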
shapleyPermRand returns a list of class "shapleyPermRand", containing all the input arguments detailed before, plus the following components:
call |
the matched call. |
X |
a |
y |
the response used. |
E |
the estimation of the output mean. |
V |
the estimation of the output variance. |
Shapley |
the estimations of the Shapley effects. |
SobolS |
the estimations of the full first-order Sobol' indices. |
SobolT |
the estimations of the independent total sensitivity Sobol' indices. |
Users can request more output variables with the argument return.var (for example, the list of permutations perms).
Bertrand Iooss, Eunhye Song, Barry L. Nelson, Jeremy Staum
B. Iooss and C. Prieur, 2019, Shapley effects for sensitivity analysis with correlated inputs: comparisons with Sobol' indices, numerical estimation and applications, International Journal for Uncertainty Quantification, 9, 493–514.
T. Mara, S. Tarantola, P. Annoni, 2015, Non-parametric methods for global sensitivity analysis of model output with dependent inputs, Environmental Modelling & Software, 72, 173–183.
A.B. Owen, 2014, Sobol' indices and Shapley value, SIAM/ASA Journal on Uncertainty Quantification, 2, 245–251.
A.B. Owen and C. Prieur, 2016, On Shapley value for measuring importance of dependent inputs, SIAM/ASA Journal on Uncertainty Quantification, 5, 986–1002.
E. Song, B.L. Nelson, and J. Staum, 2016, Shapley effects for global sensitivity analysis: Theory and computation, SIAM/ASA Journal on Uncertainty Quantification, 4, 1060–1083.
shapleyPermEx, shapleyLinearGaussian, shapleySubsetMc, shapleysobol_knn
##################################
# Test case : the Ishigami function
# See Iooss and Prieur (2019)

library(gtools)

d <- 3
Xall <- function(n) matrix(runif(d*n,-pi,pi),ncol=d)
Xset <- function(n, Sj, Sjc, xjc) matrix(runif(n*length(Sj),-pi,pi),ncol=length(Sj))

x <- shapleyPermRand(model = ishigami.fun, Xall=Xall, Xset=Xset, d=d, Nv=1e4,
                     m=1e4, No = 1, Ni = 3)
print(x)
plot(x)

library(ggplot2)
ggplot(x)

##################################
# Test case : Linear model (3 Gaussian inputs including 2 dependent)
# See Iooss and Prieur (2019)

library(ggplot2)
library(gtools)
library(mvtnorm) # Multivariate Gaussian variables
library(condMVNorm) # Conditional multivariate Gaussian variables

modlin <- function(X) apply(X,1,sum)

d <- 3
mu <- rep(0,d)
sig <- c(1,1,2)
ro <- 0.9
Cormat <- matrix(c(1,0,0,0,1,ro,0,ro,1),d,d)
Covmat <- ( sig %*% t(sig) ) * Cormat

Xall <- function(n) mvtnorm::rmvnorm(n,mu,Covmat)

Xset <- function(n, Sj, Sjc, xjc){
  if (is.null(Sjc)){
    if (length(Sj) == 1){
      rnorm(n,mu[Sj],sqrt(Covmat[Sj,Sj]))
    } else{
      mvtnorm::rmvnorm(n,mu[Sj],Covmat[Sj,Sj])
    }
  } else{
    condMVNorm::rcmvnorm(n, mu, Covmat, dependent.ind=Sj, given.ind=Sjc,
                         X.given=xjc)
  }
}

m <- 1e3 # use m = 1e4 for more precise results
x <- shapleyPermRand(model = modlin, Xall=Xall, Xset=Xset, d=d, Nv=1e3, m = m,
                     No = 1, Ni = 3)
print(x)
ggplot(x)
shapleysobol_knn
implements the estimation of several sensitivity indices using
only N model evaluations via ranking (following Gamboa et al. (2020) and Chatterjee (2019))
or nearest neighbour search (Broto et al. (2020) and Azadkia & Chatterjee (2020)).
Parallelized computations are possible to accelerate the estimation process.
It can be used with categorical inputs (which are transformed with one-hot encoding),
dependent inputs and multiple outputs. Sensitivity indices of any group of inputs can be computed,
which means that in particular (full) first-order, (independent) total Sobol indices
and Shapley effects are accessible. For large sample sizes, the nearest neighbour algorithm
can be significantly accelerated by using approximate nearest neighbour search.
It is also possible to estimate Shapley effects with the random permutation approach of
Castro et al. (2009), where all the terms are obtained with ranking or nearest neighbours.
shapleysobol_knn(model=NULL, X, method = "knn", n.knn = 2, n.limit = 2000,
                 U = NULL, n.perm = NULL, noise = F, rescale = F, nboot = NULL,
                 boot.level = 0.8, conf=0.95, parl=NULL, ...)

## S3 method for class 'shapleysobol_knn'
tell(x, y, ...)

## S3 method for class 'shapleysobol_knn'
extract(x, ...)

## S3 method for class 'shapleysobol_knn'
print(x, ...)

## S3 method for class 'shapleysobol_knn'
plot(x, ylim = c(0,1), ...)

## S3 method for class 'shapleysobol_knn'
ggplot(data, mapping = aes(), ylim = c(0, 1), ...,
       environment = parent.frame())

## S3 method for class 'sobol_knn'
print(x, ...)

## S3 method for class 'sobol_knn'
plot(x, ylim = c(0,1), ...)
model |
a function defining the model to analyze, taking X as an argument. |
X |
a matrix or data frame containing the observed inputs. |
method |
the algorithm to be used for estimation, either "rank" or "knn",
see details. Default is |
n.knn |
the number of nearest neighbours used for estimation. |
n.limit |
sample size limit above which approximate nearest neighbour search is activated. |
U |
an integer equal to 0 (total Sobol indices) or 1 (first-order Sobol indices)
or a list of vector indices defining the subsets of inputs whose sensitivity indices
must be computed or a matrix of 0s and 1s where each row encodes a subset of inputs
whose sensitivity indices must be computed (see examples). Default value is |
n.perm |
an integer, indicating the number of random permutations used
for the Shapley effects' estimation. Default is |
noise |
a logical which is TRUE if the model or the output sample is noisy. See details. |
rescale |
a logical indicating if continuous inputs must be rescaled before distance computations.
If TRUE, continuous inputs are first whitened with the ZCA-cor whitening procedure
(cf. whiten() function in package |
nboot |
the number of bootstrap resamples for the bootstrap estimate of confidence intervals. See details. |
boot.level |
a numeric between 0 and 1 for the proportion of the bootstrap sample size. |
conf |
the confidence level of the bootstrap confidence intervals. |
parl |
number of cores on which to parallelize the computation. If
|
x |
the object returned by |
data |
the object returned by |
y |
a numeric univariate vector containing the observed outputs. |
ylim |
the y-coordinate limits for plotting. |
mapping |
Default list of aesthetic mappings to use for plot. If not specified, must be supplied in each layer added to the plot. |
environment |
[Deprecated] Used prior to tidy evaluation. |
... |
additional arguments to be passed to |
For method="rank", the estimator is defined in Gamboa et al. (2020) following Chatterjee (2019). For first-order indices it is based on an input ranking (same algorithm as in sobolrank) while for higher orders, it uses an approximate heuristic solution of the traveling salesman problem applied to the input sample distances (cf. TSP() function in package TSP). For method="knn", ranking and TSP are replaced by a nearest neighbour search as proposed in Broto et al. (2020) and in Azadkia & Chatterjee (2020) for a similar coefficient.
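For first-order indices, the ranking idea fits in a few lines: sort the outputs by the input of interest and average the product of each output with its successor. The function below is a hedged transcription of the rank statistic of Gamboa et al., with a hypothetical name; it is not the code used by sobolrank or shapleysobol_knn.

```r
# Rank-based first-order Sobol' index estimate for one input (illustrative helper)
sobol_rank_first_order <- function(x, y) {
  ys <- y[order(x)]                 # outputs sorted by increasing input value
  ys_next <- c(ys[-1], ys[1])       # successor in the ranking, wrapping around
  (mean(ys * ys_next) - mean(y)^2) / (mean(y^2) - mean(y)^2)
}

set.seed(123)
n <- 5000
x <- runif(n)
s_signal <- sobol_rank_first_order(x, x)          # output fully explained by x
s_noise  <- sobol_rank_first_order(x, rnorm(n))   # output independent of x
```

The estimate is close to 1 when the output is a function of the input alone and close to 0 when they are independent, using the n available evaluations only.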
The computation is done using the subset procedure defined in Broto, Bachoc and Depecker (2020): all the closed Sobol' indices are first computed for all possible sub-models, and the Shapley weights are then assigned.
It is the same algorithm as sobolshap_knn with method = "knn", with a slight computational improvement (the search for weight assignments is done on much smaller matrices, stored in a list indexed by their order) and the ability to perform parallel computation and bootstrap confidence interval estimates.
Since the bootstrap creates ties which are not accounted for in the algorithm, confidence intervals are obtained by sampling without replacement a proportion boot.level of the total sample size, drawn uniformly.
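The resampling scheme can be sketched as follows. This is a minimal illustration, assuming a placeholder statistic (a squared correlation) in place of the sensitivity index and a simple percentile interval; the way the package actually builds its intervals may differ.

```r
set.seed(2024)

n <- 1000
boot.level <- 0.8     # proportion of the sample kept in each resample
nboot <- 50           # number of resamples

x <- rnorm(n)
y <- x + rnorm(n)
# Placeholder statistic (assumption): squared correlation on the subsample
stat <- function(idx) cor(x[idx], y[idx])^2

n_sub <- floor(boot.level * n)
# Sampling WITHOUT replacement: every subsample contains distinct points,
# so no ties are introduced among nearest neighbours or ranks
reps <- replicate(nboot, stat(sample.int(n, n_sub, replace = FALSE)))

ci <- quantile(reps, c(0.025, 0.975))   # percentile interval at level 0.95
```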
If the outputs are noisy, the argument noise can be used: it only has an impact on the estimation of one specific sensitivity index, namely Var(E[Y|X1,...,Xd])/Var(Y). If there is no noise this index is equal to 1, while in the presence of noise it must be estimated.
The distance used for subsets with mixed inputs (continuous and categorical) is the Euclidean distance, thanks to a one-hot encoding of categorical inputs.
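A short base R sketch of this encoding: the categorical column is expanded into one indicator column per level and the Euclidean distance is then computed on the augmented matrix. The data frame and column names are hypothetical, and the package may scale or weight the columns differently.

```r
X <- data.frame(
  x1 = c(0.1, 0.4, 0.9),
  colour = factor(c("red", "blue", "red"))
)

# One indicator column per level (no intercept, no dropped level)
onehot <- model.matrix(~ colour - 1, data = X)
X_enc <- cbind(x1 = X$x1, onehot)

# Euclidean distances between the encoded rows
D <- as.matrix(dist(X_enc))
```

Two rows in the same category differ only through the continuous input (rows 1 and 3 are 0.8 apart), while a change of category adds sqrt(2) in quadrature (rows 1 and 2 are sqrt(0.3^2 + 2) apart).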
If the parl argument requests more cores than the machine has, the number of cores defaults to the number of available cores minus one.
If argument U is specified, only the estimated first-order or total Sobol' indices are returned, or the estimated closed Sobol' indices for the selected subsets. The Shapley effects are not computed, and thus, not returned.
The extract method can be used for extracting first-order and total Sobol' indices, after the Shapley effects have been computed. It returns a list containing both sensitivity indices.
shapleysobol_knn returns a list of class "shapleysobol_knn" if U=NULL, containing the following components:
call |
the matched call. |
Shap |
the estimations of the Shapley effect indices. |
VE |
the estimations of the closed Sobol' indices for all possible sub-models. |
indices |
list of all subsets corresponding to the structure of VE. |
method |
which estimation method has been used. |
n.perm |
number of random permutations. |
w |
the Shapley weights. |
conf_int |
a matrix containing the estimations, bias and confidence intervals by bootstrap (if |
X |
the observed covariates. |
y |
the observed outcomes. |
n.knn |
value of the |
n.limit |
value of the |
U |
value of the |
rescale |
whether the design matrix has been rescaled. |
n.limit |
maximum number of sample before nearest-neighbor approximation. |
boot.level |
value of the |
noise |
whether the Shapley values must sum up to one or not. |
boot |
logical, whether bootstrap confidence interval estimates have been performed. |
nboot |
value of the |
parl |
value of the |
conf |
value of the |
shapleysobol_knn returns a list of class "sobol_knn" if U is specified, containing the following components:
call |
the matched call. |
Sobol |
the estimations of the Sobol' indices. |
indices |
list of all subsets corresponding to the structure of VE. |
method |
which estimation method has been used. |
conf_int |
a matrix containing the estimations, bias and confidence intervals by bootstrap (if |
X |
the observed covariates. |
y |
the observed outcomes. |
U |
value of the |
n.knn |
value of the |
rescale |
whether the design matrix has been rescaled. |
n.limit |
value of the |
boot.level |
value of the |
boot |
logical, whether bootstrap confidence interval estimates have been performed. |
nboot |
value of the |
parl |
value of the |
conf |
value of the |
Marouane Il Idrissi, Sebastien Da Veiga
Azadkia M., Chatterjee S., 2021, A simple measure of conditional dependence, Ann. Statist. 49(6):3070-3102.
Chatterjee, S., 2021, A new coefficient of correlation, Journal of the American Statistical Association, 116:2009-2022.
Gamboa, F., Gremaud, P., Klein, T., & Lagnoux, A., 2022, Global Sensitivity Analysis: a novel generation of mighty estimators based on rank statistics, Bernoulli 28: 2345-2374.
Broto B., Bachoc F. and Depecker M. (2020) Variance Reduction for Estimation of Shapley Effects and Adaptation to Unknown Input Distribution. SIAM/ASA Journal on Uncertainty Quantification, 8(2).
Castro J., Gomez D, Tejada J. (2009). Polynomial calculation of the Shapley value based on sampling. Computers & Operations Research, 36(5):1726-1730.
M. Il Idrissi, V. Chabridon and B. Iooss (2021). Developments and applications of Shapley effects to reliability-oriented sensitivity analysis with correlated inputs. Environmental Modelling & Software, 143, 105115.
M. Il Idrissi, V. Chabridon and B. Iooss (2021). Mesures d'importance relative par decompositions de la performance de modeles de regression, Preprint, 52emes Journees de Statistiques de la Societe Francaise de Statistique (SFdS), pp. 497-502, Nice, France, Juin 2021
sobolrank, sobolshap_knn, shapleyPermEx, shapleySubsetMc, johnsonshap, lmg, pme_knn
library(parallel)
library(doParallel)
library(foreach)
library(gtools)
library(boot)
library(RANN)

###########################################################
# Linear Model with Gaussian correlated inputs

library(mvtnorm)

set.seed(1234)
n <- 1000
beta<-c(1,-1,0.5)
sigma<-matrix(c(1,0,0,
                0,1,-0.8,
                0,-0.8,1),
              nrow=3, ncol=3)

X <-rmvnorm(n, rep(0,3), sigma)
colnames(X)<-c("X1","X2", "X3")

y <- X%*%beta + rnorm(n,0,2)

# Without bootstrap confidence intervals
x<-shapleysobol_knn(model=NULL, X=X, n.knn=3, noise=TRUE)
tell(x,y)
print(x)
plot(x)

# Using the extract method to get first-order and total Sobol' indices
extract(x)

# With bootstrap confidence intervals
x<-shapleysobol_knn(model=NULL, X=X, nboot=10, n.knn=3, noise=TRUE,
                    boot.level=0.7, conf=0.95)
tell(x,y)
print(x)
plot(x)

#####################
# Extracting Sobol' indices with bootstrap confidence intervals

nboot <- 10 # put nboot=50 for consistency

# Total Sobol' indices
x<-shapleysobol_knn(model=NULL, X=X, nboot=nboot, n.knn=3, U=0, noise=TRUE,
                    boot.level=0.7, conf=0.95)
tell(x,y)
print(x)
plot(x)

# First-order Sobol' indices
x<-shapleysobol_knn(model=NULL, X=X, nboot=nboot, n.knn=3, U=1, noise=TRUE,
                    boot.level=0.7, conf=0.95)
tell(x,y)
print(x)
plot(x)

# Closed Sobol' indices for specific subsets (list)
x<-shapleysobol_knn(model=NULL, X=X, nboot=nboot, n.knn=3,
                    U=list(c(1,2), c(1,2,3), 2), noise=TRUE,
                    boot.level=0.7, conf=0.95)
tell(x,y)
print(x)
plot(x)

#####################################################
# Test case: the non-monotonic Sobol g-function
# Example with a call to a numerical model
# First compute first-order indices with ranking
n <- 1000
X <- data.frame(matrix(runif(8 * n), nrow = n))
x <- shapleysobol_knn(model = sobol.fun, X = X, U = 1, method = "rank")
print(x)
plot(x)
library(ggplot2) ; ggplot(x)

# We can use the output sample generated for this estimation to compute total indices
# without additional calls to the model
x2 <- shapleysobol_knn(model = NULL, X = X, U = 0, method = "knn", n.knn = 5)
tell(x2,x$y)
plot(x2)
ggplot(x2)

#####################################################
# Test case: the Ishigami function
# Example with given data and the use of approximate nearest neighbour search
n <- 5000
X <- data.frame(matrix(-pi+2*pi*runif(3 * n), nrow = n))
Y <- ishigami.fun(X)
x <- shapleysobol_knn(model = NULL, X = X, U = NULL, method = "knn", n.knn = 5,
                      n.limit = 2000)
tell(x,Y)
plot(x)
library(ggplot2) ; ggplot(x)

# Extract first-order and total Sobol indices
x1 <- extract(x) ; print(x1)

######################################################
# Test case : Linear model (3 Gaussian inputs including 2 dependent) with scaling
# See Iooss and Prieur (2019)

library(mvtnorm) # Multivariate Gaussian variables
library(whitening) # For scaling

modlin <- function(X) apply(X,1,sum)

d <- 3
n <- 10000
mu <- rep(0,d)
sig <- c(1,1,2)
ro <- 0.9
Cormat <- matrix(c(1,0,0,0,1,ro,0,ro,1),d,d)
Covmat <- ( sig %*% t(sig) ) * Cormat
Xall <- function(n) mvtnorm::rmvnorm(n,mu,Covmat)
X <- Xall(n)
x <- shapleysobol_knn(model = modlin, X = X, U = NULL, method = "knn", n.knn = 5,
                      rescale = TRUE, n.limit = 2000)
print(x)
plot(x)
shapleySubsetMc
implements the estimation of
the Shapley effects from data using some nearest neighbors method
to generate according to the conditional distributions of the inputs.
It can be used with categorical inputs.
shapleySubsetMc(X, Y, Ntot = NULL, Ni = 3, cat = NULL, weight = NULL,
                discrete = NULL, noise = FALSE)

## S3 method for class 'shapleySubsetMc'
plot(x, ylim = c(0, 1), ...)
X |
a matrix or a dataframe of the input sample |
Y |
a vector of the output sample |
Ntot |
an integer of the approximate cost wanted |
Ni |
the number of nearest neighbours taken for each point |
cat |
a vector giving the indices of the input categorical variables |
weight |
a vector with the same length as cat, giving the weight of each categorical variable |
discrete |
a vector giving the indices of the input variable that are real, and not categorical, but that can take several times the same values |
noise |
logical. If FALSE (the default), the variable Y is a function of X |
x |
a list of class |
ylim |
y-coordinate plotting limits |
... |
any other arguments for plotting |
If weight = NULL
, all the categorical variables will have the same weight 1.
If Ntot = NULL
, the nearest neighbours will be computed for all the points,
where n is the length of the sample. The estimation can be very slow with this setting.
shapleySubsetMc
returns a list of class "shapleySubsetMc"
,
containing:
shapley |
the Shapley effects estimates. |
cost |
the real total cost of these estimates: the total number of points for which the nearest neighbours were computed. |
names |
the labels of the input variables. |
Baptiste Broto
B. Broto, F. Bachoc, M. Depecker, 2020, Variance reduction for estimation of Shapley effects and adaptation to unknown input distribution, SIAM/ASA Journal of Uncertainty Quantification, 8:693-716.
shapleyPermEx, shapleyPermRand, shapleyLinearGaussian, sobolrank, shapleysobol_knn
# First example: the linear Gaussian framework

# we generate a covariance matrix Sigma
p <- 4 # dimension
A <- matrix(rnorm(p^2), nrow = p, ncol = p)
Sigma <- t(A) %*% A # symmetric positive semi-definite by construction
C <- chol(Sigma)
n <- 500 # sample size (put n=2000 for more consistency)

Z <- matrix(rnorm(p * n), nrow = n, ncol = p)
X <- Z %*% C # X is a Gaussian vector with zero mean and covariance Sigma
Y <- rowSums(X)
Shap <- shapleySubsetMc(X = X, Y = Y, Ntot = 5000)
plot(Shap)

# Second example: the Sobol model with heterogeneous inputs

p <- 8 # dimension
A <- matrix(rnorm(p^2), nrow = p, ncol = p)
Sigma <- t(A) %*% A
C <- chol(Sigma)
n <- 500 # sample size (put n=5000 for more consistency)

Z <- matrix(rnorm(p * n), nrow = n, ncol = p)
X <- Z

# we create discrete and categorical variables
X[, 1] <- round(X[, 1] / 2)
X[, 2] <- X[, 2] > 2
X[, 4] <- -2 * round(X[, 4]) + 4
X[(X[, 6] > 0 & X[, 6] < 1), 6] <- 1

cat <- c(1, 2) # we choose to take X1 and X2 as categorical variables
               # (with the discrete distance)
discrete <- c(4, 6) # we indicate that X4 and X6 can take several times the same value

Y <- sobol.fun(X)
Ntot <- 2000 # put Ntot=20000 for more consistency
Shap <- shapleySubsetMc(X = X, Y = Y, cat = cat, discrete = discrete,
                        Ntot = Ntot, Ni = 10)

plot(Shap)
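For the linear Gaussian framework of the first example, reference Shapley effects can be computed in closed form, which gives something to compare the nearest-neighbour estimates against. The sketch below is a minimal base-R illustration, assuming Y = sum(beta * X) with X ~ N(0, Sigma); the helper names `closed_var` and `shapley_lg` are hypothetical and are not part of the package (the package lists shapleyLinearGaussian for this setting).

```r
# Var(E[Y | X_u]) for Y = sum(beta * X), X ~ N(0, Sigma) (hypothetical helper)
closed_var <- function(u, beta, Sigma) {
  if (length(u) == 0) return(0)
  cov_yx <- (Sigma %*% beta)[u]                      # Cov(Y, X_u)
  sum(cov_yx * solve(Sigma[u, u, drop = FALSE], cov_yx))
}

# Exact Shapley effects by enumerating all subsets (only practical for small p)
shapley_lg <- function(beta, Sigma) {
  p <- length(beta)
  total_var <- sum(beta * (Sigma %*% beta))          # Var(Y)
  sh <- numeric(p)
  for (i in seq_len(p)) {
    others <- setdiff(seq_len(p), i)
    for (mask in 0:(2^(p - 1) - 1)) {
      # subset u of the other inputs encoded by the bits of mask
      u <- others[bitwAnd(mask, 2^(seq_along(others) - 1)) > 0]
      w <- 1 / (p * choose(p - 1, length(u)))        # Shapley weight
      gain <- closed_var(c(u, i), beta, Sigma) - closed_var(u, beta, Sigma)
      sh[i] <- sh[i] + w * gain
    }
  }
  sh / total_var
}

# X1 independent, X2 and X3 correlated (illustrative values)
Sigma <- matrix(c(1, 0,   0,
                  0, 1,   0.5,
                  0, 0.5, 1), nrow = 3)
beta <- rep(1, 3)
sh <- shapley_lg(beta, Sigma)
sh       # 0.250 0.375 0.375
sum(sh)  # 1
```

The effects sum to one by construction, which is a convenient sanity check when comparing against the output of `shapleySubsetMc`.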
sobol
implements the Monte Carlo estimation of
the Sobol' sensitivity indices (standard estimator). This method allows the estimation of
the indices of the variance decomposition, sometimes referred to as
functional ANOVA decomposition, up to a given order, at a total cost
of (N + 1) × n model evaluations, where N is the number
of indices to estimate and n is the size of each random sample. This function also allows the estimation of
the so-called subset (or group) indices, i.e. the first-order indices with respect to
single multidimensional inputs.
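To get a feel for how quickly this cost grows with the order, the short sketch below counts the indices requested by an integer `order` for p inputs and applies a cost of (N + 1) × n model evaluations. The variable names are illustrative, and the cost formula is stated here as an assumption about the Sobol' scheme rather than read from the package source.

```r
# Cost of the Sobol' scheme, assuming (N + 1) * n model evaluations
p <- 8        # number of inputs
n <- 1000     # size of each random sample
order <- 2    # maximum order of the ANOVA decomposition

# N counts every subset of inputs of size 1..order
N <- sum(choose(p, seq_len(order)))
cost <- (N + 1) * n

N     # 36 indices: 8 first-order and 28 second-order
cost  # 37000 model evaluations
```

With 8 inputs, going from order 1 to order 2 multiplies the number of runs by roughly four, which is why higher orders are usually reserved for cheap models.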
sobol(model = NULL, X1, X2, order = 1, nboot = 0, conf = 0.95, ...)

## S3 method for class 'sobol'
tell(x, y = NULL, return.var = NULL, ...)

## S3 method for class 'sobol'
print(x, ...)

## S3 method for class 'sobol'
plot(x, ylim = c(0, 1), ...)

## S3 method for class 'sobol'
plotMultOut(x, ylim = c(0, 1), ...)

## S3 method for class 'sobol'
ggplot(data, mapping = aes(), ylim = c(0, 1), ..., environment = parent.frame())
model |
a function, or a model with a |
X1 |
the first random sample. |
X2 |
the second random sample. |
order |
either an integer, the maximum order in the ANOVA decomposition (all indices up to this order will be computed), or a list of numeric vectors, the multidimensional compounds of the wanted subset indices. |
nboot |
the number of bootstrap replicates. |
conf |
the confidence level for bootstrap confidence intervals. |
x |
a list of class |
data |
a list of class |
y |
a vector of model responses. |
return.var |
a vector of character strings giving further
internal variables names to store in the output object |
ylim |
y-coordinate plotting limits. |
mapping |
Default list of aesthetic mappings to use for plot. If not specified, must be supplied in each layer added to the plot. |
environment |
[Deprecated] Used prior to tidy evaluation. |
... |
any other arguments for |
sobol
returns a list of class "sobol"
, containing all
the input arguments detailed before, plus the following components:
call |
the matched call. |
X |
a |
y |
a vector of model responses. |
V |
the estimations of Variances of the Conditional Expectations (VCE) with respect to one factor or one group of factors. |
D |
the estimations of the terms of the ANOVA decomposition (not for subset indices). |
S |
the estimations of the Sobol' sensitivity indices (not for subset indices). |
Users can request more output variables with the argument
return.var
(for example, bootstrap outputs V.boot
,
D.boot
and S.boot
).
Gilles Pujol
I. M. Sobol, 1993, Sensitivity analysis for non-linear mathematical model, Math. Modelling Comput. Exp., 1, 407–414.
sobol2002, sobolSalt, sobol2007, soboljansen, sobolmartinez, sobolEff, sobolSmthSpl, sobolmara, sobolroalhs, fast99, sobolGP, sobolMultOut
# Test case: the non-monotonic Sobol g-function

# The method of sobol requires 2 samples
# (there are 8 factors, all following the uniform distribution on [0,1])
library(boot)
n <- 1000
X1 <- data.frame(matrix(runif(8 * n), nrow = n))
X2 <- data.frame(matrix(runif(8 * n), nrow = n))

# sensitivity analysis
x <- sobol(model = sobol.fun, X1 = X1, X2 = X2, order = 2, nboot = 100)
print(x)
#plot(x)

library(ggplot2)
ggplot(x)
sobol2002
implements the Monte Carlo estimation of
the Sobol' indices for both first-order and total indices at the same
time (2p indices altogether), at a total cost of (p + 2) × n
model evaluations, where p is the number of inputs and n the size of each random sample. These are called the Saltelli estimators.
sobol2002(model = NULL, X1, X2, nboot = 0, conf = 0.95, ...)

## S3 method for class 'sobol2002'
tell(x, y = NULL, return.var = NULL, ...)

## S3 method for class 'sobol2002'
print(x, ...)

## S3 method for class 'sobol2002'
plot(x, ylim = c(0, 1), ...)

## S3 method for class 'sobol2002'
plotMultOut(x, ylim = c(0, 1), ...)

## S3 method for class 'sobol2002'
ggplot(data, mapping = aes(), ylim = c(0, 1), ..., environment = parent.frame())
model |
a function, or a model with a |
X1 |
the first random sample. |
X2 |
the second random sample. |
nboot |
the number of bootstrap replicates. |
conf |
the confidence level for bootstrap confidence intervals. |
x |
a list of class |
data |
a list of class |
y |
a vector of model responses. |
return.var |
a vector of character strings giving further
internal variables names to store in the output object |
ylim |
y-coordinate plotting limits. |
mapping |
Default list of aesthetic mappings to use for plot. If not specified, must be supplied in each layer added to the plot. |
environment |
[Deprecated] Used prior to tidy evaluation. |
... |
any other arguments for |
BE CAREFUL! This estimator suffers from a conditioning problem when estimating
the variances behind the indices computations. This can seriously affect the
Sobol' indices estimates in case of largely non-centered output. To avoid this
effect, you have to center the model output before applying "sobol2002"
.
Functions "sobolEff"
, "soboljansen"
and "sobolmartinez"
do not suffer from this problem.
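The effect of a non-centered output can be reproduced with a few lines of base R. The sketch below uses a hand-written pick-freeze estimator of the form mean(yA * yCi) - mean(yA)^2, which is an assumption made for illustration and not the package's code; the toy model, the offset of 1000 and the function name `first_order` are likewise hypothetical.

```r
set.seed(42)
n <- 10000
offset <- 1000                                    # large mean: the problematic case
f <- function(X) X[, 1] + 0.5 * X[, 2] + offset   # exact first-order indices: 0.8, 0.2

A <- matrix(runif(2 * n), nrow = n)
B <- matrix(runif(2 * n), nrow = n)

# Hand-written estimator: V_i = mean(yA * yCi) - mean(yA)^2,
# where C_i takes column i from A and every other column from B
first_order <- function(center) {
  yA <- f(A)
  shift <- if (center) mean(yA) else 0            # optional centering of the output
  yA <- yA - shift
  sapply(1:2, function(i) {
    Ci <- B
    Ci[, i] <- A[, i]
    yCi <- f(Ci) - shift
    (mean(yA * yCi) - mean(yA)^2) / var(yA)
  })
}

S_raw <- first_order(center = FALSE)       # typically far from (0.8, 0.2)
S_centered <- first_order(center = TRUE)   # close to (0.8, 0.2)
```

Without centering, the estimator subtracts two quantities of order offset^2, so sampling noise is amplified by the offset. With the package, the same remedy can be applied by creating the object with `model = NULL`, evaluating the model on the design stored in the returned object, and passing the centered responses to `tell`.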
sobol2002
returns a list of class "sobol2002"
, containing all
the input arguments detailed before, plus the following components:
call |
the matched call. |
X |
a |
y |
the response used |
V |
the estimations of Variances of the Conditional Expectations
(VCE) with respect to each factor and also with respect to the
complementary set of each factor ("all but |
S |
the estimations of the Sobol' first-order indices. |
T |
the estimations of the Sobol' total sensitivity indices. |
Users can request more output variables with the argument
return.var
(for example, bootstrap outputs V.boot
,
S.boot
and T.boot
).
Gilles Pujol
A. Saltelli, 2002, Making best use of model evaluations to compute sensitivity indices, Computer Physics Communications, 145, 280–297.
sobol, sobolSalt, sobol2007, soboljansen, sobolmartinez, sobolEff, sobolmara, sobolGP, sobolMultOut
# Test case: the non-monotonic Sobol g-function

# The method of sobol requires 2 samples
# There are 8 factors, all following the uniform distribution
# on [0,1]

library(boot)
n <- 1000
X1 <- data.frame(matrix(runif(8 * n), nrow = n))
X2 <- data.frame(matrix(runif(8 * n), nrow = n))

# sensitivity analysis
x <- sobol2002(model = sobol.fun, X1, X2, nboot = 100)
print(x)
plot(x)

library(ggplot2)
ggplot(x)
sobol2007
implements the Monte Carlo estimation of
the Sobol' indices for both first-order and total indices at the same
time (2p indices altogether), at a total cost of (p + 2) × n
model evaluations, where p is the number of inputs and n the size of each random sample. These are called the Mauntz estimators.
sobol2007(model = NULL, X1, X2, nboot = 0, conf = 0.95, ...)

## S3 method for class 'sobol2007'
tell(x, y = NULL, return.var = NULL, ...)

## S3 method for class 'sobol2007'
print(x, ...)

## S3 method for class 'sobol2007'
plot(x, ylim = c(0, 1), ...)

## S3 method for class 'sobol2007'
plotMultOut(x, ylim = c(0, 1), ...)

## S3 method for class 'sobol2007'
ggplot(data, mapping = aes(), ylim = c(0, 1), ..., environment = parent.frame())
model |
a function, or a model with a |
X1 |
the first random sample. |
X2 |
the second random sample. |
nboot |
the number of bootstrap replicates. |
conf |
the confidence level for bootstrap confidence intervals. |
x |
a list of class |
data |
a list of class |
y |
a vector of model responses. |
return.var |
a vector of character strings giving further
internal variables names to store in the output object |
ylim |
y-coordinate plotting limits. |
mapping |
Default list of aesthetic mappings to use for plot. If not specified, must be supplied in each layer added to the plot. |
environment |
[Deprecated] Used prior to tidy evaluation. |
... |
any other arguments for |
This estimator is good for small first-order and total indices.
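One way to see why this holds is to write the estimators out by hand. The sketch below follows the formulas as they are commonly given in Saltelli et al. (2010), in base R and on a centred toy model; it is an illustration under those assumptions, not the package's implementation, and all names in it are hypothetical.

```r
set.seed(3)
n <- 10000
f <- function(X) X[, 1] + 0.5 * X[, 2]   # centred additive toy model: S = T = (0.8, 0.2)

A <- matrix(runif(2 * n, -0.5, 0.5), nrow = n)
B <- matrix(runif(2 * n, -0.5, 0.5), nrow = n)
yA <- f(A)
yB <- f(B)
V <- var(c(yA, yB))                      # total variance estimate

mauntz <- sapply(1:2, function(i) {
  Ci <- B
  Ci[, i] <- A[, i]                      # C_i shares input i with A, the rest with B
  yCi <- f(Ci)
  c(S = mean(yA * (yCi - yB)) / V,       # first-order index
    T = mean(yB * (yB - yCi)) / V)       # total index
})
mauntz                                   # rows "S" and "T", one column per input
```

The runs yCi and yB differ only in input i, so their difference is small whenever that input has little influence, which keeps the variance of both estimators low for small indices.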
BE CAREFUL! This estimator suffers from a conditioning problem when estimating
the variances behind the indices computations. This can seriously affect the
Sobol' indices estimates in case of largely non-centered output. To avoid this
effect, you have to center the model output before applying "sobol2007"
.
Functions "sobolEff"
, "soboljansen"
and "sobolmartinez"
do not suffer from this problem.
sobol2007
returns a list of class "sobol2007"
, containing all
the input arguments detailed before, plus the following components:
call |
the matched call. |
X |
a |
y |
the response used |
V |
the estimations of Variances of the Conditional Expectations
(VCE) with respect to each factor and also with respect to the
complementary set of each factor ("all but |
S |
the estimations of the Sobol' first-order indices. |
T |
the estimations of the Sobol' total sensitivity indices. |
Users can request more output variables with the argument
return.var
(for example, bootstrap outputs V.boot
,
S.boot
and T.boot
).
Bertrand Iooss
I.M. Sobol, S. Tarantola, D. Gatelli, S.S. Kucherenko and W. Mauntz, 2007, Estimating the approximation errors when fixing unessential factors in global sensitivity analysis, Reliability Engineering and System Safety, 92, 957–960.
A. Saltelli, P. Annoni, I. Azzini, F. Campolongo, M. Ratto and S. Tarantola, 2010, Variance based sensitivity analysis of model output. Design and estimator for the total sensitivity index, Computer Physics Communications 181, 259–270.
sobol, sobol2002, sobolSalt, soboljansen, sobolmartinez, sobolEff, sobolmara, sobolMultOut
# Test case: the non-monotonic Sobol g-function

# The method of sobol requires 2 samples
# There are 8 factors, all following the uniform distribution
# on [0,1]

library(boot)
n <- 1000
X1 <- data.frame(matrix(runif(8 * n), nrow = n))
X2 <- data.frame(matrix(runif(8 * n), nrow = n))

# sensitivity analysis
x <- sobol2007(model = sobol.fun, X1, X2, nboot = 100)
print(x)
plot(x)

library(ggplot2)
ggplot(x)
sobolEff
implements the Monte Carlo estimation of the Sobol' sensitivity indices using the asymptotically efficient formulas in section 4.2.4.2 of Monod et al. (2006). Either all first-order indices or all total-effect indices are estimated at a cost of model calls or all closed second-order indices are estimated at a cost of
model calls.
sobolEff(model = NULL, X1, X2, order = 1, nboot = 0, conf = 0.95, ...)

## S3 method for class 'sobolEff'
tell(x, y = NULL, ...)

## S3 method for class 'sobolEff'
print(x, ...)

## S3 method for class 'sobolEff'
plot(x, ylim = c(0, 1), ...)

## S3 method for class 'sobolEff'
ggplot(data, mapping = aes(), ylim = c(0, 1), ..., environment = parent.frame())
model |
a function, or a model with a |
X1 |
the first random sample. |
X2 |
the second random sample. |
order |
an integer specifying the indices to estimate: 0 for total effect indices, 1 for first-order indices and 2 for closed second-order indices. |
nboot |
the number of bootstrap replicates, or zero to use asymptotic standard deviation estimates given in Janon et al. (2012). |
conf |
the confidence level for confidence intervals. |
x |
a list of class |
data |
a list of class |
y |
a vector of model responses. |
ylim |
y-coordinate plotting limits. |
mapping |
Default list of aesthetic mappings to use for plot. If not specified, must be supplied in each layer added to the plot. |
environment |
[Deprecated] Used prior to tidy evaluation. |
... |
any other arguments for |
The estimator used by sobolEff is defined in Monod et al. (2006), Section 4.2.4.2 and studied under the name T_N in Janon et al. (2012). This estimator is good for large first-order indices.
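For readers who want to see the shape of this estimator, the sketch below writes out T_N for first-order indices in base R, following the formula as I understand it from Janon et al.; it is a hedged illustration on a toy model, not the code used by sobolEff, and the offset of 1000 is an arbitrary choice.

```r
set.seed(1)
n <- 10000
f <- function(X) X[, 1] + 0.5 * X[, 2] + 1000   # toy model, exact indices: 0.8, 0.2

X1 <- matrix(runif(2 * n), nrow = n)
X2 <- matrix(runif(2 * n), nrow = n)
y <- f(X1)

T_N <- sapply(1:2, function(i) {
  Xi <- X2
  Xi[, i] <- X1[, i]           # freeze input i, resample all the others
  yi <- f(Xi)
  m <- mean((y + yi) / 2)      # pooled estimate of the mean
  (mean(y * yi) - m^2) / (mean((y^2 + yi^2) / 2) - m^2)
})
T_N                            # close to (0.8, 0.2) despite the large offset
```

Because the same pooled mean appears in the numerator and the denominator, adding a constant to the model output leaves the estimate unchanged up to rounding error, which is consistent with the remark elsewhere in this manual that sobolEff does not suffer from the conditioning problem.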
sobolEff
returns a list of class "sobolEff"
, containing all
the input arguments detailed before, plus the following components:
call |
the matched call. |
X |
a |
y |
a vector of model responses. |
S |
the estimations of the Sobol' sensitivity indices. |
Alexandre Janon, Laurent Gilquin
Monod, H., Naud, C., Makowski, D. (2006), Uncertainty and sensitivity analysis for crop models in Working with Dynamic Crop Models: Evaluation, Analysis, Parameterization, and Applications, Elsevier.
A. Janon, T. Klein, A. Lagnoux, M. Nodet, C. Prieur (2014), Asymptotic normality and efficiency of two Sobol index estimators, ESAIM: Probability and Statistics, 18:342-364.
sobol, sobol2002, sobolSalt, sobol2007, soboljansen, sobolmartinez,
sobolSmthSpl
# Test case: the non-monotonic Sobol g-function

# The method of sobol requires 2 samples
# (there are 8 factors, all following the uniform distribution on [0,1])
n <- 1000
X1 <- data.frame(matrix(runif(8 * n), nrow = n))
X2 <- data.frame(matrix(runif(8 * n), nrow = n))

# sensitivity analysis
x <- sobolEff(model = sobol.fun, X1 = X1, X2 = X2, nboot = 0)
print(x)

library(ggplot2)
ggplot(x)
Perform a kriging-based global sensitivity analysis taking into account both the meta-model and the Monte-Carlo errors. The Sobol indices are estimated with a Monte-Carlo integration and the true function is substituted by a kriging model. It is built thanks to the function km
of the package DiceKriging
.
The complete conditional predictive distribution of the kriging model is considered (not only the predictive mean).
sobolGP(
  model,
  type = "SK",
  MCmethod = "sobol",
  X1,
  X2,
  nsim = 100,
  nboot = 1,
  conf = 0.95,
  sequential = FALSE,
  candidate,
  sequential.tot = FALSE,
  max_iter = 1000)

## S3 method for class 'sobolGP'
ask(x, tot = FALSE, ...)

## S3 method for class 'sobolGP'
tell(x, y = NULL, xpoint = NULL, newcandidate = NULL, ...)

## S3 method for class 'sobolGP'
print(x, ...)

## S3 method for class 'sobolGP'
plot(x, ...)
model |
an object of class |
type |
a character string giving the type of the considered kriging model. |
MCmethod |
a character string specifying the Monte-Carlo procedure used to estimate the Sobol indices. The available methods are: |
X1 |
a matrix representing the first random sample. |
X2 |
a matrix representing the second random sample. |
nsim |
an integer giving the number of samples for the conditional Gaussian process. It is used to quantify the uncertainty due to the kriging approximation. |
nboot |
an integer representing the number of bootstrap replicates. It is used to quantify the uncertainty due to the Monte-Carlo integrations. We recommend to set |
conf |
a numeric representing the confidence intervals taking into account the uncertainty due to the bootstrap procedure and the Gaussian process samples. |
sequential |
a boolean. If |
candidate |
a matrix representing the candidate points where the best new point to be simulated is selected. The lines represent the points and the columns represent the dimension. |
sequential.tot |
a boolean. If |
max_iter |
a numeric giving the maximal number of iterations for the propagative Gibbs sampler. It is used to simulate the realizations of the Gaussian process. |
x |
an object of class S3 |
tot |
a boolean. If |
xpoint |
a matrix representing a new point added to the kriging model. |
y |
a numeric giving the response of the function at |
newcandidate |
a matrix representing the new candidate points where the best point to be simulated is selected. If |
... |
any other arguments to be passed |
The function ask
provides the new point where the function should be simulated. Furthermore, the function tell
performs a new kriging-based sensitivity analysis when the point x
with the corresponding observation y
is added.
An object of class S3 sobolGP
.
call : a list containing the arguments of the function sobolGP
:
X1 : X1
X2 : X2
conf : conf
nboot : nboot
candidate : candidate
sequential : sequential
max_iter : max_iter
sequential.tot : sequential.tot
model : model
tot : tot
method : MCmethod
type : type
nsim : nsim
S : a list containing the results of the kriging-based sensitivity analysis for the MAIN effects:
mean : a matrix giving the mean of the Sobol index estimates.
var : a matrix giving the variance of the Sobol index estimates.
ci : a matrix giving the confidence intervals of the Sobol index estimates according to conf
.
varPG : a matrix giving the variance of the Sobol index estimates due to the Gaussian process approximation.
varMC : a matrix giving the variance of the Sobol index estimates due to the Monte-Carlo integrations.
xnew : if sequential=TRUE
, a matrix giving the point in candidate
which is the best to simulate.
xnewi : if sequential=TRUE
, an integer giving the index of the point in candidate
which is the best to simulate.
T : a list containing the results of the kriging-based sensitivity analysis for the TOTAL effects:
mean : a matrix giving the mean of the Sobol index estimates.
var : a matrix giving the variance of the Sobol index estimates.
ci : a matrix giving the confidence intervals of the Sobol index estimates according to conf
.
varPG : a matrix giving the variance of the Sobol index estimates due to the Gaussian process approximation.
varMC : a matrix giving the variance of the Sobol index estimates due to the Monte-Carlo integrations.
xnew : if sequential.tot=TRUE
, a matrix giving the point in candidate
which is the best to simulate.
xnewi : if sequential.tot=TRUE
, an integer giving the index of the point in candidate
which is the best to simulate.
Loic Le Gratiet, EDF R&D
L. Le Gratiet, C. Cannamela and B. Iooss (2014), A Bayesian approach for global sensitivity analysis of (multifidelity) computer codes, SIAM/ASA J. Uncertainty Quantification 2-1, pp. 336-363.
sobol
, sobol2002
, sobol2007
, sobolEff
, soboljansen
, sobolMultOut, km
library(DiceKriging)

#--------------------------------------#
# kriging model building
#--------------------------------------#

d <- 2; n <- 16
design.fact <- expand.grid(x1 = seq(0, 1, length = 4), x2 = seq(0, 1, length = 4))
y <- apply(design.fact, 1, branin)

m <- km(design = design.fact, response = y)

#--------------------------------------#
# sobol samples & candidate points
#--------------------------------------#

n <- 1000
X1 <- data.frame(matrix(runif(d * n), nrow = n))
X2 <- data.frame(matrix(runif(d * n), nrow = n))

candidate <- data.frame(matrix(runif(d * 100), nrow = 100))

#--------------------------------------#
# Kriging-based Sobol
#--------------------------------------#

nsim <- 10 # put nsim <- 100
nboot <- 10 # put nboot <- 100

res <- sobolGP(
  model = m,
  type = "UK",
  MCmethod = "sobol",
  X1,
  X2,
  nsim = nsim,
  conf = 0.95,
  nboot = nboot,
  sequential = TRUE,
  candidate,
  sequential.tot = FALSE,
  max_iter = 1000
)

res
plot(res)

x <- ask(res)
y <- branin(x)

# The following line doesn't work (uncorrected bug:
# unused argument in km(), passed by update(), eval(), tell.sobolGP() ??)
#res.new <- tell(res,y,x)
#res.new
soboljansen
implements the Monte Carlo estimation of
the Sobol' indices for both first-order and total indices at the same
time (2p indices altogether), at a total cost of (p + 2) × n
model evaluations, where p is the number of inputs and n the size of each random sample. These are called the Jansen estimators.
soboljansen(model = NULL, X1, X2, nboot = 0, conf = 0.95, ...)

## S3 method for class 'soboljansen'
tell(x, y = NULL, return.var = NULL, ...)

## S3 method for class 'soboljansen'
print(x, ...)

## S3 method for class 'soboljansen'
plot(x, ylim = c(0, 1), y_col = NULL, y_dim3 = NULL, ...)

## S3 method for class 'soboljansen'
plotMultOut(x, ylim = c(0, 1), ...)

## S3 method for class 'soboljansen'
ggplot(data, mapping = aes(), ylim = c(0, 1), y_col = NULL, y_dim3 = NULL, ...,
       environment = parent.frame())
model |
a function, or a model with a |
X1 |
the first random sample. |
X2 |
the second random sample. |
nboot |
the number of bootstrap replicates. |
conf |
the confidence level for bootstrap confidence intervals. |
x |
a list of class |
data |
a list of class |
y |
a vector of model responses. |
return.var |
a vector of character strings giving further
internal variables names to store in the output object |
ylim |
y-coordinate plotting limits. |
y_col |
an integer defining the index of the column of |
y_dim3 |
an integer defining the index in the third dimension of
|
mapping |
Default list of aesthetic mappings to use for plot. If not specified, must be supplied in each layer added to the plot. |
environment |
[Deprecated] Used prior to tidy evaluation. |
... |
for |
This estimator is good for large first-order indices, and (large and small) total indices.
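The Jansen formulas are short enough to write out by hand. The sketch below does so in base R on an additive toy model, using the squared-difference form usually quoted from Jansen (1999) and Saltelli et al. (2010); it is an illustration under those assumptions, not the package's implementation, and the names in it are hypothetical.

```r
set.seed(7)
n <- 10000
f <- function(X) X[, 1] + 0.5 * X[, 2]   # additive toy model: S = T = (0.8, 0.2)

A <- matrix(runif(2 * n), nrow = n)
B <- matrix(runif(2 * n), nrow = n)
yA <- f(A)
yB <- f(B)
V <- var(c(yA, yB))                      # total variance estimate

jansen <- sapply(1:2, function(i) {
  Ci <- B
  Ci[, i] <- A[, i]                      # C_i shares input i with A, the rest with B
  yCi <- f(Ci)
  c(S = 1 - mean((yA - yCi)^2) / (2 * V),  # first-order index
    T = mean((yB - yCi)^2) / (2 * V))      # total index
})
jansen                                   # rows "S" and "T", one column per input
```

Both estimators depend on the output only through differences and a variance, so adding a constant to the model output does not change them.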
This version of soboljansen
also supports matrices and three-dimensional
arrays as output of model
. If the model output is a matrix or an array,
V
, S
and T
are matrices or arrays as well (depending on the
type of y
and the value of nboot
).
The bootstrap outputs V.boot
, S.boot
and T.boot
can only be
returned if the model output is a vector (using argument return.var
). For
matrix or array output, these objects can't be returned.
soboljansen
returns a list of class "soboljansen"
, containing all
the input arguments detailed before, plus the following components:
call |
the matched call. |
X |
a |
y |
either a vector, a matrix or a three-dimensional array of model
responses (depends on the output of |
V |
the estimations of Variances of the Conditional Expectations
(VCE) with respect to each factor and also with respect to the
complementary set of each factor ("all but |
S |
the estimations of the Sobol' first-order indices. |
T |
the estimations of the Sobol' total sensitivity indices. |
Users can request more output variables with the argument
return.var
(for example, bootstrap outputs V.boot
,
S.boot
and T.boot
).
Bertrand Iooss, with contributions from Frank Weber (2016)
M.J.W. Jansen, 1999, Analysis of variance designs for model output, Computer Physics Communications, 117, 35–43.
A. Saltelli, P. Annoni, I. Azzini, F. Campolongo, M. Ratto and S. Tarantola, 2010, Variance based sensitivity analysis of model output. Design and estimator for the total sensitivity index, Computer Physics Communications 181, 259–270.
sobol, sobol2002, sobolSalt, sobol2007, sobolmartinez, sobolEff, sobolmara, sobolMultOut
# Test case: the non-monotonic Sobol g-function

# The method of sobol requires 2 samples
# There are 8 factors, all following the uniform distribution
# on [0,1]

library(boot)
n <- 1000
X1 <- data.frame(matrix(runif(8 * n), nrow = n))
X2 <- data.frame(matrix(runif(8 * n), nrow = n))

# sensitivity analysis
x <- soboljansen(model = sobol.fun, X1, X2, nboot = 100)
print(x)
plot(x)

library(ggplot2)
ggplot(x)

# Only for demonstration purposes: a model function returning a matrix
sobol.fun_matrix <- function(X){
  res_vector <- sobol.fun(X)
  cbind(res_vector, 2 * res_vector)
}
x_matrix <- soboljansen(model = sobol.fun_matrix, X1, X2)
plot(x_matrix, y_col = 2)
title(main = "y_col = 2")

# Also only for demonstration purposes: a model function returning a
# three-dimensional array
sobol.fun_array <- function(X){
  res_vector <- sobol.fun(X)
  res_matrix <- cbind(res_vector, 2 * res_vector)
  array(data = c(res_matrix, 5 * res_matrix),
        dim = c(length(res_vector), 2, 2))
}
x_array <- soboljansen(model = sobol.fun_array, X1, X2)
plot(x_array, y_col = 2, y_dim3 = 2)
title(main = "y_col = 2, y_dim3 = 2")
# Test case : the non-monotonic Sobol g-function # The method of sobol requires 2 samples # There are 8 factors, all following the uniform distribution # on [0,1] library(boot) n <- 1000 X1 <- data.frame(matrix(runif(8 * n), nrow = n)) X2 <- data.frame(matrix(runif(8 * n), nrow = n)) # sensitivity analysis x <- soboljansen(model = sobol.fun, X1, X2, nboot = 100) print(x) plot(x) library(ggplot2) ggplot(x) # Only for demonstration purposes: a model function returning a matrix sobol.fun_matrix <- function(X){ res_vector <- sobol.fun(X) cbind(res_vector, 2 * res_vector) } x_matrix <- soboljansen(model = sobol.fun_matrix, X1, X2) plot(x_matrix, y_col = 2) title(main = "y_col = 2") # Also only for demonstration purposes: a model function returning a # three-dimensional array sobol.fun_array <- function(X){ res_vector <- sobol.fun(X) res_matrix <- cbind(res_vector, 2 * res_vector) array(data = c(res_matrix, 5 * res_matrix), dim = c(length(res_vector), 2, 2)) } x_array <- soboljansen(model = sobol.fun_array, X1, X2) plot(x_array, y_col = 2, y_dim3 = 2) title(main = "y_col = 2, y_dim3 = 2")
sobolmara implements the Monte Carlo estimation of the first-order Sobol' sensitivity indices using the formula of Mara and Joseph (2008), called the Mara estimator.
This method estimates all p first-order indices at a cost of 2N model calls, where N is the random sample size; the cost is therefore independent of p (the number of inputs).
sobolmara(model = NULL, X1, ...)

## S3 method for class 'sobolmara'
tell(x, y = NULL, return.var = NULL, ...)

## S3 method for class 'sobolmara'
print(x, ...)

## S3 method for class 'sobolmara'
plot(x, ylim = c(0, 1), ...)

## S3 method for class 'sobolmara'
plotMultOut(x, ylim = c(0, 1), ...)

## S3 method for class 'sobolmara'
ggplot(data, mapping = aes(), ylim = c(0, 1), ...,
       environment = parent.frame())
model |
a function, or a model with a |
X1 |
the random sample. |
x |
a list of class |
data |
a list of class |
y |
a vector of model responses. |
return.var |
a vector of character strings giving further
internal variables names to store in the output object |
ylim |
y-coordinate plotting limits. |
mapping |
Default list of aesthetic mappings to use for plot. If not specified, must be supplied in each layer added to the plot. |
environment |
[Deprecated] Used prior to tidy evaluation. |
... |
any other arguments for |
The estimator used by sobolmara is based on the rearrangement of a unique matrix via random permutations (see Mara and Joseph, 2008). Bootstrap confidence intervals are not available.
sobolmara returns a list of class "sobolmara", containing all the input arguments detailed before, plus the following components:
call |
the matched call. |
X |
a |
y |
a vector of model responses. |
S |
the estimations of the Sobol' sensitivity indices. |
Bertrand Iooss
Mara, T. and Joseph, O.R. (2008), Comparison of some efficient methods to evaluate the main effect of computer model factors, Journal of Statistical Computation and Simulation, 78:167–178
sobolroalhs, sobol, sobolMultOut
# Test case : the non-monotonic Sobol g-function
# The method of sobolmara requires 1 sample
# (there are 8 factors, all following the uniform distribution on [0,1])
n <- 1000
X1 <- data.frame(matrix(runif(8 * n), nrow = n))

# sensitivity analysis
x <- sobolmara(model = sobol.fun, X1 = X1)
print(x)
plot(x)
library(ggplot2)
ggplot(x)
sobolmartinez implements the Monte Carlo estimation of the Sobol' indices for both first-order and total indices using correlation coefficients-based formulas, at a total cost of (p + 2) × n model evaluations, where p is the number of inputs and n the size of each random sample.
These are called the Martinez estimators.
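The correlation-based idea can be illustrated with a minimal base R sketch. It is not the package's implementation: the toy model, the sample size and all variable names are assumptions chosen for illustration, and only the pick-freeze construction from two samples is shown. For an additive toy model Y = X1 + 2 X2 with independent uniform inputs, the first-order and total indices coincide and are 0.2 and 0.8.

```r
# Illustrative sketch only (base R, no call to the sensitivity package).
# Assumption: a hypothetical additive toy model, so that S_i = T_i analytically.
set.seed(1)
n <- 20000
p <- 2
toy_model <- function(X) X[, 1] + 2 * X[, 2]

# Two independent random samples, as required by pick-freeze schemes
X1 <- matrix(runif(p * n), nrow = n)
X2 <- matrix(runif(p * n), nrow = n)
y1 <- toy_model(X1)
y2 <- toy_model(X2)

S   <- numeric(p)  # first-order indices
Tot <- numeric(p)  # total indices
for (i in seq_len(p)) {
  # Hybrid design: all columns from X2 except column i, taken from X1
  Xi <- X2
  Xi[, i] <- X1[, i]
  yi <- toy_model(Xi)
  # yi shares only input i with y1: their correlation estimates S_i
  S[i] <- cor(y1, yi)
  # yi shares every input except i with y2: one minus their correlation
  # estimates T_i
  Tot[i] <- 1 - cor(y2, yi)
}
print(round(rbind(S, Tot), 2))
```

The sketch evaluates the model on X1, on X2 and on p hybrid designs, that is (p + 2) × n evaluations in total.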
sobolmartinez(model = NULL, X1, X2, nboot = 0, conf = 0.95, ...)

## S3 method for class 'sobolmartinez'
tell(x, y = NULL, return.var = NULL, ...)

## S3 method for class 'sobolmartinez'
print(x, ...)

## S3 method for class 'sobolmartinez'
plot(x, ylim = c(0, 1), y_col = NULL, y_dim3 = NULL, ...)

## S3 method for class 'sobolmartinez'
ggplot(data, mapping = aes(), ylim = c(0, 1), y_col = NULL,
       y_dim3 = NULL, ..., environment = parent.frame())
model |
a function, or a model with a |
X1 |
the first random sample. |
X2 |
the second random sample. |
nboot |
the number of bootstrap replicates, or zero to use theoretical formulas based on confidence intervals of the correlation coefficient (Martinez, 2011). |
conf |
the confidence level for bootstrap confidence intervals. |
x |
a list of class |
data |
a list of class |
y |
a vector of model responses. |
return.var |
a vector of character strings giving further
internal variables names to store in the output object |
ylim |
y-coordinate plotting limits. |
y_col |
an integer defining the index of the column of |
y_dim3 |
an integer defining the index in the third dimension of
|
mapping |
Default list of aesthetic mappings to use for plot. If not specified, must be supplied in each layer added to the plot. |
environment |
[Deprecated] Used prior to tidy evaluation. |
... |
for |
This estimator supports missing values (NA or NaN), which can occur when the model is run on the design of experiments (for example, because of a code failure). Note, however, that when missing values are present the Sobol' indices are no longer rigorous variance-based sensitivity indices. In this case, a warning is displayed.
This version of sobolmartinez also supports matrices and three-dimensional arrays as output of model. Bootstrapping (including bootstrap confidence intervals) is also supported for matrix or array output. However, theoretical confidence intervals (for nboot = 0) are only supported for vector output. If the model output is a matrix or an array, V, S and T are matrices or arrays as well (depending on the type of y and the value of nboot).
The bootstrap outputs V.boot, S.boot and T.boot can only be returned if the model output is a vector (using argument return.var). For matrix or array output, these objects can't be returned.
sobolmartinez returns a list of class "sobolmartinez", containing all the input arguments detailed before, plus the following components:
call |
the matched call. |
X |
a |
y |
either a vector, a matrix or a three-dimensional array of model
responses (depends on the output of |
V |
the estimations of normalized variances of the Conditional
Expectations (VCE) with respect to each factor and also with respect
to the complementary set of each factor ("all but |
S |
the estimations of the Sobol' first-order indices. |
T |
the estimations of the Sobol' total sensitivity indices. |
Users can request additional output variables with the argument return.var (for example, the bootstrap outputs V.boot, S.boot and T.boot).
Bertrand Iooss, with contributions from Frank Weber (2016)
J-M. Martinez, 2011, Analyse de sensibilite globale par decomposition de la variance, Presentation in the meeting of GdR Ondes and GdR MASCOT-NUM, January, 13th, 2011, Institut Henri Poincare, Paris, France.
M. Baudin, K. Boumhaout, T. Delage, B. Iooss and J-M. Martinez, 2016, Numerical stability of Sobol' indices estimation formula, Proceedings of the SAMO 2016 Conference, Reunion Island, France, December 2016
sobol, sobol2002, sobolSalt, sobol2007, soboljansen, soboltouati, sobolMultOut
# Test case : the non-monotonic Sobol g-function
# The method of sobol requires 2 samples
# There are 8 factors, all following the uniform distribution
# on [0,1]
library(boot)
n <- 1000
X1 <- data.frame(matrix(runif(8 * n), nrow = n))
X2 <- data.frame(matrix(runif(8 * n), nrow = n))

# sensitivity analysis
x <- sobolmartinez(model = sobol.fun, X1, X2, nboot = 0)
print(x)
plot(x)
library(ggplot2)
ggplot(x)

# Only for demonstration purposes: a model function returning a matrix
sobol.fun_matrix <- function(X){
  res_vector <- sobol.fun(X)
  cbind(res_vector, 2 * res_vector)
}
x_matrix <- sobolmartinez(model = sobol.fun_matrix, X1, X2)
plot(x_matrix, y_col = 2)
title(main = "y_col = 2")

# Also only for demonstration purposes: a model function returning a
# three-dimensional array
sobol.fun_array <- function(X){
  res_vector <- sobol.fun(X)
  res_matrix <- cbind(res_vector, 2 * res_vector)
  array(data = c(res_matrix, 5 * res_matrix),
        dim = c(length(res_vector), 2, 2))
}
x_array <- sobolmartinez(model = sobol.fun_array, X1, X2)
plot(x_array, y_col = 2, y_dim3 = 2)
title(main = "y_col = 2, y_dim3 = 2")
sobolMultOut implements the aggregated Sobol' indices for multiple outputs. It consists of averaging the Sobol' indices of all outputs, each weighted by the variance of its corresponding output. In addition, this function computes and plots the functional (unidimensional) Sobol' indices for functional (unidimensional) model output via plotMultOut. Both first-order and total Sobol' indices are estimated by Monte Carlo formulas.
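The variance-weighted averaging can be sketched in a few lines of base R. The helper name aggregate_indices and the numerical values below are hypothetical and only illustrate the weighting; the package computes the per-output indices and variances itself.

```r
# Illustrative sketch only: variance-weighted aggregation of per-output indices.
# S: q x p matrix, row j holds the Sobol' indices of output j for the p inputs
# v: vector of length q with the variance of each output
# (aggregate_indices is a hypothetical helper, not a function of the package)
aggregate_indices <- function(S, v) {
  stopifnot(nrow(S) == length(v))
  # S * v multiplies row j of S by v[j]; columns are then summed over outputs
  colSums(S * v) / sum(v)
}

S <- rbind(c(0.2, 0.8),   # indices of output 1
           c(0.6, 0.4))   # indices of output 2
v <- c(1, 3)              # output 2 has three times the variance of output 1
agg <- aggregate_indices(S, v)
print(agg)  # 0.5 0.5
```

Because output 2 carries three quarters of the total variance, its indices dominate the aggregate.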
sobolMultOut(model = NULL, q = 1, X1, X2, MCmethod = "sobol",
             ubiquitous = FALSE, ...)

## S3 method for class 'sobolMultOut'
print(x, ...)

## S3 method for class 'sobolMultOut'
plot(x, ylim = c(0, 1), ...)

## S3 method for class 'sobolMultOut'
plotMultOut(x, ylim = c(0, 1), ...)

## S3 method for class 'sobolMultOut'
ggplot(data, mapping = aes(), ylim = c(0, 1), ...,
       environment = parent.frame())
model |
a function, or a model with a |
q |
dimension of the model output vector. |
X1 |
the first random sample. |
X2 |
the second random sample. |
MCmethod |
a character string specifying the Monte Carlo procedure used
to estimate the Sobol' indices. The available methods are: |
ubiquitous |
if TRUE, 1D functional Sobol indices are computed (default=FALSE). |
x |
a list of class |
data |
a list of class |
ylim |
y-coordinate plotting limits. |
mapping |
Default list of aesthetic mappings to use for plot. If not specified, must be supplied in each layer added to the plot. |
environment |
[Deprecated] Used prior to tidy evaluation. |
... |
any other arguments for |
This function has several limitations: the bootstrap estimation of confidence intervals is not available and the tell function does not work.
Aggregated Sobol' indices can be plotted with the S3 method plot, and ubiquitous Sobol' indices can be visualized with the S3 method plotMultOut (which does not work for the "sobolGP" method).
sobolMultOut returns a list of class MCmethod, containing all its input arguments, plus the following components:
call |
the matched call. |
X |
a |
y |
the response used |
V |
the estimations of the aggregated Variances of the Conditional Expectations
(VCE) with respect to each factor and also with respect to the
complementary set of each factor ("all but |
S |
the estimations of the aggregated Sobol' first-order indices. |
T |
the estimations of the aggregated Sobol' total sensitivity indices. |
Sfct |
the estimations of the functional Sobol' first-order indices (if ubiquitous=TRUE and plot.fct=TRUE). |
Tfct |
the estimations of the functional Sobol' total sensitivity indices (if ubiquitous=TRUE and plot.fct=TRUE). |
Bertrand Iooss
M. Lamboni, H. Monod and D. Makowski, 2011, Multivariate sensitivity analysis to measure global contribution of input factors in dynamic models, Reliability Engineering and System Safety, 96:450-459.
F. Gamboa, A. Janon, T. Klein and A. Lagnoux, 2014, Sensitivity indices for multivariate outputs, Electronic Journal of Statistics, 8:575-603.
sobol, sobol2002, sobol2007, soboljansen, sobolmara, sobolGP
# Tests on the functional toy fct 'Arctangent temporal function'
y0 <- atantemp.fun(matrix(c(-7,0,7,-7,0,7),ncol=2))
#plot(y0[1,],type="l")
#apply(y0,1,lines)

n <- 100
X <- matrix(c(runif(2*n,-7,7)),ncol=2)
y <- atantemp.fun(X)
plot(y0[2,],ylim=c(-2,2),type="l")
apply(y,1,lines)

# Sobol indices computations
n <- 1000
X1 <- data.frame(matrix(runif(2*n,-7,7), nrow = n))
X2 <- data.frame(matrix(runif(2*n,-7,7), nrow = n))
sa <- sobolMultOut(model=atantemp.fun, q=100, X1, X2,
                   MCmethod="soboljansen", ubiquitous=TRUE)
print(sa)
plot(sa)
plotMultOut(sa)
library(ggplot2)
ggplot(sa)
sobolowen implements the Monte Carlo estimation of the Sobol' indices for both first-order and total indices at the same time (all indices together). It takes three independent matrices as input.
These are called the Owen estimators.
sobolowen(model = NULL, X1, X2, X3, nboot = 0, conf = 0.95, varest = 2, ...)

## S3 method for class 'sobolowen'
tell(x, y = NULL, return.var = NULL, varest = 2, ...)

## S3 method for class 'sobolowen'
print(x, ...)

## S3 method for class 'sobolowen'
plot(x, ylim = c(0, 1), ...)

## S3 method for class 'sobolowen'
ggplot(data, mapping = aes(), ylim = c(0, 1), ...,
       environment = parent.frame())
model |
a function, or a model with a |
X1 |
the first random sample. |
X2 |
the second random sample. |
X3 |
the third random sample. |
nboot |
the number of bootstrap replicates. |
conf |
the confidence level for bootstrap confidence intervals. |
varest |
choice for the variance estimator for the denominator of the Sobol' indices. varest=1 is for a classical estimator. varest=2 (default) is for the estimator proposed in Janon et al. (2012). |
x |
a list of class |
data |
a list of class |
y |
a vector of model responses. |
return.var |
a vector of character strings giving further
internal variables names to store in the output object |
ylim |
y-coordinate plotting limits. |
mapping |
Default list of aesthetic mappings to use for plot. If not specified, must be supplied in each layer added to the plot. |
environment |
[Deprecated] Used prior to tidy evaluation. |
... |
any other arguments for |
sobolowen
returns a list of class "sobolowen"
, containing all
the input arguments detailed before, plus the following components:
call |
the matched call. |
X |
a |
y |
the response used |
V |
the estimations of Variances of the Conditional Expectations
(VCE) with respect to each factor and also with respect to the
complementary set of each factor ("all but |
S |
the estimations of the Sobol' first-order indices. |
T |
the estimations of the Sobol' total sensitivity indices. |
Users can request additional output variables with the argument return.var (for example, the bootstrap outputs V.boot, S.boot and T.boot).
Taieb Touati and Bernardo Ramos
A. Owen, 2013, Better estimations of small Sobol' sensitivity indices, ACM Transactions on Modeling and Computer Simulation (TOMACS), 23(2), 11.
Janon, A., Klein T., Lagnoux A., Nodet M., Prieur C. (2012), Asymptotic normality and efficiency of two Sobol index estimators. Accepted in ESAIM: Probability and Statistics.
sobol, sobol2002, sobolSalt, sobol2007, soboljansen, sobolmartinez, sobolEff
# Test case : the non-monotonic Sobol g-function
# The method of sobolowen requires 3 samples
# There are 8 factors, all following the uniform distribution
# on [0,1]
library(boot)
n <- 1000
X1 <- data.frame(matrix(runif(8 * n), nrow = n))
X2 <- data.frame(matrix(runif(8 * n), nrow = n))
X3 <- data.frame(matrix(runif(8 * n), nrow = n))

# sensitivity analysis
x <- sobolowen(model = sobol.fun, X1, X2, X3, nboot = 10) # put nboot=100
print(x)
plot(x)
library(ggplot2)
ggplot(x)
sobolrank implements the estimation of all first-order indices using only N model evaluations via ranking, following Gamboa et al. (2022) and inspired by Chatterjee (2021).
sobolrank(model = NULL, X, nboot = 0, conf = 0.95,
          nsample = round(0.8*nrow(X)), ...)

## S3 method for class 'sobolrank'
tell(x, y = NULL, ...)

## S3 method for class 'sobolrank'
print(x, ...)

## S3 method for class 'sobolrank'
plot(x, ylim = c(0, 1), ...)

## S3 method for class 'sobolrank'
ggplot(data, mapping = aes(), ..., environment = parent.frame(),
       ylim = c(0, 1))
model |
a function, or a model with a |
X |
a random sample of the inputs. |
nboot |
the number of bootstrap replicates, see details. |
conf |
the confidence level for confidence intervals, see details. |
nsample |
the size of the bootstrap sample, see details. |
x |
a list of class |
data |
a list of class |
y |
a vector of model responses. |
ylim |
y-coordinate plotting limits. |
mapping |
Default list of aesthetic mappings to use for plot. If not specified, must be supplied in each layer added to the plot. |
environment |
[Deprecated] Used prior to tidy evaluation. |
... |
any other arguments for |
The estimator used by sobolrank is defined in Gamboa et al. (2022).
It is based on ranking the inputs, as first proposed by Chatterjee (2021) for a Cramér-von Mises based estimator.
All first-order indices can be estimated with a single sample of size N.
Since the bootstrap creates ties, which are not accounted for in the algorithm, confidence intervals are obtained by sampling without replacement with a sample size nsample.
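The ranking idea can be illustrated with a short base R sketch. It is a simplified reading of a rank-based first-order estimator (the outputs are sorted by one input and each output is paired with its successor in that ordering), not the package's code; the function name rank_first_order, the toy model and the number of replicates are assumptions. For Y = X1 + 2 X2 with independent uniform inputs, the first-order indices are 0.2 and 0.8.

```r
# Illustrative sketch only (base R, no call to the sensitivity package).
# rank_first_order is a hypothetical helper name.
set.seed(42)
rank_first_order <- function(x, y) {
  y_sorted <- y[order(x)]                  # outputs ordered by increasing x
  y_next   <- c(y_sorted[-1], y_sorted[1]) # successor in that ordering (cyclic)
  (mean(y_sorted * y_next) - mean(y)^2) / (mean(y^2) - mean(y)^2)
}

n <- 5000
X <- matrix(runif(2 * n), nrow = n)
y <- X[, 1] + 2 * X[, 2]                   # assumed toy model

# One single sample of size n gives all first-order indices
S_hat <- apply(X, 2, rank_first_order, y = y)
print(round(S_hat, 2))

# Replicates by sampling WITHOUT replacement (no ties are created),
# here for the first input and with nsample = 80% of the sample
nsample <- round(0.8 * n)
reps <- replicate(200, {
  idx <- sample.int(n, nsample, replace = FALSE)
  rank_first_order(X[idx, 1], y[idx])
})
print(sd(reps))
```

How the replicates are turned into a confidence interval is left to the package; the sketch only shows that subsampling without replacement keeps the input values distinct.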
sobolrank returns a list of class "sobolrank", containing all the input arguments detailed before, plus the following components:
call |
the matched call. |
X |
a |
y |
a vector of model responses. |
S |
the estimations of the Sobol' sensitivity indices. |
Sebastien Da Veiga
Gamboa, F., Gremaud, P., Klein, T., & Lagnoux, A., 2022, Global Sensitivity Analysis: a novel generation of mighty estimators based on rank statistics, Bernoulli 28: 2345-2374.
Chatterjee, S., 2021, A new coefficient of correlation, Journal of the American Statistical Association, 116:2009-2022.
sobol, sobol2002, sobolSalt, sobol2007, soboljansen, sobolmartinez, sobolSmthSpl, sobolEff, sobolshap_knn
# Test case : the non-monotonic Sobol g-function
# Example with a call to a numerical model
library(boot)
n <- 1000
X <- data.frame(matrix(runif(8 * n), nrow = n))
x <- sobolrank(model = sobol.fun, X = X, nboot = 100)
print(x)
library(ggplot2)
ggplot(x)

# Test case : the Ishigami function
# Example with given data
n <- 500
X <- data.frame(matrix(-pi+2*pi*runif(3 * n), nrow = n))
Y <- ishigami.fun(X)
x <- sobolrank(model = NULL, X)
tell(x,Y)
print(x)
ggplot(x)
sobolrec implements a recursive version of the procedure introduced by Tissot & Prieur (2015) using two replicated nested designs. This function estimates either all first-order indices or all closed second-order indices at a total cost of 2 × N model evaluations, where N is the size of each replicated nested design.
sobolrec(model=NULL, factors, layers, order, precision, method=NULL,
         tail=TRUE, ...)

## S3 method for class 'sobolrec'
ask(x, index, ...)

## S3 method for class 'sobolrec'
tell(x, y = NULL, index, ...)

## S3 method for class 'sobolrec'
print(x, ...)

## S3 method for class 'sobolrec'
plot(x, ylim = c(0,1), ...)
model |
a function, or a model with a |
factors |
an integer giving the number of factors, or a vector of character strings giving their names. |
layers |
If |
order |
an integer specifying which indices to estimate: |
precision |
a vector containing:
|
tail |
a boolean specifying the method used to choose the number of levels of the orthogonal array (see "Warning messages"). |
method |
If
Set to |
x |
a list of class |
index |
an integer specifying the step of the recursion |
y |
the model response. |
ylim |
y-coordinate plotting limits. |
... |
any other arguments for |
For first-order indices, layers is a vector specifying the number of layers of the nested design, from which their respective sizes are derived.
For closed second-order indices, layers directly specifies the size of all layers.
For each Sobol' index, a stopping criterion is tested over the last steps of the recursion (including the current one). The target precision and the number of steps of the stopping criterion are both specified in precision.
sobolrec uses either an algebraic or an accept-reject method to construct the orthogonal arrays for the estimation of closed second-order indices. The algebraic method is less precise than the accept-reject method but offers more steps when the number of factors is small.
sobolrec automatically assigns a uniform distribution on [0,1] to each input. Transformations of distributions (between U[0,1] and the desired distribution) have to be performed before the call to tell().
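Such a transformation is the usual inverse-transform step: each U[0,1] column is passed through the quantile function of the desired distribution. The base R sketch below uses a random matrix u as a stand-in for a design on [0,1]; the matrix, its size and the chosen target distributions are assumptions made for illustration.

```r
# Illustrative sketch only: mapping U[0,1] columns to other distributions
# with base R quantile functions (inverse-transform method).
set.seed(7)
u <- matrix(runif(3 * 1000), ncol = 3)  # stand-in for a design on [0,1]

x <- u
x[, 1] <- qlnorm(u[, 1])                       # log-normal input
x[, 2] <- qnorm(u[, 2])                        # standard normal input
x[, 3] <- qunif(u[, 3], min = -pi, max = pi)   # uniform on [-pi, pi]

# The model is then evaluated on the transformed design x, and the
# responses are the values to pass to tell()
summary(x)
```

Because quantile functions are monotone, the transformation preserves the ordering of the points within each column.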
sobolrec returns a list of class "sobolrec", containing all the input arguments detailed before, plus the following components:
call |
the matched call. |
X |
a |
y |
a list of the response used at each step. |
V |
a list of the model variance estimated at each step. |
S |
a list of the Sobol' indices estimated at each step. |
steps |
the number of steps performed. |
N |
the size of each replicated nested design. |
"layers is not the square of a prime number. It has been replaced by: "
When order=2, the value of layers must be the square of a prime power number. This warning message indicates that this was not the case and that the value has been replaced depending on tail. If tail=TRUE (resp. tail=FALSE), the new value of layers is equal to the square of the prime number preceding (resp. following) the square root of layers.
"layers is not satisfying the constraint. It has been replaced by: "
The value N for layers must satisfy the constraint N ≥ (d − 1)², where d is the number of factors. This warning message indicates that N was replaced by the square of the prime number following (or equal to) d − 1.
A.S. Hedayat, N.J.A. Sloane and J. Stufken, 1999, Orthogonal Arrays: Theory and Applications, Springer Series in Statistics.
L. Gilquin, E. Arnaud, H. Monod and C. Prieur, 2021, Recursive estimation procedure of Sobol' indices based on replicated designs, Computational and Applied Mathematics, 40:1–23.
# Test case: the non-monotonic Sobol g-function
# The method of sobol requires 2 samples
# (there are 8 factors, all following the uniform distribution on [0,1])

# first-order indices estimation
x <- sobolrec(model = sobol.fun, factors = 8, layers=rep(2,each=15), order=1,
              precision = c(5*10^(-2),2), method=NULL, tail=TRUE)
print(x)

# closed second-order indices estimation
x <- sobolrec(model = sobol.fun, factors = 8, layers=11^2, order=2,
              precision = c(10^(-2),3), method="al", tail=TRUE)
print(x)

# Test case: dealing with external model
# put in comment because of bug with ask use !
#x <- sobolrec(model = NULL, factors = 8, layers=rep(2,each=15), order=1,
#              precision = c(5*10^(-2),2), method=NULL, tail=TRUE)
#toy <- sobol.fun
#k <- 1
#stop_crit <- FALSE
#while(!(stop_crit) & (k<length(x$layers))){
#  ask(x, index=k)
#  y <- toy(x$block)
#  tell(x, y, index=k)
#  stop_crit <- x$stop_crit
#  k <- k+1
#}
#print(x)
sobolrep generalizes the estimation of the Sobol' sensitivity indices introduced by Tissot & Prieur (2015) using two replicated orthogonal arrays. This function estimates either
all first-order and second-order indices at a total cost of 2 × N model evaluations,
or all first-order, second-order and total-effect indices at a total cost of N × (d + 2) model evaluations,
where N = q² and q ≥ d − 1 is a prime number corresponding to the number of levels of each orthogonal array (d being the number of factors).
sobolrep(model = NULL, factors, N, tail=TRUE, conf=0.95, nboot=0,
         nbrep=1, total=FALSE, ...)

## S3 method for class 'sobolrep'
tell(x, y = NULL, ...)

## S3 method for class 'sobolrep'
print(x, ...)

## S3 method for class 'sobolrep'
plot(x, ylim = c(0,1), choice, ...)
model |
a function, or a model with a |
factors |
an integer giving the number of factors, or a vector of character strings giving their names. |
N |
an integer giving the size of each replicated design (for a total of |
tail |
a boolean specifying the method used to choose the number of levels of the orthogonal array (see "Warning messages"). |
conf |
the confidence level for confidence intervals. |
nboot |
the number of bootstrap replicates. |
nbrep |
the number of times the estimation procedure is repeated (see "Details"). |
total |
a boolean specifying whether or not total effect indices are estimated. |
x |
a list of class |
y |
the model response. |
ylim |
y-coordinate plotting limits. |
choice |
an integer specifying which indices to plot: |
... |
any other arguments for |
sobolrep automatically assigns a uniform distribution on [0,1] to each input. Transformations of distributions (between U[0,1] and the desired distribution) have to be performed before the call to tell() (see "Examples").
nbrep specifies the number of times the estimation procedure is repeated. Each repetition makes use of the orthogonal array structure to obtain a new set of Sobol' indices. It is important to note that no additional model evaluations are performed (the cost of the procedure remains the same).
sobolrep returns a list of class "sobolrep", containing all the input arguments detailed before, plus the following components:
call |
the matched call. |
X |
a |
y |
the response used. |
RP |
the matrix of permutations. |
V |
the model variance. |
S |
a data.frame containing estimations of the first-order Sobol' indices plus confidence intervals if specified. |
S2 |
a data.frame containing estimations of the second-order Sobol' indices plus confidence intervals if specified. |
T |
a data.frame containing estimations of the total-effect indices plus confidence intervals if specified. |
"N is not the square of a prime number. It has been replaced by: "
The number of levels q of each orthogonal array must be a prime number. If N is not the square of a prime number, this warning message indicates that it was replaced depending on the value of tail. If tail=TRUE (resp. tail=FALSE), the new value of N is equal to the square of the prime number preceding (resp. following) the square root of N.
"N is not satisfying the constraint. It has been replaced by: "
The constraint N ≥ (d − 1)² must be satisfied, where d is the number of factors. This warning message indicates that N was replaced by the square of the prime number following (or equal to) d − 1.
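The replacement rule described by the first warning message can be mimicked with a small base R helper. The names is_prime and adjust_to_prime_square are hypothetical, and the sketch only reproduces the rule as worded above (square of the prime preceding or following the square root, depending on tail); it does not handle the constraint involving the number of factors and is not the package's internal code.

```r
# Illustrative sketch only: hypothetical helpers, not functions of the package.
is_prime <- function(k) {
  if (k < 2) return(FALSE)
  if (k < 4) return(TRUE)              # 2 and 3
  all(k %% 2:floor(sqrt(k)) != 0)      # trial division
}

adjust_to_prime_square <- function(N, tail = TRUE) {
  stopifnot(N >= 4)                    # smallest prime square is 2^2
  r <- sqrt(N)
  # N is already the square of a prime: nothing to replace
  if (r == floor(r) && is_prime(r)) return(N)
  # Start from the integer just below (tail = TRUE) or above (tail = FALSE)
  # the square root, then walk to the nearest prime in that direction
  q <- if (tail) floor(r) else ceiling(r)
  step <- if (tail) -1 else 1
  while (!is_prime(q)) q <- q + step
  q^2
}

adjust_to_prime_square(1000, tail = TRUE)   # 961  = 31^2
adjust_to_prime_square(1000, tail = FALSE)  # 1369 = 37^2
adjust_to_prime_square(121)                 # 121  = 11^2, unchanged
```

Under this reading, a request such as N = 1000 with tail=TRUE would be replaced by 961, so the actual design size can differ from the requested one.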
A.S. Hedayat, N.J.A. Sloane and J. Stufken, 1999, Orthogonal Arrays: Theory and Applications, Springer Series in Statistics.
J-Y. Tissot and C. Prieur, 2015, A randomized orthogonal array-based procedure for the estimation of first- and second-order Sobol' indices, J. Statist. Comput. Simulation, 85:1358-1381.
# Test case: the non-monotonic Sobol g-function
# The method of sobol requires 2 samples
# (there are 8 factors, all following the uniform distribution on [0,1])
x <- sobolrep(model = sobol.fun, factors = 8, N = 1000, nboot = 100, nbrep = 1, total = FALSE)
print(x)
plot(x, choice = 1)
plot(x, choice = 2)

# Test case: dealing with non-uniform distributions
x <- sobolrep(model = NULL, factors = 3, N = 1000, nboot = 0, nbrep = 1, total = FALSE)
# X1 follows a log-normal distribution:
x$X[,1] <- qlnorm(x$X[,1])
# X2 follows a standard normal distribution:
x$X[,2] <- qnorm(x$X[,2])
# X3 follows a gamma distribution:
x$X[,3] <- qgamma(x$X[,3], shape = 0.5)
# toy example
toy <- function(x){rowSums(x)}
y <- toy(x$X)
tell(x, y)
print(x)
plot(x, choice = 1)
plot(x, choice = 2)
sobolroalhs
implements the estimation of the Sobol' sensitivity indices introduced by Tissot & Prieur (2015) using two replicated designs (Latin hypercubes or orthogonal arrays). This function estimates either all first-order indices or all closed second-order indices at a total cost of $2 \times N$ model evaluations. For closed second-order indices, $N = q^2$, where $q \geq d-1$ is a prime number corresponding to the number of levels of the orthogonal array and $d$ is the number of factors.
sobolroalhs(model = NULL, factors, N, p = 1, order, tail = TRUE, conf = 0.95, nboot = 0, ...)

## S3 method for class 'sobolroalhs'
tell(x, y = NULL, ...)

## S3 method for class 'sobolroalhs'
print(x, ...)

## S3 method for class 'sobolroalhs'
plot(x, ylim = c(0,1), ...)

## S3 method for class 'sobolroalhs'
ggplot(data, mapping = aes(), ylim = c(0, 1), ..., environment = parent.frame())
model |
a function, or a model with a predict method, defining the model to analyze. |
factors |
an integer giving the number of factors, or a vector of character strings giving their names. |
N |
an integer giving the size of each replicated design (for a total of $2 \times N$ model evaluations). |
p |
an integer giving the number of model outputs. |
order |
an integer giving the order of the indices (1 or 2). |
tail |
a boolean specifying the method used to choose the number of levels of the orthogonal array (see "Warning messages"). |
conf |
the confidence level for confidence intervals. |
nboot |
the number of bootstrap replicates. |
x |
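a list of class "sobolroalhs" storing the state of the sensitivity study (parameters, data, estimates). |
data |
a list of class "sobolroalhs" storing the state of the sensitivity study (parameters, data, estimates). |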
y |
a vector of model responses. |
ylim |
y-coordinate plotting limits. |
mapping |
Default list of aesthetic mappings to use for plot. If not specified, must be supplied in each layer added to the plot. |
environment |
[Deprecated] Used prior to tidy evaluation. |
... |
any other arguments for model which are passed unchanged each time it is called. |
sobolroalhs
automatically assigns a uniform distribution on [0,1] to each input. Transformations of distributions (between U[0,1] and the wanted distribution) have to be performed before the call to tell() (see "Examples").
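The transformation step can be illustrated in base R. This is a minimal sketch of inverse-CDF (quantile) transforms applied to a U[0,1] design; the matrix `U` is a hypothetical stand-in for the design stored in `x$X` and is not created by the package.

```r
# Hypothetical stand-in for a design on [0,1]^3 (in practice: x$X).
set.seed(1)
U <- matrix(runif(3000), ncol = 3)

# Map each U[0,1] column to its target distribution via the quantile function.
X <- cbind(
  qlnorm(U[, 1]),              # log-normal
  qnorm(U[, 2]),               # standard normal
  qgamma(U[, 3], shape = 0.5)  # gamma with shape 0.5
)

# The distribution function maps the transformed values back to the design,
# so this maximum discrepancy should be numerically negligible.
max(abs(pnorm(X[, 2]) - U[, 2]))
```

Because the quantile transform is applied column by column, the structure of the replicated design is untouched; only the marginal distributions change.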
Missing values (i.e., NA
values) in outputs are automatically handled by the function.
This function also supports multidimensional outputs (matrices in y
or as output of model
). In this case, aggregated Sobol' indices are returned (see sobolMultOut
).
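As a rough illustration of what an aggregated index is, the sketch below assumes the variance-weighted definition discussed by Gamboa et al. (2014): the summed conditional variances divided by the summed output variances. The helper `aggregate_sobol` and its inputs are hypothetical, and whether the package computes the aggregation exactly this way is not stated here.

```r
# Hypothetical helper: aggregate per-output Sobol' indices.
# S: d x p matrix, S[i, k] = index of input i for output k.
# V: length-p vector of output variances.
aggregate_sobol <- function(S, V) {
  stopifnot(ncol(S) == length(V))
  # sum_k V_k * S[i, k] is the summed conditional variance for input i.
  drop(S %*% V) / sum(V)
}

# Illustrative numbers only: 2 inputs, 2 outputs.
S <- rbind(c(0.5, 1.0),
           c(0.5, 0.0))
V <- c(1, 3)
aggregate_sobol(S, V)
```

Outputs with a larger variance weigh more in the aggregate, so the result is not a plain average of the per-output indices.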
sobolroalhs
returns a list of class "sobolroalhs"
, containing all
the input arguments detailed before, plus the following components:
call |
the matched call. |
X |
a data.frame containing the design of experiments. |
y |
the responses used. |
OA |
the orthogonal array constructed (NULL if order=1). |
V |
the estimations of Variances of the Conditional Expectations (VCE) with respect to each factor. |
S |
the estimations of the Sobol' indices. |
"... N is not the square of a prime number. It has been replaced by: "
When order = 2, the number of levels of the orthogonal array must be a prime number. If N is not the square of a prime number, this warning message indicates that it was replaced according to the value of tail: if tail=TRUE (resp. tail=FALSE), the new value of N is the square of the prime number preceding (resp. following) the square root of N.
"... N is not satisfying the constraint $N \geq (d-1)^2$. It has been replaced by: "
When order = 2, the constraint $N \geq (d-1)^2$ must be satisfied, where $d$ is the number of factors. This warning message indicates that N was replaced by the square of the prime number following (or equal to) $d-1$.
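The replacement rule described by these two warnings can be sketched in base R. The helpers below (`is_prime`, `prev_prime`, `next_prime`, `adjust_N`) are hypothetical and are not the package's code; the sketch also assumes that the constraint on the number of levels is $q \geq d-1$, so that $N = q^2 \geq (d-1)^2$.

```r
# Hypothetical helpers, for illustration only.
is_prime <- function(k) {
  k >= 2 && all(k %% seq_len(floor(sqrt(k)))[-1] != 0)
}
prev_prime <- function(k) {
  k <- floor(k)
  while (k >= 2 && !is_prime(k)) k <- k - 1
  if (k < 2) stop("no prime number at or below the requested value")
  k
}
next_prime <- function(k) {
  k <- ceiling(k)
  while (!is_prime(k)) k <- k + 1
  k
}

# Returns the value of N that would be used for a second-order analysis.
adjust_N <- function(N, d, tail = TRUE) {
  q <- sqrt(N)
  if (!(q == floor(q) && is_prime(q))) {
    # N is not the square of a prime: move to a neighbouring prime.
    q <- if (tail) prev_prime(q) else next_prime(q)
  }
  # Assumed constraint: at least d - 1 levels.
  if (q < d - 1) q <- next_prime(d - 1)
  q^2
}

adjust_N(1000, d = 8, tail = TRUE)   # square of the prime below sqrt(1000)
adjust_N(1000, d = 8, tail = FALSE)  # square of the prime above sqrt(1000)
```

With N = 1000, the square root is about 31.6, so the two candidate values are the squares of 31 and 37.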
Laurent Gilquin
A.S. Hedayat, N.J.A. Sloane and J. Stufken, 1999, Orthogonal Arrays: Theory and Applications, Springer Series in Statistics.
F. Gamboa, A. Janon, T. Klein and A. Lagnoux, 2014, Sensitivity indices for multivariate outputs, Electronic Journal of Statistics, 8:575-603.
J.Y. Tissot and C. Prieur, 2015, A randomized orthogonal array-based procedure for the estimation of first- and second-order Sobol' indices, J. Statist. Comput. Simulation, 85:1358-1381.
sobolmara, sobolroauc, sobolMultOut
library(boot)
library(numbers)

####################
# Test case: the non-monotonic Sobol g-function
# The method of sobol requires 2 samples
# (there are 8 factors, all following the uniform distribution on [0,1])

# first-order sensitivity indices
x <- sobolroalhs(model = sobol.fun, factors = 8, N = 1000, order = 1, nboot = 100)
print(x)
plot(x)

library(ggplot2)
ggplot(x)

# closed second-order sensitivity indices
x <- sobolroalhs(model = sobol.fun, factors = 8, N = 1000, order = 2, nboot = 100)
print(x)
ggplot(x)

####################
# Test case: dealing with non-uniform distributions
x <- sobolroalhs(model = NULL, factors = 3, N = 1000, order = 1, nboot = 0)
# X1 follows a log-normal distribution:
x$X[,1] <- qlnorm(x$X[,1])
# X2 follows a standard normal distribution:
x$X[,2] <- qnorm(x$X[,2])
# X3 follows a gamma distribution:
x$X[,3] <- qgamma(x$X[,3], shape = 0.5)
# toy example
toy <- function(x){rowSums(x)}
y <- toy(x$X)
tell(x, y)
print(x)
ggplot(x)

####################
# Test case : multidimensional outputs
toy <- function(x){cbind(x[,1]+x[,2]+x[,1]*x[,2],2*x[,1]+3*x[,1]*x[,2]+x[,2])}
x <- sobolroalhs(model = toy, factors = 3, N = 1000, p = 2, order = 1, nboot = 100)
print(x)
ggplot(x)
sobolroauc
deals with the estimation of Sobol' sensitivity indices when there exist one or several sets of constrained factors. Constraints within a set are expressed as inequality constraints (simplex constraint). This function generalizes the procedure of Tissot and Prieur (2015) to estimate either all first-order indices or all closed second-order indices at a total cost of $2 \times N$ model evaluations. For closed second-order indices, $N = q^2$, where $q \geq d-1$ is a prime number denoting the number of levels of the orthogonal array and $d$ is the number of independent factors or sets of factors.
sobolroauc(model = NULL, factors, constraints = NULL, N, p = 1, order, tail = TRUE, conf = 0.95, nboot = 0, ...)

## S3 method for class 'sobolroauc'
tell(x, y = NULL, ...)

## S3 method for class 'sobolroauc'
print(x, ...)

## S3 method for class 'sobolroauc'
plot(x, ylim = c(0,1), ...)

## S3 method for class 'sobolroauc'
ggplot(data, mapping = aes(), ylim = c(0, 1), ..., environment = parent.frame())
model |
a function, or a model with a predict method, defining the model to analyze. |
factors |
an integer giving the number of factors, or a vector of character strings giving their names. |
constraints |
a list giving the sets of constrained factors (see "Details"). |
N |
an integer giving the size of each replicated design (for a total of $2 \times N$ model evaluations). |
p |
an integer giving the number of model outputs. |
order |
an integer giving the order of the indices (1 or 2). |
tail |
a boolean specifying the method used to choose the number of levels of the orthogonal array (see "Warning messages"). |
conf |
the confidence level for confidence intervals. |
nboot |
the number of bootstrap replicates. |
x |
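a list of class "sobolroauc" storing the state of the sensitivity study (parameters, data, estimates). |
data |
a list of class "sobolroauc" storing the state of the sensitivity study (parameters, data, estimates). |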
y |
a vector of model responses. |
ylim |
y-coordinate plotting limits. |
mapping |
Default list of aesthetic mappings to use for plot. If not specified, must be supplied in each layer added to the plot. |
environment |
[Deprecated] Used prior to tidy evaluation. |
... |
any other arguments for model which are passed unchanged each time it is called. |
constraints
lists the sets of factors that depend on each other through inequality constraints (see "Examples"). The same factor may not appear in more than one set. Factors not appearing in constraints
are assumed to be independent, each following a uniform distribution on [0,1]. One Sobol' index is estimated for each independent factor or set of factors.
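To make the notion of a constrained set concrete, the sketch below draws points that satisfy X1 <= X3 and X4 <= X6 by sorting independent uniforms within each set, a classical way to sample uniformly on an ordered simplex (cf. Devroye, 1986). It is an illustration in base R under that assumption, not a description of how sobolroauc builds its designs.

```r
set.seed(42)
n <- 1000
d <- 8
X <- matrix(runif(n * d), nrow = n)

# Hypothetical constraint sets, written as in constraints = list(c(1,3), c(4,6)):
# within each set, the factors are forced to be in increasing order.
constraints <- list(c(1, 3), c(4, 6))
for (set in constraints) {
  # Sorting i.i.d. uniforms row by row gives a uniform draw on the ordered simplex.
  X[, set] <- t(apply(X[, set, drop = FALSE], 1, sort))
}

# Every row now satisfies both inequality constraints.
all(X[, 1] <= X[, 3]) && all(X[, 4] <= X[, 6])
```

Factors outside the constraint sets (here columns 2, 5, 7 and 8) are left as independent U[0,1] draws.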
Missing values (i.e., NA
values) in the model responses are automatically handled by the function.
This function also supports multidimensional outputs (matrices in y
or as output of model
).
In this case, aggregated Sobol' indices are returned (see sobolMultOut
).
sobolroauc
returns a list of class "sobolroauc"
, containing all
the input arguments detailed before, plus the following components:
call |
the matched call. |
X |
a data.frame containing the design of experiments. |
y |
the responses used. |
OA |
the orthogonal array constructed (NULL if order=1). |
V |
the estimations of Variances of the Conditional Expectations (VCE) with respect to each factor. |
S |
the estimations of the Sobol' indices. |
"... N is not the square of a prime number. It has been replaced by: "
When order = 2, the number of levels of the orthogonal array must be a prime number. If N is not the square of a prime number, this warning message indicates that it was replaced according to the value of tail: if tail=TRUE (resp. tail=FALSE), the new value of N is the square of the prime number preceding (resp. following) the square root of N.
"... N is not satisfying the constraint $N \geq (d-1)^2$. It has been replaced by: "
When order = 2, the constraint $N \geq (d-1)^2$ must be satisfied, where $d$ is the number of independent factors or sets of factors. This warning message indicates that N was replaced by the square of the prime number following (or equal to) $d-1$.
Laurent Gilquin
L. Devroye, 1986, Non-Uniform Random Variate Generation. Springer-Verlag.
J. Jacques, C. Lavergne and N. Devictor, 2006, Sensitivity Analysis in presence of model uncertainty and correlated inputs. Reliability Engineering & System Safety, 91:1126-1134.
L. Gilquin, C. Prieur and E. Arnaud, 2015, Replication procedure for grouped Sobol' indices estimation in dependent uncertainty spaces, Information and Inference, 4:354-379.
J.Y. Tissot and C. Prieur, 2015, A randomized orthogonal array-based procedure for the estimation of first- and second-order Sobol' indices, J. Statist. Comput. Simulation, 85:1358-1381.
library(boot)
library(numbers)

# Test case: the non-monotonic Sobol g-function
# (there are 8 factors, all following the uniform distribution on [0,1])
# Suppose we have the inequality constraints: X1 <= X3 and X4 <= X6.

# first-order sensitivity indices
x <- sobolroauc(model = sobol.fun, factors = 8, constraints = list(c(1,3),c(4,6)),
                N = 1000, order = 1, nboot = 100)
print(x)
plot(x)

library(ggplot2)
ggplot(x)

# closed second-order sensitivity indices
x <- sobolroauc(model = sobol.fun, factors = 8, constraints = list(c(1,3),c(4,6)),
                N = 1000, order = 2, nboot = 100)
print(x)
ggplot(x)
sobolSalt
implements the Monte Carlo estimation of the Sobol' indices, computing either first-order and total-effect indices together at a total cost of $(p+2) \times n$ model evaluations, or first-order, second-order and total-effect indices together at a total cost of $(2p+2) \times n$ model evaluations, where $p$ is the number of inputs and $n$ the size of each random sample.
sobolSalt(model = NULL, X1, X2, scheme = "A", nboot = 0, conf = 0.95, ...)

## S3 method for class 'sobolSalt'
tell(x, y = NULL, ...)

## S3 method for class 'sobolSalt'
print(x, ...)

## S3 method for class 'sobolSalt'
plot(x, ylim = c(0, 1), choice, ...)

## S3 method for class 'sobolSalt'
ggplot(data, mapping = aes(), ylim = c(0, 1), choice, ..., environment = parent.frame())
model |
a function, or a model with a predict method, defining the model to analyze. |
X1 |
the first random sample (containing $n$ points). |
X2 |
the second random sample (containing $n$ points). |
scheme |
a letter "A" or "B" indicating which scheme to use (see "Details"). |
nboot |
the number of bootstrap replicates. |
conf |
the confidence level for bootstrap confidence intervals. |
x |
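a list of class "sobolSalt" storing the state of the sensitivity study (parameters, data, estimates). |
data |
a list of class "sobolSalt" storing the state of the sensitivity study (parameters, data, estimates). |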
y |
a vector of model responses. |
ylim |
y-coordinate plotting limits. |
choice |
an integer specifying which indices to plot: 1 for first-order and total-effect indices, 2 for second-order indices. |
mapping |
Default list of aesthetic mappings to use for plot. If not specified, must be supplied in each layer added to the plot. |
environment |
[Deprecated] Used prior to tidy evaluation. |
... |
any other arguments for model which are passed unchanged each time it is called. |
The estimators used are the ones implemented in "sobolEff"
.
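For readers unfamiliar with this kind of estimator, here is a minimal base R sketch of a first-order pick-freeze estimate. It assumes the pooled-moment form studied in Janon et al. (2014); the toy model `f` and all variable names are hypothetical, and the sketch is not the package's implementation.

```r
set.seed(123)
n <- 20000
p <- 2
f <- function(X) X[, 1] + 2 * X[, 2]   # toy model: S1 = 0.2, S2 = 0.8

X1 <- matrix(runif(n * p), nrow = n)
X2 <- matrix(runif(n * p), nrow = n)
y  <- f(X1)

first_order <- sapply(seq_len(p), function(i) {
  Xi <- X2
  Xi[, i] <- X1[, i]          # "freeze" input i, "pick" the others afresh
  yi <- f(Xi)
  m  <- mean((y + yi) / 2)    # pooled estimate of the mean
  # Covariance of the two outputs over the pooled variance.
  (mean(y * yi) - m^2) / (mean((y^2 + yi^2) / 2) - m^2)
})
first_order
```

For this additive model with independent U[0,1] inputs, the exact first-order indices are 1/5 and 4/5, which gives a quick sanity check on the estimates.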
scheme
specifies which of Saltelli's schemes is to be used: "A"
to estimate both first-order and total effect indices, "B"
to estimate first-order, second-order and total effect indices.
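The budget implied by each scheme can be checked before running an expensive model. The helper `salt_cost` below is hypothetical; it assumes the usual run counts of Saltelli (2002), namely (p + 2) n for scheme "A" and (2p + 2) n for scheme "B", with p inputs and n points per sample.

```r
# Hypothetical helper: number of model runs for n sample points and p inputs.
salt_cost <- function(n, p, scheme = c("A", "B")) {
  scheme <- match.arg(scheme)
  switch(scheme,
         A = (p + 2) * n,       # first-order and total-effect indices
         B = (2 * p + 2) * n)   # plus second-order indices
}

salt_cost(1000, 8, "A")
salt_cost(1000, 8, "B")
```

With 8 inputs, scheme "B" needs almost twice as many runs as scheme "A", which is the price of the additional second-order indices.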
sobolSalt
returns a list of class "sobolSalt"
, containing all
the input arguments detailed before, plus the following components:
call |
the matched call. |
X |
a data.frame containing the design of experiments. |
y |
the response used. |
V |
the model variance. |
S |
the estimations of the Sobol' first-order indices. |
S2 |
the estimations of the Sobol' second-order indices (only for scheme "B"). |
T |
the estimations of the Sobol' total sensitivity indices. |
Laurent Gilquin
A. Janon, T. Klein, A. Lagnoux, M. Nodet, C. Prieur (2014), Asymptotic normality and efficiency of two Sobol index estimators, ESAIM: Probability and Statistics, 18:342-364.
A. Saltelli, 2002, Making best use of model evaluations to compute sensitivity indices, Computer Physics Communications, 145:280-297.
sobol, sobol2007, soboljansen, sobolmartinez, sobolEff
# Test case : the non-monotonic Sobol g-function
# The method of sobol requires 2 samples
# There are 8 factors, all following the uniform distribution
# on [0,1]
library(boot)
n <- 1000
X1 <- data.frame(matrix(runif(8 * n), nrow = n))
X2 <- data.frame(matrix(runif(8 * n), nrow = n))

# sensitivity analysis
x <- sobolSalt(model = sobol.fun, X1, X2, scheme = "A", nboot = 100)
print(x)
plot(x, choice = 1)

library(ggplot2)
ggplot(x, choice = 1)
WARNING: DEPRECATED function: use shapleysobol_knn
instead.
sobolshap_knn
implements the estimation of several sensitivity indices using
only N model evaluations via ranking (following Gamboa et al. (2020) and Chatterjee (2019))
or nearest neighbour search (Broto et al. (2020) and Azadkia & Chatterjee (2020)).
It can be used with categorical inputs (which are transformed with one-hot encoding),
dependent inputs and multiple outputs. Sensitivity indices of any group of inputs can be computed,
which means that in particular first-order/total Sobol indices and Shapley effects are accessible.
For large sample sizes, the nearest neighbour algorithm can be significantly accelerated
by using approximate nearest neighbour search. It is also possible to estimate Shapley effects
with the random permutation approach of Castro et al. (2009), where all the terms are obtained
with ranking or nearest neighbours.
sobolshap_knn(model = NULL, X, id.cat = NULL, U = NULL, method = "knn", n.knn = 2,
              return.shap = FALSE, randperm = FALSE, n.perm = 1e4, rescale = FALSE,
              n.limit = 2000, noise = FALSE, ...)

## S3 method for class 'sobolshap_knn'
tell(x, y = NULL, ...)

## S3 method for class 'sobolshap_knn'
extract(x, ...)

## S3 method for class 'sobolshap_knn'
print(x, ...)

## S3 method for class 'sobolshap_knn'
plot(x, ylim = c(0, 1), type.multout = "lines", ...)

## S3 method for class 'sobolshap_knn'
ggplot(data, mapping = aes(), ylim = c(0, 1), type.multout = "lines", ..., environment = parent.frame())
model |
a function, or a model with a predict method, defining the model to analyze. |
X |
a random sample of the inputs. |
id.cat |
a vector with the indices of the categorical inputs. |
U |
an integer equal to 0 (total Sobol indices) or 1 (first-order Sobol indices) or a list of vector indices defining the subsets of inputs whose sensitivity indices must be computed or a matrix of 0s and 1s where each row encodes a subset of inputs whose sensitivity indices must be computed (see examples) or NULL (all possible subsets). |
method |
the algorithm to be used for estimation, either "rank" or "knn", see details. |
n.knn |
the number of nearest neighbours used for estimation if method="knn". |
return.shap |
a logical indicating if Shapley effects must be estimated,
can only be TRUE if U=NULL. |
randperm |
a logical indicating if random permutations are used to estimate Shapley effects,
only if U=NULL and return.shap=TRUE. |
n.perm |
the number of random permutations used for estimation if randperm=TRUE. |
rescale |
a logical indicating if continuous inputs must be rescaled before distance computations.
If TRUE, continuous inputs are first whitened with the ZCA-cor whitening procedure
(cf. whiten() function in package whitening). |
n.limit |
the sample size limit above which approximate nearest neighbour search is activated,
only used if method="knn". |
noise |
a logical which is TRUE if the model or the output sample is noisy, see details. |
x |
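a list of class "sobolshap_knn" storing the state of the sensitivity study (parameters, data, estimates). |
data |
a list of class "sobolshap_knn" storing the state of the sensitivity study (parameters, data, estimates). |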
y |
a vector of model responses. |
ylim |
y-coordinate plotting limits. |
type.multout |
the plotting method in the case of multiple outputs, either "points" or "lines", see examples. |
mapping |
Default list of aesthetic mappings to use for plot. If not specified, must be supplied in each layer added to the plot. |
environment |
[Deprecated] Used prior to tidy evaluation. |
... |
any other arguments for model which are passed unchanged each time it is called. |
For method="rank"
, the estimator is defined in Gamboa et al. (2020)
following Chatterjee (2019). For first-order indices it is based on an input ranking
(same algorithm as in sobolrank
) while for higher orders,
it uses an approximate heuristic solution of the traveling salesman problem
applied to the input sample distances (cf. TSP() function in package TSP
).
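The first-order ranking idea can be sketched in a few lines of base R: sort the outputs by increasing value of one input and correlate each output with that of the next-ranked point. The helper `rank_index` is hypothetical and follows the general form of the rank statistic of Gamboa et al. and Chatterjee; it is not the package's code and does not cover the higher-order (TSP-based) case.

```r
set.seed(7)
n <- 20000
X <- matrix(runif(2 * n), nrow = n)
Y <- X[, 1] + 2 * X[, 2]            # toy model: S1 = 0.2, S2 = 0.8

# Hypothetical helper: rank-based first-order index of one input.
rank_index <- function(x, y) {
  ys  <- y[order(x)]                # outputs sorted by increasing input value
  nxt <- c(ys[-1], ys[1])           # output of the next-ranked point (wraps around)
  (mean(ys * nxt) - mean(y)^2) / (mean(y^2) - mean(y)^2)
}

S <- apply(X, 2, rank_index, y = Y)
S
```

Only the single sample (X, Y) is used, which is why this family of estimators works with given data and needs no dedicated design.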
For method="knn"
, ranking and TSP are replaced by a nearest neighbour search
as proposed in Broto et al. (2020) and in Azadkia & Chatterjee (2020) for a similar coefficient.
The algorithm is the same as in shapleySubsetMc
but with an optimized implementation.
In particular, the distances used for subsets with mixed inputs (continuous and categorical)
are the same, but here the additional one-hot encoding of categorical variables makes it possible to
work only with Euclidean distances. Furthermore, a fast approximate nearest neighbour search is also
available, which is strongly recommended for large sample sizes. The main difference
with shapleySubsetMc
is that here we use the entire N sample to compute all indices,
while in shapleySubsetMc
the user can specify a total cost Ntot which performs
a specific allocation of sample sizes to the estimation of each index.
In addition, the weights
option is not available here yet.
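A bare-bones version of the nearest neighbour idea is sketched below in base R: for each point, the variance of the output over its k nearest neighbours in the selected inputs estimates the conditional variance, and the closed index follows. The helper `knn_index` is hypothetical, uses an exact O(n^2) distance matrix, and omits the categorical handling, rescaling and approximate search described above.

```r
set.seed(11)
n <- 1000
X <- matrix(runif(2 * n), nrow = n)
Y <- X[, 1]                          # toy model depending on X1 only

# Hypothetical helper: closed index of the inputs in columns u, with k neighbours.
knn_index <- function(X, Y, u, k = 5) {
  D <- as.matrix(dist(X[, u, drop = FALSE]))   # Euclidean distances in X_u
  cond_var <- apply(D, 1, function(dists) {
    nb <- order(dists)[seq_len(k)]             # k nearest points (incl. itself)
    var(Y[nb])                                 # local variance of the output
  })
  1 - mean(cond_var) / var(Y)                  # 1 - E[Var(Y | X_u)] / Var(Y)
}

idx <- c(knn_index(X, Y, u = 1), knn_index(X, Y, u = 2))
idx
```

The index for X1 should be close to 1 and the index for X2 close to 0, up to estimation noise; the same sample (X, Y) serves for every subset u.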
If the outputs are noisy, the argument noise
can be used: it only has an impact on the
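estimation of one specific sensitivity index, namely $\mathrm{Var}(E(Y \mid X_1, \ldots, X_d)) / \mathrm{Var}(Y)$, the index of the full set of $d$ inputs.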
If there is no noise this index is equal to 1, while in the presence of noise it must be estimated.
When randperm=TRUE
, Shapley effects are no longer estimated by computing all the possible
subsets of variables but only on subsets obtained with random permutations as proposed in Castro et al. (2009).
This is useful for problems with a large number of inputs, since the number of subsets increases exponentially
with dimension.
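The random permutation approach can be illustrated independently of any sensitivity estimator. In the base R sketch below, `value` is a hypothetical value function on subsets of three players (in the sensitivity setting it would return the closed index of a subset of inputs), and `shapley_randperm` is a hypothetical helper that averages marginal contributions over random orderings, in the spirit of Castro et al. (2009).

```r
set.seed(2024)

# Hypothetical value function on subsets of 3 players:
# additive weights plus an interaction bonus shared by players 1 and 2.
w <- c(0.2, 0.3, 0.1)
value <- function(S) sum(w[S]) + if (all(c(1, 2) %in% S)) 0.4 else 0

# Hypothetical helper: Shapley values estimated from m random permutations.
shapley_randperm <- function(value, d, m = 2000) {
  phi <- numeric(d)
  for (r in seq_len(m)) {
    perm <- sample(d)
    prev <- value(integer(0))
    for (j in seq_len(d)) {
      cur <- value(perm[seq_len(j)])
      phi[perm[j]] <- phi[perm[j]] + (cur - prev)   # marginal contribution
      prev <- cur
    }
  }
  phi / m
}

phi <- shapley_randperm(value, d = 3)
phi
```

Each permutation costs d evaluations of the value function, so the total cost grows linearly with the number of permutations instead of exponentially with d. The estimates always sum to the value of the full set, and here the interaction bonus is split evenly between players 1 and 2.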
The extract
method is useful if in a first step the Shapley effects have been computed
and thus sensitivity indices for all possible subsets are available.
The resulting sobolshap_knn
object can be post-treated by extract
to get first-order and total Sobol indices very easily.
sobolshap_knn
returns a list of class "sobolshap_knn"
, containing all
the input arguments detailed before, plus the following components:
call |
the matched call. |
X |
a data.frame containing the design of experiments. |
y |
a vector of model responses. |
U |
the subsets of inputs for which sensitivity indices have been computed. |
S |
the estimations of the Sobol sensitivity indices (see details). |
Shap |
the estimations of Shapley effects, if return.shap was set to TRUE. |
order |
0 (total indices), 1 (first-order indices) or NULL. Used for plotting defaults. |
Sebastien Da Veiga
Azadkia M., Chatterjee S. (2021), A simple measure of conditional dependence, Ann. Statist. 49(6):3070-3102.
Broto B., Bachoc F., Depecker M. (2020), Variance reduction for estimation of Shapley effects and adaptation to unknown input distribution, SIAM/ASA Journal of Uncertainty Quantification, 8:693-716.
Castro J., Gomez D, Tejada J. (2009). Polynomial calculation of the Shapley value based on sampling. Computers & Operations Research, 36(5):1726-1730.
Chatterjee, S., 2021, A new coefficient of correlation, Journal of the American Statistical Association, 116:2009-2022.
Gamboa, F., Gremaud, P., Klein, T., & Lagnoux, A., 2022, Global Sensitivity Analysis: a novel generation of mighty estimators based on rank statistics, Bernoulli 28: 2345-2374.
sobolrank, shapleysobol_knn, shapleySubsetMc
# Test case: the non-monotonic Sobol g-function
# Example with a call to a numerical model
# First compute first-order indices with ranking
n <- 1000
X <- data.frame(matrix(runif(8 * n), nrow = n))
x <- sobolshap_knn(model = sobol.fun, X = X, U = 1, method = "rank")
print(x)
library(ggplot2)
ggplot(x)

# We can use the output sample generated for this estimation to compute
# total indices without additional calls to the model
x2 <- sobolshap_knn(model = NULL, X = X, U = 0, method = "knn", n.knn = 5)
tell(x2,x$y)
ggplot(x2)

# Test case: the Ishigami function
# Example with given data and the use of approximate nearest neighbour search
library(RANN)
n <- 5000
X <- data.frame(matrix(-pi+2*pi*runif(3 * n), nrow = n))
Y <- ishigami.fun(X)
x <- sobolshap_knn(model = NULL, X = X, U = NULL, method = "knn", n.knn = 5,
                   return.shap = TRUE, n.limit = 2000)
tell(x,Y)
library(ggplot2)
ggplot(x)

# We can also extract first-order and total Sobol indices
x1 <- extract(x)
print(x1)

# Test case : Linear model (3 Gaussian inputs including 2 dependent) with scaling
# See Iooss and Prieur (2019)
library(mvtnorm) # Multivariate Gaussian variables
library(whitening) # For scaling
modlin <- function(X) apply(X,1,sum)
d <- 3
n <- 10000
mu <- rep(0,d)
sig <- c(1,1,2)
ro <- 0.9
Cormat <- matrix(c(1,0,0,0,1,ro,0,ro,1),d,d)
Covmat <- ( sig %*% t(sig) ) * Cormat
Xall <- function(n) mvtnorm::rmvnorm(n,mu,Covmat)
X <- Xall(n)
x <- sobolshap_knn(model = modlin, X = X, U = NULL, method = "knn", n.knn = 5,
                   return.shap = TRUE, rescale = TRUE, n.limit = 2000)
print(x)

# Test case: functional toy fct 'Arctangent temporal function'
n <- 3000
X <- data.frame(matrix(runif(2*n,-7,7), nrow = n))
Y <- atantemp.fun(X)
x <- sobolshap_knn(model = NULL, X = X, U = NULL, method = "knn", n.knn = 5,
                   return.shap = TRUE, n.limit = 2000)
tell(x,Y)
library(ggplot2)
library(reshape2)
ggplot(x, type.multout="lines")
Determines the first-order Sobol' indices Si of single parameters through B-spline smoothing with a roughness penalty.
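The principle can be sketched with the smoothing spline available in base R: smooth the output against one input at a time and take the share of output variance carried by the fitted curve. The helper `spline_index` is hypothetical and relies on stats::smooth.spline with a fixed number of degrees of freedom; it is not the recursive algorithm of Ratto and Pagano (2010) and returns no standard errors.

```r
set.seed(3)
n <- 5000
X <- matrix(runif(2 * n), nrow = n)
Y <- X[, 1] + 2 * X[, 2]             # toy model: S1 = 0.2, S2 = 0.8

# Hypothetical helper: smooth Y against one input and compare variances.
spline_index <- function(x, y, df = 5) {
  fit <- smooth.spline(x, y, df = df)      # penalised cubic smoothing spline
  var(predict(fit, x)$y) / var(y)          # Var(E[Y | x]) / Var(Y)
}

S <- apply(X, 2, spline_index, y = Y)
S
```

The fitted curve approximates the conditional expectation of Y given one input, so its variance relative to Var(Y) approximates the first-order index; the choice df = 5 is arbitrary and only keeps the sketch simple.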
sobolSmthSpl(Y, X)
Y |
vector of model responses. |
X |
matrix having as rows the input vectors corresponding to the responses in Y. |
WARNING: This function can give bad results for reasons that have not yet been investigated.
sobolSmthSpl returns a list of class "sobolSmthSpl" containing the following components:
call |
the matched call. |
X |
the provided input matrix. |
Y |
the provided matrix of model responses. |
S |
a matrix with the following columns: Si (the estimated first-order Sobol' indices), Si.e (the standard errors of the estimated first-order Sobol' indices) and q0.05 (the 0.05 quantiles of the Si indices, assuming Normal distributions centred on the Si estimates with standard deviations equal to the calculated standard errors). |
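Under the Normal assumption stated for the q0.05 column, the quantile follows directly from the estimate and its standard error. The numbers below are hypothetical and only illustrate the arithmetic.

```r
# Hypothetical estimates and standard errors, for illustration only.
Si   <- c(0.45, 0.10)
Si.e <- c(0.03, 0.02)

# 0.05 quantile of a Normal distribution centred on Si with sd Si.e.
q0.05 <- qnorm(0.05, mean = Si, sd = Si.e)

# Equivalent closed form: estimate plus the standard Normal quantile
# (a negative number) times the standard error.
q0.05_alt <- Si + qnorm(0.05) * Si.e
cbind(Si, Si.e, q0.05, q0.05_alt)
```

A q0.05 value clearly above zero suggests that the corresponding input has a non-negligible first-order effect.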
Filippo Monari
Saltelli, A; Ratto, M; Andres, T; Campolongo, F; Cariboni, J; Gatelli, D; Saisana, M & Tarantola, S. Global Sensitivity Analysis: The Primer Wiley-Interscience, 2008
M Ratto and A. Pagano, 2010, Using recursive algorithms for the efficient identification of smoothing spline ANOVA models, Advances in Statistical Analysis, 94, 367–388.
X <- matrix(runif(5000), ncol = 10)
Y <- sobol.fun(X)
sa <- sobolSmthSpl(Y, X)
plot(sa)
sobolTIIlo
implements the asymptotically efficient formula of Liu and Owen (2006) for the estimation of total interaction indices as described e.g. in Section 3.4 of Fruth et al. (2014). Total interaction indices (TII) are superset indices of pairs of variables and thus give the total influence of each second-order interaction. The total cost of the method is $(1 + N + \binom{N}{2}) \times n$ where $N$ is the number
of indices to estimate. Asymptotic confidence intervals are provided. Via
plotFG
(which uses functions of the package igraph
), the TIIs can be visualized in a so-called FANOVA graph as described in section 2.2 of Muehlenstaedt et al. (2012).
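To show what a total interaction index measures, the base R sketch below applies the squared second-order difference formula associated with Liu and Owen (2006) to a toy model in which only X1 and X2 interact. The helper `tii_pair` and the toy model are hypothetical; the sketch returns unscaled values, computes no confidence intervals, and does not reuse model evaluations across pairs.

```r
set.seed(5)
n <- 20000
f <- function(X) X[, 1] * X[, 2] + X[, 3]     # only X1 and X2 interact
X <- matrix(runif(3 * n, -0.5, 0.5), nrow = n)
Z <- matrix(runif(3 * n, -0.5, 0.5), nrow = n)

# Hypothetical helper: unscaled total interaction index of the pair (i, j).
tii_pair <- function(i, j) {
  Xi  <- X;  Xi[, i]  <- Z[, i]    # resample input i only
  Xj  <- X;  Xj[, j]  <- Z[, j]    # resample input j only
  Xij <- Xi; Xij[, j] <- Z[, j]    # resample both i and j
  # Mean squared second-order difference, divided by 2^2.
  mean((f(X) - f(Xi) - f(Xj) + f(Xij))^2) / 4
}

tii <- c(tii_pair(1, 2), tii_pair(1, 3), tii_pair(2, 3))
tii
```

For this model the exact unscaled index of the pair (1, 2) is Var(X1 X2) = 1/144, while the pairs involving X3 have an index of zero because X3 enters additively.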
sobolTIIlo(model = NULL, X1, X2, conf = 0.95, ...)

## S3 method for class 'sobolTIIlo'
tell(x, y = NULL, ...)

## S3 method for class 'sobolTIIlo'
print(x, ...)

## S3 method for class 'sobolTIIlo'
plot(x, ylim = NULL, ...)

## S3 method for class 'sobolTIIlo'
ggplot(data, mapping = aes(), ylim = NULL, ..., environment = parent.frame())

## S3 method for class 'sobolTIIlo'
plotFG(x)
model |
a function, or a model with a predict method, defining the model to analyze. |
X1 |
the first random sample. |
X2 |
the second random sample. |
conf |
the confidence level for asymptotic confidence intervals, defaults to 0.95. |
x |
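a list of class "sobolTIIlo" storing the state of the sensitivity study (parameters, data, estimates). |
data |
a list of class "sobolTIIlo" storing the state of the sensitivity study (parameters, data, estimates). |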
y |
a vector of model responses. |
mapping |
Default list of aesthetic mappings to use for plot. If not specified, must be supplied in each layer added to the plot. |
environment |
[Deprecated] Used prior to tidy evaluation. |
... |
any other arguments for |
ylim |
optional, the y limits of the plot. |
sobolTIIlo returns a list of class "sobolTIIlo", containing all the input arguments detailed before, plus the following components:
call | the matched call. |
X | a data.frame containing the design of experiments. |
y | a vector of model responses. |
V | the estimation of the overall variance. |
tii.unscaled | the unscaled estimations of the TIIs. |
tii.scaled | the scaled estimations of the TIIs together with asymptotic confidence intervals. |
Jana Fruth
R. Liu, A. B. Owen, 2006, Estimating mean dimensionality of analysis of variance decompositions, JASA, 101 (474), 712–721.
J. Fruth, O. Roustant, S. Kuhnt, 2014, Total interaction index: A variance-based sensitivity index for second-order interaction screening, J. Stat. Plan. Inference, 147, 212–223.
T. Muehlenstaedt, O. Roustant, L. Carraro, S. Kuhnt, 2012, Data-driven Kriging models based on FANOVA-decomposition, Stat. Comput., 22 (3), 723–738.
# Test case : the Ishigami function

# The method requires 2 samples
n <- 1000
X1 <- data.frame(matrix(runif(3 * n, -pi, pi), nrow = n))
X2 <- data.frame(matrix(runif(3 * n, -pi, pi), nrow = n))

# sensitivity analysis (the true values of the scaled TIIs are 0, 0.244, 0)
x <- sobolTIIlo(model = ishigami.fun, X1 = X1, X2 = X2)
print(x)

# plot of tiis and FANOVA graph
plot(x)

library(ggplot2)
ggplot(x)

library(igraph)
plotFG(x)
sobolTIIpf implements the pick-freeze estimation of total interaction indices as described in Section 3.3 of Fruth et al. (2014). Total interaction indices (TII) are superset indices of pairs of variables, and thus give the total influence of each second-order interaction. The pick-freeze estimation makes it possible to apply the evaluation-reuse strategy of Saltelli (2002). The total cost is (1 + N) × n model evaluations, where N is the number of indices to estimate and n is the sample size. Via plotFG, the TIIs can be visualized in a so-called FANOVA graph as described in Section 2.2 of Muehlenstaedt et al. (2012).
sobolTIIpf(model = NULL, X1, X2, ...)

## S3 method for class 'sobolTIIpf'
tell(x, y = NULL, ...)
## S3 method for class 'sobolTIIpf'
print(x, ...)
## S3 method for class 'sobolTIIpf'
plot(x, ylim = NULL, ...)
## S3 method for class 'sobolTIIpf'
ggplot(data, mapping = aes(), ylim = NULL, ..., environment = parent.frame())
## S3 method for class 'sobolTIIpf'
plotFG(x)
model | a function, or a model with a predict method, defining the model to analyze. |
X1 | the first random sample. |
X2 | the second random sample. |
x | a list of class "sobolTIIpf". |
data | a list of class "sobolTIIpf". |
y | a vector of model responses. |
mapping | Default list of aesthetic mappings to use for plot. If not specified, must be supplied in each layer added to the plot. |
environment | [Deprecated] Used prior to tidy evaluation. |
... | any other arguments for model. |
ylim | optional, the y limits of the plot. |
sobolTIIpf returns a list of class "sobolTIIpf", containing all the input arguments detailed before, plus the following components:
call | the matched call. |
X | a data.frame containing the design of experiments. |
y | a vector of model responses. |
V | the estimation of the overall variance. |
tii.unscaled | the unscaled estimations of the TIIs. |
tii.scaled | the scaled estimations of the TIIs. |
Jana Fruth
J. Fruth, O. Roustant, S. Kuhnt, 2014, Total interaction index: A variance-based sensitivity index for second-order interaction screening, J. Stat. Plan. Inference, 147, 212–223.
A. Saltelli, 2002, Making best use of model evaluations to compute sensitivity indices, Comput. Phys. Commun., 145, 280–297.
T. Muehlenstaedt, O. Roustant, L. Carraro, S. Kuhnt, 2012, Data-driven Kriging models based on FANOVA-decomposition, Stat. Comput., 22 (3), 723–738.
# Test case : the Ishigami function

# The method requires 2 samples
n <- 1000
X1 <- data.frame(matrix(runif(3 * n, -pi, pi), nrow = n))
X2 <- data.frame(matrix(runif(3 * n, -pi, pi), nrow = n))

# sensitivity analysis (the true values are 0, 0.244, 0)
x <- sobolTIIpf(model = ishigami.fun, X1 = X1, X2 = X2)
print(x)

# plot of tiis and FANOVA graph
plot(x)

library(ggplot2)
ggplot(x)

library(igraph)
plotFG(x)
soboltouati implements the Monte Carlo estimation of the Sobol' indices for both first-order and total indices using correlation coefficients-based formulas, at a total cost of (p + 2) × n model evaluations (p being the number of inputs and n the sample size). These are called the Martinez estimators. It also computes their confidence intervals based on asymptotic properties of empirical correlation coefficients.
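The correlation-based formulas can be sketched in base R as follows. This is a minimal illustration under stated assumptions, not the package's code: the model `f` and all variable names are hypothetical, and the first-order and total indices are taken as cor(y1, yi) and 1 - cor(y2, yi), where yi is evaluated on X2 with its i-th column replaced by that of X1.

```r
# Hypothetical additive model: f(x) = x1 + 2 * x2, so that S = T = (0.2, 0.8)
f <- function(X) X[, 1] + 2 * X[, 2]

set.seed(2)
n <- 5000
p <- 2
X1 <- matrix(runif(p * n), nrow = n)
X2 <- matrix(runif(p * n), nrow = n)

y1 <- f(X1)
y2 <- f(X2)
S <- T.ind <- numeric(p)
for (i in 1:p) {
  Xi <- X2
  Xi[, i] <- X1[, i]           # shares X_i with X1, all other inputs with X2
  yi <- f(Xi)
  S[i] <- cor(y1, yi)          # first-order index
  T.ind[i] <- 1 - cor(y2, yi)  # total index
}
print(round(rbind(S = S, T = T.ind), 3))
```

The loop evaluates the model on p modified designs in addition to X1 and X2. Because the toy model is additive, first-order and total indices coincide up to Monte Carlo error.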
soboltouati(model = NULL, X1, X2, conf = 0.95, ...)

## S3 method for class 'soboltouati'
tell(x, y = NULL, return.var = NULL, ...)
## S3 method for class 'soboltouati'
print(x, ...)
## S3 method for class 'soboltouati'
plot(x, ylim = c(0, 1), ...)
## S3 method for class 'soboltouati'
ggplot(data, mapping = aes(), ylim = c(0, 1), ..., environment = parent.frame())
model | a function, or a model with a predict method, defining the model to analyze. |
X1 | the first random sample. |
X2 | the second random sample. |
conf | the confidence level for confidence intervals, or zero to avoid their computation if they are not needed. |
x | a list of class "soboltouati". |
data | a list of class "soboltouati". |
y | a vector of model responses. |
return.var | a vector of character strings giving further internal variables names to store in the output object |
ylim | y-coordinate plotting limits. |
mapping | Default list of aesthetic mappings to use for plot. If not specified, must be supplied in each layer added to the plot. |
environment | [Deprecated] Used prior to tidy evaluation. |
... | any other arguments for model. |
This estimator supports missing values (NA or NaN), which can occur when the model is run on the design of experiments (because of code failures). Note, however, that Sobol' indices are no longer rigorous variance-based sensitivity indices when missing values are present. In this case, a warning is displayed.
soboltouati returns a list of class "soboltouati", containing all the input arguments detailed before, plus the following components:
call | the matched call. |
X | a data.frame containing the design of experiments. |
y | the response used |
V | the estimations of normalized variances of the Conditional Expectations (VCE) with respect to each factor and also with respect to the complementary set of each factor ("all but X_i"). |
S | the estimations of the Sobol' first-order indices. |
T | the estimations of the Sobol' total sensitivity indices. |
Taieb Touati, Khalid Boumhaout
J-M. Martinez, 2011, Analyse de sensibilite globale par decomposition de la variance, Presentation in the meeting of GdR Ondes and GdR MASCOT-NUM, January, 13th, 2011, Institut Henri Poincare, Paris, France.
T. Touati, 2016, Confidence intervals for Sobol' indices. Proceedings of the SAMO 2016 Conference, Reunion Island, France, December 2016.
T. Touati, 2017, Intervalles de confiance pour les indices de Sobol, 49emes Journees de la SFdS, Avignon, France, Juin 2017.
sobol, sobol2002, sobolSalt, sobol2007, soboljansen, sobolmartinez
# Test case : the non-monotonic Sobol g-function

# The method of sobol requires 2 samples
# There are 8 factors, all following the uniform distribution
# on [0,1]

library(boot)
n <- 1000
X1 <- data.frame(matrix(runif(8 * n), nrow = n))
X2 <- data.frame(matrix(runif(8 * n), nrow = n))

# sensitivity analysis
x <- soboltouati(model = sobol.fun, X1, X2)
print(x)
plot(x)

library(ggplot2)
ggplot(x)
This function provides two estimators of a squared expectation. The first one, the naive estimator, is the square of the sample mean. It is positively biased. The second one is a U-statistic and is unbiased. The two are equivalent for large sample sizes.
squaredIntEstim(x, method = "unbiased")
x | A vector of observations supposed to be drawn independently from a square integrable random variable |
method | If "unbiased", computes the U-statistic, otherwise the square of the sample mean is computed |
Let X1, ..., Xn be i.i.d. random variables. The aim is to estimate t = E(Xi)^2. The naive estimator is the square of the sample mean: T1 = [(X1 + ... + Xn)/n]^2. It is positively biased, and the bias is equal to s^2/n, where s^2 = var(X1). The U-statistic estimator is the average of Xi * Xj over all unordered pairs (i,j) with i different from j. Equivalently, it is equal to T1 minus the (unbiased) sample variance divided by n.
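The equivalence between the two forms of the U-statistic can be checked numerically. The base-R sketch below is only an illustration and does not call squaredIntEstim; the variable names are arbitrary.

```r
set.seed(3)
x <- rnorm(50, mean = 1, sd = 2)
n <- length(x)

# Naive estimator: square of the sample mean
T1 <- mean(x)^2

# U-statistic: average of x_i * x_j over all unordered pairs i < j
pair.products <- combn(x, 2, FUN = prod)
T.unb <- mean(pair.products)

# Same value through the identity T.unb = T1 - var(x) / n
T.unb.identity <- T1 - var(x) / n

print(c(naive = T1, unbiased = T.unb, identity = T.unb.identity))
```

The explicit average over pairs costs O(n^2) products, whereas the identity needs only the sample mean and variance, which is why the second form is preferable for large samples.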
A real number, corresponding to the estimated value of the squared integral.
O. Roustant
O. Roustant, F. Gamboa and B. Iooss, Parseval inequalities and lower bounds for variance-based sensitivity indices, Electronic Journal of Statistics, 14:386-412, 2020
Van der Vaart, A. W. Asymptotic statistics. Vol. 3. Cambridge university press, 2000.
n <- 100     # sample size
nsim <- 100  # number of simulations
mu <- 0

T <- Tunb <- rep(NA, nsim)
theta <- mu^2  # E(X)^2, with X following N(mu, 1)

for (i in 1:nsim){
  x <- rnorm(n, mean = mu, sd = 1)
  T[i] <- squaredIntEstim(x, method = "biased")
  Tunb[i] <- squaredIntEstim(x, method = "unbiased")
}

par(mfrow = c(1, 1))
boxplot(cbind(T, Tunb))
abline(h = theta, col = "red")
abline(h = c(mean(T), mean(Tunb)), col = c("blue", "cyan"), lty = "dotted")
# look at the difference between median and mean
src computes the Standardized Regression Coefficients (SRC), or the Standardized Rank Regression Coefficients (SRRC), which are sensitivity indices based on linear or monotonic assumptions in the case of independent factors.
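As an illustration, the SRC can be computed by hand in base R. The sketch below assumes the textbook definition SRC_j = beta_j * sd(X_j) / sd(y), with beta_j the least-squares coefficient of X_j; it does not call src, and the variable names are arbitrary.

```r
set.seed(4)
n <- 200
X <- data.frame(X1 = runif(n, 0.5, 1.5),
                X2 = runif(n, 1.5, 4.5),
                X3 = runif(n, 4.5, 13.5))
y <- with(X, X1 + X2 + X3)   # linear model with unit coefficients

fit <- lm(y ~ X1 + X2 + X3, data = X)
beta <- coef(fit)[-1]        # drop the intercept

# Assumed definition: SRC_j = beta_j * sd(X_j) / sd(y)
src.manual <- beta * sapply(X, sd) / sd(y)
print(round(src.manual, 3))
```

All regression coefficients equal 1 here, so the ranking of the SRC is driven only by the spread of each input: X3 dominates, followed by X2 and X1.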
src(X, y, rank = FALSE, logistic = FALSE, nboot = 0, conf = 0.95)

## S3 method for class 'src'
print(x, ...)
## S3 method for class 'src'
plot(x, ylim = c(-1,1), ...)
## S3 method for class 'src'
ggplot(data, mapping = aes(), ylim = c(-1, 1), ..., environment = parent.frame())
X | a data frame (or object coercible by as.data.frame) containing the design of experiments (model input variables). |
y | a vector containing the responses corresponding to the design of experiments (model output variables). |
rank | logical. If TRUE, the analysis is done on the ranks. |
logistic | logical. If TRUE, the analysis is done via a logistic regression. |
nboot | the number of bootstrap replicates. |
conf | the confidence level of the bootstrap confidence intervals. |
x | the object returned by src. |
data | the object returned by src. |
ylim | the y-coordinate limits of the plot. |
mapping | Default list of aesthetic mappings to use for plot. If not specified, must be supplied in each layer added to the plot. |
environment | [Deprecated] Used prior to tidy evaluation. |
... | arguments to be passed to methods, such as graphical parameters (see par). |
The logistic regression model (logistic = TRUE) and rank-based indices (rank = TRUE) are incompatible.
src returns a list of class "src", containing the following components:
call | the matched call. |
SRC | a data frame containing the estimations of the SRC indices, bias and confidence intervals (if rank = FALSE). |
SRRC | a data frame containing the estimations of the SRRC indices, bias and confidence intervals (if rank = TRUE). |
Gilles Pujol and Bertrand Iooss
L. Clouvel, B. Iooss, V. Chabridon, M. Il Idrissi and F. Robin, 2023, An overview of variance-based importance measures in the linear regression context: comparative analyses and numerical tests, Preprint. https://hal.science/hal-04102053
B. Iooss, V. Chabridon and V. Thouvenot, Variance-based importance measures for machine learning model interpretability, Congres lambda-mu23, Saclay, France, 10-13 octobre 2022 https://hal.science/hal-03741384
A. Saltelli, K. Chan and E. M. Scott eds, 2000, Sensitivity Analysis, Wiley.
# a 100-sample with X1 ~ U(0.5, 1.5)
#                   X2 ~ U(1.5, 4.5)
#                   X3 ~ U(4.5, 13.5)

library(boot)
n <- 100
X <- data.frame(X1 = runif(n, 0.5, 1.5),
                X2 = runif(n, 1.5, 4.5),
                X3 = runif(n, 4.5, 13.5))

# linear model : Y = X1 + X2 + X3
y <- with(X, X1 + X2 + X3)

# sensitivity analysis
x <- src(X, y, nboot = 100)
print(x)
plot(x)

library(ggplot2)
ggplot(x)
Function to estimate the first-order and total support index functions (Fruth et al., 2019).
support(model, X, Xnew = NULL, fX = NULL, gradfX = NULL, h = 1e-06, ...)
model | a function, or a model with a predict method, defining the model to analyze. |
X | a random sample. |
Xnew | an optional set of points where to visualize the support indices. If missing, X is used. |
fX | an optional vector containing the evaluations of model at X. |
gradfX | an optional vector containing the evaluations of the gradient of model at X. |
h | a small number for computing finite differences |
... | optional arguments to be passed to model. |
The first-order support index of f(X) relative to X_i is the squared conditional expectation of its partial derivative with respect to X_i.
The total support index of f(X) relative to X_i is the conditional expectation of its squared partial derivative with respect to X_i.
These two functions measure the local influence of X_i, in the global space of the other input variables. Up to square transformations, support indices can be viewed as regression curves of partial derivatives df(X)/dX_i with respect to X_i. Estimation is performed by smoothing from the diagonal scatterplots (X_i, df/dX_i) with the function smooth.spline{stats} with the default options.
For the sake of comparison, support index functions may be normalized. The proposed normalization is the sum of the DGSM, equal to the sum of the overall means of total support functions. Normalized support index functions can be plotted with the S3 method plot, as well as the underlying diagonal scatterplots of derivatives (S3 method scatterplot).
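The smoothing step can be sketched in base R with smooth.spline from the stats package. This is a minimal illustration that does not call support: the model f(x) = x1 * x2, its analytic gradient and all variable names are hypothetical choices made for the example.

```r
# Hypothetical model f(x) = x1 * x2 on [-1, 1]^2, with analytic gradient (x2, x1)
set.seed(5)
n <- 2000
X <- matrix(runif(2 * n, -1, 1), nrow = n)
grad <- cbind(X[, 2], X[, 1])

i <- 1
# First-order support index: square of the smoothed derivative df/dX_i against X_i
fit.main <- smooth.spline(X[, i], grad[, i])
main.i <- predict(fit.main, X[, i])$y^2

# Total support index: smoothed squared derivative against X_i
fit.total <- smooth.spline(X[, i], grad[, i]^2)
total.i <- predict(fit.total, X[, i])$y

print(c(mean.main = mean(main.i), mean.total = mean(total.i)))
```

For this toy model the derivative with respect to X1 is X2, which is independent of X1 with zero mean. The first-order curve is therefore close to zero, while the total curve is close to the constant E(X2^2) = 1/3, whose overall mean is the corresponding DGSM.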
main | a matrix whose columns contain the first-order support index functions, estimated at Xnew. |
total | a matrix whose columns contain the total support index functions, estimated at Xnew. |
DGSM | a vector containing an estimation of DGSM. |
X, Xnew, fX, gradfX | see 'arguments' section. |
O. Roustant
J. Fruth, O. Roustant, S. Kuhnt, 2019, Support indices: Measuring the effects of input variables over their support, Reliability Engineering and System Safety, 187:17-27.
S3 methods plot and scatterplot: plot.support
# -----------------
# ishigami function
# -----------------
n <- 5000
n.points <- 1000
d <- 3

set.seed(0)
X <- matrix(runif(d*n, min = -pi, max = pi), n, d)
Xnew <- matrix(seq(from = -pi, to = pi, length=n.points), n.points, d)

b <- support(model = ishigami.fun, X, Xnew)

# plot method (x-axis in probability scale), of the normalized support index functions
plot(b, col = c("lightskyblue4", "lightskyblue1", "black"),
     xprob = TRUE, p = 'punif', p.arg = list(min = -pi, max = pi), ylim = c(0, 2))

# below : diagonal scatterplots of the gradient,
# on which are based the estimation by smoothing
scatterplot(b, xprob = TRUE)

# now with normal margins
# -----------------------
X <- matrix(rnorm(d*n), n, d)
Xnew <- matrix(rnorm(d*n.points), n.points, d)
b <- support(model = ishigami.fun, X, Xnew)

plot(b, col = c("lightskyblue4", "lightskyblue1", "black"), xprob = FALSE)
scatterplot(b, xprob = FALSE, type = "histogram", bins = 10, cex = 1, cex.lab = 1.5)
template.replace replaces keys within special markups with values in a so-called template file. Pieces of R code can be put into the markups of the template file, and are evaluated during the replacement.
template.replace(text, replacement, eval = FALSE, key.pattern = NULL, code.pattern = NULL)
text | vector of character strings, the template text. |
replacement | the list of values to replace in text. |
eval | boolean, TRUE if the code has to be interpreted, FALSE otherwise. |
key.pattern | custom pattern for key replacement (see below) |
code.pattern | custom pattern for code replacement (see below) |
In most cases, a computational code reads its inputs from a text file. A template file is like an input file, but where some missing values, identified with generic keys, will be replaced by specific values.
By default, the keys are enclosed into markups of the form $(KEY).
Code to be interpreted with R can be put in the template text. Pieces of code must be enclosed into markups of the form @{CODE}. This is useful for example for formatting the key values (see example). For interpreting the code, set eval = TRUE.
Users can define custom patterns. These patterns must be perl-compatible regular expressions (see regexpr). The default ones are:
key.pattern = "\\$\\(KEY\\)"
code.pattern = "@\\{CODE\\}"
Note that special characters have to be escaped twice (once for perl, once for R).
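The role of the default key pattern can be illustrated with plain gsub calls. The sketch below is a hypothetical re-implementation of the key replacement step only (no code evaluation) and is not necessarily how template.replace works internally.

```r
txt <- "Hello $(name)! $(a) + $(b)"
replacement <- list(name = "world", a = 1, b = 2)

key.pattern <- "\\$\\(KEY\\)"   # default pattern, as a doubly escaped R string

out <- txt
for (key in names(replacement)) {
  # Substitute the literal word KEY by the actual key name, e.g. \$\(name\)
  pattern <- sub("KEY", key, key.pattern, fixed = TRUE)
  # Replace every occurrence of the markup by the corresponding value
  out <- gsub(pattern, as.character(replacement[[key]]), out, perl = TRUE)
}
print(out)
# [1] "Hello world! 1 + 2"
```

The double backslashes are needed because the R parser consumes one level of escaping before the regular expression engine sees the pattern.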
Gilles Pujol
txt <- c("Hello $(name)!", "$(a) + $(b) = @{$(a)+$(b)}",
         "pi = @{format(pi,digits=5)}")
replacement <- list(name = "world", a = 1, b = 2)

# 1. without code evaluation:
txt.rpl1 <- template.replace(txt, replacement)
print(txt.rpl1)

# 2. with code evaluation:
txt.rpl2 <- template.replace(txt, replacement, eval = TRUE)
print(txt.rpl2)
testHSIC tests independence among all input-output pairs after a preliminary sensitivity analysis based on HSIC indices.
testHSIC takes an object of class sensiHSIC (produced by a prior call to the function sensiHSIC that estimates HSIC indices) and it returns the estimated p-values after testing independence among all input-output pairs. For each input-output pair (X_i, Y), having access to the p-value helps the user decide whether the null hypothesis H0: "X_i and Y are independent" must be accepted or rejected. If the kernels selected in sensiHSIC are all characteristic, H0 can be rewritten "HSIC(X_i, Y) = 0" and this paves the way to several test procedures.
Depending on the sample size and the chosen test statistic (either a U-statistic or a V-statistic), there are up to four different methods to test H0. The asymptotic test is recommended when the sample size n is around a few hundred (or more). When n is smaller, a permutation-based test must be considered instead. As a general rule, permutation-based tests can always be applied but a much heavier computational load is to be expected. However, if HSIC indices were initially estimated with V-statistics, the Gamma test is a parametric method that offers an enticing tradeoff.
testHSIC(sensi, test.method = "Asymptotic", B = 3000,
         seq.options = list(criterion = "screening", alpha = 0.05,
                            Bstart = 200, Bfinal = 5000, Bbatch = 100,
                            Bconv = 200, graph = TRUE) )

## S3 method for class 'testHSIC'
print(x, ...)
## S3 method for class 'testHSIC'
plot(x, ylim = c(0, 1), err, ...)
sensi | An object of class "sensiHSIC". |
test.method | A string specifying the numerical procedure used to estimate the p-values of the HSIC-based independence tests. Available procedures include "Asymptotic", "Permutation", "Seq_Permutation" and "Gamma". |
B | Number of random permutations carried out on the output samples before the non-parametric estimation of p-values. Only relevant if test.method = "Permutation". |
seq.options | A list of options guiding the sequential procedure. Only relevant if test.method = "Seq_Permutation". |
x | An object of class "testHSIC". |
ylim | A vector of two values specifying the y-coordinate plotting limits. |
err | A scalar value (between |
... | Additional options. |
For a given input-output pair of variables (X_i, Y), the Hilbert-Schmidt independence criterion (HSIC) is a dissimilarity measure between the joint bivariate distribution and the product of marginal distributions. Dissimilarity between those two distributions is measured through the squared norm of the distance between their respective embeddings in a reproducing kernel Hilbert space (RKHS) that directly depends on the selected input kernel and the selected output kernel.
It must always be kept in mind that this criterion can detect independence within the pair (X_i, Y) only if the two kernels are characteristic.
If both kernels are characteristic, H0: "X_i and Y are independent" is equivalent to H0: "HSIC(X_i, Y) = 0" and any estimator of HSIC(X_i, Y) emerges as a relevant test statistic.
If they are not, testing H0: "HSIC(X_i, Y) = 0" is no longer sufficient for testing H0: "X_i and Y are independent".
The reader is referred to Fukumizu et al. (2004) for the mathematical definition of a characteristic kernel and to Sriperumbudur et al. (2010) for an overview of the major related results.
Responsibility for kernel selection is left to the user while calling the function sensiHSIC. Let us simply recall that:
The Gaussian kernel, the exponential kernel, the Matern 3/2 kernel and the Matern 5/2 kernel (all defined on R^2) are characteristic. They remain characteristic when they are restricted to a compact domain D within R^2.
The transformed versions of the four abovementioned kernels (all defined on [0,1]^2) are characteristic.
All Sobolev kernels (defined on [0,1]^2) are characteristic.
The categorical kernel (defined on any discrete probability space) is characteristic.
The test statistic for the pair (X_i, Y) is either the U-statistic or the V-statistic associated to HSIC(X_i, Y).
If a V-statistic was used in sensiHSIC, four different test methods can be considered.
The asymptotic test can be used if the sample size n is large enough (at least a hundred samples). The asymptotic distribution of the test statistic is approximated by a Gamma distribution whose parameters are estimated with the method of moments. See Gretton et al. (2007) for more details about how to estimate the first two moments of the asymptotic Gamma distribution.
The permutation-based test is more expensive in terms of computational cost but it can be used whatever the sample size n is. The initial output samples (stored in the object of class sensiHSIC) are randomly permuted B times and the test statistic is recomputed as many times. This makes it possible to simulate B observations of the test statistic under H0 and to estimate the p-value in a non-parametric way. See Meynaoui (2019) for more details on how to correctly estimate the p-value in order to preserve the expected level of the test.
The sequential permutation-based test is a goal-oriented variant of the previous test. The main idea is to reduce the computational cost by stopping permutations as soon as the estimation of the p-value has sufficiently converged so that it can be compared to a reference threshold or be given a final ranking. See El Amri and Marrel (2022) for more details on how to implement this sequential approach for the three stopping criteria (namely "ranking", "screening" or "both").
The Gamma test is a parametric alternative to permutation-based tests when n is not large enough to resort to the asymptotic test. The permutation-based test reveals that the test statistic under H0 follows a unimodal distribution having significant positive skewness. Thus, it seems quite natural to estimate the p-value with a Gamma distribution, especially in view of the fact that the asymptotic distribution is properly approximated by this parametric family. See El Amri and Marrel (2021) for more details on how to estimate the parameters of the Gamma distribution with the method of moments. In particular, the first two moments of the test statistic under H0 are computed thanks to the formulas that were initially provided in Kazi-Aoual et al. (1995).
If a U-statistic was used in sensiHSIC, the estimated value of HSIC(X_i, Y) may be negative.
The asymptotic test can no longer be conducted with a Gamma distribution (whose support is limited to [0, +∞)). It is replaced by a Pearson III distribution (which is a left-shifted Gamma distribution).
The permutation-based test and the sequential permutation-based test can be applied directly.
The Gamma test no longer has any theoretical justification.
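The permutation-based procedure can be sketched in base R for a single input-output pair. This is a simplified illustration, not the code of testHSIC: the helpers `gram` and `hsic.v`, the Gaussian kernels with bandwidth set to the sample standard deviation, the V-statistic written as trace(K H L H) / n^2 and the "+1" correction of the p-value are all assumptions made for the example.

```r
# Gaussian-kernel Gram matrix of a numeric vector (bandwidth = sample sd)
gram <- function(z) exp(-as.matrix(dist(z))^2 / (2 * sd(z)^2))

# V-statistic estimator of HSIC: trace(K H L H) / n^2, with H the centering matrix
hsic.v <- function(K, L) {
  n <- nrow(K)
  H <- diag(n) - matrix(1 / n, n, n)
  sum(diag(K %*% H %*% L %*% H)) / n^2
}

set.seed(6)
n <- 50    # sample size
B <- 199   # number of random permutations
x <- runif(n, -1, 1)
y <- x^2 + rnorm(n, sd = 0.05)   # dependent but nearly uncorrelated pair

K <- gram(x)
L <- gram(y)
stat.obs <- hsic.v(K, L)

# Permuting the output sample simulates the test statistic under H0
stat.perm <- replicate(B, {
  idx <- sample(n)
  hsic.v(K, L[idx, idx])
})

# Non-parametric p-value with a +1 correction (one common choice)
pval <- (1 + sum(stat.perm >= stat.obs)) / (B + 1)
print(c(HSIC = stat.obs, p.value = pval))
```

Permuting the rows and columns of the output Gram matrix is equivalent to permuting the output sample itself, which avoids recomputing the kernel at each permutation.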
In Marrel and Chabridon (2021), HSIC indices were adapted to target sensitivity analysis (thus becoming T-HSIC indices) and to conditional sensitivity analysis (thus becoming C-HSIC indices). Tests of independence can still be useful after estimating T-HSIC indices or C-HSIC indices.
For T-HSIC indices, the null hypothesis is H0: "X_i and w(Y) are independent" where w is the weight function selected in target and passed to the function sensiHSIC. Everything works just as for basic HSIC indices (apart from the fact that w is applied on the original output variable Y). Available test methods include "Asymptotic", "Permutation", "Seq_Permutation" and "Gamma" (for V-statistics only).
For C-HSIC indices, the null hypothesis is H0: "X_i and Y are independent if the event described in cond occurs". In this specific context, testing conditional independence is only relevant if the weight function is an indicator function. For this reason, if conditional independence has to be tested, the user must select type="indicTh" in cond while calling the function sensiHSIC. Let us recall that only V-statistic estimators can be used for C-HSIC indices. As a result, available test methods include "Asymptotic", "Permutation", "Seq_Permutation" and "Gamma".
testHSIC returns a list of class "testHSIC". It contains test.method, B (for the permutation-based test), seq.options (for the sequential permutation-based test) and the following objects:
call |
The matched call. |
pval |
The estimated p-values after testing independence for all input-output pairs. |
prop |
A vector of two strings.
|
family |
Only if |
param |
Only if |
Hperm |
Only if |
paths |
Only if |
Sebastien Da Veiga, Amandine Marrel, Anouar Meynaoui, Reda El Amri and Gabriel Sarazin.
El Amri, M. R. and Marrel, A. (2022), Optimized HSIC-based tests for sensitivity analysis: application to thermalhydraulic simulation of accidental scenario on nuclear reactor, Quality and Reliability Engineering International, 38(3), 1386-1403.
El Amri, M. R. and Marrel, A. (2021), More powerful HSIC-based independence tests, extension to space-filling designs and functional data. https://cea.hal.science/cea-03406956/
Fukumizu, K., Bach, F. R. and Jordan, M. I. (2004), Dimensionality reduction for supervised learning with reproducing kernel Hilbert spaces, Journal of Machine Learning Research, 5(Jan), 73-99.
Gretton, A., Fukumizu, K., Teo, C., Song, L., Scholkopf, B. and Smola, A. (2007), A kernel statistical test of independence, Advances in Neural Information Processing Systems, 20.
Kazi-Aoual, F., Hitier, S., Sabatier, R. and Lebreton, J. D. (1995), Refined approximations to permutation tests for multivariate inference, Computational Statistics & Data Analysis, 20(6), 643-656.
Marrel, A. and Chabridon, V. (2021), Statistical developments for target and conditional sensitivity analysis: application on safety studies for nuclear reactor, Reliability Engineering & System Safety, 214, 107711.
Meynaoui, A. (2019), New developments around dependence measures for sensitivity analysis: application to severe accident studies for generation IV reactors (Doctoral dissertation, INSA de Toulouse).
Sriperumbudur, B., Fukumizu, K. and Lanckriet, G. (2010), On the relation between universality, characteristic kernels and RKHS embedding of measures, Proceedings of the 13th International Conference on Artificial Intelligence and Statistics (pp. 773-780). JMLR Workshop and Conference Proceedings.
# Test case: the Ishigami function.

n <- 20 # very few input-output samples
p <- 3 # nb of input variables

########################################
### PRELIMINARY SENSITIVITY ANALYSIS ###
########################################

X <- matrix(runif(n*p), n, p)
sensi <- sensiHSIC(model=ishigami.fun, X)
print(sensi)
plot(sensi)
title("GSA for the Ishigami function")

#############################
### TESTS OF INDEPENDENCE ###
#############################

test.asymp <- testHSIC(sensi)
test.perm <- testHSIC(sensi, test.method="Permutation")
test.seq.screening <- testHSIC(sensi, test.method="Seq_Permutation")
test.seq.ranking <- testHSIC(sensi, test.method="Seq_Permutation",
                             seq.options=list(criterion="ranking"))
test.seq.both <- testHSIC(sensi, test.method="Seq_Permutation",
                          seq.options=list(criterion="both"))
test.gamma <- testHSIC(sensi, test.method="Gamma")

# comparison of p-values
res <- rbind(
  t(as.matrix(test.asymp$pval)),
  t(as.matrix(test.perm$pval)),
  t(as.matrix(test.seq.screening$pval)),
  t(as.matrix(test.seq.ranking$pval)),
  t(as.matrix(test.seq.both$pval)),
  t(as.matrix(test.gamma$pval))
)
rownames(res) <- c("asymp", "perm", "seq_perm_screening",
                   "seq_perm_ranking", "seq_perm_both", "gamma")
res

# Conclusion: n is too small for the asymptotic test.
# Take n=200 and all four test methods will provide very close p-values.

#####################
### VISUALIZATION ###
#####################

# simulated values of HSIC indices under H0 (random permutations)
Hperm <- t(unname(test.perm$Hperm))

for(i in 1:p){

  # histogram of the test statistic under H0 (random permutations)
  title <- paste0("Histogram of S", i, " = HSIC(X", i, ",Y)")
  hist(Hperm[,i], probability=TRUE, nclass=70, main=title,
       xlab="", ylab="", col="cyan")

  # asymptotic Gamma distribution
  shape.asymp <- test.asymp$param[i, "shape"]
  scale.asymp <- test.asymp$param[i, "scale"]
  xx <- seq(0, max(Hperm[,i]), length.out=200)
  dens.asymp <- dgamma(xx, shape=shape.asymp, scale=scale.asymp)
  lines(xx, dens.asymp, lwd=2, col="darkorchid")

  # finite-sample Gamma distribution
  shape.perm <- test.gamma$param[i, "shape"]
  scale.perm <- test.gamma$param[i, "scale"]
  dens.perm <- dgamma(xx, shape=shape.perm, scale=scale.perm)
  lines(xx, dens.perm, lwd=2, col="blue")

  all.cap <- c("Asymptotic Gamma distribution", "Finite-sample Gamma distribution")
  all.col <- c("darkorchid", "blue")
  legend("topright", legend=all.cap, col=all.col, lwd=2, y.intersp=1.3)

}
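To make the idea behind the permutation-based test concrete, the following base-R sketch computes a biased (V-statistic) HSIC estimate with Gaussian kernels and derives a p-value by randomly permuting the output. It is an illustration only, not the code used by testHSIC: the helper names (`rbf_gram`, `hsic_v`, `perm_pvalue`), the kernel choice, the bandwidth rule (sample standard deviation) and the number of permutations are all assumptions made for this example.

```r
# Gaussian (RBF) Gram matrix of a numeric vector.
# Assumption: the bandwidth defaults to the sample standard deviation.
rbf_gram <- function(v, bw = sd(v)) {
  d2 <- outer(v, v, "-")^2
  exp(-d2 / (2 * bw^2))
}

# Biased (V-statistic) HSIC estimate: trace(K H L H) / n^2,
# where H is the centering matrix.
hsic_v <- function(x, y) {
  n <- length(x)
  H <- diag(n) - matrix(1 / n, n, n)
  K <- rbf_gram(x)
  L <- rbf_gram(y)
  sum((H %*% K %*% H) * L) / n^2
}

# Permutation p-value: shuffling y breaks any dependence with x,
# which simulates the statistic under the independence hypothesis H0.
perm_pvalue <- function(x, y, B = 200) {
  obs <- hsic_v(x, y)
  perm <- replicate(B, hsic_v(x, sample(y)))
  (1 + sum(perm >= obs)) / (B + 1)
}

set.seed(1)
n <- 50
x1 <- runif(n, -pi, pi)             # influential input
x2 <- runif(n, -pi, pi)             # input unrelated to y
y <- sin(x1) + rnorm(n, sd = 0.1)

p_dep <- perm_pvalue(x1, y)         # expected to be small
p_ind <- perm_pvalue(x2, y)         # no particular reason to be small
c(dependent = p_dep, independent = p_ind)
```

The `(1 + count) / (B + 1)` form keeps the estimated p-value strictly positive, which is a common convention for Monte Carlo permutation tests.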
These functions are standard test cases for sensitivity analysis benchmarks. For scalar output (see Saltelli et al., 2000 and https://www.sfu.ca/~ssurjano/):
the g-function of Sobol' with 8 inputs, X ~ U[0,1];
the function of Ishigami with 3 inputs, X ~ U[-pi,pi];
the function of Morris with 20 inputs, X ~ U[0,1];
the Linkletter decreasing coefficients function, X ~ U[0,1] (Linkletter et al., 2006);
the heterdisc function with 4 inputs, X ~ U[0,20];
the Friedman function with 5 inputs, X ~ U[0,1] (Friedman, 1991);
the Matyas function with 2 inputs, X ~ U[0,1].
For functional output cases:
the Arctangent temporal function with 2 inputs, X ~ U[-7,7] (Auder, 2011). The functional support is on [0,2pi];
the Campbell1D function with 4 inputs, X ~ U[-1,5] (Campbell et al., 2006). The functional support is on [-90,90].
sobol.fun(X)
ishigami.fun(X)
morris.fun(X)
atantemp.fun(X, q = 100)
campbell1D.fun(X, theta = -90:90)
linkletter.fun(X)
heterdisc.fun(X)
friedman.fun(X)
matyas.fun(X)
X |
a matrix (or |
q |
for the atantemp() function: the number of discretization steps of the functional output |
theta |
for the campbell1D() function: the discretization steps (angles in degrees) |
A vector of function responses.
Gilles Pujol and Bertrand Iooss
A. Saltelli, K. Chan and E. M. Scott eds, 2000, Sensitivity Analysis, Wiley.
# Examples for the functional toy functions

# atantemp function
y0 <- atantemp.fun(matrix(c(-7,0,7,-7,0,7), ncol=2))
plot(y0[1,], type="l")
apply(y0, 1, lines)

n <- 100
X <- matrix(c(runif(2*n,-7,7)), ncol=2)
y <- atantemp.fun(X)
plot(y0[2,], ylim=c(-2,2), type="l")
apply(y, 1, lines)

# campbell1D function
N1 <- 100 # number of simulations for the 1D curves
min <- -1 ; max <- 5
nominal <- (max+min)/2
X1 <- NULL ; y1 <- NULL
Xnom <- matrix(nominal, nrow=1, ncol=4)
ynom <- campbell1D.fun(Xnom, theta=-90:90)
plot(ynom, ylim=c(8,30), type="l", col="red")
for (i in 1:N1){
  X <- matrix(runif(4, min=min, max=max), nrow=1, ncol=4)
  X1 <- rbind(X1, X) # store the sampled inputs
  y <- campbell1D.fun(X, theta=-90:90)
  y1 <- rbind(y1, y) # store the simulated curves
  lines(y)
}
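The scalar test cases are evaluated in the same way, on a matrix with one row per input sample. The sketch below re-implements the Ishigami function in base R so that it runs without the package. The name `ishigami_sketch` is hypothetical, and the coefficients a = 7 and b = 0.1 are the values commonly used in the literature, assumed here rather than taken from this page.

```r
# Illustrative base-R version of the Ishigami test function (hypothetical name).
# Assumed form: Y = sin(X1) + a * sin(X2)^2 + b * X3^4 * sin(X1), a = 7, b = 0.1.
ishigami_sketch <- function(X, a = 7, b = 0.1) {
  X <- as.matrix(X)  # accept a matrix or a data.frame of inputs
  sin(X[, 1]) + a * sin(X[, 2])^2 + b * X[, 3]^4 * sin(X[, 1])
}

set.seed(123)
n <- 1000
X <- matrix(runif(3 * n, -pi, pi), ncol = 3)  # inputs sampled from U[-pi, pi]
y <- ishigami_sketch(X)

length(y)  # one response per row of X
mean(y)    # should be close to the theoretical mean a / 2 = 3.5
```

With independent uniform inputs on [-pi, pi], the first and third terms have zero mean, so the sample mean gives a quick sanity check of an input design.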
dnorm.trunc, pnorm.trunc, qnorm.trunc and rnorm.trunc are functions for the Truncated Normal Distribution.
dgumbel.trunc, pgumbel.trunc, qgumbel.trunc and rgumbel.trunc are functions for the Truncated Gumbel Distribution.
dnorm.trunc(x, mean = 0, sd = 1, min = -1e6, max = 1e6)
pnorm.trunc(q, mean = 0, sd = 1, min = -1e6, max = 1e6)
qnorm.trunc(p, mean = 0, sd = 1, min = -1e6, max = 1e6)
rnorm.trunc(n, mean = 0, sd = 1, min = -1e6, max = 1e6)
dgumbel.trunc(x, loc = 0, scale = 1, min = -1e6, max = 1e6)
pgumbel.trunc(q, loc = 0, scale = 1, min = -1e6, max = 1e6)
qgumbel.trunc(p, loc = 0, scale = 1, min = -1e6, max = 1e6)
rgumbel.trunc(n, loc = 0, scale = 1, min = -1e6, max = 1e6)
x , q
|
vector of quantiles |
p |
vector of probabilities |
n |
number of observations |
mean , sd
|
mean and standard deviation parameters |
loc , scale
|
location and scale parameters |
min |
vector of minimal bound values |
max |
vector of maximal bound values |
See dnorm for details on the Normal distribution.
The Gumbel distribution comes from the evd package.
See dgumbel for details on the Gumbel distribution.
dnorm.trunc and dgumbel.trunc give the density, pnorm.trunc and pgumbel.trunc give the distribution function, qnorm.trunc and qgumbel.trunc give the quantile function, rnorm.trunc and rgumbel.trunc generate random deviates.
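The relation between these four functions can be illustrated with the textbook definition of a truncated normal distribution: the density is the normal density renormalised by the probability mass of [min, max], and random deviates can be obtained by inverting the distribution function. The sketch below uses base R only. The helper names ending in `_sketch` are hypothetical, and the code is not claimed to match the package's internal implementation.

```r
# Density: normal density rescaled by the mass of [min, max], zero outside.
dtnorm_sketch <- function(x, mean = 0, sd = 1, min = -1e6, max = 1e6) {
  mass <- pnorm(max, mean, sd) - pnorm(min, mean, sd)
  ifelse(x >= min & x <= max, dnorm(x, mean, sd) / mass, 0)
}

# Distribution function: rescaled normal CDF, clipped to [0, 1].
ptnorm_sketch <- function(q, mean = 0, sd = 1, min = -1e6, max = 1e6) {
  lo <- pnorm(min, mean, sd)
  mass <- pnorm(max, mean, sd) - lo
  pmin(pmax((pnorm(q, mean, sd) - lo) / mass, 0), 1)
}

# Quantile function: map p back to the untruncated probability scale.
qtnorm_sketch <- function(p, mean = 0, sd = 1, min = -1e6, max = 1e6) {
  lo <- pnorm(min, mean, sd)
  hi <- pnorm(max, mean, sd)
  qnorm(lo + p * (hi - lo), mean, sd)
}

# Random deviates by inversion of the distribution function.
rtnorm_sketch <- function(n, mean = 0, sd = 1, min = -1e6, max = 1e6) {
  qtnorm_sketch(runif(n), mean, sd, min, max)
}

set.seed(42)
draws <- rtnorm_sketch(1000, mean = 0, sd = 1, min = -1, max = 2)
range(draws)  # all values lie within [-1, 2]

# The truncated density integrates to one over its support.
total <- integrate(function(x) dtnorm_sketch(x, min = -1, max = 2), -1, 2)$value
total
```

Inversion is convenient here because it never rejects samples, although it can lose accuracy when the truncation interval lies far in a tail of the untruncated distribution.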
Gilles Pujol and Bertrand Iooss
Transformation function of one variable (vector sample)
weightTSA(Y, c, upper = TRUE, type="indicTh", param=1)
weightTSA(Y, c, upper = TRUE, type="indicTh", param=1)
Y |
The output vector |
c |
The threshold |
upper |
TRUE for upper threshold and FALSE for lower threshold |
type |
The weight function type ("indicTh", "zeroTh", "logistic", "exp1side")
|
param |
The parameter value for "logistic" and "exp1side" types |
The weight functions depend on a threshold and/or a smooth relaxation. These functions are defined as follows
if type = "indicTh": (upper threshold) and
(lower threshold),
if type = "zeroTh": (upper threshold) and
(lower threshold),
if type = "logistic":
(upper threshold) and
(lower threshold),
if type = "exp1side":
(upper threshold) and
(lower threshold), where is an estimation of the standard deviation of Y and
is a parameter tuning the smoothness.
The vector sample of the transformed variable
B. Iooss
H. Raguet and A. Marrel, Target and conditional sensitivity analysis with emphasis on dependence measures, Preprint, https://hal.archives-ouvertes.fr/hal-01694129
A. Marrel and V. Chabridon, 2021, Statistical developments for target and conditional sensitivity analysis: Application on safety studies for nuclear reactor, Reliability Engineering & System Safety, 214:107711.
A. Spagnol, Kernel-based sensitivity indices for high-dimensional optimization problems, PhD Thesis, Universite de Lyon, 2020
Spagnol A., Le Riche R., Da Veiga S. (2019), Global sensitivity analysis for optimization with variable selection, SIAM/ASA J. Uncertainty Quantification, 7(2), 417–443.
n <- 100 # sample size
c <- 1.5
Y <- rnorm(n)
Yt <- weightTSA(Y, c)
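To give a feel for what such a transformation returns, the base-R sketch below covers only the two threshold-based types. It assumes that "indicTh" is the indicator of the region beyond the threshold and that "zeroTh" keeps the output value in that region and sets it to zero elsewhere. These definitions, the strict inequalities and the helper name `weight_sketch` are assumptions inferred from the type names; this is not the package's weightTSA code.

```r
# Hypothetical helper, NOT the package's weightTSA(): only the two
# threshold-based types are sketched, under assumed definitions.
weight_sketch <- function(Y, threshold, upper = TRUE, type = "indicTh") {
  # Critical region: above the threshold (upper) or below it (lower).
  # Strict inequalities are an assumption of this sketch.
  critical <- if (upper) Y > threshold else Y < threshold
  if (type == "indicTh") {
    as.numeric(critical)   # 1 inside the critical region, 0 elsewhere
  } else if (type == "zeroTh") {
    Y * critical           # keep Y inside the critical region, 0 elsewhere
  } else {
    stop("only 'indicTh' and 'zeroTh' are covered by this sketch")
  }
}

Y <- c(0.2, 1.7, -0.4, 2.1)
w_ind  <- weight_sketch(Y, threshold = 1.5)                   # 0 1 0 1
w_zero <- weight_sketch(Y, threshold = 1.5, type = "zeroTh")  # 0.0 1.7 0.0 2.1
w_low  <- weight_sketch(Y, threshold = 0, upper = FALSE)      # 0 0 1 0
```

The transformed sample can then be used in place of the raw output when the analysis targets a critical region rather than the whole output range.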