Package 'RVCompare' reference manual

Title:	Compare Real Valued Random Variables
Description:	A framework with tools to compare two random variables via stochastic dominance. See the README.md at <https://github.com/EtorArza/RVCompare> for a quick start guide. It can compute the Cp and Cd of two probability distributions and the Cumulative Difference Plot as explained in E. Arza (2022) <doi:10.1080/10618600.2022.2084405>. Uses bootstrap or DKW-bounds to compute the confidence bands of the cumulative distributions. These two methods are described in B. Efron. (1979) <doi:10.1214/aos/1176344552> and P. Massart (1990) <doi:10.1214/aop/1176990746>.
Authors:	Etor Arza
Maintainer:	Etor Arza <[email protected]>
License:	CC0
Version:	0.1.8
Built:	2025-03-08 05:32:17 UTC
Source:	https://github.com/etorarza/rvcompare

The dominance rate of X_A over X_B given the density functions.

Description

Returns a real number in the interval [0,1] that represents the dominance rate of X_A over X_B. Basically, we are measuring the amount of mass of X_A in which the cumulative distribution of X_A is higher minus the amount of mass of X_B in which the cumulative distribution of X_B is higher.

Usage

CdFromDensities(densityX_A, densityX_B, xlims, EPSILON = 0.001)
CdFromDensities(densityX_A, densityX_B, xlims, EPSILON = 0.001)

Arguments

`densityX_A`	The probability density function of the random variable X_A.
`densityX_B`	The probability density function of the random variable X_B.
`xlims`	an interval that represents the domain of definition the density functions.
`EPSILON`	(optional, default = 1e-3) minimum difference between two values.

Value

Returns the dominance rate of X_A over X_B.

Examples


# If two symmetric distributions are centered in the same point (x = 0 in
# this case), then their Cd will be 0.5.

densityX_A <- normalDensity(0,1)
densityX_B <- uniformDensity(c(-2,2))
CdFromDensities(densityX_A, densityX_B, c(-5,5))

### Example 2 ###
# If two distributions are equal, Cd will be 0.5.  Cd(X_A,X_A) = 0.5
CdFromDensities(densityX_A, densityX_A, c(-10,10))


### Example 3 ###
# example on https://etorarza.github.io/pages/2021-interactive-comparing-RV.html
tau <- 0.11
densityX_A <- normalDensity(0.05,0.0015)
densityX_B <- mixtureDensity(c(normalDensity(0.05025,0.0015),
                               normalDensity(0.04525, 0.0015)),
                               weights = c(1 - tau, tau))
plot(densityX_A, from=0.03, to=0.07, type="l",  col="red", xlab="x", ylab="probability density")
curve(densityX_B, add=TRUE, col="blue", type="l", lty=2)
Cd <- CdFromDensities(densityX_A, densityX_B, c(.03,.07))
mtext(paste("Cd(X_A, X_B) =", format(round(Cd, 3), nsmall = 3)), side=3) # add Cd to plot as text
legend(x = c(0.0325, 0.045), y = c(200, 250),legend=c("X_A", "X_B"),
                                             col=c("red", "blue"),
                                             lty=1:2,
                                             cex=0.8) # add legend


### Example 4 ###
# The dominance factor ignores the mass of the probability where the
# distribution functinos are equal.
densityX_A <- uniformDensity(c(0.1, 0.3))
densityX_B <- uniformDensity(c(-0.2,0.5))
CdFromDensities(densityX_A, densityX_B, xlims = c(-2,2))

densityX_A <- mixtureDensity(c(uniformDensity(c(0.1,0.3)), uniformDensity(c(-1,-0.5))))
densityX_B <- mixtureDensity(c(uniformDensity(c(-0.2,0.5)), uniformDensity(c(-1,-0.5))))
CdFromDensities(densityX_A, densityX_B, xlims = c(-2,2))

# If two symmetric distributions are centered in the same point (x = 0 in
# this case), then their Cd will be 0.5.

densityX_A <- normalDensity(0,1)
densityX_B <- uniformDensity(c(-2,2))
CdFromDensities(densityX_A, densityX_B, c(-5,5))

### Example 2 ###
# If two distributions are equal, Cd will be 0.5.  Cd(X_A,X_A) = 0.5
CdFromDensities(densityX_A, densityX_A, c(-10,10))


### Example 3 ###
# example on https://etorarza.github.io/pages/2021-interactive-comparing-RV.html
tau <- 0.11
densityX_A <- normalDensity(0.05,0.0015)
densityX_B <- mixtureDensity(c(normalDensity(0.05025,0.0015),
                               normalDensity(0.04525, 0.0015)),
                               weights = c(1 - tau, tau))
plot(densityX_A, from=0.03, to=0.07, type="l",  col="red", xlab="x", ylab="probability density")
curve(densityX_B, add=TRUE, col="blue", type="l", lty=2)
Cd <- CdFromDensities(densityX_A, densityX_B, c(.03,.07))
mtext(paste("Cd(X_A, X_B) =", format(round(Cd, 3), nsmall = 3)), side=3) # add Cd to plot as text
legend(x = c(0.0325, 0.045), y = c(200, 250),legend=c("X_A", "X_B"),
                                             col=c("red", "blue"),
                                             lty=1:2,
                                             cex=0.8) # add legend


### Example 4 ###
# The dominance factor ignores the mass of the probability where the
# distribution functinos are equal.
densityX_A <- uniformDensity(c(0.1, 0.3))
densityX_B <- uniformDensity(c(-0.2,0.5))
CdFromDensities(densityX_A, densityX_B, xlims = c(-2,2))

densityX_A <- mixtureDensity(c(uniformDensity(c(0.1,0.3)), uniformDensity(c(-1,-0.5))))
densityX_B <- mixtureDensity(c(uniformDensity(c(-0.2,0.5)), uniformDensity(c(-1,-0.5))))
CdFromDensities(densityX_A, densityX_B, xlims = c(-2,2))

The dominance rate of X_A over X_B for discrete distributions, given the probability mass functions.

Description

Returns a real number in the interval [0,1] that represents the dominance rate of X_A over X_B.

Usage

CdFromProbMassFunctions(pMassA, pMassB)
CdFromProbMassFunctions(pMassA, pMassB)

Arguments

`pMassA`	The probability mass function where pMassA[[i]] is the probability of x_i, p_A(x_i).
`pMassB`	The probability mass function where pMassB[[i]] is the probability of x_i, p_B(x_i).

Value

Returns the dominance rate of X_A over X_B for discrete random variables.

Examples

CdFromProbMassFunctions(c(0.2,0.6,0.2), c(0.3,0.3,0.4))
# > 0.6
# Notice how adding additional mass with the same cumulative distribution in both
# random variables does not change the result.
CdFromProbMassFunctions(c(0.2,0.6,0.2,0.2,0.2)/1.4, c(0.3,0.3,0.4,0.2,0.2)/1.4)
# > 0.6
CdFromProbMassFunctions(c(0.2,0.6,0.2), c(0.3,0.3,0.4))
# > 0.6
# Notice how adding additional mass with the same cumulative distribution in both
# random variables does not change the result.
CdFromProbMassFunctions(c(0.2,0.6,0.2,0.2,0.2)/1.4, c(0.3,0.3,0.4,0.2,0.2)/1.4)
# > 0.6

The probability that X_A < X_B given the density functions.

Description

Returns a real number in the interval [0,1] that represents the probability that a sample observed from X_A is lower than a sample observed from X_B.

Usage

CpFromDensities(densityX_A, densityX_B, xlims)
CpFromDensities(densityX_A, densityX_B, xlims)

Arguments

`densityX_A`	The probability density function of the random variable X_A.
`densityX_B`	The probability density function of the random variable X_B.
`xlims`	an interval that represents the domain of definition the density functions.

Value

Returns the probability that X_A < X_B.

Examples

### Example 1 ###
# If two symmetric distributions are centered in the same point (x = 0 in
# this case), then their Cp will be 0.5.
densityX_A <- normalDensity(0,1)
densityX_B <- uniformDensity(c(-2,2))
Cp = CpFromDensities(densityX_A, densityX_B, c(-5,5))
plot(densityX_A, from=-5, to=5, type="l",  col="red", xlab="x", ylab="probability density")
curve(densityX_B, add=TRUE, col="blue", type="l", lty=2)
mtext(paste("Cp(X_A, X_B) =", format(round(Cp, 3), nsmall = 3)), side=3) # add Cp to plot as text
legend(x = c(-4.5, -2), y = c(0.325, 0.4),legend=c("X_A", "X_B"),
                                          col=c("red", "blue"),
                                          lty=1:2, cex=0.8) # add legend


### Example 2 ###
# If two distributions are equal, Cp will be 0.5.  Cp(X_A,X_A) = 0.5
CpFromDensities(densityX_A, densityX_A, c(-10,10))


### Example 3 ###
densityX_A <- normalDensity(-2,1)
densityX_B <- uniformDensity(c(-2,2))
# Cp(X_A,X_B) = 1 - Cp(X_B, X_A)
CpFromDensities(densityX_A, densityX_B, c(-8,4))
1 - CpFromDensities(densityX_B, densityX_A, c(-8,4))
### Example 1 ###
# If two symmetric distributions are centered in the same point (x = 0 in
# this case), then their Cp will be 0.5.
densityX_A <- normalDensity(0,1)
densityX_B <- uniformDensity(c(-2,2))
Cp = CpFromDensities(densityX_A, densityX_B, c(-5,5))
plot(densityX_A, from=-5, to=5, type="l",  col="red", xlab="x", ylab="probability density")
curve(densityX_B, add=TRUE, col="blue", type="l", lty=2)
mtext(paste("Cp(X_A, X_B) =", format(round(Cp, 3), nsmall = 3)), side=3) # add Cp to plot as text
legend(x = c(-4.5, -2), y = c(0.325, 0.4),legend=c("X_A", "X_B"),
                                          col=c("red", "blue"),
                                          lty=1:2, cex=0.8) # add legend


### Example 2 ###
# If two distributions are equal, Cp will be 0.5.  Cp(X_A,X_A) = 0.5
CpFromDensities(densityX_A, densityX_A, c(-10,10))


### Example 3 ###
densityX_A <- normalDensity(-2,1)
densityX_B <- uniformDensity(c(-2,2))
# Cp(X_A,X_B) = 1 - Cp(X_B, X_A)
CpFromDensities(densityX_A, densityX_B, c(-8,4))
1 - CpFromDensities(densityX_B, densityX_A, c(-8,4))

Helper function for from_ranks_to_integrable_values

Description

Helper function for from_ranks_to_integrable_values

Usage

cpp_helper_from_ranks_to_integrable_values(rank_interval_mult, j_max)
cpp_helper_from_ranks_to_integrable_values(rank_interval_mult, j_max)

Arguments

`rank_interval_mult`	the value of the normalized density of Y_A.
`j_max`	maximum number of sample points.

Generate the cumulative difference-plot

Description

Generate the cumulative difference-plot given the observed samples using the bootstrap method.

Usage

cumulative_difference_plot(
  X_A_observed,
  X_B_observed,
  isMinimizationProblem,
  labelA = "X_A",
  labelB = "X_B",
  alpha = 0.05,
  EPSILON = 1e-20,
  nOfBootstrapSamples = 1000,
  ignoreMinimumLengthCheck = FALSE
)
cumulative_difference_plot(
  X_A_observed,
  X_B_observed,
  isMinimizationProblem,
  labelA = "X_A",
  labelB = "X_B",
  alpha = 0.05,
  EPSILON = 1e-20,
  nOfBootstrapSamples = 1000,
  ignoreMinimumLengthCheck = FALSE
)

Arguments

`X_A_observed`	array of the observed samples (real values) of X_A.
`X_B_observed`	array of the observed samples (real values) of X_B, it needs to have the same length as X_A.
`isMinimizationProblem`	a boolean value where TRUE represents that lower values are preferred to larger values.
`labelA`	(optional, default value "X_A") the label corresponding to X_A.
`labelB`	(optional, default value "X_B") the label corresponding to X_B.
`alpha`	(optional, default value 0.05) the error of the confidence interval. If alpha = 0.05 then we have 95 percent confidence interval.
`EPSILON`	(optional, default value 1e-20) minimum difference between two values to be considered different.
`nOfBootstrapSamples`	(optional, default value 1e3) how many bootstrap samples to average. Increases computation time.
`ignoreMinimumLengthCheck`	(optional, default value FALSE) whether to skip the check for a minimum length of 100 in X_A_observed and X_A_observed.

Value

retunrs and shows the cumulative difference-plot

Examples


### Example 1 ###
X_A_observed <- rnorm(100, mean = 2, sd = 1)
X_B_observed <- rnorm(100, mean = 2.1, sd = 0.5)

 cumulative_difference_plot(X_A_observed, X_B_observed, TRUE, labelA="X_A", labelB="X_B")



### Example 2 ###
# Comparing the optimization algorithms PL-EDA and PL-GS
# with 400 samples each.
PL_EDA_fitness <- c(
52235, 52485, 52542, 52556, 52558, 52520, 52508, 52491, 52474, 52524,
52414, 52428, 52413, 52457, 52437, 52449, 52534, 52531, 52476, 52434,
52492, 52554, 52520, 52500, 52342, 52520, 52392, 52478, 52422, 52469,
52421, 52386, 52373, 52230, 52504, 52445, 52378, 52554, 52475, 52528,
52508, 52222, 52416, 52492, 52538, 52192, 52416, 52213, 52478, 52496,
52444, 52524, 52501, 52495, 52415, 52151, 52440, 52390, 52428, 52438,
52475, 52177, 52512, 52530, 52493, 52424, 52201, 52484, 52389, 52334,
52548, 52560, 52536, 52467, 52392, 51327, 52506, 52473, 52087, 52502,
52533, 52523, 52485, 52535, 52502, 52577, 52508, 52463, 52530, 52507,
52472, 52400, 52511, 52528, 52532, 52526, 52421, 52442, 52532, 52505,
52531, 52644, 52513, 52507, 52444, 52471, 52474, 52426, 52526, 52564,
52512, 52521, 52533, 52511, 52416, 52414, 52425, 52457, 52522, 52508,
52481, 52439, 52402, 52442, 52512, 52377, 52412, 52432, 52506, 52524,
52488, 52494, 52531, 52471, 52616, 52482, 52499, 52386, 52492, 52484,
52537, 52517, 52536, 52449, 52439, 52410, 52417, 52402, 52406, 52217,
52484, 52418, 52550, 52513, 52530, 51667, 52185, 52089, 51853, 52511,
52051, 52584, 52475, 52447, 52390, 52506, 52514, 52452, 52526, 52502,
52422, 52411, 52171, 52437, 52323, 52488, 52546, 52505, 52563, 52457,
52502, 52503, 52126, 52537, 52435, 52419, 52300, 52481, 52419, 52540,
52566, 52547, 52476, 52448, 52474, 52438, 52430, 52363, 52484, 52455,
52420, 52385, 52152, 52505, 52457, 52473, 52503, 52507, 52429, 52513,
52433, 52538, 52416, 52479, 52501, 52485, 52429, 52395, 52503, 52195,
52380, 52487, 52498, 52421, 52137, 52493, 52403, 52511, 52409, 52479,
52400, 52498, 52482, 52440, 52541, 52499, 52476, 52485, 52294, 52408,
52426, 52464, 52535, 52512, 52516, 52531, 52449, 52507, 52485, 52491,
52499, 52414, 52403, 52398, 52548, 52536, 52410, 52549, 52454, 52534,
52468, 52483, 52239, 52502, 52525, 52328, 52467, 52217, 52543, 52391,
52524, 52474, 52509, 52496, 52432, 52532, 52493, 52503, 52508, 52422,
52459, 52477, 52521, 52515, 52469, 52416, 52249, 52537, 52494, 52393,
52057, 52513, 52452, 52458, 52518, 52520, 52524, 52531, 52439, 52530,
52422, 52649, 52481, 52256, 52428, 52425, 52458, 52488, 52502, 52373,
52426, 52441, 52471, 52468, 52465, 52265, 52455, 52501, 52340, 52457,
52275, 52527, 52574, 52474, 52487, 52416, 52634, 52514, 52184, 52430,
52462, 52392, 52529, 52178, 52495, 52438, 52539, 52430, 52459, 52312,
52437, 52637, 52511, 52563, 52270, 52341, 52436, 52515, 52480, 52569,
52490, 52453, 52422, 52443, 52419, 52512, 52447, 52425, 52509, 52180,
52521, 52566, 52060, 52425, 52480, 52454, 52501, 52536, 52143, 52432,
52451, 52548, 52508, 52561, 52515, 52502, 52468, 52373, 52511, 52516,
52195, 52499, 52534, 52453, 52449, 52431, 52473, 52553, 52444, 52459,
52536, 52413, 52537, 52537, 52501, 52425, 52507, 52525, 52452, 52499
)
PL_GS_fitness <- c(
52476, 52211, 52493, 52484, 52499, 52500, 52476, 52483, 52431, 52483,
52515, 52493, 52490, 52464, 52478, 52440, 52482, 52498, 52460, 52219,
52444, 52479, 52498, 52481, 52490, 52470, 52498, 52521, 52452, 52494,
52451, 52429, 52248, 52525, 52513, 52489, 52448, 52157, 52449, 52447,
52476, 52535, 52464, 52453, 52493, 52438, 52489, 52462, 52219, 52223,
52514, 52476, 52495, 52496, 52502, 52538, 52491, 52457, 52471, 52531,
52488, 52441, 52467, 52483, 52476, 52494, 52485, 52507, 52224, 52464,
52503, 52495, 52518, 52490, 52508, 52505, 52214, 52506, 52507, 52207,
52531, 52492, 52515, 52497, 52476, 52490, 52436, 52495, 52437, 52494,
52513, 52483, 52522, 52496, 52196, 52525, 52490, 52506, 52498, 52250,
52524, 52469, 52497, 52519, 52437, 52481, 52237, 52436, 52508, 52518,
52490, 52501, 52508, 52476, 52520, 52435, 52463, 52481, 52486, 52489,
52482, 52496, 52499, 52443, 52497, 52464, 52514, 52476, 52498, 52496,
52498, 52530, 52203, 52482, 52441, 52493, 52532, 52518, 52474, 52498,
52512, 52226, 52538, 52477, 52508, 52243, 52533, 52463, 52440, 52246,
52209, 52488, 52530, 52195, 52487, 52494, 52508, 52505, 52444, 52515,
52499, 52428, 52498, 52244, 52520, 52463, 52187, 52484, 52517, 52504,
52511, 52530, 52519, 52514, 52532, 52203, 52485, 52439, 52496, 52443,
52503, 52520, 52516, 52478, 52473, 52505, 52480, 52196, 52492, 52527,
52490, 52493, 52252, 52470, 52493, 52533, 52506, 52496, 52519, 52492,
52509, 52530, 52213, 52499, 52492, 52528, 52499, 52526, 52521, 52488,
52485, 52502, 52515, 52470, 52207, 52494, 52527, 52442, 52200, 52485,
52489, 52499, 52488, 52486, 52232, 52477, 52485, 52490, 52524, 52470,
52504, 52501, 52497, 52489, 52152, 52527, 52487, 52501, 52504, 52494,
52484, 52213, 52449, 52490, 52525, 52476, 52540, 52463, 52200, 52471,
52479, 52504, 52526, 52533, 52473, 52475, 52518, 52507, 52500, 52499,
52512, 52478, 52523, 52453, 52488, 52523, 52240, 52505, 52532, 52504,
52444, 52194, 52514, 52474, 52473, 52526, 52437, 52536, 52491, 52523,
52529, 52535, 52453, 52522, 52519, 52446, 52500, 52490, 52459, 52467,
52456, 52490, 52521, 52484, 52508, 52451, 52231, 52488, 52485, 52215,
52493, 52475, 52474, 52508, 52524, 52477, 52514, 52452, 52491, 52473,
52441, 52520, 52471, 52466, 52475, 52439, 52483, 52491, 52204, 52500,
52488, 52489, 52519, 52495, 52448, 52453, 52466, 52462, 52489, 52471,
52484, 52483, 52501, 52486, 52494, 52473, 52481, 52502, 52516, 52223,
52490, 52447, 52222, 52469, 52509, 52194, 52490, 52484, 52446, 52487,
52476, 52509, 52496, 52459, 52474, 52501, 52516, 52223, 52487, 52468,
52534, 52522, 52474, 52227, 52450, 52506, 52193, 52429, 52496, 52493,
52493, 52488, 52190, 52509, 52434, 52469, 52510, 52481, 52520, 52504,
52230, 52500, 52487, 52517, 52473, 52488, 52450, 52203, 52215, 52490,
52479, 52515, 52210, 52485, 52516, 52504, 52521, 52499, 52503, 52526)
# Considering that the LOP is a maximization problem, we need isMinimizationProblem=FALSE.

 cumulative_difference_plot(PL_EDA_fitness,
                            PL_GS_fitness,
                            isMinimizationProblem=FALSE,
                            labelA="PL-EDA",
                            labelB="PL-GS")

### Example 1 ###
X_A_observed <- rnorm(100, mean = 2, sd = 1)
X_B_observed <- rnorm(100, mean = 2.1, sd = 0.5)

 cumulative_difference_plot(X_A_observed, X_B_observed, TRUE, labelA="X_A", labelB="X_B")



### Example 2 ###
# Comparing the optimization algorithms PL-EDA and PL-GS
# with 400 samples each.
PL_EDA_fitness <- c(
52235, 52485, 52542, 52556, 52558, 52520, 52508, 52491, 52474, 52524,
52414, 52428, 52413, 52457, 52437, 52449, 52534, 52531, 52476, 52434,
52492, 52554, 52520, 52500, 52342, 52520, 52392, 52478, 52422, 52469,
52421, 52386, 52373, 52230, 52504, 52445, 52378, 52554, 52475, 52528,
52508, 52222, 52416, 52492, 52538, 52192, 52416, 52213, 52478, 52496,
52444, 52524, 52501, 52495, 52415, 52151, 52440, 52390, 52428, 52438,
52475, 52177, 52512, 52530, 52493, 52424, 52201, 52484, 52389, 52334,
52548, 52560, 52536, 52467, 52392, 51327, 52506, 52473, 52087, 52502,
52533, 52523, 52485, 52535, 52502, 52577, 52508, 52463, 52530, 52507,
52472, 52400, 52511, 52528, 52532, 52526, 52421, 52442, 52532, 52505,
52531, 52644, 52513, 52507, 52444, 52471, 52474, 52426, 52526, 52564,
52512, 52521, 52533, 52511, 52416, 52414, 52425, 52457, 52522, 52508,
52481, 52439, 52402, 52442, 52512, 52377, 52412, 52432, 52506, 52524,
52488, 52494, 52531, 52471, 52616, 52482, 52499, 52386, 52492, 52484,
52537, 52517, 52536, 52449, 52439, 52410, 52417, 52402, 52406, 52217,
52484, 52418, 52550, 52513, 52530, 51667, 52185, 52089, 51853, 52511,
52051, 52584, 52475, 52447, 52390, 52506, 52514, 52452, 52526, 52502,
52422, 52411, 52171, 52437, 52323, 52488, 52546, 52505, 52563, 52457,
52502, 52503, 52126, 52537, 52435, 52419, 52300, 52481, 52419, 52540,
52566, 52547, 52476, 52448, 52474, 52438, 52430, 52363, 52484, 52455,
52420, 52385, 52152, 52505, 52457, 52473, 52503, 52507, 52429, 52513,
52433, 52538, 52416, 52479, 52501, 52485, 52429, 52395, 52503, 52195,
52380, 52487, 52498, 52421, 52137, 52493, 52403, 52511, 52409, 52479,
52400, 52498, 52482, 52440, 52541, 52499, 52476, 52485, 52294, 52408,
52426, 52464, 52535, 52512, 52516, 52531, 52449, 52507, 52485, 52491,
52499, 52414, 52403, 52398, 52548, 52536, 52410, 52549, 52454, 52534,
52468, 52483, 52239, 52502, 52525, 52328, 52467, 52217, 52543, 52391,
52524, 52474, 52509, 52496, 52432, 52532, 52493, 52503, 52508, 52422,
52459, 52477, 52521, 52515, 52469, 52416, 52249, 52537, 52494, 52393,
52057, 52513, 52452, 52458, 52518, 52520, 52524, 52531, 52439, 52530,
52422, 52649, 52481, 52256, 52428, 52425, 52458, 52488, 52502, 52373,
52426, 52441, 52471, 52468, 52465, 52265, 52455, 52501, 52340, 52457,
52275, 52527, 52574, 52474, 52487, 52416, 52634, 52514, 52184, 52430,
52462, 52392, 52529, 52178, 52495, 52438, 52539, 52430, 52459, 52312,
52437, 52637, 52511, 52563, 52270, 52341, 52436, 52515, 52480, 52569,
52490, 52453, 52422, 52443, 52419, 52512, 52447, 52425, 52509, 52180,
52521, 52566, 52060, 52425, 52480, 52454, 52501, 52536, 52143, 52432,
52451, 52548, 52508, 52561, 52515, 52502, 52468, 52373, 52511, 52516,
52195, 52499, 52534, 52453, 52449, 52431, 52473, 52553, 52444, 52459,
52536, 52413, 52537, 52537, 52501, 52425, 52507, 52525, 52452, 52499
)
PL_GS_fitness <- c(
52476, 52211, 52493, 52484, 52499, 52500, 52476, 52483, 52431, 52483,
52515, 52493, 52490, 52464, 52478, 52440, 52482, 52498, 52460, 52219,
52444, 52479, 52498, 52481, 52490, 52470, 52498, 52521, 52452, 52494,
52451, 52429, 52248, 52525, 52513, 52489, 52448, 52157, 52449, 52447,
52476, 52535, 52464, 52453, 52493, 52438, 52489, 52462, 52219, 52223,
52514, 52476, 52495, 52496, 52502, 52538, 52491, 52457, 52471, 52531,
52488, 52441, 52467, 52483, 52476, 52494, 52485, 52507, 52224, 52464,
52503, 52495, 52518, 52490, 52508, 52505, 52214, 52506, 52507, 52207,
52531, 52492, 52515, 52497, 52476, 52490, 52436, 52495, 52437, 52494,
52513, 52483, 52522, 52496, 52196, 52525, 52490, 52506, 52498, 52250,
52524, 52469, 52497, 52519, 52437, 52481, 52237, 52436, 52508, 52518,
52490, 52501, 52508, 52476, 52520, 52435, 52463, 52481, 52486, 52489,
52482, 52496, 52499, 52443, 52497, 52464, 52514, 52476, 52498, 52496,
52498, 52530, 52203, 52482, 52441, 52493, 52532, 52518, 52474, 52498,
52512, 52226, 52538, 52477, 52508, 52243, 52533, 52463, 52440, 52246,
52209, 52488, 52530, 52195, 52487, 52494, 52508, 52505, 52444, 52515,
52499, 52428, 52498, 52244, 52520, 52463, 52187, 52484, 52517, 52504,
52511, 52530, 52519, 52514, 52532, 52203, 52485, 52439, 52496, 52443,
52503, 52520, 52516, 52478, 52473, 52505, 52480, 52196, 52492, 52527,
52490, 52493, 52252, 52470, 52493, 52533, 52506, 52496, 52519, 52492,
52509, 52530, 52213, 52499, 52492, 52528, 52499, 52526, 52521, 52488,
52485, 52502, 52515, 52470, 52207, 52494, 52527, 52442, 52200, 52485,
52489, 52499, 52488, 52486, 52232, 52477, 52485, 52490, 52524, 52470,
52504, 52501, 52497, 52489, 52152, 52527, 52487, 52501, 52504, 52494,
52484, 52213, 52449, 52490, 52525, 52476, 52540, 52463, 52200, 52471,
52479, 52504, 52526, 52533, 52473, 52475, 52518, 52507, 52500, 52499,
52512, 52478, 52523, 52453, 52488, 52523, 52240, 52505, 52532, 52504,
52444, 52194, 52514, 52474, 52473, 52526, 52437, 52536, 52491, 52523,
52529, 52535, 52453, 52522, 52519, 52446, 52500, 52490, 52459, 52467,
52456, 52490, 52521, 52484, 52508, 52451, 52231, 52488, 52485, 52215,
52493, 52475, 52474, 52508, 52524, 52477, 52514, 52452, 52491, 52473,
52441, 52520, 52471, 52466, 52475, 52439, 52483, 52491, 52204, 52500,
52488, 52489, 52519, 52495, 52448, 52453, 52466, 52462, 52489, 52471,
52484, 52483, 52501, 52486, 52494, 52473, 52481, 52502, 52516, 52223,
52490, 52447, 52222, 52469, 52509, 52194, 52490, 52484, 52446, 52487,
52476, 52509, 52496, 52459, 52474, 52501, 52516, 52223, 52487, 52468,
52534, 52522, 52474, 52227, 52450, 52506, 52193, 52429, 52496, 52493,
52493, 52488, 52190, 52509, 52434, 52469, 52510, 52481, 52520, 52504,
52230, 52500, 52487, 52517, 52473, 52488, 52450, 52203, 52215, 52490,
52479, 52515, 52210, 52485, 52516, 52504, 52521, 52499, 52503, 52526)
# Considering that the LOP is a maximization problem, we need isMinimizationProblem=FALSE.

 cumulative_difference_plot(PL_EDA_fitness,
                            PL_GS_fitness,
                            isMinimizationProblem=FALSE,
                            labelA="PL-EDA",
                            labelB="PL-GS")

Estimate Y_A and Y_B bounds with bootstrap

Description

Estimate the confidence intervals for the cumulative distributions of Y_A and Y_B using bootstrap. Much slower than the Dvoretzky–Kiefer–Wolfowitz approach.

Usage

get_Y_AB_bounds_bootstrap(
  X_A_observed,
  X_B_observed,
  alpha = 0.05,
  EPSILON = 1e-20,
  nOfBootstrapSamples = 1000,
  ignoreMinimumLengthCheck = FALSE
)
get_Y_AB_bounds_bootstrap(
  X_A_observed,
  X_B_observed,
  alpha = 0.05,
  EPSILON = 1e-20,
  nOfBootstrapSamples = 1000,
  ignoreMinimumLengthCheck = FALSE
)

Arguments

`X_A_observed`	array of the observed samples (real values) of X_A.
`X_B_observed`	array of the observed samples (real values) of X_B, it needs to have the same length as X_A.
`alpha`	(optional, default value 0.05) the error of the confidence interval. If alpha = 0.05 then we have 95 percent confidence interval.
`EPSILON`	(optional, default value 1e-20) minimum difference between two values to be considered different.
`nOfBootstrapSamples`	(optional, default value 1e3) how many bootstrap samples to average. Increases computation time.
`ignoreMinimumLengthCheck`	(optional, default value FALSE) wether to check for a minimum length in X_A and X_B.

Value

Returns a list with the following fields:

- p: values in the interval [0,1] that represent the nOfEstimationPoints points in which the densities are estimated. Useful for plotting.

- Y_A_cumulative_estimation: an array with the estimated cumulative diustribution function of Y_A from 0 to p[[i]].

- Y_A_cumulative_upper: an array with the upper bounds of confidence 1 - alpha of the cumulative density of Y_A

- Y_A_cumulative_lower: an array with the lower bounds of confidence 1 - alpha of the cumulative density of Y_A

- Y_B_cumulative_estimation: The same as Y_A_cumulative_estimation for Y_B.

- Y_B_cumulative_upper: The same as Y_A_cumulative_upper for Y_B

- Y_B_cumulative_lower: The same as Y_A_cumulative_lower for Y_B

- diff_estimation: Y_A_cumulative_estimation - Y_B_cumulative_estimation

- diff_upper: an array with the upper bounds of confidence 1 - alpha of the difference between the cumulative distributions

- diff_lower: an array with the lower bounds of confidence 1 - alpha of the difference between the cumulative distributions

Examples

library(ggplot2)

### Example 1 ###
X_A_observed <- rnorm(100, mean = 2, sd = 1)
X_B_observed <- rnorm(100, mean = 2.1, sd = 0.5)

 res <- get_Y_AB_bounds_bootstrap(X_A_observed, X_B_observed)


fig1 = plot_Y_AB(res, plotDifference=FALSE)+ ggplot2::ggtitle("Example 1")
print(fig1)



### Example 2 ###
# Comparing the estimations with the actual distributions for two normal distributions.
###################################
## sample size = 100 ##############
###################################
X_A_observed <- rnorm(100,mean = 1, sd = 1)
X_B_observed <- rnorm(100,mean = 1.3, sd = 0.5)
res <- get_Y_AB_bounds_bootstrap(X_A_observed, X_B_observed)

X_A_observed_large_sample <- sort(rnorm(1e4, mean = 1, sd = 1))
X_B_observed_large_sample <- sort(rnorm(1e4, mean = 1.3, sd = 0.5))
actualDistributions <- getEmpiricalCumulativeDistributions(
        X_A_observed_large_sample,
        X_B_observed_large_sample,
        nOfEstimationPoints=1e4,
        EPSILON=1e-20)


actualDistributions$Y_A_cumulative_estimation <- lm(Y_A_cumulative_estimation ~
        p + I(p^2) + I(p^3)+ I(p^4)+ I(p^5)+ I(p^6)+I(p^7)+ I(p^8),
        data = actualDistributions)$fitted.values
actualDistributions$Y_B_cumulative_estimation <- lm(Y_B_cumulative_estimation ~
        p + I(p^2) + I(p^3)+ I(p^4)+ I(p^5)+ I(p^6)+I(p^7)+ I(p^8),
        data = actualDistributions)$fitted.values

fig = plot_Y_AB(res, plotDifference=FALSE) +

geom_line(data=as.data.frame(actualDistributions),
aes(x=p, y=Y_A_cumulative_estimation, colour = "Actual Y_A", linetype="Actual Y_A")) +

geom_line(data=as.data.frame(actualDistributions),
aes(x=p, y=Y_B_cumulative_estimation, colour = "Actual Y_B", linetype="Actual Y_B")) +

scale_colour_manual("", breaks = c("X_A", "X_B","Actual Y_A", "Actual Y_B"),
values = c("X_A"="#00BFC4", "X_B"="#F8766D", "Actual Y_A"="#0000FF", "Actual Y_B"="#FF0000"))+

scale_linetype_manual("", breaks = c("X_A", "X_B","Actual Y_A", "Actual Y_B"),
values = c("X_A"="solid", "X_B"="dashed", "Actual Y_A"="solid", "Actual Y_B"="solid"))+

ggtitle("100 samples used in the estimation")
print(fig)

###################################
## sample size = 300 ##############
###################################
X_A_observed <- rnorm(300,mean = 1, sd = 1)
X_B_observed <- rnorm(300,mean = 1.3, sd = 0.5)
res <- get_Y_AB_bounds_bootstrap(X_A_observed, X_B_observed)

X_A_observed_large_sample <- sort(rnorm(1e4, mean = 1, sd = 1))
X_B_observed_large_sample <- sort(rnorm(1e4, mean = 1.3, sd = 0.5))
actualDistributions <- getEmpiricalCumulativeDistributions(
        X_A_observed_large_sample,
        X_B_observed_large_sample,
        nOfEstimationPoints=1e4,
        EPSILON=1e-20)


actualDistributions$Y_A_cumulative_estimation <- lm(Y_A_cumulative_estimation ~
        p + I(p^2) + I(p^3)+ I(p^4)+ I(p^5)+ I(p^6)+I(p^7)+ I(p^8),
        data = actualDistributions)$fitted.values

actualDistributions$Y_B_cumulative_estimation <- lm(Y_B_cumulative_estimation ~
       p + I(p^2) + I(p^3)+ I(p^4)+ I(p^5)+ I(p^6)+I(p^7)+ I(p^8),
       data = actualDistributions)$fitted.values

fig = plot_Y_AB(res, plotDifference=FALSE) +

geom_line(data=as.data.frame(actualDistributions),
aes(x=p, y=Y_A_cumulative_estimation, colour = "Actual Y_A", linetype="Actual Y_A")) +

geom_line(data=as.data.frame(actualDistributions),
aes(x=p, y=Y_B_cumulative_estimation, colour = "Actual Y_B", linetype="Actual Y_B")) +

scale_colour_manual("", breaks = c("X_A", "X_B","Actual Y_A", "Actual Y_B"),
values = c("X_A"="#00BFC4", "X_B"="#F8766D", "Actual Y_A"="#0000FF", "Actual Y_B"="#FF0000"))+

scale_linetype_manual("", breaks = c("X_A", "X_B","Actual Y_A", "Actual Y_B"),
values = c("X_A"="solid", "X_B"="dashed", "Actual Y_A"="solid", "Actual Y_B"="solid"))+

ggtitle("300 samples used in the estimation")
print(fig)


library(ggplot2)

### Example 1 ###
X_A_observed <- rnorm(100, mean = 2, sd = 1)
X_B_observed <- rnorm(100, mean = 2.1, sd = 0.5)

 res <- get_Y_AB_bounds_bootstrap(X_A_observed, X_B_observed)


fig1 = plot_Y_AB(res, plotDifference=FALSE)+ ggplot2::ggtitle("Example 1")
print(fig1)



### Example 2 ###
# Comparing the estimations with the actual distributions for two normal distributions.
###################################
## sample size = 100 ##############
###################################
X_A_observed <- rnorm(100,mean = 1, sd = 1)
X_B_observed <- rnorm(100,mean = 1.3, sd = 0.5)
res <- get_Y_AB_bounds_bootstrap(X_A_observed, X_B_observed)

X_A_observed_large_sample <- sort(rnorm(1e4, mean = 1, sd = 1))
X_B_observed_large_sample <- sort(rnorm(1e4, mean = 1.3, sd = 0.5))
actualDistributions <- getEmpiricalCumulativeDistributions(
        X_A_observed_large_sample,
        X_B_observed_large_sample,
        nOfEstimationPoints=1e4,
        EPSILON=1e-20)


actualDistributions$Y_A_cumulative_estimation <- lm(Y_A_cumulative_estimation ~
        p + I(p^2) + I(p^3)+ I(p^4)+ I(p^5)+ I(p^6)+I(p^7)+ I(p^8),
        data = actualDistributions)$fitted.values
actualDistributions$Y_B_cumulative_estimation <- lm(Y_B_cumulative_estimation ~
        p + I(p^2) + I(p^3)+ I(p^4)+ I(p^5)+ I(p^6)+I(p^7)+ I(p^8),
        data = actualDistributions)$fitted.values

fig = plot_Y_AB(res, plotDifference=FALSE) +

geom_line(data=as.data.frame(actualDistributions),
aes(x=p, y=Y_A_cumulative_estimation, colour = "Actual Y_A", linetype="Actual Y_A")) +

geom_line(data=as.data.frame(actualDistributions),
aes(x=p, y=Y_B_cumulative_estimation, colour = "Actual Y_B", linetype="Actual Y_B")) +

scale_colour_manual("", breaks = c("X_A", "X_B","Actual Y_A", "Actual Y_B"),
values = c("X_A"="#00BFC4", "X_B"="#F8766D", "Actual Y_A"="#0000FF", "Actual Y_B"="#FF0000"))+

scale_linetype_manual("", breaks = c("X_A", "X_B","Actual Y_A", "Actual Y_B"),
values = c("X_A"="solid", "X_B"="dashed", "Actual Y_A"="solid", "Actual Y_B"="solid"))+

ggtitle("100 samples used in the estimation")
print(fig)

###################################
## sample size = 300 ##############
###################################
X_A_observed <- rnorm(300,mean = 1, sd = 1)
X_B_observed <- rnorm(300,mean = 1.3, sd = 0.5)
res <- get_Y_AB_bounds_bootstrap(X_A_observed, X_B_observed)

X_A_observed_large_sample <- sort(rnorm(1e4, mean = 1, sd = 1))
X_B_observed_large_sample <- sort(rnorm(1e4, mean = 1.3, sd = 0.5))
actualDistributions <- getEmpiricalCumulativeDistributions(
        X_A_observed_large_sample,
        X_B_observed_large_sample,
        nOfEstimationPoints=1e4,
        EPSILON=1e-20)


actualDistributions$Y_A_cumulative_estimation <- lm(Y_A_cumulative_estimation ~
        p + I(p^2) + I(p^3)+ I(p^4)+ I(p^5)+ I(p^6)+I(p^7)+ I(p^8),
        data = actualDistributions)$fitted.values

actualDistributions$Y_B_cumulative_estimation <- lm(Y_B_cumulative_estimation ~
       p + I(p^2) + I(p^3)+ I(p^4)+ I(p^5)+ I(p^6)+I(p^7)+ I(p^8),
       data = actualDistributions)$fitted.values

fig = plot_Y_AB(res, plotDifference=FALSE) +

geom_line(data=as.data.frame(actualDistributions),
aes(x=p, y=Y_A_cumulative_estimation, colour = "Actual Y_A", linetype="Actual Y_A")) +

geom_line(data=as.data.frame(actualDistributions),
aes(x=p, y=Y_B_cumulative_estimation, colour = "Actual Y_B", linetype="Actual Y_B")) +

scale_colour_manual("", breaks = c("X_A", "X_B","Actual Y_A", "Actual Y_B"),
values = c("X_A"="#00BFC4", "X_B"="#F8766D", "Actual Y_A"="#0000FF", "Actual Y_B"="#FF0000"))+

scale_linetype_manual("", breaks = c("X_A", "X_B","Actual Y_A", "Actual Y_B"),
values = c("X_A"="solid", "X_B"="dashed", "Actual Y_A"="solid", "Actual Y_B"="solid"))+

ggtitle("300 samples used in the estimation")
print(fig)

Estimate Y_A and Y_B bounds with Dvoretzky–Kiefer–Wolfowitz

Description

Estimate the confidence intervals for the cumulative distributions of Y_A and Y_B with Dvoretzky–Kiefer–Wolfowitz.

Usage

get_Y_AB_bounds_DKW(
  X_A_observed,
  X_B_observed,
  nOfEstimationPoints = 1000,
  alpha = 0.05,
  EPSILON = 1e-20,
  ignoreMinimumLengthCheck = FALSE
)
get_Y_AB_bounds_DKW(
  X_A_observed,
  X_B_observed,
  nOfEstimationPoints = 1000,
  alpha = 0.05,
  EPSILON = 1e-20,
  ignoreMinimumLengthCheck = FALSE
)

Arguments

`X_A_observed`	array of the observed samples (real values) of X_A.
`X_B_observed`	array of the observed samples (real values) of X_B.
`nOfEstimationPoints`	(optional, default 1000) the number of points in the interval [0,1] in which the density is estimated.
`alpha`	(optional, default value 0.05) the error of the confidence interval. If alpha = 0.05 then we have 95 percent confidence interval.
`EPSILON`	(optional, default value 1e-20) minimum difference between two values to be considered different.
`ignoreMinimumLengthCheck`	(optional, default value FALSE) wether to check for a minimum length in X_A and X_B.

Value

Returns a list with the following fields:

- p: values in the interval [0,1] that represent the nOfEstimationPoints points in which the densities are estimated. Useful for plotting.

- Y_A_cumulative_estimation: an array with the empirical cumulative diustribution function of Y_A from 0 to p[[i]].