1 Decision Making for Two Samples

Major Themes of Chapter 5

Two-Sample Comparisons: Comparing parameters between two populations or treatments
Paired vs. Independent Samples: Understanding when to use different comparison methods
Variance Comparisons: Testing equality of variances using F-tests
Proportion Comparisons: Comparing success rates between two groups
Introduction to ANOVA: Extending two-sample methods to multiple samples

Learning Objectives

After careful study of this chapter, you should be able to do the following:

Test hypotheses and construct confidence intervals for the difference between two means when variances are known and unknown.
Understand and apply the pooled variance procedure for two-sample t-tests.
Conduct and interpret paired t-tests for dependent samples.
Test hypotheses and construct confidence intervals for the ratio of two variances using F-tests.
Compare two population proportions using appropriate statistical methods.
Calculate sample sizes and power for two-sample tests.
Understand the basic concepts of Analysis of Variance (ANOVA).
Distinguish between completely randomized and randomized complete block experimental designs.

1.1 Introduction

Two-Sample Problems

Many engineering problems involve comparing two populations, processes, or treatments. Common scenarios include:

Comparing the mean strength of materials from two suppliers
Testing whether a new manufacturing process reduces defect rates
Evaluating if a design modification improves performance
Comparing measurements before and after a process change

The choice of statistical method depends on:

Whether samples are independent or paired
Whether population variances are known or unknown
Whether variances can be assumed equal or unequal

1.2 Inference on the Means of Two Populations, Variances Known

1.2.1 Hypothesis Testing on the Difference in Means

Two-Sample Z-Test (σ₁, σ₂ known)

When both population variances are known, the test statistic for testing H₀: μ₁ = μ₂ is:

Z_0 = \frac{(\overline{X_1} - \overline{X_2}) - (\mu_1 - \mu_2)}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}}

Under H₀: μ₁ - μ₂ = 0, Z₀ ~ N(0,1)

Common Hypothesis Tests:

Two-sided: H₀: μ₁ = μ₂ vs H₁: μ₁ ≠ μ₂
One-sided: H₀: μ₁ ≤ μ₂ vs H₁: μ₁ > μ₂
One-sided: H₀: μ₁ ≥ μ₂ vs H₁: μ₁ < μ₂

1.2.2 Example: Comparing Two Manufacturing Processes

Example

A company wants to compare the tensile strength of cables produced by two different machines. Machine 1 produces cables with σ₁ = 12 psi, and Machine 2 with σ₂ = 10 psi. Random samples yield:

[1] "Cable Strength Data Summary by Machine:"

     machine     n      Min       Q1   Median     Mean       Q3      Max
      <char> <int>    <num>    <num>    <num>    <num>    <num>    <num>
1: Machine_1    20 221.4006 239.0774 246.4398 246.6995 251.5847 266.4430
2: Machine_2    25 221.1331 231.0529 235.9208 238.0735 246.2158 259.6896
          SD       Var
       <num>     <num>
1: 11.671984 136.23521
2:  9.449202  89.28741

library(collapse)
library(data.table)
library(ggplot2)
library(scales)
library(kableExtra)
library(gridExtra)
library(tidyr)

# Cable strength comparison (two-sample Z-test, sigma known)
set.seed(123)
cable_data <- data.table(
  machine = rep(c("Machine_1", "Machine_2"), c(20, 25)),
  strength = c(rnorm(20, mean = 245, sd = 12), rnorm(25, mean = 238, sd = 10))
)

# Summary statistics using collapse functions
cable_summary <- cable_data %>% 
  fgroup_by(machine) %>% 
  fsummarise(
    n = fnobs(strength),
    Min = fmin(strength),
    Q1 = fquantile(strength, 0.25),
    Median = fmedian(strength),
    Mean = fmean(strength),
    Q3 = fquantile(strength, 0.75),
    Max = fmax(strength),
    SD = fsd(strength),
    Var = fvar(strength)
  )

print("Cable Strength Data Summary by Machine:")
print(cable_summary)

# Write data to CSV for download
fwrite(cable_data, "data/cable_strength.csv")

Test H₀: μ₁ = μ₂ vs H₁: μ₁ ≠ μ₂ at α = 0.05.

Manual Calculation:

Given: n₁ = 20, x̄₁ = 245, σ₁ = 12, n₂ = 25, x̄₂ = 238, σ₂ = 10

Test Statistic: Z_0 = \frac{(245 - 238) - 0}{\sqrt{\frac{12^2}{20} + \frac{10^2}{25}}} = \frac{7}{\sqrt{7.2 + 4}} = \frac{7}{3.35} = 2.09
Critical Value: z₀.₀₂₅ = 1.96
Decision: |Z₀| = 2.09 > 1.96, so reject H₀

R Output
R Code

[1] "Two-Sample Z-Test Results:"

                     Test_Type    n1    n2 xbar1  xbar2 sigma1 sigma2 Pooled_SE
                        <char> <int> <int> <num>  <num>  <num>  <num>     <num>
1: Two-sample Z-test (σ known)    20    25 246.7 238.07     12     10     3.347
   Z_Statistic P_Value Alpha Z_Critical  Decision
         <num>   <num> <num>      <num>    <char>
1:      2.5775    0.01  0.05       1.96 Reject H0
                          Conclusion
                              <char>
1: Means are significantly different

# Two-sample Z-test (manual and R calculations)
# Known population standard deviations
sigma1 <- 12
sigma2 <- 10
alpha <- 0.05

# Extract sample statistics
machine1_data <- cable_data[machine == "Machine_1"]
machine2_data <- cable_data[machine == "Machine_2"]

n1 <- fnobs(machine1_data$strength)
n2 <- fnobs(machine2_data$strength)
xbar1 <- fmean(machine1_data$strength)
xbar2 <- fmean(machine2_data$strength)

# Manual calculation
pooled_se <- sqrt((sigma1^2)/n1 + (sigma2^2)/n2)
z_stat <- (xbar1 - xbar2) / pooled_se
p_value_z <- 2 * (1 - pnorm(abs(z_stat)))  # Two-sided test
z_critical <- qnorm(1 - alpha/2)

# Test results
z_test_results <- data.table(
  Test_Type = "Two-sample Z-test (σ known)",
  n1 = n1,
  n2 = n2,
  xbar1 = round(xbar1, 2),
  xbar2 = round(xbar2, 2),
  sigma1 = sigma1,
  sigma2 = sigma2,
  Pooled_SE = round(pooled_se, 3),
  Z_Statistic = round(z_stat, 4),
  P_Value = round(p_value_z, 4),
  Alpha = alpha,
  Z_Critical = round(z_critical, 3),
  Decision = ifelse(abs(z_stat) > z_critical, "Reject H0", "Fail to reject H0"),
  Conclusion = ifelse(p_value_z < alpha, 
                     "Means are significantly different", 
                     "No significant difference in means")
)

print("Two-Sample Z-Test Results:")
print(z_test_results)

1.2.3 Type II Error and Choice of Sample Size

Power and Sample Size for Two-Sample Z-Test

Type II Error: For a specific difference δ = |μ₁ - μ₂|:

\beta = P\left(-z_{\alpha/2} - \frac{\delta}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}} \leq Z \leq z_{\alpha/2} - \frac{\delta}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}}\right)

Sample Size: For equal sample sizes (n₁ = n₂ = n):

n = \frac{(z_{\alpha/2} + z_\beta)^2(\sigma_1^2 + \sigma_2^2)}{\delta^2}

Figure 1: Power curves for two-sample Z-test

# Power curves for two-sample Z-test
delta_values <- seq(0, 15, by = 0.5)
sample_sizes <- c(10, 20, 30, 50)
alpha_power <- 0.05
sigma1_power <- 12
sigma2_power <- 10

# Function to calculate power for two-sample Z-test
calculate_power_two_sample <- function(delta, n1, n2, sigma1, sigma2, alpha) {
  se <- sqrt(sigma1^2/n1 + sigma2^2/n2)
  z_alpha2 <- qnorm(1 - alpha/2)
  
  # Power for two-sided test
  power <- 1 - pnorm(z_alpha2 - delta/se) + pnorm(-z_alpha2 - delta/se)
  return(power)
}

# Generate power curve data
power_curve_data <- data.table()
for(n_val in sample_sizes) {
  for(delta in delta_values) {
    power_val <- calculate_power_two_sample(delta, n_val, n_val, sigma1_power, sigma2_power, alpha_power)
    power_curve_data <- rbind(power_curve_data, 
                             data.table(delta = delta, n = factor(n_val), power = power_val))
  }
}

# Plot power curves
ggplot(data = power_curve_data, mapping = aes(x = delta, y = power, color = n)) +
  geom_line(linewidth = 1.2) +
  geom_hline(yintercept = 0.8, linetype = "dotted", alpha = 0.7) +
  annotate("text", x = 12, y = 0.82, label = "Power = 0.8", hjust = 0) +
  scale_x_continuous(breaks = pretty_breaks(n = 8)) +
  scale_y_continuous(breaks = pretty_breaks(n = 6), limits = c(0, 1)) +
  labs(
    x = "True Difference in Means (δ = |μ₁ - μ₂|)",
    y = "Power (1 - β)",
    color = "Sample Size\n(each group)",
    title = "Power Curves for Two-Sample Z-Test",
    subtitle = "H₀: μ₁ = μ₂ vs H₁: μ₁ ≠ μ₂, α = 0.05, σ₁ = 12, σ₂ = 10"
  ) +
  theme_classic() +
  theme(
    plot.title = element_text(hjust = 0.5),
    plot.subtitle = element_text(hjust = 0.5),
    legend.position = "right"
  )

1.2.5 Confidence Interval on the Difference in Means

Confidence Interval for μ₁ - μ₂ (σ₁, σ₂ known)

A 100(1-α)% confidence interval for μ₁ - μ₂:

(\overline{x_1} - \overline{x_2}) \pm z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}

R Output
R Code

[1] "95% Confidence Interval for Difference in Means (σ known):"

   Parameter Confidence_Level Difference Standard_Error Z_Critical Margin_Error
      <char>           <char>      <num>          <num>      <num>        <num>
1:   μ₁ - μ₂              95%       8.63          3.347       1.96        6.559
   Lower_Bound Upper_Bound Width
         <num>       <num> <num>
1:        2.07       15.19 13.12
                                         Interpretation
                                                 <char>
1: 95% confident that μ₁ - μ₂ is between 2.07 and 15.19

# Confidence interval for difference in means (sigma known)
margin_error_z <- z_critical * pooled_se
ci_lower_z <- (xbar1 - xbar2) - margin_error_z
ci_upper_z <- (xbar1 - xbar2) + margin_error_z

z_ci_results <- data.table(
  Parameter = "μ₁ - μ₂",
  Confidence_Level = "95%",
  Difference = round(xbar1 - xbar2, 2),
  Standard_Error = round(pooled_se, 3),
  Z_Critical = round(z_critical, 3),
  Margin_Error = round(margin_error_z, 3),
  Lower_Bound = round(ci_lower_z, 2),
  Upper_Bound = round(ci_upper_z, 2),
  Width = round(ci_upper_z - ci_lower_z, 2),
  Interpretation = paste("95% confident that μ₁ - μ₂ is between", 
                        round(ci_lower_z, 2), "and", round(ci_upper_z, 2))
)

print("95% Confidence Interval for Difference in Means (σ known):")
print(z_ci_results)

1.3 Inference on the Means of Two Populations, Variances Unknown

1.3.1 Hypothesis Testing on the Difference in Means

Two-Sample t-Test (σ₁, σ₂ unknown)

When population variances are unknown, we must consider two cases:

Case 1: Equal Variances (σ₁² = σ₂²)

Use pooled variance:

S_p^2 = \frac{(n_1-1)S_1^2 + (n_2-1)S_2^2}{n_1 + n_2 - 2}

Test statistic:

T_0 = \frac{\overline{X_1} - \overline{X_2}}{S_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}

Degrees of freedom: ν = n₁ + n₂ - 2

Case 2: Unequal Variances (σ₁² ≠ σ₂²)

Welch’s t-test
Test statistic:

T_0 = \frac{\overline{X_1} - \overline{X_2}}{\sqrt{\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}}}

Degrees of freedom:

\nu = \frac{(\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2})^2}{\frac{S_1^4}{n_1^2(n_1-1)} + \frac{S_2^4}{n_2^2(n_2-1)}}

1.3.2 Example: Comparing Material Strength

Example

Compare the breaking strength of two types of plastic. Random samples give:

[1] "Plastic Strength Data Summary by Type:"

     type     n      Min       Q1   Median     Mean       Q3      Max        SD
   <char> <int>    <num>    <num>    <num>    <num>    <num>    <num>     <num>
1: Type_A    15 150.6854 155.8163 167.2005 163.4680 169.8374 176.0622  8.599716
2: Type_B    18 139.6099 152.5444 156.1428 158.8415 170.5125 178.5483 11.920081
         Var  SE_Mean
       <num>    <num>
1:  73.95511 2.220437
2: 142.08832 2.809590

# Plastic strength comparison (two-sample t-test, sigma unknown)
set.seed(456)
plastic_data <- data.table(
  type = rep(c("Type_A", "Type_B"), c(15, 18)),
  strength = c(rnorm(15, mean = 162.5, sd = 8.2), rnorm(18, mean = 157.8, sd = 9.1))
)

# Summary statistics
plastic_summary <- plastic_data %>% 
  fgroup_by(type) %>% 
  fsummarise(
    n = fnobs(strength),
    Min = fmin(strength),
    Q1 = fquantile(strength, 0.25),
    Median = fmedian(strength),
    Mean = fmean(strength),
    Q3 = fquantile(strength, 0.75),
    Max = fmax(strength),
    SD = fsd(strength),
    Var = fvar(strength),
    SE_Mean = fsd(strength) / sqrt(fnobs(strength))
  )

print("Plastic Strength Data Summary by Type:")
print(plastic_summary)

# Write data to CSV for download
fwrite(plastic_data, "data/plastic_strength.csv")

First, test for equal variances, then perform appropriate t-test.

Manual Calculation (assuming equal variances):

Given: n₁ = 15, x̄₁ = 162.5, s₁ = 8.2, n₂ = 18, x̄₂ = 157.8, s₂ = 9.1

Pooled Variance: S_p^2 = \frac{(15-1)(8.2)^2 + (18-1)(9.1)^2}{15 + 18 - 2} = \frac{940.36 + 1408.57}{31} = 75.77
Test Statistic: T_0 = \frac{162.5 - 157.8}{8.71\sqrt{\frac{1}{15} + \frac{1}{18}}} = \frac{4.7}{3.01} = 1.56
Critical Value: t₀.₀₂₅,₃₁ = 2.040
Decision: |T₀| = 1.56 < 2.040, so fail to reject H₀

R Output
R Code

[1] "F-test for equal variances: F = 0.52 , p-value = 0.2229"

[1] "Two-Sample t-Test Results (Equal Variances):"

                             Test_Type    n1    n2  xbar1  xbar2    s1    s2
                                <char> <int> <int>  <num>  <num> <num> <num>
1: Two-sample t-test (equal variances)    15    18 163.47 158.84   8.6 11.92
   Pooled_SD SE_Difference t_Statistic    df P_Value Alpha t_Critical
       <num>         <num>       <num> <num>   <num> <num>      <num>
1:     10.55         3.689      1.2543    31  0.2191  0.05       2.04
            Decision                         Conclusion
              <char>                             <char>
1: Fail to reject H0 No significant difference in means

[1] "R's t.test verification:"


    Two Sample t-test

data:  typeA_data$strength and typeB_data$strength
t = 1.2543, df = 31, p-value = 0.2191
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -2.896371 12.149429
sample estimates:
mean of x mean of y 
 163.4680  158.8415

# Two-sample t-test with equal variances (manual and R calculations)
typeA_data <- plastic_data[type == "Type_A"]
typeB_data <- plastic_data[type == "Type_B"]

n1_t <- fnobs(typeA_data$strength)
n2_t <- fnobs(typeB_data$strength)
xbar1_t <- fmean(typeA_data$strength)
xbar2_t <- fmean(typeB_data$strength)
s1_t <- fsd(typeA_data$strength)
s2_t <- fsd(typeB_data$strength)

# Test for equal variances first (F-test)
f_stat_equal <- s1_t^2 / s2_t^2
df1_f <- n1_t - 1
df2_f <- n2_t - 1
p_value_f <- 2 * min(pf(f_stat_equal, df1_f, df2_f), 1 - pf(f_stat_equal, df1_f, df2_f))

print(paste("F-test for equal variances: F =", round(f_stat_equal, 3), ", p-value =", round(p_value_f, 4)))

# Assuming equal variances (pooled variance)
sp_squared <- ((n1_t - 1) * s1_t^2 + (n2_t - 1) * s2_t^2) / (n1_t + n2_t - 2)
sp <- sqrt(sp_squared)
se_pooled <- sp * sqrt(1/n1_t + 1/n2_t)

# Manual t-test calculation
t_stat <- (xbar1_t - xbar2_t) / se_pooled
df_t <- n1_t + n2_t - 2
p_value_t <- 2 * (1 - pt(abs(t_stat), df = df_t))
t_critical <- qt(1 - alpha/2, df = df_t)

# Test results
t_test_results <- data.table(
  Test_Type = "Two-sample t-test (equal variances)",
  n1 = n1_t,
  n2 = n2_t,
  xbar1 = round(xbar1_t, 2),
  xbar2 = round(xbar2_t, 2),
  s1 = round(s1_t, 2),
  s2 = round(s2_t, 2),
  Pooled_SD = round(sp, 2),
  SE_Difference = round(se_pooled, 3),
  t_Statistic = round(t_stat, 4),
  df = df_t,
  P_Value = round(p_value_t, 4),
  Alpha = alpha,
  t_Critical = round(t_critical, 3),
  Decision = ifelse(abs(t_stat) > t_critical, "Reject H0", "Fail to reject H0"),
  Conclusion = ifelse(p_value_t < alpha, 
                     "Means are significantly different", 
                     "No significant difference in means")
)

print("Two-Sample t-Test Results (Equal Variances):")
print(t_test_results)

# Using R's built-in t.test for verification
t_test_r <- t.test(typeA_data$strength, typeB_data$strength, var.equal = TRUE)
print("R's t.test verification:")
print(t_test_r)

1.3.3 Type II Error and Choice of Sample Size

Sample Size for Two-Sample t-Test

For equal sample sizes and equal variances:

n \approx \frac{2(t_{\alpha/2,\nu} + t_{\beta,\nu})^2 s_p^2}{\delta^2}

where ν = 2n - 2 and s_p² is estimated from pilot data.

1.3.4 Confidence Interval on the Difference in Means

Confidence Interval for μ₁ - μ₂ (σ₁, σ₂ unknown)

Equal Variances:

(\overline{x_1} - \overline{x_2}) \pm t_{\alpha/2,n_1+n_2-2} \cdot S_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}

Unequal Variances:

(\overline{x_1} - \overline{x_2}) \pm t_{\alpha/2,\nu} \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}

R Output
R Code

[1] "95% Confidence Interval for Difference in Means (σ unknown):"

   Parameter Confidence_Level Difference Pooled_SD Standard_Error t_Critical
      <char>           <char>      <num>     <num>          <num>      <num>
1:   μ₁ - μ₂              95%       4.63     10.55          3.689       2.04
   Margin_Error Lower_Bound Upper_Bound Width
          <num>       <num>       <num> <num>
1:        7.523        -2.9       12.15 15.05
                                         Interpretation
                                                 <char>
1: 95% confident that μ₁ - μ₂ is between -2.9 and 12.15

[1] "Welch's t-test (unequal variances):"


    Welch Two Sample t-test

data:  typeA_data$strength and typeB_data$strength
t = 1.2919, df = 30.446, p-value = 0.2061
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -2.682527 11.935585
sample estimates:
mean of x mean of y 
 163.4680  158.8415

# Confidence interval for difference in means (sigma unknown)
margin_error_t <- t_critical * se_pooled
ci_lower_t <- (xbar1_t - xbar2_t) - margin_error_t
ci_upper_t <- (xbar1_t - xbar2_t) + margin_error_t

t_ci_results <- data.table(
  Parameter = "μ₁ - μ₂",
  Confidence_Level = "95%",
  Difference = round(xbar1_t - xbar2_t, 2),
  Pooled_SD = round(sp, 2),
  Standard_Error = round(se_pooled, 3),
  t_Critical = round(t_critical, 3),
  Margin_Error = round(margin_error_t, 3),
  Lower_Bound = round(ci_lower_t, 2),
  Upper_Bound = round(ci_upper_t, 2),
  Width = round(ci_upper_t - ci_lower_t, 2),
  Interpretation = paste("95% confident that μ₁ - μ₂ is between", 
                        round(ci_lower_t, 2), "and", round(ci_upper_t, 2))
)

print("95% Confidence Interval for Difference in Means (σ unknown):")
print(t_ci_results)

# Welch's t-test (unequal variances) for comparison
welch_test <- t.test(typeA_data$strength, typeB_data$strength, var.equal = FALSE)
print("Welch's t-test (unequal variances):")
print(welch_test)

1.4 The Paired t-Test

Paired t-Test

When observations come in natural pairs (before/after, matched subjects, etc.), use the paired t-test:

Calculate differences: d_i = x₁ᵢ - x₂ᵢ
Test statistic:

T_0 = \frac{\overline{d} - \mu_d}{S_d/\sqrt{n}}

Degrees of freedom: ν = n - 1

Where:

\overline{d} = \frac{1}{n}\sum_{i=1}^n d_i
S_d^2 = \frac{1}{n-1}\sum_{i=1}^n (d_i - \overline{d})^2

Assumptions:

Differences are normally distributed
Pairs are independent

1.4.1 Example: Before/After Comparison

Example

A manufacturing process improvement is tested on 12 production lines. Measure output before and after the improvement:

[1] "Process Improvement Paired Data Summary:"

       n Mean_Before Mean_After Mean_Difference SD_Before SD_After
   <int>       <num>      <num>           <num>     <num>    <num>
1:    12    81.70922   89.14206        7.432843   6.36605 4.794191
   SD_Difference SE_Difference Min_Difference Max_Difference
           <num>         <num>          <num>          <num>
1:      6.906354      1.993693      -2.023308        19.6769

[1] "Individual Differences:"

    line_id   before    after difference
      <int>    <num>    <num>      <num>
 1:       1 89.19277 88.91711 -0.2756677
 2:       2 66.91386 86.59075 19.6768963
 3:       3 84.84256 97.20931 12.3667432
 4:       4 86.46512 84.44181 -2.0233083
 5:       5 82.10919 93.42153 11.3123453
 6:       6 81.12413 85.69777  4.5736465
 7:       7 79.66950 91.82031 12.1508159
 8:       8 83.60425 84.41997  0.8157253
 9:       9 76.91232 84.98462  8.0723014
10:      10 90.91757 95.37602  4.4584503
11:      11 81.78163 83.81734  2.0357137
12:      12 76.97776 93.00821 16.0304489

# Process improvement paired t-test example
set.seed(789)
process_data <- data.table(
  line_id = 1:12,
  before = rnorm(12, mean = 85, sd = 8),
  after = rnorm(12, mean = 90.25, sd = 7.5)
)

# Calculate differences
process_data[, difference := after - before]

# Summary statistics for paired data
paired_summary <- process_data %>% 
  fsummarise(
    n = fnobs(difference),
    Mean_Before = fmean(before),
    Mean_After = fmean(after),
    Mean_Difference = fmean(difference),
    SD_Before = fsd(before),
    SD_After = fsd(after),
    SD_Difference = fsd(difference),
    SE_Difference = fsd(difference) / sqrt(fnobs(difference)),
    Min_Difference = fmin(difference),
    Max_Difference = fmax(difference)
  )

print("Process Improvement Paired Data Summary:")
print(paired_summary)

# Individual differences
differences_table <- process_data[, .(line_id, before, after, difference)]
print("Individual Differences:")
print(differences_table)

# Write data to CSV for download
fwrite(process_data, "data/process_improvement.csv")

Test H₀: μ_d = 0 vs H₁: μ_d > 0 at α = 0.05.

Manual Calculation:

Calculate differences: d_i = After_i - Before_i

If d̄ = 5.25 and s_d = 3.2:

Test Statistic: T_0 = \frac{5.25 - 0}{3.2/\sqrt{12}} = \frac{5.25}{0.924} = 5.68
Critical Value: t₀.₀₅,₁₁ = 1.796
Decision: T₀ = 5.68 > 1.796, so reject H₀

R Output
R Code

[1] "Paired t-Test Results:"

       Test_Type n_pairs Mean_Difference SD_Difference SE_Difference
          <char>   <int>           <num>         <num>         <num>
1: Paired t-test      12           7.433         6.906         1.994
   Null_Difference t_Statistic    df P_Value Alpha t_Critical  Decision
             <num>       <num> <num>   <num> <num>      <num>    <char>
1:               0      3.7282    11  0.0017  0.05      1.796 Reject H0
                         Conclusion
                             <char>
1: Significant improvement detected

[1] "R's paired t.test verification:"


    Paired t-test

data:  process_data$after and process_data$before
t = 3.7282, df = 11, p-value = 0.001667
alternative hypothesis: true mean difference is greater than 0
95 percent confidence interval:
 3.8524    Inf
sample estimates:
mean difference 
       7.432843

[1] "95% CI for mean difference: [ 3.045 ,  11.821 ]"

# Paired t-test (manual and R calculations)
n_paired <- fnobs(process_data$difference)
d_bar <- fmean(process_data$difference)
s_d <- fsd(process_data$difference)
mu_d0 <- 0  # Null hypothesis: no difference

# Manual calculation
t_stat_paired <- (d_bar - mu_d0) / (s_d / sqrt(n_paired))
df_paired <- n_paired - 1
p_value_paired <- 1 - pt(t_stat_paired, df = df_paired)  # One-sided upper test
t_critical_paired <- qt(1 - alpha, df = df_paired)

# Test results
paired_test_results <- data.table(
  Test_Type = "Paired t-test",
  n_pairs = n_paired,
  Mean_Difference = round(d_bar, 3),
  SD_Difference = round(s_d, 3),
  SE_Difference = round(s_d / sqrt(n_paired), 3),
  Null_Difference = mu_d0,
  t_Statistic = round(t_stat_paired, 4),
  df = df_paired,
  P_Value = round(p_value_paired, 4),
  Alpha = alpha,
  t_Critical = round(t_critical_paired, 3),
  Decision = ifelse(t_stat_paired > t_critical_paired, "Reject H0", "Fail to reject H0"),
  Conclusion = ifelse(p_value_paired < alpha, 
                     "Significant improvement detected", 
                     "No significant improvement")
)

print("Paired t-Test Results:")
print(paired_test_results)

# Using R's paired t.test for verification
paired_test_r <- t.test(process_data$after, process_data$before, paired = TRUE, alternative = "greater")
print("R's paired t.test verification:")
print(paired_test_r)

# Confidence interval for mean difference
t_critical_ci_paired <- qt(1 - alpha/2, df = df_paired)
margin_error_paired <- t_critical_ci_paired * (s_d / sqrt(n_paired))
ci_lower_paired <- d_bar - margin_error_paired
ci_upper_paired <- d_bar + margin_error_paired

print(paste("95% CI for mean difference: [", round(ci_lower_paired, 3), ", ", round(ci_upper_paired, 3), "]"))

R Output
R Code

# Paired t-test visualization

# Create long format for plotting
process_long <- process_data %>%
  pivot_longer(cols = c(before, after), names_to = "time", values_to = "output") %>%
  data.table()

# Before/After comparison plot
p1 <- ggplot(data = process_long, mapping = aes(x = time, y = output, group = line_id)) +
  geom_line(color = "gray", alpha = 0.7) +
  geom_point(mapping = aes(color = time), size = 3) +
  scale_color_manual(values = c("before" = "red", "after" = "blue")) +
  labs(
    x = "Time Period",
    y = "Output",
    title = "Before/After Process Improvement",
    color = "Period"
  ) +
  theme_classic() +
  theme(plot.title = element_text(hjust = 0.5))

# Differences histogram
p2 <- ggplot(data = process_data, mapping = aes(x = difference)) +
  geom_histogram(bins = 8, fill = "lightblue", color = "black", alpha = 0.7) +
  geom_vline(xintercept = d_bar, color = "red", linetype = "dashed", linewidth = 1) +
  geom_vline(xintercept = 0, color = "black", linetype = "solid", linewidth = 1) +
  labs(
    x = "Difference (After - Before)",
    y = "Frequency",
    title = "Distribution of Differences"
  ) +
  theme_classic() +
  theme(plot.title = element_text(hjust = 0.5))

# Combine plots
grid.arrange(p1, p2, nrow = 1, ncol = 2)

1.5 Inference on the Ratio of Variances of Two Normal Populations

1.5.1 Hypothesis Testing on the Ratio of Two Variances

F-Test for Equality of Variances

To test H₀: σ₁² = σ₂² vs H₁: σ₁² ≠ σ₂², use:

F_0 = \frac{S_1^2}{S_2^2}

Under H₀, F₀ ~ F_{n₁-1, n₂-1}

Convention: Put the larger sample variance in the numerator so F₀ ≥ 1.

Properties of F-distribution:

Always positive
Right-skewed
Approaches 1 as both df → ∞
F₁₋α,ν₁,ν₂ = 1/F_α,ν₂,ν₁

1.5.2 Example: Variance Comparison

Example

Compare the variability in measurements from two instruments:

[1] "Instrument Comparison Data Summary:"

     instrument     n      Min       Q1   Median     Mean       Q3      Max
         <char> <int>    <num>    <num>    <num>    <num>    <num>    <num>
1: Instrument_1    16 23.17625 24.45219 25.33319 25.39453 25.69963 28.86313
2: Instrument_2    21 22.07273 24.18302 25.35367 25.00446 25.81188 26.85136
         SD      Var
      <num>    <num>
1: 1.522413 2.317741
2: 1.359890 1.849302

# Instrument comparison for F-test (variance comparison)
set.seed(321)
instrument_data <- data.table(
  instrument = rep(c("Instrument_1", "Instrument_2"), c(16, 21)),
  measurement = c(rnorm(16, mean = 25, sd = sqrt(2.5)), rnorm(21, mean = 25.2, sd = sqrt(1.8)))
)

# Summary statistics
instrument_summary <- instrument_data %>% 
  fgroup_by(instrument) %>% 
  fsummarise(
    n = fnobs(measurement),
    Min = fmin(measurement),
    Q1 = fquantile(measurement, 0.25),
    Median = fmedian(measurement),
    Mean = fmean(measurement),
    Q3 = fquantile(measurement, 0.75),
    Max = fmax(measurement),
    SD = fsd(measurement),
    Var = fvar(measurement)
  )

print("Instrument Comparison Data Summary:")
print(instrument_summary)

# Write data to CSV for download
fwrite(instrument_data, "data/instrument_comparison.csv")

Test H₀: σ₁² = σ₂² vs H₁: σ₁² ≠ σ₂² at α = 0.05.

Manual Calculation:

Given: n₁ = 16, s₁² = 2.5, n₂ = 21, s₂² = 1.8

Test Statistic: F_0 = \frac{2.5}{1.8} = 1.39
Critical Value: F₀.₀₂₅,₁₅,₂₀ = 2.57
Decision: F₀ = 1.39 < 2.57, so fail to reject H₀

R Output
R Code

[1] "F-Test Results for Equality of Variances:"

                          Test_Type    n1    n2 s1_squared s2_squared
                             <char> <int> <int>      <num>      <num>
1: F-test for equality of variances    16    21     2.3177     1.8493
   Larger_Variance F_Statistic   df1   df2 P_Value Alpha F_Critical
            <char>       <num> <num> <num>   <num> <num>      <num>
1:    Instrument_1      1.2533    15    20   0.627  0.05      2.573
            Decision                             Conclusion
              <char>                                 <char>
1: Fail to reject H0 No significant difference in variances

[1] "R's var.test verification:"


    F test to compare two variances

data:  inst1_data$measurement and inst2_data$measurement
F = 1.2533, num df = 15, denom df = 20, p-value = 0.627
alternative hypothesis: true ratio of variances is not equal to 1
95 percent confidence interval:
 0.4870809 3.4539881
sample estimates:
ratio of variances 
          1.253306

# F-test for equality of variances
inst1_data <- instrument_data[instrument == "Instrument_1"]
inst2_data <- instrument_data[instrument == "Instrument_2"]

n1_f <- fnobs(inst1_data$measurement)
n2_f <- fnobs(inst2_data$measurement)
s1_f_squared <- fvar(inst1_data$measurement)
s2_f_squared <- fvar(inst2_data$measurement)

# F-test (put larger variance in numerator)
if(s1_f_squared >= s2_f_squared) {
  f_stat <- s1_f_squared / s2_f_squared
  df1_f_test <- n1_f - 1
  df2_f_test <- n2_f - 1
  larger_var <- "Instrument_1"
} else {
  f_stat <- s2_f_squared / s1_f_squared
  df1_f_test <- n2_f - 1
  df2_f_test <- n1_f - 1
  larger_var <- "Instrument_2"
}

# Calculate p-value for two-sided test
p_value_f_test <- 2 * (1 - pf(f_stat, df1_f_test, df2_f_test))
f_critical <- qf(1 - alpha/2, df1_f_test, df2_f_test)

# Test results
f_test_results <- data.table(
  Test_Type = "F-test for equality of variances",
  n1 = n1_f,
  n2 = n2_f,
  s1_squared = round(s1_f_squared, 4),
  s2_squared = round(s2_f_squared, 4),
  Larger_Variance = larger_var,
  F_Statistic = round(f_stat, 4),
  df1 = df1_f_test,
  df2 = df2_f_test,
  P_Value = round(p_value_f_test, 4),
  Alpha = alpha,
  F_Critical = round(f_critical, 3),
  Decision = ifelse(f_stat > f_critical, "Reject H0", "Fail to reject H0"),
  Conclusion = ifelse(p_value_f_test < alpha, 
                     "Variances are significantly different", 
                     "No significant difference in variances")
)

print("F-Test Results for Equality of Variances:")
print(f_test_results)

# Using R's var.test for verification
var_test_r <- var.test(inst1_data$measurement, inst2_data$measurement)
print("R's var.test verification:")
print(var_test_r)

1.5.3 Confidence Interval on the Ratio of Two Variances

Confidence Interval for σ₁²/σ₂²

A 100(1-α)% confidence interval for σ₁²/σ₂²:

\frac{s_1^2/s_2^2}{F_{\alpha/2,n_1-1,n_2-1}} \leq \frac{\sigma_1^2}{\sigma_2^2} \leq \frac{s_1^2/s_2^2}{F_{1-\alpha/2,n_1-1,n_2-1}}

Note: F₁₋α/₂,ν₁,ν₂ = 1/F_α/₂,ν₂,ν₁

R Output
R Code

[1] "95% Confidence Interval for Ratio of Variances:"

   Parameter Confidence_Level Sample_Ratio Lower_Bound Upper_Bound  Width
      <char>           <char>        <num>       <num>       <num>  <num>
1:   σ₁²/σ₂²              95%       1.2533      0.4871       3.454 2.9669
                                           Interpretation
                                                   <char>
1: 95% confident that σ₁²/σ₂² is between 0.4871 and 3.454

# Confidence interval for ratio of variances
f_alpha2_lower <- qf(alpha/2, df1_f_test, df2_f_test)
f_alpha2_upper <- qf(1 - alpha/2, df1_f_test, df2_f_test)

# Note: For CI, use original ratio without putting larger in numerator
ratio_original <- s1_f_squared / s2_f_squared
ci_lower_f <- ratio_original / f_alpha2_upper
ci_upper_f <- ratio_original / f_alpha2_lower

variance_ratio_ci <- data.table(
  Parameter = "σ₁²/σ₂²",
  Confidence_Level = "95%",
  Sample_Ratio = round(ratio_original, 4),
  Lower_Bound = round(ci_lower_f, 4),
  Upper_Bound = round(ci_upper_f, 4),
  Width = round(ci_upper_f - ci_lower_f, 4),
  Interpretation = paste("95% confident that σ₁²/σ₂² is between", 
                        round(ci_lower_f, 4), "and", round(ci_upper_f, 4))
)

print("95% Confidence Interval for Ratio of Variances:")
print(variance_ratio_ci)

1.6 Inference on Two Population Proportions

1.6.1 Hypothesis Testing on the Equality of Two Proportions

Two-Sample Z-Test for Proportions

To test H₀: p₁ = p₂, use the pooled estimate:

\hat{p} = \frac{x_1 + x_2}{n_1 + n_2}

Test Statistic:

Z_0 = \frac{\hat{p_1} - \hat{p_2}}{\sqrt{\hat{p}(1-\hat{p})(\frac{1}{n_1} + \frac{1}{n_2})}}

Conditions: n₁p̂ ≥ 5, n₁(1-p̂) ≥ 5, n₂p̂ ≥ 5, n₂(1-p̂) ≥ 5

1.6.2 Example: Comparing Defect Rates

Example

Compare defect rates between two production lines:

[1] "Defect Rate Comparison Summary:"

     line sample_size defects  prop
   <char>       <num>   <num> <num>
1: Line_1         200      16 0.080
2: Line_2         250      14 0.056

[1] "Overall Summary:"

   Total_Items Total_Defects Line1_Rate Line2_Rate Overall_Rate
         <num>         <num>      <num>      <num>        <num>
1:         450            30       0.08      0.056   0.06666667

# Defect rate comparison (two-sample proportion test)
defect_comparison_data <- data.table(
  line = c("Line_1", "Line_2"),
  sample_size = c(200, 250),
  defects = c(16, 14),
  prop = c(16/200, 14/250)
)

# Summary
defect_summary <- defect_comparison_data %>% 
  fsummarise(
    Total_Items = fsum(sample_size),
    Total_Defects = fsum(defects),
    Line1_Rate = defects[1] / sample_size[1],
    Line2_Rate = defects[2] / sample_size[2],
    Overall_Rate = fsum(defects) / fsum(sample_size)
  )

print("Defect Rate Comparison Summary:")
print(defect_comparison_data)
print("Overall Summary:")
print(defect_summary)

# Write data to CSV for download
fwrite(defect_comparison_data, "data/defect_comparison.csv")

Test H₀: p₁ = p₂ vs H₁: p₁ ≠ p₂ at α = 0.05.

Manual Calculation:

Given: n₁ = 200, x₁ = 16, n₂ = 250, x₂ = 14

Sample Proportions:
- p̂₁ = 16/200 = 0.08
- p̂₂ = 14/250 = 0.056
Pooled Proportion: \hat{p} = \frac{16 + 14}{200 + 250} = \frac{30}{450} = 0.067
Test Statistic: Z_0 = \frac{0.08 - 0.056}{\sqrt{0.067(0.933)(\frac{1}{200} + \frac{1}{250})}} = \frac{0.024}{0.024} = 1.00
Decision: |Z₀| = 1.00 < 1.96, so fail to reject H₀

R Output
R Code

[1] "Large Sample Conditions:"

     Condition     Value    Met
        <char>     <num> <lgcl>
1:     n₁p̂ ≥ 5  13.33333   TRUE
2: n₁(1-p̂) ≥ 5 186.66667   TRUE
3:     n₂p̂ ≥ 5  16.66667   TRUE
4: n₂(1-p̂) ≥ 5 233.33333   TRUE

[1] "Two-Sample Proportion Test Results:"

                    Test_Type    n1    n2    x1    x2 p1_hat p2_hat p_pooled
                       <char> <num> <num> <num> <num>  <num>  <num>    <num>
1: Two-sample proportion test   200   250    16    14   0.08  0.056   0.0667
   SE_Difference Z_Statistic P_Value Alpha Z_Critical          Decision
           <num>       <num>   <num> <num>      <num>            <char>
1:        0.0237      1.0142  0.3105  0.05       1.96 Fail to reject H0
                                 Conclusion
                                     <char>
1: No significant difference in proportions

[1] "R's prop.test verification:"


    2-sample test for equality of proportions with continuity correction

data:  c(x1_prop, x2_prop) out of c(n1_prop, n2_prop)
X-squared = 0.67902, df = 1, p-value = 0.4099
alternative hypothesis: two.sided
95 percent confidence interval:
 -0.02768004  0.07568004
sample estimates:
prop 1 prop 2 
 0.080  0.056

# Two-sample proportion test
n1_prop <- defect_comparison_data[line == "Line_1", sample_size]
n2_prop <- defect_comparison_data[line == "Line_2", sample_size]
x1_prop <- defect_comparison_data[line == "Line_1", defects]
x2_prop <- defect_comparison_data[line == "Line_2", defects]
p1_hat <- x1_prop / n1_prop
p2_hat <- x2_prop / n2_prop

# Pooled proportion
p_pooled <- (x1_prop + x2_prop) / (n1_prop + n2_prop)

# Check conditions
conditions <- data.table(
  Condition = c("n₁p̂ ≥ 5", "n₁(1-p̂) ≥ 5", "n₂p̂ ≥ 5", "n₂(1-p̂) ≥ 5"),
  Value = c(n1_prop * p_pooled, n1_prop * (1 - p_pooled), 
           n2_prop * p_pooled, n2_prop * (1 - p_pooled)),
  Met = c(n1_prop * p_pooled >= 5, n1_prop * (1 - p_pooled) >= 5,
         n2_prop * p_pooled >= 5, n2_prop * (1 - p_pooled) >= 5)
)

print("Large Sample Conditions:")
print(conditions)

# Manual calculation
se_prop <- sqrt(p_pooled * (1 - p_pooled) * (1/n1_prop + 1/n2_prop))
z_stat_prop <- (p1_hat - p2_hat) / se_prop
p_value_prop <- 2 * (1 - pnorm(abs(z_stat_prop)))  # Two-sided test
z_critical_prop <- qnorm(1 - alpha/2)

# Test results
prop_test_results <- data.table(
  Test_Type = "Two-sample proportion test",
  n1 = n1_prop,
  n2 = n2_prop,
  x1 = x1_prop,
  x2 = x2_prop,
  p1_hat = round(p1_hat, 4),
  p2_hat = round(p2_hat, 4),
  p_pooled = round(p_pooled, 4),
  SE_Difference = round(se_prop, 4),
  Z_Statistic = round(z_stat_prop, 4),
  P_Value = round(p_value_prop, 4),
  Alpha = alpha,
  Z_Critical = round(z_critical_prop, 3),
  Decision = ifelse(abs(z_stat_prop) > z_critical_prop, "Reject H0", "Fail to reject H0"),
  Conclusion = ifelse(p_value_prop < alpha, 
                     "Proportions are significantly different", 
                     "No significant difference in proportions")
)

print("Two-Sample Proportion Test Results:")
print(prop_test_results)

# Using R's prop.test for verification
prop_test_r <- prop.test(x = c(x1_prop, x2_prop), n = c(n1_prop, n2_prop))
print("R's prop.test verification:")
print(prop_test_r)

1.6.3 Type II Error and Choice of Sample Size

Sample Size for Two-Proportion Test

For equal sample sizes (n₁ = n₂ = n) and difference δ = |p₁ - p₂|:

n = \frac{[z_{\alpha/2}\sqrt{2\bar{p}(1-\bar{p})} + z_\beta\sqrt{p_1(1-p_1) + p_2(1-p_2)}]^2}{\delta^2}

where \bar{p} = \frac{p_1 + p_2}{2}

1.6.4 Confidence Interval on the Difference in Proportions

Confidence Interval for p₁ - p₂

A 100(1-α)% confidence interval for p₁ - p₂:

(\hat{p_1} - \hat{p_2}) \pm z_{\alpha/2}\sqrt{\frac{\hat{p_1}(1-\hat{p_1})}{n_1} + \frac{\hat{p_2}(1-\hat{p_2})}{n_2}}

Note: Use individual sample proportions, not the pooled estimate.

R Output
R Code

[1] "95% Confidence Interval for Difference in Proportions:"

   Parameter Confidence_Level Difference Standard_Error Z_Critical Margin_Error
      <char>           <char>      <num>          <num>      <num>        <num>
1:   p₁ - p₂              95%      0.024         0.0241       1.96       0.0472
   Lower_Bound Upper_Bound  Width
         <num>       <num>  <num>
1:     -0.0232      0.0712 0.0944
                                             Interpretation
                                                     <char>
1: 95% confident that p₁ - p₂ is between -0.0232 and 0.0712

# Confidence interval for difference in proportions
se_diff_ci <- sqrt(p1_hat * (1 - p1_hat) / n1_prop + p2_hat * (1 - p2_hat) / n2_prop)
margin_error_prop <- z_critical_prop * se_diff_ci
ci_lower_prop <- (p1_hat - p2_hat) - margin_error_prop
ci_upper_prop <- (p1_hat - p2_hat) + margin_error_prop

prop_ci_results <- data.table(
  Parameter = "p₁ - p₂",
  Confidence_Level = "95%",
  Difference = round(p1_hat - p2_hat, 4),
  Standard_Error = round(se_diff_ci, 4),
  Z_Critical = round(z_critical_prop, 3),
  Margin_Error = round(margin_error_prop, 4),
  Lower_Bound = round(ci_lower_prop, 4),
  Upper_Bound = round(ci_upper_prop, 4),
  Width = round(ci_upper_prop - ci_lower_prop, 4),
  Interpretation = paste("95% confident that p₁ - p₂ is between", 
                        round(ci_lower_prop, 4), "and", round(ci_upper_prop, 4))
)

print("95% Confidence Interval for Difference in Proportions:")
print(prop_ci_results)

1.7 Summary Tables for Inference Procedures for Two Samples

R Output
R Code

Table 1: Summary of two-sample inference procedures

Summary of Two-Sample Inference Procedures
Parameter	Test_Statistic	Distribution	Confidence_Interval	Assumptions
μ₁ - μ₂ (σ known)	Z = (x̄₁-x̄₂)/√(σ₁²/n₁+σ₂²/n₂)	N(0,1)	(x̄₁-x̄₂) ± z_{α/2}√(σ₁²/n₁+σ₂²/n₂)	Normal populations, σ known
μ₁ - μ₂ (σ unknown, equal)	t = (x̄₁-x̄₂)/(sp√(1/n₁+1/n₂))	t(n₁+n₂-2)	(x̄₁-x̄₂) ± t_{α/2}sp√(1/n₁+1/n₂)	Normal populations, σ₁²=σ₂²
μ₁ - μ₂ (σ unknown, unequal)	t = (x̄₁-x̄₂)/√(s₁²/n₁+s₂²/n₂)	t(ν)	(x̄₁-x̄₂) ± t_{α/2}√(s₁²/n₁+s₂²/n₂)	Normal populations, σ₁²≠σ₂²
μ_d (paired)	t = d̄/(sd/√n)	t(n-1)	d̄ ± t_{α/2}sd/√n	Normal differences
σ₁²/σ₂²	F = s₁²/s₂²	F(n₁-1,n₂-1)	(s₁²/s₂²)/F_{α/2} to (s₁²/s₂²)/F_{1-α/2}	Normal populations
p₁ - p₂	Z = (p̂₁-p̂₂)/√(p̂(1-p̂)(1/n₁+1/n₂))	N(0,1)	(p̂₁-p̂₂) ± z_{α/2}√(p̂₁(1-p̂₁)/n₁+p̂₂(1-p̂₂)/n₂)	Large samples, np≥5

Table 2

# Summary table of two-sample inference procedures
inference_procedures <- data.table(
  Parameter = c("μ₁ - μ₂ (σ known)", "μ₁ - μ₂ (σ unknown, equal)", 
               "μ₁ - μ₂ (σ unknown, unequal)", "μ_d (paired)",
               "σ₁²/σ₂²", "p₁ - p₂"),
  Test_Statistic = c("Z = (x̄₁-x̄₂)/√(σ₁²/n₁+σ₂²/n₂)", 
                    "t = (x̄₁-x̄₂)/(sp√(1/n₁+1/n₂))",
                    "t = (x̄₁-x̄₂)/√(s₁²/n₁+s₂²/n₂)",
                    "t = d̄/(sd/√n)",
                    "F = s₁²/s₂²",
                    "Z = (p̂₁-p̂₂)/√(p̂(1-p̂)(1/n₁+1/n₂))"),
  Distribution = c("N(0,1)", "t(n₁+n₂-2)", "t(ν)", "t(n-1)", "F(n₁-1,n₂-1)", "N(0,1)"),
  Confidence_Interval = c("(x̄₁-x̄₂) ± z_{α/2}√(σ₁²/n₁+σ₂²/n₂)",
                         "(x̄₁-x̄₂) ± t_{α/2}sp√(1/n₁+1/n₂)",
                         "(x̄₁-x̄₂) ± t_{α/2}√(s₁²/n₁+s₂²/n₂)",
                         "d̄ ± t_{α/2}sd/√n",
                         "(s₁²/s₂²)/F_{α/2} to (s₁²/s₂²)/F_{1-α/2}",
                         "(p̂₁-p̂₂) ± z_{α/2}√(p̂₁(1-p̂₁)/n₁+p̂₂(1-p̂₂)/n₂)"),
  Assumptions = c("Normal populations, σ known", "Normal populations, σ₁²=σ₂²",
                 "Normal populations, σ₁²≠σ₂²", "Normal differences",
                 "Normal populations", "Large samples, np≥5")
)

kbl(inference_procedures, 
    caption = "Summary of Two-Sample Inference Procedures") %>%
  kable_styling() %>%
  column_spec(2, width = "3.5cm") %>%
  column_spec(3, width = "2cm") %>%
  column_spec(4, width = "3.5cm") %>%
  column_spec(5, width = "2.5cm")

1.8 What if We Have More than Two Samples?

1.8.1 Completely Randomized Experiment and Analysis of Variance

Introduction to ANOVA

When comparing more than two groups, multiple t-tests lead to inflated Type I error rates. Analysis of Variance (ANOVA) provides a single test for:

H₀: μ₁ = μ₂ = … = μₐ

H₁: At least one mean differs

One-Way ANOVA Model:

y_{ij} = \mu + \tau_i + \epsilon_{ij}

where:

y_{ij} = j-th observation in i-th treatment
μ = overall mean
τᵢ = effect of i-th treatment
εᵢⱼ \sim N(0, σ²)

F-Statistic:

F_0 = \frac{MS_{Treatments}}{MS_{Error}} = \frac{SS_{Treatments}/(a-1)}{SS_{Error}/(N-a)}

1.8.2 Example: Comparing Multiple Processes

Example

Compare the yield from four different chemical processes:

[1] "Process Yield Data Summary by Process:"

     process     n      Min       Q1   Median     Mean       Q3      Max
      <char> <int>    <num>    <num>    <num>    <num>    <num>    <num>
1: Process_A     6 71.95873 73.73665 75.00213 75.90529 77.54712 81.75850
2: Process_B     6 74.64733 76.58680 79.26790 78.91072 81.56075 82.26307
3: Process_C     6 68.06958 68.56533 70.23732 70.18130 71.91936 72.05552
4: Process_D     6 76.94869 79.26128 81.26787 80.64549 82.49767 82.88851
         SD       Var
      <num>     <num>
1: 3.561280 12.682715
2: 3.180471 10.115399
3: 1.927890  3.716761
4: 2.362796  5.582806

[1] "Overall Summary:"

   Total_n Overall_Mean Overall_SD Overall_Var Between_Process_Range
     <int>        <num>      <num>       <num>                 <num>
1:      24      76.4107   4.845237    23.47632               10.4642

# One-way ANOVA example (comparing multiple processes)
set.seed(654)
process_yield_data <- data.table(
  process = rep(c("Process_A", "Process_B", "Process_C", "Process_D"), each = 6),
  yield = c(rnorm(6, mean = 75, sd = 4),   # Process A
           rnorm(6, mean = 78, sd = 4),   # Process B  
           rnorm(6, mean = 72, sd = 4),   # Process C
           rnorm(6, mean = 80, sd = 4))   # Process D
)

# Summary statistics by process
yield_summary <- process_yield_data %>% 
  fgroup_by(process) %>% 
  fsummarise(
    n = fnobs(yield),
    Min = fmin(yield),
    Q1 = fquantile(yield, 0.25),
    Median = fmedian(yield),
    Mean = fmean(yield),
    Q3 = fquantile(yield, 0.75),
    Max = fmax(yield),
    SD = fsd(yield),
    Var = fvar(yield)
  )

print("Process Yield Data Summary by Process:")
print(yield_summary)

# Overall summary
overall_summary <- process_yield_data %>% 
  fsummarise(
    Total_n = fnobs(yield),
    Overall_Mean = fmean(yield),
    Overall_SD = fsd(yield),
    Overall_Var = fvar(yield),
    Between_Process_Range = fmax(yield_summary$Mean) - fmin(yield_summary$Mean)
  )

print("Overall Summary:")
print(overall_summary)

# Write data to CSV for download
fwrite(process_yield_data, "data/process_yield.csv")

ANOVA Table:

Source	SS	df	MS	F	P-value
Total	SSₜ	N-1
Treatments	SSₜᵣ	a-1	MSₜᵣ	F₀	P
Error	SSₑ	N-a	MSₑ

R Output
R Code

[1] "One-Way ANOVA Table:"

       Source     SS    df     MS F_Statistic P_Value
       <char>  <num> <num>  <num>       <num>   <num>
1: Treatments 379.47     3 126.49      15.763       0
2:      Error 160.49    20   8.02          NA      NA
3:      Total 539.96    23     NA          NA      NA

[1] "ANOVA Test Results:"

   F_Statistic F_Critical P_Value Alpha  Decision
         <num>      <num>   <num> <num>    <char>
1:      15.763      3.098       0  0.05 Reject H0
                          Conclusion
                              <char>
1: At least one process mean differs

[1] "R's ANOVA verification:"

            Df Sum Sq Mean Sq F value  Pr(>F)    
process      3  379.5  126.49   15.76 1.7e-05 ***
Residuals   20  160.5    8.02                    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

# One-way ANOVA calculations (manual and R)
# ANOVA calculations
a <- length(unique(process_yield_data$process))  # number of treatments
N <- fnobs(process_yield_data$yield)  # total observations
n_per_group <- N / a  # observations per group (balanced design)

# Calculate sum of squares
grand_mean <- fmean(process_yield_data$yield)

# Treatment sum of squares (SST)
group_means <- yield_summary$Mean
SST <- n_per_group * sum((group_means - grand_mean)^2)

# Error sum of squares (SSE) 
SSE <- sum(yield_summary$Var * (yield_summary$n - 1))

# Total sum of squares (SS_Total)
SS_Total <- sum((process_yield_data$yield - grand_mean)^2)

# Degrees of freedom
df_treatments <- a - 1
df_error <- N - a
df_total <- N - 1

# Mean squares
MS_treatments <- SST / df_treatments
MS_error <- SSE / df_error

# F-statistic
f_stat_anova <- MS_treatments / MS_error
p_value_anova <- 1 - pf(f_stat_anova, df_treatments, df_error)

# ANOVA table
anova_table <- data.table(
  Source = c("Treatments", "Error", "Total"),
  SS = c(round(SST, 2), round(SSE, 2), round(SS_Total, 2)),
  df = c(df_treatments, df_error, df_total),
  MS = c(round(MS_treatments, 2), round(MS_error, 2), NA),
  F_Statistic = c(round(f_stat_anova, 4), NA, NA),
  P_Value = c(round(p_value_anova, 4), NA, NA)
)

print("One-Way ANOVA Table:")
print(anova_table)

# ANOVA conclusion
alpha_anova <- 0.05
f_critical_anova <- qf(1 - alpha_anova, df_treatments, df_error)
anova_conclusion <- data.table(
  F_Statistic = round(f_stat_anova, 4),
  F_Critical = round(f_critical_anova, 3),
  P_Value = round(p_value_anova, 4),
  Alpha = alpha_anova,
  Decision = ifelse(f_stat_anova > f_critical_anova, "Reject H0", "Fail to reject H0"),
  Conclusion = ifelse(p_value_anova < alpha_anova,
                     "At least one process mean differs",
                     "No significant difference between process means")
)

print("ANOVA Test Results:")
print(anova_conclusion)

# Using R's aov function for verification
anova_r <- aov(yield ~ process, data = process_yield_data)
anova_summary_r <- summary(anova_r)
print("R's ANOVA verification:")
print(anova_summary_r)

R Output
R Code

# One-way ANOVA visualization
# Box plot by process
p1 <- ggplot(data = process_yield_data, mapping = aes(x = process, y = yield, fill = process)) +
  geom_boxplot(alpha = 0.7) +
  geom_point(position = position_jitter(width = 0.2), alpha = 0.6) +
  stat_summary(fun = mean, geom = "point", shape = 18, size = 3, color = "red") +
  geom_hline(yintercept = grand_mean, linetype = "dashed", color = "blue") +
  labs(
    x = "Process",
    y = "Yield",
    title = "Yield by Process",
    subtitle = "Red diamonds = group means, blue line = grand mean"
  ) +
  theme_classic() +
  theme(
    plot.title = element_text(hjust = 0.5),
    plot.subtitle = element_text(hjust = 0.5),
    legend.position = "none"
  )

# Residuals vs fitted plot
fitted_values <- rep(group_means, each = 6)
residuals <- process_yield_data$yield - fitted_values
residual_data <- data.table(fitted = fitted_values, residuals = residuals, process = process_yield_data$process)

p2 <- ggplot(data = residual_data, mapping = aes(x = fitted, y = residuals)) +
  geom_point(mapping = aes(color = process), size = 3, alpha = 0.7) +
  geom_hline(yintercept = 0, linetype = "dashed") +
  labs(
    x = "Fitted Values",
    y = "Residuals",
    title = "Residuals vs Fitted Values",
    color = "Process"
  ) +
  theme_classic() +
  theme(plot.title = element_text(hjust = 0.5))

# Combine plots
grid.arrange(p1, p2, nrow = 1, ncol = 2)

1.8.3 Randomized Complete Block Experiment

Randomized Complete Block Design

When experimental units are heterogeneous, blocking reduces error variance:

Model:

y_{ij} = \mu + \tau_i + \beta_j + \epsilon_{ij}

where:

τᵢ = treatment effect
βⱼ = block effect
One observation per treatment-block combination

Advantages:

Controls for known sources of variation
Increases precision of treatment comparisons
Reduces experimental error

1.8.4 Example: Blocked Experiment

Example

Test four coating methods on different types of metal substrates (blocks):

[1] "Coating Experiment Data Summary by Coating:"

     coating     n     Mean       SD      Min      Max
      <char> <int>    <num>    <num>    <num>    <num>
1: Coating_A     5 85.34018 5.821115 78.63352 93.27538
2: Coating_B     5 85.93684 6.051341 79.31872 91.13382
3: Coating_C     5 81.46412 7.616026 72.13510 91.27308
4: Coating_D     5 90.18421 4.988870 82.43243 95.37133

[1] "Summary by Substrate (Block):"

   substrate     n     Mean       SD      Min      Max
      <char> <int>    <num>    <num>    <num>    <num>
1:  Aluminum     4 78.79708 4.624031 72.13510 82.43243
2:    Copper     4 91.60979 2.779976 88.66094 95.37133
3:    Nickel     4 80.94072 6.287063 75.63415 90.04449
4:     Steel     4 86.92283 2.550402 84.62341 89.42896
5:  Titanium     4 90.38625 4.640854 83.65485 93.64384

# Randomized complete block design example (coating experiment)
set.seed(987)
coating_data <- data.table(
  coating = rep(c("Coating_A", "Coating_B", "Coating_C", "Coating_D"), 5),
  substrate = rep(c("Steel", "Aluminum", "Copper", "Titanium", "Nickel"), each = 4),
  adhesion = c(
    # Steel substrate
    rnorm(4, mean = c(85, 88, 82, 90), sd = 3),
    # Aluminum substrate  
    rnorm(4, mean = c(78, 82, 75, 85), sd = 3),
    # Copper substrate
    rnorm(4, mean = c(92, 95, 89, 98), sd = 3),
    # Titanium substrate
    rnorm(4, mean = c(88, 91, 85, 94), sd = 3),
    # Nickel substrate
    rnorm(4, mean = c(80, 83, 77, 87), sd = 3)
  )
)

# Summary by coating
coating_summary <- coating_data %>% 
  fgroup_by(coating) %>% 
  fsummarise(
    n = fnobs(adhesion),
    Mean = fmean(adhesion),
    SD = fsd(adhesion),
    Min = fmin(adhesion),
    Max = fmax(adhesion)
  )

# Summary by substrate (block)
substrate_summary <- coating_data %>% 
  fgroup_by(substrate) %>% 
  fsummarise(
    n = fnobs(adhesion),
    Mean = fmean(adhesion),
    SD = fsd(adhesion),
    Min = fmin(adhesion),
    Max = fmax(adhesion)
  )

print("Coating Experiment Data Summary by Coating:")
print(coating_summary)
print("Summary by Substrate (Block):")
print(substrate_summary)

# Write data to CSV for download
fwrite(coating_data, "data/coating_experiment.csv")

ANOVA Table for RCBD:

Source	SS	df	MS	F	P-value
Total	SSₜ	ab-1
Treatments	SSₜᵣ	a-1	MSₜᵣ	MSₜᵣ/MSₑ	P₁
Blocks	SSᵦₗ	b-1	MSᵦₗ	MSᵦₗ/MSₑ	P₂
Error	SSₑ	(a-1)(b-1)	MSₑ

R Output
R Code

[1] "RCBD ANOVA Table:"

       Source     SS    df     MS F_Statistic P_Value
       <char>  <num> <num>  <num>       <num>   <num>
1:   Coatings 191.16     3  63.72      7.7335  0.0039
2: Substrates 514.71     4 128.68     15.6170  0.0001
3:      Error  98.88    12   8.24          NA      NA
4:      Total 804.75    19     NA          NA      NA

[1] "RCBD Test Results:"

             Effect F_Statistic P_Value  Decision
             <char>       <num>   <num>    <char>
1:   Coating Effect      7.7335  0.0039 Reject H0
2: Substrate Effect     15.6170  0.0001 Reject H0
                             Conclusion
                                 <char>
1:   Coating means differ significantly
2: Substrate means differ significantly

[1] "R's RCBD ANOVA verification:"

            Df Sum Sq Mean Sq F value   Pr(>F)    
coating      3  191.2   63.72   7.733 0.003871 ** 
substrate    4  514.7  128.68  15.617 0.000106 ***
Residuals   12   98.9    8.24                     
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

# RCBD ANOVA calculations
a_rcbd <- length(unique(coating_data$coating))  # treatments
b_rcbd <- length(unique(coating_data$substrate))  # blocks  
N_rcbd <- fnobs(coating_data$adhesion)

# Calculate sum of squares for RCBD
grand_mean_rcbd <- fmean(coating_data$adhesion)

# Treatment sum of squares
treatment_means <- coating_summary$Mean
SS_treatments_rcbd <- b_rcbd * sum((treatment_means - grand_mean_rcbd)^2)

# Block sum of squares  
block_means <- substrate_summary$Mean
SS_blocks_rcbd <- a_rcbd * sum((block_means - grand_mean_rcbd)^2)

# Total sum of squares
SS_total_rcbd <- sum((coating_data$adhesion - grand_mean_rcbd)^2)

# Error sum of squares (by subtraction)
SS_error_rcbd <- SS_total_rcbd - SS_treatments_rcbd - SS_blocks_rcbd

# Degrees of freedom
df_treatments_rcbd <- a_rcbd - 1
df_blocks_rcbd <- b_rcbd - 1  
df_error_rcbd <- (a_rcbd - 1) * (b_rcbd - 1)
df_total_rcbd <- N_rcbd - 1

# Mean squares
MS_treatments_rcbd <- SS_treatments_rcbd / df_treatments_rcbd
MS_blocks_rcbd <- SS_blocks_rcbd / df_blocks_rcbd
MS_error_rcbd <- SS_error_rcbd / df_error_rcbd

# F-statistics
f_treatments_rcbd <- MS_treatments_rcbd / MS_error_rcbd
f_blocks_rcbd <- MS_blocks_rcbd / MS_error_rcbd

# P-values
p_treatments_rcbd <- 1 - pf(f_treatments_rcbd, df_treatments_rcbd, df_error_rcbd)
p_blocks_rcbd <- 1 - pf(f_blocks_rcbd, df_blocks_rcbd, df_error_rcbd)

# RCBD ANOVA table
rcbd_anova_table <- data.table(
  Source = c("Coatings", "Substrates", "Error", "Total"),
  SS = c(round(SS_treatments_rcbd, 2), round(SS_blocks_rcbd, 2), 
         round(SS_error_rcbd, 2), round(SS_total_rcbd, 2)),
  df = c(df_treatments_rcbd, df_blocks_rcbd, df_error_rcbd, df_total_rcbd),
  MS = c(round(MS_treatments_rcbd, 2), round(MS_blocks_rcbd, 2), 
         round(MS_error_rcbd, 2), NA),
  F_Statistic = c(round(f_treatments_rcbd, 4), round(f_blocks_rcbd, 4), NA, NA),
  P_Value = c(round(p_treatments_rcbd, 4), round(p_blocks_rcbd, 4), NA, NA)
)

print("RCBD ANOVA Table:")
print(rcbd_anova_table)

# RCBD conclusions
rcbd_conclusions <- data.table(
  Effect = c("Coating Effect", "Substrate Effect"),
  F_Statistic = c(round(f_treatments_rcbd, 4), round(f_blocks_rcbd, 4)),
  P_Value = c(round(p_treatments_rcbd, 4), round(p_blocks_rcbd, 4)),
  Decision = c(
    ifelse(p_treatments_rcbd < alpha_anova, "Reject H0", "Fail to reject H0"),
    ifelse(p_blocks_rcbd < alpha_anova, "Reject H0", "Fail to reject H0")
  ),
  Conclusion = c(
    ifelse(p_treatments_rcbd < alpha_anova, "Coating means differ significantly", "No significant coating effect"),
    ifelse(p_blocks_rcbd < alpha_anova, "Substrate means differ significantly", "No significant substrate effect")
  )
)

print("RCBD Test Results:")
print(rcbd_conclusions)

# Using R's aov for verification
rcbd_r <- aov(adhesion ~ coating + substrate, data = coating_data)
rcbd_summary_r <- summary(rcbd_r)
print("R's RCBD ANOVA verification:")
print(rcbd_summary_r)

R Output
R Code

Figure 4: Randomized complete block design visualization

# RCBD visualization
# Interaction plot
interaction_plot_data <- coating_data %>%
  fgroup_by(coating, substrate) %>%
  fsummarise(mean_adhesion = fmean(adhesion))

p1_rcbd <- ggplot(data = interaction_plot_data, mapping = aes(x = coating, y = mean_adhesion, 
                                                             color = substrate, group = substrate)) +
  geom_line(linewidth = 1) +
  geom_point(size = 3) +
  labs(
    x = "Coating Type",
    y = "Mean Adhesion",
    color = "Substrate",
    title = "Interaction Plot: Coating × Substrate"
  ) +
  theme_classic() +
  theme(
    plot.title = element_text(hjust = 0.5),
    axis.text.x = element_text(angle = 45, hjust = 1)
  )

# Box plot by coating with substrate grouping
p2_rcbd <- ggplot(data = coating_data, mapping = aes(x = coating, y = adhesion, fill = substrate)) +
  geom_boxplot(alpha = 0.7) +
  labs(
    x = "Coating Type",
    y = "Adhesion",
    fill = "Substrate",
    title = "Adhesion by Coating and Substrate"
  ) +
  theme_classic() +
  theme(
    plot.title = element_text(hjust = 0.5),
    axis.text.x = element_text(angle = 45, hjust = 1)
  )

# Combine plots
grid.arrange(p1_rcbd, p2_rcbd, nrow = 2, ncol = 1)

1.9 Comprehensive Example: Quality Improvement Study

Comprehensive Example

A manufacturing company conducts a comprehensive quality study comparing two suppliers, testing before/after process improvements, and evaluating multiple quality metrics:

[1] "Comprehensive Quality Study Summary by Supplier:"

     supplier     n Mean_Before Mean_After Mean_Improvement SD_Before SD_After
       <char> <int>       <num>      <num>            <num>     <num>    <num>
1: Supplier_A    30    83.49488   92.93163         9.436750  8.242329 5.746158
2: Supplier_B    30    82.31349   87.60750         5.294008 11.587749 8.813704
   SD_Improvement Defect_Rate_Before Defect_Rate_After Var_Before Var_After
            <num>              <num>             <num>      <num>     <num>
1:       12.23185         0.16666667        0.00000000   67.93598  33.01834
2:       15.98305         0.06666667        0.03333333  134.27592  77.68137

[1] "Overall Quality Summary:"

   Total_n Overall_Mean_Before Overall_Mean_After Overall_Improvement
     <int>               <num>              <num>               <num>
1:      60            82.90418           90.26956            7.365379
   Overall_Defect_Before Overall_Defect_After
                   <num>                <num>
1:             0.1166667           0.01666667

# Comprehensive quality improvement study
set.seed(111)
comprehensive_quality_data <- data.table(
  part_id = 1:60,
  supplier = rep(c("Supplier_A", "Supplier_B"), each = 30),
  before_improvement = c(rnorm(30, mean = 85, sd = 8), rnorm(30, mean = 82, sd = 9)),
  after_improvement = c(rnorm(30, mean = 92, sd = 7), rnorm(30, mean = 88, sd = 8)),
  defective_before = sample(c(0, 1), 60, replace = TRUE, prob = c(0.88, 0.12)),
  defective_after = sample(c(0, 1), 60, replace = TRUE, prob = c(0.94, 0.06))
)

# Calculate improvements
comprehensive_quality_data[, improvement := after_improvement - before_improvement]

# Comprehensive summary
quality_comprehensive_summary <- comprehensive_quality_data %>% 
  fgroup_by(supplier) %>% 
  fsummarise(
    n = fnobs(part_id),
    # Before/After means
    Mean_Before = fmean(before_improvement),
    Mean_After = fmean(after_improvement),
    Mean_Improvement = fmean(improvement),
    # Standard deviations
    SD_Before = fsd(before_improvement),
    SD_After = fsd(after_improvement),
    SD_Improvement = fsd(improvement),
    # Defect rates
    Defect_Rate_Before = fmean(defective_before),
    Defect_Rate_After = fmean(defective_after),
    # Variance comparison
    Var_Before = fvar(before_improvement),
    Var_After = fvar(after_improvement)
  )

print("Comprehensive Quality Study Summary by Supplier:")
print(quality_comprehensive_summary)

# Overall summary
overall_quality_summary <- comprehensive_quality_data %>% 
  fsummarise(
    Total_n = fnobs(part_id),
    Overall_Mean_Before = fmean(before_improvement),
    Overall_Mean_After = fmean(after_improvement),
    Overall_Improvement = fmean(improvement),
    Overall_Defect_Before = fmean(defective_before),
    Overall_Defect_After = fmean(defective_after)
  )

print("Overall Quality Summary:")
print(overall_quality_summary)

# Write data to CSV for download
fwrite(comprehensive_quality_data, "data/comprehensive_quality.csv")

Complete Analysis:

Compare supplier means (independent samples)
Test variance equality between suppliers
Before/after comparison (paired test)
Compare defect proportions
Multi-factor ANOVA

R Output
R Code

Error in select(., part_id, supplier, before_improvement, after_improvement): could not find function "select"

Error: object 'before_after_long' not found

Error in select(., supplier, before_improvement, after_improvement): could not find function "select"

Error: object 'variance_data' not found

Error in select(., supplier, Rate_Before, Rate_After): could not find function "select"

Error: object 'defect_summary_comp' not found

Error: object 'p2_comp' not found

# Comprehensive quality analysis dashboard

# 1. Supplier comparison (before improvement)
p1_comp <- ggplot(data = comprehensive_quality_data, 
                  mapping = aes(x = supplier, y = before_improvement, fill = supplier)) +
  geom_boxplot(alpha = 0.7) +
  geom_point(position = position_jitter(width = 0.2), alpha = 0.4) +
  labs(x = "Supplier", y = "Quality Score", title = "Quality Before Improvement") +
  theme_classic() +
  theme(plot.title = element_text(hjust = 0.5), legend.position = "none")

# 2. Before/After comparison (paired)
before_after_long <- comprehensive_quality_data %>%
  select(part_id, supplier, before_improvement, after_improvement) %>%
  pivot_longer(cols = c(before_improvement, after_improvement), 
               names_to = "period", values_to = "quality") %>%
  data.table()

p2_comp <- ggplot(data = before_after_long, 
                  mapping = aes(x = period, y = quality, group = part_id)) +
  geom_line(alpha = 0.3) +
  geom_point(mapping = aes(color = period), alpha = 0.6) +
  facet_wrap(~supplier) +
  labs(x = "Period", y = "Quality Score", title = "Before/After Comparison by Supplier") +
  theme_classic() +
  theme(plot.title = element_text(hjust = 0.5))

# 3. Variance comparison
variance_data <- comprehensive_quality_data %>%
  select(supplier, before_improvement, after_improvement) %>%
  pivot_longer(cols = c(before_improvement, after_improvement),
               names_to = "period", values_to = "quality") %>%
  data.table()

p3_comp <- ggplot(data = variance_data, mapping = aes(x = quality, fill = period)) +
  geom_density(alpha = 0.6) +
  facet_wrap(~supplier) +
  labs(x = "Quality Score", y = "Density", title = "Distribution Comparison", fill = "Period") +
  theme_classic() +
  theme(plot.title = element_text(hjust = 0.5))

# 4. Defect rate comparison
defect_summary_comp <- comprehensive_quality_data %>%
  fgroup_by(supplier) %>%
  fsummarise(
    Before = fsum(defective_before),
    After = fsum(defective_after),
    n = fnobs(part_id)
  ) %>%
  mutate(
    Rate_Before = Before / n,
    Rate_After = After / n
  ) %>%
  select(supplier, Rate_Before, Rate_After) %>%
  pivot_longer(cols = c(Rate_Before, Rate_After), names_to = "period", values_to = "rate") %>%
  data.table()

p4_comp <- ggplot(data = defect_summary_comp, 
                  mapping = aes(x = supplier, y = rate, fill = period)) +
  geom_col(position = "dodge", alpha = 0.7) +
  labs(x = "Supplier", y = "Defect Rate", title = "Defect Rate Comparison", fill = "Period") +
  theme_classic() +
  theme(plot.title = element_text(hjust = 0.5))

# Combine all plots
grid.arrange(p1_comp, p2_comp, p3_comp, p4_comp, nrow = 2, ncol = 2)

1.10 Key Formulas and Decision Framework

Essential Formulas Summary

Two-Sample Z-Test (σ known):

Z_0 = \frac{(\overline{X_1} - \overline{X_2}) - (\mu_1 - \mu_2)}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}}

Two-Sample t-Test (σ unknown, equal variances):

T_0 = \frac{\overline{X_1} - \overline{X_2}}{S_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}

Paired t-Test:

T_0 = \frac{\overline{d} - \mu_d}{S_d/\sqrt{n}}

F-Test for Variances:

F_0 = \frac{S_1^2}{S_2^2}

Two-Proportion Z-Test:

Z_0 = \frac{\hat{p_1} - \hat{p_2}}{\sqrt{\hat{p}(1-\hat{p})(\frac{1}{n_1} + \frac{1}{n_2})}}

One-Way ANOVA:

F_0 = \frac{MS_{Treatments}}{MS_{Error}}

R Output
R Code

Table 3: Decision framework for choosing appropriate test

Decision Framework for Choosing Appropriate Two-Sample Test
Situation	Test_Method	Key_Assumptions	When_to_Use
Two independent groups, σ known	Two-sample Z-test	Normal populations, σ₁, σ₂ known	Population SDs known from historical data
Two independent groups, σ unknown, equal variances	Two-sample t-test (pooled)	Normal populations, σ₁² = σ₂²	Most common two-sample situation
Two independent groups, σ unknown, unequal variances	Welch's t-test	Normal populations	When variances clearly unequal
Paired/matched observations	Paired t-test	Normal differences	Before/after, matched pairs
Compare two variances	F-test	Normal populations	Check equal variance assumption
Compare two proportions	Two-proportion Z-test	Large samples	Compare success rates
Compare more than two groups	One-way ANOVA	Normal populations, equal variances	Multiple groups to compare
Compare groups with blocking	RCBD ANOVA	Normal populations, additive model	Control for known nuisance factors

Table 4

# Decision framework for choosing appropriate test
decision_framework <- data.table(
  Situation = c("Two independent groups, σ known", "Two independent groups, σ unknown, equal variances",
               "Two independent groups, σ unknown, unequal variances", "Paired/matched observations",
               "Compare two variances", "Compare two proportions",
               "Compare more than two groups", "Compare groups with blocking"),
  Test_Method = c("Two-sample Z-test", "Two-sample t-test (pooled)", 
                 "Welch's t-test", "Paired t-test",
                 "F-test", "Two-proportion Z-test",
                 "One-way ANOVA", "RCBD ANOVA"),
  Key_Assumptions = c("Normal populations, σ₁, σ₂ known", "Normal populations, σ₁² = σ₂²",
                     "Normal populations", "Normal differences",
                     "Normal populations", "Large samples",
                     "Normal populations, equal variances", "Normal populations, additive model"),
  When_to_Use = c("Population SDs known from historical data", "Most common two-sample situation",
                 "When variances clearly unequal", "Before/after, matched pairs",
                 "Check equal variance assumption", "Compare success rates",
                 "Multiple groups to compare", "Control for known nuisance factors")
)

kbl(decision_framework, 
    caption = "Decision Framework for Choosing Appropriate Two-Sample Test") %>%
  kable_styling() %>%
  column_spec(1, width = "3cm") %>%
  column_spec(2, width = "2.5cm") %>%
  column_spec(3, width = "3cm") %>%
  column_spec(4, width = "3cm")

1.11 Sample Size and Power Analysis

R Output
R Code

[1] "Sample Size Requirements for Two-Sample Tests:"

             Test_Type Alpha Power      Effect_Size Required_n_per_group
                <char> <num> <num>           <char>                <num>
1:   Two-sample t-test  0.05   0.8        δ=5, σ=10                   63
2:   Two-sample t-test  0.05   0.8        δ=3, σ=10                  175
3:   Two-sample t-test  0.05   0.8        δ=1, σ=10                 1570
4: Two-proportion test  0.05   0.8   p₁=0.2, p₂=0.3                  294
5: Two-proportion test  0.05   0.8  p₁=0.1, p₂=0.15                  686
6: Two-proportion test  0.05   0.8 p₁=0.05, p₂=0.10                  435

# Sample size determination for various two-sample tests

# Function for two-sample t-test sample size
sample_size_two_sample_t <- function(alpha, beta, delta, sigma_pooled) {
  t_alpha2 <- qt(1 - alpha/2, df = Inf)  # Use normal approximation for large samples
  t_beta <- qt(1 - beta, df = Inf)
  n <- 2 * ((t_alpha2 + t_beta) * sigma_pooled / delta)^2
  return(ceiling(n))
}

# Function for two-proportion test sample size
sample_size_two_prop <- function(alpha, beta, p1, p2) {
  z_alpha2 <- qnorm(1 - alpha/2)
  z_beta <- qnorm(1 - beta)
  p_bar <- (p1 + p2) / 2
  delta <- abs(p1 - p2)
  
  n <- (z_alpha2 * sqrt(2 * p_bar * (1 - p_bar)) + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2)))^2 / delta^2
  return(ceiling(n))
}

# Sample size examples
sample_size_examples <- data.table(
  Test_Type = c("Two-sample t-test", "Two-sample t-test", "Two-sample t-test",
               "Two-proportion test", "Two-proportion test", "Two-proportion test"),
  Alpha = rep(0.05, 6),
  Power = rep(0.80, 6),
  Effect_Size = c("δ=5, σ=10", "δ=3, σ=10", "δ=1, σ=10",
                 "p₁=0.2, p₂=0.3", "p₁=0.1, p₂=0.15", "p₁=0.05, p₂=0.10"),
  Required_n_per_group = c(
    sample_size_two_sample_t(0.05, 0.20, 5, 10),
    sample_size_two_sample_t(0.05, 0.20, 3, 10),
    sample_size_two_sample_t(0.05, 0.20, 1, 10),
    sample_size_two_prop(0.05, 0.20, 0.2, 0.3),
    sample_size_two_prop(0.05, 0.20, 0.1, 0.15),
    sample_size_two_prop(0.05, 0.20, 0.05, 0.10)
  )
)

print("Sample Size Requirements for Two-Sample Tests:")
print(sample_size_examples)

1.12 Effect Size Analysis

R Output
R Code

Error in parse(text = input): <text>:17:3: unexpected symbol
16:   ),
17: F value
      ^

# Calculate effect sizes for all examples in the chapter

effect_size_summary <- data.table(
  Example = c("Cable Strength (Z-test)", "Plastic Strength (t-test)", 
             "Process Improvement (Paired)", "Defect Rate Comparison", 
             "Instrument Variance", "Process Yield (ANOVA)"),
  Effect_Type = c("Cohen's d", "Cohen's d", "Cohen's d", 
                 "Difference in proportions", "Variance ratio", "Eta-squared"),
  Calculated_Effect = c(
    abs(xbar1 - xbar2) / sqrt((sigma1^2 + sigma2^2)/2),  # Cable strength
    abs(xbar1_t - xbar2_t) / sp,                         # Plastic strength
    abs(d_bar) / s_d,                                    # Process improvement
    abs(p1_hat - p2_hat),                                # Defect rates
    max(s1_f_squared, s2_f_squared) / min(s1_f_squared, s2_f_squared), # Variance ratio
    SST / SS_Total                                       # ANOVA eta-squared
  ),
F value`[1]) < 0.001
  )
)

print("Validation of Manual vs R Calculations:")
print(validation_checks)

# Check if all validations passed
all_passed <- all(validation_checks$Match)
print(paste("All validation checks passed:", all_passed))

if (!all_passed) {
  print("WARNING: Some manual calculations do not match R functions")
  print("Please review the calculations above")
} else {
  print("SUCCESS: All manual calculations match R function results")
}

print(paste0("\n", strrep("=", 80)))
print("CHAPTER 5 COMPLETE - ALL EXAMPLES VERIFIED")
print(strrep("=", 80))

1.13 Chapter Summary

This chapter covered the fundamental concepts of two-sample comparisons in engineering statistics:

Two-sample Z-tests when population variances are known
Two-sample t-tests for unknown variances (equal and unequal)
Paired t-tests for dependent observations
F-tests for comparing variances
Two-proportion tests for comparing success rates
ANOVA for comparing multiple groups
Experimental design concepts (CRD and RCBD)

The key is choosing the appropriate method based on:

Data structure (independent vs. paired)
Knowledge of population parameters
Assumptions about variance equality
Type of data (continuous vs. categorical)