1 Decision Making for a Single Sample

Tip: Major Themes of Chapter 4
  • Statistical Inference: Making decisions about population parameters based on sample data
  • Point Estimation: Estimating specific values of population parameters
  • Hypothesis Testing: Testing claims about population parameters
  • Confidence Intervals: Estimating ranges of plausible values for parameters
Important: Learning Objectives

After careful study of this chapter, you should be able to do the following:

  1. Understand the concepts of statistical inference and point estimation.
  2. Formulate and test statistical hypotheses using appropriate procedures.
  3. Calculate and interpret P-values for hypothesis tests.
  4. Distinguish between one-sided and two-sided hypothesis tests.
  5. Conduct inference on the mean of a population when variance is known and unknown.
  6. Calculate Type II error probabilities and determine sample sizes.
  7. Construct and interpret confidence intervals for means, variances, and proportions.
  8. Perform inference on population variance and population proportions.
  9. Construct prediction intervals and tolerance intervals.
  10. Test for goodness of fit using chi-square tests.

1.1 Statistical Inference

Note: Statistical Inference

Statistical inference is the process of drawing conclusions about a population based on information obtained from a sample. The two main branches are:

  1. Parameter Estimation
  • Estimating the value(s) of unknown population parameter(s)
  2. Hypothesis Testing
  • Testing assumptions (hypotheses) about population parameter(s)

The foundation of statistical inference relies on the sampling distribution of statistics computed from the sample.
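This foundation can be made concrete with a short simulation: draw many samples from the same population and inspect the distribution of the resulting sample means (a minimal sketch; the population values here are illustrative).

```r
# Sampling distribution of the sample mean: for a population with mean
# mu_pop and sd sigma_pop, means of samples of size n_samp are centered
# at mu_pop with standard error sigma_pop / sqrt(n_samp).
set.seed(1)
mu_pop <- 10.5
sigma_pop <- 0.3
n_samp <- 30

sample_means <- replicate(5000, mean(rnorm(n_samp, mean = mu_pop, sd = sigma_pop)))

mean(sample_means) # close to mu_pop
sd(sample_means)   # close to sigma_pop / sqrt(n_samp) ≈ 0.0548
```

The two printed values illustrate the key fact used throughout the chapter: the sample mean is centered at the population mean, with variability that shrinks as n grows.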

1.1.1 Example: Quality Control in Manufacturing

[1] "Quality Control Data Summary:"
       n Mean_Diameter SD_Diameter Mean_Strength SD_Strength Defect_Rate
   <int>         <num>       <num>         <num>       <num>       <num>
1:    30      10.48587   0.2943092      252.6751    12.52692  0.06666667
   Min_Diameter Max_Diameter Q1_Diameter Q3_Diameter
          <num>        <num>       <num>       <num>
1:     9.910015     11.03607    10.29858    10.64659
library(collapse)
library(data.table)
library(ggplot2)
library(scales)
library(kableExtra)
library(gridExtra)

# Quality control example - statistical inference introduction
set.seed(123)
quality_data <- data.table(
  sample_id = 1:30,
  diameter = rnorm(30, mean = 10.5, sd = 0.3),
  strength = rnorm(30, mean = 250, sd = 15),
  defective = sample(c(0, 1), 30, replace = TRUE, prob = c(0.95, 0.05))
)

# Summary statistics using preferred style
quality_summary <- quality_data %>%
  fsummarise(
    n = fnobs(diameter),
    Mean_Diameter = fmean(diameter),
    SD_Diameter = fsd(diameter),
    Mean_Strength = fmean(strength),
    SD_Strength = fsd(strength),
    Defect_Rate = fmean(defective),
    Min_Diameter = fmin(diameter),
    Max_Diameter = fmax(diameter),
    Q1_Diameter = fquantile(diameter, 0.25),
    Q3_Diameter = fquantile(diameter, 0.75)
  )

print("Quality Control Data Summary:")
print(quality_summary)

1.2 Point Estimation

Note: Point Estimation

A point estimator is a statistic used to estimate a population parameter. The point estimate is the specific numerical value of the estimator for a given sample.

Key Properties of Good Estimators:

  • Unbiased: E[θ̂] = θ
  • Efficient: Has minimum variance among all unbiased estimators
  • Consistent: θ̂ converges to θ as n → ∞
  • Sufficient: Uses all information in the sample about θ
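Consistency, for example, can be seen numerically: the absolute error of the sample mean typically shrinks as n grows (a quick sketch; mu_demo and sigma_demo are illustrative, and unbiasedness is demonstrated by simulation later in this section).

```r
# Consistency sketch: the sample mean's absolute error typically
# decreases toward 0 as the sample size grows.
set.seed(2)
mu_demo <- 5
sigma_demo <- 2
ns_demo <- c(10, 1000, 100000)

abs_err <- sapply(ns_demo, function(n) {
  abs(mean(rnorm(n, mean = mu_demo, sd = sigma_demo)) - mu_demo)
})

round(abs_err, 4) # typically decreasing toward 0
```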

1.2.1 Common Point Estimators

[1] "Point Estimates:"
       n Sample_Mean Sample_Median Sample_Variance Sample_SD Sample_Range
   <int>       <num>         <num>           <num>     <num>        <num>
1:    10       12.05         12.05      0.09166667  0.302765          0.9
   Sample_IQR
        <num>
1:       0.45
[1] "Unbiasedness Check (1000 simulations):"
   True_Mu Mean_of_Sample_Means   Bias_Mean True_Sigma_Squared
     <num>                <num>       <num>              <num>
1:      12             12.00797 0.007970092               0.25
   Mean_of_Sample_Vars Bias_Variance
                 <num>         <num>
1:           0.2505641  0.0005641324
# Point estimation examples
# Common point estimators and their properties

# Sample data for demonstration
sample_data <- data.table(
  values = c(12.1, 11.8, 12.3, 11.9, 12.5, 12.0, 11.7, 12.2, 12.4, 11.6)
)

# Point estimators
point_estimates <- sample_data %>%
  fsummarise(
    n = fnobs(values),
    Sample_Mean = fmean(values), # Estimator for μ
    Sample_Median = fmedian(values), # Alternative estimator for μ
    Sample_Variance = fvar(values), # Estimator for σ²
    Sample_SD = fsd(values), # Estimator for σ
    Sample_Range = fmax(values) - fmin(values), # Simple variability measure
    Sample_IQR = fquantile(values, 0.75) - fquantile(values, 0.25)
  )

print("Point Estimates:")
print(point_estimates)

# Demonstrate unbiasedness with simulation
set.seed(456)
n_simulations <- 1000
sample_size <- 10
true_mu <- 12
true_sigma <- 0.5

# Build results with rbindlist(lapply()) rather than rbind() inside a loop,
# which copies the full table on every iteration
simulation_results <- rbindlist(lapply(1:n_simulations, function(i) {
  sample_vals <- rnorm(sample_size, mean = true_mu, sd = true_sigma)
  data.table(sim = i, x_bar = mean(sample_vals), s_squared = var(sample_vals))
}))

# Check unbiasedness
unbiasedness_check <- simulation_results %>%
  fsummarise(
    True_Mu = true_mu,
    Mean_of_Sample_Means = fmean(x_bar),
    Bias_Mean = fmean(x_bar) - true_mu,
    True_Sigma_Squared = true_sigma^2,
    Mean_of_Sample_Vars = fmean(s_squared),
    Bias_Variance = fmean(s_squared) - true_sigma^2
  )

print("Unbiasedness Check (1000 simulations):")
print(unbiasedness_check)

1.3 Hypothesis Testing

1.3.1 Statistical Hypotheses

Note: Statistical Hypotheses

A statistical hypothesis is a statement about the parameter(s) of a population.

  • Null Hypothesis (H₀): The hypothesis being tested (assumed true until evidence suggests otherwise)
  • Alternative Hypothesis (H₁ or Hₐ): The hypothesis we conclude if there is sufficient evidence against H₀

Example:

  • H₀: μ = 50 (The population mean equals 50)
  • H₁: μ ≠ 50 (The population mean does not equal 50)

1.3.2 Testing Statistical Hypotheses

Note: Hypothesis Testing Framework

Steps in Hypothesis Testing:

  1. State the null and alternative hypotheses
  2. Choose the significance level (α)
  3. Determine the test statistic and its distribution
  4. Define the rejection region
  5. Calculate the test statistic from sample data
  6. Make a decision (reject or fail to reject H₀)
  7. Draw conclusions in context of the problem

Types of Errors:

  • Type I Error (α): Rejecting H₀ when it is true
  • Type II Error (β): Failing to reject H₀ when it is false
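The meaning of α can be checked by simulation: when H₀ is true, a level-α test should commit a Type I error in about α of repeated samples (a sketch with illustrative parameters).

```r
# Monte Carlo check of the Type I error rate of a two-sided Z-test.
# Data are generated under H0 (mu = mu0_sim), so every rejection is a
# Type I error; the rejection rate should be close to alpha_sim.
set.seed(3)
mu0_sim <- 100
sigma_sim <- 10
n_sim <- 25
alpha_sim <- 0.05
z_crit_sim <- qnorm(1 - alpha_sim / 2)

rejections <- replicate(10000, {
  xbar <- mean(rnorm(n_sim, mean = mu0_sim, sd = sigma_sim))
  abs((xbar - mu0_sim) / (sigma_sim / sqrt(n_sim))) > z_crit_sim
})

mean(rejections) # close to alpha_sim = 0.05
```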

1.3.3 Example: Hypothesis Testing Setup

[1] "Hypothesis Testing Framework:"
   Null_Hypothesis Alternative_Hypothesis Significance_Level Sample_Size
            <char>                 <char>              <num>       <num>
1:     H0: μ = 100            H1: μ ≠ 100               0.05          25
   Population_SD Critical_Z_Value Critical_Lower_Bound Critical_Upper_Bound
           <num>            <num>                <num>                <num>
1:            10             1.96                96.08               103.92
                         Rejection_Rule
                                 <char>
1: Reject H0 if x̄ < 96.08 or x̄ > 103.92
# Hypothesis testing framework demonstration
# Type I and Type II errors illustration

alpha <- 0.05
mu0 <- 100
sigma <- 10
n <- 25

# Critical values for two-sided test
z_critical <- qnorm(1 - alpha / 2)
critical_lower <- mu0 - z_critical * (sigma / sqrt(n))
critical_upper <- mu0 + z_critical * (sigma / sqrt(n))

hypothesis_framework <- data.table(
  Null_Hypothesis = "H0: μ = 100",
  Alternative_Hypothesis = "H1: μ ≠ 100",
  Significance_Level = alpha,
  Sample_Size = n,
  Population_SD = sigma,
  Critical_Z_Value = round(z_critical, 3),
  Critical_Lower_Bound = round(critical_lower, 2),
  Critical_Upper_Bound = round(critical_upper, 2),
  Rejection_Rule = paste("Reject H0 if x̄ <", round(critical_lower, 2), "or x̄ >", round(critical_upper, 2))
)

print("Hypothesis Testing Framework:")
print(hypothesis_framework)

1.3.4 P-Values in Hypothesis Testing

Note: P-Values

The P-value is the probability of observing a test statistic as extreme or more extreme than the observed value, assuming H₀ is true.

Interpretation:

  • Small P-value (≤ α): Strong evidence against H₀
  • Large P-value (> α): Insufficient evidence against H₀

Decision Rule: Reject H₀ if P-value ≤ α
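These definitions translate directly into R's standard normal CDF (a minimal sketch; the observed statistic z0 is illustrative).

```r
# P-values for an observed standardized test statistic z0: the
# probability of a value at least as extreme under N(0, 1).
z0 <- -2.67

p_two_sided <- 2 * pnorm(-abs(z0)) # H1: mu != mu0
p_upper <- 1 - pnorm(z0)           # H1: mu > mu0
p_lower <- pnorm(z0)               # H1: mu < mu0

round(c(p_two_sided, p_upper, p_lower), 4)
# The small two-sided P-value here is strong evidence against H0 at alpha = 0.05
```

Note that the upper- and lower-tailed P-values always sum to 1, and the two-sided P-value is twice the smaller tail.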

1.3.5 One-Sided and Two-Sided Hypotheses

Note: Types of Hypothesis Tests

Two-Sided (Two-Tailed) Test:

  • H₀: θ = θ₀
  • H₁: θ ≠ θ₀
  • Rejection region: |test statistic| > critical value

One-Sided (One-Tailed) Tests:

Upper-tailed:

  • H₀: θ = θ₀ (or θ ≤ θ₀)
  • H₁: θ > θ₀
  • Rejection region: test statistic > critical value

Lower-tailed:

  • H₀: θ = θ₀ (or θ ≥ θ₀)
  • H₁: θ < θ₀
  • Rejection region: test statistic < critical value

1.3.6 General Procedure for Hypothesis Testing

Figure 1: Comparison of one-sided and two-sided tests
# Visualization of one-sided vs two-sided tests
x <- seq(-4, 4, length.out = 1000)
y <- dnorm(x)

# Create data for visualization
test_comparison_data <- data.table(
  x = rep(x, 3),
  y = rep(y, 3),
  test_type = rep(c("Two-sided (α = 0.05)", "One-sided Upper (α = 0.05)", "One-sided Lower (α = 0.05)"), each = length(x))
)

# Add critical regions
alpha_vis <- 0.05
z_alpha2 <- qnorm(1 - alpha_vis / 2)
z_alpha <- qnorm(1 - alpha_vis)

ggplot(data = test_comparison_data, mapping = aes(x = x, y = y)) +
  geom_line(linewidth = 1, color = "blue") +
  facet_wrap(~test_type, nrow = 3) +
  geom_area(
    data = test_comparison_data[test_type == "Two-sided (α = 0.05)" & (x <= -z_alpha2 | x >= z_alpha2)],
    mapping = aes(x = x, y = y), fill = "red", alpha = 0.3
  ) +
  geom_area(
    data = test_comparison_data[test_type == "One-sided Upper (α = 0.05)" & x >= z_alpha],
    mapping = aes(x = x, y = y), fill = "red", alpha = 0.3
  ) +
  geom_area(
    data = test_comparison_data[test_type == "One-sided Lower (α = 0.05)" & x <= -z_alpha],
    mapping = aes(x = x, y = y), fill = "red", alpha = 0.3
  ) +
  geom_vline(
    data = data.table(test_type = "Two-sided (α = 0.05)", x = c(-z_alpha2, z_alpha2)),
    mapping = aes(xintercept = x), linetype = "dashed", color = "red"
  ) +
  geom_vline(
    data = data.table(test_type = "One-sided Upper (α = 0.05)", x = z_alpha),
    mapping = aes(xintercept = x), linetype = "dashed", color = "red"
  ) +
  geom_vline(
    data = data.table(test_type = "One-sided Lower (α = 0.05)", x = -z_alpha),
    mapping = aes(xintercept = x), linetype = "dashed", color = "red"
  ) +
  labs(
    x = "Z-score",
    y = "Density",
    title = "Comparison of One-sided and Two-sided Tests",
    subtitle = "Red areas represent rejection regions"
  ) +
  theme_classic() +
  theme(
    plot.title = element_text(hjust = 0.5),
    plot.subtitle = element_text(hjust = 0.5),
    strip.background = element_rect(fill = "lightgray")
  )

1.4 Inference on the Mean of a Population, Variance Known

1.4.1 Hypothesis Testing on the Mean

Note: Z-Test for Population Mean (σ known)

When the population is normal with known variance σ², or when n is large, the test statistic is:

Z_0 = \frac{\overline{X} - \mu_0}{\sigma/\sqrt{n}}

Under H₀: μ = μ₀, Z₀ ~ N(0,1)

Rejection Regions:

  • Two-sided: |Z₀| > z_{α/2}
  • Upper-tailed: Z₀ > z_α
  • Lower-tailed: Z₀ < -z_α

1.4.2 Example: Fill Volume Testing

Note: Example

A beverage company fills bottles with a target volume of 355 ml. The filling process has a known standard deviation of σ = 1.5 ml. A quality engineer samples 25 bottles and wants to test if the mean fill volume differs from the target.

[1] "Fill Volume Data Summary:"
       n   Min    Q1 Median    Mean    Q3   Max        SD Range
   <int> <num> <num>  <num>   <num> <num> <num>     <num> <num>
1:    25 354.3 354.7  354.9 354.908 355.2 355.5 0.3239341   1.2
# Fill volume testing example (Z-test, sigma known)
fill_volume_data <- data.table(
  bottle_id = 1:25,
  volume = c(
    354.8, 355.2, 354.5, 355.1, 354.9, 355.3, 354.7, 355.0, 354.6, 355.4,
    354.3, 355.5, 354.8, 354.9, 355.2, 354.4, 355.1, 354.7, 355.3, 354.5,
    355.0, 354.8, 355.2, 354.6, 354.9
  )
)

# Summary statistics
fill_summary <- fill_volume_data %>%
  fsummarise(
    n = fnobs(volume),
    Min = fmin(volume),
    Q1 = fquantile(volume, 0.25),
    Median = fmedian(volume),
    Mean = fmean(volume),
    Q3 = fquantile(volume, 0.75),
    Max = fmax(volume),
    SD = fsd(volume),
    Range = fmax(volume) - fmin(volume)
  )

print("Fill Volume Data Summary:")
print(fill_summary)

Manual Calculation:

Given: n = 25, σ = 1.5, μ₀ = 355, α = 0.05

  1. Hypotheses:

    • H₀: μ = 355
    • H₁: μ ≠ 355
  2. Test Statistic:

    • If x̄ = 354.2, then Z₀ = (354.2 - 355)/(1.5/√25) = -2.67
  3. Critical Value: z₀.₀₂₅ = 1.96

  4. Decision: |Z₀| = 2.67 > 1.96, so reject H₀

[1] "Z-Test Results for Fill Volume:"
          Test_Type     n Sample_Mean Null_Mean Population_SD Z_Statistic
             <char> <int>       <num>     <num>         <num>       <num>
1: Z-test (σ known)    25     354.908       355           1.5     -0.3067
   P_Value Alpha Z_Critical          Decision                       Conclusion
     <num> <num>      <num>            <char>                           <char>
1:  0.7591  0.05       1.96 Fail to reject H0 Insufficient evidence against H0
[1] "95% Confidence Interval: [ 354.32 ,  355.496 ]"
# Z-test for fill volume (manual and R calculations)
n_fill <- fnobs(fill_volume_data$volume)
xbar_fill <- fmean(fill_volume_data$volume)
sigma_fill <- 1.5 # Known population standard deviation
mu0_fill <- 355
alpha_fill <- 0.05

# Manual calculation
z_stat_fill <- (xbar_fill - mu0_fill) / (sigma_fill / sqrt(n_fill))
p_value_fill <- 2 * (1 - pnorm(abs(z_stat_fill))) # Two-sided test
z_critical_fill <- qnorm(1 - alpha_fill / 2)

# Test results
z_test_results <- data.table(
  Test_Type = "Z-test (σ known)",
  n = n_fill,
  Sample_Mean = round(xbar_fill, 3),
  Null_Mean = mu0_fill,
  Population_SD = sigma_fill,
  Z_Statistic = round(z_stat_fill, 4),
  P_Value = round(p_value_fill, 4),
  Alpha = alpha_fill,
  Z_Critical = round(z_critical_fill, 3),
  Decision = ifelse(abs(z_stat_fill) > z_critical_fill, "Reject H0", "Fail to reject H0"),
  Conclusion = ifelse(p_value_fill < alpha_fill,
    "Significant evidence against H0",
    "Insufficient evidence against H0"
  )
)

print("Z-Test Results for Fill Volume:")
print(z_test_results)

# Confidence interval
margin_error_fill <- z_critical_fill * (sigma_fill / sqrt(n_fill))
ci_lower_fill <- xbar_fill - margin_error_fill
ci_upper_fill <- xbar_fill + margin_error_fill

print(paste("95% Confidence Interval: [", round(ci_lower_fill, 3), ", ", round(ci_upper_fill, 3), "]"))

1.4.3 Type II Error and Choice of Sample Size

Note: Type II Error and Power

Type II Error (β): The probability of failing to reject H₀ when it is false.

Power (1-β): The probability of correctly rejecting a false H₀.

For a two-sided test:

\beta = P\left(-z_{\alpha/2} - \frac{\delta}{\sigma/\sqrt{n}} \leq Z \leq z_{\alpha/2} - \frac{\delta}{\sigma/\sqrt{n}}\right)

where δ = |μ₁ - μ₀| is the difference we want to detect.

Sample Size Formula: n = \frac{(z_{\alpha/2} + z_\beta)^2 \sigma^2}{\delta^2}
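The sample size formula translates directly into R; for instance, for the fill-volume setting with σ = 1.5, detecting a shift of δ = 1 ml at α = 0.05 with power 0.90 (the δ and power values are illustrative choices, not from the example above).

```r
# Sample size for a two-sided Z-test:
#   n = (z_{alpha/2} + z_beta)^2 * sigma^2 / delta^2
alpha_req <- 0.05
beta_req <- 0.10 # power = 1 - beta = 0.90
sigma_req <- 1.5 # known process sd (ml)
delta_req <- 1   # difference to detect (ml), illustrative

z_a2_req <- qnorm(1 - alpha_req / 2)
z_b_req <- qnorm(1 - beta_req)

n_required <- ceiling((z_a2_req + z_b_req)^2 * sigma_req^2 / delta_req^2)
n_required # → 24 for these values (always round up)
```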

1.4.4 Example: Power Analysis

Figure 2: Power curves for different sample sizes
# Power curves for different sample sizes
mu_values <- seq(353, 357, by = 0.1)
sample_sizes <- c(10, 25, 50, 100)
alpha_power <- 0.05
sigma_power <- 1.5
mu0_power <- 355

# Function to calculate power for two-sided Z-test
calculate_power_z <- function(mu1, n, mu0, sigma, alpha) {
  delta <- abs(mu1 - mu0)
  z_alpha2 <- qnorm(1 - alpha / 2)
  se <- sigma / sqrt(n)

  # Power calculation for two-sided test
  power_upper <- 1 - pnorm(z_alpha2 - delta / se)
  power_lower <- pnorm(-z_alpha2 - delta / se)
  power_total <- power_upper + power_lower

  return(power_total)
}

# Generate power curve data (calculate_power_z() is vectorized over mu,
# so each sample size is handled in a single call)
power_curve_data <- rbindlist(lapply(sample_sizes, function(n_val) {
  data.table(
    mu = mu_values,
    n = factor(n_val),
    power = calculate_power_z(mu_values, n_val, mu0_power, sigma_power, alpha_power)
  )
}))

# Plot power curves
ggplot(data = power_curve_data, mapping = aes(x = mu, y = power, color = n)) +
  geom_line(linewidth = 1.2) +
  geom_vline(xintercept = mu0_power, linetype = "dashed", alpha = 0.7) +
  geom_hline(yintercept = 0.8, linetype = "dotted", alpha = 0.7) +
  annotate("text", x = 356.5, y = 0.82, label = "Power = 0.8", hjust = 0) +
  scale_x_continuous(breaks = pretty_breaks(n = 8)) +
  scale_y_continuous(breaks = pretty_breaks(n = 6), limits = c(0, 1)) +
  labs(
    x = "True Mean (μ)",
    y = "Power (1 - β)",
    color = "Sample Size",
    title = "Power Curves for Z-Test (Fill Volume)",
    subtitle = "H₀: μ = 355 vs H₁: μ ≠ 355, α = 0.05, σ = 1.5"
  ) +
  theme_classic() +
  theme(
    plot.title = element_text(hjust = 0.5),
    plot.subtitle = element_text(hjust = 0.5),
    legend.position = "right"
  )

1.4.5 Large-Sample Test

Note: Large-Sample Test

When n is large (typically n ≥ 30), the Central Limit Theorem allows us to use the normal distribution even when:

  • The population distribution is unknown
  • The population variance is unknown (use the sample variance s²)

The test statistic becomes: Z_0 = \frac{\overline{X} - \mu_0}{S/\sqrt{n}}

1.4.6 Some Practical Comments on Hypothesis Testing

Note: Practical Considerations
  1. Statistical vs. Practical Significance: A statistically significant result may not be practically important
  2. Effect Size: Consider the magnitude of the difference, not just its significance
  3. Sample Size: Larger samples can detect smaller differences
  4. Assumptions: Verify that test assumptions are met
  5. Multiple Testing: Adjust significance levels when conducting multiple tests

1.4.7 Confidence Interval on the Mean

Note: Confidence Interval for μ (σ known)

A 100(1-α)% confidence interval for μ when σ is known:

\overline{x} \pm z_{\alpha/2} \frac{\sigma}{\sqrt{n}}

Interpretation: We are 100(1-α)% confident that the true population mean lies within this interval.
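The "confidence" here refers to the long-run coverage of the procedure, which a simulation makes concrete: about 95% of the intervals constructed from repeated samples should contain the true mean (a sketch with illustrative parameters).

```r
# Coverage check for the 95% z-interval: xbar +/- z_{alpha/2} * sigma / sqrt(n).
set.seed(4)
mu_cov <- 355
sigma_cov <- 1.5
n_cov <- 25
alpha_cov <- 0.05
me_cov <- qnorm(1 - alpha_cov / 2) * sigma_cov / sqrt(n_cov)

covered <- replicate(10000, {
  xbar <- mean(rnorm(n_cov, mean = mu_cov, sd = sigma_cov))
  (xbar - me_cov <= mu_cov) && (mu_cov <= xbar + me_cov)
})

mean(covered) # close to 0.95
```

Each individual interval either contains μ or it does not; the 95% describes the procedure, not any single interval.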

1.4.8 Example: Confidence Interval

[1] "Confidence Intervals for Different Confidence Levels:"
   Confidence_Level Alpha Z_Critical Margin_Error Lower_Bound Upper_Bound Width
             <char> <num>      <num>        <num>       <num>       <num> <num>
1:              90%  0.10      1.645       0.4935     354.415     355.401 0.987
2:              95%  0.05      1.960       0.5880     354.320     355.496 1.176
3:              99%  0.01      2.576       0.7727     354.135     355.681 1.545
# Confidence interval examples for different confidence levels
confidence_levels <- c(0.90, 0.95, 0.99)
ci_results <- data.table()

for (conf_level in confidence_levels) {
  alpha_ci <- 1 - conf_level
  z_crit <- qnorm(1 - alpha_ci / 2)
  margin_error <- z_crit * (sigma_fill / sqrt(n_fill))
  ci_lower <- xbar_fill - margin_error
  ci_upper <- xbar_fill + margin_error
  ci_width <- ci_upper - ci_lower

  ci_results <- rbind(
    ci_results,
    data.table(
      Confidence_Level = paste0(conf_level * 100, "%"),
      Alpha = alpha_ci,
      Z_Critical = round(z_crit, 3),
      Margin_Error = round(margin_error, 4),
      Lower_Bound = round(ci_lower, 3),
      Upper_Bound = round(ci_upper, 3),
      Width = round(ci_width, 3)
    )
  )
}

print("Confidence Intervals for Different Confidence Levels:")
print(ci_results)

1.4.9 General Method for Deriving a Confidence Interval

Note: General CI Method

Steps to derive a confidence interval:

  1. Find a statistic θ̂ that estimates θ
  2. Determine the sampling distribution of θ̂
  3. Find constants a and b such that P(a ≤ θ̂ ≤ b) = 1-α
  4. Rearrange the inequality to isolate θ
  5. The resulting interval is the confidence interval for θ

1.5 Inference on the Mean of a Population, Variance Unknown

1.5.1 Hypothesis Testing on the Mean

Note: t-Test for Population Mean (σ unknown)

When the population is normal with unknown variance, the test statistic is:

T_0 = \frac{\overline{X} - \mu_0}{S/\sqrt{n}}

Under H₀: μ = μ₀, T₀ ~ t_{n-1}

Properties of t-distribution:

  • Symmetric about 0
  • More spread than standard normal
  • Approaches standard normal as df → ∞
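The convergence of the t distribution to the standard normal is easy to see by comparing critical values across degrees of freedom (a quick sketch).

```r
# 97.5th percentile of the t distribution for increasing df,
# versus the standard normal limit qnorm(0.975) ≈ 1.96.
df_values <- c(5, 15, 30, 120, 1000)
t_crit <- qt(0.975, df = df_values)

round(rbind(df = df_values, t_crit = t_crit), 3)
qnorm(0.975) # limiting value as df -> infinity
```

The extra spread of the t distribution is what makes t-based critical values (and intervals) wider than their normal counterparts for small samples.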

1.5.2 Example: Material Strength Testing

Note: Example

An engineer tests the compressive strength of concrete. A sample of 16 specimens yields a mean strength of 3250 psi with a standard deviation of 180 psi. Test whether the mean strength differs from 3200 psi. (The simulated specimen data analyzed in the R code below form a separate sample with different summary statistics, so the code reaches a different conclusion than this hand-worked illustration.)

Manual Calculation:

Given: n = 16, x̄ = 3250, s = 180, μ₀ = 3200, α = 0.05

  1. Hypotheses:

    • H₀: μ = 3200
    • H₁: μ ≠ 3200
  2. Test Statistic:

    • T₀ = (3250 - 3200)/(180/√16) = 50/45 = 1.11
  3. Critical Value: t₀.₀₂₅,₁₅ = 2.131

  4. Decision: |T₀| = 1.11 < 2.131, so fail to reject H₀

[1] "Concrete Strength Data Summary:"
       n   Min     Q1 Median    Mean     Q3   Max      SD      Var  SE_Mean
   <int> <num>  <num>  <num>   <num>  <num> <num>   <num>    <num>    <num>
1:    16  3180 3207.5   3235 3233.75 3262.5  3290 37.3943 1398.333 9.348574
# Concrete strength testing example (t-test, sigma unknown)
concrete_data <- data.table(
  specimen_id = 1:16,
  strength = c(
    3180, 3240, 3290, 3210, 3260, 3190, 3270, 3220,
    3250, 3200, 3280, 3230, 3240, 3180, 3290, 3210
  )
)

# Summary statistics
concrete_summary <- concrete_data %>%
  fsummarise(
    n = fnobs(strength),
    Min = fmin(strength),
    Q1 = fquantile(strength, 0.25),
    Median = fmedian(strength),
    Mean = fmean(strength),
    Q3 = fquantile(strength, 0.75),
    Max = fmax(strength),
    SD = fsd(strength),
    Var = fvar(strength),
    SE_Mean = fsd(strength) / sqrt(fnobs(strength))
  )

print("Concrete Strength Data Summary:")
print(concrete_summary)
[1] "t-Test Results for Concrete Strength:"
            Test_Type     n Sample_Mean Sample_SD Null_Mean t_Statistic
               <char> <int>       <num>     <num>     <num>       <num>
1: t-test (σ unknown)    16     3233.75     37.39      3200      3.6102
   Degrees_Freedom P_Value Alpha t_Critical  Decision
             <num>   <num> <num>      <num>    <char>
1:              15  0.0026  0.05      2.131 Reject H0
                         Conclusion
                             <char>
1: Significant difference from 3200
[1] "R's t.test verification:"

    One Sample t-test

data:  concrete_data$strength
t = 3.6102, df = 15, p-value = 0.002571
alternative hypothesis: true mean is not equal to 3200
95 percent confidence interval:
 3213.824 3253.676
sample estimates:
mean of x 
  3233.75 
# t-test for concrete strength
n_concrete <- fnobs(concrete_data$strength)
xbar_concrete <- fmean(concrete_data$strength)
s_concrete <- fsd(concrete_data$strength)
mu0_concrete <- 3200
alpha_concrete <- 0.05

# Manual calculation
t_stat_concrete <- (xbar_concrete - mu0_concrete) / (s_concrete / sqrt(n_concrete))
df_concrete <- n_concrete - 1
p_value_concrete <- 2 * (1 - pt(abs(t_stat_concrete), df = df_concrete)) # Two-sided test
t_critical_concrete <- qt(1 - alpha_concrete / 2, df = df_concrete)

# Test results
t_test_results <- data.table(
  Test_Type = "t-test (σ unknown)",
  n = n_concrete,
  Sample_Mean = round(xbar_concrete, 2),
  Sample_SD = round(s_concrete, 2),
  Null_Mean = mu0_concrete,
  t_Statistic = round(t_stat_concrete, 4),
  Degrees_Freedom = df_concrete,
  P_Value = round(p_value_concrete, 4),
  Alpha = alpha_concrete,
  t_Critical = round(t_critical_concrete, 3),
  Decision = ifelse(abs(t_stat_concrete) > t_critical_concrete, "Reject H0", "Fail to reject H0"),
  Conclusion = ifelse(p_value_concrete < alpha_concrete,
    "Significant difference from 3200",
    "No significant difference from 3200"
  )
)

print("t-Test Results for Concrete Strength:")
print(t_test_results)

# Using R's built-in t.test function for verification
t_test_r <- t.test(concrete_data$strength, mu = mu0_concrete, conf.level = 1 - alpha_concrete)
print("R's t.test verification:")
print(t_test_r)

1.5.3 Type II Error and Choice of Sample Size

Note: Power for t-Test

For the t-test, power calculations are more complex due to the non-central t-distribution. Approximate sample size formula:

n \approx \frac{(t_{\alpha/2,\nu} + t_{\beta,\nu})^2 s^2}{\delta^2}

where ν = n-1 degrees of freedom (requires iteration).
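Base R's power.t.test() performs this iteration automatically; for example, solving for the sample size needed to detect a shift of δ = 30 psi (the δ, sd, and power values below are illustrative, loosely matching the concrete-strength data).

```r
# Solve for the one-sample t-test sample size with power.t.test(),
# which iterates over the non-central t-distribution internally.
pt_result <- power.t.test(
  delta = 30,      # difference to detect (psi), illustrative
  sd = 37.4,       # estimated standard deviation, illustrative
  sig.level = 0.05,
  power = 0.90,
  type = "one.sample",
  alternative = "two.sided"
)

ceiling(pt_result$n) # required sample size (round up)
```

Leaving exactly one of n, delta, sd, sig.level, or power as NULL tells power.t.test() which quantity to solve for.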

1.5.4 Confidence Interval on the Mean

Note: Confidence Interval for μ (σ unknown)

A 100(1-α)% confidence interval for μ when σ is unknown:

\overline{x} \pm t_{\alpha/2,n-1} \frac{s}{\sqrt{n}}

[1] "95% Confidence Interval for Concrete Strength Mean:"
             Parameter Confidence_Level Sample_Mean Standard_Error t_Critical
                <char>           <char>       <num>          <num>      <num>
1: Population Mean (μ)              95%     3233.75          9.349      2.131
   Margin_Error Lower_Bound Upper_Bound Width
          <num>       <num>       <num> <num>
1:       19.926     3213.82     3253.68 39.85
                                        Interpretation
                                                <char>
1: 95% confident that μ is between 3213.82 and 3253.68
# Confidence intervals for t-test
t_alpha2_concrete <- qt(1 - alpha_concrete / 2, df = df_concrete)
margin_error_t <- t_alpha2_concrete * (s_concrete / sqrt(n_concrete))
ci_lower_t <- xbar_concrete - margin_error_t
ci_upper_t <- xbar_concrete + margin_error_t

t_ci_results <- data.table(
  Parameter = "Population Mean (μ)",
  Confidence_Level = "95%",
  Sample_Mean = round(xbar_concrete, 2),
  Standard_Error = round(s_concrete / sqrt(n_concrete), 3),
  t_Critical = round(t_alpha2_concrete, 3),
  Margin_Error = round(margin_error_t, 3),
  Lower_Bound = round(ci_lower_t, 2),
  Upper_Bound = round(ci_upper_t, 2),
  Width = round(ci_upper_t - ci_lower_t, 2),
  Interpretation = paste("95% confident that μ is between", round(ci_lower_t, 2), "and", round(ci_upper_t, 2))
)

print("95% Confidence Interval for Concrete Strength Mean:")
print(t_ci_results)

1.6 Inference on the Variance of a Normal Population

1.6.1 Hypothesis Testing on the Variance of a Normal Population

Note: Chi-Square Test for Variance

To test hypotheses about σ², we use:

\chi_0^2 = \frac{(n-1)S^2}{\sigma_0^2}

Under H₀: σ² = σ₀², this follows χ²_{n-1}

Properties of χ² distribution:

  • Always positive
  • Right-skewed
  • Shape depends on degrees of freedom

1.6.2 Example: Process Variability

Note: Example

A manufacturing process should have a variance in part diameter of σ² = 0.01 mm². A sample of 20 parts gives s² = 0.015 mm². Test whether the variance exceeds the specification. (The simulated part data in the R code below form a separate sample, and the code tests the two-sided alternative, so its conclusion differs from this hand-worked one-sided illustration.)

Manual Calculation:

Given: n = 20, s² = 0.015, σ₀² = 0.01, α = 0.05

  1. Hypotheses:

    • H₀: σ² = 0.01
    • H₁: σ² > 0.01
  2. Test Statistic:

    • χ₀² = (19)(0.015)/0.01 = 28.5
  3. Critical Value: χ²₀.₀₅,₁₉ = 30.14

  4. Decision: χ₀² = 28.5 < 30.14, so fail to reject H₀

[1] "Part Diameter Data Summary:"
       n   Min    Q1 Median   Mean    Q3   Max         SD         Var Range
   <int> <num> <num>  <num>  <num> <num> <num>      <num>       <num> <num>
1:    20  9.95  9.98 10.005 10.006 10.03 10.07 0.03283131 0.001077895  0.12
# Part diameter variance testing example
part_diameter_data <- data.table(
  part_id = 1:20,
  diameter = c(
    10.02, 9.98, 10.05, 9.97, 10.03, 10.01, 9.99, 10.04, 10.00, 9.96,
    10.07, 9.95, 10.02, 10.01, 9.98, 10.03, 9.99, 10.05, 9.97, 10.00
  )
)

# Summary statistics
diameter_summary <- part_diameter_data %>%
  fsummarise(
    n = fnobs(diameter),
    Min = fmin(diameter),
    Q1 = fquantile(diameter, 0.25),
    Median = fmedian(diameter),
    Mean = fmean(diameter),
    Q3 = fquantile(diameter, 0.75),
    Max = fmax(diameter),
    SD = fsd(diameter),
    Var = fvar(diameter),
    Range = fmax(diameter) - fmin(diameter)
  )

print("Part Diameter Data Summary:")
print(diameter_summary)
[1] "Chi-square Test Results for Variance:"
                    Test_Type     n Sample_Variance Sample_SD Null_Variance
                       <char> <int>           <num>     <num>         <num>
1: Chi-square test (variance)    20        0.001078    0.0328          0.01
   Chi2_Statistic Degrees_Freedom P_Value Alpha Chi2_Lower Chi2_Upper  Decision
            <num>           <num>   <num> <num>      <num>      <num>    <char>
1:          2.048              19       0  0.05     8.9065    32.8523 Reject H0
                                   Conclusion
                                       <char>
1: Variance significantly different from 0.01
# Chi-square test for variance
n_diameter <- fnobs(part_diameter_data$diameter)
s2_diameter <- fvar(part_diameter_data$diameter)
sigma2_0 <- 0.01 # Null hypothesis: σ² = 0.01
alpha_chi <- 0.05

# Manual calculation
chi2_stat <- (n_diameter - 1) * s2_diameter / sigma2_0
df_chi <- n_diameter - 1

# P-value for two-sided test
p_value_lower_chi <- pchisq(chi2_stat, df = df_chi)
p_value_upper_chi <- 1 - pchisq(chi2_stat, df = df_chi)
p_value_chi2 <- 2 * min(p_value_lower_chi, p_value_upper_chi)

# Critical values
chi2_lower <- qchisq(alpha_chi / 2, df = df_chi)
chi2_upper <- qchisq(1 - alpha_chi / 2, df = df_chi)

# Test results
chi2_test_results <- data.table(
  Test_Type = "Chi-square test (variance)",
  n = n_diameter,
  Sample_Variance = round(s2_diameter, 6),
  Sample_SD = round(sqrt(s2_diameter), 4),
  Null_Variance = sigma2_0,
  Chi2_Statistic = round(chi2_stat, 4),
  Degrees_Freedom = df_chi,
  P_Value = round(p_value_chi2, 4),
  Alpha = alpha_chi,
  Chi2_Lower = round(chi2_lower, 4),
  Chi2_Upper = round(chi2_upper, 4),
  Decision = ifelse(chi2_stat < chi2_lower | chi2_stat > chi2_upper,
    "Reject H0", "Fail to reject H0"
  ),
  Conclusion = ifelse(p_value_chi2 < alpha_chi,
    "Variance significantly different from 0.01",
    "Variance not significantly different from 0.01"
  )
)

print("Chi-square Test Results for Variance:")
print(chi2_test_results)

1.6.3 Confidence Interval on the Variance of a Normal Population

Note: Confidence Interval for σ²

A 100(1-α)% confidence interval for σ²:

\frac{(n-1)s^2}{\chi_{\alpha/2,n-1}^2} \leq \sigma^2 \leq \frac{(n-1)s^2}{\chi_{1-\alpha/2,n-1}^2}

For σ, take the square root of the endpoints.

[1] "95% Confidence Intervals for Variance and Standard Deviation:"
                Parameter Sample_Estimate Confidence_Level Lower_Bound
                   <char>           <num>           <char>       <num>
1:          Variance (σ²)        0.001078              95%    0.000623
2: Standard Deviation (σ)        0.032800              95%    0.025000
   Upper_Bound    Width
         <num>    <num>
1:    0.002299 0.001676
2:    0.048000 0.023000
# Confidence interval for variance and standard deviation
ci_var_lower <- (df_chi * s2_diameter) / chi2_upper
ci_var_upper <- (df_chi * s2_diameter) / chi2_lower
ci_sd_lower <- sqrt(ci_var_lower)
ci_sd_upper <- sqrt(ci_var_upper)

variance_ci_results <- data.table(
  Parameter = c("Variance (σ²)", "Standard Deviation (σ)"),
  Sample_Estimate = c(round(s2_diameter, 6), round(sqrt(s2_diameter), 4)),
  Confidence_Level = c("95%", "95%"),
  Lower_Bound = c(round(ci_var_lower, 6), round(ci_sd_lower, 4)),
  Upper_Bound = c(round(ci_var_upper, 6), round(ci_sd_upper, 4)),
  Width = c(round(ci_var_upper - ci_var_lower, 6), round(ci_sd_upper - ci_sd_lower, 4))
)

print("95% Confidence Intervals for Variance and Standard Deviation:")
print(variance_ci_results)

1.7 Inference on a Population Proportion

1.7.1 Hypothesis Testing on a Binomial Proportion

Note: Large-Sample Test for Proportion

For large n, the test statistic for H₀: p = p₀ is:

Z_0 = \frac{\hat{p} - p_0}{\sqrt{p_0(1-p_0)/n}}

where \hat{p} = X/n is the sample proportion.

Rule of thumb: Use when np₀ ≥ 5 and n(1-p₀) ≥ 5

1.7.2 Example: Defect Rate Testing

Note: Example

A supplier claims their defect rate is 3%. An inspector examines 400 items and finds 18 defective. Test if the defect rate exceeds 3%.

Manual Calculation:

Given: n = 400, x = 18, p₀ = 0.03, α = 0.05

  1. Hypotheses:

    • H₀: p = 0.03
    • H₁: p > 0.03
  2. Sample Proportion:

    • p̂ = 18/400 = 0.045
  3. Test Statistic:

    • Z₀ = (0.045 - 0.03)/√(0.03×0.97/400) = 0.015/0.00853 = 1.76
  4. Critical Value: z₀.₀₅ = 1.645

  5. Decision: Z₀ = 1.76 > 1.645, so reject H₀

[1] "Defect Rate Data Summary:"
       n     x Sample_Proportion Sample_Percent
   <num> <num>             <num>          <num>
1:   400    18             0.045            4.5
# Defect rate testing example
defect_data <- data.table(
  inspector_id = 1,
  items_examined = 400,
  defective_found = 18,
  p_hat = 18 / 400
)

# Summary
defect_summary <- defect_data %>%
  fsummarise(
    n = items_examined,
    x = defective_found,
    Sample_Proportion = round(fmean(p_hat), 4),
    Sample_Percent = round(fmean(p_hat) * 100, 2)
  )

print("Defect Rate Data Summary:")
print(defect_summary)
[1] "Proportion Test Results:"
             Test_Type     n     x Sample_Proportion Null_Proportion   np0
                <char> <num> <num>             <num>           <num> <num>
1: Z-test (proportion)   400    18             0.045            0.03    12
   n_1_p0 Conditions_Met Z_Statistic P_Value Alpha Z_Critical  Decision
    <num>         <lgcl>       <num>   <num> <num>      <num>    <char>
1:    388           TRUE      1.7586  0.0393  0.05      1.645 Reject H0
                             Conclusion
                                 <char>
1: Defect rate significantly exceeds 3%
[1] "R's prop.test verification:"

    1-sample proportions test with continuity correction

data:  x_prop out of n_prop, null probability p0_prop
X-squared = 2.5988, df = 1, p-value = 0.05347
alternative hypothesis: true p is greater than 0.03
95 percent confidence interval:
 0.02977218 1.00000000
sample estimates:
    p 
0.045 
# Z-test for proportion
n_prop <- defect_data$items_examined
x_prop <- defect_data$defective_found
p_hat_prop <- x_prop / n_prop
p0_prop <- 0.03 # Null hypothesis: p = 0.03
alpha_prop <- 0.05

# Check conditions for large-sample test
np0 <- n_prop * p0_prop
n_1_p0 <- n_prop * (1 - p0_prop)
conditions_met <- (np0 >= 5) & (n_1_p0 >= 5)

# Manual calculation
z_stat_prop <- (p_hat_prop - p0_prop) / sqrt(p0_prop * (1 - p0_prop) / n_prop)
p_value_prop <- 1 - pnorm(z_stat_prop) # One-sided upper test
z_critical_prop <- qnorm(1 - alpha_prop)

# Test results
prop_test_results <- data.table(
  Test_Type = "Z-test (proportion)",
  n = n_prop,
  x = x_prop,
  Sample_Proportion = round(p_hat_prop, 4),
  Null_Proportion = p0_prop,
  np0 = np0,
  n_1_p0 = n_1_p0,
  Conditions_Met = conditions_met,
  Z_Statistic = round(z_stat_prop, 4),
  P_Value = round(p_value_prop, 4),
  Alpha = alpha_prop,
  Z_Critical = round(z_critical_prop, 3),
  Decision = ifelse(z_stat_prop > z_critical_prop, "Reject H0", "Fail to reject H0"),
  Conclusion = ifelse(p_value_prop < alpha_prop,
    "Defect rate significantly exceeds 3%",
    "No significant evidence that defect rate exceeds 3%"
  )
)

print("Proportion Test Results:")
print(prop_test_results)

# Using R's prop.test for verification. Note that prop.test applies a
# continuity correction by default, so its one-sided p-value (0.053) slightly
# exceeds alpha = 0.05 even though the uncorrected z-test rejects H0;
# pass correct = FALSE to reproduce the manual calculation
prop_test_r <- prop.test(x = x_prop, n = n_prop, p = p0_prop, alternative = "greater")
print("R's prop.test verification:")
print(prop_test_r)

1.7.3 Type II Error and Choice of Sample Size

NoteSample Size for Proportion Test

For specified α, β, and difference δ = |p₁ - p₀|:

n = \frac{[z_\alpha\sqrt{p_0(1-p_0)} + z_\beta\sqrt{p_1(1-p_1)}]^2}{\delta^2}

For a two-sided test, replace z_α with z_{α/2}.
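The formula above is straightforward to evaluate in R. The scenario values below (p₀ = 0.03, p₁ = 0.05, α = 0.05, β = 0.10) are illustrative assumptions, not values from the text:

```r
# Sketch: required n for a one-sided proportion test (illustrative values)
p0 <- 0.03   # hypothesized proportion (assumed)
p1 <- 0.05   # alternative proportion to detect (assumed)
alpha <- 0.05
beta <- 0.10
z_alpha <- qnorm(1 - alpha)
z_beta <- qnorm(1 - beta)
delta <- abs(p1 - p0)
n_required <- (z_alpha * sqrt(p0 * (1 - p0)) +
  z_beta * sqrt(p1 * (1 - p1)))^2 / delta^2
ceiling(n_required) # round up to the next whole observation; 784 here
```

Detecting a small shift in a small proportion requires a large sample, which is why δ appears squared in the denominator.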

1.7.4 Confidence Interval on a Binomial Proportion

NoteConfidence Interval for p

A large-sample 100(1-α)% confidence interval for p:

\hat{p} \pm z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}

Wilson Score Interval (better for small samples): \frac{\hat{p} + \frac{z_{\alpha/2}^2}{2n} \pm z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n} + \frac{z_{\alpha/2}^2}{4n^2}}}{1 + \frac{z_{\alpha/2}^2}{n}}

[1] "95% Confidence Intervals for Proportion:"
            Method Sample_Proportion Confidence_Level Lower_Bound Upper_Bound
            <char>             <num>           <char>       <num>       <num>
1:     Standard CI             0.045              95%      0.0247      0.0653
2: Wilson Score CI             0.045              95%      0.0287      0.0700
   Lower_Percent Upper_Percent  Width
           <num>         <num>  <num>
1:          2.47          6.53 0.0406
2:          2.87          7.00 0.0414
# Confidence intervals for proportion
z_alpha2_prop <- qnorm(1 - alpha_prop / 2)

# Standard confidence interval
margin_error_prop <- z_alpha2_prop * sqrt(p_hat_prop * (1 - p_hat_prop) / n_prop)
ci_lower_prop <- p_hat_prop - margin_error_prop
ci_upper_prop <- p_hat_prop + margin_error_prop

# Wilson score interval (more accurate for small samples)
wilson_adjustment <- (z_alpha2_prop^2) / (2 * n_prop)
wilson_denominator <- 1 + (z_alpha2_prop^2) / n_prop
wilson_center <- (p_hat_prop + wilson_adjustment) / wilson_denominator
wilson_margin <- z_alpha2_prop * sqrt((p_hat_prop * (1 - p_hat_prop) / n_prop + (z_alpha2_prop^2) / (4 * n_prop^2)) / wilson_denominator^2)
wilson_lower <- wilson_center - wilson_margin
wilson_upper <- wilson_center + wilson_margin

prop_ci_results <- data.table(
  Method = c("Standard CI", "Wilson Score CI"),
  Sample_Proportion = c(round(p_hat_prop, 4), round(p_hat_prop, 4)),
  Confidence_Level = c("95%", "95%"),
  Lower_Bound = c(round(ci_lower_prop, 4), round(wilson_lower, 4)),
  Upper_Bound = c(round(ci_upper_prop, 4), round(wilson_upper, 4)),
  Lower_Percent = c(round(ci_lower_prop * 100, 2), round(wilson_lower * 100, 2)),
  Upper_Percent = c(round(ci_upper_prop * 100, 2), round(wilson_upper * 100, 2)),
  Width = c(round(ci_upper_prop - ci_lower_prop, 4), round(wilson_upper - wilson_lower, 4))
)

print("95% Confidence Intervals for Proportion:")
print(prop_ci_results)

1.8 Other Interval Estimates for a Single Sample

1.8.1 Prediction Interval

NotePrediction Interval

A prediction interval provides bounds for a future observation from the same population.

For a normal population with unknown σ: \overline{x} \pm t_{\alpha/2,n-1}s\sqrt{1 + \frac{1}{n}}

Note: Prediction intervals are always wider than confidence intervals because they account for both sampling variability and the variability of individual observations.

1.8.2 Example: Prediction Interval

[1] "Comparison of Confidence and Prediction Intervals:"
             Interval_Type              Purpose Sample_Mean Margin_Error
                    <char>               <char>       <num>        <num>
1: 95% Confidence Interval  For population mean     3233.75        19.93
2: 95% Prediction Interval For next observation     3233.75        82.16
   Lower_Bound Upper_Bound  Width
         <num>       <num>  <num>
1:     3213.82     3253.68  39.85
2:     3151.59     3315.91 164.31
# Prediction interval example
# Using concrete strength data
prediction_data <- concrete_data %>%
  fsummarise(
    n = fnobs(strength),
    Sample_Mean = fmean(strength),
    Sample_SD = fsd(strength)
  )

# Calculate prediction interval for next observation
t_alpha2_pred <- qt(1 - alpha_concrete / 2, df = prediction_data$n - 1)
prediction_margin <- t_alpha2_pred * prediction_data$Sample_SD * sqrt(1 + 1 / prediction_data$n)
pred_lower <- prediction_data$Sample_Mean - prediction_margin
pred_upper <- prediction_data$Sample_Mean + prediction_margin

# Compare with confidence interval
conf_margin <- t_alpha2_pred * prediction_data$Sample_SD / sqrt(prediction_data$n)
conf_lower <- prediction_data$Sample_Mean - conf_margin
conf_upper <- prediction_data$Sample_Mean + conf_margin

interval_comparison <- data.table(
  Interval_Type = c("95% Confidence Interval", "95% Prediction Interval"),
  Purpose = c("For population mean", "For next observation"),
  Sample_Mean = c(round(prediction_data$Sample_Mean, 2), round(prediction_data$Sample_Mean, 2)),
  Margin_Error = c(round(conf_margin, 2), round(prediction_margin, 2)),
  Lower_Bound = c(round(conf_lower, 2), round(pred_lower, 2)),
  Upper_Bound = c(round(conf_upper, 2), round(pred_upper, 2)),
  Width = c(round(conf_upper - conf_lower, 2), round(pred_upper - pred_lower, 2))
)

print("Comparison of Confidence and Prediction Intervals:")
print(interval_comparison)

1.8.3 Tolerance Intervals for a Normal Distribution

NoteTolerance Intervals

A tolerance interval is an interval that contains at least a specified proportion P of the population with confidence level 100(1-α)%.

Two-sided tolerance interval: \overline{x} \pm ks

where k is a tolerance interval factor that depends on n, P, and 1-α.

Common applications:

  • Quality control specifications
  • Process capability studies
  • Engineering design limits
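Exact k factors come from tables, but Howe's approximation reproduces them closely; this sketch assumes that parameterization (content P, confidence 1 - α):

```r
# Howe's approximation to the two-sided normal tolerance factor k
# (an approximation to the tabled value, not the exact factor)
k_factor_howe <- function(n, P = 0.95, conf = 0.95) {
  z <- qnorm((1 + P) / 2)              # normal quantile for content P
  chi2 <- qchisq(1 - conf, df = n - 1) # lower-tail chi-square quantile
  sqrt((n - 1) * (1 + 1 / n) * z^2 / chi2)
}
k_factor_howe(16) # approximately 2.90, close to the tabled 2.903
```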

1.8.4 Example: Tolerance Interval

[1] "Tolerance Interval Results:"
                Interval_Type                                        Purpose
                       <char>                                         <char>
1: 95%/95% Tolerance Interval Contains 95% of population with 95% confidence
   Sample_Mean Sample_SD k_Factor Lower_Bound Upper_Bound  Width
         <num>     <num>    <num>       <num>       <num>  <num>
1:     3233.75     37.39    2.903     3125.19     3342.31 217.11
                                                                 Interpretation
                                                                         <char>
1: 95% confident that 95% of concrete strengths are between 3125.19 and 3342.31
# Tolerance interval example
# For 95% of population with 95% confidence
# Using tolerance interval factors (simplified version)

# Approximate tolerance interval factors for n=16, 95% content, 95% confidence
k_factor <- 2.903 # This would typically come from statistical tables

tolerance_lower <- prediction_data$Sample_Mean - k_factor * prediction_data$Sample_SD
tolerance_upper <- prediction_data$Sample_Mean + k_factor * prediction_data$Sample_SD

tolerance_results <- data.table(
  Interval_Type = "95%/95% Tolerance Interval",
  Purpose = "Contains 95% of population with 95% confidence",
  Sample_Mean = round(prediction_data$Sample_Mean, 2),
  Sample_SD = round(prediction_data$Sample_SD, 2),
  k_Factor = k_factor,
  Lower_Bound = round(tolerance_lower, 2),
  Upper_Bound = round(tolerance_upper, 2),
  Width = round(tolerance_upper - tolerance_lower, 2),
  Interpretation = paste(
    "95% confident that 95% of concrete strengths are between",
    round(tolerance_lower, 2), "and", round(tolerance_upper, 2)
  )
)

print("Tolerance Interval Results:")
print(tolerance_results)

1.9 Summary Tables of Inference Procedures for a Single Sample

Table 1: Summary of inference procedures for a single sample

  Parameter          Test Statistic                Distribution   Confidence Interval                        Assumptions
  Mean (σ known)     Z = (x̄ - μ₀)/(σ/√n)          N(0,1)         x̄ ± z_{α/2}σ/√n                           Normal population or large n
  Mean (σ unknown)   t = (x̄ - μ₀)/(s/√n)          t_{n-1}        x̄ ± t_{α/2,n-1}s/√n                       Normal population
  Variance           χ² = (n-1)s²/σ₀²              χ²_{n-1}       ((n-1)s²/χ²_{α/2}, (n-1)s²/χ²_{1-α/2})     Normal population
  Proportion         Z = (p̂ - p₀)/√(p₀(1-p₀)/n)   N(0,1)         p̂ ± z_{α/2}√(p̂(1-p̂)/n)                  Large n, np≥5, n(1-p)≥5
# Summary table of inference procedures
inference_summary <- data.table(
  Parameter = c("Mean (σ known)", "Mean (σ unknown)", "Variance", "Proportion"),
  Test_Statistic = c(
    "Z = (x̄ - μ₀)/(σ/√n)", "t = (x̄ - μ₀)/(s/√n)",
    "χ² = (n-1)s²/σ₀²", "Z = (p̂ - p₀)/√(p₀(1-p₀)/n)"
  ),
  Distribution = c("N(0,1)", "t_{n-1}", "χ²_{n-1}", "N(0,1)"),
  Confidence_Interval = c(
    "x̄ ± z_{α/2}σ/√n", "x̄ ± t_{α/2,n-1}s/√n",
    "((n-1)s²/χ²_{α/2}, (n-1)s²/χ²_{1-α/2})",
    "p̂ ± z_{α/2}√(p̂(1-p̂)/n)"
  ),
  Assumptions = c(
    "Normal population or large n", "Normal population",
    "Normal population", "Large n, np≥5, n(1-p)≥5"
  )
)

kbl(inference_summary,
  caption = "Summary of Inference Procedures for a Single Sample"
) %>%
  kable_styling() %>%
  column_spec(2, width = "3cm") %>%
  column_spec(4, width = "3cm") %>%
  column_spec(5, width = "2.5cm")

1.10 Testing for Goodness of Fit

NoteChi-Square Goodness-of-Fit Test

Tests whether sample data fits a specified distribution.

Test Statistic: \chi_0^2 = \sum_{i=1}^k \frac{(O_i - E_i)^2}{E_i}

where:

  • O_i = observed frequency in cell i
  • E_i = expected frequency in cell i
  • k = number of cells

Under H₀, χ₀² ~ χ²_{k-1-p} where p = number of estimated parameters.
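Before the normality example below (where μ and σ must be estimated), a fully specified case helps fix the degrees of freedom: with no estimated parameters, df = k - 1. The die-roll counts here are invented for illustration:

```r
# Sketch: chi-square GoF against a fully specified (fair-die) distribution
observed <- c(18, 24, 20, 16, 22, 20) # hypothetical counts for faces 1-6
expected <- rep(sum(observed) / 6, 6) # fair die: equal expected counts
chi2_0 <- sum((observed - expected)^2 / expected) # = 2
p_val <- 1 - pchisq(chi2_0, df = length(observed) - 1) # df = 6 - 1 = 5
# chisq.test(observed) gives the same statistic and p-value
```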

1.10.1 Example: Testing for Normality

NoteExample

Test whether the following 50 observations follow a normal distribution:

[1] "Data Summary for Normality Test:"
       n      Min       Q1   Median     Mean       Q3      Max       SD
   <int>    <num>    <num>    <num>    <num>    <num>    <num>    <num>
1:    50 77.39232 93.06884 98.11524 100.3605 106.7775 124.0762 10.42499
    Skewness   Kurtosis
       <num>      <num>
1: 0.4022879 -0.4221122
# Goodness of fit test example
set.seed(789)
normality_test_data <- data.table(
  observation_id = 1:50,
  value = c(rnorm(40, mean = 100, sd = 10), rnorm(10, mean = 110, sd = 8)) # Mixed to test normality
)

# Summary statistics
normality_summary <- normality_test_data %>%
  fsummarise(
    n = fnobs(value),
    Min = fmin(value),
    Q1 = fquantile(value, 0.25),
    Median = fmedian(value),
    Mean = fmean(value),
    Q3 = fquantile(value, 0.75),
    Max = fmax(value),
    SD = fsd(value),
    Skewness = fsum((value - fmean(value))^3) / (fnobs(value) * fsd(value)^3),
    Kurtosis = fsum((value - fmean(value))^4) / (fnobs(value) * fsd(value)^4) - 3
  )

print("Data Summary for Normality Test:")
print(normality_summary)

Steps:

  1. Estimate μ and σ from the sample
  2. Define class intervals
  3. Calculate expected frequencies under normality
  4. Compute the chi-square statistic
  5. Compare to the critical value

[1] "Goodness of Fit Test Results:"
                         Test                  Null_Hypothesis     n Classes
                       <char>                           <char> <int>   <int>
1: Chi-square goodness of fit Data follows normal distribution    50       7
   Chi2_Statistic Degrees_Freedom P_Value Alpha                   Decision
            <num>           <num>   <num> <num>                     <char>
1:          3.331               4   0.504  0.05 Fail to reject H0 (normal)
                                 Conclusion
                                     <char>
1: Data consistent with normal distribution
[1] "Observed vs Expected Frequencies:"
Key: <class_interval>
   class_interval     N  expected chi2_component
           <fctr> <int>     <num>          <num>
1:    [76.4,83.3]     1  2.029651    0.522346378
2:    (83.3,90.3]     7  5.798476    0.248972473
3:    (90.3,97.3]    14 10.782517    0.960091219
4:     (97.3,104]    10 13.056560    0.715545084
5:      (104,111]    10 10.296736    0.008551451
6:      (111,118]     5  5.287598    0.015642709
7:      (118,125]     3  1.767273    0.859864587
[1] "Shapiro-Wilk Test for comparison:"

    Shapiro-Wilk normality test

data:  normality_test_data$value
W = 0.971, p-value = 0.2541
# Chi-square goodness of fit test for normality
n_norm <- fnobs(normality_test_data$value)
sample_mean <- fmean(normality_test_data$value)
sample_sd <- fsd(normality_test_data$value)

# Create class intervals (using Sturges' rule for number of classes)
k_classes <- ceiling(1 + log2(n_norm)) # Sturges' rule
class_breaks <- seq(
  from = fmin(normality_test_data$value) - 1,
  to = fmax(normality_test_data$value) + 1,
  length.out = k_classes + 1
)

# Calculate observed frequencies
normality_test_data[, class_interval := cut(value, breaks = class_breaks, include.lowest = TRUE)]
observed_freq <- normality_test_data[, .N, by = class_interval][order(class_interval)]

# Calculate expected frequencies under normality
expected_freq <- data.table()
for (i in 1:(length(class_breaks) - 1)) {
  prob_class <- pnorm(class_breaks[i + 1], mean = sample_mean, sd = sample_sd) -
    pnorm(class_breaks[i], mean = sample_mean, sd = sample_sd)
  expected_count <- n_norm * prob_class
  expected_freq <- rbind(
    expected_freq,
    data.table(
      class_interval = observed_freq$class_interval[i],
      expected = expected_count
    )
  )
}

# Combine observed and expected
goodness_fit_table <- merge(observed_freq, expected_freq, by = "class_interval")
goodness_fit_table[, chi2_component := (N - expected)^2 / expected]

# Calculate chi-square statistic
# (cells with expected counts below 5, as in the first and last classes here,
#  would normally be pooled with adjacent cells before computing the statistic)
chi2_gof <- sum(goodness_fit_table$chi2_component)
df_gof <- nrow(goodness_fit_table) - 1 - 2 # -2 for estimated μ and σ
p_value_gof <- 1 - pchisq(chi2_gof, df = df_gof)

goodness_fit_results <- data.table(
  Test = "Chi-square goodness of fit",
  Null_Hypothesis = "Data follows normal distribution",
  n = n_norm,
  Classes = nrow(goodness_fit_table),
  Chi2_Statistic = round(chi2_gof, 4),
  Degrees_Freedom = df_gof,
  P_Value = round(p_value_gof, 4),
  Alpha = 0.05,
  Decision = ifelse(p_value_gof < 0.05, "Reject H0 (not normal)", "Fail to reject H0 (normal)"),
  Conclusion = ifelse(p_value_gof < 0.05,
    "Data does not follow normal distribution",
    "Data consistent with normal distribution"
  )
)

print("Goodness of Fit Test Results:")
print(goodness_fit_results)
print("Observed vs Expected Frequencies:")
print(goodness_fit_table)

# Alternative tests for normality
shapiro_test <- shapiro.test(normality_test_data$value)
print("Shapiro-Wilk Test for comparison:")
print(shapiro_test)

1.10.2 Comprehensive Example: Quality Control Analysis

NoteComprehensive Example

A manufacturing company wants to perform a complete statistical analysis of their production process. They collect data on 100 products measuring diameter, strength, and defect status.

[1] "Comprehensive Quality Control Data Summary:"
       n Mean_Diameter SD_Diameter Min_Diameter Max_Diameter Mean_Strength
   <int>         <num>       <num>        <num>        <num>         <num>
1:   100      10.02267   0.1838186     9.477965     10.97406      246.6658
   SD_Strength Min_Strength Max_Strength Total_Defects Defect_Rate
         <num>        <num>        <num>         <num>       <num>
1:    20.84695     191.4787     285.4509             8        0.08
   Defect_Percent
            <num>
1:              8
# Comprehensive quality control analysis
set.seed(456)
comprehensive_qc_data <- data.table(
  product_id = 1:100,
  diameter = rnorm(100, mean = 10.0, sd = 0.15),
  strength = rnorm(100, mean = 250, sd = 20),
  defective = sample(c(0, 1), 100, replace = TRUE, prob = c(0.92, 0.08))
)

# Add some outliers to make it realistic
comprehensive_qc_data[sample(.N, 3), diameter := diameter + rnorm(3, 0, 0.5)]
comprehensive_qc_data[sample(.N, 2), strength := strength - 50]

# Comprehensive summary
qc_summary <- comprehensive_qc_data %>%
  fsummarise(
    n = fnobs(diameter),
    # Diameter statistics
    Mean_Diameter = fmean(diameter),
    SD_Diameter = fsd(diameter),
    Min_Diameter = fmin(diameter),
    Max_Diameter = fmax(diameter),
    # Strength statistics
    Mean_Strength = fmean(strength),
    SD_Strength = fsd(strength),
    Min_Strength = fmin(strength),
    Max_Strength = fmax(strength),
    # Defect statistics
    Total_Defects = fsum(defective),
    Defect_Rate = fmean(defective),
    Defect_Percent = fmean(defective) * 100
  )

print("Comprehensive Quality Control Data Summary:")
print(qc_summary)

# Multiple hypothesis tests
alpha_qc <- 0.05

# Statistics are computed as standalone values first, since columns created
# inside a single fsummarise() call cannot reference one another

# 1. Test diameter mean vs specification (10.0)
n_d <- fnobs(comprehensive_qc_data$diameter)
xbar_d <- fmean(comprehensive_qc_data$diameter)
s_d <- fsd(comprehensive_qc_data$diameter)
t_stat_d <- (xbar_d - 10.0) / (s_d / sqrt(n_d))
p_value_d <- 2 * (1 - pt(abs(t_stat_d), df = n_d - 1))

diameter_test <- data.table(
  n = n_d, xbar = xbar_d, s = s_d, mu0 = 10.0,
  t_stat = t_stat_d, df = n_d - 1, p_value = p_value_d,
  decision = ifelse(p_value_d < alpha_qc, "Reject H0", "Fail to reject H0")
)

# 2. Test strength variance (H0: sigma^2 = 400)
n_s <- fnobs(comprehensive_qc_data$strength)
s2_s <- fvar(comprehensive_qc_data$strength)
chi2_stat_s <- (n_s - 1) * s2_s / 400
p_value_s <- 2 * min(
  pchisq(chi2_stat_s, df = n_s - 1),
  1 - pchisq(chi2_stat_s, df = n_s - 1)
)

strength_var_test <- data.table(
  n = n_s, s2 = s2_s, sigma2_0 = 400,
  chi2_stat = chi2_stat_s, df = n_s - 1, p_value = p_value_s,
  decision = ifelse(p_value_s < alpha_qc, "Reject H0", "Fail to reject H0")
)

# 3. Test defect proportion (H0: p = 0.05)
n_p <- fnobs(comprehensive_qc_data$defective)
x_def <- fsum(comprehensive_qc_data$defective)
p_hat_def <- x_def / n_p
z_stat_def <- (p_hat_def - 0.05) / sqrt(0.05 * 0.95 / n_p)
p_value_def <- 2 * (1 - pnorm(abs(z_stat_def)))

defect_prop_test <- data.table(
  n = n_p, x = x_def, p_hat = p_hat_def, p0 = 0.05,
  z_stat = z_stat_def, p_value = p_value_def,
  decision = ifelse(p_value_def < alpha_qc, "Reject H0", "Fail to reject H0")
)

comprehensive_test_results <- data.table(
  Test = c("Diameter Mean", "Strength Variance", "Defect Proportion"),
  Parameter = c("μ = 10.0", "σ² = 400", "p = 0.05"),
  Test_Statistic = c(
    round(diameter_test$t_stat, 3),
    round(strength_var_test$chi2_stat, 3),
    round(defect_prop_test$z_stat, 3)
  ),
  P_Value = c(
    round(diameter_test$p_value, 4),
    round(strength_var_test$p_value, 4),
    round(defect_prop_test$p_value, 4)
  ),
  Decision = c(diameter_test$decision, strength_var_test$decision, defect_prop_test$decision),
  Distribution = c(
    paste0("t(", diameter_test$df, ")"),
    paste0("χ²(", strength_var_test$df, ")"),
    "N(0,1)"
  )
)

print("Comprehensive Hypothesis Test Results:")
print(comprehensive_test_results)

Complete Analysis:

  1. Test mean diameter vs. specification
  2. Test variance in strength
  3. Test defect proportion
  4. Construct confidence intervals
  5. Create prediction intervals
  6. Establish tolerance intervals
  7. Test for normality

Figure 3: Comprehensive quality control analysis dashboard
# Comprehensive quality control dashboard
library(gridExtra)

# Diameter histogram with normal overlay
p1 <- ggplot(data = comprehensive_qc_data, mapping = aes(x = diameter)) +
  geom_histogram(
    mapping = aes(y = after_stat(density)), bins = 20,
    fill = "lightblue", color = "black", alpha = 0.7
  ) +
  stat_function(
    fun = dnorm,
    args = list(
      mean = mean(comprehensive_qc_data$diameter),
      sd = sd(comprehensive_qc_data$diameter)
    ),
    color = "red", linewidth = 1
  ) +
  geom_vline(xintercept = 10.0, linetype = "dashed", color = "green", linewidth = 1) +
  labs(x = "Diameter", y = "Density", title = "Diameter Distribution") +
  theme_classic()

# Strength boxplot
p2 <- ggplot(data = comprehensive_qc_data, mapping = aes(x = "", y = strength)) +
  geom_boxplot(fill = "lightgreen", alpha = 0.7) +
  geom_hline(yintercept = 250, linetype = "dashed", color = "red") +
  labs(x = "", y = "Strength", title = "Strength Distribution") +
  theme_classic() +
  theme(axis.text.x = element_blank(), axis.ticks.x = element_blank())

# Defect rate bar chart
defect_summary_plot <- comprehensive_qc_data[, .(Count = .N), by = defective]
defect_summary_plot[, Status := ifelse(defective == 0, "Good", "Defective")]

p3 <- ggplot(data = defect_summary_plot, mapping = aes(x = Status, y = Count, fill = Status)) +
  geom_col(alpha = 0.7) +
  scale_fill_manual(values = c("Good" = "green", "Defective" = "red")) +
  labs(x = "Product Status", y = "Count", title = "Defect Analysis") +
  theme_classic() +
  theme(legend.position = "none")

# Control chart simulation
set.seed(123)
control_data <- data.table(
  sample_num = 1:30,
  sample_mean = rnorm(30, mean = 10.0, sd = 0.05)
)
control_data[c(25, 28), sample_mean := c(10.3, 9.7)] # Add out-of-control points

ucl <- 10.0 + 3 * 0.05
lcl <- 10.0 - 3 * 0.05

p4 <- ggplot(data = control_data, mapping = aes(x = sample_num, y = sample_mean)) +
  geom_line(color = "blue", linewidth = 0.8) +
  geom_point(size = 2, color = "red") +
  geom_hline(yintercept = 10.0, color = "green", linewidth = 1) +
  geom_hline(yintercept = ucl, color = "red", linetype = "dashed") +
  geom_hline(yintercept = lcl, color = "red", linetype = "dashed") +
  labs(x = "Sample Number", y = "Sample Mean", title = "Control Chart") +
  theme_classic()

# Combine plots
grid.arrange(p1, p2, p3, p4, nrow = 2, ncol = 2)

1.11 Key Formulas and Critical Values

NoteEssential Formulas Summary

Test Statistics:

  • Z-test (σ known): Z_0 = \frac{\overline{X} - \mu_0}{\sigma/\sqrt{n}}
  • t-test (σ unknown): T_0 = \frac{\overline{X} - \mu_0}{S/\sqrt{n}}
  • Chi-square (variance): \chi_0^2 = \frac{(n-1)S^2}{\sigma_0^2}
  • Z-test (proportion): Z_0 = \frac{\hat{p} - p_0}{\sqrt{p_0(1-p_0)/n}}

Confidence Intervals:

  • Mean (σ known): \overline{x} \pm z_{\alpha/2} \frac{\sigma}{\sqrt{n}}
  • Mean (σ unknown): \overline{x} \pm t_{\alpha/2,n-1} \frac{s}{\sqrt{n}}
  • Variance: \frac{(n-1)s^2}{\chi_{\alpha/2,n-1}^2} \leq \sigma^2 \leq \frac{(n-1)s^2}{\chi_{1-\alpha/2,n-1}^2}
  • Proportion: \hat{p} \pm z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}

Other Intervals:

  • Prediction: \overline{x} \pm t_{\alpha/2,n-1}s\sqrt{1 + \frac{1}{n}}
  • Tolerance: \overline{x} \pm ks (k from tables)
Table 3: Critical values for common significance levels

     α    df   z_{α/2}   t_{α/2,df}   χ²_{α/2,df}   χ²_{1-α/2,df}
  α = 0.05
  0.05     5     1.960        2.571         0.831         12.833
  0.05    10     1.960        2.228         3.247         20.483
  0.05    20     1.960        2.086         9.591         34.170
  0.05    30     1.960        2.042        16.791         46.979
  0.05   100     1.960        1.984        74.222        129.561
  α = 0.01
  0.01     5     2.576        4.032         0.412         16.750
  0.01    10     2.576        3.169         2.156         25.188
  0.01    20     2.576        2.845         7.434         39.997
  0.01    30     2.576        2.750        13.787         53.672
  0.01   100     2.576        2.626        67.328        140.169
# Critical values table
alpha_levels <- c(0.10, 0.05, 0.01)
df_values <- c(1, 2, 5, 10, 15, 20, 30, 50, 100, 1000)

critical_values_table <- data.table()
for (alpha_val in alpha_levels) {
  for (df_val in df_values) {
    z_crit <- qnorm(1 - alpha_val / 2)
    t_crit <- qt(1 - alpha_val / 2, df = df_val)
    chi2_lower <- qchisq(alpha_val / 2, df = df_val)
    chi2_upper <- qchisq(1 - alpha_val / 2, df = df_val)

    critical_values_table <- rbind(
      critical_values_table,
      data.table(
        alpha = alpha_val,
        df = df_val,
        z_alpha2 = round(z_crit, 3),
        t_alpha2 = round(t_crit, 3),
        chi2_alpha2 = round(chi2_lower, 3),
        chi2_1_alpha2 = round(chi2_upper, 3)
      )
    )
  }
}

# Create formatted table for display
critical_display <- critical_values_table[alpha %in% c(0.05, 0.01) &
  df %in% c(5, 10, 20, 30, 100)]

kbl(critical_display,
  col.names = c("α", "df", "z_{α/2}", "t_{α/2,df}", "χ²_{α/2,df}", "χ²_{1-α/2,df}"),
  caption = "Critical Values for Common Significance Levels"
) %>%
  kable_styling() %>%
  add_header_above(c(" " = 2, "Normal" = 1, "t-distribution" = 1, "Chi-square" = 2)) %>%
  pack_rows("α = 0.05", 1, 5) %>%
  pack_rows("α = 0.01", 6, 10)

# Sample size determination examples