kapitals-pi & SEN: June 2017

Friday, June 02, 2017

x̄ -> The limit theorem with R programming

Understanding Limit Theorems in R

Introduction to Limit Theorems

Limit theorems are fundamental results in probability and statistics that describe how sequences of random variables behave in the long run, most often the sum or average of a large number of independent and identically distributed random variables. There are several types of limit theorems; the two best known are the law of large numbers and the central limit theorem.

1. Law of Large Numbers

The law of large numbers states that as the number of independent and identically distributed random variables increases, their average converges to the expected value of the underlying distribution, provided that expected value is finite. Note that this applies to the average, not the raw sum: the sum keeps growing as more variables are added. In simpler terms, if you repeat an experiment a large number of times, the average outcome will approach the true expected value.
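As a concrete instance of the "expected value" in this statement, consider a fair six-sided die (the example simulated later in this post). The short sketch below works out its expected value and variance directly from the definitions; the variable names are illustrative only.

```r
# Faces of a fair six-sided die and their probabilities
faces <- 1:6
probabilities <- rep(1 / 6, 6)

# Expected value: probability-weighted average of the faces
expected_value <- sum(faces * probabilities)

# Variance: probability-weighted average squared distance from the expected value
variance <- sum((faces - expected_value)^2 * probabilities)

print(expected_value)
print(variance)
```

The expected value comes out to 3.5 and the variance to 35/12 (about 2.92). No single roll can ever equal 3.5; it is the long-run average that approaches it.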

2. Central Limit Theorem

The central limit theorem states that the sum or average of a large number of independent and identically distributed random variables with finite variance is approximately normally distributed once it is centered and scaled, regardless of the shape of the original distribution. This theorem is particularly important because it justifies statistical techniques that assume a normal distribution, even when the underlying variables are not normally distributed themselves.
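For readers who prefer a formula, the classical form of the theorem can be written as follows. The notation is introduced here for illustration and is not used elsewhere in this post: X_1, ..., X_n are independent and identically distributed with mean \mu and finite variance \sigma^2 > 0, and \bar{X}_n is their average.

```latex
\[
  \bar{X}_n = \frac{1}{n}\sum_{i=1}^{n} X_i ,
  \qquad
  \frac{\bar{X}_n - \mu}{\sigma / \sqrt{n}}
  \;\xrightarrow{\;d\;}\;
  \mathcal{N}(0, 1)
  \quad \text{as } n \to \infty .
\]
```

Equivalently, for large n the sum X_1 + ... + X_n is approximately normal with mean n\mu and standard deviation \sigma\sqrt{n}.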

These limit theorems are crucial in probability theory and statistics as they provide a theoretical foundation for many statistical methods and allow us to make inferences about populations based on sample data.

Demonstrating the Law of Large Numbers in R

Below, we'll simulate rolling a fair six-sided die and calculate the average value as we roll it more and more times.

# Number of die rolls
num_rolls <- 1000

# Simulating die rolls
rolls <- sample(1:6, num_rolls, replace = TRUE)

# Calculate the cumulative average
cumulative_average <- cumsum(rolls) / (1:num_rolls)

# Plot the cumulative average
plot(1:num_rolls, cumulative_average, type = "l", xlab = "Number of Rolls", ylab = "Average")

# Add a horizontal line at the expected value (3.5 for a fair die)
abline(h = 3.5, col = "red")

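In this code, we start by specifying the number of rolls (num_rolls). We then simulate rolling a fair die num_rolls times using the sample function with replace = TRUE, so every roll is drawn independently from the faces 1 to 6. The cumulative_average vector holds the running average: cumsum(rolls) gives the running total, and dividing by 1:num_rolls turns each total into the average of all rolls so far.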

Finally, we plot the cumulative average against the number of rolls using plot. We also add a horizontal line at the expected value of 3.5, representing the true expected value for a fair six-sided die.

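If you run this code, you'll see the cumulative average fluctuate noticeably over the first few dozen rolls and then settle close to the expected value of 3.5 as the number of rolls grows, which is what the Law of Large Numbers predicts. Because the rolls are random, the exact path differs from run to run; call set.seed() with a fixed number before sample() if you want a repeatable plot.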
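To put numbers on that convergence, here is a minimal sketch that records how far the running average is from 3.5 at a few checkpoints. The seed, the larger number of rolls, and the checkpoint values are arbitrary choices made for this illustration.

```r
# Arbitrary seed, used only so the output is repeatable
set.seed(42)

# More rolls than in the plot above, to make the trend easier to see
num_rolls <- 100000
rolls <- sample(1:6, num_rolls, replace = TRUE)

# Running average after each roll
running_mean <- cumsum(rolls) / seq_len(num_rolls)

# Distance from the expected value at a few (arbitrary) checkpoints
checkpoints <- c(10, 100, 1000, 10000, 100000)
abs_error <- abs(running_mean[checkpoints] - 3.5)

print(data.frame(rolls = checkpoints,
                 average = running_mean[checkpoints],
                 abs_error = abs_error))
```

The error does not have to shrink at every checkpoint, since each run is random, but its typical size falls roughly like 1.71 / sqrt(n), where 1.71 is the standard deviation of a single die roll.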

Demonstrating the Central Limit Theorem in R

Next, we'll simulate the sum of a large number of random variables drawn from a clearly non-normal distribution (the exponential distribution, which is strongly right-skewed) and look at the distribution of those sums.

# Number of random variables to sum
num_variables <- 1000

# Number of simulations
num_simulations <- 10000

# Simulating exponential random variables
simulations <- replicate(num_simulations, sum(rexp(num_variables)))

# Plot a density-scaled histogram of the simulated sums
hist(simulations, breaks = 30, probability = TRUE, col = "lightblue",
     main = "Sum of Exponential Random Variables", xlab = "Sum")

In this code, we first specify the number of random variables to sum (num_variables) and the number of simulations (num_simulations). We then use the replicate function to repeat the same experiment num_simulations times: each repetition draws num_variables exponential random variables with rexp (using its default rate of 1) and adds them up, so simulations ends up as a vector of num_simulations sums.

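Finally, we plot the histogram of the simulation results using hist. The histogram is bell-shaped and close to a normal distribution, illustrating the Central Limit Theorem. Because an exponential variable with rate 1 has mean 1 and variance 1, the sum of 1000 of them has mean 1000 and standard deviation sqrt(1000), about 31.6, so the histogram should be centered near 1000 with almost all of its mass between roughly 900 and 1100.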
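A histogram only shows the shape by eye. As a rough numerical check, the sketch below standardizes the simulated sums and compares the share falling within one and two standard deviations against the familiar normal values of about 68% and 95%. The seed and the variable names are assumptions made for this illustration.

```r
# Arbitrary seed, used only so the output is repeatable
set.seed(123)

num_variables <- 1000
num_simulations <- 10000
sums <- replicate(num_simulations, sum(rexp(num_variables)))

# Theoretical mean and standard deviation of a sum of n rate-1 exponentials
theoretical_mean <- num_variables
theoretical_sd <- sqrt(num_variables)

# Standardize the sums so they can be compared with a standard normal
z <- (sums - theoretical_mean) / theoretical_sd
within_1sd <- mean(abs(z) <= 1)
within_2sd <- mean(abs(z) <= 2)

cat("Sample mean:", mean(sums), " theory:", theoretical_mean, "\n")
cat("Sample sd:  ", sd(sums), " theory:", theoretical_sd, "\n")
cat("Within 1 sd:", within_1sd, " normal:", pnorm(1) - pnorm(-1), "\n")
cat("Within 2 sd:", within_2sd, " normal:", pnorm(2) - pnorm(-2), "\n")
```

To see the same comparison visually, one option is to overlay the theoretical density on the histogram with curve(dnorm(x, mean = 1000, sd = sqrt(1000)), add = TRUE, col = "red").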

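Keep in mind that the Central Limit Theorem is an asymptotic result: the normal approximation becomes exact only as the number of random variables approaches infinity. In practice the approximation is often already good for a moderately large number of variables, but how many you need depends on the original distribution. Strongly skewed distributions such as the exponential need more terms than symmetric ones before the sum looks normal.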
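One way to watch the approximation improve is to track the skewness of the simulated sums as the number of summed variables grows; a normal distribution has skewness 0. This is a minimal sketch: sample_skewness is a hypothetical helper defined here (not a base R function), and the seed, sizes, and simulation count are arbitrary.

```r
# Arbitrary seed, used only so the output is repeatable
set.seed(2017)

# Hypothetical helper: moment-based sample skewness
sample_skewness <- function(x) {
  centered <- x - mean(x)
  mean(centered^3) / mean(centered^2)^1.5
}

# Number of exponential variables per sum, from tiny to moderately large
sizes <- c(1, 5, 30, 200)
num_simulations <- 20000

# For each size, simulate many sums and measure their skewness
skew_by_size <- sapply(sizes, function(n) {
  sums <- replicate(num_simulations, sum(rexp(n)))
  sample_skewness(sums)
})

# A sum of n rate-1 exponentials is Gamma(n, 1), whose skewness is 2 / sqrt(n)
print(data.frame(n = sizes,
                 skewness = skew_by_size,
                 theory = 2 / sqrt(sizes)))
```

The skewness drops from about 2 for a single exponential variable toward 0 as n increases, which is the convergence to normality described above, but it does so slowly (like 1 / sqrt(n)).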