T-score Vs. Z-score as Sample Size Increases

It is common practice to approximate the t-score with the z-score when the sample size is above 30 and the data are roughly normal (30 is an arbitrary cut-off that older statistics books advise). The t-score converges to the z-score as the sample size increases, but the two differ noticeably when the sample size is small; see the figure below. For a range of sample sizes, assuming relatively normally distributed data, the corresponding t-score for a 95% confidence interval was calculated and displayed graphically; the blue horizontal line marks a t-score of about 1.96, which is the z-score for normally distributed data, and the vertical blue line marks a sample size of 30. So, for any inference such as calculating a confidence interval, it is better to use the t-score for the specific sample size rather than approximating with the z-score, especially when the sample size is small. PS: for the same data, the confidence interval from the t-score is always wider than the one from the z-score, and the difference is largest for small datasets.
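To make the PS concrete, here is a minimal sketch that builds both intervals for one small sample. The data vector `x_small` and all variable names are made up for illustration; only `qt()` and `qnorm()` are taken from the approach used in this post.

```r
# Hypothetical small sample; the values are invented for illustration only
x_small <- c(4.8, 5.1, 5.6, 4.9, 5.3, 5.0, 5.4, 4.7, 5.2, 5.5)
n_small <- length(x_small)
se <- sd(x_small) / sqrt(n_small)      # standard error of the mean

t_crit <- qt(0.975, df = n_small - 1)  # t critical value, two-sided 95% interval
z_crit <- qnorm(0.975)                 # z critical value, about 1.96

# 95% confidence intervals for the mean using each critical value
ci_t <- mean(x_small) + c(-1, 1) * t_crit * se
ci_z <- mean(x_small) + c(-1, 1) * z_crit * se

# Interval widths; both use the same standard error, so the ratio
# of widths equals the ratio of critical values
width_t <- diff(ci_t)
width_z <- diff(ci_z)
print(c(t_crit = t_crit, z_crit = z_crit))
print(c(width_t = width_t, width_z = width_z, ratio = width_t / width_z))
```

Because both intervals share the same standard error, the only difference comes from the critical value, which is why the t-based interval is the wider one here.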

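# Sample sizes from 15 to 1995 in steps of 10
n <- seq(15, 2000, by = 10)
# Two-sided 95% critical value of the t distribution with n - 1 degrees of freedom
t_score <- qt(0.975, df = n - 1)

plot(n, t_score, type = "o", ylim = c(1.94, 2.16), cex = 0.8,
     ylab = "T-score", xlab = "Sample size")

# Blue reference lines: horizontal at the z-score (about 1.96), vertical at n = 30
abline(h = qnorm(0.975), v = 30, col = "blue")

# Summary of the t-scores (Min, 1st Qu., Median, Mean, 3rd Qu., Max);
# samm was not defined in the original listing and is assumed to be summary(t_score)
samm <- summary(t_score)
round(samm[2], 3)

legend(500, 2.16,
       legend = c(paste("Min =", round(samm[1], 3)),
                  paste("1st Qu. =", round(samm[2], 3)),
                  paste("Median =", round(samm[3], 3)),
                  paste("Mean =", round(samm[4], 3)),
                  paste("3rd Qu. =", round(samm[5], 3)),
                  paste("Max =", round(samm[6], 3))),
       bty = "n")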
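The same comparison can also be read off numerically rather than from the plot. The sketch below tabulates the t-score against the z-score for a few sample sizes; the sizes and the names `sizes`, `comparison` and `pct_wider` are chosen here purely for illustration.

```r
# Selected sample sizes (picked for illustration, not taken from the plot)
sizes <- c(5, 10, 15, 30, 100, 1000)
z_crit <- qnorm(0.975)  # z-score for a two-sided 95% interval

comparison <- data.frame(
  n = sizes,
  t_score = qt(0.975, df = sizes - 1),
  z_score = z_crit
)

# How much larger the t-score is than the z-score, in percent
comparison$pct_wider <- 100 * (comparison$t_score / comparison$z_score - 1)
print(round(comparison, 3))
```

At n = 30 the t-score is about 2.045, roughly 4% above the z-score, while at n = 1000 the gap is only about 0.1%.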

[Figure t_score1: t-score for a 95% confidence interval plotted against sample size]