More Accurate Estimators of Multiple Correlation Coefficient
Bingjiang Li and Lu Peng
Nanjing University of Posts and Telecommunications, China
1218084111@njupt.edu.cn
Kentaro Hayashi
University of Hawaii at Manoa, USA
Ke-Hai Yuan
University of Notre Dame, USA
Abstract. The squared multiple correlation ($R^2$) is commonly used to measure how well the outcome variable is linearly related to a set of predictors. Unfortunately, $R^2$ is a biased estimator of its population counterpart ($\rho^2$), and the bias increases as the number of variables ($p$) increases. Efforts have been made to modify $R^2$. The most notable result is the adjusted $R^2$ ($R_{adj}^2$), which accounts for the sample size ($N$) and $p$. However, $R_{adj}^2$ is still biased, and an unbiased estimator of $\rho^2$ does not exist. Using empirical modeling and statistical learning, this article develops new formulas for estimating the population $\rho$. The development involves obtaining the empirical bias of $R$ via Monte Carlo simulation across many conditions. Values of the empirical bias are then predicted by functions of $N$, $p$, and the observed values of $R$. Best-subset regression is used to identify the best predictors of the empirical bias. Improved formulas for estimating $\rho$ are obtained by applying a bias correction to $R$. Cross-validation results show that the empirically corrected estimators contain little bias and perform better than both $R$ and $R_{adj}$ in mean squared error and variance.
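The Monte Carlo step described in the abstract can be illustrated with a minimal sketch. It is not the authors' simulation design: the population model (independent standard-normal predictors with equal weights), the truncation of $R_{adj}^2$ at zero before taking the square root, and the function names `sample_R` and `empirical_bias` are all assumptions made for illustration.

```python
import numpy as np

def sample_R(rho, N, p, rng):
    """Draw one sample of size N and return (R, R_adj)."""
    # Assumed population (not from the source): p independent N(0, 1)
    # predictors with equal weights, scaled so that the population
    # multiple correlation equals rho and Var(y) = 1.
    X = rng.standard_normal((N, p))
    beta = np.full(p, rho / np.sqrt(p))
    y = X @ beta + np.sqrt(1.0 - rho**2) * rng.standard_normal(N)

    # Ordinary least squares with an intercept.
    design = np.column_stack([np.ones(N), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    sse = resid @ resid
    sst = np.sum((y - y.mean()) ** 2)

    # Clip to [0, 1] to guard against floating-point round-off.
    r2 = min(max(1.0 - sse / sst, 0.0), 1.0)
    # Standard adjusted R^2.
    r2_adj = 1.0 - (1.0 - r2) * (N - 1) / (N - p - 1)
    # Assumption: negative adjusted values are truncated at zero
    # so that the square root is defined.
    return np.sqrt(r2), np.sqrt(max(r2_adj, 0.0))

def empirical_bias(rho, N, p, reps=2000, seed=0):
    """Return (bias of R, bias of R_adj) and the raw draws."""
    rng = np.random.default_rng(seed)
    draws = np.array([sample_R(rho, N, p, rng) for _ in range(reps)])
    # Empirical bias = Monte Carlo mean of the estimator minus rho.
    return draws.mean(axis=0) - rho, draws
```

Calling `empirical_bias(0.3, 30, 5)` gives the empirical bias for one condition. Repeating the call over a grid of $\rho$, $N$, and $p$ would produce the kind of bias values that the abstract describes as the target of the subsequent regression on $N$, $p$, and $R$.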
Keywords: Empirical modeling • Monte Carlo simulation • Bias correction • Best-subset regression.
DOI: https://doi.org/10.35566/isdsa2019c11