More Accurate Estimators of Multiple Correlation Coefficient?

Bingjiang Li and Lu Peng
Nanjing University of Posts and Telecommunications, China
1218084111@njupt.edu.cn

Kentaro Hayashi
University of Hawaii at Manoa, USA

Ke-Hai Yuan
University of Notre Dame, USA

Abstract. The squared multiple correlation ($R^2$) is commonly used to measure how well an outcome variable is linearly related to a set of predictors. Unfortunately, $R^2$ is a biased estimator of its population counterpart ($\rho^2$), and the bias increases with the number of predictors ($p$). Efforts have been made to modify $R^2$. The most notable result is the adjusted $R^2$ ($R_{adj}^2$), which accounts for both the sample size ($N$) and $p$. However, $R_{adj}^2$ is still biased, and an unbiased estimator of $\rho^2$ does not exist. Using empirical modeling and statistical learning, this article develops new formulas for estimating the population $\rho$. The development involves obtaining formulas for the empirical bias of $R$ via Monte Carlo simulation across many conditions. Values of the empirical bias are then predicted by functions of $N$, $p$, and the observed value of $R$. Best-subset regression is used to identify the best predictors of the empirical bias. Improved formulas for estimating $\rho$ are obtained by applying a bias correction to $R$. Cross-validation results show that the empirically corrected estimators contain little bias and perform better than both $R$ and $R_{adj}$ in terms of mean squared error and variance.
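To make the idea of an empirical bias concrete, the following is a minimal Python sketch of a single Monte Carlo condition. It is not the authors' simulation design, which the abstract does not specify. The function name `empirical_bias`, the standard-normal predictors, the equal regression weights, the replication count, and the truncation of a negative $R_{adj}^2$ at zero before taking the square root are all assumptions made for illustration. The adjusted value uses the standard formula $R_{adj}^2 = 1 - (1 - R^2)(N - 1)/(N - p - 1)$.

```python
import numpy as np

def empirical_bias(rho, n, p, reps=2000, seed=0):
    """Monte Carlo estimate of the bias of R and R_adj for one condition.

    Hypothetical helper: the data-generating model below is an assumption,
    chosen so that the population multiple correlation equals `rho`.
    """
    rng = np.random.default_rng(seed)
    # Equal weights scaled so that Var(x @ beta) = rho**2 for x ~ N(0, I).
    beta = np.full(p, rho / np.sqrt(p))
    r_vals = np.empty(reps)
    r_adj_vals = np.empty(reps)
    for i in range(reps):
        x = rng.standard_normal((n, p))
        # Error variance 1 - rho**2 gives Var(y) = 1 and population R^2 = rho**2.
        y = x @ beta + np.sqrt(1.0 - rho**2) * rng.standard_normal(n)
        design = np.column_stack([np.ones(n), x])  # include an intercept
        coef, *_ = np.linalg.lstsq(design, y, rcond=None)
        resid = y - design @ coef
        r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
        r2_adj = 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)
        r_vals[i] = np.sqrt(max(r2, 0.0))
        # Assumption: a negative adjusted R^2 is truncated at zero.
        r_adj_vals[i] = np.sqrt(max(r2_adj, 0.0))
    # Empirical bias = mean of the estimates minus the true rho.
    return r_vals.mean() - rho, r_adj_vals.mean() - rho

if __name__ == "__main__":
    bias_r, bias_r_adj = empirical_bias(rho=0.5, n=30, p=5)
    print(f"bias of R: {bias_r:.4f}, bias of R_adj: {bias_r_adj:.4f}")
```

Repeating such a run over a grid of $\rho$, $N$, and $p$ would give a table of empirical biases, which could then serve as the response in a regression on functions of $N$, $p$, and $R$, in the spirit of the approach the abstract describes.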

Keywords: Empirical modeling • Monte Carlo simulation • Bias correction • Best-subset regression.
