Using R to Select an Optimum Stock Portfolio

Part III: Building the Portfolio


After calculating the correlations between each stock and the market as a whole, we need to check that the model assumptions are met. We then adjust the model estimates to mitigate historical bias, and finally build an optimum portfolio by choosing stocks that exhibit the best risk-adjusted performance relative to the market.

Check Model Assumptions are Met
Adjust Model Estimates
Estimate Excess Return of Each Stock
Select Stocks to Include in the Portfolio

This is the third of three pages on this topic: you can view the first two pages here:

Part I–Preliminaries
Part II–Selecting the Stocks

You can download the entire script (and related files) here.

Check Model Assumptions are Met

After the regression, the R script checks the model of each stock to ensure that the residuals are normally distributed and are not correlated: this is important to ensure that the underlying assumptions of linear regression are met. Note that we do NOT check for heteroskedasticty, or that the stock returns as a whole are a stationary process with a constant variance: this is because these checks are difficult to automate, and prone to false positives. Nevertheless, this is an important limitation of this script, since stock returns are known to exhibit volatility clustering.

# Test model residuals for normality and autocorrelation

Beta.model.residuals <- lapply(Beta.model, residuals)

# Normality

normal.test <- lapply(Beta.model.residuals, ad.test)
normal.test.adj <- lapply(normal.test, function (x) {
	temp <- p.adjust(x$p.value, method="hochberg", n=length(normal.test)) return (temp > 0.05)

Beta.model.normal <- Beta.model[unlist(normal.test.adj)]

# Auto-correlation

ac.test <- lapply(Beta.model.normal, dwtest)
ac.test.adj <- lapply(ac.test, function (x) {
	temp <- p.adjust(x$p.value, method="hochberg", n=length(ac.test)) return (temp > 0.05)

Beta.model.uncor <- Beta.model.normal[unlist(ac.test.adj)]

Adjust Model Estimates

Next, we adjust each estimate of Beta using Vasicek’s method: this accounts for the fact that estimated Betas tend to exhibit greater variation than historical ones. The adjustment itself is a weighted sum of the average portfolio Beta and the individual stock’s Beta. Here is the adjustment formula, with the weights being given by w_1 and w_2 (note that the denominator is the same for both weights).

w_1 = \frac{\sigma_{stock}^2}{\sigma_{stock}^2 + \sigma_{(\beta)portfolio}^2} \qquad
w_2 = \frac{\sigma_{(\beta)portfolio}^2}{\sigma_{stock}^2 + \sigma_{(\beta)portfolio}^2}

\beta_{adj} = w_1 * \mu_{(\beta)portfolio} + w_2 * \beta_{stock}

Here is the part of the script that performs the adjustment; we create a new list called portfolio to aid in the calculation.

# Combine model estimates and diagnostics into a new list
# Include both model SEE and an annualized SEE, since we'll use it to adjust Beta

portfolio <- lapply(Beta.model.uncor, function (x) {
		Model.see.annual=summary(x)$sigma * sqrt(252),

# Adjust each Beta using Vasicek's method
# Beta.mean and Beta.var are over all the Betas in the portfolio

Beta.mean <- mean(sapply(portfolio, function(x) {x$Beta}))
Beta.var <- var(sapply(portfolio, function(x) {x$Beta}))

portfolio.adj <- lapply(portfolio, function (x) {
	beta.weight.1 <- (x$Model.see.annual)^2 / ((x$Model.see.annual)^2 + Beta.var)
	beta.weight.2 <- Beta.var / ((x$Model.see.annual)^2 + Beta.var)
	x$Beta.adj <- (beta.weight.1 * Beta.mean) + (beta.weight.2 * x$Beta)

Estimate Excess Return of Each Stock

Having estimated and adjusted the portfolio Betas, we calculate the excess return to Beta for each stock: this is an estimate of how much additional return we can expect from the stock when its risk is taken into account; it will be the primary metric we use to determine which stocks belong in our portfolio.

The idea behind this is that individual stocks are always riskier investments than the market as a whole: that is, a stock will always have more variation in its return than the market will. As such, we expect greater returns from individual stocks than from the market simply because their returns show more variation. The excess return to Beta is an estimate of the additional return we can expect on top of this variation: the greater the excess return, the more valuable the stock, and the more likely it should be included in our portfolio. The formula for calculating it is:

Y_{excess} = \frac{Y_{return} – r_{riskfree}}{\beta_{adj}}

Intuitively, the excess return of a stock is the ratio of its (estimated) actual return to the risk of holding it: the only wrinkle is that we must subtract the risk-free rate from the estimated return, so that the numerator is the additional return received from taking on any level of market risk.

This part of the script calculates the excess return to Beta for each stock: it uses the estimated regression coefficients for each stock and the average market return to estimate the stock’s expected return, then divides that estimate by the stock’s adjusted Beta. The excess return itself is stored in the new variable ranking, and the list of portfolio candidates is sorted in decreasing order of excess return.

# Get rid of poor estimates <- portfolio.adj[unlist(sapply(portfolio.adj, function(x) {x$Model.r.squared > .5}))]

# Calculated estimated returns for each stock, then rank them using standard MPT method
# of excess return to Beta

mean.spx.return <- mean(spx.stock.return) * 252
var.spx.return <- var(spx.stock.return)					# we'll need this for the portfolio cutoff

portfolio.est.return <- lapply(, function (x) {
	x$est.return <- x$Alpha + (x$Beta.adj * mean.spx.return)

portfolio.ranking <- lapply(portfolio.est.return, function(x) {
	x$ranking <-(x$est.return - rf.rate) / x$Beta.adj

# Sort by decreasing rank (just for convenience)

portfolio.ranking <- portfolio.ranking[order(-sapply(portfolio.ranking, function(x) {x$ranking}))]

Select Stocks to Include in the Portfolio

The final step in portfolio selection is to find the cutoff point below which the additional risk in holding a stock outweighs its inclusion in the portfolio. The list of stocks with excess returns above the cutoff point make up the portfolio that we should hold. The calculation is a bit complicated: first, we calculate a cutoff numerator and denominator for each stock as follows:

C_{num} = \frac{(Y_{return} – r_{riskfree}) \beta_{adj}}{\sigma_{stock}^2} \qquad
C_{denom} = \frac{\beta_{adj}^2}{\sigma_{stock}^2}

Essentially, the cutoff numerator is the risk-adjusted expected return for the stock, while the denominator is the squared risk; both terms are normalized by dividing them by the squared SEE of the regression model used to estimate Beta. The cutoff itself is calculated from the cumulative sums of each numerator and denominator and the variance of the market return:

C^* = \frac{\sigma_{market}^2 \displaystyle \sum_{i=1}^{j} C_{num}}{1 + \sigma_{market}^2 \sum_{i=1}^{j} C_{denom}}

Here, the sum for each stock is the cumulative sum of all terms with an excess return to Beta greater than or equal to that of the stock. Since we ordered the portfolio.ranking according to excess return, this means that the stock with the greatest excess return will have j=1, the next stock will have j=2, and so on.

# Estimate cut-off value for inclusion of stocks into the optimal portfolio
# Then subset optimal portfolio

# Need to convert our list to a data frame for this part (for the cumsum() function)

stocks <- names(portfolio.ranking)
values <- matrix(unlist(portfolio.ranking),nrow=length(portfolio.ranking), byrow=TRUE)
portfolio.optimal <- data.frame(stocks,values)
colnames(portfolio.optimal) <- c("Stock", "Alpha", "Beta", "Model.see", "Model.see.anual","Model.r.squared", "Beta.adj", "est.return", "ranking")

# Now estimate the cutoff; attach() just makes the calculation easier to follow

cutoff.numerator <- ((est.return - rf.rate) * Beta.adj) / Model.see^2
cutoff.denom <- Beta.adj^2 / Model.see^2

cutoff <- (var.spx.return * cumsum(cutoff.numerator)) / (1 + var.spx.return * cumsum(cutoff.denom))
portfolio.cutoff <- cbind(portfolio.optimal, cutoff) <- subset(portfolio.cutoff, ranking > cutoff)


Is this portfolio truly “optimal”? Answering this question requires you to take a stance on the assumptions of Modern Portfolio Theory: that stock returns are correlated with the market and can be represented as random variables with a stable mean and variance. This particular script also uses the single-index model: all stock returns can be estimated from the overall market. MPT contains other models that use separate indexes for different market sectors, and these will generally select a different portfolio (whether that portfolio will outperform the single-index model is another matter entirely).

Personally, I find Modern Portfolio Theory a plausible and compelling methodology for selecting stocks to invest in, particularly when you have a long-term perspective: individual stocks are certainly prone to idiosyncratic events (think the Volkswagen emissions scandal), but over the long term, they behave as a stationary process. Methodology aside, the tried-and-true road to wealth is to invest on a regular schedule for several years: do this, and whether you invest in the optimum portfolio selected by this script, or simply an index fund, you will put yourself in the best possible position to maximize your savings and realize your investment goals.

Troubleshooting at a Standstill? Feel Free to Reach Out!