Using R to Select an Optimum Stock Portfolio

Part I: Preliminaries


The portfolio selection problem is to decide which combination of securities you should hold given your investment objectives and tolerance for risk. As with investing in general, the trade-off is between risk and return: the more diversified your portfolio (i.e. the more securities you hold), the more insulated you are from a large decrease in any particular security. On the other hand, the more securities you hold, the closer your average return will be to that of the market as a whole: substantial increases in some securities will be offset by decreases in others.

Script Description

The following R script uses Modern Portfolio Theory to provide one answer to this question. Here’s how it works:

  • First, the script loads daily stock price data for all stocks in the S&P 500 index, and the index itself. It trims extreme values and verifies that the stock returns are normally distributed, which is required by the model.
  • Next, it calculates the relationship between each stock and the market as a whole by regressing daily stock returns on daily index values. The coefficient Beta (\(\beta\)) is a measure of how much the return on each stock varies with movements in the overall market.
  • Third, it calculates the excess return to Beta, a measure of how much additional return the stock provides relative to the market. The greater the excess return to Beta, the greater the potential reward relative to the risk of holding that stock, compared to the overall risk of the market as a whole.
  • Finally, the script estimates a cutoff value based on these excess returns: in this model, the optimum portfolio contains all stocks with an excess return to Beta greater than this cutoff value.
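
The two middle steps can be sketched in a few lines of R. This is a minimal illustration on simulated returns, not the script from this post: the stock names AAA and BBB, the daily risk-free rate, and the use of index returns as the regressor are all assumptions. The excess return to Beta is taken here in its usual form, the mean return above the risk-free rate divided by Beta.

```r
# Illustrative sketch only: simulated data, hypothetical names and parameters.
set.seed(42)
n_days    <- 250
risk_free <- 0.0001                                   # assumed daily risk-free rate
market    <- rnorm(n_days, mean = 0.0005, sd = 0.01)  # simulated index returns

# Two hypothetical stocks built with known "true" Betas of 0.8 and 1.5
stocks <- data.frame(
  AAA = 0.0002 + 0.8 * market + rnorm(n_days, sd = 0.005),
  BBB = 0.0004 + 1.5 * market + rnorm(n_days, sd = 0.005)
)

# Beta is the slope from regressing each stock's returns on the market's returns
betas <- sapply(stocks, function(r) unname(coef(lm(r ~ market))[2]))

# Excess return to Beta: mean return above the risk-free rate, per unit of Beta
excess_to_beta <- (colMeans(stocks) - risk_free) / betas

# Rank the stocks from most to least attractive on this measure
ranking <- sort(excess_to_beta, decreasing = TRUE)
print(round(betas, 2))
print(ranking)
```

Because the simulated stocks are built from known Betas, the estimated slopes should land close to 0.8 and 1.5, which is a quick way to check that the regression is set up the right way round.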

Files Used in This Post

You can download all files referenced here at the following address. This includes the full R script, both shell scripts, and the list of stocks; I’ve also included an example of the risk-free rate calculation.

Files for post “Using R to Select an Optimum Stock Portfolio with Modern Portfolio Theory”

Now, let’s get started!

Before You Begin

Before you can run the script, you need to do the following:

  1. Download daily prices for all stocks in the S&P 500 index
  2. Download daily S&P 500 index values
  3. Calculate the risk-free rate

Download Daily Stock Prices

The R script uses the daily closing price of each stock in the S&P 500 index, so we need to provide it with that data. To do this, run the StockDownloadScript shell script, which reads as follows:


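#!/bin/bash
filename="$1"
# Default to the current directory when no download directory is given
if [ -z "$2" ]
    then directory="."
    else directory="$2"
fi

# Fetch the data for each symbol listed in the input file
while read -r symbol; do
    wget "$symbol" -P "$directory"
done < "$filename"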

You run this from the terminal as follows:

./StockDownloadScript StockList StockData

Here, StockList is just a list of the stock symbols that make up the S&P 500, and StockData is the download directory (the current directory is used if you don’t specify one). This will download daily closing prices for each stock to a separate file in the download directory. The files will have names like this:

table.csv?s=A

The upper-case letters are the stock symbols: for example, “A” is the symbol for Agilent Technologies, Inc. The “table.csv?s=” prefix is left over from how Yahoo stores its stock data; we can remove it with the RenameStockFiles script, which looks like this:

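#!/bin/bash
# Default to the current directory when no directory is given
if [ -z "$1" ]
    then directory="."
    else directory="$1"
fi
# Strip the "table.csv?s=" prefix from each downloaded file
for file in "$directory"/table.csv?s=*; do
    mv "$file" "${file/table.csv?s=/}"
done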

You run this from the terminal as follows:

./RenameStockFiles StockData

This will strip out the table prefixes so that your directory contains one file for each stock.

Download S&P 500 Index Values

The main part of the R script estimates the correlation between the closing price of each stock and the closing value of the S&P 500 index on the same day. You can download the index values directly from Yahoo! Finance at S&P 500 Index Historical Values. Set the date range from January 1, 2016 to today’s date, and make sure you select Daily Returns. Download the file to your stock data directory and rename it “SPX”.
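
Once the stock files and the SPX file sit in the same directory, they need to be lined up by date before any comparison can be made. Here is a rough sketch of that step in R. The column names (Date, Close), the tiny made-up price series, and the helper read_returns() are assumptions for illustration; the real downloads may have more columns or a different row order, which is why the sketch sorts by date first.

```r
# Sketch only: made-up prices and assumed column names (Date, Close).
data_dir <- file.path(tempdir(), "StockData")
dir.create(data_dir, showWarnings = FALSE)

dates <- as.Date("2016-01-04") + 0:3
write.csv(data.frame(Date = dates, Close = c(100, 102, 101, 104)),
          file.path(data_dir, "A"), row.names = FALSE)
write.csv(data.frame(Date = dates, Close = c(2000, 2010, 2005, 2020)),
          file.path(data_dir, "SPX"), row.names = FALSE)

# Read one file and convert closing prices to simple daily returns
read_returns <- function(symbol, dir) {
  prices <- read.csv(file.path(dir, symbol), stringsAsFactors = FALSE)
  prices <- prices[order(as.Date(prices$Date)), ]   # oldest row first
  data.frame(Date   = prices$Date[-1],
             Return = diff(prices$Close) / head(prices$Close, -1))
}

# Join stock and index on date so each row describes the same trading day
stock   <- read_returns("A", data_dir)
index   <- read_returns("SPX", data_dir)
returns <- merge(stock, index, by = "Date", suffixes = c("_stock", "_index"))
print(returns)
```

Merging on the date column, rather than assuming the two files have identical rows, guards against a stock that is missing a trading day the index has.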

Calculate the Risk-Free Rate

Since we are considering the relative risk of owning stocks compared to other investments, we need to know what return we could expect from an asset without any risk. Of course, there’s no such thing, but the yield on the 10-year Treasury bond is often used as a good approximation, since it’s backed by the full faith and credit of the United States government.

You can find the daily yield rates on the U.S. Treasury Department’s website at Daily Treasury Yield Curve Rates. You’ll need to highlight the rates with your mouse and copy them to a spreadsheet, and you’ll need to do this separately for both 2016 and 2017 data (as of this post, the XML download feature was broken). In the spreadsheet, calculate the geometric mean of the 10-year rate using the GEOMEAN() function: this will be the risk-free rate.
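
If you would rather skip the spreadsheet, the same calculation is easy to do in R. The following is a minimal sketch: the yields are made-up placeholders to be replaced with the values copied from the Treasury site, and geometric_mean() is a hypothetical helper rather than part of the script.

```r
# Placeholder 10-year yields in percent; replace with the copied Treasury values
ten_year_yield <- c(1.8, 2.0, 2.2, 2.4)

# Geometric mean: the n-th root of the product, computed via logs for stability
geometric_mean <- function(x) {
  stopifnot(all(x > 0))   # logs are only defined for positive values
  exp(mean(log(x)))
}

risk_free_rate <- geometric_mean(ten_year_yield)
print(risk_free_rate)
```

This matches what the spreadsheet GEOMEAN() function computes, and the result is never larger than the ordinary arithmetic average of the same rates.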

Next Steps

Now that the preliminaries are done, we can move on to selecting an optimum portfolio, as described in this page:

Selecting the Stocks

If something isn’t working, please fill out the contact form below, and I’ll do my best to help.
