Introducing Stepwise Correlation Rank

So in the last post, I attempted to replicate the Flexible Asset Allocation paper. I’d like to offer a thanks to Pat of Intelligent Trading Tech (not updated recently, hopefully this will change) for helping me corroborate the results, so that I have more confidence there isn’t an error in my code.

One of the procedures the authors of the FAA paper used is a correlation rank, which I interpreted as the average correlation of each security to the others.

The issue, pointed out to me in a phone conversation I had with David Varadi is that when considering correlation, shouldn’t the correlations the investor is concerned about be between instruments within the portfolio, as opposed to simply all the correlations, including to instruments not in the portfolio? To that end, when selecting assets (or possibly features in general), conceptually, it makes more sense to select in a stepwise fashion–that is, start off at a subset of the correlation matrix, and then rank assets in order of their correlation to the heretofore selected assets, as opposed to all of them. This was explained in Mr. Varadi’s recent post.

Here’s a work in progress function I wrote to formally code this idea:

stepwiseCorRank <- function(corMatrix, startNames=NULL, stepSize=1, bestHighestRank=FALSE) {
  #edge cases
  if(dim(corMatrix)[1] == 1) {
  } else if (dim(corMatrix)[1] == 2) {
    ranks <- c(1.5, 1.5)
    names(ranks) <- colnames(corMatrix)
  if(is.null(startNames)) {
    corSums <- rowSums(corMatrix)
    corRanks <- rank(corSums)
    startNames <- names(corRanks)[corRanks <= stepSize]
  nameList <- list()
  nameList[[1]] <- startNames
  rankList <- list()
  rankCount <- 1
  rankList[[1]] <- rep(rankCount, length(startNames))
  rankedNames <-, nameList)
  while(length(rankedNames) < nrow(corMatrix)) {
    rankCount <- rankCount+1
    subsetCor <- corMatrix[, rankedNames]
    if(class(subsetCor) != "numeric") {
      subsetCor <- subsetCor[!rownames(corMatrix) %in% rankedNames,]
      if(class(subsetCor) != "numeric") {
        corSums <- rowSums(subsetCor)
        corSumRank <- rank(corSums)
        lowestCorNames <- names(corSumRank)[corSumRank <= stepSize]
        nameList[[rankCount]] <- lowestCorNames
        rankList[[rankCount]] <- rep(rankCount, min(stepSize, length(lowestCorNames)))
      } else { #1 name remaining
        nameList[[rankCount]] <- rownames(corMatrix)[!rownames(corMatrix) %in% names(subsetCor)]
        rankList[[rankCount]] <- rankCount
    } else {  #first iteration, subset on first name
      subsetCorRank <- rank(subsetCor)
      lowestCorNames <- names(subsetCorRank)[subsetCorRank <= stepSize]
      nameList[[rankCount]] <- lowestCorNames
      rankList[[rankCount]] <- rep(rankCount, min(stepSize, length(lowestCorNames)))
    rankedNames <-, nameList)
  ranks <-, rankList)
  names(ranks) <- rankedNames
  if(bestHighestRank) {
    ranks <- 1+length(ranks)-ranks
  ranks <- ranks[colnames(corMatrix)] #return to original order

So the way the function works is that it takes in a correlation matrix, a starting name (if provided), and a step size (that is, how many assets to select per step, so that the process doesn’t become extremely long when dealing with larger amounts of assets/features). Then, it iterates–subset the correlation matrix on the starting name, and find the minimum value, and add it to a list of already-selected names. Next, subset the correlation matrix columns on the selected names, and the rows on the not selected names, and repeat, until all names have been accounted for. Due to R’s little habit of wiping out labels when a matrix becomes a vector, I had to write some special case code, which is the reason for two nested if/else statements (the first one being for the first column subset, and the second being for when there’s only one row remaining).

Also, if there’s an edge case (1 or 2 securities), then there is some functionality to handle those trivial cases.

Here’s a test script I wrote to test this function out:


#mid 1997 to end of 2012
getSymbols(mutualFunds, from="1997-06-30", to="2012-12-31")
tmp <- list()
for(fund in mutualFunds) {
  tmp[[fund]] <- Ad(get(fund))

#always use a list hwne intending to cbind/rbind large quantities of objects
adPrices <-, args = tmp)
colnames(adPrices) <- gsub(".Adjusted", "", colnames(adPrices))

adRets <- Return.calculate(adPrices)

subset <- adRets["2012"]
corMat <- cor(subset)

tmp <- list()
for(i in 1:length(mutualFunds)) {
  rankRow <- stepwiseCorRank(corMat, startNames=mutualFunds[i])
  tmp[[i]] <- rankRow
rankDemo <-, tmp)
rownames(rankDemo) <- mutualFunds
origRank <- rank(rowSums(corMat))
rankDemo <- rbind(rankDemo, origRank)
rownames(rankDemo)[8] <- "Original (VBMFX)"

heatmap(-rankDemo, Rowv=NA, Colv=NA, col=heat.colors(8), margins=c(6,6))

Essentially, using the 2012 year of returns for the 7 FAA mutual funds, I compared how different starting securities changed the correlation ranking sequence.

Here are the results:

VTSMX              1     6     7     4     2     3     5
FDIVX              6     1     7     4     2     5     3
VEIEX              6     7     1     4     2     3     5
VFISX              2     6     7     1     3     4     5
VBMFX              2     6     7     4     1     3     5
QRAAX              5     6     7     4     2     1     3
VGSIX              5     6     7     4     2     3     1
Non-Sequential     5     6     7     2     1     3     4

In short, the algorithm is rather robust to starting security selection, at least judging by this small example. However, comparing VBMFX start to the non-sequential ranking, we see that VFISX changes from rank 2 in the non-sequential to rank 4, with VTSMX going from rank 5 to rank 2. From an intuitive perspective, this makes sense, as both VBMFX and VFISX are bond funds, which have a low correlation with the other 5 equity-based mutual funds, but a higher correlation with each other, thus signifying that the algorithm seems to be working as intended, at least insofar as this small example demonstrates. Here’s a heatmap to demonstrate this in visual form.

The ranking order (starting security) is the vertical axis, and the horizontal are the ranks, from white being first, to red being last. Notice once again that the ranking orders are robust in general (consider each column of colors descending), but each particular ranking order is unique.

So far, this code still has to be tested in terms of its applications to portfolio management and asset allocation, but for those interested in such an idea, it’s my hope that this provides a good reference point.

Thanks for reading.

7 thoughts on “Introducing Stepwise Correlation Rank

  1. Pingback: The Whole Street’s Daily Wrap for 10/27/2014 | The Whole Street

  2. Pingback: Combining FAA and Stepwise Correlation | QuantStrat TradeR

    • I wouldn’t say I recompute correlations, so much as recompute the relevant sums of correlations. And that’s really interesting how I inadvertently stumbled on something from genomics. I suppose it makes sense though, since when dealing with high-dimensional data, collinearity, correlated features, and so on, which seem to be everywhere in genomics, trying to obtain the correct features is such an issue. If you find a use for this algorithm, I’d love to know how much it helps out.

  3. Pingback: An Update on Flexible Asset Allocation | QuantStrat TradeR

  4. Pingback: Comparing Flexible and Elastic Asset Allocation | QuantStrat TradeR

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s