A Return.Portfolio Wrapper to Automate Harry Long Seeking Alpha Backtests

This post will cover a function to simplify creating Harry Long type rebalancing strategies from SeekingAlpha for interested readers. As Harry Long has stated, most, if not all of his strategies are more for demonstrative purposes rather than actual recommended investments.

So, since Harry Long has been posting some more articles on Seeknig Alpha, I’ve had a reader or two ask me to analyze his strategies (again). Instead of doing that, however, I’ll simply put this tool here, which is a wrapper that automates the acquisition of data and simulates portfolio rebalancing with one line of code.

Here’s the tool.


LongSeeker <- function(symbols, weights, rebalance_on = "years", 
                       displayStats = TRUE, outputReturns = FALSE) {
  getSymbols(symbols, src='yahoo', from = '1990-01-01')
  prices <- list()
  for(i in 1:length(symbols)) {
    if(symbols[i] == "ZIV") {
      download("https://www.dropbox.com/s/jk3ortdyru4sg4n/ZIVlong.TXT", destfile="ziv.txt")
      ziv <- xts(read.zoo("ziv.txt", header=TRUE, sep=",", format="%Y-%m-%d"))
      prices[[i]] <- Cl(ziv)
    } else if (symbols[i] == "VXX") {
      vxx <- xts(read.zoo("vxx.txt", header=TRUE, sep=",", format="%Y-%m-%d"))
      prices[[i]] <- Cl(vxx)
    else {
      prices[[i]] <- Ad(get(symbols[i]))
  prices <- do.call(cbind, prices)
  prices <- na.locf(prices)
  returns <- na.omit(Return.calculate(prices))
  returns$zeroes <- 0
  weights <- c(weights, 1-sum(weights))
  stratReturns <- Return.portfolio(R = returns, weights = weights, rebalance_on = rebalance_on)
  if(displayStats) {
    stats <- rbind(table.AnnualizedReturns(stratReturns), maxDrawdown(stratReturns), CalmarRatio(stratReturns))
    rownames(stats)[4] <- "Max Drawdown"
  if(outputReturns) {

It fetches the data for you (usually from Yahoo, but a big thank you to Mr. Helumth Vollmeier in the case of ZIV and VXX), and has the option of either simply displaying an equity curve and some statistics (CAGR, annualized standard dev, Sharpe, max Drawdown, Calmar), or giving you the return stream as an output if you wish to do more analysis in R.

Here’s an example of simply getting the statistics, with an 80% XLP/SPLV (they’re more or less interchangeable) and 20% TMF (aka 60% TLT, so an 80/60 portfolio), from one of Harry Long’s articles.

LongSeeker(c("XLP", "TLT"), c(.8, .6))


Annualized Return                 0.1321000
Annualized Std Dev                0.1122000
Annualized Sharpe (Rf=0%)         1.1782000
Max Drawdown                      0.2330366
Calmar Ratio                      0.5670285

Equity curve:

Nothing out of the ordinary of what we might expect from a balanced equity/bonds portfolio. Generally does well, has its largest drawdown in the financial crisis, and some other bumps in the road, but overall, I’d think a fairly vanilla “set it and forget it” sort of thing.

And here would be the way to get the stream of individual daily returns, assuming you wanted to rebalance these two instruments weekly, instead of yearly (as is the default).

tmp <- LongSeeker(c("XLP", "TLT"), c(.8, .6), rebalance_on="weeks",
                    displayStats = FALSE, outputReturns = TRUE)

And now let’s get some statistics.


Which give:

> table.AnnualizedReturns(tmp)
Annualized Return                    0.1328
Annualized Std Dev                   0.1137
Annualized Sharpe (Rf=0%)            1.1681
> maxDrawdown(tmp)
[1] 0.2216417
> CalmarRatio(tmp)
Calmar Ratio         0.5990087

Turns out, moving the rebalancing from annually to weekly didn’t have much of an effect here (besides give a bunch of money to your broker, if you factored in transaction costs, which this doesn’t).

So, that’s how this tool works. The results, of course, begin from the latest instrument’s inception. The trick, in my opinion, is to try and find proxy substitutes with longer histories for newer ETFs that are simply leveraged ETFs, such as using a 60% weight in TLT with an 80% weight in XLP instead of a 20% weight in TMF with 80% allocation in SPLV.

For instance, here are some proxies:

TMF = TLT * 3

That said, I’ve worked with Harry Long before, and he develops more sophisticated strategies behind the scenes, so I’d recommend that SeekingAlpha readers take his publicly released strategies as concept demonstrations, as opposed to fully-fledged investment ideas, and contact Mr. Long himself about more customized, private solutions for investment institutions if you are so interested.

Thanks for reading.

NOTE: I am currently in the northeast. While I am currently contracting, I am interested in networking with individuals or firms with regards to potential collaboration opportunities.

Hypothesis-Driven Development Part V: Stop-Loss, Deflating Sharpes, and Out-of-Sample

This post will demonstrate a stop-loss rule inspired by Andrew Lo’s paper “when do stop-loss rules stop losses”? Furthermore, it will demonstrate how to deflate a Sharpe ratio to account for the total number of trials conducted, which is presented in a paper written by David H. Bailey and Marcos Lopez De Prado. Lastly, the strategy will be tested on the out-of-sample ETFs, rather than the mutual funds that have been used up until now (which actually cannot be traded more than once every two months, but have been used simply for the purpose of demonstration).

First, however, I’d like to fix some code from the last post and append some results.

A reader asked about displaying the max drawdown for each of the previous rule-testing variants based off of volatility control, and Brian Peterson also recommended displaying max leverage, which this post will provide.

Here’s the updated rule backtest code:

ruleBacktest <- function(returns, nMonths, dailyReturns,
nSD=126, volTarget = .1) {
nMonthAverage <- apply(returns, 2, runSum, n = nMonths)
nMonthAverage <- na.omit(xts(nMonthAverage, order.by = index(returns)))
nMonthAvgRank <- t(apply(nMonthAverage, 1, rank))
nMonthAvgRank <- xts(nMonthAvgRank, order.by=index(nMonthAverage))
selection <- (nMonthAvgRank==5) * 1 #select highest average performance
dailyBacktest <- Return.portfolio(R = dailyReturns, weights = selection)
constantVol <- volTarget/(runSD(dailyBacktest, n = nSD) * sqrt(252))
monthlyLeverage <- na.omit(constantVol[endpoints(constantVol), on ="months"])
wts <- cbind(monthlyLeverage, 1-monthlyLeverage)
constantVolComponents <- cbind(dailyBacktest, 0)
out <- Return.portfolio(R = constantVolComponents, weights = wts)
out <- apply.monthly(out, Return.cumulative)
maxLeverage <- max(monthlyLeverage, na.rm = TRUE)
return(list(out, maxLeverage))

t1 <- Sys.time()
allPermutations <- list()
allDDs <- list()
leverages <- list()
for(i in seq(21, 252, by = 21)) {
monthVariants <- list()
ddVariants <- list()
leverageVariants <- list()
for(j in 1:12) {
trial <- ruleBacktest(returns = monthRets, nMonths = j, dailyReturns = sample, nSD = i)
sharpe <- table.AnnualizedReturns(trial[[1]])[3,]
dd <- maxDrawdown(trial[[1]])
monthVariants[[j]] <- sharpe
ddVariants[[j]] <- dd
leverageVariants[[j]] <- trial[[2]]
allPermutations[[i]] <- do.call(c, monthVariants)
allDDs[[i]] <- do.call(c, ddVariants)
leverages[[i]] <- do.call(c, leverageVariants)
allPermutations <- do.call(rbind, allPermutations)
allDDs <- do.call(rbind, allDDs)
leverages <- do.call(rbind, leverages)
t2 <- Sys.time()



Here are the results presented as a hypothesis test–a linear regression of drawdowns and leverage against momentum formation period and volatility calculation period:

ddLM <- lm(meltedDDs$MaxDD~meltedDDs$volFormation + meltedDDs$momentumFormation)

lm(formula = meltedDDs$MaxDD ~ meltedDDs$volFormation + meltedDDs$momentumFormation)

Min 1Q Median 3Q Max
-0.08022 -0.03434 -0.00135 0.02911 0.20077

Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.240146 0.010922 21.99 < 2e-16 ***
meltedDDs$volFormation -0.000484 0.000053 -9.13 6.5e-16 ***
meltedDDs$momentumFormation 0.001533 0.001112 1.38 0.17
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.0461 on 141 degrees of freedom
Multiple R-squared: 0.377, Adjusted R-squared: 0.368
F-statistic: 42.6 on 2 and 141 DF, p-value: 3.32e-15

levLM <- lm(meltedLeverage$MaxLeverage~meltedLeverage$volFormation + meltedDDs$momentumFormation)

lm(formula = meltedLeverage$MaxLeverage ~ meltedLeverage$volFormation +

Min 1Q Median 3Q Max
-0.9592 -0.5179 -0.0908 0.3679 3.1022

Estimate Std. Error t value Pr(>|t|)
(Intercept) 4.076870 0.164243 24.82 <2e-16 ***
meltedLeverage$volFormation -0.009916 0.000797 -12.45 <2e-16 ***
meltedDDs$momentumFormation 0.009869 0.016727 0.59 0.56
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.693 on 141 degrees of freedom
Multiple R-squared: 0.524, Adjusted R-squared: 0.517
F-statistic: 77.7 on 2 and 141 DF, p-value: <2e-16

Easy interpretation here–the shorter-term volatility estimates are unstable due to the one-asset rotation nature of the system. Particularly silly is using the one-month volatility estimate. Imagine the system just switched from the lowest-volatility instrument to the highest. It would then take excessive leverage and get blown up that month for no particularly good reason. A longer-term volatility estimate seems to do much better for this system. So, while the Sharpe is generally improved, the results become far more palatable when using a more stable calculation for volatility, which sets maximum leverage to about 2 when targeting an annualized volatility of 10%. Also, to note, the period to compute volatility matters far more than the momentum formation period when addressing volatility targeting, which lends credence (at least in this case) to so many people that say “the individual signal rules matter far less than the position-sizing rules!”. According to some, position sizing is often a way for people to mask only marginally effective (read: bad) strategies with a separate layer to create a better result. I’m not sure which side of the debate (even assuming there is one) I fall upon, but for what it’s worth, there it is.

Moving on, I want to test out one more rule, which is inspired by Andrew Lo’s stop-loss rule. Essentially, the way it works is this (to my interpretation): it evaluates a running standard deviation, and if the drawdown exceeds some threshold of the running standard deviation, to sit out for some fixed period of time, and then re-enter. According to Andrew Lo, stop-losses help momentum strategies, so it seems as good a rule to test as any.

However, rather than test different permutations of the stop rule on all 144 prior combinations of volatility-adjusted configurations, I’m going to take an ensemble strategy, inspired by a conversation I had with Adam Butler, the CEO of ReSolve Asset Management, who stated that “we know momentum exists, but we don’t know the perfect way to measure it”, from the section I just finished up and use an equal weight of all 12 of the momentum formation periods with a 252-day rolling annualized volatility calculation, and equal weight them every month.

Here are the base case results from that trial (bringing our total to 169).

strat <- list()
for(i in 1:12) {
strat[[i]] <- ruleBacktest(returns = monthRets, nMonths = i, dailyReturns = sample, nSD = 252)[[1]]
strat <- do.call(cbind, strat)
strat <- Return.portfolio(R = na.omit(strat), rebalance_on="months")

rbind(table.AnnualizedReturns(strat), maxDrawdown(strat), CalmarRatio(strat))

With the following result:

Annualized Return 0.12230
Annualized Std Dev 0.10420
Annualized Sharpe (Rf=0%) 1.17340
Worst Drawdown 0.09616
Calmar Ratio 1.27167

Of course, also worth nothing is that the annualized standard deviation is indeed very close to 10%, even with the ensemble. And it’s nice that there is a Sharpe past 1. Of course, given that these are mutual funds being backtested, these results are optimistic due to the unrealistic execution assumptions (can’t trade sooner than once every *two* months).

Anyway, let’s introduce our stop-loss rule, inspired by Andrew Lo’s paper.

loStopLoss <- function(returns, sdPeriod = 12, sdScaling = 1, sdThresh = 1.5, cooldown = 3) {
stratRets <- list()
count <- 1
stratComplete <- FALSE
originalRets <- returns
ddThresh <- -runSD(returns, n = sdPeriod) * sdThresh * sdScaling
while(!stratComplete) {
retDD <- PerformanceAnalytics:::Drawdowns(returns)
DDbreakthrough <- retDD < ddThresh & lag(retDD) > ddThresh
firstBreak <- which.max(DDbreakthrough) #first threshold breakthrough, if 1, we have no breakthrough
#the above line is unintuitive since this is a boolean vector, so it returns the first value of TRUE
if(firstBreak > 1) { #we have a drawdown breakthrough if this is true
stratRets[[count]] <- returns[1:firstBreak,] #subset returns through our threshold breakthrough
nextPoint <- firstBreak + cooldown + 1 #next point of re-entry is the point after the cooldown period
if(nextPoint <= (nrow(returns)-1)) { #if we can re-enter, subset the returns and return to top of loop
returns <- returns[nextPoint:nrow(returns),]
ddThresh <- ddThresh[nextPoint:nrow(ddThresh),]
count <- count+1
} else { #re-entry point is after data exhausted, end strategy
stratComplete <- TRUE
} else { #there are no more critical drawdown breakthroughs, end strategy
stratRets[[count]] <- returns
stratComplete <- TRUE
stratRets <- do.call(rbind, stratRets) #combine returns
expandRets <- cbind(stratRets, originalRets) #account for all the days we missed
expandRets[is.na(expandRets[,1]), 1] <- 0 #cash positions will be zero
rets <- expandRets[,1]
colnames(rets) <- paste(cooldown, sdThresh, sep="_")

Essentially, the way it works is like this: the function computes all the drawdowns for a return series, along with its running standard deviation (non-annualized–if you want to annualize it, change the sdScaling parameter to something like sqrt(12) for monthly or sqrt(252) for daily data). Next, it looks for when the drawdown crossed a critical threshold, then cuts off that portion of returns and standard deviation history, and moves ahead in history by the cooldown period specified, and repeats. Most of the code is simply dealing with corner cases (is there even a time to use the stop rule? What about iterating when there isn’t enough data left?), and then putting the results back together again.

In any case, for the sake of simplicity, this function doesn’t use two different time scales (IE compute volatility using daily data, make decisions monthly), so I’m sticking with using a 12-month rolling volatility, as opposed to 252 day rolling volatility multiplied by the square root of 21.

Finally, here are another 54 runs to see if Andrew Lo’s stop-loss rule works here. Essentially, the intuition behind this is that if the strategy breaks down, it’ll continue to break down, so it would be prudent to just turn it off for a little while.

Here are the trial runs:

threshVec <- seq(0, 2, by=.25)
cooldownVec <- c(1:6)
sharpes <- list()
params <- expand.grid(threshVec, cooldownVec)
for(i in 1:nrow(params)) {
configuration <- loStopLoss(returns = strat, sdThresh = params[i,1],
cooldown = params[i, 2])
sharpes[[i]] <- table.AnnualizedReturns(configuration)[3,]
sharpes <- do.call(c, sharpes)

loStoplossFrame <- cbind(params, sharpes)
loStoplossFrame$improvement <- loStoplossFrame[,3] - table.AnnualizedReturns(strat)[3,]

colnames(loStoplossFrame) <- c("Threshold", "Cooldown", "Sharpe", "Improvement")

And a plot of the results.

ggplot(loStoplossFrame, aes(x = Threshold, y = Cooldown, fill=Improvement)) +
geom_tile()+scale_fill_gradient2(high="green", mid="yellow", low="red", midpoint = 0)


Result: at this level, and at this frequency (retaining the monthly decision-making process), the stop-loss rule basically does nothing in order to improve the risk-reward trade-off in the best case scenarios, and in most scenarios, simply hurts. 54 trials down the drain, bringing us up to 223 trials. So, what does the final result look like?


Here’s the final in-sample equity curve–and the first one featured in this entire series. This is, of course, a *feature* of hypothesis-driven development. Playing whack-a-mole with equity curve bumps is what is a textbook case of overfitting. So, without further ado:

And now we can see why stop-loss rules generally didn’t add any value to this strategy. Simply, it had very few periods of sustained losses at the monthly frequency, and thus, very little opportunity for a stop-loss rule to add value. Sure, the occasional negative month crept in, but there was no period of sustained losses. Furthermore, Yahoo Finance may not have perfect fidelity on dividends on mutual funds from the late 90s to early 2000s, so the initial flat performance may also be a rather conservative estimate on the strategy’s performance (then again, as I stated before, using mutual funds themselves is optimistic given the unrealistic execution assumptions, so maybe it cancels out). Now, if this equity curve were to be presented without any context, one may easily question whether or not it was curve-fit. To an extent, one can argue that the volatility computation period may be optimized, though I’d hardly call a 252-day (one-year) rolling volatility estimate a curve-fit.

Next, I’d like to introduce another concept on this blog that I’ve seen colloquially addressed in other parts of the quantitative blogging space, particularly by Mike Harris of Price Action Lab, namely that of multiple hypothesis testing, and about the need to correct for that.

Luckily for that, Drs. David H. Bailey and Marcos Lopez De Prado wrote a paper to address just that. Also, I’d like to note one very cool thing about this paper: it actually has a worked-out numerical example! In my opinion, there are very few things as helpful as showing a simple result that transforms a collection of mathematical symbols into a result to demonstrate what those symbols actually mean in the span of one page. Oh, and it also includes *code* in the appendix (albeit Python — even though, you know, R is far more developed. If someone can get Marcos Lopez De Prado to switch to R–aka the better research language, that’d be a godsend!).

In any case, here’s the formula for the deflated Sharpe ratio, implemented straight from the paper.

deflatedSharpe <- function(sharpe, nTrials, varTrials, skew, kurt, numPeriods, periodsInYear) {
emc <- .5772
sr0_term1 <- (1 - emc) * qnorm(1 - 1/nTrials)
sr0_term2 <- emc * qnorm(1 - 1/nTrials * exp(-1))
sr0 <- sqrt(varTrials * 1/periodsInYear) * (sr0_term1 + sr0_term2)

numerator <- (sharpe/sqrt(periodsInYear) - sr0)*sqrt(numPeriods - 1)

skewnessTerm <- 1 - skew * sharpe/sqrt(periodsInYear)
kurtosisTerm <- (kurt-1)/4*(sharpe/sqrt(periodsInYear))^2

denominator <- sqrt(skewnessTerm + kurtosisTerm)

result <- pnorm(numerator/denominator)
pval <- 1-result

The inputs are the strategy’s Sharpe ratio, the number of backtest runs, the variance of the sharpe ratios of those backtest runs, the skewness of the candidate strategy, its non-excess kurtosis, the number of periods in the backtest, and the number of periods in a year. Unlike the De Prado paper, I choose to return the p-value (EG 1-.

Let’s collect all our Sharpe ratios now.

allSharpes <- c(as.numeric(table.AnnualizedReturns(sigBoxplots)[3,]),

And now, let’s plug and chug!

stratSignificant <- deflatedSharpe(sharpe = as.numeric(table.AnnualizedReturns(strat)[3,]),
nTrials = length(allSharpes), varTrials = var(allSharpes),
skew = as.numeric(skewness(strat)), kurt = as.numeric(kurtosis(strat)) + 3,
numPeriods = nrow(strat), periodsInYear = 12)

And the result!

> stratSignificant
[1] 0.01311

Success! At least at the 5% level…and a rejection at the 1% level, and any level beyond that.

So, one last thing! Out-of-sample testing on ETFs (and mutual funds during the ETF burn-in period)!

symbols2 <- c("CWB", "JNK", "TLT", "SHY", "PCY")
getSymbols(symbols2, from='1900-01-01')
prices2 <- list()
for(tmp in symbols2) {
prices2[[tmp]] <- Ad(get(tmp))
prices2 <- do.call(cbind, prices2)
colnames(prices2) <- substr(colnames(prices2), 1, 3)
returns2 <- na.omit(Return.calculate(prices2))

monthRets2 <- apply.monthly(returns2, Return.cumulative)

oosStrat <- list()
for(i in 1:12) {
oosStrat[[i]] <- ruleBacktest(returns = monthRets2, nMonths = i, dailyReturns = returns2, nSD = 252)[[1]]
oosStrat <- do.call(cbind, oosStrat)
oosStrat <- Return.portfolio(R = na.omit(oosStrat), rebalance_on="months")

symbols <- c("CNSAX", "FAHDX", "VUSTX", "VFISX", "PREMX")
getSymbols(symbols, from='1900-01-01')
prices <- list()
for(symbol in symbols) {
prices[[symbol]] <- Ad(get(symbol))
prices <- do.call(cbind, prices)
colnames(prices) <- substr(colnames(prices), 1, 5)
oosMFreturns <- na.omit(Return.calculate(prices))
oosMFmonths <- apply.monthly(oosMFreturns, Return.cumulative)

oosMF <- list()
for(i in 1:12) {
oosMF[[i]] <- ruleBacktest(returns = oosMFmonths, nMonths = i, dailyReturns = oosMFreturns, nSD=252)[[1]]
oosMF <- do.call(cbind, oosMF)
oosMF <- Return.portfolio(R = na.omit(oosMF), rebalance_on="months")
oosMF <- oosMF["2009-04/2011-03"]

fullOOS <- rbind(oosMF, oosStrat)

rbind(table.AnnualizedReturns(fullOOS), maxDrawdown(fullOOS), CalmarRatio(fullOOS))

And the results:

Annualized Return 0.1273
Annualized Std Dev 0.0901
Annualized Sharpe (Rf=0%) 1.4119
Worst Drawdown 0.1061
Calmar Ratio 1.1996

And one more equity curve (only the second!).

In other words, the out-of-sample statistics compare to the in-sample statistics. The Sharpe ratio is higher, the Calmar slightly lower. But on a whole, the performance has kept up. Unfortunately, the strategy is currently in a drawdown, but that’s the breaks.

So, whew. That concludes my first go at hypothesis-driven development, and has hopefully at least demonstrated the process to a satisfactory degree. What started off as a toy strategy instead turned from a rejection to a not rejection to demonstrating ideas from three separate papers, and having out-of-sample statistics that largely matched if not outperformed the in-sample statistics. For those thinking about investing in this strategy (again, here is the strategy: take 12 different portfolios, each selecting the asset with the highest momentum over months 1-12, target an annualized volatility of 10%, with volatility defined as the rolling annualized 252-day standard deviation, and equal-weight them every month), what I didn’t cover was turnover and taxes (this is a bond ETF strategy, so dividends will play a large role).

Now, one other request–many of the ideas for this blog come from my readers. I am especially interested in things to think about from readers with line-management responsibilities, as I think many of the questions from those individuals are likely the most universally interesting ones. If you’re one such individual, I’d appreciate an introduction, and knowing who more of the individuals in my reader base are.

Thanks for reading.

NOTE: while I am currently consulting, I am always open to networking, meeting up, consulting arrangements, and job discussions. Contact me through my email at ilya.kipnis@gmail.com, or through my LinkedIn, found here.

Hypothesis Driven Development Part IV: Testing The Barroso/Santa Clara Rule

This post will deal with applying the constant-volatility procedure written about by Barroso and Santa Clara in their paper “Momentum Has Its Moments”.

The last two posts dealt with evaluating the intelligence of the signal-generation process. While the strategy showed itself to be marginally better than randomly tossing darts against a dartboard and I was ready to reject it for want of moving onto better topics that are slightly less of a toy in terms of examples than a little rotation strategy, Brian Peterson told me to see this strategy through to the end, including testing out rule processes.

First off, to make a distinction, rules are not signals. Rules are essentially a way to quantify what exactly to do assuming one acts upon a signal. Things such as position sizing, stop-loss processes, and so on, all fall under rule processes.

This rule deals with using leverage in order to target a constant volatility.

So here’s the idea: in their paper, Pedro Barroso and Pedro Santa Clara took the Fama-French momentum data, and found that the classic WML strategy certainly outperforms the market, but it has a critical downside, namely that of momentum crashes, in which being on the wrong side of a momentum trade will needlessly expose a portfolio to catastrophically large drawdowns. While this strategy is a long-only strategy (and with fixed-income ETFs, no less), and so would seem to be more robust against such massive drawdowns, there’s no reason to leave money on the table. To note, not only have Barroso and Santa Clara covered this phenomena, but so have others, such as Tony Cooper in his paper “Alpha Generation and Risk Smoothing Using Volatility of Volatility”.

In any case, the setup here is simple: take the previous portfolios, consisting of 1-12 month momentum formation periods, and every month, compute the annualized standard deviation, using a 21-252 (by 21) formation period, for a total of 12 x 12 = 144 trials. (So this will put the total trials run so far at 24 + 144 = 168…bonus points if you know where this tidbit is going to go).

Here’s the code (again, following on from the last post, which follows from the second post, which follows from the first post in this series).


ruleBacktest <- function(returns, nMonths, dailyReturns,
                         nSD=126, volTarget = .1) {
  nMonthAverage <- apply(returns, 2, runSum, n = nMonths)
  nMonthAverage <- xts(nMonthAverage, order.by = index(returns))
  nMonthAvgRank <- t(apply(nMonthAverage, 1, rank))
  nMonthAvgRank <- xts(nMonthAvgRank, order.by=index(returns))
  selection <- (nMonthAvgRank==5) * 1 #select highest average performance
  dailyBacktest <- Return.portfolio(R = dailyReturns, weights = selection)
  constantVol <- volTarget/(runSD(dailyBacktest, n = nSD) * sqrt(252))
  monthlyLeverage <- na.omit(constantVol[endpoints(constantVol), on ="months"])
  wts <- cbind(monthlyLeverage, 1-monthlyLeverage)
  constantVolComponents <- cbind(dailyBacktest, 0)
  out <- Return.portfolio(R = constantVolComponents, weights = wts)
  out <- apply.monthly(out, Return.cumulative)

t1 <- Sys.time()
allPermutations <- list()
for(i in seq(21, 252, by = 21)) {
  monthVariants <- list()
  for(j in 1:12) {
    trial <- ruleBacktest(returns = monthRets, nMonths = j, dailyReturns = sample, nSD = i)
    sharpe <- table.AnnualizedReturns(trial)[3,]
    monthVariants[[j]] <- sharpe
  allPermutations[[i]] <- do.call(c, monthVariants)
allPermutations <- do.call(rbind, allPermutations)
t2 <- Sys.time()

rownames(allPermutations) <- seq(21, 252, by = 21)
colnames(allPermutations) <- 1:12

baselineSharpes <- table.AnnualizedReturns(algoPortfolios)[3,]
baselineSharpeMat <- matrix(rep(baselineSharpes, 12), ncol=12, byrow=TRUE)

diffs <- allPermutations - as.numeric(baselineSharpeMat)
meltedDiffs <-melt(diffs)

colnames(meltedDiffs) <- c("volFormation", "momentumFormation", "sharpeDifference")
ggplot(meltedDiffs, aes(x = momentumFormation, y = volFormation, fill=sharpeDifference)) + 
  geom_tile()+scale_fill_gradient2(high="green", mid="yellow", low="red")

meltedSharpes <- melt(allPermutations)
colnames(meltedSharpes) <- c("volFormation", "momentumFormation", "Sharpe")
ggplot(meltedSharpes, aes(x = momentumFormation, y = volFormation, fill=Sharpe)) + 
  geom_tile()+scale_fill_gradient2(high="green", mid="yellow", low="red", midpoint = mean(allPermutations))

Again, there’s no parallel code since this is a relatively small example, and I don’t know which OS any given instance of R runs on (windows/linux have different parallelization infrastructure).

So the idea here is to simply compare the Sharpe ratios with different volatility lookback periods against the baseline signal-process-only portfolios. The reason I use Sharpe ratios, and not say, CAGR, volatility, or drawdown is that Sharpe ratios are scale-invariant. In this case, I’m targeting an annualized volatility of 10%, but with a higher targeted volatility, one can obtain higher returns at the cost of higher drawdowns, or be more conservative. But the Sharpe ratio should stay relatively consistent within reasonable bounds.

So here are the results:

Sharpe improvements:

In this case, the diagram shows that on a whole, once the volatility estimation period becomes long enough, the results are generally positive. Namely, that if one uses a very short estimation period, that volatility estimate is more dependent on the last month’s choice of instrument, as opposed to the longer-term volatility of the system itself, which can create poor forecasts. Also to note is that the one-month momentum formation period doesn’t seem overly amenable to the constant volatility targeting scheme (there’s basically little improvement if not a slight drag on risk-adjusted performance). This is interesting in that the baseline Sharpe ratio for the one-period formation is among the best of the baseline performances. However, on a whole, the volatility targeting actually does improve risk-adjusted performance of the system, even one as simple as throwing all your money into one asset every month based on a single momentum signal.

Absolute Sharpe ratios:

In this case, the absolute Sharpe ratios look fairly solid for such a simple system. The 3, 7, and 9 month variants are slightly lower, but once the volatility estimation period reaches between 126 and 252 days, the results are fairly robust. The Barroso and Santa Clara paper uses a period of 126 days to estimate annualized volatility, which looks solid across the entire momentum formation period spectrum.

In any case, it seems the verdict is that a constant volatility target improves results.

Thanks for reading.

NOTE: while I am currently consulting, I am always open to networking, meeting up (Philadelphia and New York City both work), consulting arrangements, and job discussions. Contact me through my email at ilya.kipnis@gmail.com, or through my LinkedIn, found here.

Hypothesis Driven Development Part III: Monte Carlo In Asset Allocation Tests

This post will show how to use Monte Carlo to test for signal intelligence.

Although I had rejected this strategy in the last post, I was asked to do a monte-carlo analysis of a thousand random portfolios to see how the various signal processes performed against said distribution. Essentially, the process is quite simple: as I’m selecting one asset each month to hold, I simply generate a random number between 1 and the amount of assets (5 in this case), and hold it for the month. Repeat this process for the number of months, and then repeat this process a thousand times, and see where the signal processes fall across that distribution.

I didn’t use parallel processing here since Windows and Linux-based R have different parallel libraries, and in the interest of having the code work across all machines, I decided to leave it off.

Here’s the code:

randomAssetPortfolio <- function(returns) {
  numAssets <- ncol(returns)
  numPeriods <- nrow(returns)
  assetSequence <- sample.int(numAssets, numPeriods, replace=TRUE)
  wts <- matrix(nrow = numPeriods, ncol=numAssets, 0)
  wts <- xts(wts, order.by=index(returns))
  for(i in 1:nrow(wts)) {
    wts[i,assetSequence[i]] <- 1
  randomPortfolio <- Return.portfolio(R = returns, weights = wts)

t1 <- Sys.time()
randomPortfolios <- list()
for(i in 1:1000) {
  randomPortfolios[[i]] <- randomAssetPortfolio(monthRets)
randomPortfolios <- do.call(cbind, randomPortfolios)
t2 <- Sys.time()

algoPortfolios <- sigBoxplots[,1:12]
randomStats <- table.AnnualizedReturns(randomPortfolios)
algoStats <- table.AnnualizedReturns(algoPortfolios)

hist(as.numeric(randomStats[1,]), breaks = 20, main = 'histogram of monte carlo annualized returns',
     xlab='annualized returns')
abline(v=as.numeric(algoStats[1,]), col='red')
hist(as.numeric(randomStats[2,]), breaks = 20, main = 'histogram of monte carlo volatilities',
     xlab='annualized vol')
abline(v=as.numeric(algoStats[2,]), col='red')
hist(as.numeric(randomStats[3,]), breaks = 20, main = 'histogram of monte carlo Sharpes',
     xlab='Sharpe ratio')
abline(v=as.numeric(algoStats[3,]), col='red')

allStats <- cbind(randomStats, algoStats)
aggregateMean <- apply(allStats, 1, mean)
aggregateDevs <- apply(allStats, 1, sd)

algoPs <- 1-pnorm(as.matrix((algoStats - aggregateMean)/aggregateDevs))

plot(as.numeric(algoPs[1,])~c(1:12), main='Return p-values',
     xlab='Formation period', ylab='P-value')
abline(h=0.05, col='red')
abline(h=.1, col='green')

plot(1-as.numeric(algoPs[2,])~c(1:12), ylim=c(0, .5), main='Annualized vol p-values',
     xlab='Formation period', ylab='P-value')
abline(h=0.05, col='red')
abline(h=.1, col='green')

plot(as.numeric(algoPs[3,])~c(1:12), main='Sharpe p-values',
     xlab='Formation period', ylab='P-value')
abline(h=0.05, col='red')
abline(h=.1, col='green')

And here are the results:

In short, compared to monkeys throwing darts, to use some phrasing from the Price Action Lab blog, these signal processes are only marginally intelligent, if at all, depending on the variation one chooses. Still, I was recommended to see this process through the end, and evaluate rules, so next time, I’ll evaluate one easy-to-implement rule.

Thanks for reading.

NOTE: while I am currently consulting, I am always open to networking, meeting up (Philadelphia and New York City both work), consulting arrangements, and job discussions. Contact me through my email at ilya.kipnis@gmail.com, or through my LinkedIn, found here.

Introduction to Hypothesis Driven Development — Overview of a Simple Strategy and Indicator Hypotheses

This post will begin to apply a hypothesis-driven development framework (that is, the framework written by Brian Peterson on how to do strategy construction correctly, found here) to a strategy I’ve come across on SeekingAlpha. Namely, Cliff Smith posted about a conservative bond rotation strategy, which makes use of short-term treasuries, long-term treasuries, convertibles, emerging market debt, and high-yield corporate debt–that is, SHY, TLT, CWB, PCY, and JNK. What this post will do is try to put a more formal framework on whether or not this strategy is a valid one to begin with.

One note: For the sake of balancing succinctness for blog consumption and to demonstrate the computational techniques more quickly, I’ll be glossing over background research write-ups for this post/strategy, since it’s yet another take on time-series/cross-sectional momentum, except pared down to something more implementable for individual investors, as opposed to something that requires a massive collection of different instruments for massive, institutional-class portfolios.

Introduction, Overview, Objectives, Constraints, Assumptions, and Hypotheses to be Tested:

Momentum. It has been documented many times. For the sake of brevity, I’ll let readers follow the links if they’re so inclined, but among them are Jegadeesh and Titman’s seminal 1993 paper, Mark Carhart’s 1997 paper, Andreu et. Al (2012), Barroso and Santa-Clara (2013), Ilmanen’s Expected Returns (which covers momentum), and others. This list, of course, is far from exhaustive, but the point stands. Formation periods of several months (up to a year) should predict returns moving forward on some holding period, be it several months, or as is more commonly seen, one month.

Furthermore, momentum applies in two varieties–cross sectional, and time-series. Cross-sectional momentum asserts that assets that outperformed among a group will continue to outperform, while time-series momentum asserts that assets that have risen in price during a formation period will continue to do so for the short-term future.

Cliff Smith’s strategy depends on the latter, effectively, among a group of five bond ETFs. I am not certain of the objective of the strategy (he didn’t mention it), as PCY, JNK, and CWB, while they may be fixed-income in name, possess volatility on the order of equities. I suppose one possible “default” objective would be to achieve an outperforming total return against an equal-weighted benchmark, both rebalanced monthly.

The constraints are that one would need a sufficient amount of capital such that fixed transaction costs are negligible, since the strategy is a single-instrument rotation type, meaning that each month may have two-way turnover of 200% (sell one ETF, buy another). On the other hand, one would assume that the amount of capital deployed is small enough such that execution costs of trading do not materially impact the performance of the strategy. That is to say, moving multiple billions from one of these ETFs to the other is a non-starter. As all returns are computed close-to-close for the sake of simplicity, this creates the implicit assumption that the market impact and execution costs are very small compared to overall returns.

There are two overarching hypotheses to be tested in order to validate the efficacy of this strategy:

1) Time-series momentum: while it has been documented for equities and even industry/country ETFs, it may not have been formally done so yet for fixed-income ETFs, and their corresponding mutual funds. In order to validate this strategy, it should be investigated if the particular instruments it selects adhere to the same phenomena.

2) Cross-sectional momentum: again, while this has been heavily demonstrated in the past with regards to equities, ETFs are fairly new, and of the five mutual funds Cliff Smith selected, the latest one only has data going back to 1997, thus allowing less sophisticated investors to easily access diversified fixed income markets a relatively new innovation.

Essentially, both of these can be tested over a range of parameters (1-24 months).

Another note: with hypothesis-driven strategy development, the backtest is to be *nothing more than a confirmation of all the hypotheses up to that point*. That is, re-optimizing on the backtest itself means overfitting. Any proposed change to a strategy should be done in the form of tested hypotheses, as opposed to running a bunch of backtests and selecting the best trials. Taken another way, this means that every single proposed element of a strategy needs to have some form of strong hypothesis accompanying it, in order to be justified.

So, here are the two hypotheses I tested on the corresponding mutual funds:

symbols <- c("CNSAX", "FAHDX", "VUSTX", "VFISX", "PREMX")
getSymbols(symbols, from='1900-01-01')
prices <- list()
for(symbol in symbols) {
  prices[[symbol]] <- Ad(get(symbol))
prices <- do.call(cbind, prices)
colnames(prices) <- substr(colnames(prices), 1, 5)
returns <- na.omit(Return.calculate(prices))

sample <- returns['1997-08/2009-03']
monthRets <- apply.monthly(sample, Return.cumulative)

returnRegression <- function(returns, nMonths) {
  nMonthAverage <- apply(returns, 2, runSum, n = nMonths)
  nMonthAverage <- xts(nMonthAverage, order.by = index(returns))
  nMonthAverage <- na.omit(lag(nMonthAverage))
  returns <- returns[index(nMonthAverage)]
  rankAvg <- t(apply(nMonthAverage, 1, rank))
  rankReturn <- t(apply(returns, 1, rank))
  meltedAverage <- melt(data.frame(nMonthAverage))
  meltedReturns <- melt(data.frame(returns))
  meltedRankAvg <- melt(data.frame(rankAvg))
  meltedRankReturn <- melt(data.frame(rankReturn))
  lmfit <- lm(meltedReturns$value ~ meltedAverage$value - 1)
  rankLmfit <- lm(meltedRankReturn$value ~ meltedRankAvg$value)
  return(rbind(summary(lmfit)$coefficients, summary(rankLmfit)$coefficients))

pvals <- list()
estimates <- list()
rankPs <- list()
rankEstimates <- list()
for(i in 1:24) {
  tmp <- returnRegression(monthRets, nMonths=i)
  pvals[[i]] <- tmp[1,4]
  estimates[[i]] <- tmp[1,1]
  rankPs[[i]] <- tmp[2,4]
  rankEstimates[[i]] <- tmp[2,1]
pvals <- do.call(c, pvals)
estimates <- do.call(c, estimates)
rankPs <- do.call(c, rankPs)
rankEstimates <- do.call(c, rankEstimates)

Essentially, in this case, I take a pooled regression (that is, take the five instruments and pool them together into one giant vector), and regress the cumulative sum of monthly returns against the next month’s return. Also, I do the same thing as the above, except also using cross-sectional ranks for each month, and performing a rank-rank regression. The sample I used was the five mutual funds (CNSAX, FAHDX, VUSTX, VFISX, and PREMX) since their inception to March 2009, since the data for the final ETF begins in April of 2009, so I set aside the ETF data for out-of-sample backtesting.

Here are the results:

pvals <- list()
estimates <- list()
rankPs <- list()
rankEstimates <- list()
for(i in 1:24) {
  tmp <- returnRegression(monthRets, nMonths=i)
  pvals[[i]] <- tmp[1,4]
  estimates[[i]] <- tmp[1,1]
  rankPs[[i]] <- tmp[2,4]
  rankEstimates[[i]] <- tmp[2,1]
pvals <- do.call(c, pvals)
estimates <- do.call(c, estimates)
rankPs <- do.call(c, rankPs)
rankEstimates <- do.call(c, rankEstimates)

plot(estimates, type='h', xlab = 'Months regressed on', ylab='momentum coefficient', 
     main='future returns regressed on past momentum')
plot(pvals, type='h', xlab='Months regressed on', ylab='p-value', main='momentum significance')
abline(h=.05, col='green')
abline(h=.1, col='red')

plot(rankEstimates, type='h', xlab='Months regressed on', ylab="Rank coefficient",
     main='future return ranks regressed on past momentum ranks', ylim=c(0,3))
plot(rankPs, type='h', xlab='Months regressed on', ylab='P-values')

Of interest to note is that while much of the momentum literature specifies a reversion effect on time-series momentum at 12 months or greater, all the regression coefficients in this case (even up to 24 months!) proved to be positive, with the very long-term coefficients possessing more statistical significance than the short-term ones. Nevertheless, Cliff Smith’s chosen parameters (the two and four month settings) possess statistical significance at least at the 10% level. However, if one were to be highly conservative in terms of rejecting strategies, that in and of itself may be reason enough to reject this strategy right here.

However, the rank-rank regression (that is, regressing the future month’s cross-sectional rank on the past n month sum cross sectional rank) proved to be statistically significant beyond any doubt, with all p-values being effectively zero. In short, there is extremely strong evidence for cross-sectional momentum among these five assets, which extends out to at least two years. Furthermore, since SHY or VFISX, aka the short-term treasury fund, is among the assets chosen, since it’s a proxy for the risk-free rate, by including it among the cross-sectional rankings, the cross-sectional rankings also implicitly state that in order to be invested into (as this strategy is a top-1 asset rotation strategy), it must outperform the risk-free asset, otherwise, by process of elimination, the strategy will invest into the risk-free asset itself.

In upcoming posts, I’ll look into testing hypotheses on signals and rules.

Lastly, Volatility Made Simple has just released a blog post on the performance of volatility-based strategies for the month of August. Given the massive volatility spike, the dispersion in performance of strategies is quite interesting. I’m happy that in terms of YTD returns, the modified version of my strategy is among the top 10 for the year.

Thanks for reading.

NOTE: while I am currently consulting, I am always open to networking, meeting up (Philadelphia and New York City both work), consulting arrangements, and job discussions. Contact me through my email at ilya.kipnis@gmail.com, or through my LinkedIn, found here.

I’m Back, A New Harry Long Strategy, And Plans For Hypothesis-Driven Development

I’m back. Anyone that wants to know “what happened at Graham”, I felt there was very little scaffolding/on-boarding, and Graham’s expectations/requirements changed, though I have a reference from my direct boss, an accomplished quantitative director In any case, moving on.

Harry Long recently came out with a new strategy posted on SeekingAlpha, and I’d like to test it for robustness to see if it has merit.

Here’s the link to the post.

So, the rules are fairly simple:

ZIV 15%
SPLV 50%
TMF 10%
UUP 20%
VXX 5%

TMF can be approximated with a 3x leveraged TLT. SPLV is also highly similar to XLP — aka the consumer staples SPY sector. Here’s the equity curve comparison to prove it.

So, let’s test this thing.


getSymbols('XLP', from = '1900-01-01')
getSymbols('TLT', from = '1900-01-01')
getSymbols('UUP', from = '1900-01-01')
download('https://www.dropbox.com/s/jk3ortdyru4sg4n/ZIVlong.TXT', destfile='ZIVlong.csv')
download('https://dl.dropboxusercontent.com/s/950x55x7jtm9x2q/VXXlong.TXT', destfile = 'VXXlong.csv')
ZIV <- xts(read.zoo('ZIVlong.csv', header=TRUE, sep=','))
VXX <- xts(read.zoo('VXXlong.csv', header=TRUE, sep=','))

symbols <- na.omit(cbind(Return.calculate(Cl(ZIV)), Return.calculate(Ad(XLP)), Return.calculate(Ad(TLT))*3,
                         Return.calculate(Ad(UUP)), Return.calculate(Cl(VXX))))
strat <- Return.portfolio(symbols, weights = c(.15, .5, .1, .2, .05), rebalance_on='years')

Here are the results:

compare <- na.omit(cbind(strat, Return.calculate(Ad(XLP))))
rbind(table.AnnualizedReturns(compare), maxDrawdown(compare), CalmarRatio(compare))

Equity curve (compared against buy and hold XLP)


                          portfolio.returns XLP.Adjusted
Annualized Return                 0.0864000    0.0969000
Annualized Std Dev                0.0804000    0.1442000
Annualized Sharpe (Rf=0%)         1.0747000    0.6720000
Worst Drawdown                    0.1349957    0.3238755
Calmar Ratio                      0.6397665    0.2993100

In short, this strategy definitely offers a lot more bang for your risk in terms of drawdown, and volatility, and so, offers noticeably higher risk/reward tradeoffs. However, it’s not something that beats the returns of instruments in the category of twice its volatility.

Here are the statistics from 2010 onwards.

rbind(table.AnnualizedReturns(compare['2010::']), maxDrawdown(compare['2010::']), CalmarRatio(compare['2010::']))

                          portfolio.returns XLP.Adjusted
Annualized Return                0.12050000    0.1325000
Annualized Std Dev               0.07340000    0.1172000
Annualized Sharpe (Rf=0%)        1.64210000    1.1308000
Worst Drawdown                   0.07382878    0.1194072
Calmar Ratio                     1.63192211    1.1094371

Equity curve:

Definitely a smoother ride, and for bonus points, it seems some of the hedges helped with the recent market dip. Again, while aggregate returns aren’t as high as simply buying and holding XLP, the Sharpe and Calmar ratios do better on a whole.

Now, let’s do some robustness analysis. While I do not know how Harry Long arrived at the individual asset weights he did, what can be tested much more easily is what effect offsetting the rebalancing day has on the performance of the strategy. As this is a strategy rebalanced once a year, it can easily be tested for what effect the rebalancing date has on its performance.

yearlyEp <- endpoints(symbols, on = 'years')
rebalanceDays <- list()
for(i in 0:251) {
  offset <- yearlyEp+i
  offset[offset > nrow(symbols)] <- nrow(symbols)
  offset[offset==0] <- 1
  wts <- matrix(rep(c(.15, .5, .1, .2, .05), length(yearlyEp)), ncol=5, byrow=TRUE)
  wts <- xts(wts, order.by=as.Date(index(symbols)[offset]))
  offsetRets <- Return.portfolio(R = symbols, weights = wts)
  colnames(offsetRets) <- paste0("offset", i)
  rebalanceDays[[i+1]] <- offsetRets
rebalanceDays <- do.call(cbind, rebalanceDays)
rebalanceDays <- na.omit(rebalanceDays)
stats <- rbind(table.AnnualizedReturns(rebalanceDays), maxDrawdown(rebalanceDays))
stats[5,] <- stats[1,]/stats[4,]

Here are the plots of return, Sharpe, and Calmar vs. offset.

plot(as.numeric(stats[1,])~c(0:251), type='l', ylab='CAGR', xlab='offset', main='CAGR vs. offset')
plot(as.numeric(stats[3,])~c(0:251), type='l', ylab='Sharpe Ratio', xlab='offset', main='Sharpe vs. offset')
plot(as.numeric(stats[5,])~c(0:251), type='l', ylab='Calmar Ratio', xlab='offset', main='Calmar vs. offset')
plot(as.numeric(stats[4,])~c(0:251), type='l', ylab='Drawdown', xlab='offset', main='Drawdown vs. offset')

In short, this strategy seems to be somewhat dependent upon the rebalancing date, which was left unsaid. Here are the quantiles for the five statistics for the given offsets:

rownames(stats)[5] <- "Calmar"
apply(stats, 1, quantile)
     Annualized Return Annualized Std Dev Annualized Sharpe (Rf=0%) Worst Drawdown    Calmar
0%            0.072500             0.0802                  0.881000      0.1201198 0.4207922
25%           0.081925             0.0827                  0.987625      0.1444921 0.4755600
50%           0.087650             0.0837                  1.037250      0.1559238 0.5364758
75%           0.092000             0.0843                  1.090900      0.1744123 0.6230789
100%          0.105100             0.0867                  1.265900      0.1922916 0.8316698

While the standard deviation seems fairly robust, the Sharpe can decrease by about 33%, the Calmar can get cut in half, and the CAGR can also vary fairly substantially. That said, even using conservative estimates, the Sharpe ratio is fairly solid, and the Calmar outperforms that of XLP in any given variation, but nevertheless, performance can vary.

Is this strategy investible in its current state? Maybe, depending on your standards for rigor. Up to this point, rebalancing sometime in December-early January seems to substantially outperform other rebalance dates. Maybe a Dec/January anomaly effect exists in literature to justify this. However, the article makes no mention of that. Furthermore, the article doesn’t explain how it arrived at the weights it did.

Which brings me to my next topic, namely about a change with this blog going forward. Namely, hypothesis-driven trading system development. While this process doesn’t require complicated math, it does require statistical justification for multiple building blocks of a strategy, and a change in mindset, which a great deal of publicly available trading system ideas either gloss over, or omit entirely. As one of my most important readers praised this blog for “showing how the sausage is made”, this seems to be the next logical step in this progression.

Here’s the reasoning as to why.

It seems that when presenting trading ideas, there are two schools of thought: those that go off of intuition, build a backtest based off of that intuition, and see if it generally lines up with some intuitively expected result–and those that believe in a much more systematic, hypothesis-driven step-by-step framework, justifying as many decisions (ideally every decision) in creating a trading system. The advantage of the former is that it allows for displaying many more ideas in a much shorter timeframe. However, it has several major drawbacks: first off, it hides many concerns about potential overfitting. If what one sees is one final equity curve, there is nothing said about the sensitivity of said equity curve to however many various input parameters, and what other ideas were thrown out along the way. Secondly, without a foundation of strong hypotheses about the economic phenomena exploited, there is no proof that any strategy one comes across won’t simply fail once it’s put into live trading.

And third of all, which I find most important, is that such activities ultimately don’t sufficiently impress the industry’s best practitioners. For instance, Tony Cooper took issue with my replication of Trading The Odds’ volatility trading strategy, namely how data-mined it was (according to him in the comments section), and his objections seem to have been completely borne out by in out-of-sample performance.

So, for those looking for plug-and-crank system ideas, that may still happen every so often if someone sends me something particularly interesting, but there’s going to be some all-new content on this blog.

Thanks for reading.

NOTE: while I am currently consulting, I am always open to networking, meeting up (Philadelphia and New York City both work), consulting arrangements, and job discussions. Contact me through my email at ilya.kipnis@gmail.com, or through my LinkedIn, found here.

Momentum, Markowitz, and Solving Rank-Deficient Covariance Matrices — The Constrained Critical Line Algorithm

This post will feature the differences in the implementation of my constrained critical line algorithm with that of Dr. Clarence Kwan’s. The constrained critical line algorithm is a form of gradient descent that incorporates elements of momentum. My implementation includes a volatility-targeting binary search algorithm.

First off, rather than try and explain the algorithm piece by piece, I’ll defer to Dr. Clarence Kwan’s paper and excel spreadsheet, from where I obtained my original implementation. Since that paper and excel spreadsheet explains the functionality of the algorithm, I won’t repeat that process here. Essentially, the constrained critical line algorithm incorporates its lambda constraints into the structure of the covariance matrix itself. This innovation actually allows the algorithm to invert previously rank-deficient matrices.

Now, while Markowitz mean-variance optimization may be a bit of old news for some, the ability to use a short lookback for momentum with monthly data has allowed me and my two coauthors (Dr. Wouter Keller, who came up with flexible and elastic asset allocation, and Adam Butler, of GestaltU) to perform a backtest on a century’s worth of assets, with more than 30 assets in the backtest, despite using only a 12-month formation period. That paper can be found here.

Let’s look at the code for the function.

CCLA <- function(covMat, retForecast, maxIter = 1000, 
                 verbose = FALSE, scale = 252, 
                 weightLimit = .7, volThresh = .1) {
  if(length(retForecast) > length(unique(retForecast))) {
    sequentialNoise <- seq(1:length(retForecast)) * 1e-12
    retForecast <- retForecast + sequentialNoise
  #initialize original out/in/up status
  if(length(weightLimit) == 1) {
    weightLimit <- rep(weightLimit, ncol(covMat))
  rankForecast <- length(retForecast) - rank(retForecast) + 1
  remainingWeight <- 1 #have 100% of weight to allocate
  upStatus <- inStatus <- rep(0, ncol(covMat))
  i <- 1
  while(remainingWeight > 0) {
    securityLimit <- weightLimit[rankForecast == i]
    if(securityLimit < remainingWeight) {
      upStatus[rankForecast == i] <- 1 #if we can't invest all remaining weight into the security
      remainingWeight <- remainingWeight - securityLimit
    } else {
      inStatus[rankForecast == i] <- 1
      remainingWeight <- 0
    i <- i + 1
  #initial matrices (W, H, K, identity, negative identity)
  covMat <- as.matrix(covMat)
  retForecast <- as.numeric(retForecast)
  init_W <- cbind(2*covMat, rep(-1, ncol(covMat)))
  init_W <- rbind(init_W, c(rep(1, ncol(covMat)), 0))
  H_vec <- c(rep(0, ncol(covMat)), 1)
  K_vec <- c(retForecast, 0)
  negIdentity <- -1*diag(ncol(init_W))
  identity <- diag(ncol(init_W))
  matrixDim <- nrow(init_W)
  weightLimMat <- matrix(rep(weightLimit, matrixDim), ncol=ncol(covMat), byrow=TRUE)
  #out status is simply what isn't in or up
  outStatus <- 1 - inStatus - upStatus
  #initialize expected volatility/count/turning points data structure
  expVol <- Inf
  lambda <- 100
  count <- 0
  turningPoints <- list()
  while(lambda > 0 & count < maxIter) {
    #old lambda and old expected volatility for use with numerical algorithms
    oldLambda <- lambda
    oldVol <- expVol
    count <- count + 1
    #compute W, A, B
    inMat <- matrix(rep(c(inStatus, 1), matrixDim), nrow = matrixDim, byrow = TRUE)
    upMat <- matrix(rep(c(upStatus, 0), matrixDim), nrow = matrixDim, byrow = TRUE)
    outMat <- matrix(rep(c(outStatus, 0), matrixDim), nrow = matrixDim, byrow = TRUE)
    W <- inMat * init_W + upMat * identity + outMat * negIdentity
    inv_W <- solve(W)
    modified_H <- H_vec - rowSums(weightLimMat* upMat[,-matrixDim] * init_W[,-matrixDim])
    A_vec <- inv_W %*% modified_H
    B_vec <- inv_W %*% K_vec
    #remove the last elements from A and B vectors
    truncA <- A_vec[-length(A_vec)]
    truncB <- B_vec[-length(B_vec)]
    #compute in Ratio (aka Ratio(1) in Kwan.xls)
    inRatio <- rep(0, ncol(covMat))
    inRatio[truncB > 0] <- -truncA[truncB > 0]/truncB[truncB > 0]
    #compute up Ratio (aka Ratio(2) in Kwan.xls)
    upRatio <- rep(0, ncol(covMat))
    upRatioIndices <- which(inStatus==TRUE & truncB < 0)
    if(length(upRatioIndices) > 0) {
      upRatio[upRatioIndices] <- (weightLimit[upRatioIndices] - truncA[upRatioIndices]) / truncB[upRatioIndices]
    #find lambda -- max of up and in ratios
    maxInRatio <- max(inRatio)
    maxUpRatio <- max(upRatio)
    lambda <- max(maxInRatio, maxUpRatio)
    #compute new weights
    wts <- inStatus*(truncA + truncB * lambda) + upStatus * weightLimit + outStatus * 0
    #compute expected return and new expected volatility
    expRet <- t(retForecast) %*% wts
    expVol <- sqrt(wts %*% covMat %*% wts) * sqrt(scale)
    #create turning point data row and append it to turning points
    turningPoint <- cbind(count, expRet, lambda, expVol, t(wts))
    colnames(turningPoint) <- c("CP", "Exp. Ret.", "Lambda", "Exp. Vol.", colnames(covMat))
    turningPoints[[count]] <- turningPoint
    #binary search for volatility threshold -- if the first iteration is lower than the threshold,
    #then immediately return, otherwise perform the binary search until convergence of lambda
    if(oldVol == Inf & expVol < volThresh) {
      turningPoints <- do.call(rbind, turningPoints)
      threshWts <- tail(turningPoints, 1)
      return(list(turningPoints, threshWts))
    } else if(oldVol > volThresh & expVol < volThresh) {
      upLambda <- oldLambda
      dnLambda <- lambda
      meanLambda <- (upLambda + dnLambda)/2
      while(upLambda - dnLambda > .00001) {
        #compute mean lambda and recompute weights, expected return, and expected vol
        meanLambda <- (upLambda + dnLambda)/2
        wts <- inStatus*(truncA + truncB * meanLambda) + upStatus * weightLimit + outStatus * 0
        expRet <- t(retForecast) %*% wts
        expVol <- sqrt(wts %*% covMat %*% wts) * sqrt(scale)
        #if new expected vol is less than threshold, mean becomes lower bound
        #otherwise, it becomes the upper bound, and loop repeats
        if(expVol < volThresh) {
          dnLambda <- meanLambda
        } else {
          upLambda <- meanLambda
      #once the binary search completes, return those weights, and the corner points
      #computed until the binary search. The corner points aren't used anywhere, but they're there.
      threshWts <- cbind(count, expRet, meanLambda, expVol, t(wts))
      colnames(turningPoint) <- colnames(threshWts) <- c("CP", "Exp. Ret.", "Lambda", "Exp. Vol.", colnames(covMat))
      turningPoints[[count]] <- turningPoint
      turningPoints <- do.call(rbind, turningPoints)
      return(list(turningPoints, threshWts))
    #this is only run for the corner points during which binary search doesn't take place
    #change status of security that has new lambda
    if(maxInRatio > maxUpRatio) {
      inStatus[inRatio == maxInRatio] <- 1 - inStatus[inRatio == maxInRatio]
      upStatus[inRatio == maxInRatio] <- 0
    } else {
      upStatus[upRatio == maxUpRatio] <- 1 - upStatus[upRatio == maxUpRatio]
      inStatus[upRatio == maxUpRatio] <- 0
    outStatus <- 1 - inStatus - upStatus
  #we only get here if the volatility threshold isn't reached
  #can actually happen if set sufficiently low
  turningPoints <- do.call(rbind, turningPoints)
  threshWts <- tail(turningPoints, 1)
  return(list(turningPoints, threshWts))

Essentially, the algorithm can be divided into three parts:

The first part is the initialization, which does the following:

It creates three status vectors: in, up, and out. The up vector denotes which securities are at their weight constraint cap, the in status are securities that are not at their weight cap, and the out status are securities that receive no weighting on that iteration of the algorithm.

The rest of the algorithm essentially does the following:

It takes a gradient descent approach by changing the status of the security that minimizes lambda, which by extension minimizes the volatility at the local point. As long as lambda is greater than zero, the algorithm continues to iterate. Letting the algorithm run until convergence effectively provides the volatility-minimization portfolio on the efficient frontier.

However, one change that Dr. Keller and I made to it is the functionality of volatility targeting, allowing the algorithm to stop between iterations. As the SSRN paper shows, a higher volatility threshold, over the long run (the *VERY* long run) will deliver higher returns.

In any case, the algorithm takes into account several main arguments:

A return forecast, a covariance matrix, a volatility threshold, and weight limits, which can be either one number that will result in a uniform weight limit, or a per-security weight limit. Another argument is scale, which is 252 for days, 12 for months, and so on. Lastly, there is a volatility threshold component, which allows the user to modify how aggressive or conservative the strategy can be.

In any case, to demonstrate this function, let’s run a backtest. The idea in this case will come from a recent article published by Frank Grossmann from SeekingAlpha, in which he obtained a 20% CAGR but with a 36% max drawdown.

So here’s the backtest:

symbols <- c("AFK", "ASHR", "ECH", "EGPT",
             "EIDO", "EIRL", "EIS", "ENZL",
             "EPHE", "EPI", "EPOL", "EPU",
             "EWA", "EWC", "EWD", "EWG",
             "EWH", "EWI", "EWJ", "EWK",
             "EWL", "EWM", "EWN", "EWO",
             "EWP", "EWQ", "EWS", "EWT",
             "EWU", "EWW", "EWY", "EWZ",
             "EZA", "FM", "FRN", "FXI",
             "GAF", "GULF", "GREK", "GXG",
             "IDX", "MCHI", "MES", "NORW",
             "QQQ", "RSX", "THD", "TUR",
             "VNM", "TLT"

getSymbols(symbols, from = "2003-01-01")

prices <- list()
entryRets <- list()
for(i in 1:length(symbols)) {
  prices[[i]] <- Ad(get(symbols[i]))
prices <- do.call(cbind, prices)
colnames(prices) <- gsub("\\.[A-z]*", "", colnames(prices))

returns <- Return.calculate(prices)
returns <- returns[-1,]

sumIsNa <- function(col) {

appendZeroes <- function(selected, originalSetNames) {
  zeroes <- rep(0, length(originalSetNames) - length(selected))
  names(zeroes) <- originalSetNames[!originalSetNames %in% names(selected)]
  all <- c(selected, zeroes)
  all <- all[originalSetNames]

computeStats <- function(rets) {
  stats <- rbind(table.AnnualizedReturns(rets), maxDrawdown(rets), CalmarRatio(rets))
  return(round(stats, 3))

CLAAbacktest <- function(returns, lookback = 3, volThresh = .1, assetCaps = .5, tltCap = 1,
                         returnWeights = FALSE, useTMF = FALSE) {
  if(useTMF) {
    returns$TLT <- returns$TLT * 3
  ep <- endpoints(returns, on = "months")
  weights <- list()
  for(i in 2:(length(ep) - lookback)) {
    retSubset <- returns[(ep[i]+1):ep[i+lookback],]
    retNAs <- apply(retSubset, 2, sumIsNa)
    validRets <- retSubset[, retNAs==0]
    retForecast <- Return.cumulative(validRets)
    covRets <- cov(validRets)
    weightLims <- rep(assetCaps, ncol(covRets))
    weightLims[colnames(covRets)=="TLT"] <- tltCap
    weight <- CCLA(covMat = covRets, retForecast = retForecast, weightLimit = weightLims, volThresh = volThresh)
    weight <- weight[[2]][,5:ncol(weight[[2]])]
    weight <- appendZeroes(selected = weight, colnames(retSubset))
    weight <- xts(t(weight), order.by=last(index(validRets)))
    weights[[i]] <- weight
  weights <- do.call(rbind, weights)
  stratRets <- Return.portfolio(R = returns, weights = weights)
  if(returnWeights) {
    return(list(weights, stratRets))

In essence, we take the returns over a specified monthly lookback period, specify a volatility threshold, specify asset caps, specify the bond asset cap, and whether or not we wish to use TLT or TMF (a 3x leveraged variant, which just multiplies all returns of TLT by 3, for simplicity). The output of the CCLA (Constrained Critical Line Algorithm) is a list that contains the corner points, and the volatility threshold corner point which contains the corner point number, expected return, expected volatility, and the lambda value. So, we want the fifth element onward of the second element of the list.

Here are some results:

config1 <- CLAAbacktest(returns = returns)
config2 <- CLAAbacktest(returns = returns, useTMF = TRUE)
config3 <- CLAAbacktest(returns = returns, lookback = 4)
config4 <- CLAAbacktest(returns = returns, lookback = 2, useTMF = TRUE)

comparison <- na.omit(cbind(config1, config2, config3, config4))
colnames(comparison) <- c("Default", "TMF instead of TLT", "Lookback 4", "Lookback 2 and TMF")

With the following statistics:

> computeStats(comparison)
                          Default TMF instead of TLT Lookback 4 Lookback 2 and TMF
Annualized Return           0.137              0.146      0.133              0.138
Annualized Std Dev          0.126              0.146      0.125              0.150
Annualized Sharpe (Rf=0%)   1.081              1.000      1.064              0.919
Worst Drawdown              0.219              0.344      0.186              0.357
Calmar Ratio                0.625              0.424      0.714              0.386

The variants that use TMF instead of TLT suffer far worse drawdowns. Not much of a hedge, apparently.

Here’s the equity curve:

Taking the 4 month lookback configuration (strongest Calmar), we’ll play around with the volatility setting.

Here’s the backtest:

config5 <- CLAAbacktest(returns = returns, lookback = 4, volThresh = .15)
config6 <- CLAAbacktest(returns = returns, lookback = 4, volThresh = .2)

comparison2 <- na.omit(cbind(config3, config5, config6))
colnames(comparison2) <- c("Vol10", "Vol15", "Vol20")

With the results:

> computeStats(comparison2)
                          Vol10 Vol15 Vol20
Annualized Return         0.133 0.153 0.180
Annualized Std Dev        0.125 0.173 0.204
Annualized Sharpe (Rf=0%) 1.064 0.886 0.882
Worst Drawdown            0.186 0.212 0.273
Calmar Ratio              0.714 0.721 0.661

In this case, more risk, more reward, lower risk/reward ratios as you push the volatility threshold. So for once, the volatility puzzle doesn’t rear its head, and higher risk indeed does translate to higher returns (at the cost of everything else, though).

Here’s the equity curve.

Lastly, let’s try toggling the asset cap limits with the vol threshold back at 10.

config7 <- CLAAbacktest(returns = returns, lookback = 4, assetCaps = .1)
config8 <- CLAAbacktest(returns = returns, lookback = 4, assetCaps = .25)
config9 <- CLAAbacktest(returns = returns, lookback = 4, assetCaps = 1/3)
config10 <- CLAAbacktest(returns = returns, lookback = 4, assetCaps = 1)

comparison3 <- na.omit(cbind(config7, config8, config9, config3, config10))
colnames(comparison3) <- c("Cap10", "Cap25", "Cap33", "Cap50", "Uncapped")

With the resulting statistics:

> computeStats(comparison3)
                          Cap10 Cap25 Cap33 Cap50 Uncapped
Annualized Return         0.124 0.122 0.127 0.133    0.134
Annualized Std Dev        0.118 0.122 0.123 0.125    0.126
Annualized Sharpe (Rf=0%) 1.055 1.002 1.025 1.064    1.070
Worst Drawdown            0.161 0.185 0.186 0.186    0.186
Calmar Ratio              0.771 0.662 0.680 0.714    0.721

Essentially, in this case, there was very little actual change from simply tweaking weight limits. Here’s an equity curve:

To conclude, while not exactly achieving the same aggregate returns or Sharpe ratio that the SeekingAlpha article did, it did highlight a probable cause of its major drawdown, and also demonstrated the levers of how to apply the constrained critical line algorithm, the mechanics of which are detailed in the papers linked to earlier.

Thanks for reading.