A Return.Portfolio Wrapper to Automate Harry Long Seeking Alpha Backtests

This post will cover a function to simplify creating Harry Long type rebalancing strategies from SeekingAlpha for interested readers. As Harry Long has stated, most, if not all of his strategies are more for demonstrative purposes rather than actual recommended investments.

So, since Harry Long has been posting some more articles on Seeknig Alpha, I’ve had a reader or two ask me to analyze his strategies (again). Instead of doing that, however, I’ll simply put this tool here, which is a wrapper that automates the acquisition of data and simulates portfolio rebalancing with one line of code.

Here’s the tool.

require(quantmod)
require(PerformanceAnalytics)
require(downloader)

LongSeeker <- function(symbols, weights, rebalance_on = "years", 
                       displayStats = TRUE, outputReturns = FALSE) {
  getSymbols(symbols, src='yahoo', from = '1990-01-01')
  prices <- list()
  for(i in 1:length(symbols)) {
    if(symbols[i] == "ZIV") {
      download("https://www.dropbox.com/s/jk3ortdyru4sg4n/ZIVlong.TXT", destfile="ziv.txt")
      ziv <- xts(read.zoo("ziv.txt", header=TRUE, sep=",", format="%Y-%m-%d"))
      prices[[i]] <- Cl(ziv)
    } else if (symbols[i] == "VXX") {
      download("https://dl.dropboxusercontent.com/s/950x55x7jtm9x2q/VXXlong.TXT", 
               destfile="vxx.txt")
      vxx <- xts(read.zoo("vxx.txt", header=TRUE, sep=",", format="%Y-%m-%d"))
      prices[[i]] <- Cl(vxx)
    }
    else {
      prices[[i]] <- Ad(get(symbols[i]))
    }
  }
  prices <- do.call(cbind, prices)
  prices <- na.locf(prices)
  returns <- na.omit(Return.calculate(prices))
  
  returns$zeroes <- 0
  weights <- c(weights, 1-sum(weights))
  stratReturns <- Return.portfolio(R = returns, weights = weights, rebalance_on = rebalance_on)
  
  if(displayStats) {
    stats <- rbind(table.AnnualizedReturns(stratReturns), maxDrawdown(stratReturns), CalmarRatio(stratReturns))
    rownames(stats)[4] <- "Max Drawdown"
    print(stats)
    charts.PerformanceSummary(stratReturns)
  }
  
  if(outputReturns) {
    return(stratReturns)
  }
} 

It fetches the data for you (usually from Yahoo, but a big thank you to Mr. Helumth Vollmeier in the case of ZIV and VXX), and has the option of either simply displaying an equity curve and some statistics (CAGR, annualized standard dev, Sharpe, max Drawdown, Calmar), or giving you the return stream as an output if you wish to do more analysis in R.

Here’s an example of simply getting the statistics, with an 80% XLP/SPLV (they’re more or less interchangeable) and 20% TMF (aka 60% TLT, so an 80/60 portfolio), from one of Harry Long’s articles.

LongSeeker(c("XLP", "TLT"), c(.8, .6))

Statistics:


                          portfolio.returns
Annualized Return                 0.1321000
Annualized Std Dev                0.1122000
Annualized Sharpe (Rf=0%)         1.1782000
Max Drawdown                      0.2330366
Calmar Ratio                      0.5670285

Equity curve:

Nothing out of the ordinary of what we might expect from a balanced equity/bonds portfolio. Generally does well, has its largest drawdown in the financial crisis, and some other bumps in the road, but overall, I’d think a fairly vanilla “set it and forget it” sort of thing.

And here would be the way to get the stream of individual daily returns, assuming you wanted to rebalance these two instruments weekly, instead of yearly (as is the default).

tmp <- LongSeeker(c("XLP", "TLT"), c(.8, .6), rebalance_on="weeks",
                    displayStats = FALSE, outputReturns = TRUE)

And now let’s get some statistics.

table.AnnualizedReturns(tmp)
maxDrawdown(tmp)
CalmarRatio(tmp)

Which give:

> table.AnnualizedReturns(tmp)
                          portfolio.returns
Annualized Return                    0.1328
Annualized Std Dev                   0.1137
Annualized Sharpe (Rf=0%)            1.1681
> maxDrawdown(tmp)
[1] 0.2216417
> CalmarRatio(tmp)
             portfolio.returns
Calmar Ratio         0.5990087

Turns out, moving the rebalancing from annually to weekly didn’t have much of an effect here (besides give a bunch of money to your broker, if you factored in transaction costs, which this doesn’t).

So, that’s how this tool works. The results, of course, begin from the latest instrument’s inception. The trick, in my opinion, is to try and find proxy substitutes with longer histories for newer ETFs that are simply leveraged ETFs, such as using a 60% weight in TLT with an 80% weight in XLP instead of a 20% weight in TMF with 80% allocation in SPLV.

For instance, here are some proxies:

SPXL = XLP
SPXL/UPRO = SPY * 3
TMF = TLT * 3

That said, I’ve worked with Harry Long before, and he develops more sophisticated strategies behind the scenes, so I’d recommend that SeekingAlpha readers take his publicly released strategies as concept demonstrations, as opposed to fully-fledged investment ideas, and contact Mr. Long himself about more customized, private solutions for investment institutions if you are so interested.

Thanks for reading.

NOTE: I am currently in the northeast. While I am currently contracting, I am interested in networking with individuals or firms with regards to potential collaboration opportunities.

Create Amazing Looking Backtests With This One Wrong–I Mean Weird–Trick! (And Some Troubling Logical Invest Results)

This post will outline an easy-to-make mistake in writing vectorized backtests–namely in using a signal obtained at the end of a period to enter (or exit) a position in that same period. The difference in results one obtains is massive.

Today, I saw two separate posts from Alpha Architect and Mike Harris both referencing a paper by Valeriy Zakamulin on the fact that some previous trend-following research by Glabadanidis was done with shoddy results, and that Glabadanidis’s results were only reproducible through instituting lookahead bias.

The following code shows how to reproduce this lookahead bias.

First, the setup of a basic moving average strategy on the S&P 500 index from as far back as Yahoo data will provide.

require(quantmod)
require(xts)
require(TTR)
require(PerformanceAnalytics)

getSymbols('^GSPC', src='yahoo', from = '1900-01-01')
monthlyGSPC <- Ad(GSPC)[endpoints(GSPC, on = 'months')]

# change this line for signal lookback
movAvg <- SMA(monthlyGSPC, 10)

signal <- monthlyGSPC > movAvg
gspcRets <- Return.calculate(monthlyGSPC)

And here is how to institute the lookahead bias.

lookahead <- signal * gspcRets
correct <- lag(signal) * gspcRets

These are the “results”:

compare <- na.omit(cbind(gspcRets, lookahead, correct))
colnames(compare) <- c("S&P 500", "Lookahead", "Correct")
charts.PerformanceSummary(compare)
rbind(table.AnnualizedReturns(compare), maxDrawdown(compare), CalmarRatio(compare))
logRets <- log(cumprod(1+compare))
chart.TimeSeries(logRets, legend.loc='topleft')

Of course, this equity curve is of no use, so here’s one in log scale.

As can be seen, lookahead bias makes a massive difference.

Here are the numerical results:

                            S&P 500  Lookahead   Correct
Annualized Return         0.0740000 0.15550000 0.0695000
Annualized Std Dev        0.1441000 0.09800000 0.1050000
Annualized Sharpe (Rf=0%) 0.5133000 1.58670000 0.6623000
Worst Drawdown            0.5255586 0.08729914 0.2699789
Calmar Ratio              0.1407286 1.78119192 0.2575219

Again, absolutely ridiculous.

Note that when using Return.Portfolio (the function in PerformanceAnalytics), that package will automatically give you the next period’s return, instead of the current one, for your weights. However, for those writing “simple” backtests that can be quickly done using vectorized operations, an off-by-one error can make all the difference between a backtest in the realm of reasonable, and pure nonsense. However, should one wish to test for said nonsense when faced with impossible-to-replicate results, the mechanics demonstrated above are the way to do it.

Now, onto other news: I’d like to thank Gerald M for staying on top of one of the Logical Invest strategies–namely, their simple global market rotation strategy outlined in an article from an earlier blog post.

Up until March 2015 (the date of the blog post), the strategy had performed well. However, after said date?

It has been a complete disaster, which, in hindsight, was evident when I passed it through the hypothesis-driven development framework process I wrote about earlier.

So, while there has been a great deal written about not simply throwing away a strategy because of short-term underperformance, and that anomalies such as momentum and value exist because of career risk due to said short-term underperformance, it’s never a good thing when a strategy creates historically large losses, particularly after being published in such a humble corner of the quantitative financial world.

In any case, this was a post demonstrating some mechanics, and an update on a strategy I blogged about not too long ago.

Thanks for reading.

NOTE: I am always interested in hearing about new opportunities which may benefit from my expertise, and am always happy to network. You can find my LinkedIn profile here.

How well can you scale your strategy?

This post will deal with a quick, finger in the air way of seeing how well a strategy scales–namely, how sensitive it is to latency between signal and execution, using a simple volatility trading strategy as an example. The signal will be the VIX/VXV ratio trading VXX and XIV, an idea I got from Volatility Made Simple’s amazing blog, particularly this post. The three signals compared will be the “magical thinking” signal (observe the close, buy the close, named from the ruleOrderProc setting in quantstrat), buy on next-day open, and buy on next-day close.

Let’s get started.

require(downloader)
require(PerformanceAnalytics)
require(IKTrading)
require(TTR)

download("http://www.cboe.com/publish/scheduledtask/mktdata/datahouse/vxvdailyprices.csv", 
         destfile="vxvData.csv")
download("https://dl.dropboxusercontent.com/s/jk6der1s5lxtcfy/XIVlong.TXT",
         destfile="longXIV.txt")
download("https://dl.dropboxusercontent.com/s/950x55x7jtm9x2q/VXXlong.TXT", 
         destfile="longVXX.txt") #requires downloader package
getSymbols('^VIX', from = '1990-01-01')


xiv <- xts(read.zoo("longXIV.txt", format="%Y-%m-%d", sep=",", header=TRUE))
vxx <- xts(read.zoo("longVXX.txt", format="%Y-%m-%d", sep=",", header=TRUE))
vxv <- xts(read.zoo("vxvData.csv", header=TRUE, sep=",", format="%m/%d/%Y", skip=2))
vixVxv <- Cl(VIX)/Cl(vxv)


xiv <- xts(read.zoo("longXIV.txt", format="%Y-%m-%d", sep=",", header=TRUE))
vxx <- xts(read.zoo("longVXX.txt", format="%Y-%m-%d", sep=",", header=TRUE))

vxxCloseRets <- Return.calculate(Cl(vxx))
vxxOpenRets <- Return.calculate(Op(vxx))
xivCloseRets <- Return.calculate(Cl(xiv))
xivOpenRets <- Return.calculate(Op(xiv))

vxxSig <- vixVxv > 1
xivSig <- 1-vxxSig

magicThinking <- vxxCloseRets * lag(vxxSig) + xivCloseRets * lag(xivSig)
nextOpen <- vxxOpenRets * lag(vxxSig, 2) + xivOpenRets * lag(xivSig, 2)
nextClose <- vxxCloseRets * lag(vxxSig, 2) + xivCloseRets * lag(xivSig, 2)
tradeWholeDay <- (nextOpen + nextClose)/2

compare <- na.omit(cbind(magicThinking, nextOpen, nextClose, tradeWholeDay))
colnames(compare) <- c("Magic Thinking", "Next Open", 
                       "Next Close", "Execute Through Next Day")
charts.PerformanceSummary(compare)
rbind(table.AnnualizedReturns(compare), 
      maxDrawdown(compare), CalmarRatio(compare))

par(mfrow=c(1,1))
chart.TimeSeries(log(cumprod(1+compare), base = 10), legend.loc='topleft', ylab='log base 10 of additional equity',
                 main = 'VIX vx. VXV different execution times')

So here’s the run-through. In addition to the magical thinking strategy (observe the close, buy that same close), I tested three other variants–a variant which transacts the next open, a variant which transacts the next close, and the average of those two. Effectively, I feel these three could give a sense of a strategy’s performance under more realistic conditions–that is, how well does the strategy perform if transacted throughout the day, assuming you’re managing a sum of money too large to just plow into the market in the closing minutes (and if you hope to get rich off of trading, you will have a larger sum of money than the amount you can apply magical thinking to). Ideally, I’d use VWAP pricing, but as that’s not available for free anywhere I know of, that means that readers can’t replicate it even if I had such data.

In any case, here are the results.

Equity curves:

Log scale (for Mr. Tony Cooper and others):

Stats:

                          Magic Thinking Next Open Next Close Execute Through Next Day
Annualized Return               0.814100 0.8922000  0.5932000                 0.821900
Annualized Std Dev              0.622800 0.6533000  0.6226000                 0.558100
Annualized Sharpe (Rf=0%)       1.307100 1.3656000  0.9529000                 1.472600
Worst Drawdown                  0.566122 0.5635336  0.6442294                 0.601014
Calmar Ratio                    1.437989 1.5831686  0.9208586                 1.367510

My reaction? The execute on next day’s close performance being vastly lower than the other configurations (and that deterioration occurring in the most recent years) essentially means that the fills will have to come pretty quickly at the beginning of the day. While the strategy seems somewhat scalable through the lens of this finger-in-the-air technique, in my opinion, if the first full day of possible execution after signal reception will tank a strategy from a 1.44 Calmar to a .92, that’s a massive drop-off, after holding everything else constant. In my opinion, I think this is quite a valid question to ask anyone who simply sells signals, as opposed to manages assets. Namely, how sensitive are the signals to execution on the next day? After all, unless those signals come at 3:55 PM, one is most likely going to be getting filled the next day.

Now, while this strategy is a bit of a tomato can in terms of how good volatility trading strategies can get (they can get a *lot* better in my opinion), I think it made for a simple little demonstration of this technique. Again, a huge thank you to Mr. Helmuth Vollmeier for so kindly keeping up his dropbox all this time for the volatility data!

Thanks for reading.

NOTE: I am currently contracting in a data science capacity in Chicago. You can email me at ilya.kipnis@gmail.com, or find me on my LinkedIn here. I’m always open to beers after work if you’re in the Chicago area.

NOTE 2: Today, on October 21, 2015, if you’re in Chicago, there’s a Chicago R Users Group conference at Jaks Tap at 6:00 PM. Free pizza, networking, and R, hosted by Paul Teetor, who’s a finance guy. Hope to see you there.

Hypothesis-Driven Development Part V: Stop-Loss, Deflating Sharpes, and Out-of-Sample

This post will demonstrate a stop-loss rule inspired by Andrew Lo’s paper “when do stop-loss rules stop losses”? Furthermore, it will demonstrate how to deflate a Sharpe ratio to account for the total number of trials conducted, which is presented in a paper written by David H. Bailey and Marcos Lopez De Prado. Lastly, the strategy will be tested on the out-of-sample ETFs, rather than the mutual funds that have been used up until now (which actually cannot be traded more than once every two months, but have been used simply for the purpose of demonstration).

First, however, I’d like to fix some code from the last post and append some results.

A reader asked about displaying the max drawdown for each of the previous rule-testing variants based off of volatility control, and Brian Peterson also recommended displaying max leverage, which this post will provide.

Here’s the updated rule backtest code:

ruleBacktest <- function(returns, nMonths, dailyReturns,
nSD=126, volTarget = .1) {
nMonthAverage <- apply(returns, 2, runSum, n = nMonths)
nMonthAverage <- na.omit(xts(nMonthAverage, order.by = index(returns)))
nMonthAvgRank <- t(apply(nMonthAverage, 1, rank))
nMonthAvgRank <- xts(nMonthAvgRank, order.by=index(nMonthAverage))
selection <- (nMonthAvgRank==5) * 1 #select highest average performance
dailyBacktest <- Return.portfolio(R = dailyReturns, weights = selection)
constantVol <- volTarget/(runSD(dailyBacktest, n = nSD) * sqrt(252))
monthlyLeverage <- na.omit(constantVol[endpoints(constantVol), on ="months"])
wts <- cbind(monthlyLeverage, 1-monthlyLeverage)
constantVolComponents <- cbind(dailyBacktest, 0)
out <- Return.portfolio(R = constantVolComponents, weights = wts)
out <- apply.monthly(out, Return.cumulative)
maxLeverage <- max(monthlyLeverage, na.rm = TRUE)
return(list(out, maxLeverage))
}

t1 <- Sys.time()
allPermutations <- list()
allDDs <- list()
leverages <- list()
for(i in seq(21, 252, by = 21)) {
monthVariants <- list()
ddVariants <- list()
leverageVariants <- list()
for(j in 1:12) {
trial <- ruleBacktest(returns = monthRets, nMonths = j, dailyReturns = sample, nSD = i)
sharpe <- table.AnnualizedReturns(trial[[1]])[3,]
dd <- maxDrawdown(trial[[1]])
monthVariants[[j]] <- sharpe
ddVariants[[j]] <- dd
leverageVariants[[j]] <- trial[[2]]
}
allPermutations[[i]] <- do.call(c, monthVariants)
allDDs[[i]] <- do.call(c, ddVariants)
leverages[[i]] <- do.call(c, leverageVariants)
}
allPermutations <- do.call(rbind, allPermutations)
allDDs <- do.call(rbind, allDDs)
leverages <- do.call(rbind, leverages)
t2 <- Sys.time()
print(t2-t1)

Drawdowns:

Leverage:

Here are the results presented as a hypothesis test–a linear regression of drawdowns and leverage against momentum formation period and volatility calculation period:

ddLM <- lm(meltedDDs$MaxDD~meltedDDs$volFormation + meltedDDs$momentumFormation)
summary(ddLM)

Call:
lm(formula = meltedDDs$MaxDD ~ meltedDDs$volFormation + meltedDDs$momentumFormation)

Residuals:
Min 1Q Median 3Q Max
-0.08022 -0.03434 -0.00135 0.02911 0.20077

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.240146 0.010922 21.99 < 2e-16 ***
meltedDDs$volFormation -0.000484 0.000053 -9.13 6.5e-16 ***
meltedDDs$momentumFormation 0.001533 0.001112 1.38 0.17
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.0461 on 141 degrees of freedom
Multiple R-squared: 0.377, Adjusted R-squared: 0.368
F-statistic: 42.6 on 2 and 141 DF, p-value: 3.32e-15

levLM <- lm(meltedLeverage$MaxLeverage~meltedLeverage$volFormation + meltedDDs$momentumFormation)
summary(levLM)

Call:
lm(formula = meltedLeverage$MaxLeverage ~ meltedLeverage$volFormation +
meltedDDs$momentumFormation)

Residuals:
Min 1Q Median 3Q Max
-0.9592 -0.5179 -0.0908 0.3679 3.1022

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 4.076870 0.164243 24.82 <2e-16 ***
meltedLeverage$volFormation -0.009916 0.000797 -12.45 <2e-16 ***
meltedDDs$momentumFormation 0.009869 0.016727 0.59 0.56
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.693 on 141 degrees of freedom
Multiple R-squared: 0.524, Adjusted R-squared: 0.517
F-statistic: 77.7 on 2 and 141 DF, p-value: <2e-16

Easy interpretation here–the shorter-term volatility estimates are unstable due to the one-asset rotation nature of the system. Particularly silly is using the one-month volatility estimate. Imagine the system just switched from the lowest-volatility instrument to the highest. It would then take excessive leverage and get blown up that month for no particularly good reason. A longer-term volatility estimate seems to do much better for this system. So, while the Sharpe is generally improved, the results become far more palatable when using a more stable calculation for volatility, which sets maximum leverage to about 2 when targeting an annualized volatility of 10%. Also, to note, the period to compute volatility matters far more than the momentum formation period when addressing volatility targeting, which lends credence (at least in this case) to so many people that say “the individual signal rules matter far less than the position-sizing rules!”. According to some, position sizing is often a way for people to mask only marginally effective (read: bad) strategies with a separate layer to create a better result. I’m not sure which side of the debate (even assuming there is one) I fall upon, but for what it’s worth, there it is.

Moving on, I want to test out one more rule, which is inspired by Andrew Lo’s stop-loss rule. Essentially, the way it works is this (to my interpretation): it evaluates a running standard deviation, and if the drawdown exceeds some threshold of the running standard deviation, to sit out for some fixed period of time, and then re-enter. According to Andrew Lo, stop-losses help momentum strategies, so it seems as good a rule to test as any.

However, rather than test different permutations of the stop rule on all 144 prior combinations of volatility-adjusted configurations, I’m going to take an ensemble strategy, inspired by a conversation I had with Adam Butler, the CEO of ReSolve Asset Management, who stated that “we know momentum exists, but we don’t know the perfect way to measure it”, from the section I just finished up and use an equal weight of all 12 of the momentum formation periods with a 252-day rolling annualized volatility calculation, and equal weight them every month.

Here are the base case results from that trial (bringing our total to 169).

strat <- list()
for(i in 1:12) {
strat[[i]] <- ruleBacktest(returns = monthRets, nMonths = i, dailyReturns = sample, nSD = 252)[[1]]
}
strat <- do.call(cbind, strat)
strat <- Return.portfolio(R = na.omit(strat), rebalance_on="months")

rbind(table.AnnualizedReturns(strat), maxDrawdown(strat), CalmarRatio(strat))

With the following result:

portfolio.returns
Annualized Return 0.12230
Annualized Std Dev 0.10420
Annualized Sharpe (Rf=0%) 1.17340
Worst Drawdown 0.09616
Calmar Ratio 1.27167

Of course, also worth nothing is that the annualized standard deviation is indeed very close to 10%, even with the ensemble. And it’s nice that there is a Sharpe past 1. Of course, given that these are mutual funds being backtested, these results are optimistic due to the unrealistic execution assumptions (can’t trade sooner than once every *two* months).

Anyway, let’s introduce our stop-loss rule, inspired by Andrew Lo’s paper.

loStopLoss <- function(returns, sdPeriod = 12, sdScaling = 1, sdThresh = 1.5, cooldown = 3) {
stratRets <- list()
count <- 1
stratComplete <- FALSE
originalRets <- returns
ddThresh <- -runSD(returns, n = sdPeriod) * sdThresh * sdScaling
while(!stratComplete) {
retDD <- PerformanceAnalytics:::Drawdowns(returns)
DDbreakthrough <- retDD < ddThresh & lag(retDD) > ddThresh
firstBreak <- which.max(DDbreakthrough) #first threshold breakthrough, if 1, we have no breakthrough
#the above line is unintuitive since this is a boolean vector, so it returns the first value of TRUE
if(firstBreak > 1) { #we have a drawdown breakthrough if this is true
stratRets[[count]] <- returns[1:firstBreak,] #subset returns through our threshold breakthrough
nextPoint <- firstBreak + cooldown + 1 #next point of re-entry is the point after the cooldown period
if(nextPoint <= (nrow(returns)-1)) { #if we can re-enter, subset the returns and return to top of loop
returns <- returns[nextPoint:nrow(returns),]
ddThresh <- ddThresh[nextPoint:nrow(ddThresh),]
count <- count+1
} else { #re-entry point is after data exhausted, end strategy
stratComplete <- TRUE
}
} else { #there are no more critical drawdown breakthroughs, end strategy
stratRets[[count]] <- returns
stratComplete <- TRUE
}
}
stratRets <- do.call(rbind, stratRets) #combine returns
expandRets <- cbind(stratRets, originalRets) #account for all the days we missed
expandRets[is.na(expandRets[,1]), 1] <- 0 #cash positions will be zero
rets <- expandRets[,1]
colnames(rets) <- paste(cooldown, sdThresh, sep="_")
return(rets)
}

Essentially, the way it works is like this: the function computes all the drawdowns for a return series, along with its running standard deviation (non-annualized–if you want to annualize it, change the sdScaling parameter to something like sqrt(12) for monthly or sqrt(252) for daily data). Next, it looks for when the drawdown crossed a critical threshold, then cuts off that portion of returns and standard deviation history, and moves ahead in history by the cooldown period specified, and repeats. Most of the code is simply dealing with corner cases (is there even a time to use the stop rule? What about iterating when there isn’t enough data left?), and then putting the results back together again.

In any case, for the sake of simplicity, this function doesn’t use two different time scales (IE compute volatility using daily data, make decisions monthly), so I’m sticking with using a 12-month rolling volatility, as opposed to 252 day rolling volatility multiplied by the square root of 21.

Finally, here are another 54 runs to see if Andrew Lo’s stop-loss rule works here. Essentially, the intuition behind this is that if the strategy breaks down, it’ll continue to break down, so it would be prudent to just turn it off for a little while.

Here are the trial runs:


threshVec <- seq(0, 2, by=.25)
cooldownVec <- c(1:6)
sharpes <- list()
params <- expand.grid(threshVec, cooldownVec)
for(i in 1:nrow(params)) {
configuration <- loStopLoss(returns = strat, sdThresh = params[i,1],
cooldown = params[i, 2])
sharpes[[i]] <- table.AnnualizedReturns(configuration)[3,]
}
sharpes <- do.call(c, sharpes)

loStoplossFrame <- cbind(params, sharpes)
loStoplossFrame$improvement <- loStoplossFrame[,3] - table.AnnualizedReturns(strat)[3,]

colnames(loStoplossFrame) <- c("Threshold", "Cooldown", "Sharpe", "Improvement")

And a plot of the results.

ggplot(loStoplossFrame, aes(x = Threshold, y = Cooldown, fill=Improvement)) +
geom_tile()+scale_fill_gradient2(high="green", mid="yellow", low="red", midpoint = 0)

Result:

Result: at this level, and at this frequency (retaining the monthly decision-making process), the stop-loss rule basically does nothing in order to improve the risk-reward trade-off in the best case scenarios, and in most scenarios, simply hurts. 54 trials down the drain, bringing us up to 223 trials. So, what does the final result look like?

charts.PerformanceSummary(strat)

Here’s the final in-sample equity curve–and the first one featured in this entire series. This is, of course, a *feature* of hypothesis-driven development. Playing whack-a-mole with equity curve bumps is what is a textbook case of overfitting. So, without further ado:

And now we can see why stop-loss rules generally didn’t add any value to this strategy. Simply, it had very few periods of sustained losses at the monthly frequency, and thus, very little opportunity for a stop-loss rule to add value. Sure, the occasional negative month crept in, but there was no period of sustained losses. Furthermore, Yahoo Finance may not have perfect fidelity on dividends on mutual funds from the late 90s to early 2000s, so the initial flat performance may also be a rather conservative estimate on the strategy’s performance (then again, as I stated before, using mutual funds themselves is optimistic given the unrealistic execution assumptions, so maybe it cancels out). Now, if this equity curve were to be presented without any context, one may easily question whether or not it was curve-fit. To an extent, one can argue that the volatility computation period may be optimized, though I’d hardly call a 252-day (one-year) rolling volatility estimate a curve-fit.

Next, I’d like to introduce another concept on this blog that I’ve seen colloquially addressed in other parts of the quantitative blogging space, particularly by Mike Harris of Price Action Lab, namely that of multiple hypothesis testing, and about the need to correct for that.

Luckily for that, Drs. David H. Bailey and Marcos Lopez De Prado wrote a paper to address just that. Also, I’d like to note one very cool thing about this paper: it actually has a worked-out numerical example! In my opinion, there are very few things as helpful as showing a simple result that transforms a collection of mathematical symbols into a result to demonstrate what those symbols actually mean in the span of one page. Oh, and it also includes *code* in the appendix (albeit Python — even though, you know, R is far more developed. If someone can get Marcos Lopez De Prado to switch to R–aka the better research language, that’d be a godsend!).

In any case, here’s the formula for the deflated Sharpe ratio, implemented straight from the paper.

deflatedSharpe <- function(sharpe, nTrials, varTrials, skew, kurt, numPeriods, periodsInYear) {
emc <- .5772
sr0_term1 <- (1 - emc) * qnorm(1 - 1/nTrials)
sr0_term2 <- emc * qnorm(1 - 1/nTrials * exp(-1))
sr0 <- sqrt(varTrials * 1/periodsInYear) * (sr0_term1 + sr0_term2)

numerator <- (sharpe/sqrt(periodsInYear) - sr0)*sqrt(numPeriods - 1)

skewnessTerm <- 1 - skew * sharpe/sqrt(periodsInYear)
kurtosisTerm <- (kurt-1)/4*(sharpe/sqrt(periodsInYear))^2

denominator <- sqrt(skewnessTerm + kurtosisTerm)

result <- pnorm(numerator/denominator)
pval <- 1-result
return(pval)
}

The inputs are the strategy’s Sharpe ratio, the number of backtest runs, the variance of the sharpe ratios of those backtest runs, the skewness of the candidate strategy, its non-excess kurtosis, the number of periods in the backtest, and the number of periods in a year. Unlike the De Prado paper, I choose to return the p-value (EG 1-.

Let’s collect all our Sharpe ratios now.

allSharpes <- c(as.numeric(table.AnnualizedReturns(sigBoxplots)[3,]),
meltedSharpes$Sharpe,
as.numeric(table.AnnualizedReturns(strat)[3,]),
loStoplossFrame$Sharpe)

And now, let’s plug and chug!

stratSignificant <- deflatedSharpe(sharpe = as.numeric(table.AnnualizedReturns(strat)[3,]),
nTrials = length(allSharpes), varTrials = var(allSharpes),
skew = as.numeric(skewness(strat)), kurt = as.numeric(kurtosis(strat)) + 3,
numPeriods = nrow(strat), periodsInYear = 12)

And the result!

> stratSignificant
[1] 0.01311

Success! At least at the 5% level…and a rejection at the 1% level, and any level beyond that.

So, one last thing! Out-of-sample testing on ETFs (and mutual funds during the ETF burn-in period)!

symbols2 <- c("CWB", "JNK", "TLT", "SHY", "PCY")
getSymbols(symbols2, from='1900-01-01')
prices2 <- list()
for(tmp in symbols2) {
prices2[[tmp]] <- Ad(get(tmp))
}
prices2 <- do.call(cbind, prices2)
colnames(prices2) <- substr(colnames(prices2), 1, 3)
returns2 <- na.omit(Return.calculate(prices2))

monthRets2 <- apply.monthly(returns2, Return.cumulative)

oosStrat <- list()
for(i in 1:12) {
oosStrat[[i]] <- ruleBacktest(returns = monthRets2, nMonths = i, dailyReturns = returns2, nSD = 252)[[1]]
}
oosStrat <- do.call(cbind, oosStrat)
oosStrat <- Return.portfolio(R = na.omit(oosStrat), rebalance_on="months")

symbols <- c("CNSAX", "FAHDX", "VUSTX", "VFISX", "PREMX")
getSymbols(symbols, from='1900-01-01')
prices <- list()
for(symbol in symbols) {
prices[[symbol]] <- Ad(get(symbol))
}
prices <- do.call(cbind, prices)
colnames(prices) <- substr(colnames(prices), 1, 5)
oosMFreturns <- na.omit(Return.calculate(prices))
oosMFmonths <- apply.monthly(oosMFreturns, Return.cumulative)

oosMF <- list()
for(i in 1:12) {
oosMF[[i]] <- ruleBacktest(returns = oosMFmonths, nMonths = i, dailyReturns = oosMFreturns, nSD=252)[[1]]
}
oosMF <- do.call(cbind, oosMF)
oosMF <- Return.portfolio(R = na.omit(oosMF), rebalance_on="months")
oosMF <- oosMF["2009-04/2011-03"]

fullOOS <- rbind(oosMF, oosStrat)

rbind(table.AnnualizedReturns(fullOOS), maxDrawdown(fullOOS), CalmarRatio(fullOOS))
charts.PerformanceSummary(fullOOS)

And the results:

portfolio.returns
Annualized Return 0.1273
Annualized Std Dev 0.0901
Annualized Sharpe (Rf=0%) 1.4119
Worst Drawdown 0.1061
Calmar Ratio 1.1996

And one more equity curve (only the second!).

In other words, the out-of-sample statistics compare to the in-sample statistics. The Sharpe ratio is higher, the Calmar slightly lower. But on a whole, the performance has kept up. Unfortunately, the strategy is currently in a drawdown, but that’s the breaks.

So, whew. That concludes my first go at hypothesis-driven development, and has hopefully at least demonstrated the process to a satisfactory degree. What started off as a toy strategy instead turned from a rejection to a not rejection to demonstrating ideas from three separate papers, and having out-of-sample statistics that largely matched if not outperformed the in-sample statistics. For those thinking about investing in this strategy (again, here is the strategy: take 12 different portfolios, each selecting the asset with the highest momentum over months 1-12, target an annualized volatility of 10%, with volatility defined as the rolling annualized 252-day standard deviation, and equal-weight them every month), what I didn’t cover was turnover and taxes (this is a bond ETF strategy, so dividends will play a large role).

Now, one other request–many of the ideas for this blog come from my readers. I am especially interested in things to think about from readers with line-management responsibilities, as I think many of the questions from those individuals are likely the most universally interesting ones. If you’re one such individual, I’d appreciate an introduction, and knowing who more of the individuals in my reader base are.

Thanks for reading.

NOTE: while I am currently consulting, I am always open to networking, meeting up, consulting arrangements, and job discussions. Contact me through my email at ilya.kipnis@gmail.com, or through my LinkedIn, found here.

Hypothesis Driven Development Part IV: Testing The Barroso/Santa Clara Rule

This post will deal with applying the constant-volatility procedure written about by Barroso and Santa Clara in their paper “Momentum Has Its Moments”.

The last two posts dealt with evaluating the intelligence of the signal-generation process. While the strategy showed itself to be marginally better than randomly tossing darts against a dartboard and I was ready to reject it for want of moving onto better topics that are slightly less of a toy in terms of examples than a little rotation strategy, Brian Peterson told me to see this strategy through to the end, including testing out rule processes.

First off, to make a distinction, rules are not signals. Rules are essentially a way to quantify what exactly to do assuming one acts upon a signal. Things such as position sizing, stop-loss processes, and so on, all fall under rule processes.

This rule deals with using leverage in order to target a constant volatility.

So here’s the idea: in their paper, Pedro Barroso and Pedro Santa Clara took the Fama-French momentum data, and found that the classic WML strategy certainly outperforms the market, but it has a critical downside, namely that of momentum crashes, in which being on the wrong side of a momentum trade will needlessly expose a portfolio to catastrophically large drawdowns. While this strategy is a long-only strategy (and with fixed-income ETFs, no less), and so would seem to be more robust against such massive drawdowns, there’s no reason to leave money on the table. To note, not only have Barroso and Santa Clara covered this phenomena, but so have others, such as Tony Cooper in his paper “Alpha Generation and Risk Smoothing Using Volatility of Volatility”.

In any case, the setup here is simple: take the previous portfolios, consisting of 1-12 month momentum formation periods, and every month, compute the annualized standard deviation, using a 21-252 (by 21) formation period, for a total of 12 x 12 = 144 trials. (So this will put the total trials run so far at 24 + 144 = 168…bonus points if you know where this tidbit is going to go).

Here’s the code (again, following on from the last post, which follows from the second post, which follows from the first post in this series).

require(reshape2)
require(ggplot2)

ruleBacktest <- function(returns, nMonths, dailyReturns,
                         nSD=126, volTarget = .1) {
  nMonthAverage <- apply(returns, 2, runSum, n = nMonths)
  nMonthAverage <- xts(nMonthAverage, order.by = index(returns))
  nMonthAvgRank <- t(apply(nMonthAverage, 1, rank))
  nMonthAvgRank <- xts(nMonthAvgRank, order.by=index(returns))
  selection <- (nMonthAvgRank==5) * 1 #select highest average performance
  dailyBacktest <- Return.portfolio(R = dailyReturns, weights = selection)
  constantVol <- volTarget/(runSD(dailyBacktest, n = nSD) * sqrt(252))
  monthlyLeverage <- na.omit(constantVol[endpoints(constantVol), on ="months"])
  wts <- cbind(monthlyLeverage, 1-monthlyLeverage)
  constantVolComponents <- cbind(dailyBacktest, 0)
  out <- Return.portfolio(R = constantVolComponents, weights = wts)
  out <- apply.monthly(out, Return.cumulative)
  return(out)
}

t1 <- Sys.time()
allPermutations <- list()
for(i in seq(21, 252, by = 21)) {
  monthVariants <- list()
  for(j in 1:12) {
    trial <- ruleBacktest(returns = monthRets, nMonths = j, dailyReturns = sample, nSD = i)
    sharpe <- table.AnnualizedReturns(trial)[3,]
    monthVariants[[j]] <- sharpe
  }
  allPermutations[[i]] <- do.call(c, monthVariants)
}
allPermutations <- do.call(rbind, allPermutations)
t2 <- Sys.time()
print(t2-t1)

rownames(allPermutations) <- seq(21, 252, by = 21)
colnames(allPermutations) <- 1:12

baselineSharpes <- table.AnnualizedReturns(algoPortfolios)[3,]
baselineSharpeMat <- matrix(rep(baselineSharpes, 12), ncol=12, byrow=TRUE)

diffs <- allPermutations - as.numeric(baselineSharpeMat)
require(reshape2)
require(ggplot2)
meltedDiffs <-melt(diffs)

colnames(meltedDiffs) <- c("volFormation", "momentumFormation", "sharpeDifference")
ggplot(meltedDiffs, aes(x = momentumFormation, y = volFormation, fill=sharpeDifference)) + 
  geom_tile()+scale_fill_gradient2(high="green", mid="yellow", low="red")

meltedSharpes <- melt(allPermutations)
colnames(meltedSharpes) <- c("volFormation", "momentumFormation", "Sharpe")
ggplot(meltedSharpes, aes(x = momentumFormation, y = volFormation, fill=Sharpe)) + 
  geom_tile()+scale_fill_gradient2(high="green", mid="yellow", low="red", midpoint = mean(allPermutations))

Again, there’s no parallel code since this is a relatively small example, and I don’t know which OS any given instance of R runs on (windows/linux have different parallelization infrastructure).

So the idea here is to simply compare the Sharpe ratios with different volatility lookback periods against the baseline signal-process-only portfolios. The reason I use Sharpe ratios, and not say, CAGR, volatility, or drawdown is that Sharpe ratios are scale-invariant. In this case, I’m targeting an annualized volatility of 10%, but with a higher targeted volatility, one can obtain higher returns at the cost of higher drawdowns, or be more conservative. But the Sharpe ratio should stay relatively consistent within reasonable bounds.

So here are the results:

Sharpe improvements:

In this case, the diagram shows that on a whole, once the volatility estimation period becomes long enough, the results are generally positive. Namely, that if one uses a very short estimation period, that volatility estimate is more dependent on the last month’s choice of instrument, as opposed to the longer-term volatility of the system itself, which can create poor forecasts. Also to note is that the one-month momentum formation period doesn’t seem overly amenable to the constant volatility targeting scheme (there’s basically little improvement if not a slight drag on risk-adjusted performance). This is interesting in that the baseline Sharpe ratio for the one-period formation is among the best of the baseline performances. However, on a whole, the volatility targeting actually does improve risk-adjusted performance of the system, even one as simple as throwing all your money into one asset every month based on a single momentum signal.

Absolute Sharpe ratios:

In this case, the absolute Sharpe ratios look fairly solid for such a simple system. The 3, 7, and 9 month variants are slightly lower, but once the volatility estimation period reaches between 126 and 252 days, the results are fairly robust. The Barroso and Santa Clara paper uses a period of 126 days to estimate annualized volatility, which looks solid across the entire momentum formation period spectrum.

In any case, it seems the verdict is that a constant volatility target improves results.

Thanks for reading.

NOTE: while I am currently consulting, I am always open to networking, meeting up (Philadelphia and New York City both work), consulting arrangements, and job discussions. Contact me through my email at ilya.kipnis@gmail.com, or through my LinkedIn, found here.

Hypothesis Driven Development Part III: Monte Carlo In Asset Allocation Tests

This post will show how to use Monte Carlo to test for signal intelligence.

Although I had rejected this strategy in the last post, I was asked to do a monte-carlo analysis of a thousand random portfolios to see how the various signal processes performed against said distribution. Essentially, the process is quite simple: as I’m selecting one asset each month to hold, I simply generate a random number between 1 and the amount of assets (5 in this case), and hold it for the month. Repeat this process for the number of months, and then repeat this process a thousand times, and see where the signal processes fall across that distribution.

I didn’t use parallel processing here since Windows and Linux-based R have different parallel libraries, and in the interest of having the code work across all machines, I decided to leave it off.

Here’s the code:

randomAssetPortfolio <- function(returns) {
  numAssets <- ncol(returns)
  numPeriods <- nrow(returns)
  assetSequence <- sample.int(numAssets, numPeriods, replace=TRUE)
  wts <- matrix(nrow = numPeriods, ncol=numAssets, 0)
  wts <- xts(wts, order.by=index(returns))
  for(i in 1:nrow(wts)) {
    wts[i,assetSequence[i]] <- 1
  }
  randomPortfolio <- Return.portfolio(R = returns, weights = wts)
  return(randomPortfolio)
}

t1 <- Sys.time()
randomPortfolios <- list()
set.seed(123)
for(i in 1:1000) {
  randomPortfolios[[i]] <- randomAssetPortfolio(monthRets)
}
randomPortfolios <- do.call(cbind, randomPortfolios)
t2 <- Sys.time()
print(t2-t1)

algoPortfolios <- sigBoxplots[,1:12]
randomStats <- table.AnnualizedReturns(randomPortfolios)
algoStats <- table.AnnualizedReturns(algoPortfolios)

par(mfrow=c(3,1))
hist(as.numeric(randomStats[1,]), breaks = 20, main = 'histogram of monte carlo annualized returns',
     xlab='annualized returns')
abline(v=as.numeric(algoStats[1,]), col='red')
hist(as.numeric(randomStats[2,]), breaks = 20, main = 'histogram of monte carlo volatilities',
     xlab='annualized vol')
abline(v=as.numeric(algoStats[2,]), col='red')
hist(as.numeric(randomStats[3,]), breaks = 20, main = 'histogram of monte carlo Sharpes',
     xlab='Sharpe ratio')
abline(v=as.numeric(algoStats[3,]), col='red')

allStats <- cbind(randomStats, algoStats)
aggregateMean <- apply(allStats, 1, mean)
aggregateDevs <- apply(allStats, 1, sd)

algoPs <- 1-pnorm(as.matrix((algoStats - aggregateMean)/aggregateDevs))

plot(as.numeric(algoPs[1,])~c(1:12), main='Return p-values',
     xlab='Formation period', ylab='P-value')
abline(h=0.05, col='red')
abline(h=.1, col='green')

plot(1-as.numeric(algoPs[2,])~c(1:12), ylim=c(0, .5), main='Annualized vol p-values',
     xlab='Formation period', ylab='P-value')
abline(h=0.05, col='red')
abline(h=.1, col='green')

plot(as.numeric(algoPs[3,])~c(1:12), main='Sharpe p-values',
     xlab='Formation period', ylab='P-value')
abline(h=0.05, col='red')
abline(h=.1, col='green')

And here are the results:


In short, compared to monkeys throwing darts, to use some phrasing from the Price Action Lab blog, these signal processes are only marginally intelligent, if at all, depending on the variation one chooses. Still, I was recommended to see this process through the end, and evaluate rules, so next time, I’ll evaluate one easy-to-implement rule.

Thanks for reading.

NOTE: while I am currently consulting, I am always open to networking, meeting up (Philadelphia and New York City both work), consulting arrangements, and job discussions. Contact me through my email at ilya.kipnis@gmail.com, or through my LinkedIn, found here.

Hypothesis-Driven Development Part II

This post will evaluate signals based on the rank regression hypotheses covered in the last post.

The last time around, we saw that rank regression had a very statistically significant result. Therefore, the next step would be to evaluate the basic signals — whether or not there is statistical significance in the actual evaluation of the signal–namely, since the strategy from SeekingAlpha simply selects the top-ranked ETF every month, this is a very easy signal to evaluate.

Simply, using the 1-24 month formation periods for cumulative sum of monthly returns, select the highest-ranked ETF and hold it for one month.

Here’s the code to evaluate the signal (continued from the last post), given the returns, a month parameter, and an EW portfolio to compare with the signal.


signalBacktest <- function(returns, nMonths, ewPortfolio) {
  nMonthAverage <- apply(returns, 2, runSum, n = nMonths)
  nMonthAverage <- xts(nMonthAverage, order.by = index(returns))
  nMonthAvgRank <- t(apply(nMonthAverage, 1, rank))
  nMonthAvgRank <- xts(nMonthAvgRank, order.by=index(returns))
  selection <- (nMonthAvgRank==5) * 1 #select highest average performance
  sigTest <- Return.portfolio(R = returns, weights = selection)
  difference <- sigTest - ewPortfolio
  diffZscore <- mean(difference)/sd(difference)
  sigZscore <- mean(sigTest)/sd(sigTest)
  return(list(sigTest, difference, mean(sigTest), sigZscore, mean(difference), diffZscore))
}

ewPortfolio <- Return.portfolio(monthRets, rebalance_on="months")

sigBoxplots <- list()
excessBoxplots <- list()
sigMeans <- list()
sigZscores <- list()
diffMeans <- list()
diffZscores <- list()
for(i in 1:24) {
  tmp <- signalBacktest(monthRets, nMonths = i, ewPortfolio)
  sigBoxplots[[i]] <- tmp[[1]]
  excessBoxplots[[i]] <- tmp[[2]]
  sigMeans[[i]] <- tmp[[3]]
  sigZscores[[i]] <- tmp[[4]]
  diffMeans[[i]] <- tmp[[5]]
  diffZscores[[i]] <- tmp[[6]]
}

sigBoxplots <- do.call(cbind, sigBoxplots)
excessBoxplots <- do.call(cbind, excessBoxplots)
sigMeans <- do.call(c, sigMeans)
sigZscores <- do.call(c, sigZscores)
diffMeans <- do.call(c, diffMeans)
diffZscores <- do.call(c, diffZscores)

par(mfrow=c(2,1))
plot(as.numeric(sigMeans)*100, type='h', main = 'signal means', 
     ylab = 'percent per month', xlab='formation period')
plot(as.numeric(sigZscores), type='h', main = 'signal Z scores', 
     ylab='Z scores', xlab='formation period')

plot(as.numeric(diffMeans)*100, type='h', main = 'mean difference between signal and EW',
     ylab = 'percent per month', xlab='formation period')
plot(as.numeric(diffZscores), type='h', main = 'difference Z scores',
     ylab = 'Z score', xlab='formation period')

boxplot(as.matrix(sigBoxplots), main = 'signal boxplots', xlab='formation period')
abline(h=0, col='red')
points(sigMeans, col='blue')

boxplot(as.matrix(sigBoxplots[,1:12]), main = 'signal boxplots 1 through 12 month formations', 
        xlab='formation period')
abline(h=0, col='red')
points(sigMeans[1:12], col='blue')

boxplot(as.matrix(excessBoxplots), main = 'difference (signal - EW) boxplots', 
        xlab='formation period')
abline(h=0, col='red')
points(sigMeans, col='blue')

boxplot(as.matrix(excessBoxplots[,1:12]), main = 'difference (signal - EW) boxplots 1 through 12 month formations', 
        xlab='formation period')
abline(h=0, col='red')
points(sigMeans[1:12], col='blue')

Okay, so what’s going on here is that I compare the signal against the equal weight portfolio, and take means and z scores of both the signal values in general, and against the equal weight portfolio. I plot these values, along with boxplots of the distributions of both the signal process, and the difference between the signal process and the equal weight portfolio.

Here are the results:




To note, the percents are already multiplied by 100, so in the best cases, the rank strategy outperforms the equal weight strategy by about 30 basis points per month. However, these results are…not even in the same parking lot as statistical significance, let alone in the same ballpark.

Now, at this point, in case some people haven’t yet read Brian Peterson’s paper on strategy development, the point of hypothesis-driven development is to *reject* hypothetical strategies ASAP before looking at any sort of equity curve and trying to do away with periods of underperformance. So, at this point, I would like to reject this entire strategy because there’s no statistical evidence to actually continue. Furthermore, because August 2015 was a rather interesting month, especially in terms of volatility dispersion, I want to return to volatility trading strategies, now backed by hypothesis-driven development.

If anyone wants to see me continue to rule testing with this process, let me know. If not, I have more ideas on the way.

Thanks for reading.

NOTE: while I am currently consulting, I am always open to networking, meeting up (Philadelphia and New York City both work), consulting arrangements, and job discussions. Contact me through my email at ilya.kipnis@gmail.com, or through my LinkedIn, found here.