How well can you scale your strategy?

This post will deal with a quick, finger-in-the-air way of seeing how well a strategy scales–namely, how sensitive it is to latency between signal and execution, using a simple volatility trading strategy as an example. The signal will be the VIX/VXV ratio trading VXX and XIV, an idea I got from Volatility Made Simple’s amazing blog, particularly this post. The execution variants compared will be the “magical thinking” signal (observe the close, buy that same close, named for the ruleOrderProc setting in quantstrat), buying on the next day’s open, buying on the next day’s close, and the average of the latter two.

Let’s get started.

require(downloader)
require(quantmod) #for getSymbols and the OHLC extractors
require(PerformanceAnalytics)
require(IKTrading)
require(TTR)

download("http://www.cboe.com/publish/scheduledtask/mktdata/datahouse/vxvdailyprices.csv", 
         destfile="vxvData.csv")
download("https://dl.dropboxusercontent.com/s/jk6der1s5lxtcfy/XIVlong.TXT",
         destfile="longXIV.txt")
download("https://dl.dropboxusercontent.com/s/950x55x7jtm9x2q/VXXlong.TXT", 
         destfile="longVXX.txt") #requires downloader package
getSymbols('^VIX', from = '1990-01-01')


xiv <- xts(read.zoo("longXIV.txt", format="%Y-%m-%d", sep=",", header=TRUE))
vxx <- xts(read.zoo("longVXX.txt", format="%Y-%m-%d", sep=",", header=TRUE))
vxv <- xts(read.zoo("vxvData.csv", header=TRUE, sep=",", format="%m/%d/%Y", skip=2))
vixVxv <- Cl(VIX)/Cl(vxv)



vxxCloseRets <- Return.calculate(Cl(vxx))
vxxOpenRets <- Return.calculate(Op(vxx))
xivCloseRets <- Return.calculate(Cl(xiv))
xivOpenRets <- Return.calculate(Op(xiv))

vxxSig <- vixVxv > 1
xivSig <- 1-vxxSig

magicThinking <- vxxCloseRets * lag(vxxSig) + xivCloseRets * lag(xivSig)
nextOpen <- vxxOpenRets * lag(vxxSig, 2) + xivOpenRets * lag(xivSig, 2)
nextClose <- vxxCloseRets * lag(vxxSig, 2) + xivCloseRets * lag(xivSig, 2)
tradeWholeDay <- (nextOpen + nextClose)/2

compare <- na.omit(cbind(magicThinking, nextOpen, nextClose, tradeWholeDay))
colnames(compare) <- c("Magic Thinking", "Next Open", 
                       "Next Close", "Execute Through Next Day")
charts.PerformanceSummary(compare)
rbind(table.AnnualizedReturns(compare), 
      maxDrawdown(compare), CalmarRatio(compare))

par(mfrow=c(1,1))
chart.TimeSeries(log(cumprod(1+compare), base = 10), legend.loc='topleft', ylab='log base 10 of additional equity',
                 main = 'VIX vs. VXV different execution times')

So here’s the run-through. In addition to the magical thinking strategy (observe the close, buy that same close), I tested three other variants–a variant which transacts at the next open, a variant which transacts at the next close, and the average of those two. Effectively, these three give a sense of a strategy’s performance under more realistic conditions–that is, how well the strategy performs if transacted throughout the day, assuming you’re managing a sum of money too large to simply plow into the market in the closing minutes (and if you hope to get rich off of trading, you will eventually have a larger sum of money than the amount magical thinking can be applied to). Ideally, I’d use VWAP pricing, but as that isn’t available for free anywhere I know of, readers wouldn’t be able to replicate the results even if I had such data.
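
As a very crude stand-in, if the instrument files carry full OHLC bars (the long-history VXX and XIV files do), one could approximate an “average price through the day” execution with the OHLC typical price. This is only a sketch; typicalPrice is a helper name of my own choosing, and the typical price is an equal-weighted average of the bar, not a volume-weighted one.

#a rough VWAP stand-in: the OHLC typical price (equal-weighted, not volume-weighted)
typicalPrice <- function(ohlc) {
  (Op(ohlc) + Hi(ohlc) + Lo(ohlc) + Cl(ohlc))/4
}

vxxTypRets <- Return.calculate(typicalPrice(vxx))
xivTypRets <- Return.calculate(typicalPrice(xiv))
nextDayTypical <- vxxTypRets * lag(vxxSig, 2) + xivTypRets * lag(xivSig, 2)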

In any case, here are the results.

Equity curves:

Log scale (for Mr. Tony Cooper and others):

Stats:

                          Magic Thinking Next Open Next Close Execute Through Next Day
Annualized Return               0.814100 0.8922000  0.5932000                 0.821900
Annualized Std Dev              0.622800 0.6533000  0.6226000                 0.558100
Annualized Sharpe (Rf=0%)       1.307100 1.3656000  0.9529000                 1.472600
Worst Drawdown                  0.566122 0.5635336  0.6442294                 0.601014
Calmar Ratio                    1.437989 1.5831686  0.9208586                 1.367510

My reaction? The fact that the execute-on-next-close variant performs vastly worse than the other configurations (and that the deterioration occurs in the most recent years) essentially means that the fills will have to come fairly quickly at the beginning of the day. While the strategy seems somewhat scalable through the lens of this finger-in-the-air technique, if the first full day of possible execution after signal reception tanks a strategy from a 1.44 Calmar to a .92, that’s a massive drop-off with everything else held constant. In my opinion, this is quite a valid question to ask anyone who simply sells signals, as opposed to managing assets: namely, how sensitive are the signals to execution on the next day? After all, unless those signals come at 3:55 PM, one is most likely going to be getting filled the next day.
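
To put a rough number on that sensitivity, here’s a minimal sketch using the compare object built above (the calmars and annRets names are mine) of the drop in Calmar and annualized return between the magical-thinking and next-close variants:

#how much does one day of latency cost?
calmars <- CalmarRatio(compare)
annRets <- Return.annualized(compare)
data.frame(CalmarDrop = as.numeric(calmars[, "Magic Thinking"] - calmars[, "Next Close"]),
           ReturnDrop = as.numeric(annRets[, "Magic Thinking"] - annRets[, "Next Close"]))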

Now, while this strategy is a bit of a tomato can in terms of how good volatility trading strategies can get (they can get a *lot* better in my opinion), I think it made for a simple little demonstration of this technique. Again, a huge thank you to Mr. Helmuth Vollmeier for so kindly keeping up his dropbox all this time for the volatility data!

Thanks for reading.

NOTE: I am currently contracting in a data science capacity in Chicago. You can email me at ilya.kipnis@gmail.com, or find me on my LinkedIn here. I’m always open to beers after work if you’re in the Chicago area.

NOTE 2: Today, on October 21, 2015, if you’re in Chicago, there’s a Chicago R Users Group conference at Jaks Tap at 6:00 PM. Free pizza, networking, and R, hosted by Paul Teetor, who’s a finance guy. Hope to see you there.

Volatility Stat-Arb Shenanigans

This post deals with an impossible-to-implement statistical arbitrage strategy using VXX and XIV. The strategy is simple: if the average daily return of VXX and XIV was positive, short both of them at the close. This strategy makes two assumptions of varying dubiousness: that one can “observe the close and act on the close”, and that one can short VXX and XIV.

So, recently, I decided to play around with everyone’s two favorite instruments on this blog–VXX and XIV, with the idea that “hey, these two instruments are diametrically opposed, so shouldn’t there be a stat-arb trade here?”

So, in order to do a lick-finger-in-the-air visualization, I implemented Mike Harris’s momersion indicator.

#momersion: the percentage of days in the trailing n-day window on which the
#return continued in the same direction as the prior day's return
momersion <- function(R, n, returnLag = 1) {
  momentum <- sign(R * lag(R, returnLag)) #1 if consecutive returns share a sign, -1 if they flip
  momentum[momentum < 0] <- 0 #sign flips count as zero
  momersion <- runSum(momentum, n = n)/n * 100 #percentage of continuation days over the window
  colnames(momersion) <- "momersion"
  return(momersion)
}

And then I ran the spread through it.


xiv <- xts(read.zoo("longXIV.txt", format="%Y-%m-%d", sep=",", header=TRUE))
vxx <- xts(read.zoo("longVXX.txt", format="%Y-%m-%d", sep=",", header=TRUE))

xivRets <- Return.calculate(Cl(xiv))
vxxRets <- Return.calculate(Cl(vxx))

volSpread <- xivRets + vxxRets
volSpreadMomersion <- momersion(volSpread, n = 252)
plot(volSpreadMomersion)

In other words, this spread is certainly mean-reverting at just about all times.
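
For a quick numerical check on that claim, here’s a one-off sketch (momersionVals is just a local name): values below 50 mean more sign flips than continuations over the trailing year.

#fraction of days with momersion below 50 (more reversals than continuations)
momersionVals <- as.numeric(na.omit(volSpreadMomersion))
mean(momersionVals < 50)
summary(momersionVals)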

And here is the code for the results from 2011 onward, once both XIV and VXX had actually started trading.

#both sides
sig <- -lag(sign(volSpread))
longShort <- sig * volSpread
charts.PerformanceSummary(longShort['2011::'], main = 'long and short spread')

#long spread only
sig <- -lag(sign(volSpread))
sig[sig < 0] <- 0
longOnly <- sig * volSpread
charts.PerformanceSummary(longOnly['2011::'], main = 'long spread only')


#short spread only
sig <- -lag(sign(volSpread))
sig[sig > 0] <- 0
shortOnly <- sig * volSpread
charts.PerformanceSummary(shortOnly['2011::'], main = 'short spread only')

threeStrats <- na.omit(cbind(longShort, longOnly, shortOnly))["2011::"]
colnames(threeStrats) <- c("LongShort", "Long", "Short")
rbind(table.AnnualizedReturns(threeStrats), CalmarRatio(threeStrats))

Here are the equity curves:

Long-short:

Long-only:

Short-only:

With the following statistics:

                          LongShort      Long    Short
Annualized Return          0.115400 0.0015000 0.113600
Annualized Std Dev         0.049800 0.0412000 0.027900
Annualized Sharpe (Rf=0%)  2.317400 0.0374000 4.072100
Calmar Ratio               1.700522 0.0166862 7.430481

In other words, the short side is absolutely amazing as a trade–except for the one small fact of having it be impossible to actually execute, or at least as far as I’m aware. Anyhow, this was simply a for-fun post, but hopefully it served some purpose.

Thanks for reading.

NOTE: I am currently contracting and am looking to network in the Chicago area. You can find my LinkedIn here.

Introduction to Hypothesis Driven Development — Overview of a Simple Strategy and Indicator Hypotheses

This post will begin to apply a hypothesis-driven development framework (that is, the framework written by Brian Peterson on how to do strategy construction correctly, found here) to a strategy I’ve come across on SeekingAlpha. Namely, Cliff Smith posted about a conservative bond rotation strategy, which makes use of short-term treasuries, long-term treasuries, convertibles, emerging market debt, and high-yield corporate debt–that is, SHY, TLT, CWB, PCY, and JNK. What this post will do is try to put a more formal framework around the question of whether this strategy is a valid one to begin with.

One note: for the sake of succinctness in blog form, and to get to the computational techniques more quickly, I’ll be glossing over the background research write-up for this post/strategy, since it’s yet another take on time-series/cross-sectional momentum, pared down to something implementable for individual investors rather than something requiring a massive collection of instruments for institutional-class portfolios.

Introduction, Overview, Objectives, Constraints, Assumptions, and Hypotheses to be Tested:

Momentum. It has been documented many times. For the sake of brevity, I’ll let readers follow the links if they’re so inclined, but among them are Jegadeesh and Titman’s seminal 1993 paper, Mark Carhart’s 1997 paper, Andreu et al. (2012), Barroso and Santa-Clara (2013), Ilmanen’s Expected Returns (which covers momentum), and others. This list, of course, is far from exhaustive, but the point stands. Formation periods of several months (up to a year) should predict returns moving forward on some holding period, be it several months, or, as is more commonly seen, one month.

Furthermore, momentum applies in two varieties–cross sectional, and time-series. Cross-sectional momentum asserts that assets that outperformed among a group will continue to outperform, while time-series momentum asserts that assets that have risen in price during a formation period will continue to do so for the short-term future.
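
For concreteness, here is a small self-contained toy (made-up monthly returns, three hypothetical assets, an arbitrary six-month formation period) contrasting the two signal definitions:

#toy illustration of time-series vs. cross-sectional momentum signals
require(xts)
set.seed(42)
dates <- seq(as.Date("2010-01-01"), by = "month", length.out = 36)
toyRets <- xts(matrix(rnorm(36*3, .003, .02), ncol = 3,
                      dimnames = list(NULL, c("A", "B", "C"))),
               order.by = dates)
formation <- rollapplyr(toyRets, 6, function(x) prod(1+x) - 1) #6-month formation return

#time-series momentum: hold an asset if its own formation return is positive
tsSignal <- lag(formation > 0)

#cross-sectional momentum: hold whichever asset ranks highest among the group
ranks <- xts(t(apply(formation, 1, rank)), order.by = index(formation))
csSignal <- lag(ranks == ncol(toyRets))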

Cliff Smith’s strategy depends on the latter, effectively, among a group of five bond ETFs. I am not certain of the objective of the strategy (he didn’t mention it), as PCY, JNK, and CWB, while they may be fixed-income in name, possess volatility on the order of equities. I suppose one possible “default” objective would be to achieve an outperforming total return against an equal-weighted benchmark, both rebalanced monthly.

The constraints are that one would need a sufficient amount of capital such that fixed transaction costs are negligible, since the strategy is a single-instrument rotation type, meaning that each month may have two-way turnover of 200% (sell one ETF, buy another). On the other hand, one would assume that the amount of capital deployed is small enough such that execution costs of trading do not materially impact the performance of the strategy. That is to say, moving multiple billions from one of these ETFs to the other is a non-starter. As all returns are computed close-to-close for the sake of simplicity, this creates the implicit assumption that the market impact and execution costs are very small compared to overall returns.

There are two overarching hypotheses to be tested in order to validate the efficacy of this strategy:

1) Time-series momentum: while it has been documented for equities and even industry/country ETFs, it may not yet have been formally documented for fixed-income ETFs and their corresponding mutual funds. In order to validate this strategy, it should be investigated whether the particular instruments it selects adhere to the same phenomenon.

2) Cross-sectional momentum: again, while this has been heavily demonstrated in the past with regards to equities, ETFs are fairly new, and of the five mutual funds Cliff Smith selected, the latest one only has data going back to 1997, making easy access to diversified fixed-income markets for less sophisticated investors a relatively new innovation.

Essentially, both of these can be tested over a range of parameters (1-24 months).

Another note: with hypothesis-driven strategy development, the backtest is to be *nothing more than a confirmation of all the hypotheses up to that point*. That is, re-optimizing on the backtest itself means overfitting. Any proposed change to a strategy should be done in the form of tested hypotheses, as opposed to running a bunch of backtests and selecting the best trials. Taken another way, this means that every single proposed element of a strategy needs to have some form of strong hypothesis accompanying it, in order to be justified.

So, here are the two hypotheses I tested on the corresponding mutual funds:

require(quantmod)
require(PerformanceAnalytics)
require(reshape2)
symbols <- c("CNSAX", "FAHDX", "VUSTX", "VFISX", "PREMX")
getSymbols(symbols, from='1900-01-01')
prices <- list()
for(symbol in symbols) {
  prices[[symbol]] <- Ad(get(symbol))
}
prices <- do.call(cbind, prices)
colnames(prices) <- substr(colnames(prices), 1, 5)
returns <- na.omit(Return.calculate(prices))

sample <- returns['1997-08/2009-03']
monthRets <- apply.monthly(sample, Return.cumulative)

returnRegression <- function(returns, nMonths) {
  nMonthAverage <- apply(returns, 2, runSum, n = nMonths)
  nMonthAverage <- xts(nMonthAverage, order.by = index(returns))
  nMonthAverage <- na.omit(lag(nMonthAverage))
  returns <- returns[index(nMonthAverage)]
  
  rankAvg <- t(apply(nMonthAverage, 1, rank))
  rankReturn <- t(apply(returns, 1, rank))
  
  
  meltedAverage <- melt(data.frame(nMonthAverage))
  meltedReturns <- melt(data.frame(returns))
  meltedRankAvg <- melt(data.frame(rankAvg))
  meltedRankReturn <- melt(data.frame(rankReturn))
  lmfit <- lm(meltedReturns$value ~ meltedAverage$value - 1)
  rankLmfit <- lm(meltedRankReturn$value ~ meltedRankAvg$value)
  return(rbind(summary(lmfit)$coefficients, summary(rankLmfit)$coefficients))
}

pvals <- list()
estimates <- list()
rankPs <- list()
rankEstimates <- list()
for(i in 1:24) {
  tmp <- returnRegression(monthRets, nMonths=i)
  pvals[[i]] <- tmp[1,4]
  estimates[[i]] <- tmp[1,1]
  rankPs[[i]] <- tmp[2,4]
  rankEstimates[[i]] <- tmp[2,1]
}
pvals <- do.call(c, pvals)
estimates <- do.call(c, estimates)
rankPs <- do.call(c, rankPs)
rankEstimates <- do.call(c, rankEstimates)

Essentially, in this case, I take a pooled regression (that is, I pool the five instruments together into one giant vector) and regress the next month’s return against the running sum of the previous n months of returns. I also do the same thing using cross-sectional ranks for each month, performing a rank-rank regression. The sample I used was the five mutual funds (CNSAX, FAHDX, VUSTX, VFISX, and PREMX) from their inception to March 2009; since the data for the final ETF begins in April of 2009, I set aside the ETF data for out-of-sample backtesting.
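
For convenience, the four vectors produced by the loop above can be gathered into a single table (a small sketch; momoSummary is just a name of my own choosing):

#one row per formation lookback: raw and rank-based coefficients and p-values
momoSummary <- data.frame(lookback = 1:24,
                          rawCoef = estimates, rawP = pvals,
                          rankCoef = rankEstimates, rankP = rankPs)
head(momoSummary)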

Here are the results:



plot(estimates, type='h', xlab = 'Months regressed on', ylab='momentum coefficient', 
     main='future returns regressed on past momentum')
plot(pvals, type='h', xlab='Months regressed on', ylab='p-value', main='momentum significance')
abline(h=.05, col='green')
abline(h=.1, col='red')

plot(rankEstimates, type='h', xlab='Months regressed on', ylab="Rank coefficient",
     main='future return ranks regressed on past momentum ranks', ylim=c(0,3))
plot(rankPs, type='h', xlab='Months regressed on', ylab='P-values')




It is interesting to note that while much of the momentum literature specifies a reversion effect on time-series momentum at 12 months or greater, all the regression coefficients in this case (even up to 24 months!) proved to be positive, with the very long-term coefficients possessing more statistical significance than the short-term ones. Nevertheless, Cliff Smith’s chosen parameters (the two- and four-month settings) possess statistical significance at least at the 10% level. However, for anyone highly conservative about rejecting strategies, that in and of itself may be reason enough to reject this strategy right here.
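
A quick check of which formation lookbacks clear those thresholds, using the pvals vector computed above:

#lookbacks (in months) significant at the 10% and 5% levels for the raw regression
which(pvals < .10)
which(pvals < .05)
pvals[c(2, 4)] #Cliff Smith's two- and four-month settings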

However, the rank-rank regression (that is, regressing the future month’s cross-sectional rank on the past n-month-sum cross-sectional rank) proved to be statistically significant beyond any doubt, with all p-values being effectively zero. In short, there is extremely strong evidence for cross-sectional momentum among these five assets, extending out to at least two years. Furthermore, since VFISX (the mutual-fund stand-in for SHY, the short-term treasury fund) is among the assets chosen and serves as a proxy for the risk-free rate, including it in the cross-sectional rankings means that, for any other asset to be invested in (as this is a top-1 asset rotation strategy), it must outperform the risk-free asset; otherwise, by process of elimination, the strategy invests in the risk-free asset itself.

In upcoming posts, I’ll look into testing hypotheses on signals and rules.

Lastly, Volatility Made Simple has just released a blog post on the performance of volatility-based strategies for the month of August. Given the massive volatility spike, the dispersion in performance of strategies is quite interesting. I’m happy that in terms of YTD returns, the modified version of my strategy is among the top 10 for the year.

Thanks for reading.

NOTE: while I am currently consulting, I am always open to networking, meeting up (Philadelphia and New York City both work), consulting arrangements, and job discussions. Contact me through my email at ilya.kipnis@gmail.com, or through my LinkedIn, found here.

PELTing a Competing Changepoint Algorithm

This post will demonstrate the PELT algorithm from the changepoint package–a competing algorithm to the twitter package’s breakout detection algorithm. While neither of these algorithms produces satisfactory results, one change point location approximation algorithm that makes no distributional assumptions shows potentially promising results.

I received some feedback regarding my first foray into change point analysis from Brian Peterson. While some of it was positive, a fair bit of it concerned how I could add more to my analysis–more boxes to check off, and so on. One of those boxes was the PELT algorithm, devised by one Rebecca Killick of Lancaster University, which I’ll give a quick run-through of below.

In the twitter paper, PELT was the competing algorithm that the paper compared itself to, and while I didn’t think that replicating the competing algorithm would be necessary at first go, it turns out that it was. So, going forward, I’m going to cover more points demonstrating additional aspects of these change point detection algorithms. Thus far, the most impressive one has been David Matteson’s e.divisive algorithm in his ecp package. However, its one caveat for me is its massively long running time.

Anyhow, without further ado, let’s replicate a diagram found on page 7 of the original paper in the lower right hand corner. Turns out, that is the Scribe data set that comes with the BreakoutDetection package, so we have all we need.

require(BreakoutDetection)
require(changepoint)
data(Scribe)
bd <- breakout(Scribe)
pelt <- cpt.meanvar(Scribe)
plot(Scribe, type="l")
abline(v=bd$loc, col="red")
abline(v=pelt@cpts[1], col="blue")
legend(legend = c("Breakout", "PELT"), x = 2, y = 1000, fill = c("red", "blue"))

This gives us the following diagram.

In short, the paper’s claim is corroborated. PELT underperforms even in a simple example, using both packages’ only-one-changepoint methodology. Furthermore, the object returned by the changepoint package is of an S4 class (so for those wondering what the @ character is doing, it’s the equivalent of the $ used elsewhere in R).
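
For those unfamiliar with S4 objects, here’s a quick way to poke at what the changepoint package returns, a small sketch using the pelt object from above:

#inspecting the S4 object returned by cpt.meanvar
class(pelt)     #the S4 class defined by the changepoint package
slotNames(pelt) #all available slots
pelt@cpts       #the estimated changepoint location(s), accessed with @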

Let’s move onto the GSPC data.

suppressMessages(require(quantmod))
suppressMessages(require(PerformanceAnalytics))

suppressMessages(getSymbols("^GSPC", from = "1984-12-25", to = "2013-05-31"))
#these two lines are only needed if Yahoo gives you holidays such as New Year's -- EG 1985-01-01
require(timeDate)
GSPC <- GSPC[!as.character(index(GSPC)) %in% as.character(holidayNYSE(1985:2013)),]

dailySqRets <- Return.calculate(Cl(GSPC))*Return.calculate(Cl(GSPC))
dailySqRets <- dailySqRets["1985::"]

Now we’ll look at a histogram of the daily squared returns and see if it looks like some sort of famous distribution.

plot(hist(dailySqRets, breaks=1000), xlim=c(0, 0.001))

Which results in the following image

So, an exponential distribution (give or take)? Well, let’s try and do some PELT changepoint analysis using the exponential distribution.

PELTcps <- cpt.meanvar(as.numeric(dailySqRets), 
                       method="PELT", test.stat="Exponential")

Did that work? No. Here’s the error message.

Error in PELT.meanvar.exp(coredata(data), pen.value) : 
  Exponential test statistic requires positive data

Now, squared returns can’t possibly be negative, so the culprit must be values that aren’t strictly positive–that is, returns of exactly zero. Let’s take a look.

dailySqRets[dailySqRets==0]

And the output:

> dailySqRets[dailySqRets==0]
           GSPC.Close
1985-03-28          0
1985-10-08          0
1988-02-04          0
1988-11-02          0
1992-09-03          0
1997-01-28          0
2003-01-10          0
2008-01-03          0

So, this essentially alleges that there were some days on which the close to close didn’t move at all. Let’s take a look.

GSPC["1985-03-27::1985-04-01"]

This gives us the following output:

> GSPC["1985-03-27::1985-04-01"]
           GSPC.Open GSPC.High GSPC.Low GSPC.Close GSPC.Volume GSPC.Adjusted
1985-03-27    178.43    179.80   178.43     179.54   101000000        179.54
1985-03-28    179.54    180.60   179.43     179.54    99780000        179.54
1985-03-29    179.54    180.66   179.54     180.66   101400000        180.66
1985-04-01    180.66    181.27   180.43     181.27    89900000        181.27

Notice that the close prices for the 27th and the 28th are identical, creating a return of zero, which breaks the PELT algorithm. So let’s fix that.

#the method will throw an error with zero returns, so this deals with that
dailySqRets[dailySqRets == 0] <- dailySqRets[dailySqRets==0] + 1e-100 

Essentially, this is just to get past the error messages within the changepoint package. So now, let’s try applying the algorithm once again.

peltCps <- cpt.meanvar(as.numeric(dailySqRets), 
                       method = "PELT", test.stat = "Exponential")

This time, success–it ran! Let’s check the amount of changepoints.

length(peltCps@cpts)

Which gives:

[1] 374

…which is vastly different from the e.divisive algorithm’s output from the previous investigation, or from Dr. Robert Frey’s post (20 change points, give or take). The upside is that this algorithm runs quickly, but that’s not much solace if the answers are unreliable. To dive further into the nature of the changepoints, we will remove the last change point (which is simply the length of the data) and compute some summary statistics on how long these complete “regimes” last (that is, dropping the first and last).

newCpts <- peltCps@cpts[-length(peltCps@cpts)]
regimeTime <- diff(newCpts)
summary(regimeTime)
hist(regimeTime, breaks = 100)

…which provides the following output:

> summary(regimeTime)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
   2.00    3.00    8.00   19.14   26.00  149.00 

And the following histogram:

In short, most of these “regimes” last less than a month, and half of them don’t even make it out to two weeks. These results are not corroborated by the previously investigated methods. Since more of the academic literature uses differences of log returns, and since the point is to search for changes in the variance regime, that is the procedure that will be employed next; and as the data is continuous and contains negative values, the Normal distribution is the only choice available when using the PELT method.

logDiffs <- as.numeric(diff(log(Cl(GSPC)))["1985::"])
peltGaussCps <- cpt.var(logDiffs, method="PELT", test.stat="Normal")
length(peltGaussCps@cpts)
fullGaussRegimes <- diff(peltGaussCps@cpts[-length(peltGaussCps@cpts)])
summary(fullGaussRegimes)
hist(fullGaussRegimes, breaks = 100)

Which gives the following output:

> length(peltGaussCps@cpts)
[1] 93
> fullGaussRegimes <- diff(peltGaussCps@cpts[-length(peltGaussCps@cpts)])
> summary(fullGaussRegimes)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
   2.00   16.00   48.00   73.76  106.50  498.00 

And the following histogram:

In short, even though the result is improved, it’s still far from reliable, with most regime switches taking place within 2 months, and many of those lasting barely a few days.

Lastly, I’d like to correct an issue from the previous changepoint post in which I used the “at most one changepoint” method from the BreakoutDetection package. Now, I’ll use the multiple change point detection method.

bdcp <- breakout(dailySqRets, min.size = 20, method = "multi")

With the following results:

> length(bdcp$loc)
[1] 0

In short, the breakout algorithm found no change points whatsoever on default settings, which, once again, does not line up with the results from both the ecp package, along with Dr. Frey’s post. Even if the beta (penalty parameter) gets decreased to .0001 (from .008, its default), it still fails to find any changes in the squared returns data, which is disappointing.
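
For completeness, the beta tweak mentioned above looks like this (a sketch; beta is the penalty argument of breakout(), defaulting to .008, and bdcpLowPen is just a local name):

#lowering the breakout() penalty parameter as described above
bdcpLowPen <- breakout(dailySqRets, min.size = 20, method = "multi", beta = .0001)
length(bdcpLowPen$loc)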

However, over the course of exploring the changepoint package, I have found that there is a method that is directly analogous to Dr. Frey’s cumulative sum of squares method (that is, if you check the help file for cpt.var, one of the test.stat distributions is “CSS”, aka cumulative sum of squares). There are two methods that employ this, neither of which are PELT, but both of which predate PELT (the binary segmentation and segment neighborhood methods), and are explained in Dr. Killick’s original paper.

First off, both algorithms contain a penalty term, and the default penalty is the Bayesian Information Criterion (BIC, aka SIC), which grows with the log of the number of data points. In contrast, the AIC penalty is a constant, so the BIC penalty will be greater than the AIC penalty at 8 data points or higher. The cpt.var function is mostly a wrapper around further nested wrappers, and, to cut to the chase, we eventually nest down to the not-exported binary segmentation variance cumulative sum of squares algorithm and the segment neighborhood sum of squares algorithm.
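
As a quick sanity check on that 8-data-point crossover claim before diving into the code, here are the textbook per-parameter penalties (the changepoint package scales these by the number of parameters of the chosen test statistic, but the crossover point is the same):

#BIC penalty grows with log(n); AIC penalty is constant
n <- 4:12
cbind(n = n, AIC = 2, BIC = round(log(n), 3)) #BIC exceeds AIC from n = 8 onward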

So here’s the code for the former:

#the not-exported function, viewable via changepoint:::binseg.var.css
binseg.var.css <- function (data, Q = 5, pen = 0) 
{
  n = length(data)
  if (n < 4) {
    stop("Data must have atleast 4 observations to fit a changepoint model.")
  }
  if (Q > ((n/2) + 1)) {
    stop(paste("Q is larger than the maximum number of segments", 
               (n/2) + 1))
  }
  y2 = c(0, cumsum(data^2))
  tau = c(0, n)
  cpt = matrix(0, nrow = 2, ncol = Q)
  oldmax = Inf
  for (q in 1:Q) {
    lambda = rep(0, n - 1)
    i = 1
    st = tau[1] + 1
    end = tau[2]
    for (j in 1:(n - 1)) {
      if (j == end) {
        st = end + 1
        i = i + 1
        end = tau[i + 1]
      }
      else {
        lambda[j] = sqrt((end - st + 1)/2) * 
                         ((y2[j + 1] - y2[st])/(y2[end + 1] - y2[st]) - 
                          (j - st + 1)/(end - st + 1))
      }
    }
    k = which.max(abs(lambda))
    cpt[1, q] = k
    cpt[2, q] = min(oldmax, max(abs(lambda), na.rm = T))
    oldmax = min(oldmax, max(abs(lambda), na.rm = T))
    tau = sort(c(tau, k))
  }
  op.cps = NULL
  p = 1:(Q - 1)
  for (i in 1:length(pen)) {
    criterion = (cpt[2, ]) >= pen[i]
    if (sum(criterion) == 0) {
      op.cps = 0
    }
    else {
      op.cps = c(op.cps, max(which((criterion) == TRUE)))
    }
  }
  if (op.cps == Q) {
    warning("The number of changepoints identified is Q, 
             it is advised to increase Q to make sure 
             changepoints have not been missed.")
  }
  return(list(cps = cpt, op.cpts = op.cps, pen = pen))
}

Essentially, the algorithm loops through all the data points (that is, it sets the start at the beginning of the data and the end at the end of the data), and for each candidate point computes the quantity defined by the lambda[j] line, which is, again:

lambda[j] = sqrt((end - st + 1)/2) * 
                 ((y2[j + 1] - y2[st])/(y2[end + 1] - y2[st]) - 
                  (j - st + 1)/(end - st + 1))

Which, to put it in financial terms, multiplies the square root of half the range by the difference of the percent B (think Bollinger Bands) of the *values* of the data and the percent B of the *indices* of the data for a given start and end location (which are consecutive change point locations). Then, the candidate change point is defined by the maximum of the absolute value of lambda between a starting and ending point, as defined by this code:

    k = which.max(abs(lambda))
    cpt[1, q] = k
    cpt[2, q] = min(oldmax, max(abs(lambda), na.rm = T))
    oldmax = min(oldmax, max(abs(lambda), na.rm = T))
    tau = sort(c(tau, k))
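
To make the percent-B reading concrete, here is a tiny self-contained toy with five hypothetical daily returns, a single segment spanning the whole sample, and a candidate change point at j = 3 (all names are mine):

#toy decomposition of the lambda[j] statistic
toyRets <- c(.001, .002, .05, -.002, .001)  #hypothetical daily returns
y2 <- c(0, cumsum(toyRets^2))               #cumulative sum of squares
st <- 1; end <- length(toyRets)             #current segment boundaries
j <- 3                                      #candidate change point location
pctBvalues  <- (y2[j + 1] - y2[st])/(y2[end + 1] - y2[st]) #percent B of the cumulative squared values
pctBindices <- (j - st + 1)/(end - st + 1)                 #percent B of the index position
lambdaJ <- sqrt((end - st + 1)/2) * (pctBvalues - pctBindices)
lambdaJ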

The algorithm then updates tau (the collection of change point locations), which updates the start and end point segment locations, and the algorithm restarts again.

Lastly, at the end, the algorithm loops through the different penalties (in this case, the BIC is simply one constant value, so there may be a special case that allows for some dynamic sort of penalty), and all the change points that have a higher lambda value than the penalty are returned as candidate change points. Again, there is no test of significance in the binary segmentation variance cumulative sum of squares algorithm–so if the penalty is manually specified to be zero and the user specifies the maximum number of change points (n/2, where n is the length of the data), then the algorithm will indeed spit back that many change points. In short, while the default settings may do a half-decent job of finding change points, it’s possible to deliberately force this algorithm to produce nonsensical output. To be glib, this algorithm isn’t trying to win the never-ending arms race of idiot-proofing something against the universe’s capability to create a better idiot. However, using the out-of-the-box settings should sufficiently protect oneself from absurd results.
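
To see that degenerate behavior for oneself, one can hand the wrapper a manual penalty of zero and a large Q (a sketch that deliberately abuses the settings; sillyCps is just a local name, and expect a pile of changepoints plus possibly a warning):

#deliberately abusing the settings: zero penalty, large Q
sillyCps <- cpt.var(as.numeric(Return.calculate(Cl(GSPC))["1985::"]),
                    method = "BinSeg", test.stat = "CSS", Q = 100,
                    penalty = "Manual", pen.value = 0)
length(sillyCps@cpts)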

Here’s an illustrative example of a few iterations of this algorithm.

In this case, our data will be the daily returns of the GSPC from 1985 onward.

I’ll set Q (the maximum number of change points) at 30.

Let’s start this off.

data <- as.numeric(Return.calculate(Cl(GSPC))["1985::"])
Q <- 30
n <- length(data)
y2 = c(0, cumsum(data^2))
tau = c(0, n)
cpt = matrix(0, nrow = 2, ncol = Q)
oldmax = Inf

Which gives us:

> tau
[1]    0 7328
> cpt
     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] 
[1,]    0    0    0    0    0    0    0    0    0    
[2,]    0    0    0    0    0    0    0    0    0  
      [,10] [,11] [,12] [,13] [,14] [,15] [,16] [,17] [,18] [,19]   
[1,]      0     0     0     0     0     0     0     0     0     0
[2,]      0     0     0     0     0     0     0     0     0     0
     [,20] [,21] [,22] [,23] [,24] [,25] [,26] [,27] [,28] [,29] [,30]
[1,]     0     0     0     0     0     0     0     0     0     0     0
[2,]     0     0     0     0     0     0     0     0     0     0     0

In short, tau holds the locations of the changepoints, and the role of the cpt matrix will be demonstrated shortly.

Here’s the first iteration through the main loop.

q <- 1
lambda = rep(0, n - 1)
    i = 1
    st = tau[1] + 1
    end = tau[2]
    for (j in 1:(n - 1)) {
      if (j == end) {
        st = end + 1
        i = i + 1
        end = tau[i + 1]
      }
      else {
        lambda[j] = sqrt((end - st + 1)/2) * 
                         ((y2[j + 1] - y2[st])/(y2[end + 1] - y2[st]) - 
                          (j - st + 1)/(end - st + 1))
      }
    }
    k = which.max(abs(lambda))
    cpt[1, q] = k
    cpt[2, q] = min(oldmax, max(abs(lambda), na.rm = T))
    oldmax = min(oldmax, max(abs(lambda), na.rm = T))
    tau = sort(c(tau, k))

And after this first iteration of the loop completes, here are the updated values of tau and cpt:

> tau
[1]    0 5850 7328
> cpt
           [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] 
[1,] 5850.00000    0    0    0    0    0    0    0    0    
[2,]   10.51362    0    0    0    0    0    0    0    0  
      [,10] [,11] [,12] [,13] [,14] [,15] [,16] [,17] [,18] [,19]   
[1,]      0     0     0     0     0     0     0     0     0     0
[2,]      0     0     0     0     0     0     0     0     0     0
     [,20] [,21] [,22] [,23] [,24] [,25] [,26] [,27] [,28] [,29] [,30]
[1,]     0     0     0     0     0     0     0     0     0     0     0
[2,]     0     0     0     0     0     0     0     0     0     0     0

In short, tau is the location of the new change point (also the first row of the cpt matrix), and the second row is the absolute value of lambda. After this, the start and end vectors get appended with a new change point to allow for the binary segmentation that forms the basis of the algorithm. However, a picture’s worth a thousand words, so here are some illustrations. The blue lines denote previous change points, and the red point the new change point.

Here is the code I added in the bottom of the loop for plotting purposes:

    k = which.max(abs(lambda))
    plot(lambda, type = "l", main = paste("Changepoint", q))
    abline(v = tau, col="blue")
    cpt[1, q] = k
    cpt[2, q] = min(oldmax, max(abs(lambda), na.rm = T))
    oldmax = min(oldmax, max(abs(lambda), na.rm = T))
    tau = sort(c(tau, k))
    points(x = k, y = lambda[k], col="red", pch = 19)

And here is the result for the first change point:

The second change point:

The 10th change point:

And the 19th (of 30) change points.

Notice that A) lambda is constantly recomputed after every iteration of the main loop, as it’s updated with every new change point and that B) the values of lambda generally decrease as more change point candidates are found, such that the 19th change point is already on the border of the penalty boundary formed by the BIC. Unlike the math in the paper, this particular algorithm in R does not actually stop when lambda of k (that is, lambda[k]) is smaller than the penalty parameter, so if someone plugged in say, Q = 300, the algorithm would take around 30 seconds to run.

So, what’s the punch line to this approximate algorithm?

This:

 t1 <- Sys.time()
 binSegCss <- cpt.var(as.numeric(Return.calculate(Cl(GSPC))["1985::"]), 
                      method="BinSeg", test.stat="CSS", Q = 30)
 t2 <- Sys.time()
 print(t2 - t1)

The algorithm ran in a few seconds for me. Here is the output:

> binSegCss@cpts
 [1]  310  720  739  858 1825 2875 3192 4524 4762 5044 5850 6139 6199 6318 6548 6641 6868 6967 7328
> length(binSegCss@cpts)
[1] 19

Since the last change point is simply the length of the data, we’ll disregard it. In short, 18 change points (so that last of the four pictures, showing the 19th change point? That one and all subsequent candidates fell into the realm of noise), which falls right into the ballpark of the post from Keplerian Finance.

So, as before, let’s create the cumulative sum of squares plot, the time series plot, and the volatility terrain map.

require(xtsExtra)
returns <- Return.calculate(Cl(GSPC))["1985::"]
dailySqReturns <- returns*returns
cumSqReturns <- cumsum(dailySqReturns)
plot(cumSqReturns)
xtsExtra::addLines(index(dailySqReturns)[binSegCss@cpts[-length(binSegCss@cpts)]], on = 1, col="blue", lwd = 2)

plot(Cl(GSPC)["1985::"])
xtsExtra::addLines(index(dailySqReturns)[binSegCss@cpts[-length(binSegCss@cpts)]], on = 1, col="blue", lwd = 2)


returns$regime <- NA
for(i in 1:(length(binSegCss@cpts)-1)) {
  returns$regime[binSegCss@cpts[i]] <- i
}
returns$regime <- na.locf(returns$regime)
returns$regime[is.na(returns$regime)] <- 0
returns$regime <- returns$regime + 1
returns$annVol <- NA
for(i in 1:max(unique(returns$regime))) {
  regime <- returns[returns$regime==i,]
  annVol <- StdDev.annualized(regime[,1])
  returns$annVol[returns$regime==i,] <- annVol
}

plot(returns$annVol)

Which gives us the following three images:



This last volatility map is even closer to the one found in Keplerian Finance, thus lending credibility to the technique.

In conclusion, it seems that the twitter breakout detection algorithm (e-divisive with medians) is “too robust” against the type of events in financial data, and thus does not detect enough change points. On the other hand, the PELT algorithm suffers from the opposite issues — by forcing the assumption of a popular distribution from probability theory (EG Normal, Exponential, Poisson, Gamma), PELT outputs far too many candidate change points, making its results highly suspect. However, a special case of the binary segmentation algorithm — the binary segmentation algorithm for variance using cumulative sum of squares — presents interesting results. Although the algorithm is a heuristic approximation, its fast running time and lack of distributional assumptions lend it usefulness for doing a “quick and dirty” analysis to potentially cut down on the running time of a more computationally-intensive changepoint detecting algorithm.

When this topic will be revisited, the segment neighborhood method will be examined.

Thanks for reading.

NOTE: I am a freelance consultant in quantitative analysis on topics related to this blog. If you have contract or full time roles available for proprietary research that could benefit from my skills, please contact me through my LinkedIn here.

A New Harry Long Strategy and A Couple of New PerfA Functions

So, Harry Long came out with a new strategy on SeekingAlpha involving some usual mix of SPXL (3x leveraged SPY), TMF (3x leveraged TLT), and some volatility indices (in this case, ZIV and TVIX). Now, since we’ve tread this path before, expectations are rightfully set. It’s a strategy that’s going to look good in the sample he used, it’s going to get hit hard during the crisis, and it’ll ultimately prove to be a simple-to-implement, simple-to-backtest strategy with its own set of ups and downs.

Once again, a huge thanks to Mr. Helmuth Vollmeier for the long history volatility data.

So here’s the code (I’ll skip a lot of the comparing equity curves of my synthetic instruments to the Yahoo-finance variants, as you’ve all seen that before) to get to the initial equity curve comparison.

require(downloader)
require(quantmod) #for getSymbols and the OHLC/adjusted-close extractors
require(PerformanceAnalytics)

download("https://dl.dropboxusercontent.com/s/950x55x7jtm9x2q/VXXlong.TXT",
         destfile="longVXX.txt")
VXX <- xts(read.zoo("longVXX.txt", sep=",", header=TRUE))
TVIXrets <- Return.calculate(Cl(VXX))*2
getSymbols("TVIX", from="1990-01-01")
TVIX <- TVIX[-which(index(TVIX)=="2014-12-30"),] #trashy Yahoo data, removing obvious bad print
compare <- merge(TVIXrets, Return.calculate(Ad(TVIX)), join='inner')
charts.PerformanceSummary(compare)
charts.PerformanceSummary(compare["2012::"])
charts.PerformanceSummary(compare["2013::"])
charts.PerformanceSummary(compare["2014::"])
charts.PerformanceSummary(compare["2015::"]) #okay we're good

download("https://www.dropbox.com/s/jk3ortdyru4sg4n/ZIVlong.TXT",
         destfile="longZIV.txt")
ZIV <- xts(read.zoo("longZIV.txt", sep=",", header=TRUE))
ZIVrets <- Return.calculate(Cl(ZIV))

getSymbols("SPY", from="1990-01-01")
SPXLrets <- Return.calculate(Ad(SPY))*3

getSymbols("TMF", from="1990-01-01")
TMFrets <- Return.calculate(Ad(TMF))
getSymbols("TLT", from="1990-01-01")
TLTrets <- Return.calculate(Ad(TLT))
tmf3TLT <- merge(TMFrets, 3*TLTrets, join='inner')
charts.PerformanceSummary(tmf3TLT)
discrepancy <- as.numeric(Return.annualized(tmf3TLT[,2]-tmf3TLT[,1]))
tmf3TLT[,2] <- tmf3TLT[,2] - ((1+discrepancy)^(1/252)-1)
charts.PerformanceSummary(tmf3TLT)
modifiedTLT <- 3*TLTrets - ((1+discrepancy)^(1/252)-1)
TMFrets <- modifiedTLT

components <- cbind(SPXLrets, ZIVrets, TMFrets, TVIXrets)
components <- components["2004-03-29::"]
stratRets <- Return.portfolio(R = components, weights = c(.4, .2, .35, .05), rebalance_on="years")
charts.PerformanceSummary(stratRets)
SPYrets <- Return.calculate(Ad(SPY))
compare <- merge(stratRets, SPYrets, join='inner')
charts.PerformanceSummary(compare["2011::"])

With the following equity curve display:

So far, so good. Let’s look at the full backtest performance.

charts.PerformanceSummary(compare)

With the resultant equity curve:

Which, given what we’ve seen before, isn’t outside the realm of expectation.

For those interested in the log equity curves, here you go:

compare[is.na(compare)] <- 0
plot(log(cumprod(1+compare)), legend.loc="topleft")

And for fun, let’s look at the outperformance equity curve.

diff <- compare[,1] - compare[,2]
charts.PerformanceSummary(diff, main="relative performance")

And the result:

Now this is somewhere in the ballpark of what you’d love to see from your strategy against a benchmark — aside from a couple of spikes which do a number on the corresponding drawdowns chart, it looks like a steady outperformance.

However, the new features I’d like to introduce in this blog post are a quicker way of generating the usual statistics table I display, and a more in-depth drawdown analysis.

Here’s how:

rbind(table.AnnualizedReturns(compare), maxDrawdown(compare))

Which gives us the following result:

> rbind(table.AnnualizedReturns(compare), maxDrawdown(compare))
                          portfolio.returns SPY.Adjusted
Annualized Return                 0.2181000    0.0799000
Annualized Std Dev                0.2159000    0.1982000
Annualized Sharpe (Rf=0%)         1.0100000    0.4030000
Worst Drawdown                    0.4326138    0.5518552

Since this saves me typing, I’ll be using this format from now on. And as a bonus, it displays annualized standard deviation. While I don’t particularly care for that statistic, as I believe max drawdown captures the notion of “here’s the pain on the other end of your returns” better than “here’s how much your strategy wiggles from day to day”, it’s thrown in anyway, and since it’s a statistic that a lot of other people (particularly portfolio managers, pension fund managers, etc.) are interested in, so much the better.

Now, moving onto a more in-depth analysis of drawdown, PerformanceAnalytics has the following functionality:

dd <- table.Drawdowns(compare[,1], top=100)
dd <- dd[dd$Depth < -.05,]
dd
sum(dd$"To Trough")/nrow(compare)

This brings up the following table (it seems that with multiple return streams, it’ll just default to the first one), and a derived statistic.

> dd
         From     Trough         To   Depth Length To Trough Recovery
1  2008-12-19 2009-03-09 2010-03-16 -0.4326    310        53      257
2  2007-10-30 2008-10-15 2008-12-04 -0.3275    278       243       35
3  2013-05-22 2013-06-24 2013-09-18 -0.1617     83        23       60
4  2004-04-02 2004-05-10 2004-09-17 -0.1450    116        26       90
5  2010-04-26 2010-07-02 2010-09-21 -0.1230    104        49       55
6  2006-03-20 2006-06-19 2006-09-20 -0.1229    129        64       65
7  2007-05-08 2007-08-15 2007-10-01 -0.1229    102        70       32
8  2005-07-29 2005-10-27 2005-12-14 -0.1112     97        64       33
9  2011-06-01 2011-08-11 2011-09-06 -0.1056     68        51       17
10 2005-02-09 2005-03-22 2005-05-31 -0.1051     77        29       48
11 2010-11-05 2010-11-17 2011-02-07 -0.1022     64         9       55
12 2011-09-23 2011-10-27 2011-12-19 -0.0836     61        25       36
13 2013-09-19 2013-10-09 2013-10-17 -0.0815     21        15        6
14 2012-05-02 2012-05-18 2012-06-29 -0.0803     42        13       29
15 2012-10-18 2012-11-15 2012-11-29 -0.0721     28        19        9
16 2014-09-02 2014-10-13 2014-11-05 -0.0679     47        30       17
17 2008-12-05 2008-12-08 2008-12-16 -0.0589      8         2        6
18 2011-02-18 2011-03-16 2011-04-01 -0.0580     30        18       12
19 2014-07-23 2014-08-06 2014-08-15 -0.0536     18        11        7
20 2012-04-03 2012-04-10 2012-04-26 -0.0517     17         5       12

What I did was simply query the table for all drawdowns deeper than 5%, or the 100 biggest drawdowns (though considering that we have about 20 drawdowns over roughly a decade, it seems the rule of thumb is about 2 drawdowns over 5% per year, give or take, and this is a pretty volatile strategy). Lastly, I wanted to know the proportion of the time that someone watching the strategy would be feeling the pain of watching it sink to those depths, so I took the sum of the “To Trough” column and divided it by the number of days in the backtest. This is the result:

> sum(dd$"To Trough")/nrow(compare)
[1] 0.3005505

I’m fairly certain some individuals more seasoned than I am would do something different given this information and functionality, and if so, feel free to leave a comment, but this is just a licked-finger-in-the-air calculation I did. So, 30% of the time, whoever is investing real money into this will want to go and grab a few more drinks than usual.

Let’s do the same analysis for the relative performance.

tmp <- rbind(table.AnnualizedReturns(diff), maxDrawdown(diff))
rownames(tmp)[4] <- "Worst Drawdown"
tmp

When running only one set of returns, the last row will apparently just be called “4”, so I had to manually rename that row. Here’s the result:

> tmp
                          portfolio.returns
Annualized Return                 0.1033000
Annualized Std Dev                0.2278000
Annualized Sharpe (Rf=0%)         0.4535000
Worst Drawdown                    0.3874082

Far from spectacular, but there it is for what it’s worth.
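
Since that manual rename comes up every time a single return stream gets analyzed, one could wrap the whole table in a small helper (a hypothetical convenience function of my own, not part of PerformanceAnalytics):

#hypothetical wrapper for the quick statistics table used in this post
quickStats <- function(rets) {
  out <- rbind(table.AnnualizedReturns(rets), maxDrawdown(rets))
  rownames(out)[4] <- "Worst Drawdown" #maxDrawdown on a single series loses its label
  out
}
quickStats(diff)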

Now the drawdowns.

dd <- table.Drawdowns(diff, top=100)
dd <- dd[dd$Depth < -.05,]
dd
sum(dd$"To Trough")/nrow(diff)

With the following result:

> dd
         From     Trough         To   Depth Length To Trough Recovery
1  2008-11-21 2009-06-22 2013-04-04 -0.3874   1097       145      952
2  2008-10-28 2008-11-04 2008-11-19 -0.2328     17         6       11
3  2007-11-27 2008-06-12 2008-10-07 -0.1569    218       137       81
4  2013-05-03 2013-06-25 2013-12-26 -0.1386    165        37      128
5  2008-10-13 2008-10-13 2008-10-24 -0.1327     10         1        9
6  2005-06-08 2006-06-28 2006-11-08 -0.1139    360       267       93
7  2004-03-29 2004-05-10 2004-09-20 -0.1102    121        30       91
8  2007-03-08 2007-06-12 2007-11-26 -0.0876    183        67      116
9  2005-02-10 2005-03-28 2005-05-31 -0.0827     76        31       45
10 2014-09-02 2014-09-17 2014-10-14 -0.0613     31        12       19

In short, the spikes in outperformance gave us some pretty…interesting…drawdown statistics, which just essentially meant that the strategy wasn’t roaring at the same exact time that the SPY had its bounce from the bottom. And for interest, my finger in the air pain statistic:

> sum(dd$"To Trough")/nrow(diff)
[1] 0.2689908

So approximately 27% of the time, the strategy is really underperforming its benchmark.
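
Since this “time spent sinking toward a trough” figure came up twice, here’s a small hypothetical helper wrapping the calculation (same logic as above; painProportion is my own name, not a PerformanceAnalytics function):

#proportion of days spent moving from peak to trough across drawdowns deeper than minDepth
painProportion <- function(rets, minDepth = -.05, top = 100) {
  dd <- table.Drawdowns(rets, top = top)
  dd <- dd[dd$Depth < minDepth, ]
  sum(dd$"To Trough")/nrow(rets)
}
painProportion(compare[, 1]) #the strategy itself
painProportion(diff)         #the strategy relative to SPY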

In short, overall, more freebies from Harry Long with a title to attract readers. The strategy is what it is–something that boasts strong absolute returns and definitely outperforms SPY, but isn’t that magical unicorn that won’t burn an investor occasionally as a price for that outperformance in better times. However, the quicker statistics table functionality combined with the more in-depth drawdown analysis is something that I am definitely happy to have stumbled upon.

Thanks for reading.

NOTE: I am a freelance consultant in quantitative analysis on topics related to this blog. If you have contract or full time roles available for proprietary research that could benefit from my skills, please contact me through my LinkedIn here.

An Introduction to Change Points (packages: ecp and BreakoutDetection)

A forewarning: this post is me going out on a limb, to say the least. In fact, it’s a post/project requested of me by Brian Peterson, and it follows a new paper that he’s written on how to thoroughly replicate research papers. While I’ve replicated results from papers before (with FAA and EAA, for instance), this is a first for me in terms of what I’ll be doing here.

In essence, it is a thorough investigation into the paper “Leveraging Cloud Data to Mitigate User Experience from ‘Breaking Bad’”, and follows the process from the aforementioned paper. So, here we go.

*********************

Twitter Breakout Detection Package
Leveraging Cloud Data to Mitigate User Experience From ‘Breaking Bad’

Summary of Paper

Introduction: in a paper detailing the foundation of the breakout detection package (arXiv ID 1411.7955v1), James, Kejariwal, and Matteson demonstrate an algorithm that detects breakouts in twitter’s production-level cloud data. The paper begins by laying the mathematical foundation and motivation for energy statistics, the permutation test, and the E-divisive with medians algorithm, which create a fast way of detecting a shift in median between two nonparametric distributions that is robust to the presence of anomalies. Next, the paper demonstrates a trial run through some of twitter’s production cloud data, and compares the non-parametric E-divisive with medians to an algorithm called PELT. For the third topic, the paper discusses potential applications, one of which is quantitative trading/computational finance. Lastly, the paper states its conclusion, which is the addition of the E-divisive with medians algorithm to the existing literature of change point detection methodologies.

The quantitative and computational methodologies of the paper use a modified variant of energy statistics made more resilient against anomalies through the use of robust statistics (viz. the median). The idea of energy statistics is to compare the distances of means of two random variables contained within a larger time series. The hypothesis test to determine whether this difference is statistically significant is the permutation test, which permutes data from the two time series a finite number of times to make the process of comparing permuted time series computationally tractable. However, the presence of anomalies, such as in twitter’s production cloud data, would limit the effectiveness of this process when using simple means. To that end, the paper proposes using the median and, accepting the additional computational time that comes with the weaker distributional assumptions needed to extend the generality of the procedure, devises the E-divisive with medians algorithms, one of which works off of distances between observations, and one of which works with the medians of the observations themselves (as far as I understand). To summarize, the E-divisive with medians algorithms exist as a computationally tractable procedure for determining whether or not a new chunk of time series data differs considerably from the previous one, through the use of advanced distance statistics robust to anomalies such as those present in twitter’s cloud data.
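
As a side note for readers unfamiliar with permutation testing, here is a minimal, generic sketch of a permutation test on a difference in medians between two samples (illustrative only–this is not the EDM implementation, and the simulated data is made up):

#generic permutation test for a shift in median between two segments
set.seed(123)
x <- rnorm(200, 0, 1)   #hypothetical "before" segment
y <- rnorm(200, .3, 1)  #hypothetical "after" segment with a shifted median
observed <- abs(median(x) - median(y))
pooled <- c(x, y)
permuted <- replicate(999, {
  idx <- sample(length(pooled), length(x))
  abs(median(pooled[idx]) - median(pooled[-idx]))
})
mean(c(observed, permuted) >= observed) #permutation p-value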

To compare the performance of the E-divisive with medians algorithms, the paper compares them to an existing algorithm called PELT (which stands for Pruned Exact Linear Time) on various quantitative metrics, such as “Time To Detect” (the gap between the exact moment of the breakout and when the algorithm reports it, if at all), along with precision, recall, and the F-measure, the harmonic mean of precision and recall. Comparing PELT to the E-divisive with medians algorithm showed that the E-divisive algorithm outperformed the PELT algorithm in the majority of data sets. Even when anomalies were either smoothed by taking the rolling median of their neighbors, or removed altogether, the E-divisive algorithm still outperformed PELT. Of the variants of the EDM algorithm (EDM head, EDM tail, and EDM-exact), the EDM-tail variant (i.e. the one using the most recent observations) was also quickest to execute. However, due to fewer assumptions about the nature of the underlying generating distributions, the various E-divisive algorithms take longer to execute than the PELT algorithm, with its stronger assumptions but worse general performance. To summarize, the EDM algorithms outperform PELT in the presence of anomalies, and generally speaking, the EDM-tail variant seems to work best when computational running time is also considered.

The next section dealt with the history and applications of change-point/breakout detection algorithms, in fields such as finance, medical applications, and signal processing. As finance is of a particular interest, the paper acknowledges the ARCH and various flavors of GARCH models, along with the work of James and Matteson in devising a trading strategy based on change-point detection. Applications in genomics to detect cancer exist as well. In any case, the paper cites many sources showing the extension and applications of change-point/breakout detection algorithms, of which finance is one area, especially through work done by Matteson. This will be covered further in the literature review.

To conclude, the paper proposes a new algorithm called the E-divisive with medians, complete with a new statistical permutation test using advanced distance statistics to determine whether or not a time series has had a change in its median. This method makes fewer assumptions about the nature of the underlying distribution than a competitive algorithm, and is robust in the face of anomalies, such as those found in twitter’s production cloud data. This algorithm outperforms a competing algorithm which possessed stronger assumptions about the underlying distribution, detecting a breakout sooner in a time series, even if it took longer to run. The applications of such work range from finance to medical devices, and further beyond. As change-point detection is a technique around which trading strategies can be constructed, it has particular relevance to trading applications.

Statement of Hypothesis

Breakouts can occur in data which does not conform to any known regular distribution, thus rendering techniques that assume a certain distribution less effective. Using the E-divisive with medians algorithm, the paper attempts to detect the presence of breakouts using time series with innovations from no regular distribution as inputs, and if effective, will outperform an existing algorithm that possesses stronger assumptions about distributions. To validate or refute a more general form of this hypothesis (the ability of the algorithm to detect breakouts in a timely fashion), this summary will test it on the cumulative squared returns of the S&P 500, and compare the analysis created by the breakpoints to the analysis performed by Dr. Robert J. Frey of Keplerian Finance, a former managing director at Renaissance Technologies.

Literature Review

Motivation

A good portion of the practical/applied motivation of this paper stems from the explosion of growth in mobile internet applications, A/B testing, and other web-specific reasons to detect breakouts. For instance, longer loading time on a mobile web page necessarily results in lower revenues. To give another example, machines in the cloud regularly fail.

However, the more salient literature regarding the topic is the literature dealing with the foundations of the mathematical ideas behind the paper.

Key References

Paper 1:

David S. Matteson and Nicholas A. James. A nonparametric approach for multiple change point analysis of multivariate data. Journal of the American Statistical Association, 109(505):334–345, 2013.

Thesis of work: this paper is the original paper for the e-divisive and e-agglomerative algorithms, which are offline, nonparametric methods of detecting change points in time series. Unlike Paper 3, this paper lays out the mathematical assumptions, lemmas, and proofs for a formal and mathematical presentation of the algorithms. Also, it documents performance against the PELT algorithm, presented in Paper 6 and technically documented in Paper 5. This performance compares favorably. The source paper being replicated builds on the exact mathematics presented in this paper, and the subject of this report uses the ecp R package that is the actual implementation/replication of this work to form a comparison for its own innovations.

Paper 2:

M. L. Rizzo and G. J. Székely. DISCO analysis: A nonparametric extension of analysis of variance. The Annals of Applied Statistics, 4(2):1034–1055, 2010.

Thesis of work: this paper generalizes the ANOVA framework using distance statistics. The technique aims to find differences among distributions beyond their sample means; through the use of distance statistics, it can answer more general questions about the nature of distributions (e.g., identical means but different distributions as a result of different factors). Its applicability to the source paper is that it forms the basis for the paper's divergence measure, as detailed in its second section.
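
For readers who want a feel for what such a distance statistic looks like, here is a small base-R sketch of the energy distance between two univariate samples, the kind of between-distribution quantity that DISCO (and, downstream, the source paper's divergence measure) builds on. The simulated samples are purely illustrative.

#a base-R sketch of the energy distance between two univariate samples (illustration only)
energyDistance <- function(x, y) {
  crossTerm <- mean(abs(outer(x, y, "-"))) #estimate of E|X - Y|
  withinX   <- mean(abs(outer(x, x, "-"))) #estimate of E|X - X'|
  withinY   <- mean(abs(outer(y, y, "-"))) #estimate of E|Y - Y'|
  2 * crossTerm - withinX - withinY
}

set.seed(10)
energyDistance(rnorm(200), rnorm(200)) #same distribution: near zero
energyDistance(rnorm(200), rnorm(200, mean = 1)) #shifted distribution: clearly positive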

Paper 3:

Nicholas A. James and David S. Matteson. ecp: An R package for nonparametric multiple change point analysis of multivariate data. Technical report, Cornell University, 2013.

Thesis of work: the paper introduces the ecp package, which contains the e-agglomerative and e-divisive algorithms for detecting change points in time series in the R statistical programming language (in use on at least one elite trading desk). The e-divisive method recursively partitions a time series and uses a permutation test to determine change points, but it is computationally intensive. The e-agglomerative algorithm allows the user to supply an initial segmentation of the time series and is computationally faster. Unlike most academic papers, this one also includes examples of data and code in order to facilitate the use of these algorithms. Furthermore, it includes applications to real data, such as the companies in the Dow Jones Industrial Average, further demonstrating the effectiveness of these methods. This paper is important to the topic in question because the E-divisive algorithm created by James and Matteson forms the base change-point detection process upon which the source paper builds its own innovations, and against which it visually compares; furthermore, the source paper restates many of the techniques found in this paper.
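
As a quick illustration of the package's interface (on simulated data, not anything from the papers), the e.divisive function can be pointed at a series with an artificial shift in distribution:

require(ecp)
set.seed(123)

#simulated univariate series with a change in mean at observation 101
simData <- matrix(c(rnorm(100, mean = 0), rnorm(100, mean = 3)), ncol = 1)

#e.divisive expects a matrix; min.size is the smallest allowed segment length
simRes <- e.divisive(X = simData, min.size = 30, sig.lvl = 0.05)
simRes$estimates #estimated change point locations (includes the endpoints 1 and n+1)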

Paper 4:

Owen Vallis, Jordan Hochenbaum, and Arun Kejariwal. A novel technique for long-term anomaly detection in the cloud. In 6th USENIX Workshop on Hot Topics in Cloud Computing (HotCloud 14), June 2014.

Thesis of work: the paper proposes the use of piecewise median and median absolute deviation statistics to detect anomalies in time series. The technique builds upon the ESD (Extreme Studentized Deviate) test, using piecewise medians to approximate the long-term trend before extracting seasonality effects from periods shorter than two weeks. The piecewise median method achieves a greater F-measure in detecting anomalies than either the standard STL (seasonal-trend decomposition using loess) or quantile regression techniques, and it executes more than three times faster. The relevance of this paper to the source paper is that it supplies the idea of using robust statistics, building the paper's techniques upon the median as opposed to the mean.
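
To make the idea concrete, here's a rough base-R sketch of the piecewise median plus MAD approach; this is my own toy version of the concept, not the authors' implementation, and the window size and threshold are arbitrary.

set.seed(42)
x <- rnorm(280) + rep(seq(0, 3, length.out = 4), each = 70) #drifting series
x[c(50, 150, 250)] <- x[c(50, 150, 250)] + 8 #injected anomalies

windowSize <- 70
windowId <- rep(seq_len(ceiling(length(x)/windowSize)), each = windowSize)[seq_along(x)]

pieceMed <- ave(x, windowId, FUN = median) #piecewise median approximates the long-term trend
pieceMad <- ave(x, windowId, FUN = mad) #robust scale estimate per window
anomalies <- which(abs(x - pieceMed)/pieceMad > 4) #crude threshold in robust units
anomalies #should recover the injected anomalies at 50, 150, and 250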

Paper 5:

Rebecca Killick and Kaylea Haynes. changepoint: An R package for changepoint analysis

Thesis of work: the manual for the changepoint package, the implementation of the PELT algorithm written by Rebecca Killick and Kaylea Haynes. This is a competing change-point detection package, mainly built around the Pruned Exact Linear Time algorithm, although it contains other, less performant algorithms, such as the segment neighborhood algorithm. Essentially, it is a computational implementation of the work in Paper 6. Its application toward the source paper is that the paper at hand compares its own methodology against PELT, and often outperforms it.

Paper 6:

Rebecca Killick, Paul Fearnhead, and IA Eckley. Optimal detection of changepoints with a linear computational cost. Journal of the American Statistical Association, 107(500):1590–1598, 2012

Thesis of work: the paper proposes an algorithm (PELT) whose running time scales linearly with the size of the input time series while detecting the exact locations of change points. It aims to replace both an approximate binary partitioning algorithm and an optimal segmentation algorithm that lacks a pruning mechanism to speed up the running time. At the heart of its dynamic partitioning, the paper uses a maximum-likelihood procedure to locate change points. The relevance to the source paper is that this non-robust MLE procedure leaves the algorithm vulnerable to poor performance in the presence of anomalies/outliers in the data, and it thus underperforms the new twitter change-point detection methodology, which employs robust statistics.
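
For a hands-on feel for what PELT does, here's a minimal sketch using the changepoint package's cpt.mean interface on simulated data (not the source paper's data); the penalty choice is simply the package default.

require(changepoint)
set.seed(1)

#simulated series with mean shifts at observations 101 and 201
y <- c(rnorm(100, 0), rnorm(100, 2), rnorm(100, -1))

#PELT search for changes in mean
peltFit <- cpt.mean(y, method = "PELT", penalty = "MBIC")
cpts(peltFit) #estimated change point locations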

Paper 7:

Wassily Hoeffding. The strong law of large numbers for U-statistics. Institute of Statistics mimeo series, 302, 1961.

Thesis of work: this paper establishes a convergence of the mean of tuples of many random variables to the mean of said random variables, given enough such observations. This paper is a theoretical primer on establishing the above thesis. The mathematics involve use of measure theory and other highly advanced and theoretical manipulations. Its relevance to the source paper is in its use to establish a convergence of an estimated characteristic function.
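
As a toy numerical illustration of the flavor of this result (my own example, not from the paper): with the kernel h(x1, x2) = (x1 - x2)^2 / 2, the U-statistic over all distinct pairs estimates the variance, and with enough observations it settles on the population value.

set.seed(3)
x <- rnorm(2000, sd = 2)
pairKernel <- outer(x, x, function(a, b) (a - b)^2 / 2)
uStat <- mean(pairKernel[upper.tri(pairKernel)]) #average of the kernel over all distinct pairs
uStat; var(x) #both close to the population variance of 4 (this kernel reproduces the sample variance)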

Similar Work

In terms of financial applications, the papers covering direct applications of change points to financial time series are listed above. In particular, David Matteson presented his ecp algorithms at R/Finance several years ago, and his work is already in use on at least one professional trading desk. Beyond this, the paper cites works on technical analysis and the classic ARCH and GARCH papers as similar work. However, as this change-point algorithm is designed as a batch process, direct comparison with other trend-following (that is, breakout) methods would be a case of apples and oranges: indicators such as MACD, Donchian channels, and so on are online methods, meaning they do not have access to the full data set the way the e-divisive and e-divisive with medians algorithms do. They are, however, parameterized in terms of a lookback period, and are thus prone to error from an inaccurately chosen, static lookback value, as the short sketch below illustrates.
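
Here's a small, hedged sketch of that lookback-sensitivity point using TTR's Donchian channel on GSPC data; the 20-day and 100-day lookbacks and the date range are arbitrary choices for illustration only.

require(quantmod)
require(TTR)

getSymbols("^GSPC", from = "2010-01-01", to = "2013-05-31")

#the same breakout rule with two different static lookbacks
dc20 <- DonchianChannel(cbind(Hi(GSPC), Lo(GSPC)), n = 20)
dc100 <- DonchianChannel(cbind(Hi(GSPC), Lo(GSPC)), n = 100)

breakout20 <- Cl(GSPC) > lag(dc20$high) #close exceeds the prior 20-day high
breakout100 <- Cl(GSPC) > lag(dc100$high) #close exceeds the prior 100-day high

#fraction of days on which the two parameterizations disagree
mean(breakout20 != breakout100, na.rm = TRUE)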

In his book Cycle Analytics for Traders, Dr. John Ehlers details an algorithm for computing the dominant cycle of a security, that is, a way to dynamically set the lookback parameter. If this were successfully implemented in R, it may very well allow for better breakout detection than the classic, statically parameterized indicators popularized in the last century.

References With Implementation Hints

Reference 1: Breakout Detection In The Wild

This blog post contains the actual example included in the R package implementing the model, written by one of the authors of the source paper. As the data used in the source paper is proprietary twitter production data, and the model is already implemented in the package discussed in this blog post, the package and its included data are the go-to starting point for working with the results presented in the source paper.

Reference 2: Twitter BreakoutDetection R package evaluation

In this blog post, a blogger alters the model's default parameters. His analysis of the traffic to his own blog provides valuable information on using the R package (the implementation of the source paper) with greater flexibility.

Data

The data contained in the source paper comes from proprietary twitter cloud production data. Thus, it is not realistic to obtain a copy of that particular data set. However, one of the source paper’s co-authors, Arun Kejariwal, was so kind as to provide a tutorial, complete with code and sample data, for users to replicate at their convenience. It is this data that we will use for replication.

Building The Model

Stemming from the above, we are fortunate that the results of the source paper have already been implemented in twitter’s released R package, BreakoutDetection. This package has been written by Nicholas A. James, a PhD candidate at Cornell University studying under Dr. David S. Matteson. His page is located here.

In short, all that needs to be done on this end is to apply the model to the aforementioned data.

Validate the Results

To validate the results (that is, to obtain the same results as one of the source paper's authors), we will execute the code on the data he posted in his blog post (see Reference 1).

require(devtools)
install_github(repo="BreakoutDetection", username="twitter")
require(BreakoutDetection)

data(Scribe)
res = breakout(Scribe, min.size=24, method='multi', beta=.001, degree=1, plot=TRUE)
res$plot

This is the resulting image, identical to the one in the blog post.

Validation of the Hypothesis

This validation was inspired by the following post:

The Relevance of History

The post was written by Dr. Robert J. Frey, professor of Applied Math and Statistics at Stony Brook University, the head of its Quantitative Finance program, and former managing director at Renaissance Technologies (yes, the Renaissance Technologies founded by Dr. Jim Simons). While the blog is inactive at the moment, I sincerely hope it will become more active again.

Essentially, it uses Mathematica to detect changes in the slope of cumulative squared returns, and the final result is a map of spikes, mountains, and plains, with time on the x-axis and annualized standard deviation on the y-axis. Using the more formalized e-divisive and e-divisive with medians algorithms, this analysis will attempt to detect change points in the GSPC returns, use the PerformanceAnalytics library to compute the annualized standard deviation within each regime, and output a similarly formatted plot.

Here’s the code:

require(quantmod)
require(PerformanceAnalytics)

getSymbols("^GSPC", from = "1984-12-25", to = "2013-05-31")
monthlyEp <- endpoints(GSPC, on = "months")
GSPCmoCl <- Cl(GSPC)[monthlyEp,]
GSPCmoRets <- Return.calculate(GSPCmoCl)
GSPCsqRets <- GSPCmoRets*GSPCmoRets
GSPCsqRets <- GSPCsqRets[-1,] #remove first NA as a result of return computation
GSPCcumSqRets <- cumsum(GSPCsqRets)
plot(GSPCcumSqRets)

This results in the following image:

So far, so good. Let's now try to find the number of changepoints that Dr. Frey's graph alludes to.

t1 <- Sys.time()
ECPmonthRes <- e.divisive(X = GSPCsqRets, min.size = 2) #ecp e-divisive on monthly squared returns
t2 <- Sys.time()
print(t2 - t1)

t1 <- Sys.time()
BDmonthRes <- breakout(Z = GSPCsqRets, min.size = 2, beta=0, degree=1) #e-divisive with medians
t2 <- Sys.time()
print(t2 - t1)

ECPmonthRes$estimates
BDmonthRes$loc

With the following results:

> ECPmonthRes$estimates
[1]   1 285 293 342
> BDmonthRes$loc
[1] 47 87

In short, two changepoints each, which is far from the 20 or so regimes present in Dr. Frey's analysis, and not close to anything that was expected. My intuition tells me that the main reason for this is that these algorithms are data-hungry, and there is simply too little monthly data for them to do much more than they have done thus far. So let's go the other way and use daily data.

dailySqRets <- Return.calculate(Cl(GSPC))*Return.calculate(Cl(GSPC))
dailySqRets <- dailySqRets["1985::"]

plot(cumsum(dailySqRets))

And here’s the new plot:

First, let’s try the e-divisive algorithm from the ecp package to find our changepoints, with a minimum size of 20 days between regimes. (Blog note: this is a process that takes an exceptionally long time. For me, it took more than 2 hours.)

t1 <- Sys.time()
ECPres <- e.divisive(X = dailySqRets, min.size=20)
t2 <- Sys.time()
print(t2 - t1)
Time difference of 2.214813 hours

With the following results:

index(dailySqRets)[ECPres$estimates]
 [1] "1985-01-02" "1987-10-14" "1987-11-11" "1998-07-21" "2002-07-01" "2003-07-28" "2008-09-15" "2008-12-09"
 [9] "2009-06-02" NA   

The first and last elements are merely the endpoints of the data (the final estimate is n+1, one observation past the end of the series, hence the NA). So essentially, the algorithm captures Black Monday and the financial crisis, among other things. Let's look at how it split the volatility regimes. For this, we will use the xtsExtra package for its plotting functionality (thanks to Ross Bennett for the work he did in implementing it).

require(xtsExtra)
plot(cumsum(dailySqRets))
xtsExtra::addLines(index(dailySqRets)[ECPres$estimates[-c(1, length(ECPres$estimates))]], on = 1, col = "blue", lwd = 2)

With the resulting plot:

In this case, the e-divisive algorithm from the ecp package does a pretty great job segmenting the various volatility regimes, which can be thought of roughly as the slope of the cumulative squared returns. The algorithm's ability to accurately cluster the Black Monday events, along with the financial crisis, shows its industrial-strength applicability. How does this look on the price graph?

plot(Cl(GSPC))
xtsExtra::addLines(index(dailySqRets)[ECPres$estimates[-c(1, length(ECPres$estimates))]], on = 1, col = "blue", lwd = 2)

In this case, Black Monday is clearly visible, along with the end of the Clinton bull run through the dot-com bust, the consolidation, the run-up to the crisis, the crisis itself, the consolidation, and the new bull market.

Note that the presence of a new volatility regime may not necessarily signify a market top or bottom, but the volatility regime detection seems to have worked very well in this case.

For comparison, let’s examine the e-divisive with medians algorithm.

t1 <- Sys.time()
BDres <- breakout(Z = dailySqRets, min.size = 20, beta=0, degree=1)
t2 <- Sys.time()
print(t2-t1)

BDres$loc
index(dailySqRets)[BDres$loc]

With the following result:

Time difference of 2.900167 secs
> BDres$loc
[1] 5978
> index(dailySqRets)[BDres$loc]
[1] "2008-09-12"

So while the algorithm runs a lot faster, its volatility regime detection leaves something to be desired: it sees the crisis as the only major change point. Beyond that, to my understanding, the e-divisive with medians algorithm may be "too robust" (even without any penalization) against anomalies; after all, the median is unaffected by changes in up to 50% of the data. In short, I think that while it clearly has applications, such as twitter's cloud production data, it doesn't obtain a result in the ballpark of two other, separate procedures here.
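
To illustrate that "too robust" intuition with a toy example (this says nothing about the algorithm's internals, just about the median itself): contaminate a window of data with large outliers and compare how far the mean and the median move.

set.seed(7)
window <- rnorm(100)

#contaminate 40% of the window with large outliers
contaminated <- window
contaminated[1:40] <- contaminated[1:40] + 25

mean(window); mean(contaminated) #the mean shifts by roughly 10
median(window); median(contaminated) #the median barely moves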

Lastly, let’s try and create a plot similar to Dr. Frey’s, with spikes, mountains, and plains.

require(PerformanceAnalytics)
GSPCrets <- Return.calculate(Cl(GSPC))
GSPCrets <- GSPCrets["1985::"]
GSPCrets$regime <- ECPres$cluster #regime label for each day, from the e-divisive clustering
GSPCrets$annVol <- NA

for(i in unique(ECPres$cluster)) {
  regime <- GSPCrets[GSPCrets$regime==i,] #returns belonging to regime i
  annVol <- StdDev.annualized(regime[,1]) #annualized volatility within that regime
  GSPCrets$annVol[GSPCrets$regime==i,] <- annVol
}

plot(GSPCrets$annVol, ylim=c(0, max(GSPCrets$annVol)), main="GSPC volatility regimes, 1985 to 2013-05")

With the corresponding image, inspired by Dr. Robert Frey:

This concludes the research replication.

********************************

Whew. Done. While I gained some understanding of what change points are useful for, I won’t profess to be an expert on them (some of the math involved uses PhD-level mathematics such as characteristic functions that I never learned). However, it was definitely interesting pulling together several different ideas and uniting them under a rigorous process.

Special thanks for this blog post:

Brian Peterson, for the process paper and putting a formal structure to the research replication process (and requesting this post).
Robert J. Frey, for the "volatility landscape" idea that I could point to as an objective benchmark to validate the hypothesis of the paper.
David S. Matteson, for the ecp package.
Nicholas A. James, for the work done in the BreakoutDetection package (and clarifying some of its functionality for me).
Arun Kejariwal, for the tutorial on using the BreakoutDetection package.

Thanks for reading.

NOTE: I am a freelance consultant in quantitative analysis on topics related to this blog. If you have contract or full time roles available for proprietary research that could benefit from my skills, please contact me through my LinkedIn here.

An Update On EAA and a Volatility Strategy

Again, before starting this post, I’d like to inform readers that the book Quantitative Trading With R, written by Harry Georgakopoulos, with contributions from myself, is now available for order on Amazon. Already, it has garnered a pair of five-star reviews, and it deals not only with quantstrat, but with aspects such as spread trading, high frequency data, and options. I highly recommend it.

So, first things first, I want to inform everyone that EAA (that is, Elastic Asset Allocation, the algorithm released by Dr. Wouter Keller a couple of weeks ago) is now in my IKTrading package. I made some modifications to deal with incongruous security starting dates (that is, I handled NA momentum and so on, similarly to the process in FAA). Again, no particular guarantees, but at this point, I think the algorithm won't regularly break (though I may be missing some edge case, so feedback is always appreciated). Also, after thinking about it a bit more, I don't foresee EAA as it stands being able to make use of a conditional correlation algorithm: rather than using correlation simply for security selection, it uses correlations to make weighting decisions, which raises the question of what the correlation value of the first security would be. 0? -1? Ideas on how to address this are always welcome, since applying conditional correlation outside of a ranking context is now a topic of interest to me.

Furthermore, TrendXplorer has recently published his own post on EAA on his blog after seeing mine. It is *very* comprehensive, and those more inclined towards AmiBroker will be in Nirvana. It can be found here. It also seems he has worked hand in hand with another SeekingAlpha contributor named Cliff Smith, and thus had a far more positive experience than I did going solo replicating Harry Long's strategies (or, as some of you may prefer, marketing materials). TrendXplorer did some work with a strategy called QTS, which I hope I'll be able to cover in the near future. That can all be found here. So, I'd like to formally extend thanks to TrendXplorer for the work he has done with EAA, and also for pointing me towards yet another viable asset allocation strategy.

In terms of my own updated EAA, to test it out, I added Tesla Motors to the original seven securities. So let’s look at the as-of-now-current EAA.

"EAA" <- function(monthlyPrices, wR=1, wV=0, wC=.5, wS=2, errorJitter=1e-6, 
                cashAsset=NULL, bestN=1+ceiling(sqrt(ncol(monthlyPrices))),
                enableCrashProtection = TRUE, returnWeights=FALSE, monthlyRiskFree=NULL) {
  returns <- Return.calculate(monthlyPrices)
  returns <- returns[-1,] #return calculation uses one observation
  if(!is.null(monthlyRiskFree)) {
    returnsRF <- Return.calculate(monthlyRiskFree)
    returnsRF <- returnsRF[-1,]
  }
  
  if(is.null(cashAsset)) {
    returns$zeroes <- 0
    cashAsset <- "zeroes"
    warning("No cash security specified. Recommended to use one of: quandClean('CHRIS/CME_US'), SHY, or VFISX. 
            Using vector of zeroes instead.")
  }
  
  cashCol <- grep(cashAsset, colnames(returns))
  
  weights <- list()
  for(i in 1:(nrow(returns)-11)) {
    returnsData <- returns[i:(i+11),] #each chunk will be 12 months of returns data
    #per-month mean of cumulative returns of 1, 3, 6, and 12 month periods
    periodReturn <- ((returnsData[12,] + Return.cumulative(returnsData[10:12,]) + 
                      Return.cumulative(returnsData[7:12,]) + Return.cumulative(returnsData)))/22
    
    if(!is.null(monthlyRiskFree)) {
      rfData <- returnsRF[i:(i+11),]
      rfReturn <- ((rfData[12,] + Return.cumulative(rfData[10:12,]) + 
                    Return.cumulative(rfData[7:12,]) + Return.cumulative(rfData)))/22
      periodReturn <- periodReturn - as.numeric(rfReturn)
    }
    
    vols <- StdDev.annualized(returnsData) 
    mktIndex <- xts(rowMeans(returnsData, na.rm=TRUE), order.by=index(returnsData)) #equal weight returns of universe
    cors <- cor(returnsData, mktIndex) #correlations to market index
    
    weightedRets <- periodReturn ^ wR
    weightedCors <- (1 - as.numeric(cors)) ^ wC
    weightedVols <- (vols + errorJitter) ^ wV
    wS <- wS + errorJitter
    
    z <- (weightedRets * weightedCors / weightedVols) ^ wS #compute z_i and zero out negative returns
    z[periodReturn < 0] <- 0
    crashProtection <- sum(z==0, na.rm=TRUE)/sum(!is.na(z)) #compute crash protection cash cushion
    
    orderedZ <- sort(as.numeric(z), decreasing=TRUE)
    selectedSecurities <- z >= orderedZ[bestN]
    preNormalizedWeights <- z*selectedSecurities #select top N securities, keeping z_i scores
    periodWeights <- preNormalizedWeights/sum(preNormalizedWeights, na.rm=TRUE) #normalize
    if (enableCrashProtection) {
      periodWeights <- periodWeights * (1-crashProtection) #CP rule
    }
    periodWeights[is.na(periodWeights)] <- 0
    weights[[i]] <- periodWeights
  }
  
  weights <- do.call(rbind, weights)
  weights[, cashCol] <- weights[, cashCol] + 1-rowSums(weights) #add to risk-free asset all non-invested weight
  strategyReturns <- Return.rebalancing(R = returns, weights = weights) #compute strategy returns
  if(returnWeights) {
    return(list(weights, strategyReturns))
  } else {
    return(strategyReturns)
  }
}

Essentially, little changed aside from some lines dealing with NAs (that is, securities that were not yet trading at the time, whose prices appear as NAs).

To test whether the algorithm still worked, I added TSLA and checked that it didn't break the code. Here is the new test code.

require(quantmod)
require(PerformanceAnalytics)

symbols <- c("VTSMX", "FDIVX", "VEIEX", "VBMFX", "VFISX", "VGSIX", "QRAAX", "TSLA")

getSymbols(symbols, from="1990-01-01")
prices <- list()
for(i in 1:length(symbols)) {
  prices[[i]] <- Ad(get(symbols[i]))  
}
prices <- do.call(cbind, prices)
colnames(prices) <- gsub("\\.[A-z]*", "", colnames(prices))
ep <- endpoints(prices, "months")
prices <- prices[ep,]
prices <- prices["1997-03::"]

getSymbols("^IRX", from="1990-01-01")
dailyYield <- (1+(Cl(IRX)/100))^(1/252) - 1
threeMoPrice <- cumprod(1+dailyYield)
threeMoPrice <- threeMoPrice["1997-03::"]
threeMoPrice <- threeMoPrice[endpoints(threeMoPrice, "months"),]

offensive <- EAA(prices, cashAsset="VBMFX", bestN=3)
defensive <- EAA(prices, cashAsset="VBMFX", bestN=3, wS=.5, wC=1)
offRF <- EAA(prices, cashAsset="VBMFX", bestN=3, monthlyRiskFree = threeMoPrice)
defRF <- EAA(prices, cashAsset="VBMFX", bestN=3, wS=.5, wC=1, monthlyRiskFree = threeMoPrice)
compare <- cbind(offensive, defensive, offRF, defRF)
colnames(compare) <- c("Offensive", "Defensive", "OffRF", "DefRF")
stats <- rbind(Return.annualized(compare)*100, StdDev.annualized(compare)*100, maxDrawdown(compare)*100, SharpeRatio.annualized(compare))
rownames(stats)[3] <- "Worst Drawdown"
charts.PerformanceSummary(compare)
stats

With the following statistics table and equity curve:

> stats
                                 Offensive Defensive      OffRF     DefRF
Annualized Return               17.6174693 13.805683 16.7376777 13.709368
Annualized Standard Deviation   22.7328695 13.765444 22.3854966 13.504313
Worst Drawdown                  25.3534015 12.135310 25.3559118 12.146654
Annualized Sharpe Ratio (Rf=0%)  0.7749778  1.002923  0.7477019  1.015184

Essentially, TSLA, a high-momentum, high-volatility stock, causes some consternation in the offensive variant of the algorithm. Let's look at the weight statistics for TSLA when it was in the portfolio.

test <- EAA(prices, cashAsset = "VBMFX", bestN=3, returnWeights=TRUE)
weights <- test[[1]]
summary(weights$TSLA[weights$TSLA > 0])

With the results:

    Index                 TSLA        
 Min.   :2011-07-29   Min.   :0.01614  
 1st Qu.:2012-09-14   1st Qu.:0.32345  
 Median :2013-07-31   Median :0.48542  
 Mean   :2013-06-20   Mean   :0.51415  
 3rd Qu.:2014-04-15   3rd Qu.:0.75631  
 Max.   :2014-12-31   Max.   :0.95793  

Also, to be clear, R's summary function was not created with xts objects in mind, so the Index statistics above are misleading: R is computing summary statistics on the underlying numeric values of the date index, and they have no relation to the TSLA weights in the other column. So if you ever call summary on an xts object, be aware that it isn't providing the dates corresponding to those weight quantiles (if such dates even exist; e.g., the mean of the weights isn't an actual weight at any point in time).
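
A more reliable way to inspect those weights (a small sketch under the same test setup as above) is to summarize the numeric values directly and pull the dates out separately:

tslaWeights <- weights$TSLA[weights$TSLA > 0]

#summarize the weights themselves, not the xts object
quantile(coredata(tslaWeights))

#and query the dates explicitly when needed
range(index(tslaWeights)) #first and last months TSLA was held
index(tslaWeights)[which.max(coredata(tslaWeights))] #month of the largest TSLA weight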

In any case, it seems that the offensive variant of the algorithm is susceptible to creating very poorly diversified portfolios, since it places no weight on security volatility, only on correlation. So if there is a very volatile instrument on a roaring trend, EAA will tell you to put essentially your entire portfolio in that one instrument, which can of course be the correct thing to do if you know for certain that the trend will continue, until, of course, it doesn't. A quick way to quantify this concentration risk is sketched just below.
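
The sketch uses the weights already returned above; the Herfindahl-style sum of squared weights is my own choice of metric, not part of EAA itself.

#sum of squared weights per rebalance: 1 means everything in one security,
#1/N means an equal split across N holdings
concentration <- xts(rowSums(weights^2, na.rm = TRUE), order.by = index(weights))
summary(coredata(concentration))
plot(concentration, main = "EAA offensive variant: weight concentration per rebalance")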

I’m sure there are still some methods to account for instruments of wildly different risk/return profiles, even without the need of additional code, by varying the parameters. I just wanted to demonstrate the need to be aware of this phenomenon, which I happened upon simply by testing the portfolio for incongruous starting dates and just so happened to pick a “hot topic” stock.

Last (for this post), I'd like to make readers aware that the blogger Volatility Made Simple has created a variant of a strategy I wrote about earlier (again, thanks to Mr. Helmuth Vollmeier for providing the initial foundation), in which he mixes signals from the three variants I had found to be in stable regions. I'm really happy he has done so, as he's one of the first people to explicitly extend my work.

Unfortunately, said strategy is currently in drawdown. However, looking at its drawdown curve against that of XIV itself, it seems that volatility has been doing crazy things lately, and the drawdown has been worse in the past. I am concerned, however, that it may be a strategy prone to overfitting, and it’s a constant reminder that there is still more to learn, and more techniques to use to convince oneself that a backtest isn’t just an overfit, data-mined, sample-dependent illusion with good marketing that will break down immediately upon looking at a larger sample. However, as I did not originate the strategy myself, I’d at least like to hope that whoever was the first person who came up with the VXV/VXMT ratio idea had some good rationale for the strategy to begin with.

In the immediate future, I’ll be looking into change point analysis and twitter’s new breakout detection package.

Thanks for reading.

NOTE: I am a freelance consultant in quantitative analysis on topics related to this blog. If you have contract or full time roles available for proprietary research that could benefit from my skills, please contact me through my LinkedIn here.