Several readers, upon seeing the risk and return ratios and other statistics in the previous post, suggested that the result may have come from data mining, over-optimization, curve-fitting, or overfitting: in other words, the bad practice of creating an amazing equity curve whose performance will decay out of sample.

Fortunately, there’s a way to test that assertion. In their book “Trading Systems: A New Approach to System Development and Portfolio Optimization”, Urban Jaekle and Emilio Tomasini use the concept of the “stable region” to visualize whether or not a parameter specification is overfit. The idea behind a stable region is to ask how robust a parameter specification is to slight changes. If the system just happened to find one good small point in a sea of losers, the strategy is likely to fail going forward. However, if small changes in the parameter specification still result in profitable configurations, then the chosen parameter set is more likely to be a valid configuration.

As Frank’s trading strategy only has two parameters (standard deviation computation period, aka runSD for the R function, and the SMA period), rather than make line graphs, I decided to do a brute force grid search just to see other configurations, and plotted the results in the form of heatmaps.

Here’s the modified script for the computations (no parallel syntax in use for the sake of simplicity):

# Packages: downloader (download), quantmod (getSymbols, Cl; it also loads
# xts and TTR, which provide runSD and SMA), PerformanceAnalytics (statistics)
library(downloader)
library(quantmod)
library(PerformanceAnalytics)

# Long XIV and VXX price histories
download("https://dl.dropboxusercontent.com/s/jk6der1s5lxtcfy/XIVlong.TXT", destfile="longXIV.txt")
download("https://dl.dropboxusercontent.com/s/950x55x7jtm9x2q/VXXlong.TXT", destfile="longVXX.txt")

xiv <- xts(read.zoo("longXIV.txt", format="%Y-%m-%d", sep=",", header=TRUE))
vxx <- xts(read.zoo("longVXX.txt", format="%Y-%m-%d", sep=",", header=TRUE))
vxmt <- xts(read.zoo("vxmtdailyprices.csv", format="%m/%d/%Y", sep=",", header=TRUE))

# Use VIX wherever VXMT has no data
getSymbols("^VIX", from="2004-03-29")
vixvxmt <- merge(Cl(VIX), Cl(vxmt))
vixvxmt[is.na(vixvxmt[,2]),2] <- vixvxmt[is.na(vixvxmt[,2]),1]

xivRets <- Return.calculate(Cl(xiv))
vxxRets <- Return.calculate(Cl(vxx))

getSymbols("^GSPC", from="1990-01-01")
spyRets <- diff(log(Cl(GSPC)))

t1 <- Sys.time()
MARmatrix <- list()
SharpeMatrix <- list()
for(i in 2:21) {        # i: runSD lookback
  smaMAR <- list()
  smaSharpe <- list()
  for(j in 2:21){       # j: SMA lookback
    spyVol <- runSD(spyRets, n=i)
    annSpyVol <- spyVol*100*sqrt(252)
    vols <- merge(vixvxmt[,2], annSpyVol, join='inner')
    vols$smaDiff <- SMA(vols[,1] - vols[,2], n=j)
    vols$signal <- vols$smaDiff > 0
    vols$signal <- lag(vols$signal, k = 1)
    stratRets <- vols$signal*xivRets + (1-vols$signal)*vxxRets
    #charts.PerformanceSummary(stratRets)
    #stratRets[is.na(stratRets)] <- 0
    #plot(log(cumprod(1+stratRets)))
    stats <- data.frame(cbind(Return.annualized(stratRets)*100,
                              maxDrawdown(stratRets)*100,
                              SharpeRatio.annualized(stratRets)))
    colnames(stats) <- c("Annualized Return", "Max Drawdown", "Annualized Sharpe")
    MAR <- as.numeric(stats[1])/as.numeric(stats[2])
    smaMAR[[j-1]] <- MAR
    smaSharpe[[j-1]] <- stats[,3]
  }
  rm(vols)
  smaMAR <- do.call(c, smaMAR)
  smaSharpe <- do.call(c, smaSharpe)
  MARmatrix[[i-1]] <- smaMAR        # one vector of 20 SMA results per runSD value
  SharpeMatrix[[i-1]] <- smaSharpe
}
t2 <- Sys.time()
print(t2-t1)

Essentially, just wrap the previous script in a nested for loop over the two parameters.
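To make the two statistics stored in each grid cell concrete, here is a minimal base-R sketch of MAR and an annualized Sharpe ratio computed from a vector of daily returns. The helper names (`ann_return`, `max_drawdown`, `mar_ratio`, `ann_sharpe`), the 252-day year, the zero risk-free rate, and the simulated returns are all assumptions for illustration; the script itself relies on PerformanceAnalytics, whose defaults may differ in detail.

```r
# Annualized geometric return from simple daily returns.
# Assumption: 252 trading days per year.
ann_return <- function(r, scale = 252) {
  prod(1 + r)^(scale / length(r)) - 1
}

# Largest peak-to-trough loss of the compounded equity curve,
# returned as a positive fraction (0.5 means a 50% drawdown).
max_drawdown <- function(r) {
  equity <- cumprod(1 + r)
  peak <- cummax(c(1, equity))[-1]  # running high, starting from 1
  max(1 - equity / peak)
}

# MAR: annualized return divided by maximum drawdown.
mar_ratio <- function(r, scale = 252) {
  ann_return(r, scale) / max_drawdown(r)
}

# Annualized Sharpe ratio with a zero risk-free rate (one common convention).
ann_sharpe <- function(r, scale = 252) {
  ann_return(r, scale) / (sd(r) * sqrt(scale))
}

# Hypothetical daily returns, only to exercise the functions.
set.seed(42)
toy_rets <- rnorm(504, mean = 0.001, sd = 0.02)
print(c(MAR = mar_ratio(toy_rets), Sharpe = ann_sharpe(toy_rets)))
```

Because MAR divides by a single worst-case number, it tends to move more from cell to cell than the Sharpe ratio does, which is worth keeping in mind when comparing the two heatmaps.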

I chose ggplot2 to plot the heatmaps for finer control over the coloring.

Here’s the heatmap for the MAR ratio (that is, annualized return divided by maximum drawdown):

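# melt() is assumed to come from reshape2; ggplot2 draws the heatmap
library(reshape2)
library(ggplot2)

MARmatrix <- do.call(cbind, MARmatrix)   # rows: SMA period, columns: runSD period
rownames(MARmatrix) <- paste0("SMA", c(2:21))
colnames(MARmatrix) <- paste0("runSD", c(2:21))
MARlong <- melt(MARmatrix)
colnames(MARlong) <- c("SMA", "runSD", "MAR")
MARlong$SMA <- as.numeric(gsub("SMA", "", MARlong$SMA))
MARlong$runSD <- as.numeric(gsub("runSD", "", MARlong$runSD))
# z-score the MAR values; as.numeric() drops the one-column matrix that scale() returns
MARlong$scaleMAR <- as.numeric(scale(MARlong$MAR))
ggplot(MARlong, aes(x=SMA, y=runSD, fill=scaleMAR)) +
  geom_tile() +
  scale_fill_gradient2(high="skyblue", mid="blue", low="red")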

Here’s the result:

Immediately, we start to see some answers to questions regarding overfitting. First off, is the parameter set published by TradingTheOdds optimized? Yes. In fact, not only is it optimized, it’s far and away the best value on the heatmap. However, when discussing overfitting, curve-fitting, and the like, the question to ask isn’t “is this the best parameter set available?” but rather “is the parameter set in a stable region?” In my opinion, the answer is yes, as shown by the values across different SMA periods for the 2-day sample standard deviation. Note that this quantity, being a sample standard deviation over two observations (so the divisor n − 1 equals 1), is simply the square root of the sum of the two squared residuals for that period.

Here are the MAR values for those configurations:

> MARmatrix[1:10,1]
    SMA2     SMA3     SMA4     SMA5     SMA6     SMA7     SMA8     SMA9    SMA10    SMA11
2.471094 2.418934 2.067463 3.027450 2.596087 2.209904 2.466055 1.394324 1.860967 1.650588

In this case, not only is the region stable, but the MAR values are all above 2 until the SMA9 value.
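The earlier remark about the 2-day sample standard deviation can be checked numerically. With two observations the divisor n − 1 is 1, so the sample standard deviation is the square root of the sum of the two squared residuals, which also equals the absolute difference of the two values divided by √2. The two return values below are made up for illustration, and the sketch uses base R's `sd()` on the assumption that `runSD` is left at its sample (n − 1) setting.

```r
# Two hypothetical consecutive daily returns.
x <- c(0.012, -0.007)

# Sample standard deviation (divisor n - 1 = 1).
s <- sd(x)

# Square root of the sum of the two squared residuals around the mean.
by_residuals <- sqrt(sum((x - mean(x))^2))

# Equivalent closed form: |x1 - x2| / sqrt(2).
by_difference <- abs(x[1] - x[2]) / sqrt(2)

# All three agree up to floating-point tolerance.
stopifnot(isTRUE(all.equal(s, by_residuals)),
          isTRUE(all.equal(s, by_difference)))
print(c(sd = s, residuals = by_residuals, difference = by_difference))
```

In other words, a 2-day lookback measures little more than the size of the latest one-day change in returns, which helps explain why smoothing it with an SMA matters so much in this column of the heatmap.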

Furthermore, note that aside from the stable region of the 2-day sample standard deviation, a stable region using a standard deviation of ten days with less smoothing from the SMA (because there’s already an average inherent in the sample standard deviation) also exists. Let’s examine those values.

> MARmatrix[2:5, 9:16]
      runSD10  runSD11  runSD12  runSD13  runSD14  runSD15  runSD16   runSD17
SMA3 1.997457 2.035746 1.807391 1.713263 1.803983 1.994437 1.695406 1.0685859
SMA4 2.167992 2.034468 1.692622 1.778265 1.828703 1.752648 1.558279 1.1782665
SMA5 1.504217 1.757291 1.742978 1.963649 1.923729 1.662687 1.248936 1.0837615
SMA6 1.695616 1.978413 2.004710 1.891676 1.497672 1.471754 1.194853 0.9326545

Apparently, a standard deviation lookback of two to three weeks with minimal SMA smoothing also produced some results comparable to the 2-day variant.

Off to the northeast of the plot, using longer periods for the parameters simply causes the risk-to-reward performance to drop steeply. This is essentially an illustration of the detriments of lag.

Finally, there’s a small rough patch between the two aforementioned stable regions. Here’s the data for that.

> MARmatrix[1:5, 4:8]
       runSD5    runSD6    runSD7   runSD8   runSD9
SMA2 1.928716 1.5825265 1.6624751 1.033216 1.245461
SMA3 1.528882 1.5257165 1.2348663 1.364103 1.510653
SMA4 1.419722 0.9497827 0.8491229 1.227064 1.396193
SMA5 1.023895 1.0630939 1.3632697 1.547222 1.465033
SMA6 1.128575 1.3793244 1.4085513 1.440324 1.964293

As you can see, there are some patches where the MAR is below 1, and many where it’s below 1.5. All of these are pretty detached from the stable regions.
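One way to put a number on “stable region”, offered here as a rough sketch rather than anything taken from Jaekle and Tomasini, is to average each cell with its immediate neighbors: a lone spike surrounded by weak cells gets pulled down, while a plateau keeps its value. The function name `neighborhood_mean` and the 3x3 window are assumptions; the matrix holds the rough-patch MAR values printed above.

```r
# Average each cell with its (up to 8) adjacent cells; edges use
# whatever neighbors exist.
neighborhood_mean <- function(m) {
  out <- m
  for (r in seq_len(nrow(m))) {
    for (k in seq_len(ncol(m))) {
      rows <- max(1, r - 1):min(nrow(m), r + 1)
      cols <- max(1, k - 1):min(ncol(m), k + 1)
      out[r, k] <- mean(m[rows, cols])
    }
  }
  out
}

# Rough-patch MAR values as printed in the post (rows: SMA, columns: runSD).
rough <- matrix(c(1.928716, 1.5825265, 1.6624751, 1.033216, 1.245461,
                  1.528882, 1.5257165, 1.2348663, 1.364103, 1.510653,
                  1.419722, 0.9497827, 0.8491229, 1.227064, 1.396193,
                  1.023895, 1.0630939, 1.3632697, 1.547222, 1.465033,
                  1.128575, 1.3793244, 1.4085513, 1.440324, 1.964293),
                nrow = 5, byrow = TRUE,
                dimnames = list(paste0("SMA", 2:6), paste0("runSD", 5:9)))

smoothed <- neighborhood_mean(rough)
print(round(smoothed, 2))

# How many cells fall under the thresholds mentioned in the text.
print(c(below_1 = sum(rough < 1), below_1.5 = sum(rough < 1.5)))
```

Applied to the full 20x20 `MARmatrix`, the same function would give a smoothed surface on which to compare the regions discussed here. Neighboring cells share many trades, so a high smoothed value is a necessary check rather than proof of robustness.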

Let’s repeat this process with the Sharpe Ratio heatmap.

SharpeMatrix <- do.call(cbind, SharpeMatrix)
rownames(SharpeMatrix) <- paste0("SMA", c(2:21))
colnames(SharpeMatrix) <- paste0("runSD", c(2:21))
sharpeLong <- melt(SharpeMatrix)
colnames(sharpeLong) <- c("SMA", "runSD", "Sharpe")
sharpeLong$SMA <- as.numeric(gsub("SMA", "", sharpeLong$SMA))
sharpeLong$runSD <- as.numeric(gsub("runSD", "", sharpeLong$runSD))
ggplot(sharpeLong, aes(x=SMA, y=runSD, fill=Sharpe)) +
  geom_tile() +
  scale_fill_gradient2(high="skyblue", mid="blue", low="darkred", midpoint=1.5)

And the result:

Again, the TradingTheOdds parameter configuration lights up, but among a region of strong configurations. This time, we can see that in comparison to the rest of the heatmap, the northern stable region seems to have become clustered around the 10-day standard deviation (or 11) with SMAs of 2, 3, and 4. The regions to the northeast are also more subdued by comparison, with the Sharpe ratio bottoming out around 1.

Let’s look at the numerical values again for the same regions.

Two-day standard deviation region:

> SharpeMatrix[1:10,1]
    SMA2     SMA3     SMA4     SMA5     SMA6     SMA7     SMA8     SMA9    SMA10    SMA11
1.972256 2.210515 2.243040 2.496178 1.975748 1.965730 1.967022 1.510652 1.963970 1.778401

Again, numbers the likes of which I myself haven’t been able to achieve with more conventional strategies, and numbers the likes of which I haven’t really seen anywhere for anything on daily data. So either the strategy is fantastic, or something is terribly wrong outside the scope of the parameter optimization.

Two-week standard deviation region:

> SharpeMatrix[1:5, 9:16]
      runSD10  runSD11  runSD12  runSD13  runSD14  runSD15  runSD16  runSD17
SMA2 1.902430 1.934403 1.687430 1.725751 1.524354 1.683608 1.719378 1.506361
SMA3 1.749710 1.758602 1.560260 1.580278 1.609211 1.722226 1.535830 1.271252
SMA4 1.915628 1.757037 1.560983 1.585787 1.630961 1.512211 1.433255 1.331697
SMA5 1.684540 1.620641 1.607461 1.752090 1.660533 1.500787 1.359043 1.276761
SMA6 1.735760 1.765137 1.788670 1.687369 1.507831 1.481652 1.318751 1.197707

Again, pretty outstanding numbers.

The rough patch:

> SharpeMatrix[1:5, 4:8]
       runSD5   runSD6   runSD7   runSD8   runSD9
SMA2 1.905192 1.650921 1.667556 1.388061 1.454764
SMA3 1.495310 1.399240 1.378993 1.527004 1.661142
SMA4 1.591010 1.109749 1.041914 1.411985 1.538603
SMA5 1.288419 1.277330 1.555817 1.753903 1.685827
SMA6 1.278301 1.390989 1.569666 1.650900 1.777006

All Sharpe ratios are higher than 1, though some are below 1.5.

So, to conclude this post:

Was the replication using optimized parameters? Yes. However, those optimized parameters were found within a stable (and even strong) region. Furthermore, it isn’t as though the strategy exhibits poor risk-to-return metrics beyond those regions, either. Aside from raising the lookback period on both the moving average and the standard deviation to levels that no longer resemble the original replication, performance was solid to stellar.

Does this necessarily mean that there is nothing wrong with the strategy? No. It could be that the performance is an artifact of optimistic “observe the close, enter at the close” execution assumptions. For instance, quantstrat (the go-to backtest engine in R for more trading-oriented statistics) uses a next-bar execution method that defaults to the *next* day’s close (so if you look back over my quantstrat posts, I use prefer="open" so as to get the open of the next bar instead of its close). It could also be that VXMT itself isn’t very well known in the public sphere, seeing as how Yahoo Finance barely has any data on it. Lastly, it could simply be that although the risk-to-reward ratios seem amazing, many investors, mutual fund managers, and the like probably don’t want to think “I’m down 40-60% from my peak”. That said, it’s arguably easier to take a strategy with a good reward-to-risk ratio but excess risk and dilute it with cash (to use a cooking analogy, think about your favorite spice: good in small quantities) than it is to find leverage for a good reward-to-risk strategy with very small returns, not to mention incurring all the other risks that come with leverage to begin with, such as a 50% drawdown wiping out an account leveraged two to one.
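The sensitivity to execution timing can be probed by delaying the signal by one more day. The sketch below uses made-up signals and returns and a hypothetical helper, `lag_vec`. In the grid-search script the analogous change would presumably be lagging the signal by 2 instead of 1; that is an assumption about how one might model a next-close entry, not something tested in this post.

```r
# Shift a vector forward by k >= 1 periods, padding the start with NA.
lag_vec <- function(x, k) c(rep(NA, k), head(x, -k))

# Hypothetical inputs: 1 means hold asset A, 0 means hold asset B.
signal <- c(1, 1, 0, 0, 1, 1, 0, 1)
ret_a  <- c(0.02, 0.01, -0.03, -0.01, 0.02, 0.03, -0.02, 0.01)
ret_b  <- -ret_a  # crude stand-in for an inverse product

# Strategy returns when the signal takes effect k days after it is observed.
strategy_returns <- function(k) {
  s <- lag_vec(signal, k)
  s * ret_a + (1 - s) * ret_b
}

enter_same_close <- strategy_returns(1)  # act on the close that generated the signal
enter_next_close <- strategy_returns(2)  # act one full day later

# Compounded return of each variant, ignoring the NA warm-up days.
total_return <- function(r) prod(1 + r[!is.na(r)]) - 1
print(c(same_close = total_return(enter_same_close),
        next_close = total_return(enter_next_close)))
```

If a strategy's edge survives the extra day of delay, the optimistic fill assumption is less likely to be what drives the results.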

However, to address the question of overfitting, through a modified technique from Jaekle and Tomasini (2009), these are the results I found.

Thanks for reading.

Note: I am a freelance consultant in quantitative analysis on topics related to this blog. If you have contract or full time roles available for proprietary research that could benefit from my skills, please contact me through my LinkedIn here.

Ilya, great post !

Just a few thoughts regarding optimization (not in a general sense, only in the special case of this VIX strategy).

I agree with you that the posting from [“tradingtheodds”](http://www.tradingtheodds.com/2014/11/ddns-volatility-risk-premium-strategy-revisited-2/) might be overly optimistic, but I think this is not the point. I have been trading a variation of the strategy for more than a year, and you can play around with it (and download the file) by pointing your browser to [my app](https://alphaminer.shinyapps.io/VolaStrat/). The strategy employs a SINGLE variable (the SMA of a ratio of 2 points on the term structure). The strategy is robust over the whole range. Even if you wanted to, there is almost no way to get a mediocre performance! This gets me to my point. There must be something other than simple optimization in play regarding the many VIX (VXX/XIV) strategies, all of them with excellent backtested results, that pop up all over the investing/quant blogs.

To put the results in perspective: out of the 10 years charted, because of available price history, understanding of the products, and liquidity, in reality only the last 2 or 3 could have been traded in real time (and the returns are probably coming down already).

I think the reason why this strategy has worked so well in the past is that this is not a kind of market inefficiency but a real risk premium! Future returns will depend on the shrinkage/expansion of this premium. If the cake is gone (or the pie is much smaller, so to speak), it’s gone! No clever optimization will change this fact. To “harvest” the premium, I think the simplest strategies are the most efficient, because by adding too many variables and making it overly complicated you risk not getting a piece of the cake even while it is there :-).

A side note concerning the entry: an entry on the NEXT close is even more profitable than entering on the close, as you can check with [the app](https://alphaminer.shinyapps.io/VolaStrat/) (there is some short-term mean reversion).

Nice validation of parameter sensitivity! I’ve done this in my own work (brute force loop through the parameters) and some strategies do not hold up. I appreciate the thoroughness. Thanks!

Finally a good article on the subject… I also think the stable region concept is vital for a truly effective trading system development process.

http://nightlypatterns.wordpress.com

Mr. Vollmeier,

What exactly is that strategy you linked me to in the app? Considering how well it performs on longer MA periods, I’d like to replicate it.

-Ilya

Ilya, I will email you this weekend as I am quite busy today. In the meantime you can try to find out the answer yourself: if you read my comment carefully and download the file, it’s not too difficult to figure out 🙂

It’s dubious to say “optimized parameters were found within a stable (and even strong) region.” The region may not be stable and the peaks may not even be statistically significant.

You are looking at cross-sectional stability. But remember that strategies that are close together in your heatmap are very highly correlated since they have many trades in common. So if one square has high returns the squares around it will too. That does NOT make for stability. The high returns in a region may be due to a lucky couple of months – that is not stability. Also you omitted a bootstrap analysis which will determine if the peaks in the heatmap are statistically significant or not.

You need to look at time series stability – do the same parameters produce the best returns over all periods of time? If they do THEN you have robustness.

Finally, what is runSD for n=2? It’s just the difference between yesterday’s and today’s returns. That’s an extremely noisy quantity. Trading on that is out of my realm of expertise because it is subject to market microstructure effects, trading hours effects, and is pretty close to day trading. Then there are market frictions on top of that. I doubt that any of the gains are due to roll yield or the VRP or anything discussed in my paper. It’s more to do with S&P 500 serial correlation effects. If you want to make money from those there are (I’m guessing) possibly better ways of doing that.

Tony,

I believe this is why Frank used a moving average for this quantity. As you can see on the heatmap, as you move to the left of SMA5, the MAR decreases (though it’s still strong).

With bootstrapping, do you mean simply to do a random repeated drawing of the squares on the heatmap?

Regarding time series stability, that’s actually something that I don’t remember Jaekle and Tomasini formally specifying. But come to think of it, that’s a very good point. I haven’t considered it, since I generally don’t like to over-optimize my parameters and simply go with round numbers (e.g., a short-term MA crossover would be, say, a 10/50, while a medium-term one might be a 50/100, even though I know for certain those are probably not the best values to use).

Come to think of it, I really, *really* like the idea of the time-series returns (or MAR, or whatever else) comparison. What would you say would be the proper method of looking at that? Monthly-aggregated cross-sectional ranks?

Do you have any strategies like Harry Long’s articles where you buy & hold or short a combination of ETFs?

If you mean “buy and hold, rebalance at such and such a time”, check out my posts on Flexible Asset Allocation (which will get an update in the very near future).
