A Filter Selection Method Inspired From Statistics

This post will demonstrate a method to create an ensemble filter based on a trade-off between smoothness and responsiveness, two properties looked for in a filter. An ideal filter would be responsive to price action, so as not to hold incorrect positions, while also being smooth, so as not to incur false signals and unnecessary transaction costs.

So, ever since my volatility trading strategy, which used three very naive filters (all SMAs), completely missed a 27% month in XIV, I’ve decided to try to find better ways to create indicators for trend following. Realizing that there can potentially be tons of complex filters in existence, I decided instead to focus on a way to create ensemble filters, using an analogy from statistics/machine learning.

In static data analysis, for a regression or classification task, there is a trade-off between bias and variance. In a nutshell, variance is bad because of the possibility of overfitting to a few irregular observations, and bias is bad because of the possibility of underfitting legitimate data. Filtering time series raises analogous concerns, except that bias is called lag, and variance shows up as a “whipsawing” indicator. Essentially, an ideal indicator would move quickly with the data, while at the same time not possess a myriad of small bumps-and-reverses along the way, which may send false signals to a trading strategy.

So, here’s how my simple algorithm works:

The inputs to the function are the following:

A) The time series of the data you’re trying to filter
B) A collection of candidate filters
C) A period over which to measure smoothness and responsiveness, defined as the square root of the n-day EMA (2/(n+1) convention) of the following (a minimal sketch of both follows this list):
a) Responsiveness: the squared quantity of price/filter - 1
b) Smoothness: the squared quantity of filter(t)/filter(t-1) - 1 (aka R’s Return.calculate function)
D) A conviction factor, to which power the errors will be raised. This should probably be between .5 and 3
E) A vector that defines the emphasis on smoothness (vs. emphasis on responsiveness), which should range from 0 to 1.
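
To make those two error definitions concrete, here’s a minimal sketch (an illustration only, separate from the function further down) that computes both error terms for one hypothetical candidate filter, a 50-day SMA of SPY, with n = 20:

require(quantmod)
require(TTR)

getSymbols('SPY', from = '1990-01-01')
price <- Ad(SPY)
filt <- SMA(price, n = 50) # one arbitrary candidate filter for illustration
n <- 20

# responsiveness error: how far the filter sits from the price it tracks
responsivenessError <- sqrt(EMA((price/filt - 1)^2, n = n))

# smoothness error: how much the filter itself moves from bar to bar
smoothnessError <- sqrt(EMA((filt/lag(filt, 1) - 1)^2, n = n))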

Here’s the code:

require(TTR)
require(quantmod)
require(PerformanceAnalytics) # for Return.calculate, used in ensembleFilter below

getSymbols('SPY', from = '1990-01-01')

smas <- list()
for(i in 2:250) {
  smas[[i]] <- SMA(Ad(SPY), n = i)
}
smas <- do.call(cbind, smas)

xtsApply <- function(x, FUN, n, ...) {
  out <- xts(apply(x, 2, FUN, n = n, ...), order.by=index(x))
  return(out)
}

sumIsNa <- function(x){
  return(sum(is.na(x)))
}

This gets SPY data, and creates two utility functions: xtsApply, which is simply a column-based apply that restores the original index that a plain column-wise apply discards, and sumIsNa, which I use later for counting the number of NAs in a given row. It also creates my candidate filters, which, to keep things simple, are just SMAs with lookbacks from 2 to 250 days.
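
As a quick sanity check (not part of the main workflow), here’s how xtsApply preserves the index that a plain column-wise apply() would drop, along with a sample use of sumIsNa:

demo <- xtsApply(smas[, 1:5], EMA, n = 20)
class(demo)                       # "xts" "zoo" -- the index is preserved
head(apply(smas, 1, sumIsNa))     # early rows have many NA candidate filters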

Here’s the actual code of the function, with comments in the code itself to better explain the process from a technical level (for those still unfamiliar with R, look for the hashtags):

ensembleFilter <- function(data, filters, n = 20, conviction = 1, emphasisSmooth = .51) {
  
  # smoothness error
  filtRets <- Return.calculate(filters)
  sqFiltRets <- filtRets * filtRets * 100 #multiply by 100 to prevent instability
  smoothnessError <- sqrt(xtsApply(sqFiltRets, EMA, n = n))
  
  # responsiveness error
  repX <- xts(matrix(data, nrow = nrow(filters), ncol=ncol(filters)), 
              order.by = index(filters))
  dataFilterReturns <- repX/filters - 1
  sqDataFilterQuotient <- dataFilterReturns * dataFilterReturns * 100 #multiply by 100 to prevent instability
  responseError <- sqrt(xtsApply(sqDataFilterQuotient, EMA, n = n))
  
  # place smoothness and responsiveness errors on same notional quantities
  meanSmoothError <- rowMeans(smoothnessError)
  meanResponseError <- rowMeans(responseError)
  ratio <- meanSmoothError/meanResponseError
  ratio <- xts(matrix(ratio, nrow=nrow(filters), ncol=ncol(filters)),
               order.by=index(filters))
  responseError <- responseError * ratio
  
  # for each term in emphasisSmooth, create a separate filter
  ensembleFilters <- list()
  for(term in emphasisSmooth) {
    
    # compute total errors, raise them to a conviction power, find the normalized inverse
    totalError <- smoothnessError * term + responseError * (1-term)
    totalError <- totalError ^ conviction
    invTotalError <- 1/totalError
    normInvError <- invTotalError/rowSums(invTotalError)
    
    # ensemble filter is the sum of candidate filters in proportion
    # to the inverse of their total error
    tmp <- xts(rowSums(filters * normInvError), order.by=index(data))
    
    #NA out time in which one or more filters were NA
    initialNAs <- apply(filters, 1, sumIsNa) 
    tmp[initialNAs > 0] <- NA
    tmpName <- paste("emphasisSmooth", term, sep="_")
    colnames(tmp) <- tmpName
    ensembleFilters[[tmpName]] <- tmp
  }
  
  # compile the filters
  out <- do.call(cbind, ensembleFilters)
  return(out)
}

The vast majority of the computational time is spent in the two xtsApply calls. On 249 different simple moving averages, the process takes about 30 seconds.

Here’s the code to generate and plot the output, using a conviction factor of 2:

t1 <- Sys.time()
filts <- ensembleFilter(Ad(SPY), smas, n = 20, conviction = 2, emphasisSmooth = c(0, .05, .25, .5, .75, .95, 1))
t2 <- Sys.time()
print(t2-t1)


plot(Ad(SPY)['2007::2011'])
lines(filts[,1], col='blue', lwd=2)
lines(filts[,2], col='green', lwd = 2)
lines(filts[,3], col='orange', lwd = 2)
lines(filts[,4], col='brown', lwd = 2)
lines(filts[,5], col='maroon', lwd = 2)
lines(filts[,6], col='purple', lwd = 2)
lines(filts[,7], col='red', lwd = 2)

And here is an example, looking at SPY from 2007 through 2011.

In this case, I chose to go from blue to green, orange, brown, maroon, purple, and finally red for smoothness emphasis of 0%, 5%, 25%, 50%, 75%, 95%, and 100%, respectively.

Notice that the blue line is very wiggly, while the red line sometimes barely moves, such as during the 2011 drop-off.

One thing that I noticed in the course of putting this process together is something that eluded me earlier–namely, that naive trend-following strategies which are either fully long or fully short based on a crossover signal can lose money quickly in sideways markets.

However, theoretically, by finely varying the steps between 0% and 100% emphasis on smoothness, whether in increments of 1% or finer, one can have a sort of “continuous” conviction, by simply adding up the signs of the differences between adjacent ensemble filters (see the sketch just below). In an “uptrend”, the difference as one moves from the most responsive to the most smooth filter should be consistently positive, and vice versa.
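
Here’s a minimal sketch of that idea (an illustration only), assuming the filts object from the call above, whose columns run from most responsive (emphasisSmooth = 0) to most smooth (emphasisSmooth = 1):

# difference between each filter and its smoother neighbor; positive in uptrends
adjacentDiffs <- filts[, -ncol(filts)] - filts[, -1]

# sum of signs across the spectrum: a graded conviction score, ranging from
# -(ncol(filts) - 1) to +(ncol(filts) - 1)
conviction <- xts(rowSums(sign(adjacentDiffs)), order.by = index(filts))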

In the interest of brevity, this post doesn’t even have a trading strategy attached to it. However, an implied trading strategy would be to go long or short SPY depending on the sum of the signs of the differences in the filters as you move from responsiveness to smoothness. Of course, as the candidate filters are all SMAs, it probably wouldn’t be particularly spectacular. However, for those out there who use more complex filters, this may be a way to create ensembles out of various candidate filters, and create even better filters. Furthermore, I hope that, given enough candidate filters and an objective way of selecting them, it would be possible to reduce the chances of creating an overfit trading system. However, anything with parameters can potentially be overfit, so that may be wishful thinking.
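
To make that concrete, here’s a rough sketch of the implied strategy (an illustration only, not a tested system), which assumes the conviction series from the sketch above and lags the signal by one bar to avoid lookahead bias:

spyRets <- Return.calculate(Ad(SPY))

# long when the conviction sum is positive, short when negative; trade next bar
signal <- lag(sign(conviction), 1)
stratRets <- na.omit(spyRets * signal)

charts.PerformanceSummary(stratRets, main = "Implied ensemble-filter strategy")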

All in all, this is still a new idea for me. For instance, the filter used to compute the error terms can probably be improved. The inspiration for a 20-day EMA essentially came from how Basel computes volatility (if I recall correctly, it uses the square root of an 18-day EMA of squared returns), and the very fact that I use an EMA can itself be improved upon (why an EMA instead of some other, more complex filter?). In fact, I’m always open to suggestions from readers on how I can improve this concept (and others).

Thanks for reading.

NOTE: I am currently contracting in Chicago in an analytics capacity. If anyone would like to meet up, let me know. You can email me at ilya.kipnis@gmail.com, or contact me through my LinkedIn here.

23 thoughts on “A Filter Selection Method Inspired From Statistics”

  1. Any idea for a good high-Sharpe strategy that does not involve leveraged option selling? I know you can get very good stats selling way out-of-the-money puts, but the returns would be terrible unless you use lots of leverage.

      • Options were always something that I was never able to get into due to the question of the added dimensionality of the options ladder, and all the expiration mechanics. Do you have any reproducible results beyond some of Harry Long’s work? I’ve replicated his work as well, and I have my reservations about it.

        In any case, if you have anything enlightening to share in terms of strategy development, feel free to post a link. I’d love to read it.

  2. Ilya

    Interesting idea and nicely conceived. However, it’s not clear to me how ensembling individual filters would improve the overall filter characteristics beyond some theoretical limit that you could attain with a single purpose-built or selected filter.

    I think a better approach would be to begin with a clear idea of what you would like your filter to accomplish in terms of lag and attenuation (keeping in mind that there will be some trade off between the two), and then select or design an appropriate filter on that basis. There exist some useful digital filters like the Butterworth that have known lag-attenuation responses, and you can always investigate a filter’s response by feeding it a signal like a sine wave, step function, ramp function etc. Your approach could be used to build a filter with the desired characteristics. John Ehlers’ work provides a comprehensive exposition of the application of digital filters to financial data, if you’re interested.

    Thanks for sharing your research.

    • Kris,

      What I don’t like about any particular “purpose-built filter” is that any *one* filter takes a set of parameters (E.G. an SMA200). How do I know I didn’t overfit that filter? Furthermore, since last I checked, in his book “Cycle Analytics For Traders”, John Ehlers seems to have disowned his own work with filters, though I have a few of them implemented in my DSTrading package.

      • I had no idea Ehlers had disowned his own work with filters. Where did you read that? Any chance you could link to a reference? I had a quick look on his Mesa Software website without success. A while back, I spent quite a lot of time pursuing his ideas, but was never able to build a successful system. If he’s disowned that part of his work, I’d feel vindicated, if a little annoyed at the wasted effort!

        Re: overfitting – good point, and I guess you can never really be sure. About the best we can do is to test out of sample and make a judgement about the predictive utility of the parameter. But of course, that has a wealth of its own issues.

        I’d be interested to see this applied in a trading strategy. I could do a post about it on my site (thanks for commenting by the way) if you’re not averse?

      • Contrary to being averse, I wish more people would spend more time developing ideas from this site. It’d be an indication as to which ones have mileage!

        As for Ehlers’s disowning of said strategies, hold on…

        From pg. 135 of Cycle Analytics For Traders (his latest book):

        “Adaptive filters can have several different meanings. For example, Perry Kaufman’s adaptive moving average (KAMA) and Tushar Chande’s variable index dynamic average (VIDYA) adapt to changes in volatility. By definition, these filters are reactive to price changes, and therefore they close the barn door after the horse is gone.”

        I forget where I read that he gave up on trend-following altogether, but he certainly seems to have his aversions to it.

        Something I could never understand is his autocorrelation periodogram code. I’d LOVE to have something like that implemented in R that given some market data, would actually compute a cycle period, aka a dynamically usable lookback parameter so as to reduce chances of overfitting trend filters.

  3. Thanks for that reference – very interesting. It’s apparent from Dr Ehlers’ website that he is very much focused on cycle trading as opposed to trend trading.

    Re the autocorrelation periodogram code – I implemented that in Lite-C (the language used in the Zorro platform) a while back. Porting it across to R efficiently is beyond my level of skill at the moment, but I’d be happy to share it with you if you’d like to have a go. It doesn’t include the graphical heat map output, but it does compute the dominant period using Ehlers’ method. Again, I found it difficult to apply it successfully in a trading system, but I haven’t looked at it for over twelve months, and I’d like to think I’ve come a long way since then.

    I’ll certainly do a post about your filter ensemble – thanks for supporting that idea. I need to finish off the k-means candlestick pattern series first.

    • I just wish that someone would explain the numbered steps to me clearer than Ehlers did. I mean, I could get to the correlation part, but then I couldn’t get any further because I didn’t understand what the different arrays did. I’d be happy to collaborate on porting it over to R, because if I could get that thing working, god knows how many experiments I’d be able to run with it.

  4. Hi Ilya,

    I really find your posts enjoyable. I thought this post was particularly interesting, as I’ve explored some of these ideas in the past. It seems like, at the end of the day, you are essentially assigning weights to candidates inversely proportional to their overall (weighted sum of) fast (response) and slow (smooth) volatility contributions. Apologies if I’m oversimplifying or misinterpreting. I don’t know that it makes the overall response any more robust to varying conditions… but I look forward to seeing where you take it.

    My first thought to find a tradeoff between fast and smooth responses would be to just combine an EMA and an SMA, e.g. SEMA <- wt*EMA + (1-wt)*SMA, with the weight being assigned based on the tradeoff goal(s); wt, of course, being analogous to your emphasisSmooth parameter. By adding the EMA response, your composite filter will likely have more ability to assign a short-term, faster transient response than just using all SMAs to accomplish the weightings. Further, wt could be dynamically assigned proportional to recent volatility, to automatically put more emphasis on fast (EMA) transient responses during volatile conditions and revert back to smoother (SMA) responses in less volatile periods.

    The above can easily be extended to an ensemble matrix (2*249 cols X row obs)… which may or may not make it more robust to different conditions.

    Regards,
    IT

    • Pat,

      Yeah, you’re correct about the weight assignment. I’m not sure the volatility is the best way to measure smoothness, however. I.E. the time series that’d have no smoothness error would be a flat line, but why is that better than *any* line (E.G. a constant trendline up or down)?

      Regarding the fast error, I’m not sure that relationship to the exact noisy price action is the best measurement, either.

      Regarding another 2-249 EMAs, yeah. This was essentially a pedagogical example, using just the most basic indicator. One could certainly add more ingredients to the mix ala more complex volatility-adjusted filters.

      Speaking of which, your idea about a volatility-adjusted fast or slow weighting scheme reminds me of John Ehlers’s (and others’) work with various adaptive moving average type filters. The one issue I have with all of those is that it’s hard to predict exactly what they’ll do in any given situation, so it’s a lot more difficult to make a moving average crossover system out of them, which is the standard for a trend-following type strategy (I.E. when fast crosses above slow, buy, and vice versa). I.E., you may have the “slow” adaptive moving average cross over the “fast” one in a strong uptrend based on some small correction.

      I’d love if you could blog about your results in trying to work with adaptive moving averages, since I could never get Ehlers’s indicators to work too well for me, as they underperformed pretty badly out of sample for me. I recorded that in my first few series of posts on this blog.

  5. Pingback: Quantocracy's Daily Wrap for 11/09/2015 | Quantocracy

  6. Hi Ilya,

    Nice post on a nice simple idea. As you say above in the comments, the issue becomes knowing how the filter will react in given market conditions. It might be a nice addition to construct a systematic process for testing your ensemble filter (actually any filter, I guess) by feeding it given functions (sawtooth, step, sine, etc., combined with Brownian bridges for overall trend) and slowly ramping up noise and jumps to see how the filter reacts.

    We’ve looked at filters before for hedge timing. In the end, the best measure of filter performance we came up with was to construct transition matrices of probabilities, ‘return lost’ and durations for highly controlled scenarios specifically looking at sideways and crash markets and focussing on periodicity, noise and jumps. Ultimately, you construct a 2×2 matrix which compares the buy/sell signals from the filter with the actual buy/sell periods in the market. Using Markov chain theory & simulation, you can build conditional n-period transition matrices across each of the controlled scenarios to get a very good understanding of how your filter will react, how long it will react for and how much a (in)correct reaction will (cost)make you.

    Regards,
    Emlyn

    • Emlyn,

      Care to write an R blog demonstrating these techniques? You definitely have a much more theoretically deep background than I do, and I’d love to see the details of how you do things. Also, I do agree on generating some theoretical constructs to better understand the properties of the filter. I might try that going forward.

      -Ilya

      • Unfortunately, my R code is downright shoddy and I don’t have a blog. [I’m an unashamed Matlab and Excel fan (don’t know why Excel & VBA get such a bad rap).] That said, give me some time and I’ll put something together and send it through to you. Hopefully it’ll be useful and interesting.
        Thanks for your effort put into this blog – always makes for interesting reading.

  7. Pingback: A Filter Selection Method Inspired From Statistics | Mubashir Qasim

  8. Pingback: Distilled News | Data Analytics & R

  9. Pingback: A Filter Selection Method Inspired From Statistics | Dinesh Ram Kali.

  10. Great work Ilya! I wonder why you think it pays to use a filter in volatility trading. In general, filters smooth out volatility and you are losing information. The plain vanilla model based on the ratio of VIX/VXV generated a return from 33% to 25% in October depending on commission. Adding a filter will certainly add some lag and compromise timing.

    Also, it has been known for a while that all filters, and any combination of them, can be approximated by WMAs. Maybe you would like to take a look at that in another article.

    • Well, here’s the thing — there’s *always* a lag in the signal. After all, the signal you generate today, ideally, should have been used at *yesterday’s* close, not today’s! Essentially, in a perfect world, you’d be able to go back in time and execute on the bar that generated your signal, before it generated it–in other words, lookahead bias. When not smoothing, you get exposed to a great deal of noise.

      As VMS said in August, historically, smoothing a signal has been shown to be very effective. Furthermore, as I showed in a previous post, the VIX/VXV strategy with no smoothing breaks down quickly when delayed even by one day. While I suppose this is true of any volatility strategy, the VIX/VXV is an “observe the close, execute the close” strategy, leaving it particularly vulnerable to execution considerations.

      Regarding WMAs, the thing about using any pair of filters is that it turns the system more binary–either one filter is above the other, or it isn’t. One solution that I’ve been thinking about, in order to guard against this error, is to use a quasi-continuous idea, in that I have a hundred, or maybe even several hundred filters, each only slightly different than the other, on the spectrum of responsiveness (but noisy) to smooth (but laggy). Using a sign of differences between one filter and its neighbor on the spectrum would allow for a more continuous measure of conviction besides “yes vs. no”.

      With volatility trading strategies, however, I feel that a yes or no decision is enough, actually. However, for vol strats, what I’m after is actually a faster yet smoother filter than the naive ones.

  11. Pingback: Best Links of the Week | Quantocracy

  12. Pingback: A First Attempt At Applying Ensemble Filters (and some blog commentary) | QuantStrat TradeR
