An Introduction to Change Points (packages: ecp and BreakoutDetection)

A forewarning, this post is me going out on a limb, to say the least. In fact, it’s a post/project requested from me by Brian Peterson, and it follows a new paper that he’s written on how to thoroughly replicate research papers. While I’ve replicated results from papers before (with FAA and EAA, for instance), this is a first for me in terms of what I’ll be doing here.

In essence, it is a thorough investigation into the paper “Leveraging Cloud Data to Mitigate User Experience from ‘Breaking Bad’”, and follows the process from the aforementioned paper. So, here we go.


Twitter Breakout Detection Package
Leveraging Cloud Data to Mitigate User Experience From ‘Breaking Bad’

Summary of Paper

Introduction: in a paper detailing the foundation of the breakout detection package (arXiv ID 1411.7955v1), James, Kejariwal, and Matteson demonstrate an algorithm that detects breakouts in twitter’s production-level cloud data. The paper begins by laying the mathematical foundation and motivation for energy statistics, the permutation test, and the E-divisive with medians algorithm, which create a fast way of detecting a shift in median between two nonparametric distributions that is robust to the presence of anomalies. Next, the paper demonstrates a trial run through some of twitter’s production cloud data, and compares the non-parametric E-divisive with medians to an algorithm called PELT. For the third topic, the paper discusses potential applications, one of which is quantitative trading/computational finance. Lastly, the paper states its conclusion, which is the addition of the E-divisive with medians algorithm to the existing literature of change point detection methodologies.

The quantitative and computational methodologies for the paper use a modified variant of energy statistics more resilient against anomalies through the use of robust statistics (viz. median). The idea of energy statistics is to compare the distances of means of two random variables contained within a larger time series. The hypothesis test to determine if this difference is statistically significant is called the permutation test, which permutes data from the two time series a finite number of times to make the process of comparing permuted time series computationally tractable. However, the presence of anomalies, such as in twitter’s production cloud data, would limit the effectiveness of using this process when using simple means. To that end, the paper proposes using the median, and due to the additional computational time resulting from the weaker distribution assumptions to extend the generality of the procedure, the paper devises the E-divisive with medians algorithms, one of which works off of distances between observations, and one works with the medians of the observations themselves (as far as I understand). To summarize, the E-divisive with medians algorithms exist as a way of creating a computationally tractable procedure for determining whether or not a new chunk of time series data is considerably different from the previous through the use of advanced distance statistics robust to anomalies such as those present in twitter’s cloud data.

To compare the performance of the E-divisive with medians algorithms, the paper compares the algorithms to an existing algorithm called PELT (which stands for Pruned Extract Linear Time) in various quantitative metrics, such as “Time To Detect”, meaning the exact moment of the breakout to when the algorithms report it (if at all), along with precision, recall, and the F-measure, defined as the product of precision and recall over their respective sum. Comparing PELT to the E-divisive with medians algorithm showed that the E-divisive algorithm outperformed the PELT algorithm in the majority of data sets. Even when anomalies were either smoothed by taking the rolling median of their neighbors, or by removing them altogether, the E-divisive algorithm still outperformed PELT. Of the variants of the EDM algorithm (EDM head, EDM tail, and EDM-exact), the EDM-tail variant (i.e. the one using the most recent observations) was also quickest to execute. However, due to fewer assumptions about the nature of the underlying generating distributions, the various E-divisive algorithms take longer to execute than the PELT algorithm, with its stronger assumptions, but worse general performance. To summarize, the EDM algorithms outperform PELT in the presence of anomalies, and generally speaking, the EDM-tail variant seems to work best when considering computational running time as well.

The next section dealt with the history and applications of change-point/breakout detection algorithms, in fields such as finance, medical applications, and signal processing. As finance is of a particular interest, the paper acknowledges the ARCH and various flavors of GARCH models, along with the work of James and Matteson in devising a trading strategy based on change-point detection. Applications in genomics to detect cancer exist as well. In any case, the paper cites many sources showing the extension and applications of change-point/breakout detection algorithms, of which finance is one area, especially through work done by Matteson. This will be covered further in the literature review.

To conclude, the paper proposes a new algorithm called the E-divisive with medians, complete with a new statistical permutation test using advanced distance statistics to determine whether or not a time series has had a change in its median. This method makes fewer assumptions about the nature of the underlying distribution than a competitive algorithm, and is robust in the face of anomalies, such as those found in twitter’s production cloud data. This algorithm outperforms a competing algorithm which possessed stronger assumptions about the underlying distribution, detecting a breakout sooner in a time series, even if it took longer to run. The applications of such work range from finance to medical devices, and further beyond. As change-point detection is a technique around which trading strategies can be constructed, it has particular relevance to trading applications.

Statement of Hypothesis

Breakouts can occur in data which does not conform to any known regular distribution, thus rendering techniques that assume a certain distribution less effective. Using the E-divisive with medians algorithm, the paper attempts to predict the presence of breakouts using time series with innovations from no regular distribution as inputs, and if effective, will outperform an existing algorithm that possesses stronger assumptions about distributions. To validate or refute a more general form of this hypothesis, which is the ability of the algorithm to detect breakouts in a timely fashion, this summary test it on the cumulative squared returns of the S&P 500, and compare the analysis created by the breakpoints to the analysis performed by Dr. Robert J. Frey of Keplerian Finance, a former managing director at Renaissance Technologies.

Literature Review


A good portion of the practical/applied motivation of this paper stems from the explosion of growth in mobile internet applications, A/B testing, and other web-specific reasons to detect breakouts. For instance, longer loading time on a mobile web page necessarily results in lower revenues. To give another example, machines in the cloud regularly fail.

However, the more salient literature regarding the topic is the literature dealing with the foundations of the mathematical ideas behind the paper.

Key References

Paper 1:

David S. Matteson and Nicholas A. James. A nonparametric approach for multiple change point analysis of multivariate data. Journal of the American Statistical Association, 109(505):334–345, 2013.

Thesis of work: this paper is the original paper for the e-divisive and e-agglomerative algorithms, which are offline, nonparametric methods of detecting change points in time series. Unlike Paper 3, this paper lays out the mathematical assumptions, lemmas, and proofs for a formal and mathematical presentation of the algorithms. Also, it documents performance against the PELT algorithm, presented in Paper 6 and technically documented in Paper 5. This performance compares favorably. The source paper being replicated builds on the exact mathematics presented in this paper, and the subject of this report uses the ecp R package that is the actual implementation/replication of this work to form a comparison for its own innovations.

Paper 2:

M. L. Rizzo and G. J. Sz´ekely. DISCO analysis: A nonparametric extension of analysis of variance. The Annals of Applied Statistics, 4(2):1034–1055, 2010

Thesis of work: this paper generalizes the ANOVA using distance statistics. This technique aims to find differences among distributions outside their sample means. Through the use of distance statistics, the techniques aim to more generally answer queries about the nature of distributions (EG identical means, but different distributions as a result of different factors). Its applicability to the source paper is that it forms the basis of the ideas for the paper’s divergence measure, as detailed in its second section.

Paper 3:

Nicholas A. James and David S. Matteson. ecp: An R package for nonparametric multiple change point analysis of multivariate data. Technical report, Cornell University, 2013.

Thesis of work: the paper introduces the ecp package which contains the e-agglomerative and e-divisive algorithms for detecting change points in time series in the R statistical programming language (in use on at least one elite trading desk). The e-divisive method recursively partitions a time series and uses a permutation test to determine change points, but it is computationally intensive. The e-agglomerative algorithm allows for inputs from the user for initial time-series segmentation and is a computationally faster algorithm. Unlike most academic papers, this paper also includes examples of data and code in order to facilitate the use of these algorithms. Furthermore, the paper includes applications to real data, such as the companies found in the Dow Jones Industrial Index, further proving the effectiveness of these methods. This paper is important to the topic in question as the E-divisive algorithm created by James and Matteson form the base changepoint detection process for which the paper builds its own innovations for, and visually compares against; furthermore, the source paper restates many of the techniques found in this paper.

Paper 4:

Owen Vallis, Jordan Hochenbaum, and Arun Kejariwal. A novel technique for long-term anomaly detection in the cloud. In 6th USENIX Workshop on Hot Topics in Cloud Computing (HotCloud 14), June 2014.

Thesis of work: the paper proposes the use of piecewise median and median absolute deviation statistics to detect anomalies in time series. The technique builds upon the ESD (Extreme Studentized Deviate) technique and uses piecewise medians to approximate a long-term trend, before extracting seasonality effects from periods shorter than two weeks. The piecewise median method of anomaly detection has a greater F-measure of detecting anomalies than does the standard STL (seasonality trend loess decomposition) or quantile regression techniques. Furthermore, piecewise median executes more than three times faster. The relevance of this paper to the source paper is that it forms the idea of using robust statistics and building the techniques in the paper upon the median as opposed to the mean.

Paper 5:

Rebecca Killick and Kaylea Haynes. changepoint: An R package for changepoint analysis

Thesis of work: manual for the implementation of the PELT algorithm written by Rebecca Killick and Kaylea Haynes. This package is a competing change-point detection package, mainly focused around the Pruned Extraction Linear Time algorithm, although containing other worse algorithms, such as the segment neighborhoods algorithm. Essentially, it is a computational implementation of the work in Paper 2. Its application toward the source paper is that the paper at hand compares its own methodology against PELT, and often outperforms it.

Paper 6:

Rebecca Killick, Paul Fearnhead, and IA Eckley. Optimal detection of changepoints with a linear computational cost. Journal of the American Statistical Association, 107(500):1590–1598, 2012

Thesis of work: the paper proposes an algorithm (PELT) that scales linearly in running time with the size of the input time series to detect exact locations of change points. The paper aims to replace both an approximate binary partitioning algorithm, and an optimal segmentation algorithm that doesn’t involve a pruning mechanism to speed up the running time. The paper uses an MLE algorithm at the heart of its dynamic partitioning in order to locate change points. The relevance to the source paper is that through the use of the non-robust MLE procedure, this algorithm is vulnerable to poor performance due to the presence of anomalies/outliers in the data, and thus underperforms the new twitter change point detection methodology which employs robust statistics.

Paper 7:

Wassily Hoeffding. The strong law of large numbers for u-statistics. Institute of Statistics mimeo series, 302, 1961.

Thesis of work: this paper establishes a convergence of the mean of tuples of many random variables to the mean of said random variables, given enough such observations. This paper is a theoretical primer on establishing the above thesis. The mathematics involve use of measure theory and other highly advanced and theoretical manipulations. Its relevance to the source paper is in its use to establish a convergence of an estimated characteristic function.

Similar Work

In terms of financial applications, the papers covering direct applications of change points to financial time series are listed above. Particularly, David Matteson presented his ecp algorithms at R/Finance several years ago, and his work is already in use on at least one professional trading desk. Beyond this, the paper cites works on technical analysis and the classic ARCH and GARCH papers as similar work. However, as this change point algorithm is created to be a batch process, direct comparison with other trend-following (that is, breakout) methods would seem to be a case of apples and oranges, as indicators such as MACD, Donchian channels, and so on, are online methods (meaning they do not have access to the full data set like the e-divisive and the e-divisive with medians algorithms do). However, they are parameterized in terms of their lookback period, and are thus prone to error in terms of inaccurate parameterization resulting from a static lookback value.

In his book Cycle Analytics for Traders, Dr. John Ehlers details an algorithm for computing the dominant cycle of a security—that is, a way to dynamically parameterize the lookback parameter, and if this were to be successfully implemented in R, it may very well allow for improved breakout detection methods than the classic parameterized indicators popularized in the last century.

References With Implementation Hints

Reference 1: Breakout Detection In The Wild

This blog post is a reference contains the actual example included in the R package for the model, written by one of the authors of the source paper. As the data used in the source paper is proprietary twitter production data, and the model is already implemented in the package discussed in this blog post, this makes the package and the included data the go-to source for starting to work with the results presented in the source paper.

Reference 2: Twitter BreakoutDetection R package evaluation

This blog post is that of a blogger altering the default parameters in the model. His analysis of traffic to his blog contains valuable information as to greater flexibility in the use of the R package that is the implementation of the source paper.


The data contained in the source paper comes from proprietary twitter cloud production data. Thus, it is not realistic to obtain a copy of that particular data set. However, one of the source paper’s co-authors, Arun Kejariwal, was so kind as to provide a tutorial, complete with code and sample data, for users to replicate at their convenience. It is this data that we will use for replication.

Building The Model

Stemming from the above, we are fortunate that the results of the source paper have already been implemented in twitter’s released R package, BreakoutDetection. This package has been written by Nicholas A. James, a PhD candidate at Cornell University studying under Dr. David S. Matteson. His page is located here.

In short, all that needs to be done on this end is to apply the model to the aforementioned data.

Validate the Results

To validate the results—that is, to obtain the same results as one of the source paper’s authors, we will execute the code on the data that he posted on his blog post (see Reference 1).

install_github(repo="BreakoutDetection", username="twitter")

res = breakout(Scribe, min.size=24, method='multi', beta=.001, degree=1, plot=TRUE)

This is the resulting image, identical from the blog post.

Validation of the Hypothesis

This validation was inspired by the following post:

The Relevance of History

The post was written by Dr. Robert J. Frey, professor of Applied Math and Statistics at Stony Brook University, the head of its Quantitative Finance program, and former managing director at Renaissance Technologies (yes, the Renaissance Technologies founded by Dr. Jim Simons). While the blog is inactive at the moment, I sincerely hope it will become more active again.

Essentially, it uses mathematica to detect changes in the slope of cumulative squared returns, and the final result is a map of spikes, mountains, and plains, the x-axis being time, and the y-axis the annualized standard deviation. Using the more formalized e-divisive and e-divisive with medians algorithms, this analysis will attempt to detect change points, and use the PerformanceAnalytics library to compute the annualized standard deviation from the data of the GSPC returns itself, and output a similarly-formatted plot.

Here’s the code:


getSymbols("^GSPC", from = "1984-12-25", to = "2013-05-31")
monthlyEp <- endpoints(GSPC, on = "months")
GSPCmoCl <- Cl(GSPC)[monthlyEp,]
GSPCmoRets <- Return.calculate(GSPCmoCl)
GSPCsqRets <- GSPCmoRets*GSPCmoRets
GSPCsqRets <- GSPCsqRets[-1,] #remove first NA as a result of return computation
GSPCcumSqRets <- cumsum(GSPCsqRets)

This results in the following image:

So far, so good. Let’s now try to find the amount of changepoints that Dr. Frey’s graph alludes to.

t1 <- Sys.time()
ECPmonthRes <- e.divisive(X = GSPCsqRets, min.size = 2)
t2 <- Sys.time()
print(t2 - t1)

t1 <- Sys.time()
BDmonthRes <- breakout(Z = GSPCsqRets, min.size = 2, beta=0, degree=1)
t2 <- Sys.time()
print(t2 - t1)


With the following results:

> ECPmonthRes$estimates
[1]   1 285 293 342
> BDres$loc
[1] 47 87

In short, two changepoints for each. Far from the 20 or so regimes present in Dr. Frey’s analysis. So, not close to anything that was expected. My intuition tells me that the main reason for this is that these algorithms are data-hungry, and there is too little data for them to do much more than what they have done thus far. So let’s go the other way and use daily data.

dailySqRets <- Return.calculate(Cl(GSPC))*Return.calculate(Cl(GSPC))
dailySqRets <- dailySqRets["1985::"]


And here’s the new plot:

First, let’s try the e-divisive algorithm from the ecp package to find our changepoints, with a minimum size of 20 days between regimes. (Blog note: this is a process that takes an exceptionally long time. For me, it took more than 2 hours.)

t1 <- Sys.time()
ECPres <- e.divisive(X = dailySqRets, min.size=20)
t2 <- Sys.time()
print(t2 - t1)
Time difference of 2.214813 hours

With the following results:

 [1] "1985-01-02" "1987-10-14" "1987-11-11" "1998-07-21" "2002-07-01" "2003-07-28" "2008-09-15" "2008-12-09"
 [9] "2009-06-02" NA   

The first and last are merely the endpoints of the data. So essentially, it encapsulates Black Monday and the crisis, among other things. Let’s look at how the algorithm split the volatility regimes. For this, we will use the xtsExtra package for its plotting functionality (thanks to Ross Bennett for the work he did in implementing it).

xtsExtra::addLines(index(dailySqRets)[ECPres$estimates[-c(1, length(ECPres$estimates))]], on = 1, col = "blue", lwd = 2)

With the resulting plot:

In this case, the e-divisive algorithm from the ecp package does a pretty great job segmenting the various volatility regimes, as can be thought of roughly as the slope of the cumulative squared returns. The algorithm’s ability to accurately cluster the Black Monday events, along with the financial crisis, shows its industrial-strength applicability. How does this look on the price graph?

xtsExtra::addLines(index(dailySqRets)[ECPres$estimates[-c(1, length(ECPres$estimates))]], on = 1, col = "blue", lwd = 2)

In this case, Black Monday is clearly visible, along with the end of the Clinton bull run through the dot-com bust, the consolidation, the run-up to the crisis, the crisis itself, the consolidation, and the new bull market.

Note that the presence of a new volatility regime may not necessarily signify a market top or bottom, but the volatility regime detection seems to have worked very well in this case.

For comparison, let’s examine the e-divisive with medians algorithm.

t1 <- Sys.time()
BDres <- breakout(Z = dailySqRets, min.size = 20, beta=0, degree=1)
t2 <- Sys.time()


With the following result:

Time difference of 2.900167 secs
> BDres$loc
[1] 5978
> BDres$loc
[1] 5978
> index(dailySqRets)[BDres$loc]
[1] "2008-09-12"

So while the algorithm is a lot faster, its volatility regime detection, it only sees the crisis as the one major change point. Beyond that, to my understanding, the e-divisive with medians algorithm may be “too robust” (even without any penalization) against anomalies (after all, the median is robust to changes in 50% of the data). In short, I think that while it clearly has applications, such as twitter cloud production data, it doesn’t seem to obtain a result that’s in the ballpark of two other separate procedures.

Lastly, let’s try and create a plot similar to Dr. Frey’s, with spikes, mountains, and plains.

GSPCrets <- Return.calculate(Cl(GSPC))
GSPCrets <- GSPCrets["1985::"]
GSPCrets$regime <- ECPres$cluster
GSPCrets$annVol <- NA

for(i in unique(ECPres$cluster)) {
  regime <- GSPCrets[GSPCrets$regime==i,]
  annVol <- StdDev.annualized(regime[,1])
  GSPCrets$annVol[GSPCrets$regime==i,] <- annVol

plot(GSPCrets$annVol, ylim=c(0, max(GSPCrets$annVol)), main="GSPC volatility regimes, 1985 to 2013-05")

With the corresponding image, inspired by Dr. Robert Frey:

This concludes the research replication.


Whew. Done. While I gained some understanding of what change points are useful for, I won’t profess to be an expert on them (some of the math involved uses PhD-level mathematics such as characteristic functions that I never learned). However, it was definitely interesting pulling together several different ideas and uniting them under a rigorous process.

Special thanks for this blog post:

Brian Peterson, for the process paper and putting a formal structure to the research replication process (and requesting this post).
Robert J. Frey, for the “volatility landscape” idea that I could objectively point to as an objective benchmark to validate the hypothesis of the paper.
David S. Matteson, for the ecp package.
Nicholas A. James, for the work done in the BreakoutDetection package (and clarifying some of its functionality for me).
Arun Kejariwal, for the tutorial on using the BreakoutDetection package.

Thanks for reading.

NOTE: I am a freelance consultant in quantitative analysis on topics related to this blog. If you have contract or full time roles available for proprietary research that could benefit from my skills, please contact me through my LinkedIn here.

Introducing Stepwise Correlation Rank

So in the last post, I attempted to replicate the Flexible Asset Allocation paper. I’d like to offer a thanks to Pat of Intelligent Trading Tech (not updated recently, hopefully this will change) for helping me corroborate the results, so that I have more confidence there isn’t an error in my code.

One of the procedures the authors of the FAA paper used is a correlation rank, which I interpreted as the average correlation of each security to the others.

The issue, pointed out to me in a phone conversation I had with David Varadi is that when considering correlation, shouldn’t the correlations the investor is concerned about be between instruments within the portfolio, as opposed to simply all the correlations, including to instruments not in the portfolio? To that end, when selecting assets (or possibly features in general), conceptually, it makes more sense to select in a stepwise fashion–that is, start off at a subset of the correlation matrix, and then rank assets in order of their correlation to the heretofore selected assets, as opposed to all of them. This was explained in Mr. Varadi’s recent post.

Here’s a work in progress function I wrote to formally code this idea:

stepwiseCorRank <- function(corMatrix, startNames=NULL, stepSize=1, bestHighestRank=FALSE) {
  #edge cases
  if(dim(corMatrix)[1] == 1) {
  } else if (dim(corMatrix)[1] == 2) {
    ranks <- c(1.5, 1.5)
    names(ranks) <- colnames(corMatrix)
  if(is.null(startNames)) {
    corSums <- rowSums(corMatrix)
    corRanks <- rank(corSums)
    startNames <- names(corRanks)[corRanks <= stepSize]
  nameList <- list()
  nameList[[1]] <- startNames
  rankList <- list()
  rankCount <- 1
  rankList[[1]] <- rep(rankCount, length(startNames))
  rankedNames <-, nameList)
  while(length(rankedNames) < nrow(corMatrix)) {
    rankCount <- rankCount+1
    subsetCor <- corMatrix[, rankedNames]
    if(class(subsetCor) != "numeric") {
      subsetCor <- subsetCor[!rownames(corMatrix) %in% rankedNames,]
      if(class(subsetCor) != "numeric") {
        corSums <- rowSums(subsetCor)
        corSumRank <- rank(corSums)
        lowestCorNames <- names(corSumRank)[corSumRank <= stepSize]
        nameList[[rankCount]] <- lowestCorNames
        rankList[[rankCount]] <- rep(rankCount, min(stepSize, length(lowestCorNames)))
      } else { #1 name remaining
        nameList[[rankCount]] <- rownames(corMatrix)[!rownames(corMatrix) %in% names(subsetCor)]
        rankList[[rankCount]] <- rankCount
    } else {  #first iteration, subset on first name
      subsetCorRank <- rank(subsetCor)
      lowestCorNames <- names(subsetCorRank)[subsetCorRank <= stepSize]
      nameList[[rankCount]] <- lowestCorNames
      rankList[[rankCount]] <- rep(rankCount, min(stepSize, length(lowestCorNames)))
    rankedNames <-, nameList)
  ranks <-, rankList)
  names(ranks) <- rankedNames
  if(bestHighestRank) {
    ranks <- 1+length(ranks)-ranks
  ranks <- ranks[colnames(corMatrix)] #return to original order

So the way the function works is that it takes in a correlation matrix, a starting name (if provided), and a step size (that is, how many assets to select per step, so that the process doesn’t become extremely long when dealing with larger amounts of assets/features). Then, it iterates–subset the correlation matrix on the starting name, and find the minimum value, and add it to a list of already-selected names. Next, subset the correlation matrix columns on the selected names, and the rows on the not selected names, and repeat, until all names have been accounted for. Due to R’s little habit of wiping out labels when a matrix becomes a vector, I had to write some special case code, which is the reason for two nested if/else statements (the first one being for the first column subset, and the second being for when there’s only one row remaining).

Also, if there’s an edge case (1 or 2 securities), then there is some functionality to handle those trivial cases.

Here’s a test script I wrote to test this function out:


#mid 1997 to end of 2012
getSymbols(mutualFunds, from="1997-06-30", to="2012-12-31")
tmp <- list()
for(fund in mutualFunds) {
  tmp[[fund]] <- Ad(get(fund))

#always use a list hwne intending to cbind/rbind large quantities of objects
adPrices <-, args = tmp)
colnames(adPrices) <- gsub(".Adjusted", "", colnames(adPrices))

adRets <- Return.calculate(adPrices)

subset <- adRets["2012"]
corMat <- cor(subset)

tmp <- list()
for(i in 1:length(mutualFunds)) {
  rankRow <- stepwiseCorRank(corMat, startNames=mutualFunds[i])
  tmp[[i]] <- rankRow
rankDemo <-, tmp)
rownames(rankDemo) <- mutualFunds
origRank <- rank(rowSums(corMat))
rankDemo <- rbind(rankDemo, origRank)
rownames(rankDemo)[8] <- "Original (VBMFX)"

heatmap(-rankDemo, Rowv=NA, Colv=NA, col=heat.colors(8), margins=c(6,6))

Essentially, using the 2012 year of returns for the 7 FAA mutual funds, I compared how different starting securities changed the correlation ranking sequence.

Here are the results:

VTSMX              1     6     7     4     2     3     5
FDIVX              6     1     7     4     2     5     3
VEIEX              6     7     1     4     2     3     5
VFISX              2     6     7     1     3     4     5
VBMFX              2     6     7     4     1     3     5
QRAAX              5     6     7     4     2     1     3
VGSIX              5     6     7     4     2     3     1
Non-Sequential     5     6     7     2     1     3     4

In short, the algorithm is rather robust to starting security selection, at least judging by this small example. However, comparing VBMFX start to the non-sequential ranking, we see that VFISX changes from rank 2 in the non-sequential to rank 4, with VTSMX going from rank 5 to rank 2. From an intuitive perspective, this makes sense, as both VBMFX and VFISX are bond funds, which have a low correlation with the other 5 equity-based mutual funds, but a higher correlation with each other, thus signifying that the algorithm seems to be working as intended, at least insofar as this small example demonstrates. Here’s a heatmap to demonstrate this in visual form.

The ranking order (starting security) is the vertical axis, and the horizontal are the ranks, from white being first, to red being last. Notice once again that the ranking orders are robust in general (consider each column of colors descending), but each particular ranking order is unique.

So far, this code still has to be tested in terms of its applications to portfolio management and asset allocation, but for those interested in such an idea, it’s my hope that this provides a good reference point.

Thanks for reading.

Intermission: A Quick Thought on Robust Kurtosis

This post was inspired by some musings from John Bollinger that as data in the financial world wasn’t normally distributed, that there might be a more robust computation to indicate skewness and kurtosis. For instance, one way to think about skewness is the difference between mean and median. That is, if the mean is less than the median, that the distribution was left skewed, and vice versa.

This post attempts to extend that thinking to kurtosis. That is, just as the skew can be thought of as a relationship between mean and median, so too, might kurtosis be thought of as a relationship between two measures of spread–standard deviation and the more robust interquartile range. So, I performed an experiment to simulate 10000 observations from a standard normal and 10000 observations from a standard double-exponential distribution.

Here’s the experiment I ran.

norms <- rnorm(10000)
dexps <- rexp(10000) * sign(rnorm(10000))
lines(density(norms), col="red")

And here’s the output:

[1] 0.9757966
> (IQR(norms))
[1] 1.330469
> (IQR(dexps))
[1] 1.35934
> (sd(norms))
[1] 0.9875294
> (sd(dexps))
[1] 1.393057
> (sd(norms)/IQR(norms))
[1] 0.7422415
> (sd(dexps)/IQR(dexps))
[1] 1.024804

That is, in a distribution with higher kurtosis than the standard normal, that the ratio between standard deviation to interquartile range is higher in a heavier-tailed distribution. I’m not certain that this assertion is true in all general cases, but it seems to make intuitive sense, that with heavier tails, the same amount of observations are more spread out.

Thanks for reading.

Comparing ATR order sizing to max dollar order sizing

First off, it has come to my attention that some readers have trouble getting some of my demos to work because there may be different versions of TTR in use. If ever your demo doesn’t work, the first thing I would immediately recommend you do is this:

Only run the code through the add.indicator logic. And then, rather than adding the signals and rules, run the following code:

test <- applyIndicators(, mktdata=OHLC(XLB))

That should show you the exact column names of your indicators, and you can adjust your inputs accordingly.While one of my first posts introduced the ATR order-sizing function, I recently received a suggestion to test it in the context of whether or not it actually normalized risk across instruments. To keep things simple, my strategy is as plain vanilla as strategies come — RSI2 20/80 filtered on SMA200.

Here’s the code for the ATR order sizing version, for completeness’s sake.




#trade sizing and initial equity settings
tradeSize <- 100000
initEq <- tradeSize*length(symbols) <- <- <- "DollarVsATRos"
initPortf(, symbols=symbols, initDate=initDate, currency='USD')
initAcct(,, initDate=initDate, currency='USD',initEq=initEq)
initOrders(, initDate=initDate)
strategy(, store=TRUE)


nRSI <- 2
buyThresh <- 20
sellThresh <- 80
nSMA <- 200

add.indicator(, name="lagATR", 
              arguments=list(HLC=quote(HLC(mktdata)), n=period), 

add.indicator(, name="RSI",
              arguments=list(price=quote(Cl(mktdata)), n=nRSI),

add.indicator(, name="SMA",
              arguments=list(x=quote(Cl(mktdata)), n=nSMA),

add.signal(, name="sigComparison",
           arguments=list(columns=c("Close", "sma"), relationship="gt"),

add.signal(, name="sigThreshold",
           arguments=list(column="rsi", threshold=buyThresh, 
                          relationship="lt", cross=FALSE),

add.signal(, name="sigAND",
           arguments=list(columns=c("filter", "rsiLtThresh"), cross=TRUE),

add.signal(, name="sigThreshold",
           arguments=list(column="rsi", threshold=sellThresh,
                          relationship="gt", cross=TRUE),

add.signal(, name="sigCrossover",
           arguments=list(columns=c("Close", "sma"), relationship="lt"),

add.rule(, name="ruleSignal", 
         arguments=list(sigcol="longEntry", sigval=TRUE, ordertype="market", 
                        orderside="long", replace=FALSE, prefer="Open", osFUN=osDollarATR,
                        tradeSize=tradeSize, pctATR=pctATR, atrMod="X"), 
         type="enter", path.dep=TRUE)

add.rule(, name="ruleSignal", 
         arguments=list(sigcol="longExit", sigval=TRUE, orderqty="all", ordertype="market", 
                        orderside="long", replace=FALSE, prefer="Open"), 
         type="exit", path.dep=TRUE)

add.rule(, name="ruleSignal", 
         arguments=list(sigcol="filterExit", sigval=TRUE, orderqty="all", ordertype="market", 
                        orderside="long", replace=FALSE, prefer="Open"), 
         type="exit", path.dep=TRUE)

#apply strategy
t1 <- Sys.time()
out <- applyStrategy(,
t2 <- Sys.time()

#set up analytics
dateRange <- time(getPortfolio($summary)[-1]

Here are some of the usual analytics, which don’t interest me in and of themselves as this strategy is rather throwaway, but to compare them to what happens when I use the max dollar order sizing function in a moment:

> (aggPF <- sum(tStats$Gross.Profits)/-sum(tStats$Gross.Losses))
[1] 1.659305
> (aggCorrect <- mean(tStats$Percent.Positive))
[1] 69.24967
> (numTrades <- sum(tStats$Num.Trades))
[1] 3017
> (meanAvgWLR <- mean(tStats$Avg.WinLoss.Ratio[tStats$Avg.WinLoss.Ratio < Inf], na.rm=TRUE))
[1] 0.733

> SharpeRatio.annualized(portfRets)
Annualized Sharpe Ratio (Rf=0%) 0.9783541
> Return.annualized(portfRets)
Annualized Return 0.07369592
> maxDrawdown(portfRets)
[1] 0.08405041

> round(apply.yearly(dailyRetComparison, Return.cumulative),3)
           strategy    SPY
2003-12-31    0.052  0.066
2004-12-31    0.074  0.079
2005-12-30    0.045  0.025
2006-12-29    0.182  0.132
2007-12-31    0.117  0.019
2008-12-31   -0.010 -0.433
2009-12-31    0.130  0.192
2010-12-31   -0.005  0.110
2011-12-30    0.069 -0.028
2012-12-31    0.087  0.126
> round(apply.yearly(dailyRetComparison, SharpeRatio.annualized),3)
           strategy    SPY
2003-12-31    1.867  3.641
2004-12-31    1.020  0.706
2005-12-30    0.625  0.238
2006-12-29    2.394  1.312
2007-12-31    1.105  0.123
2008-12-31   -0.376 -1.050
2009-12-31    1.752  0.719
2010-12-31   -0.051  0.614
2011-12-30    0.859 -0.122
2012-12-31    1.201  0.990
> round(apply.yearly(dailyRetComparison, maxDrawdown),3)
           strategy   SPY
2003-12-31    0.018 0.025
2004-12-31    0.065 0.085
2005-12-30    0.053 0.074
2006-12-29    0.074 0.077
2007-12-31    0.066 0.102
2008-12-31    0.032 0.520
2009-12-31    0.045 0.280
2010-12-31    0.084 0.167
2011-12-30    0.053 0.207
2012-12-31    0.050 0.099

Now here’s a new bit of analytics–comparing annualized standard deviations between securities:

> sdQuantile <- quantile(sapply(instRets, sd.annualized))
> sdQuantile
         0%         25%         50%         75%        100% 
0.004048235 0.004349390 0.004476377 0.004748530 0.005557765 
> (extremeRatio <- sdQuantile[5]/sdQuantile[1]-1)
> (boxBorderRatio <- sdQuantile[4]/sdQuantile[2]-1)

In short, because the instrument returns are computed as a function of only the initial account equity (quantstrat doesn’t know that I’m “allocating” a notional cash amount to each separate ETF–because I’m really not–I just treat it as one pile of cash that I mentally think of as being divided “equally” between all 30 ETFs), that means that the returns per instrument also have already implicitly factored in the weighting scheme from the order sizing function. In this case, the most volatile instrument is about 37% more volatile than the least — and since I’m dealing with indices of small nations along with short-term treasury bills in ETF form, I’d say that’s impressive.

More impressive, in my opinion, is that the difference in volatility between the 25th and 75th percentile is about 9%. It means that our ATR order sizing seems to be doing its job.Here’s the raw computations in terms of annualized volatility:

> sapply(instRets, sd.annualized)
EFA.DailyEndEq EPP.DailyEndEq EWA.DailyEndEq EWC.DailyEndEq 
   0.004787248    0.005557765    0.004897699    0.004305728 
EWG.DailyEndEq EWH.DailyEndEq EWJ.DailyEndEq EWS.DailyEndEq 
   0.004806879    0.004782505    0.004460708    0.004618460 
EWT.DailyEndEq EWU.DailyEndEq EWY.DailyEndEq EWZ.DailyEndEq 
   0.004417686    0.004655716    0.004888876    0.004858743 
EZU.DailyEndEq IEF.DailyEndEq IGE.DailyEndEq IYR.DailyEndEq 
   0.004631333    0.004779468    0.004617250    0.004359273 
IYZ.DailyEndEq LQD.DailyEndEq RWR.DailyEndEq SHY.DailyEndEq 
   0.004346095    0.004101408    0.004388131    0.004585389 
TLT.DailyEndEq XLB.DailyEndEq XLE.DailyEndEq XLF.DailyEndEq 
   0.004392335    0.004319708    0.004515228    0.004426415 
XLI.DailyEndEq XLK.DailyEndEq XLP.DailyEndEq XLU.DailyEndEq 
   0.004129331    0.004492046    0.004369804    0.004048235 
XLV.DailyEndEq XLY.DailyEndEq 
   0.004148445    0.004203503 

And here’s a histogram of those same calculations:

In this case, the reason that the extreme computation gives us a 37% greater result is that one security, EPP (pacific ex-Japan, which for all intents and purposes is emerging markets) is simply out there a bit. The rest just seem very clumped up.

Now let’s remove the ATR order sizing and replace it with a simple osMaxDollar rule, that simply will keep a position topped off at a notional dollar value. In short, aside from a few possible one-way position rebalancing transactions (E.G. with the ATR order sizing rule, ATR may have gone up whereas total value of a position may have gone down, which may trigger the osMaxDollar rule but not the osDollarATR rule on a second RSI cross) Here’s the new entry rule, with the ATR commented out:

# add.rule(, name="ruleSignal", 
#          arguments=list(sigcol="longEntry", sigval=TRUE, ordertype="market", 
#                         orderside="long", replace=FALSE, prefer="Open", osFUN=osDollarATR,
#                         tradeSize=tradeSize, pctATR=pctATR, atrMod="X"), 
#          type="enter", path.dep=TRUE)

add.rule(, name="ruleSignal", 
         arguments=list(sigcol="longEntry", sigval=TRUE, ordertype="market", 
                        orderside="long", replace=FALSE, prefer="Open", osFUN=osMaxDollar,
                        tradeSize=tradeSize, maxSize=tradeSize), 
         type="enter", path.dep=TRUE)

Let’s look at the corresponding statistical results:

> (aggPF <- sum(tStats$Gross.Profits)/-sum(tStats$Gross.Losses))
[1] 1.635629
> (aggCorrect <- mean(tStats$Percent.Positive))
[1] 69.45633
> (numTrades <- sum(tStats$Num.Trades))
[1] 3019
> (meanAvgWLR <- mean(tStats$Avg.WinLoss.Ratio[tStats$Avg.WinLoss.Ratio < Inf], na.rm=TRUE))
[1] 0.735

> SharpeRatio.annualized(portfRets)
Annualized Sharpe Ratio (Rf=0%) 0.8529713
> Return.annualized(portfRets)
Annualized Return 0.04857159
> maxDrawdown(portfRets)
[1] 0.06682969
> dailyRetComparison <- cbind(portfRets, SPYrets)
> colnames(dailyRetComparison)  <- c("strategy", "SPY")
> round(apply.yearly(dailyRetComparison, Return.cumulative),3)
           strategy    SPY
2003-12-31    0.034  0.066
2004-12-31    0.055  0.079
2005-12-30    0.047  0.025
2006-12-29    0.090  0.132
2007-12-31    0.065  0.019
2008-12-31   -0.023 -0.433
2009-12-31    0.141  0.192
2010-12-31   -0.010  0.110
2011-12-30    0.038 -0.028
2012-12-31    0.052  0.126
> round(apply.yearly(dailyRetComparison, SharpeRatio.annualized),3)
           strategy    SPY
2003-12-31    1.639  3.641
2004-12-31    1.116  0.706
2005-12-30    0.985  0.238
2006-12-29    1.755  1.312
2007-12-31    0.785  0.123
2008-12-31   -0.856 -1.050
2009-12-31    1.774  0.719
2010-12-31   -0.134  0.614
2011-12-30    0.686 -0.122
2012-12-31    1.182  0.990
> round(apply.yearly(dailyRetComparison, maxDrawdown),3)
           strategy   SPY
2003-12-31    0.015 0.025
2004-12-31    0.035 0.085
2005-12-30    0.033 0.074
2006-12-29    0.058 0.077
2007-12-31    0.058 0.102
2008-12-31    0.036 0.520
2009-12-31    0.043 0.280
2010-12-31    0.062 0.167
2011-12-30    0.038 0.207
2012-12-31    0.035 0.099

And now for the kicker–to see just how much riskier using a naive order-sizing method that doesn’t take into account the different idiosyncratic of a security is:

> sdQuantile <- quantile(sapply(instRets, sd.annualized))
> sdQuantile
          0%          25%          50%          75%         100% 
0.0002952884 0.0026934043 0.0032690492 0.0037727970 0.0061480828 
> (extremeRatio <- sdQuantile[5]/sdQuantile[1]-1)
> (boxBorderRatio <- sdQuantile[4]/sdQuantile[2]-1)
> hist(sapply(instRets, sd.annualized))

In short, the ratio between the riskiest and least riskiest asset rises from less than 40% to 1900%. But in case, that’s too much of an outlier (E.G. dealing with treasury bill/note/bond ETFs vs. pacific ex-Japan aka emerging Asia), the difference between the third and first quartiles in terms of volatility ratio has jumped from 9% to 40%.

Here’s the corresponding histogram:As can be seen, a visibly higher variance in variances–in other words, a second moment on the second moment–meaning that to not use an order-sizing function that takes into account individual security risk therefore introduces unnecessary kurtosis and heavier tails into the risk/reward ratio, and due to this unnecessary excess risk, performance suffers measurably.Here are the individual security annualized standard deviations for the max dollar order sizing method:

> sapply(instRets, sd.annualized)
EFA.DailyEndEq EPP.DailyEndEq EWA.DailyEndEq EWC.DailyEndEq 
  0.0029895232   0.0037767697   0.0040222015   0.0036137500 
EWG.DailyEndEq EWH.DailyEndEq EWJ.DailyEndEq EWS.DailyEndEq 
  0.0037097070   0.0039615376   0.0030398638   0.0037608791 
EWT.DailyEndEq EWU.DailyEndEq EWY.DailyEndEq EWZ.DailyEndEq 
  0.0041140227   0.0032204771   0.0047719772   0.0061480828 
EZU.DailyEndEq IEF.DailyEndEq IGE.DailyEndEq IYR.DailyEndEq 
  0.0033176214   0.0013059712   0.0041621776   0.0033752435 
IYZ.DailyEndEq LQD.DailyEndEq RWR.DailyEndEq SHY.DailyEndEq 
  0.0026899679   0.0011777797   0.0034789117   0.0002952884 
TLT.DailyEndEq XLB.DailyEndEq XLE.DailyEndEq XLF.DailyEndEq 
  0.0024854557   0.0034895815   0.0043568967   0.0029546665 
XLI.DailyEndEq XLK.DailyEndEq XLP.DailyEndEq XLU.DailyEndEq 
  0.0027963302   0.0028882028   0.0021212224   0.0025802850 
XLV.DailyEndEq XLY.DailyEndEq 
  0.0020399289   0.0027037138

Is ATR order sizing the absolute best order-sizing methodology? Most certainly not.In fact, in the PortfolioAnalytics package (quantstrat’s syntax was modeled from this), there are ways to explicitly penalize the higher order moments and co-moments. However, in this case, ATR order sizing works as a simple yet somewhat effective demonstrator of risk-adjusted order-sizing, while implicitly combating some of the risks in not paying attention to the higher moments of the distributions of returns, and also still remaining fairly close to the shore in terms of ease of explanation to those without heavy quantitative backgrounds. This facilitates marketing to large asset managers that may otherwise be hesitant in investing with a more complex strategy that they may not so easily understand.

Thanks for reading.