# Hypothesis Driven Development Part III: Monte Carlo In Asset Allocation Tests

This post will show how to use Monte Carlo to test for signal intelligence.

Although I had rejected this strategy in the last post, I was asked to do a Monte Carlo analysis of a thousand random portfolios to see how the various signal processes performed against that distribution. The process is quite simple. Since I’m selecting one asset to hold each month, I generate a random integer between 1 and the number of assets (5 in this case) and hold that asset for the month. Repeat this for every month in the sample, repeat the whole process a thousand times, and then see where the signal processes fall within the resulting distribution.

I didn’t use parallel processing here since Windows and Linux-based R have different parallel libraries, and in the interest of having the code work across all machines, I decided to leave it off.
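For readers who do want to parallelize the simulation, one cross-platform option is a socket (PSOCK) cluster from R's base `parallel` package, which runs on Windows as well as Linux. The fork-based `mclapply` is the part that does not parallelize on Windows. Below is a minimal sketch under stated assumptions: `toyRets` is a hypothetical simulated 144 × 5 return matrix standing in for `monthRets`, `randomPathReturn` is a hypothetical helper that returns only the compounded return of one random path (not a full `Return.portfolio` series), and two workers is an arbitrary choice.

```r
library(parallel)

# Hypothetical toy data: 144 months x 5 assets (a stand-in for monthRets)
set.seed(42)
toyRets <- matrix(rnorm(144 * 5, mean = 0.005, sd = 0.04), ncol = 5)

# Hypothetical helper: one random "monkey" path that holds a uniformly chosen
# asset each month; returns the compounded total return (i is unused)
randomPathReturn <- function(i, rets) {
  picks <- sample.int(ncol(rets), nrow(rets), replace = TRUE)
  monthly <- rets[cbind(seq_len(nrow(rets)), picks)]
  prod(1 + monthly) - 1
}

cl <- makeCluster(2)                   # PSOCK cluster by default on every OS
clusterSetRNGStream(cl, iseed = 123)   # reproducible per-worker RNG streams
simReturns <- unlist(parLapply(cl, 1:1000, randomPathReturn, rets = toyRets))
stopCluster(cl)

summary(simReturns)
```

`clusterSetRNGStream` makes the parallel run repeatable for a given seed and worker count, but its draws will not match a serial run under `set.seed(123)`.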

Here’s the code:

```r
library(xts)
library(PerformanceAnalytics)

# One random "monkey" portfolio: each month, hold a single asset chosen
# uniformly at random from the columns of `returns` (an xts of monthly returns)
randomAssetPortfolio <- function(returns) {
  numAssets <- ncol(returns)
  numPeriods <- nrow(returns)
  assetSequence <- sample.int(numAssets, numPeriods, replace = TRUE)

  # Weight of 1 on the chosen asset in each month, 0 everywhere else
  wts <- matrix(0, nrow = numPeriods, ncol = numAssets)
  wts[cbind(seq_len(numPeriods), assetSequence)] <- 1
  wts <- xts(wts, order.by = index(returns))

  Return.portfolio(R = returns, weights = wts)
}

# 1000 random portfolios on monthRets (the monthly asset returns from earlier posts)
t1 <- Sys.time()
set.seed(123)
randomPortfolios <- lapply(1:1000, function(i) randomAssetPortfolio(monthRets))
randomPortfolios <- do.call(cbind, randomPortfolios)
t2 <- Sys.time()
print(t2 - t1)

# sigBoxplots holds the 12 signal-process return streams from the previous post.
# table.AnnualizedReturns rows: annualized return, annualized std dev, Sharpe
algoPortfolios <- sigBoxplots[, 1:12]
randomStats <- table.AnnualizedReturns(randomPortfolios)
algoStats <- table.AnnualizedReturns(algoPortfolios)

# Monte Carlo histograms, with each signal process marked in red
par(mfrow = c(3, 1))
hist(as.numeric(randomStats[1, ]), breaks = 20,
     main = 'histogram of monte carlo annualized returns', xlab = 'annualized returns')
abline(v = as.numeric(algoStats[1, ]), col = 'red')
hist(as.numeric(randomStats[2, ]), breaks = 20,
     main = 'histogram of monte carlo volatilities', xlab = 'annualized vol')
abline(v = as.numeric(algoStats[2, ]), col = 'red')
hist(as.numeric(randomStats[3, ]), breaks = 20,
     main = 'histogram of monte carlo Sharpes', xlab = 'Sharpe ratio')
abline(v = as.numeric(algoStats[3, ]), col = 'red')

# Normal-approximation one-sided p-values: z-score each signal process against
# the mean and sd of every portfolio (random and signal) for each statistic
allStats <- cbind(randomStats, algoStats)
aggregateMean <- apply(allStats, 1, mean)
aggregateDevs <- apply(allStats, 1, sd)
algoPs <- 1 - pnorm(as.matrix((algoStats - aggregateMean) / aggregateDevs))

# c(1:12) rather than 1:12, since ':' means interaction inside a formula
plot(as.numeric(algoPs[1, ]) ~ c(1:12), main = 'Return p-values',
     xlab = 'Formation period', ylab = 'P-value')
abline(h = 0.05, col = 'red')
abline(h = 0.1, col = 'green')

# Lower volatility is better, so use the lower-tail probability here
plot(1 - as.numeric(algoPs[2, ]) ~ c(1:12), ylim = c(0, 0.5),
     main = 'Annualized vol p-values', xlab = 'Formation period', ylab = 'P-value')
abline(h = 0.05, col = 'red')
abline(h = 0.1, col = 'green')

plot(as.numeric(algoPs[3, ]) ~ c(1:12), main = 'Sharpe p-values',
     xlab = 'Formation period', ylab = 'P-value')
abline(h = 0.05, col = 'red')
abline(h = 0.1, col = 'green')
```

And here are the results. In short, compared to monkeys throwing darts (to borrow some phrasing from the Price Action Lab blog), these signal processes are only marginally intelligent, if at all, depending on the variation one chooses. Still, it was recommended that I see this process through to the end and evaluate rules, so next time, I’ll evaluate one easy-to-implement rule.
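The p-values in the code above come from a normal approximation (`pnorm` on z-scores), which assumes the Monte Carlo statistics are roughly normally distributed. Since the thousand simulated values are already in hand, an empirical p-value (the share of random portfolios at least as good as the signal process) avoids that assumption. A minimal sketch, with `empiricalP` as a hypothetical helper and simulated Sharpe ratios standing in for `randomStats`:

```r
# Hypothetical helper: empirical one-sided p-value, i.e. the share of Monte
# Carlo draws at least as extreme as the observed statistic. The +1 terms
# count the observed value itself, so the p-value is never exactly zero.
empiricalP <- function(observed, simulated, higherIsBetter = TRUE) {
  extreme <- if (higherIsBetter) simulated >= observed else simulated <= observed
  (1 + sum(extreme)) / (1 + length(simulated))
}

# Hypothetical example: 1000 simulated Sharpe ratios and one observed Sharpe
set.seed(123)
simSharpes <- rnorm(1000, mean = 0.5, sd = 0.2)
obsSharpe <- 0.85

pEmpirical <- empiricalP(obsSharpe, simSharpes)
pNormal <- 1 - pnorm((obsSharpe - mean(simSharpes)) / sd(simSharpes))
print(c(empirical = pEmpirical, normal = pNormal))

# For volatility, lower is better
simVols <- rnorm(1000, mean = 0.12, sd = 0.02)
print(empiricalP(0.08, simVols, higherIsBetter = FALSE))
```

Assuming `algoStats` and `randomStats` as built in the post, the Sharpe row would be something like `sapply(1:12, function(j) empiricalP(algoStats[3, j], as.numeric(randomStats[3, ])))`, with `higherIsBetter = FALSE` for the volatility row.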

NOTE: while I am currently consulting, I am always open to networking, meeting up (Philadelphia and New York City both work), consulting arrangements, and job discussions. Contact me through my email at ilya.kipnis@gmail.com, or through LinkedIn.

## Comments on “Hypothesis Driven Development Part III: Monte Carlo In Asset Allocation Tests”

1. Emlyn says:

Hi,

Am enjoying the new testing focus.

The particular MC test above may not result in a stable testing distribution, because the number of possible rule sequences is equal to (num_assets)^(num_months_in_sample), i.e. permutations with repetition. If you’re only sampling 1000 paths, the sampling proportion is incredibly low, assuming a universe of 5 assets and a time period of ~144 months.

The second issue to consider is what null hypothesis your simulated distribution truly represents. Given the freedom of your simulation process, these simulated rules can be extremely different from the actual tested rule paths in terms of average asset holding duration, number of asset changes, average number of picks per asset over the sample period, etc. There is also the related issue of strong autocorrelation in the actual rule sequence, which the random generator breaks.

Finally, you’re testing 12 different rules and so there’s a multiple testing issue, whereby if you’re looking at all 12 rules in conjunction then your statistic for hypothesis testing is actually the MAXIMUM of the 12 means/sharpes/vols/etc. rather than the single values themselves.

Chapters 5 to 7 of David Aronson’s (2011) book, Evidence Based Technical Analysis, deal with some of these issues in a very readable and applicable manner.

Hope this helps,
Emlyn

• Ilya Kipnis says:

Hello Emlyn,

Regarding an unstable distribution: by that computation, there’s practically no such thing as a stable Monte Carlo distribution, since the number of permutations is larger than the number of subatomic particles in the universe.

Regarding autocorrelation in the rule sequence, yes, that’s true. That falls under the “how intelligent is the system” part. A random system completely ignores any autocorrelation, and any property of the markets.

Regarding the multiple testing issue, do you mean the maximum p-value across all the rules? That seems far too conservative, since different parameter sets can have vastly different meanings. For instance, a 10-20 SMA crossover is by nature vastly different from a 50-200 one. In any case, the reason I tested 1 through 12 is that it’s the classical momentum range.

• Emlyn says:

Agreed, you can never cover all the permutations, but you should at least strive to sample the no-skill hyperplane of possible rules efficiently: on the one hand, with smart pseudo-random sequencing and variance reduction techniques (my first point); on the other hand, by controlling your random generation process so that you limit your search to a smaller, more relevant subspace of the hyperplane (my second point).

An approach which controls for this is to use the set of signals as is (i.e. maintain the exact characteristics of the rule sequence) but pair these given sequences with randomly scrambled sets of market returns. This proxies the hypothesis that the rule sequence was generated from a no-skill strategy.

What I meant with the multiple testing is that your probability of finding a ‘good’ strategy increases with the number of strategies considered, irrespective of whether they’re all random or not. Obviously, the more correlated the strategies are, the slower this probability will grow. Given that you’re looking at 12 strategies above, a correct test for whether any of these strategies is actually ‘good’ means that you need to generate 12 strategies in each simulated run and record only the best value of each statistic. Then repeat this process K times to get the relevant no-skill distributions for considering 12 strategies simultaneously.

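Emlyn’s two suggestions in the thread above can be combined into a single test. Keep each rule’s actual asset-selection sequence fixed, shuffle whole months of returns so the timing link between signal and return is broken, and record only the best Sharpe ratio among the 12 rules in each shuffled run. Comparing each rule’s observed Sharpe against that distribution of maxima accounts for having looked at 12 rules at once. The sketch below uses hypothetical simulated data (`toyRets`, `toySignals`) and a hypothetical helper `ruleSharpes`. It is one way to implement the idea under those assumptions, not the approach used in the post.

```r
# Hypothetical toy inputs: 144 months x 5 assets of returns, plus 12 signal
# sequences giving the asset (column index) held in each month
set.seed(1)
nMonths <- 144; nAssets <- 5; nRules <- 12
toyRets <- matrix(rnorm(nMonths * nAssets, 0.005, 0.04), ncol = nAssets)
toySignals <- matrix(sample.int(nAssets, nMonths * nRules, replace = TRUE),
                     ncol = nRules)

# Hypothetical helper: annualized Sharpe (zero risk-free rate, monthly data)
# of holding asset signals[t, j] in month t, for each rule j
ruleSharpes <- function(rets, signals) {
  apply(signals, 2, function(s) {
    r <- rets[cbind(seq_len(nrow(rets)), s)]
    mean(r) / sd(r) * sqrt(12)
  })
}

observed <- ruleSharpes(toyRets, toySignals)

# Null distribution of the BEST of the 12 rules: signal sequences stay fixed,
# whole months of returns are shuffled (keeping each month's cross-section
# intact), and only the maximum Sharpe of each run is recorded
K <- 1000
nullMax <- replicate(K, {
  shuffled <- toyRets[sample.int(nMonths), , drop = FALSE]
  max(ruleSharpes(shuffled, toySignals))
})

# One-sided p-value of each rule against the distribution of maxima
pAdjusted <- sapply(observed, function(o) (1 + sum(nullMax >= o)) / (1 + K))
print(round(pAdjusted, 3))
```

Comparing every rule to the distribution of maxima is a single-step adjustment, so it is conservative for rules other than the best one.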