Day 20: Strategy Sample

On Day 19, we introduced circular block sampling and used it to test the likelihood that the 200-day SMA strategy would outperform buy-and-hold over a five-year period. We found that the 200-day strategy outperformed buy-and-hold a little over 25% of the time across 1,000 simulations. The frequency of the 200-day’s Sharpe Ratio exceeding buy-and-hold’s was about 30%. Today, we apply the same analysis to the 12-by-12 strategy. When we run the simulations, we find that the strategy has an average underperformance of 11 percentage points vs. buy-and-hold.
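
To make the mechanics concrete, here is a minimal sketch of what a circular block bootstrap of this kind can look like in Python. It uses only NumPy; the block size, seed, and function names are illustrative and not the post’s exact parameters.

```python
import numpy as np

def circular_block_indices(n, block_size, rng):
    """Build an index of length n from circular (wrap-around) blocks of fixed length."""
    n_blocks = int(np.ceil(n / block_size))
    starts = rng.integers(0, n, size=n_blocks)
    return np.concatenate([(np.arange(block_size) + s) % n for s in starts])[:n]

def outperformance_freq(strat_rets, bh_rets, n_sims=1000, block_size=10, seed=42):
    """Share of simulations in which the strategy's cumulative return beats buy-and-hold.

    strat_rets and bh_rets are NumPy arrays of periodic returns of equal length.
    """
    rng = np.random.default_rng(seed)
    wins = 0
    for _ in range(n_sims):
        # Use the same block indices for both series so each simulation
        # compares the strategy and benchmark over an identical history.
        idx = circular_block_indices(len(strat_rets), block_size, rng)
        wins += np.prod(1 + strat_rets[idx]) > np.prod(1 + bh_rets[idx])
    return wins / n_sims
```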

Day 19: Circular Sample

On Day 18 we started to discuss simulating returns to quantify the probability of success for a strategy out-of-sample. The reason: we were unsure how much merit to give the 12-by-12’s performance relative to the 200-day SMA. We discussed various simulation techniques and settled on historical sampling that accounts for the autocorrelation of returns. We then showed the autocorrelation plots for the strategy and the three benchmarks – buy-and-hold, 60-40, and 200-day – with up to 10 lags.
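
For reference, a quick way to produce plots like those is statsmodels’ plot_acf. The sketch below assumes a DataFrame rets of periodic returns with one column per series (the column names and 2x2 layout are assumptions, not the post’s code):

```python
import matplotlib.pyplot as plt
from statsmodels.graphics.tsaplots import plot_acf

# `rets` is assumed to hold one return column per series: the strategy plus
# the buy-and-hold, 60-40, and 200-day benchmarks.
def acf_panel(rets, lags=10):
    fig, axes = plt.subplots(2, 2, figsize=(10, 6))
    for ax, col in zip(axes.ravel(), rets.columns):
        plot_acf(rets[col].dropna(), lags=lags, ax=ax, title=col)
    fig.tight_layout()
    return fig
```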

Day 18: Autocorrelation Again!

On Day 17, we compared the 12-by-12 and 200-day SMA strategies in terms of the magnitude and duration of drawdowns, finding in favor of the 200-day. We also noted that most of the difference in performance was due to two periods at the beginning and end of the window we were examining. That suggests regime-driven performance, which raises the question: how much can we rely on our analysis if we have no certainty that any of the regimes identified will recur, or even be forecastable?
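
As a rough illustration of how drawdown magnitude and duration can be measured from a return series, here is a pandas sketch (not the post’s actual code; periods are whatever frequency the returns are in):

```python
import pandas as pd

def drawdown_stats(returns: pd.Series):
    """Max drawdown depth and longest drawdown duration (in periods) for a return series."""
    wealth = (1 + returns).cumprod()
    peak = wealth.cummax()
    dd = wealth / peak - 1
    max_depth = dd.min()
    underwater = dd < 0
    # Longest consecutive run of underwater periods
    runs = underwater.astype(int).groupby((~underwater).cumsum()).sum()
    return max_depth, int(runs.max())
```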

Day 17: Drawdowns

On Day 16, we showed the adjusted 12-by-12 strategy with full performance metrics against buy-and-hold, the 60-40 SPY-IEF ETF portfolio, and the 200-day SMA strategy. In all cases, it tended to perform better than the benchmarks. Against the 200-day SMA, however, that outperformance came primarily at the end of the period. This raises the question of what to make of the performance differences between the 12-by-12 and 200-day SMA strategies.
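
For readers following along, a compact version of the kind of metric table used for such comparisons might look like the sketch below. It assumes daily returns in a DataFrame rets with one column per strategy and ignores the risk-free rate in the Sharpe calculation; none of this is a copy of the post’s code.

```python
import numpy as np
import pandas as pd

def perf_table(rets: pd.DataFrame, periods_per_year=252):
    """Cumulative return, annualized return/volatility, and Sharpe per column of daily returns."""
    cum = (1 + rets).prod() - 1
    ann_ret = (1 + rets).prod() ** (periods_per_year / len(rets)) - 1
    ann_vol = rets.std() * np.sqrt(periods_per_year)
    sharpe = ann_ret / ann_vol          # risk-free rate assumed to be zero
    return pd.DataFrame({"cum_return": cum, "ann_return": ann_ret,
                         "ann_vol": ann_vol, "sharpe": sharpe})
```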

Day 16: Comps

On Day 15 we adjusted our model to use more recent data to forecast the 12-week forward return. As before, we used that forecast to generate a trading signal: go long SPY if the forecast is positive, and exit (or go short, for the long-short strategy) otherwise. This tweak generated about 10 percentage points of cumulative outperformance and a Sharpe Ratio about 20 percentage points higher.
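
The signal logic itself is simple. A minimal sketch follows; the Series name forecast and the long_short flag are illustrative, not the post’s identifiers.

```python
import numpy as np
import pandas as pd

# `forecast` is assumed to be a Series of 12-week forward-return forecasts,
# indexed by the date on which each forecast is made.
def to_signal(forecast: pd.Series, long_short=False) -> pd.Series:
    """Long (1) when the forecast is positive; flat (0) or short (-1) otherwise."""
    if long_short:
        return np.sign(forecast)
    return (forecast > 0).astype(int)

# Strategy returns: apply the prior period's signal to today's return to avoid look-ahead.
# strat_rets = to_signal(forecast).shift(1) * spy_rets
```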

Day 15: Backtest II

On Day 14 we showed how the trading model we built was snooping and provided one way to correct it. Essentially, we ensure that the date on which we actually have the target variable data aligns with the date on which the trading signals are produced. We then fed the value at the next time step into the model to generate a forecast. If the forecast was positive, we’d go long the SPY ETF; if negative, we’d stay out of the market or go short, depending on the strategy.
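
A walk-forward sketch of that alignment is below. It assumes a DataFrame df with a trailing-return "feature" column and a 12-week forward-return "target" column; the OLS fit, the 52-week minimum training window, and the weekly frequency are assumptions, not necessarily the post’s choices.

```python
import pandas as pd
import statsmodels.api as sm

def walk_forward_forecast(df: pd.DataFrame, horizon=12, min_train=52):
    """At each week t, fit only on rows whose forward return has already
    resolved (rows dated before t - horizon), then forecast with the
    feature observed at t."""
    preds = {}
    for t in range(min_train + horizon, len(df)):
        train = df.iloc[: t - horizon]          # targets fully known by week t
        X = sm.add_constant(train["feature"])
        fit = sm.OLS(train["target"], X).fit()
        # Forecast uses only the most recent feature value, known at t.
        preds[df.index[t]] = fit.predict([1.0, df["feature"].iloc[t]])[0]
    return pd.Series(preds)
```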

Day 14: Snooping

Guess what? The model we built in our last post actually suffers from snooping. We did this deliberately to show how easy it is to get mixed up when translating forecasting models into trading signals. Let’s explain. Our momentum model uses a 12-week cumulative return lookback to forecast the next 12-week cumulative return. That may have produced a pretty good explanatory model compared to the others. But we need to be careful.
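
As a reminder of the setup, a stripped-down version of the 12-by-12 feature/target construction might look like this. Weekly prices and pct_change are assumptions about the implementation, not a copy of the post’s code.

```python
import pandas as pd

# `prices` is assumed to be a weekly SPY closing-price series.
def twelve_by_twelve(prices: pd.Series, lookback=12, horizon=12) -> pd.DataFrame:
    feature = prices.pct_change(lookback)                  # trailing 12-week return
    target = prices.pct_change(horizon).shift(-horizon)    # forward 12-week return
    return pd.DataFrame({"feature": feature, "target": target}).dropna()
```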

Day 13: Backtest I

Unlucky 13! Or contrarian indicator? There’s really nothing so heartwarming as magical thinking. Whatever the case, on Day 12 we iterated through 320 different model and train-step combinations to settle on 10 potential candidates. Today, we look at the best-performing candidate and walk through the process of seeing whether its forecasts produce a viable trading strategy. As we noted before, we could have launched directly into testing Fibonacci retracements with Bollinger Band breakouts filtered by Chaikin Volatility, but what would be the thesis as to why these indicators work better than others?
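
That selection step amounted to a grid search scored by forecast error. A hypothetical outline is below; the grids, the fit_and_forecast helper, and the RMSE ranking are illustrative and do not reproduce the post’s actual 320 combinations.

```python
import itertools
import numpy as np
import pandas as pd

def grid_search(df, train_windows, horizons, fit_and_forecast):
    """Score each (train window, forecast horizon) pair by out-of-sample RMSE
    and keep the 10 best. `fit_and_forecast` is a user-supplied callable that
    returns an array of forecast errors for a given combination."""
    results = []
    for window, horizon in itertools.product(train_windows, horizons):
        errors = fit_and_forecast(df, window, horizon)
        results.append({"window": window, "horizon": horizon,
                        "rmse": np.sqrt(np.mean(np.square(errors)))})
    return pd.DataFrame(results).sort_values("rmse").head(10)
```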

Day 12: Iteration

On Day 11, we presented an initial iteration of train/forecast steps to see if one combination performs better than another. Our metric of choice was root mean-squared error (RMSE), which is frequently used to compare model performance in machine learning circles. The advantage of RMSE is that it is in the same units as the forecast variable. The drawback is that it is tough to interpret on its own: a model with an RMSE of 10 is better than one with an RMSE of 20, but neither number says whether the model is any good in absolute terms.
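
For completeness, the metric itself is a one-liner (a NumPy sketch):

```python
import numpy as np

def rmse(actual, predicted):
    """Root mean-squared error: same units as the forecast variable."""
    actual, predicted = np.asarray(actual), np.asarray(predicted)
    return np.sqrt(np.mean((actual - predicted) ** 2))

# rmse(y_test, model_a_preds) < rmse(y_test, model_b_preds) favors model A,
# but neither number says whether either model is good in absolute terms.
```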

Day 11: Autocorrelation

On Day 10, we analyzed the performance of the 12-by-12 model by examining the predicted values and residuals. Our initial takeaway was that the model did not seem overly biased or misspecified in the -10% to 10% region. But when it gets outside that range, watch out! We suspected that there was some autocorrelation in the residuals, which we want to discuss today. There are different statistical tests for autocorrelation and normality that would take too long to explain in a blog post.
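
Two common checks, which may or may not be the ones used in the post, are the Ljung-Box test for residual autocorrelation and the Jarque-Bera test for normality. A statsmodels sketch:

```python
from statsmodels.stats.diagnostic import acorr_ljungbox
from statsmodels.stats.stattools import jarque_bera

def residual_diagnostics(residuals, lags=10):
    """Ljung-Box (autocorrelation up to `lags`) and Jarque-Bera (normality) p-values
    for a model's residual series. Small p-values reject the well-behaved null."""
    lb = acorr_ljungbox(residuals, lags=[lags], return_df=True)
    jb_stat, jb_pvalue, skew, kurt = jarque_bera(residuals)
    return {"ljung_box_pvalue": float(lb["lb_pvalue"].iloc[0]),
            "jarque_bera_pvalue": float(jb_pvalue)}
```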