My Backtest, So Beautiful!

Yiannis Pavlou
Dec 12, 2023

Most of us working on data-science projects use historical data to predict future values, and the vast majority of us have fallen for the allure of the good-looking historical backtest chart. How can one resist? I admit it, I fell for it too. Why would anyone pick a choppy, rugged, ugly-looking chart, yielding less profit with a higher drawdown, when they could choose one that is smooth, exponential and generates untold wealth over the same period? The answer is that in this case (as perhaps in others) beauty can be skin deep: the beautiful curve can have zero or negative merit in live trading, while the rugged one may produce live results that match its backtest performance (which is favourable, otherwise you’d never go live with it).

The challenge is that you can’t tell which is which simply by looking at the symmetry, beauty, smoothness, ruggedness, or any other look-and-feel aspect of the chart. Believe me, I’ve tried for countless hours, and you can’t tell. The ugly one may perform well, badly, or neutrally on live data, and so can the pretty one. The one that performed well recently can perform well, badly, or neutrally live, and the same goes for the one that didn’t. Inspecting the chart for visual traits, or picking the one with the strongest recent performance, has almost no bearing on forward performance. I’ve looked at these charts for thousands of hours, and the only thing I can tell you is that if the chart is going down and to the right, there is a very high probability that it will do the same in a live environment. OK, so we have established not to invest in an algorithm whose historical chart goes down and to the right.

I know this blog article would be depressing if I left you here. It might discourage people from entering algorithmic trading, and perhaps save them a lot of sweat, effort and disappointment, but that is hardly how one wants to contribute to the world. In my first blog article I explained some techniques to increase the quality and robustness of your algorithm. Today I will discuss the most important one: parameter robustness optimization. Intuition tells you to invest in the parameters that generate the perfect historical backtest. Even though I have told you, even though you have read it elsewhere, and even though I will tell you what to do instead, remember me: you will still be susceptible to the charms of the perfect chart with the perfect historical performance. You have been warned.

The Solution: Optimizing for Parameters with Positive Fitness AND Robustness

Effectively, this is about finding parameters that not only achieve a positive fitness (the net profit of a financial algorithm over a period) but also score high on a robustness criterion, meaning they live in a parameter ‘zone’, ‘range’ or ‘surface’ where all values have positive fitness. If market conditions were to change somewhat, there is then a much lower likelihood that the algorithm’s performance would collapse. Think of this as ‘tuning’ sensitivity. Imagine a radio station broadcasting in stereo with crystal-clear sound on its centre frequency, but with two neighbouring channels very close to that frequency. The chance of interference from the other channels, caused by wind or obstructions, is high. As a counterexample, imagine the same station broadcasting at another frequency, with lower sound quality but no neighbouring channels to interfere. The high sound quality won’t matter much if it’s full of interruptions. I’d pick the uninterrupted channel every time. But that’s just me … you could pick the perfect sound with lots of noise and interruptions.

So the goal during optimization is to find algorithm parameters that have positive fitness (sound quality, in the analogy) while also living in a stable parameter range (no nearby channels that can cause interference).
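One way to put a number on "living in a stable parameter range" is to score each parameter pair by the share of its tested neighbours that also have positive fitness. The sketch below is only an illustration of that idea, not the method used in our algo: the function name, the `radius` setting and the fitness surface are all made up for the example.

```python
from typing import Dict, Tuple

Params = Tuple[int, int]  # (p1, p2)


def neighbourhood_robustness(
    fitness: Dict[Params, float], point: Params, radius: int = 1
) -> float:
    """Share of tested points within `radius` of `point` (the point itself
    included) whose fitness is positive. `radius` is an assumed setting."""
    p1, p2 = point
    tested = 0
    positive = 0
    for d1 in range(-radius, radius + 1):
        for d2 in range(-radius, radius + 1):
            value = fitness.get((p1 + d1, p2 + d2))
            if value is None:  # combination was never tested
                continue
            tested += 1
            if value > 0:
                positive += 1
    return positive / tested if tested else 0.0


# Made-up fitness surface: losses everywhere, one isolated peak, one plateau.
fitness = {(a, b): -100.0 for a in range(1, 11) for b in range(1, 11)}
fitness[(2, 2)] = 3000.0  # perfect sound, interfering channels all around
for a in range(6, 10):
    for b in range(6, 10):
        fitness[(a, b)] = 1000.0  # lower quality, but no interference nearby

peak_score = neighbourhood_robustness(fitness, (2, 2))
plateau_score = neighbourhood_robustness(fitness, (7, 7))
print(f"isolated peak: {peak_score:.2f}, plateau: {plateau_score:.2f}")
```

On this invented surface the isolated peak has the higher fitness but the lower robustness score, which is exactly the trade-off the radio analogy describes.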

I’ll use the simplest possible algorithmic example to demonstrate. Imagine we are looking at two moving averages, MAslow and MAfast. When MAfast crosses above MAslow we enter a BUY order, and when MAfast crosses below MAslow we enter a SELL order. MAslow and MAfast each have a lookback period (how many points back to average to get the result); we will call those p1 and p2.
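For readers who prefer code, the crossover rule can be sketched in a few lines of Python. This is a minimal illustration with a simple moving average and an invented price series; the function names and the HOLD label are assumptions for the example, and nothing here reflects how our own filters work.

```python
from typing import List, Optional


def simple_moving_average(prices: List[float], period: int) -> List[Optional[float]]:
    """Average of the last `period` prices; None until enough points exist."""
    averages: List[Optional[float]] = []
    running = 0.0
    for i, price in enumerate(prices):
        running += price
        if i >= period:
            running -= prices[i - period]  # drop the point leaving the window
        averages.append(running / period if i >= period - 1 else None)
    return averages


def crossover_signals(prices: List[float], slow_period: int, fast_period: int) -> List[str]:
    """BUY when MAfast crosses above MAslow, SELL when it crosses below."""
    slow = simple_moving_average(prices, slow_period)
    fast = simple_moving_average(prices, fast_period)
    signals = ["HOLD"] * len(prices)
    for i in range(1, len(prices)):
        if None in (slow[i - 1], slow[i], fast[i - 1], fast[i]):
            continue  # not enough history for both averages yet
        if fast[i - 1] <= slow[i - 1] and fast[i] > slow[i]:
            signals[i] = "BUY"
        elif fast[i - 1] >= slow[i - 1] and fast[i] < slow[i]:
            signals[i] = "SELL"
    return signals


# Invented prices: a fall, a rise, then another fall.
prices = [5, 4, 3, 2, 1, 2, 3, 4, 5, 4, 3, 2, 1]
signals = crossover_signals(prices, slow_period=3, fast_period=2)
print(signals)
```

The two lookback periods passed to `crossover_signals` are the only tunable inputs, which is what makes this a convenient two-parameter example.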

We have two parameters that can be optimized to give us the best result: the p1 lookback period and the p2 lookback period. The results I will share with you are not from this simple system; they come from two parameters in our released algo, Aieden DYNAMIC, but the analogy works. We’re proud of these proprietary filters and not yet ready to share with the world how they work.

I’m going to parameterize these MA filters, choosing p1 and p2 within a range of 1 to 35 (for both). I then run a genetic algorithm optimization (simply put, a tool to optimize faster) on p1 and p2 in cTrader and take the top 20 results. Think of the fitness as net profit over a three-year period (it is a little more complicated than that, but for that read the previous article). Here we will compare the 1st and 10th best fitness results for the parameters (highlighted in yellow).
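To make the shape of this step concrete, here is a rough Python sketch of ranking parameter pairs by fitness. Note that it swaps the genetic algorithm for a plain exhaustive search, which is feasible here because a 1 to 35 range on two parameters gives only 1,225 combinations. The `toy_fitness` function is a placeholder standing in for a real backtest, and none of this uses the cTrader API.

```python
from itertools import product
from typing import Callable, List, Tuple

Result = Tuple[Tuple[int, int], float]  # ((p1, p2), fitness)


def top_results(
    fitness_fn: Callable[[int, int], float],
    low: int = 1,
    high: int = 35,
    top_n: int = 20,
) -> List[Result]:
    """Score every (p1, p2) pair in [low, high] and keep the best `top_n`."""
    scored = [
        ((p1, p2), fitness_fn(p1, p2))
        for p1, p2 in product(range(low, high + 1), repeat=2)
    ]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_n]


def toy_fitness(p1: int, p2: int) -> float:
    # Placeholder only: a real fitness would come from running a backtest.
    return 1000.0 - 10.0 * ((p1 - 19) ** 2 + (p2 - 16) ** 2)


ranked = top_results(toy_fitness)
for params, fitness in ranked[:3]:
    print(params, fitness)
```

A genetic algorithm reaches a similar ranked list without scoring every combination, which matters once there are more parameters or slower backtests.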

The best fitness is roughly 2.7x the 10th best fitness. This is where the allure comes in: why would you ever deploy parameters that generate $11,309 in profit rather than parameters that generate $30,103 in profit over the same time period? You shouldn’t, right? Right?

Now I will attempt to answer my own question. If I plot each parameter value (for p1 and for p2) against fitness, I can see which parameter values score highest. As the results above show, the best fitness is achieved when p1 = 4 and p2 = 4. Now look at the chart below and something interesting emerges. For p1 = 4 there is only one result with positive fitness. For p2, on the other hand, there are several high-fitness results where p2 = 4.

From this we should conclude that p1 = 4 is not robust (it has sound quality but very close interfering channels), as it sits on a very isolated peak in the parameter space. We shouldn’t pick p1 = 4 for exactly this reason: it is vulnerable to very small deviations of live data from the historical backtest. The more interesting question is about p2 = 4. While this parameter seems highly robust (no neighbouring channels), we must look at the set of results that contain it and check the p1 value it is paired with. If p2 = 4 appears robust but only achieves positive fitness when paired with p1 values that are not robust, we have an issue with its sister parameter.
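The sister-parameter check can be expressed as a small rule: count how many positive-fitness results sit at or next to each parameter value, and treat a pair as only as robust as its weaker parameter. The sketch below is one hypothetical way to do that. The `window` setting, the function names and every number in `results` are invented for illustration and are not our optimization output.

```python
from typing import List, Tuple

Result = Tuple[Tuple[int, int], float]  # ((p1, p2), fitness)


def support(results: List[Result], index: int, value: int, window: int = 1) -> int:
    """Number of positive-fitness results whose parameter at `index`
    (0 for p1, 1 for p2) lies within `window` of `value`."""
    return sum(
        1
        for params, fitness in results
        if fitness > 0 and abs(params[index] - value) <= window
    )


def pair_support(results: List[Result], candidate: Tuple[int, int], window: int = 1) -> int:
    """A pair is only as robust as its weaker (sister) parameter."""
    return min(
        support(results, 0, candidate[0], window),
        support(results, 1, candidate[1], window),
    )


# Invented results: p2 = 4 is well supported, but its p1 partners are scattered.
results: List[Result] = [
    ((4, 4), 300.0),
    ((30, 4), 250.0),
    ((12, 4), 240.0),
    ((19, 16), 110.0),
    ((18, 16), 100.0),
    ((20, 15), 95.0),
    ((19, 17), 90.0),
    ((5, 9), -50.0),
]

print(pair_support(results, (4, 4)))    # limited by the isolated p1 value
print(pair_support(results, (19, 16)))  # both parameters have neighbours
```

Taking the minimum of the two counts is a deliberately strict choice: a well-supported p2 cannot compensate for a p1 that sits alone.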

So let’s look at the high-fitness results where p2 = 4. From the table above you can see that p2 = 4 for the top 9 fitness results. Across those 9 results, p1 takes the values 4, 35, 34, 33, 28, 2, 32 and 27. When we look at those zones in the p1 parameter chart, we see that none of those values is very robust.

This is very disappointing: it suggests that our top 9 fitness results are not very robust. What about the 10th highest fitness result, where p1 = 19 and p2 = 16? When we look at those regions on the p1 and p2 charts of fitness by parameter value, we see that both p1 = 19 and p2 = 16 are VERY robust: multiple results include each value, and the values to the left and right of them also have a variety of positive fitness results. Could it be that the 10th highest fitness result, with a net profit of only $11,309 on historical data, could perform better in a live environment than our top result with a net profit of $30,103? I believe the answer is yes. Its live performance should closely match the backtest, while the unrobust parameters are likely to produce a large deviation between backtest results and reality. Don’t take my word for it: we commit to creating two demonstration accounts running two versions of our Aieden DYNAMIC algo, one with parameters optimized for peak fitness and one optimized for ‘robust’ fitness and parameters. We commit to making our results public on this blog, and perhaps even building them into a live demonstration on cTrader Copy.