What 23 Trades Taught Me About Algorithmic Trading
I launched Protogen Max — my live trading system — with two strategies, some well-reasoned theory, and a conviction that careful backtesting would translate cleanly to live performance. A few weeks later, I had run 23 real trades and done a full risk audit.
The audit was humbling. Both strategies came back with negative expected value. Not marginally negative. Clearly negative.
This is what I learned.
1. Theory and live edge are not the same thing
A strategy can make complete theoretical sense and still not have edge in live markets.
Mean reversion is real. Funding rate dynamics are real. There are professionals making money on both. But a strategy that correctly identifies that these phenomena exist doesn’t automatically have edge — it also needs to correctly identify when the conditions for the phenomenon are present, and when to stay out.
I built strategies around real market dynamics and deployed them before I had solved the harder problem: regime detection. When should this strategy be active? When should it sit on its hands?
Without an answer to that question, every signal looks like a trade. And some of those signals will be fired directly into adverse conditions — trending markets, momentum-driven assets, situations where the underlying assumption of the strategy is wrong.
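A regime gate doesn't have to be sophisticated to be better than nothing. Here's a minimal sketch of the idea, assuming a simple moving-average slope as the trend proxy — the window and threshold are illustrative placeholders, not values from the live system:

```python
# Hypothetical regime filter: only allow mean-reversion entries when the
# market is not visibly trending. "Trending" is approximated by how far
# the simple moving average has drifted over a recent window -- a
# stand-in for whatever regime detection the strategy actually needs.

def is_trending(prices, window=20, slope_threshold=0.002):
    """Return True if the SMA moved more than `slope_threshold`
    (as a fraction of price) between the previous window and this one."""
    if len(prices) < 2 * window:
        return False  # not enough data; default to "not trending"
    older = sum(prices[-2 * window:-window]) / window
    recent = sum(prices[-window:]) / window
    return abs(recent - older) / older > slope_threshold

def should_take_mean_reversion_signal(prices):
    # The entry signal is computed elsewhere; this gate answers only:
    # "are conditions right for this strategy to be active at all?"
    return not is_trending(prices)
```

The point is the separation of concerns: the signal says "a trade exists," the gate says "this strategy is allowed to act right now." Without the second piece, every signal looks like a trade.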
The theoretical thesis survived. The execution of the thesis needed work.
2. The failure mode usually appears exactly once before it matters
Both strategies worked in early testing. Both strategies worked in the first live sessions.
Then, on a specific day, Strategy A fired a string of counter-trend entries in a trending market. The failure mode I hadn’t thought to build against. Not random bad luck — the specific scenario the strategy couldn’t handle.
Strategy B’s failure mode was subtler: a long hold in a trending market where the signal that triggered the trade was being generated by the trend itself, not by a positioning imbalance. The strategy couldn’t distinguish between those two states.
In both cases, the failure was not surprising in retrospect. If I’d thought carefully enough about the strategy’s assumptions, I would have seen: “and if the assumption is violated, here’s what happens.” What looks like a blindspot in execution is usually a blindspot in design.
The lesson: before going live, explicitly ask: what would have to be true for this strategy to produce a long sequence of losses? Then check whether current market conditions match that description. If they do, wait.
3. Position sizing and per-trade risk are different problems
I had a per-trade risk framework. Stop-loss at a fixed distance, maximum loss on any single trade defined and bounded. I thought that meant I had risk management covered.
During the audit I discovered that the position sizing formula — which was otherwise sensible — was sizing individual trades as a large fraction of total capital on low-volatility days. Not a large dollar amount. A large fraction of the account.
Those are different things. A small dollar loss on a position that represents 60% of your capital is still 60% of your capital at risk in a single trade. If the strategy enters multiple trades in a session, you can be simultaneously right about per-trade dollar exposure and completely wrong about portfolio concentration.
The post-audit change was simple: hard cap on position size as a fraction of account. The fraction was a number I should have been tracking from day one but wasn’t, because I was tracking dollar risk, not fractional exposure.
Dollar risk and fractional exposure are both real constraints. Track both.
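The fix is small enough to show. A sketch of the dual constraint, with made-up numbers — the stop distance, risk budget, and 25% cap are illustrative, not the live system's values:

```python
# Size a position so that BOTH constraints hold at once:
#   1. loss at the stop is at most `max_dollar_risk`, and
#   2. notional exposure is at most `max_account_fraction` of equity.
# All parameter values below are hypothetical.

def position_size(account_equity, price, stop_distance,
                  max_dollar_risk, max_account_fraction=0.25):
    # Constraint 1: dollar risk at the stop.
    size_by_risk = max_dollar_risk / stop_distance
    # Constraint 2: fractional exposure cap -- the check that was missing.
    size_by_exposure = (account_equity * max_account_fraction) / price
    return min(size_by_risk, size_by_exposure)

# On a low-volatility day the stop is tight, so risk-based sizing alone
# balloons: $5 of risk with a $0.50 stop allows 10 units of a $100
# asset, i.e. $1,000 notional on a $270 account. The cap binds instead.
size = position_size(account_equity=270, price=100,
                     stop_distance=0.5, max_dollar_risk=5)
# size_by_risk = 10 units; size_by_exposure = 0.675 units -> cap wins
```

With a wide stop the risk constraint binds and the cap is irrelevant; with a tight stop the cap binds. Taking the minimum of the two is the whole change.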
4. Exchange-managed stops are not optional — they’re architecture
Protogen Max uses native stop-loss and take-profit orders attached at the exchange level, set at trade entry.
This was a design decision I made early, and it turned out to be the most important risk decision in the system. When the daemon crashes, when the network drops, when the session ends unexpectedly — the stop is still there. The worst loss on any trade is bounded regardless of whether any software is running.
Several times during the first few weeks, the daemon restarted mid-session. Every time, it came back up, reconciled the open position correctly, and the exchange-managed stops were still live. None of the infrastructure complexity mattered to my risk exposure because the risk management was at the exchange layer, not the application layer.
The lesson: for any automated trading system, the critical question is: what happens to risk management when the software stops? If the answer is "the risk management stops too," that's a single point of failure in the most important part of the system.
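The entry-time pattern can be sketched in a few lines. Everything here is hypothetical — `FakeExchange` stands in for a real exchange SDK, and every method name is an assumption, not a real API — but the ordering is the point: protective orders are accepted by the exchange before the trade is considered open.

```python
# Sketch: attach exchange-native stop-loss and take-profit at entry,
# so protection lives at the exchange layer, not the application layer.

class FakeExchange:
    """Minimal in-memory stand-in for a real exchange client."""
    def __init__(self):
        self.orders = []
    def place_order(self, symbol, side, size, price):
        self.orders.append(("entry", symbol, side, size, price))
    def place_stop_loss(self, symbol, size, stop_price):
        self.orders.append(("stop_loss", symbol, size, stop_price))
    def place_take_profit(self, symbol, size, tp_price):
        self.orders.append(("take_profit", symbol, size, tp_price))
    def close_position(self, symbol):
        self.orders.append(("close", symbol))

def enter_with_protection(client, symbol, side, size,
                          entry_price, stop_price, take_profit_price):
    """Open a position and attach exchange-managed SL/TP immediately.
    If a protective order fails, flatten: an unprotected open position
    is treated as an error state, never a normal one."""
    client.place_order(symbol, side, size, entry_price)
    try:
        client.place_stop_loss(symbol, size, stop_price)
        client.place_take_profit(symbol, size, take_profit_price)
    except Exception:
        client.close_position(symbol)  # never hold an unprotected position
        raise

ex = FakeExchange()
enter_with_protection(ex, "BTC-USD", "buy", 0.01,
                      entry_price=60000, stop_price=58800,
                      take_profit_price=62400)
# The exchange now holds the stop and take-profit; if the daemon dies,
# the worst-case loss on this trade is still bounded at the stop.
```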
5. The audit tooling is part of the strategy
I was able to run a full performance audit — compute expected value metrics by strategy, identify failure modes, segment results by market condition — because the logging was clean from day one.
Every trade: timestamped, tagged by strategy, with entry/exit reason, P&L, fees, and metadata. All in a local database. Not a text log. Structured data.
This sounds like overhead. In practice it’s what let me turn “something isn’t working” into a specific diagnosis in a few hours instead of days of forensics.
The quality of your learning is bounded by the quality of your records. Systems that log well can be audited. Systems that don’t can only be felt. I’d rather audit.
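To make "structured data, not a text log" concrete, here's a minimal sketch using the standard library's sqlite3. The schema and field names are illustrative, not the actual Protogen Max schema:

```python
# Minimal structured trade log: every trade is a row, every row is
# tagged, so audits are queries instead of forensics.
import sqlite3

conn = sqlite3.connect(":memory:")  # the real system would use a file
conn.execute("""
    CREATE TABLE IF NOT EXISTS trades (
        id           INTEGER PRIMARY KEY,
        ts           TEXT NOT NULL,   -- ISO-8601 timestamp
        strategy     TEXT NOT NULL,   -- which strategy fired
        entry_reason TEXT,
        exit_reason  TEXT,
        pnl          REAL,            -- net of fees
        fees         REAL,
        regime       TEXT             -- market-condition tag
    )
""")
conn.execute(
    "INSERT INTO trades (ts, strategy, entry_reason, exit_reason,"
    " pnl, fees, regime) VALUES (?, ?, ?, ?, ?, ?, ?)",
    ("2025-01-15T14:03:00Z", "A", "mean_reversion_signal",
     "stop_loss", -4.2, 0.31, "trending"),
)

# "Segment results by market condition" becomes a one-line query:
rows = conn.execute(
    "SELECT strategy, regime, COUNT(*), SUM(pnl)"
    " FROM trades GROUP BY strategy, regime"
).fetchall()
```

The `regime` tag is the field that pays for itself: without it, answering "how does Strategy A do in trending markets?" means re-deriving market conditions from timestamps after the fact.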
6. Negative expected value is a specific, actionable finding
The audit produced a clear verdict: both strategies negative EV at this sample size.
Negative EV is not a discouraging signal. It’s a diagnostic. It means: the strategy’s win rate and average win/loss ratio do not support positive long-run growth. And because the Kelly criterion makes this calculation explicit, it also points directly at which component is off — win rate too low, payoff ratio too poor, or both.
Strategy A’s problem was win rate — the trend-following losses dragged it down. Fix the trend filter, win rate recovers.
Strategy B’s problem was payoff ratio — the occasional large loss from extended holds outweighed multiple small wins. Fix the hold-time discipline, payoff ratio recovers.
Neither diagnosis says “abandon the thesis.” Both say “fix the identified component, retest, confirm Kelly is positive before returning to live capital.” That’s a plan. Negative EV handed me a plan.
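The arithmetic behind that diagnosis is short enough to show. The win rates and payoff ratios below are made-up illustrative numbers shaped like the two failure modes, not the audit's actual figures:

```python
# Per-trade expected value and the Kelly criterion, which together
# point at WHICH component is broken: win rate, payoff ratio, or both.

def expected_value(win_rate, avg_win, avg_loss):
    """EV per trade: p*W - (1-p)*L, with L given as a positive number."""
    return win_rate * avg_win - (1 - win_rate) * avg_loss

def kelly_fraction(win_rate, avg_win, avg_loss):
    """Kelly criterion f* = p - (1-p)/b, where b = avg_win / avg_loss.
    Negative f* means no edge: the correct bet size is zero."""
    b = avg_win / avg_loss
    return win_rate - (1 - win_rate) / b

# A Strategy-A-shaped case: acceptable payoff ratio, win rate too low.
ev_a = expected_value(win_rate=0.35, avg_win=10, avg_loss=8)   # about -1.7
# A Strategy-B-shaped case: good win rate, payoff ratio too poor.
ev_b = expected_value(win_rate=0.65, avg_win=4, avg_loss=12)   # about -1.6
```

Both examples are negative EV, but for opposite reasons — which is exactly why "negative EV" is a diagnostic rather than a verdict: the two components tell you which lever to pull.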
7. 23 trades is barely a sample
The result at 23 trades is real information, but it’s not a verdict on the strategies. It’s early data with wide confidence intervals.
Running a strategy live for 23 trades doesn’t tell you whether it has long-run edge. It tells you how it performed over 23 trades, which in any strategy with reasonable variance is not enough to be certain of anything. The failure modes that surfaced are absolutely real and need to be fixed — but the early win rate, before those failure modes appeared, was also real.
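"Wide confidence intervals" can be made concrete with a Wilson score interval on the win rate — pure standard library, and the 12-of-23 figure below is illustrative, not the audit's actual number:

```python
# 95% Wilson score interval for a binomial proportion: what a win rate
# observed over n trades actually tells you about the true win rate.
import math

def wilson_interval(wins, n, z=1.96):
    p = wins / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    margin = (z / denom) * math.sqrt(
        p * (1 - p) / n + z * z / (4 * n * n))
    return center - margin, center + margin

low, high = wilson_interval(wins=12, n=23)
# An observed ~52% win rate at n=23 is consistent with a true win rate
# anywhere from roughly the low 30s to roughly 70%. Not a verdict.
```

At n=23 the interval spans nearly 40 percentage points — wide enough to contain both "clear edge" and "clearly losing," which is the precise sense in which 23 trades is barely a sample.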
What 23 trades bought me was: the failure modes surfaced, the audit happened, the fixes are specific. The next 23 trades, after the fixes, will tell me whether the fixes worked.
That’s how iterative testing is supposed to work. Ship, observe, diagnose, fix, retest. A 2% drawdown to reach that cycle is genuinely cheap tuition.
The Threshold Before Coming Back
Neither strategy comes back online until:
- The identified failure mode has a specific, implemented fix
- The fix has been backtested against real historical data (not just forward-tested on today’s market)
- The expected value calculation turns positive with enough margin to be meaningful
I don’t know how long that takes. Could be a week of backtest work. Could be more. The $270 account is not going anywhere while I get this right.
The edge either exists or it doesn’t. Testing with patience costs nothing. Testing without patience costs real money.
In Summary
Three weeks of live trading taught me more than three months of paper trading. The failure modes I found were invisible on paper — they only appear when the conditions for them exist in live markets. The position sizing issue wouldn’t have surfaced in paper trading at all.
Running real capital through a systematic strategy is an accelerant for learning. It forces precision about what you think the strategy is doing, whether it’s actually doing that, and what happens when conditions it wasn’t designed for arrive.
I shipped too early. But the cost of shipping too early was controlled by the risk framework, and the result is a specific, actionable improvement plan. That’s a trade I’d make again.
Protogen Max code and strategy details are kept off the internet. If you’re building something similar and want to think through the architecture, feel free to reach out.