Benchmark Selection

Quick Answer

Benchmark Selection is the choice of reference index a backtest is compared against. The default in SledgeKey is SPY, the ETF that tracks the S&P 500. The right benchmark is the one that honestly represents the universe and style of the strategy being tested.

What is Benchmark Selection?

A benchmark is the yardstick a strategy is measured against. Without one, a fifteen percent annual return is just a number. Against an S&P 500 that compounded at seventeen percent over the same window, that fifteen percent becomes a story about underperformance. Benchmark Selection is the choice of yardstick.

In a backtest, the benchmark is a reference series of returns running over the same calendar dates as the simulated portfolio. The most common choice is a broad U.S. equity index. SPY, which tracks the S&P 500, is a common default in retail and professional tools because it represents the market most U.S. equity strategies are implicitly trying to beat. Other reasonable choices include total-market funds such as VTI, factor-tilted ETFs like VLUE for value or MTUM for momentum, and sector-specific funds when the strategy concentrates in one part of the market.

The benchmark is not a competitor; it is a reference. The point of comparing to one is to separate the part of returns that came from the strategy from the part that came from being in the market at all. A bull-market backtest can show double-digit returns from a screen that adds no real value, simply because the rising tide lifted everything. A benchmark that captures the same tide makes the strategy's actual contribution visible.

Why Benchmark Selection Matters in Backtesting

Benchmark choice changes three things on the results page.

It changes alpha, used here to mean the difference between the portfolio's return and the benchmark's return. (In stricter usage that simple difference is called excess or active return, and alpha is what remains after adjusting for market exposure.) A strategy that outperformed SPY by three percentage points might match VTI almost exactly, because VTI captures small-cap and mid-cap names that the S&P 500 leaves out. The strategy did not get worse; the benchmark got more honest about what was already inside the portfolio.
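How much the headline difference depends on the yardstick can be sketched in a few lines of Python. This is a minimal illustration, not SledgeKey's implementation: the function names are hypothetical, and the annual returns are made-up numbers standing in for a portfolio, a large-cap fund, and a total-market fund.

```python
def total_return(periodic_returns):
    """Compound a sequence of simple periodic returns into one total return."""
    growth = 1.0
    for r in periodic_returns:
        growth *= 1.0 + r
    return growth - 1.0


def excess_return(portfolio_returns, benchmark_returns):
    """Portfolio total return minus benchmark total return over the same periods."""
    if len(portfolio_returns) != len(benchmark_returns):
        raise ValueError("both series must cover the same periods")
    return total_return(portfolio_returns) - total_return(benchmark_returns)


# Hypothetical annual returns, for illustration only.
portfolio = [0.12, -0.04, 0.20]
large_cap = [0.10, -0.05, 0.18]  # stand-in for an S&P 500 fund
total_mkt = [0.12, -0.05, 0.21]  # stand-in for a total-market fund

vs_large_cap = excess_return(portfolio, large_cap)
vs_total_mkt = excess_return(portfolio, total_mkt)

print(f"excess vs large-cap fund:    {vs_large_cap:.2%}")
print(f"excess vs total-market fund: {vs_total_mkt:.2%}")
```

With these invented figures the same portfolio leads the large-cap stand-in by several percentage points and the total-market stand-in by well under one, even though the holdings are identical in both comparisons.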

It changes whether the comparison is fair in the first place. A backtest of a small-cap value screen compared against SPY is not a clean test of the screen, because SPY is dominated by mega-cap growth. The portfolio's relative return is then a mix of the screen's quality plus the small-versus-large and value-versus-growth currents that swept the period. A benchmark that already adjusts for those currents, such as a small-cap value ETF, isolates the screen's contribution far more cleanly.

It changes the reading of a strategy's risk profile when paired with metrics like beta, tracking error, and information ratio. Beta against the S&P 500 says something different from beta against a sector ETF or a factor fund. The same strategy can look like a calm low-beta play against the wrong benchmark and an aggressive concentrated bet against the right one, with no change to the underlying holdings.
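The benchmark-relative metrics named here can be sketched with the standard library alone. The definitions below are the common textbook ones (beta as covariance over benchmark variance, tracking error as the standard deviation of active returns, information ratio as mean active return over tracking error), computed per period without annualization. Whether SledgeKey computes them this way is an assumption, and all return series are invented for illustration.

```python
from statistics import mean, stdev


def beta(portfolio, benchmark):
    """Sample covariance of the two series divided by benchmark sample variance."""
    mp, mb = mean(portfolio), mean(benchmark)
    n = len(portfolio)
    cov = sum((p - mp) * (b - mb) for p, b in zip(portfolio, benchmark)) / (n - 1)
    var = sum((b - mb) ** 2 for b in benchmark) / (n - 1)
    return cov / var


def tracking_error(portfolio, benchmark):
    """Standard deviation of per-period active returns (not annualized)."""
    active = [p - b for p, b in zip(portfolio, benchmark)]
    return stdev(active)


def information_ratio(portfolio, benchmark):
    """Mean active return divided by tracking error (not annualized)."""
    active = [p - b for p, b in zip(portfolio, benchmark)]
    te = stdev(active)
    if te == 0:
        raise ValueError("tracking error is zero; information ratio is undefined")
    return mean(active) / te


# Hypothetical per-period returns: one portfolio, two candidate benchmarks.
portfolio = [0.02, -0.01, 0.03, 0.0]
broad = [0.01, -0.005, 0.015, 0.0]       # stand-in for a broad index fund
narrow = [0.025, -0.0125, 0.0375, 0.0]   # stand-in for a more volatile sector fund

beta_vs_broad = beta(portfolio, broad)    # roughly 2.0: looks aggressive
beta_vs_narrow = beta(portfolio, narrow)  # roughly 0.8: looks calm
ir_vs_broad = information_ratio(portfolio, broad)
ir_vs_narrow = information_ratio(portfolio, narrow)
```

In this toy data the holdings never change, yet the portfolio reads as a high-beta bet with a positive information ratio against one reference and a sub-market-beta holding with a negative information ratio against the other.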

How SledgeKey Implements Benchmark Selection

The configuration panel includes a Benchmark control. The default is SPY, the ETF that tracks the S&P 500, and the field offers a curated list of common reference funds covering broad market, total market, sector, and factor exposures. Whichever benchmark is selected, the platform pulls the matching daily price history over the same window as the strategy, reports its total return alongside the portfolio's, and uses it as the reference line in the cumulative-return chart and the relative-performance metrics.

The benchmark and the strategy are aligned on calendar dates, so both are measured over identical periods. Changing the benchmark does not change the screen, the rebalance schedule, the universe, or the strategy's standalone performance numbers. It changes only the comparison: the relative return, the alpha, and any beta-style figures shift to reflect the new reference.
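Date alignment of this kind can be sketched as follows. This is an illustrative approach under stated assumptions, not the platform's actual code: the function names, the dictionary-per-series layout, and every price are hypothetical. Both series are restricted to the dates they share, and cumulative returns are then measured from the same starting date.

```python
from datetime import date


def align_on_dates(portfolio_values, benchmark_prices):
    """Keep only dates present in both series, in chronological order."""
    shared = sorted(set(portfolio_values) & set(benchmark_prices))
    return (
        shared,
        [portfolio_values[d] for d in shared],
        [benchmark_prices[d] for d in shared],
    )


def cumulative_returns(levels):
    """Return of each level relative to the first level in the list."""
    start = levels[0]
    return [v / start - 1.0 for v in levels]


# Hypothetical data: the benchmark history starts one day earlier.
portfolio_values = {
    date(2024, 1, 2): 100.0,
    date(2024, 1, 3): 102.0,
    date(2024, 1, 4): 101.0,
    date(2024, 1, 5): 105.0,
}
benchmark_prices = {
    date(2023, 12, 29): 398.0,
    date(2024, 1, 2): 400.0,
    date(2024, 1, 3): 404.0,
    date(2024, 1, 4): 402.0,
    date(2024, 1, 5): 410.0,
}

dates, port_levels, bench_levels = align_on_dates(portfolio_values, benchmark_prices)
port_cum = cumulative_returns(port_levels)
bench_cum = cumulative_returns(bench_levels)

# Relative return over the shared window only.
relative_return = port_cum[-1] - bench_cum[-1]
```

Swapping in a different benchmark dictionary changes `bench_cum` and `relative_return` but leaves `port_cum` untouched, which mirrors the point that the strategy's standalone numbers do not depend on the reference.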

The default of SPY is a defensible starting point for almost any U.S. equity strategy. It is the market the strategy is implicitly trying to beat, the most widely used reference in academic and industry literature, and the index a typical investor would otherwise hold passively. For strategies whose universe departs sharply from large-cap U.S. equity, switching to a more representative benchmark is part of doing a clean comparison rather than a stylistic preference.

Common Pitfalls

The most common pitfall is benchmark mismatch. A strategy that screens for small-cap names compared against SPY will look more impressive than it is in years when small caps outperform large, and worse than it is when the reverse happens. Either way, the comparison says less about the strategy's quality and more about the size cohort the strategy happens to live in.

A second pitfall is benchmark shopping. Running the same backtest against four benchmarks and reporting the one that makes the strategy look best is hindsight, not evidence. A benchmark should be chosen because it honestly represents the strategy's universe, and it should be chosen before the results are examined, not after.

A third pitfall is forgetting that benchmark returns include their own assumptions. An ETF benchmark returns the price of the ETF itself, which already reflects the fund's expense ratio and any tracking drift. An index benchmark usually does not. A strategy that beats SPY by ten basis points may do no better than match the raw S&P 500 index, because the index figure is gross and the ETF figure is net of costs an investor would actually pay.
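The gap between a gross index figure and a net ETF figure is simple arithmetic. The sketch below approximates a fund's net return as the index return minus an annual expense ratio and ignores tracking drift. The ten-basis-point fee and the return figures are round hypothetical numbers, not any real fund's data.

```python
def net_fund_return(gross_index_return, expense_ratio):
    """Approximate a fund's annual return as index return minus its expense ratio.

    Simplification: ignores tracking drift and the timing of fee accrual.
    """
    return gross_index_return - expense_ratio


# Hypothetical figures for one year.
index_return = 0.10      # raw index, gross of any costs
expense_ratio = 0.001    # 10 basis points, illustrative only
strategy_return = 0.10   # strategy happens to match the raw index

etf_return = net_fund_return(index_return, expense_ratio)

vs_index = strategy_return - index_return  # about 0: no edge over the gross index
vs_etf = strategy_return - etf_return      # about +10 bps: an apparent edge over the fund

print(f"vs raw index: {vs_index * 1e4:.1f} bps")
print(f"vs ETF:       {vs_etf * 1e4:.1f} bps")
```

The strategy is identical in both lines; the ten basis points appear only because one reference is quoted before costs and the other after.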

Watch Out

The single biggest mistake in benchmarking is comparing a narrow strategy to a broad index. If your screen produces a small-cap value tilt, a small-cap value benchmark will tell you whether the screen adds value beyond the tilt itself. SPY will tell you only whether that tilt happened to be kind to you over the chosen window.

Written by The SledgeKey Team · Last updated May 8, 2026