Every investment is a prediction.
Warren Buffett predicts what won’t change. Venture capitalists and growth investors predict what will change and bet on those who make it happen. Whether an investor predicts what won’t change or what will, the end goal is to capture some future stream of earnings that the market underappreciates.
At Intelligent Alpha, we’ve made the bet that far more investments, and therefore predictions, will be made by AI than by humans in the future. More than two years of managing funds and investment strategies run by frontier AI models have given us a sense of how the various models act as investors. Like humans, different models are better suited for different tasks. We’ve been benchmarking them internally. Today we’re starting to share the work publicly.
The Intelligent Earnings Benchmark
Today, we’re launching our first public benchmark, the Intelligent Earnings Benchmark (IEB). The IEB tests frontier models on the core prediction that active investors make: What direction are earnings expectations headed?
Every quarter, we run a universe of large cap US stocks ($10b+ market cap) through a standardized process where the models predict the direction of forward consensus estimates over the next 60 days. Note, we’re not predicting earnings per se. The actual outcome of earnings isn’t what moves stocks. The changes in future expectations do. We believe predicting how the next quarter consensus estimates change is a more valuable test than merely predicting a quarter’s earnings.
We’ve locked in the Q2 earnings prediction cohort of 715 stocks with predictions across eight models:
- GPT 5.4
- Claude 4.6 Opus
- Gemini 3.1 Pro
- Grok 4.20 Reasoning
- GLM 5.1
- Qwen 3.5
- MiniMax M2.7
- DeepSeek R1
Each of the models is asked to predict:
- Direction: For both revenue and EPS estimates for the next quarter. Example: April 2026 begins the Q1 2026 earnings reporting period, but the models are predicting the direction of Q2 2026 estimates.
- Revision %: The model’s estimate of the change.
- Magnitude: Ranges of small/medium/large change which correlate to the revision percentage estimate.
- Along with those predictions, models are asked to rate their confidence 0-100, offer rationale, a counter-thesis/risk assessment, and key signals most important to the prediction.
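To make the shape of a single prediction concrete, here is a minimal sketch of how one record could be represented. It is an illustration only, not the IEB's actual schema: the field names, the up/flat/down labels, and the small/medium/large cut-offs are all assumptions.

```python
from dataclasses import dataclass, field
from typing import List, Literal

# Assumed labels; the benchmark's real label set is not published here.
Direction = Literal["up", "flat", "down"]
Magnitude = Literal["small", "medium", "large"]

# Hypothetical cut-offs on the absolute revision %, chosen only for illustration.
SMALL_MAX_PCT = 1.0
MEDIUM_MAX_PCT = 5.0


def magnitude_bucket(revision_pct: float) -> Magnitude:
    """Map a revision % estimate to a size bucket using the assumed cut-offs."""
    size = abs(revision_pct)
    if size < SMALL_MAX_PCT:
        return "small"
    if size < MEDIUM_MAX_PCT:
        return "medium"
    return "large"


@dataclass
class EstimatePrediction:
    ticker: str
    metric: Literal["revenue", "eps"]
    direction: Direction
    revision_pct: float          # model's estimate of the change, in percent
    confidence: int              # 0-100, as described above
    rationale: str = ""
    counter_thesis: str = ""
    key_signals: List[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        if not 0 <= self.confidence <= 100:
            raise ValueError("confidence must be between 0 and 100")

    @property
    def magnitude(self) -> Magnitude:
        # Derived from revision_pct so the two fields cannot disagree.
        return magnitude_bucket(self.revision_pct)
```

Deriving the magnitude from the revision percentage, rather than storing it separately, is one way to keep the two consistent, since the ranges are described as correlating to the revision estimate.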
To make predictions, models receive:
- A dataset of two years of historical financial information including earnings, income statement, and balance sheet.
- Current consensus estimates for the prediction period.
- The most recent earnings transcript.
- A cache of current economic data from FRED.
- Web search via Exa where the models can make up to 10 searches.
At the end of the prediction period, which will be in early June for this first public benchmark, the models will be scored on the accuracy of their predictions. Public scores will be available on the Benchmark section of our website. Select partners of Intelligent Alpha may be able to access the full dataset of model predictions.
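As a rough sketch of what directional scoring could look like, the snippet below labels the realized move in a consensus estimate and computes a hit rate. The function names and the flat tolerance band are hypothetical; the actual IEB scoring rules may differ.

```python
from typing import Iterable, Tuple

# Hypothetical band: moves smaller than this (in percent) count as "flat".
FLAT_TOLERANCE_PCT = 0.1


def realized_direction(start_estimate: float, end_estimate: float,
                       tol_pct: float = FLAT_TOLERANCE_PCT) -> str:
    """Label the change in a consensus estimate over the prediction window."""
    if start_estimate == 0:
        raise ValueError("percent change is undefined for a zero starting estimate")
    # abs() in the denominator keeps the sign meaningful for negative EPS estimates.
    change_pct = (end_estimate - start_estimate) / abs(start_estimate) * 100.0
    if change_pct > tol_pct:
        return "up"
    if change_pct < -tol_pct:
        return "down"
    return "flat"


def directional_accuracy(pairs: Iterable[Tuple[str, str]]) -> float:
    """Share of (predicted, realized) pairs where the direction matches."""
    pairs = list(pairs)
    if not pairs:
        raise ValueError("no predictions to score")
    hits = sum(1 for predicted, realized in pairs if predicted == realized)
    return hits / len(pairs)
```

For example, a model that calls "up" on a stock whose consensus EPS estimate moves from 2.00 to 2.10 over the window would score a hit under this sketch.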
Early Insights
The first predictions run is complete and locked in for measurement.
The consensus of the eight models is that 70% of revenue estimates and 61% of earnings estimates for CYQ2 are going up between now and the end of the quarter. That’s bullish relative to history, but more in line with the last quarter or two.
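How a cross-model consensus gets formed is a design choice. One simple possibility, shown here purely as an assumption and not as our production method, is a majority vote per stock followed by the share of stocks whose majority call is "up". The helper names are hypothetical.

```python
from collections import Counter
from typing import Dict, List


def majority_direction(votes: List[str]) -> str:
    """Return the most common direction, or "split" when the top two tie."""
    if not votes:
        raise ValueError("need at least one vote")
    ranked = Counter(votes).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return "split"  # possible with an even number of models
    return ranked[0][0]


def share_called_up(votes_by_ticker: Dict[str, List[str]]) -> float:
    """Fraction of tickers whose majority call is "up"."""
    if not votes_by_ticker:
        raise ValueError("no tickers to aggregate")
    calls = [majority_direction(v) for v in votes_by_ticker.values()]
    return calls.count("up") / len(calls)
```

With eight models an even split is possible, so the sketch reports a tie explicitly rather than picking a side.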
Going back to 2020 on a quarterly basis, consensus revenue/EPS estimates increased roughly 55% of the time from the beginning of a new reporting period to the end of that reporting period, were flat 16% of the time, and were down 29% of the time. This makes sense because management teams have an incentive to keep expectations in check so that they can beat them. We should generally expect to see an upward bias in the data.
As a naive baseline, if a human or a model just predicted that estimates would go up for every stock, they should achieve a 55% accuracy rate on average. That should be the minimum hurdle for value add in earnings predictions. Beyond that, the accuracy of magnitude predictions will be the ultimate test of the model in this benchmark.
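The baseline arithmetic can be written out in a few lines. This sketch uses the historical base rates quoted above; the function names are hypothetical, and it assumes the future mix of outcomes resembles the 2020-to-date history.

```python
# Historical outcome frequencies quoted above (2020 onward, quarterly).
BASE_RATES = {"up": 0.55, "flat": 0.16, "down": 0.29}


def constant_strategy_accuracy(always_predict: str,
                               base_rates: dict = BASE_RATES) -> float:
    """Expected hit rate of predicting the same direction for every stock."""
    return base_rates[always_predict]


def edge_over_baseline(model_accuracy: float,
                       base_rates: dict = BASE_RATES) -> float:
    """Accuracy in excess of the best constant strategy (the minimum hurdle)."""
    return model_accuracy - max(base_rates.values())
```

Under these rates, always predicting "up" is the best constant strategy, so a model's directional accuracy only counts as value add to the extent it clears that 55% hurdle.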
As for specific stocks, the models’ highest-conviction upside call is VRT. All eight models believe forward expectations are too low, and VRT has the highest average consensus score. EL is the name where the models most widely share a concern that revenue and EPS expectations will come down.
My prediction based on years of watching the models: GPT or Grok is likely to be the top model in this first iteration of the IEB.
We plan on occasionally sharing more of the aggregate expectations from the models via our email list and on X. Subscribe for updates.
A True Measure
Elon is right. The truest measure of intelligence is the ability to predict the future.
Most AI benchmarks don’t test that. They test the ability of a model to solve a puzzle or retrieve information or some other rote task. We believe that makes most benchmarks inherently flawed because they’re solvable. Whether it’s GPT 6 or Mythos or some other model, eventually puzzles and retrieval tasks won’t measure anything because every model will be able to conquer them.
Markets are a different animal. A complex adaptive system where the answer keeps changing. The truest test of superintelligence would be when the earnings benchmark is solved because that would mean markets are solved. By the time AI solves markets, it can probably solve a lot more too.
Additionally, note our benchmark disclosures: Intelligent Alpha’s Intelligent Earnings Benchmark (IEB) is an analytical tool designed to evaluate and communicate the comparative performance of AI models on earnings prediction tasks for US listed large-cap companies defined as market capitalization over $10 billion at the time of testing. This benchmark is published for general information and educational purposes only. It does not constitute investment advice, a recommendation to buy or sell any security, or an offer or solicitation with respect to any investment product or service. The Benchmark compares AI model-generated earnings direction predictions against consensus earnings prediction changes across a defined universe of US listed large-cap companies. Results do not represent the performance of any investment portfolio, fund, or client account managed by Intelligent Alpha, and earnings prediction accuracy should not be construed as an indicator of investment returns. The effectiveness of AI models in predicting earnings is limited by access to accurate historical data, tool usage, prompt structure, consistency of harnesses used to control the environment, and other factors. Past benchmark performance is not indicative of future predictive accuracy. This benchmark and all related content do not create an investment advisory, client or fiduciary relationship. Intelligent Alpha’s advisory services are provided solely pursuant to a written investment advisory agreement. No person should rely on this benchmark as a substitute for individualized investment advice.