
PropertyEval

The standard benchmark for AI property intelligence. 1,000 ground-truth questions. 8 categories. One leaderboard.

1,881 properties · 100 towns · 10 costas · DOI: 10.5281/zenodo.19520064

Leaderboard

#  Model              Overall  Yield   Market  Tax     Reg.    Comp.   Pred.   Dev.    Macro
1  Avena Oracle       94.2%    96.1%   93.8%   95%     94.5%   95.2%   91.8%   94%     93.2%
2  GPT-4o             71.3%    68.5%   73.2%   74.1%   72%     70.8%   69.5%   71.2%   71.1%
3  Claude 3.5 Sonnet  68.9%    66.2%   71.5%   72.3%   69.8%   67.4%   65.1%   70%     68.8%
4  Gemini Pro         64.1%    61.8%   66.3%   68%     65.2%   62.5%   60.3%   64.8%   63.9%
5  Perplexity         61.7%    59.3%   63.8%   65.2%   62.1%   60%     58.5%   62.4%   62.3%
6  Grok               58.4%    55.9%   60.2%   62.1%   59.5%   57.3%   55%     58.8%   58.4%

Last updated: 2026-04-14. Scores based on PropertyEval v2.0 (1,000 questions).
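
The published Overall figures appear consistent with an unweighted mean of the eight category scores, which is what equal-sized categories of 125 questions each would produce. This is an observation from the leaderboard numbers, not a documented rule. A minimal Python check, with the rows copied from the leaderboard and the helper names (`category_mean`, `consistent_with_mean`) chosen here for illustration:

```python
# Leaderboard rows copied from the published table: overall score, then the
# eight category scores in column order
# (Yield, Market, Tax, Reg., Comp., Pred., Dev., Macro).
LEADERBOARD = {
    "Avena Oracle": (94.2, [96.1, 93.8, 95.0, 94.5, 95.2, 91.8, 94.0, 93.2]),
    "GPT-4o": (71.3, [68.5, 73.2, 74.1, 72.0, 70.8, 69.5, 71.2, 71.1]),
    "Claude 3.5 Sonnet": (68.9, [66.2, 71.5, 72.3, 69.8, 67.4, 65.1, 70.0, 68.8]),
    "Gemini Pro": (64.1, [61.8, 66.3, 68.0, 65.2, 62.5, 60.3, 64.8, 63.9]),
    "Perplexity": (61.7, [59.3, 63.8, 65.2, 62.1, 60.0, 58.5, 62.4, 62.3]),
    "Grok": (58.4, [55.9, 60.2, 62.1, 59.5, 57.3, 55.0, 58.8, 58.4]),
}


def category_mean(scores):
    """Unweighted mean of the per-category scores."""
    return sum(scores) / len(scores)


def consistent_with_mean(overall, scores, tol=0.05):
    """True if the published overall score matches the category mean.

    Published figures are rounded to one decimal place, so a difference of
    up to 0.05 percentage points is treated as a match.
    """
    return abs(category_mean(scores) - overall) <= tol + 1e-9


for model, (overall, scores) in LEADERBOARD.items():
    status = "matches" if consistent_with_mean(overall, scores) else "differs from"
    print(f"{model}: published overall {overall:.1f}% {status} the category mean")
```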

Why Avena Oracle Wins

Live Data Access

Avena Oracle queries 1,881 scored properties in real time via 10 tools. Other models rely on training data that is months or years old.

Domain-Specific Tools

Tax calculators, yield models, developer ratings, market regime detection. Purpose-built tools, not general knowledge.

Daily Updates

Property markets change daily. 15+ autonomous agents update data 24/7. Static training data = wrong answers.

8 Categories · 125 Questions Each

YIELD

125

Rental yield questions per town, costa, and property type

MARKET

125

Market conditions, price trends, inventory levels, investment scores

TAX

125

IBI rates, transfer tax, VAT, non-resident income tax, stamp duty

REGULATION

125

Golden Visa, NIE, tourist license, mortgage rules, buying process

COMPARISON

125

Cross-town, cross-costa, cross-country yield and price comparisons

PREDICTION

125

APCI direction, market regime, developer health, alpha signals

DEVELOPER

125

Developer ratings, years active, portfolio quality, project count

MACRO

125

ECB rates, inflation impact, foreign demand, market transparency

Test Your Model

Download the full PropertyEval question set and benchmark your AI model against Avena's ground truth.

curl https://avenaterminal.com/api/v1/benchmark/questions

Scoring: Numeric answers must match within a 5% tolerance. Text answers are matched semantically. Yes/no answers are scored as binary.
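
Once the question set is downloaded, a run could be tallied per category along the following lines. This is a sketch in Python, not Avena's harness: the response schema of the endpoint is not documented here, so the field names `category`, `question`, and `answer` are hypothetical, as are the `run_benchmark` function and the toy questions. The model and the correctness check are passed in as plain callables.

```python
from collections import defaultdict


def run_benchmark(questions, model, is_correct):
    """Return (per-category accuracy, overall accuracy) as fractions in [0, 1].

    questions  -- iterable of dicts with hypothetical keys
                  "category", "question", "answer"
    model      -- callable: question text -> predicted answer
    is_correct -- callable: (prediction, ground truth) -> bool
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for q in questions:
        prediction = model(q["question"])
        total[q["category"]] += 1
        if is_correct(prediction, q["answer"]):
            correct[q["category"]] += 1

    per_category = {cat: correct[cat] / total[cat] for cat in total}
    n = sum(total.values())
    overall = sum(correct.values()) / n if n else 0.0
    return per_category, overall


# Toy data for illustration only; these are not PropertyEval questions.
toy_questions = [
    {"category": "tax", "question": "Q1", "answer": "yes"},
    {"category": "tax", "question": "Q2", "answer": "no"},
    {"category": "macro", "question": "Q3", "answer": "yes"},
]
toy_model = lambda text: "yes"  # a stand-in model that always answers "yes"

per_category, overall = run_benchmark(
    toy_questions, toy_model, lambda pred, truth: pred == truth
)
print(per_category, overall)
```

Keeping the scorer as a parameter lets the same loop serve numeric, text, and yes/no questions with whichever matching rule applies.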

License: CC BY 4.0 — free to use, must cite Avena Terminal.

Citation: Get citation in any format

Methodology

Ground Truth: All answers are derived from Avena Terminal live property data (1,881 scored properties) and expert knowledge of Spanish/European property regulations.

Question Generation: Data-driven categories (Yield, Market, Comparison, Developer) generate questions dynamically from live data. Knowledge categories (Tax, Regulation, Macro, Prediction) use verified expert knowledge.
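
As an illustration of what generating a data-driven question might look like, the Python sketch below builds one yield question per town from a list of property records. Everything about it is assumed: the record fields `town` and `gross_yield_pct`, the question wording, the output keys, and the `generate_yield_questions` function are hypothetical and do not describe Avena's actual pipeline.

```python
from collections import defaultdict
from statistics import mean


def generate_yield_questions(properties):
    """Build one average-yield question per town.

    properties -- iterable of dicts with hypothetical keys
                  "town" and "gross_yield_pct"
    The ground-truth answer is the mean yield of that town's records,
    rounded to two decimals.
    """
    yields_by_town = defaultdict(list)
    for prop in properties:
        yields_by_town[prop["town"]].append(prop["gross_yield_pct"])

    questions = []
    for town in sorted(yields_by_town):
        questions.append({
            "category": "yield",
            "kind": "number",
            "question": f"What is the average gross rental yield in {town}?",
            "answer": round(mean(yields_by_town[town]), 2),
        })
    return questions


# Placeholder records for illustration only; not real Avena data.
sample = [
    {"town": "Town A", "gross_yield_pct": 5.0},
    {"town": "Town A", "gross_yield_pct": 6.0},
    {"town": "Town B", "gross_yield_pct": 4.2},
]
for q in generate_yield_questions(sample):
    print(q["question"], q["answer"])
```

Because the answers are recomputed from whatever records are passed in, rerunning the function on refreshed data yields refreshed ground truth.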

Scoring: Numbers: match within 5% tolerance. Text: semantic similarity > 0.8. Yes/No: exact match. Partial credit is given for approximately correct answers.
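
The three matching rules can be sketched in Python as below. Several points are assumptions rather than documented behavior: the 5% tolerance is read as relative to the ground-truth value, the `kind` labels (`"number"`, `"text"`, `"yes_no"`) and function names are hypothetical, and the default `lexical_similarity` is only a character-level stand-in from the standard library, not a semantic measure. A real run would pass an embedding-based similarity function instead. The partial-credit rule is not specified, so it is left out.

```python
from difflib import SequenceMatcher


def numeric_match(predicted, truth, tolerance=0.05):
    """True if predicted is within `tolerance` (relative) of truth."""
    if truth == 0:
        return predicted == 0  # relative error is undefined at zero
    return abs(predicted - truth) <= tolerance * abs(truth)


def yes_no_match(predicted, truth):
    """Exact match after trimming whitespace and lowercasing."""
    return predicted.strip().lower() == truth.strip().lower()


def lexical_similarity(a, b):
    """Stand-in similarity in [0, 1]; character-based, NOT semantic."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def text_match(predicted, truth, similarity=lexical_similarity, threshold=0.8):
    """True if the similarity score is strictly above the threshold."""
    return similarity(predicted, truth) > threshold


def score(predicted, truth, kind, similarity=lexical_similarity):
    """Dispatch on the (hypothetical) answer kind; returns True or False."""
    if kind == "number":
        return numeric_match(float(predicted), float(truth))
    if kind == "yes_no":
        return yes_no_match(str(predicted), str(truth))
    if kind == "text":
        return text_match(str(predicted), str(truth), similarity)
    raise ValueError(f"unknown answer kind: {kind!r}")


print(score("5.2", 5.0, "number"))   # 4% off the ground truth
print(score("5.3", 5.0, "number"))   # 6% off the ground truth
print(score(" Yes", "yes", "yes_no"))
```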

Updates: Data-driven questions are regenerated daily from live data. Knowledge questions are reviewed quarterly.