PropertyEval
The standard benchmark for AI property intelligence. 1,000 ground-truth questions. 8 categories. One leaderboard.
Leaderboard
| # | Model | Overall | Yield | Market | Tax | Regulation | Comparison | Prediction | Developer | Macro |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Avena Oracle | 94.2% | 96.1% | 93.8% | 95.0% | 94.5% | 95.2% | 91.8% | 94.0% | 93.2% |
| 2 | GPT-4o | 71.3% | 68.5% | 73.2% | 74.1% | 72.0% | 70.8% | 69.5% | 71.2% | 71.1% |
| 3 | Claude 3.5 Sonnet | 68.9% | 66.2% | 71.5% | 72.3% | 69.8% | 67.4% | 65.1% | 70.0% | 68.8% |
| 4 | Gemini Pro | 64.1% | 61.8% | 66.3% | 68.0% | 65.2% | 62.5% | 60.3% | 64.8% | 63.9% |
| 5 | Perplexity | 61.7% | 59.3% | 63.8% | 65.2% | 62.1% | 60.0% | 58.5% | 62.4% | 62.3% |
| 6 | Grok | 58.4% | 55.9% | 60.2% | 62.1% | 59.5% | 57.3% | 55.0% | 58.8% | 58.4% |
Last updated: 2026-04-14. Scores based on PropertyEval v2.0 (1,000 questions).
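The Overall column is consistent with an unweighted mean of the eight category scores, which is what you would expect when every category has the same number of questions. The page does not state this rule, so treat the sketch below as an assumption checked against the published rows. The name `overall_score` is illustrative only.

```python
from statistics import mean

# Per-category scores (percent) copied from the leaderboard rows above,
# in column order: Yield, Market, Tax, Regulation, Comparison,
# Prediction, Developer, Macro.
CATEGORY_SCORES = {
    "Avena Oracle": [96.1, 93.8, 95.0, 94.5, 95.2, 91.8, 94.0, 93.2],
    "GPT-4o": [68.5, 73.2, 74.1, 72.0, 70.8, 69.5, 71.2, 71.1],
    "Claude 3.5 Sonnet": [66.2, 71.5, 72.3, 69.8, 67.4, 65.1, 70.0, 68.8],
}


def overall_score(category_scores):
    """Unweighted mean of category scores, rounded to one decimal.

    Assumption: the leaderboard does not document how Overall is
    computed; this rule simply reproduces the published values.
    """
    return round(mean(category_scores), 1)


if __name__ == "__main__":
    for model, scores in CATEGORY_SCORES.items():
        print(f"{model}: {overall_score(scores)}%")
```

Because all categories carry 125 questions, an unweighted mean of category accuracies equals the accuracy over the full 1,000-question set.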
Why Avena Oracle Wins
Live Data Access
Avena Oracle queries 1,881 scored properties in real time via 10 tools. Other models rely on training data that is months or years old.
Domain-Specific Tools
Tax calculators, yield models, developer ratings, market regime detection. Purpose-built tools, not general knowledge.
Daily Updates
Property markets change daily, so 15+ autonomous agents update the data around the clock. Answers drawn from static training data go stale.
8 Categories · 125 Questions Each
YIELD
125 questions: rental yield per town, costa, and property type
MARKET
125 questions: market conditions, price trends, inventory levels, investment scores
TAX
125 questions: IBI rates, transfer tax, VAT, non-resident income tax, stamp duty
REGULATION
125 questions: Golden Visa, NIE, tourist license, mortgage rules, buying process
COMPARISON
125 questions: cross-town, cross-costa, cross-country yield and price comparisons
PREDICTION
125 questions: APCI direction, market regime, developer health, alpha signals
DEVELOPER
125 questions: developer ratings, years active, portfolio quality, project count
MACRO
125 questions: ECB rates, inflation impact, foreign demand, market transparency
Test Your Model
Download the full PropertyEval question set and benchmark your AI model against Avena's ground truth.
Scoring: Numeric answers count as correct within a 5% tolerance. Text answers are graded by semantic match. Yes/no answers are graded as binary.
License: CC BY 4.0. Free to use, provided you cite Avena Terminal.
Citation: Get citation in any format
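A benchmark run over the downloaded question set could look roughly like the sketch below. The file format is not described here, so the record fields (`category`, `question`, `answer`), the `model` callable, and the `grade` callable are all hypothetical placeholders to adapt to the real download.

```python
from collections import defaultdict


def run_benchmark(questions, model, grade):
    """Return (per_category_accuracy, overall_accuracy) as fractions in [0, 1].

    questions: iterable of dicts with hypothetical keys
               "category", "question", "answer".
    model:     callable taking the question text and returning an answer.
    grade:     callable (expected, predicted) -> bool.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for q in questions:
        prediction = model(q["question"])
        total[q["category"]] += 1
        if grade(q["answer"], prediction):
            correct[q["category"]] += 1
    if not total:
        raise ValueError("no questions supplied")
    per_category = {c: correct[c] / total[c] for c in total}
    overall = sum(correct.values()) / sum(total.values())
    return per_category, overall


if __name__ == "__main__":
    # Toy records for illustration only; these are NOT PropertyEval questions.
    toy_questions = [
        {"category": "TAX", "question": "toy question 1", "answer": "yes"},
        {"category": "TAX", "question": "toy question 2", "answer": "no"},
        {"category": "YIELD", "question": "toy question 3", "answer": "yes"},
    ]
    always_yes = lambda text: "yes"          # stand-in for a real model call
    exact = lambda expected, got: expected == got
    print(run_benchmark(toy_questions, always_yes, exact))
```

Passing the grader in as a parameter keeps the loop independent of whichever scoring rule applies to a given answer type.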
Methodology
Ground Truth: All answers derived from Avena Terminal live property data (1,881 scored properties) and expert knowledge of Spanish/European property regulations.
Question Generation: Data-driven categories (Yield, Market, Comparison, Developer) generate questions dynamically from live data. Knowledge categories (Tax, Regulation, Macro, Prediction) use verified expert knowledge.
Scoring: Numbers: correct within a 5% tolerance. Text: semantic similarity > 0.8. Yes/No: exact match. Partial credit is given for approximately correct answers.
Updates: Data-driven questions regenerate daily from live data. Knowledge questions are reviewed quarterly.
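The three scoring rules under Methodology can be sketched as a single grading function. This is a minimal illustration, not Avena's grader. It assumes the 5% tolerance is measured relative to the ground-truth value, and it uses `difflib.SequenceMatcher` as a character-level stand-in for the unspecified semantic similarity measure. Partial credit is not modeled, since the page does not say how it is awarded.

```python
from difflib import SequenceMatcher

NUMERIC_TOLERANCE = 0.05  # "within 5% tolerance"
TEXT_THRESHOLD = 0.8      # "semantic similarity > 0.8"


def string_similarity(a, b):
    """Placeholder similarity in [0, 1].

    This measures character overlap, NOT meaning. Swap in a real
    semantic similarity function (for example, embedding cosine
    similarity) for actual grading.
    """
    return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()


def grade(kind, expected, predicted, similarity=string_similarity):
    """Return True if `predicted` counts as correct for the answer `kind`."""
    if kind == "number":
        expected, predicted = float(expected), float(predicted)
        if expected == 0:
            # A relative tolerance is undefined at zero; require equality.
            return predicted == 0
        # Assumption: tolerance is relative to the ground-truth value.
        return abs(predicted - expected) <= NUMERIC_TOLERANCE * abs(expected)
    if kind == "yes_no":
        return str(predicted).strip().lower() == str(expected).strip().lower()
    if kind == "text":
        return similarity(str(expected), str(predicted)) > TEXT_THRESHOLD
    raise ValueError(f"unknown answer kind: {kind!r}")
```

For example, with a ground-truth value of 6.0, a prediction of 6.2 passes (about 3.3% off) while 6.4 fails (about 6.7% off).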