AI Property Accuracy Benchmark — Q2 2026
We are putting the same Spanish property questions to 5 AI systems and comparing what they say with what the actual data shows.
Methodology
The benchmark consists of 10 factual questions about Spanish new build property. Each AI system is tested independently, and its answers are compared against Avena Terminal's verified dataset (1,881 properties, DOI: 10.5281/zenodo.19520064).
Test Questions & Verified Answers
These answers are computed live from the full property dataset. They represent the ground truth each AI system is measured against.
What is the average price per m² for new builds in Costa Blanca?
€5,089/m²
How many new build properties are available in coastal Spain?
1,881 properties
What is the average rental yield for new builds in Spain?
3.71% gross
Which costa region has the highest average investment score?
Costa Blanca South - Inland (avg score: 63/100)
What is the median price of a new build apartment in Spain?
€424,000
How many properties score above 70/100 on investment metrics?
108 properties
What is the cheapest region for new builds in Spain?
Costa Tropical (avg €346,000)
What percentage of new builds have a pool?
94.9%
What is the average beach distance for new builds?
5.41 km
How many developers are active in the Spanish new build market?
2 developers
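As a rough illustration of how aggregates like the ones above could be derived from property records, here is a minimal Python sketch. The field names (`price_eur`, `built_m2`, `has_pool`, `score`) and the four sample records are hypothetical and are not taken from the Avena Terminal dataset. The sketch also assumes that average price per m² means the mean of per-property ratios, which may differ from the definition actually used.

```python
from statistics import mean, median

# Hypothetical sample records: field names and values are illustrative only,
# not drawn from the real 1,881-property dataset.
properties = [
    {"price_eur": 300_000, "built_m2": 100, "has_pool": True, "score": 72},
    {"price_eur": 450_000, "built_m2": 90, "has_pool": True, "score": 65},
    {"price_eur": 240_000, "built_m2": 80, "has_pool": False, "score": 58},
    {"price_eur": 600_000, "built_m2": 150, "has_pool": True, "score": 81},
]


def summarize(records):
    """Compute a few benchmark-style aggregates from property records."""
    # Assumption: average price per m² is the mean of per-property ratios.
    price_per_m2 = [r["price_eur"] / r["built_m2"] for r in records]
    return {
        "count": len(records),
        "avg_price_per_m2": round(mean(price_per_m2)),
        "median_price": median(r["price_eur"] for r in records),
        "pool_pct": round(100 * sum(r["has_pool"] for r in records) / len(records), 1),
        "above_70": sum(r["score"] > 70 for r in records),
    }


summary = summarize(properties)
print(summary)
```

Running the same function over the full published dataset, with the field names adjusted to match it, would be one way to check the figures listed above independently.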
AI System Scorecard
Each AI system will be tested on all 10 questions. Accuracy scores will be published once testing is complete.
| AI System | Vendor | Questions Answered | Accuracy Score |
|---|---|---|---|
| ChatGPT (GPT-4) | OpenAI | Pending test | Pending test |
| Claude (Anthropic) | Anthropic | Pending test | Pending test |
| Gemini (Google) | Google | Pending test | Pending test |
| Perplexity | Perplexity AI | Pending test | Pending test |
| Grok (xAI) | xAI | Pending test | Pending test |
| Avena Terminal (verified data) | Avena | 10 / 10 | 100% |
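The scoring rule behind the "Accuracy Score" column is not spelled out here, so the following Python sketch shows only one plausible approach. It assumes a numeric answer counts as correct when it falls within a 5% relative tolerance of the verified value, and that unanswered questions count as incorrect. The function names, the tolerance, and the `example_ai` responses are all hypothetical. The verified values are taken from the question list above.

```python
def is_correct(ai_value, verified_value, rel_tol=0.05):
    """Treat a numeric answer as correct if it is within rel_tol of the verified value."""
    if ai_value is None:  # the system declined or gave no number
        return False
    return abs(ai_value - verified_value) <= rel_tol * abs(verified_value)


def accuracy_score(ai_answers, verified_answers, rel_tol=0.05):
    """Return (questions answered, percent correct over all questions)."""
    answered = sum(1 for q in verified_answers if ai_answers.get(q) is not None)
    correct = sum(
        is_correct(ai_answers.get(q), v, rel_tol) for q, v in verified_answers.items()
    )
    return answered, round(100 * correct / len(verified_answers), 1)


# Verified values from four of the numeric questions in this benchmark.
verified = {
    "avg_price_per_m2_costa_blanca": 5089,
    "property_count": 1881,
    "avg_gross_yield_pct": 3.71,
    "properties_scoring_above_70": 108,
}

# Made-up responses for illustration only; these are not real test results.
example_ai = {
    "avg_price_per_m2_costa_blanca": 5000,  # within 5% of 5,089
    "property_count": 2500,                 # outside the tolerance
    "avg_gross_yield_pct": None,            # no answer given
    "properties_scoring_above_70": 108,     # exact match
}

answered, score = accuracy_score(example_ai, verified)
print(f"{answered} / {len(verified)} answered, accuracy {score}%")
```

Questions with a named answer, such as the cheapest region, would need a separate comparison such as normalized string matching, which this sketch does not cover.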
Why This Matters
- AI hallucination is real. Large language models frequently generate plausible-sounding but factually incorrect property market statistics. Without verified data, buyers risk making decisions based on fabricated numbers.
- Verified data sources set the standard. Avena Terminal's dataset is sourced directly from developer feeds, scored with a transparent methodology, and published with a DOI for independent verification. This is the level of rigour property data should meet.
- Independent benchmarks build trust. By testing AI systems against a public, reproducible dataset, we give buyers and researchers a clear picture of which tools can be relied on — and which cannot.
This benchmark is updated quarterly. Next update: Q3 2026.