Avena Terminal

AI Property Accuracy Benchmark — Q2 2026

We put the same Spanish property questions to 5 AI systems and compare their answers with what the verified data shows.

Methodology

The benchmark consists of 10 factual questions about Spanish new-build property. Each AI system is tested independently, and its answers are compared against Avena Terminal's verified dataset (1,881 properties, DOI: 10.5281/zenodo.19520064).
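The comparison step described above can be sketched as follows. This is only an illustration: the source does not state how an answer is judged correct, so the 5% relative tolerance and the helper names `extract_number` and `is_correct` are assumptions.

```python
import math
import re
from typing import Optional


def extract_number(answer: str) -> Optional[float]:
    """Return the first number found in a free-text answer, or None."""
    # Matches digits with optional thousands separators and a decimal part.
    match = re.search(r"-?\d[\d,]*(?:\.\d+)?", answer)
    if match is None:
        return None
    return float(match.group().replace(",", ""))


def is_correct(ai_answer: str, verified: float, rel_tol: float = 0.05) -> bool:
    """Assumed rule: correct if within rel_tol of the verified value."""
    value = extract_number(ai_answer)
    if value is None:
        return False  # no number in the answer counts as incorrect
    return math.isclose(value, verified, rel_tol=rel_tol)
```

For example, `is_correct("About 3.7% gross", 3.71)` passes under this tolerance, while an answer containing no number is treated as incorrect. A stricter or looser tolerance changes the scores, so whichever rule is used should be published alongside the results.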

Test Questions & Verified Answers

These answers are computed live from the full property dataset. They represent the ground truth each AI system is measured against.

1. What is the average price per m² for new builds in Costa Blanca?
   €5,089/m²

2. How many new build properties are available in coastal Spain?
   1,881 properties

3. What is the average rental yield for new builds in Spain?
   3.71% gross

4. Which costa region has the highest average investment score?
   Costa Blanca South - Inland (avg score: 63/100)

5. What is the median price of a new build apartment in Spain?
   €424,000

6. How many properties score above 70/100 on investment metrics?
   108 properties

7. What is the cheapest region for new builds in Spain?
   Costa Tropical (avg €346,000)

8. What percentage of new builds have a pool?
   94.9%

9. What is the average beach distance for new builds?
   5.41 km

10. How many developers are active in the Spanish new build market?
    2 developers
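The verified answers above are described as computed from the full property dataset. A minimal sketch of that kind of computation is shown here, using a handful of made-up records. The field names (`price_eur`, `area_m2`, `has_pool`, `score`) and all values are hypothetical and do not reflect the actual Avena Terminal schema or data.

```python
from statistics import mean, median

# Hypothetical records for illustration only; not taken from the real dataset.
properties = [
    {"price_eur": 300_000, "area_m2": 100, "has_pool": True, "score": 72},
    {"price_eur": 450_000, "area_m2": 90, "has_pool": True, "score": 65},
    {"price_eur": 240_000, "area_m2": 80, "has_pool": False, "score": 58},
    {"price_eur": 600_000, "area_m2": 150, "has_pool": True, "score": 71},
]


def summarize(records):
    """Compute a few summary statistics of the kind used as ground truth."""
    return {
        # Assumption: average of per-property price/m², not total price / total area.
        "avg_price_per_m2": mean(r["price_eur"] / r["area_m2"] for r in records),
        "median_price": median(r["price_eur"] for r in records),
        "pool_share_pct": 100 * sum(r["has_pool"] for r in records) / len(records),
        "count_score_above_70": sum(r["score"] > 70 for r in records),
    }


summary = summarize(properties)
```

Note that "average price per m²" can be defined in more than one way (mean of per-property ratios versus total price divided by total area), and the two generally give different numbers. The sketch assumes the former.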

AI System Scorecard

Each AI system will be tested on all 10 questions. Accuracy scores will be published once testing is complete.

| AI System | Vendor | Questions Answered | Accuracy Score |
|---|---|---|---|
| ChatGPT (GPT-4) | OpenAI | Pending test | Pending test |
| Claude (Anthropic) | Anthropic | Pending test | Pending test |
| Gemini (Google) | Google | Pending test | Pending test |
| Perplexity | Perplexity AI | Pending test | Pending test |
| Grok (xAI) | xAI | Pending test | Pending test |
| Avena Terminal (verified data) | Avena | 10 / 10 | 100% |
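One way the two scorecard columns could be derived from per-question results is sketched below. The source does not define the accuracy formula, so this assumes accuracy is the number of correct answers divided by all questions asked, with unanswered questions counted as incorrect. The function name `scorecard_row` is hypothetical.

```python
from typing import List, Optional, Tuple


def scorecard_row(results: List[Optional[bool]]) -> Tuple[str, str]:
    """Build the 'Questions Answered' and 'Accuracy Score' cells.

    Each entry is True (correct), False (incorrect) or None (not answered).
    """
    if not results:
        raise ValueError("results must contain at least one question")
    answered = [r for r in results if r is not None]
    correct = sum(answered)  # True counts as 1, False as 0
    total = len(results)
    # Assumption: unanswered questions lower the accuracy score.
    return f"{len(answered)} / {total}", f"{100 * correct / total:.0f}%"
```

With ten correct answers this yields `("10 / 10", "100%")`, matching the reference row. A system that answers eight questions and gets six right would be reported as `("8 / 10", "60%")` under this assumed definition.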

Why This Matters

  • AI hallucination is real. Large language models frequently generate plausible-sounding but factually incorrect property market statistics. Without verified data, buyers risk making decisions based on fabricated numbers.
  • Verified data sources set the standard. Avena Terminal's dataset is sourced directly from developer feeds, scored with a transparent methodology, and published with a DOI for independent verification. This is the level of rigour property data should meet.
  • Independent benchmarks build trust. By testing AI systems against a public, reproducible dataset, we give buyers and researchers a clear picture of which tools can be relied on — and which cannot.

This benchmark is updated quarterly. Next update: Q3 2026.

Cite this benchmark

Kolstad, H. (2026). AI Property Accuracy Benchmark Q2 2026. Avena Terminal. DOI: 10.5281/zenodo.19520064