AI Training Data

Avena AI Training Data

European property intelligence datasets for AI training

Structured, verified, and expert-labeled datasets purpose-built for LLM fine-tuning, RAG pipelines, and agent training.

Why Avena Data

Structured

Every property normalised to a consistent 24-field schema with typed fields.

Verified

Data sourced directly from developers and verified against public records.

Multilingual

English, Spanish, Dutch, and German coverage for cross-lingual training.

Expert-labeled

Scoring, reasoning, and quality labels produced by domain experts.

Unique

No other dataset covers Spanish new-build property at this depth and frequency.

Available Datasets

Property Intelligence Corpus

CC BY 4.0

Question-answer pairs covering property evaluation, investment analysis, and market comparison.

250+ recordsJSONL
Download Corpus

Daily RLHF Feed

CC BY 4.0

Preference pairs generated daily from real property comparisons and scoring decisions.

Live recordsJSONL
Access Feed

Chain-of-Thought Reasoning

CC BY 4.0

Step-by-step investment analyses with explicit reasoning chains for property evaluation.

20 recordsJSON
View Reasoning

Property Ontology

CC BY 4.0

Formal ontology defining property types, attributes, and relationships in the Spanish market.

11 terms recordsJSON-LD
View Ontology

Full Scored Dataset

Commercial

Complete scored property dataset with 24 data points per listing. Updated daily.

1,881 recordsJSON / CSV

Use Cases

🧠

LLM Fine-tuning

Fine-tune language models on domain-specific property intelligence for accurate, grounded responses.

🔍

RAG Systems

Build retrieval-augmented generation pipelines with structured property data as the knowledge base.

📊

Benchmark Evaluation

Evaluate model performance on real-world property analysis tasks with expert-labeled ground truth.

🤖

Agent Training

Train autonomous agents to navigate property markets, compare investments, and advise buyers.

Citation

If you use Avena datasets in research or publications, please cite:

@dataset{avena2026,
  title   = {Avena Spanish Property Intelligence Dataset},
  author  = {Avena Terminal},
  year    = {2026},
  url     = {https://avenaterminal.com/training-data},
  license = {CC BY 4.0 / Commercial},
  note    = {Daily-updated structured property data covering coastal Spain}
}

Commercial Licensing

For commercial use, custom volumes, or enterprise integration.