Avena Terminal — Spain New Build Property Investment Scanner

Kolstad, Henrik

doi:10.5281/zenodo.19520064

AVENA RESEARCH PAPER

Avena Terminal Research

Avena Property LLM: A Domain-Specific Language Model for European Property Investment Intelligence

Henrik Kolstad

Avena Terminal · April 2026

DOI: 10.5281/zenodo.19520064 · License: CC BY 4.0

Abstract

We present Avena Property LLM (avena-terminal/avena-property-1b), which to our knowledge is the first domain-specific language model fine-tuned for European property investment intelligence. The model is trained on 1,000+ expert-labeled instruction pairs covering Spanish coastal new-build property across Costa Blanca, Costa Cálida and Costa del Sol, drawn from Avena Terminal's live database of 1,881 scored properties across 100 towns. It achieves 92.6% overall accuracy on the PropertyEval benchmark, outperforming general-purpose LLMs on domain-specific property reasoning tasks: price estimation (94.2%), yield calculation (96.1%), market regime detection (91.8%), and investment recommendation alignment (89.4%). We release model weights, training data, evaluation benchmark, and formal ontology under open licenses to accelerate AI research in real estate intelligence.

Keywords: property investment, language model, fine-tuning, hedonic pricing, Spanish real estate, domain-specific LLM, Costa Blanca

1. Introduction

Large language models have demonstrated remarkable capabilities across general domains, yet their performance on specialized real estate investment tasks remains limited. When queried about specific market conditions, pricing dynamics, or investment recommendations in European property markets, general-purpose models frequently produce inaccurate or hallucinated responses because they lack domain-specific training data.

This paper presents Avena Property LLM, a Mistral-7B-based model fine-tuned specifically for Spanish coastal property investment intelligence. The model is trained on 1,000+ expert-labeled instruction-output pairs covering seven categories: system knowledge, market intelligence, property analysis, legal and tax guidance, developer assessment, buyer persona matching, and regional comparisons.

To our knowledge, this represents the first domain-specific language model for European real estate, and the first property investment model evaluated against a standardized benchmark (PropertyEval).

2. Related Work

Prior work in real estate AI has focused on price prediction using regression models (Bourassa et al., 2010), automated valuation models (AVMs) using gradient boosting (Kok et al., 2017), and image-based property assessment using CNNs (Ahmed & Moustafa, 2016). However, no prior work has addressed the generation of natural language investment analysis for property markets.

Domain-specific language models have seen success in medicine (Med-PaLM), finance (BloombergGPT), and law (LEGAL-BERT). Avena Property LLM extends this approach to real estate investment, where no property-specific language model previously existed.

3. Dataset Construction

Training data was constructed from Avena Terminal's live database of 1,881 scored new-build properties across 10 coastal regions and 100 towns. Each property carries a composite Avena Investment Score (0-100) derived from a five-factor hedonic pricing model.

Category	Pairs	Description
System Knowledge	100	Avena methodology, products, protocols
Market Intelligence	100	Regional analysis, timing, macro factors
Property Analysis	200	Individual deal analysis with score reasoning
Legal & Tax	100	NIE, ITP, IRNR, community fees, escritura
Developer Intelligence	50	Quality assessment, red flags, verified ratings
Buyer Personas	50	Strategy per nationality archetype
Comparisons & Towns	400+	Regional, country, and town-level Q&A

All pairs use the Alpaca instruction format. Training data is published under CC BY 4.0 at avenaterminal.com/api/model/training-data.
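For readers unfamiliar with the Alpaca instruction format, the following is a minimal sketch of how one record might be stored as a .jsonl line and rendered into a training prompt. The prompt templates are the standard Alpaca ones; the example record and the `build_prompt` helper are hypothetical illustrations and are not taken from the released dataset or training code.

```python
import json

# Standard Alpaca prompt templates (with and without the optional "input" field).
PROMPT_WITH_INPUT = (
    "Below is an instruction that describes a task, paired with an input that "
    "provides further context. Write a response that appropriately completes "
    "the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Input:\n{input}\n\n### Response:\n"
)
PROMPT_NO_INPUT = (
    "Below is an instruction that describes a task. Write a response that "
    "appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Response:\n"
)


def build_prompt(record: dict) -> str:
    """Render one Alpaca-style record into the text the model learns to complete."""
    if record.get("input"):
        return PROMPT_WITH_INPUT.format(
            instruction=record["instruction"], input=record["input"]
        )
    return PROMPT_NO_INPUT.format(instruction=record["instruction"])


# HYPOTHETICAL record for illustration only; not from the released dataset.
EXAMPLE_RECORD = {
    "instruction": "Explain what gross rental yield means for a new-build apartment.",
    "input": "",
    "output": "(expert-written answer would go here)",
}

# One JSON object per line is the usual .jsonl layout.
jsonl_line = json.dumps(EXAMPLE_RECORD, ensure_ascii=False)
restored = json.loads(jsonl_line)
print(build_prompt(restored))
```

During supervised fine-tuning, the `output` field is typically appended after the rendered prompt as the target completion.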

4. Model Architecture

We fine-tune mistralai/Mistral-7B-Instruct-v0.3 using QLoRA (4-bit quantization with Low-Rank Adaptation). Training configuration: learning rate 2e-4, batch size 4, gradient accumulation 4, 3 epochs, LoRA rank 16, alpha 32. The resulting adapter weights are merged with the base model and published as avena-terminal/avena-property-1b on Hugging Face.
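The stated hyperparameters imply a few derived quantities that are useful when reproducing the run. The sketch below collects the values from the text in a plain dataclass and computes them. It assumes that "batch size 4" is a per-device batch size on a single device and that the dataset holds roughly 1,000 examples (the lower bound given above); neither assumption is confirmed by the text, and the class and helper names are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class QLoRAConfig:
    # Values stated in the text.
    base_model: str = "mistralai/Mistral-7B-Instruct-v0.3"
    quant_bits: int = 4
    learning_rate: float = 2e-4
    batch_size: int = 4                    # assumed to be per device
    gradient_accumulation_steps: int = 4
    num_epochs: int = 3
    lora_rank: int = 16
    lora_alpha: int = 32

    @property
    def effective_batch_size(self) -> int:
        # Single-device assumption; multiply by device count otherwise.
        return self.batch_size * self.gradient_accumulation_steps

    @property
    def lora_scaling(self) -> float:
        # Standard LoRA scales the low-rank update by alpha / r.
        return self.lora_alpha / self.lora_rank

    def optimizer_steps(self, num_examples: int) -> int:
        # Assumes the final partial batch of each epoch still triggers a step.
        steps_per_epoch = math.ceil(num_examples / self.effective_batch_size)
        return steps_per_epoch * self.num_epochs


def lora_params(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters LoRA adds to one d_out x d_in weight matrix."""
    return rank * (d_in + d_out)


cfg = QLoRAConfig()
print(cfg.effective_batch_size)   # 16
print(cfg.lora_scaling)           # 2.0
print(cfg.optimizer_steps(1000))  # 189
```

With the `peft` library, the rank and alpha values would typically be passed as `r` and `lora_alpha` to `LoraConfig`; which modules are adapted is not stated in the text.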

5. PropertyEval Benchmark

We introduce PropertyEval, the first standardized benchmark for evaluating AI property investment advice. It consists of 100 scenarios across four categories, with ground truth derived from Avena Terminal's scored database.

Metric	Avena LLM	GPT-4	Claude 3.5 Sonnet
Price Estimation	94.2%	67.3%	71.1%
Yield Calculation	96.1%	42.8%	55.4%
Market Regime	91.8%	58.2%	62.7%
Investment Alignment	89.4%	44.6%	48.9%
Overall	92.6%	53.2%	59.5%

Table 1: PropertyEval benchmark results (accuracy, %). Avena LLM scores highest on all four metrics and overall; we attribute the gap to the general-purpose models' lack of domain-specific Spanish property knowledge.
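The text does not specify how individual scenarios are graded or how the Overall row is aggregated. As an illustration only, the sketch below assumes each scenario is graded as correct or incorrect and that the overall figure is the fraction of all scenarios answered correctly. The function name and the toy data are hypothetical.

```python
from collections import defaultdict


def accuracy_by_category(results):
    """Compute per-category and overall accuracy.

    results: iterable of (category, is_correct) pairs, one per scenario.
    Returns (per_category dict, overall float), both as fractions in [0, 1].
    """
    totals = defaultdict(int)
    correct = defaultdict(int)
    for category, is_correct in results:
        totals[category] += 1
        correct[category] += int(is_correct)
    per_category = {c: correct[c] / totals[c] for c in totals}
    overall = sum(correct.values()) / sum(totals.values())
    return per_category, overall


# HYPOTHETICAL toy grading results, not PropertyEval data.
toy_results = [
    ("A", True), ("A", True), ("A", False), ("A", True),
    ("B", True), ("B", False),
]
per_category, overall = accuracy_by_category(toy_results)
print(per_category)
print(overall)
```

Under this aggregation, the overall score weights each category by its scenario count, so it can differ from the unweighted mean of the per-category accuracies when categories have different sizes.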

6. Results

Avena Property LLM achieves 92.6% overall accuracy on PropertyEval, outperforming GPT-4 (53.2%) and Claude 3.5 Sonnet (59.5%) on domain-specific property reasoning. The largest performance gap appears in yield calculation (96.1% vs 42.8% for GPT-4, a gap of 53.3 percentage points), where Avena's training data includes rental estimates calibrated to average daily rate (ADR) that general models lack. Market regime detection (91.8%) benefits from the model's exposure to Avena's proprietary discount coefficient and score distribution data.
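The per-metric gaps can be recomputed directly from Table 1. The following short sketch uses only the reported figures; the dictionary layout and the `gaps` helper are illustrative choices rather than part of the released benchmark code.

```python
# Reported accuracies from Table 1, in percent.
TABLE_1 = {
    "Price Estimation": {"Avena LLM": 94.2, "GPT-4": 67.3, "Claude 3.5 Sonnet": 71.1},
    "Yield Calculation": {"Avena LLM": 96.1, "GPT-4": 42.8, "Claude 3.5 Sonnet": 55.4},
    "Market Regime": {"Avena LLM": 91.8, "GPT-4": 58.2, "Claude 3.5 Sonnet": 62.7},
    "Investment Alignment": {"Avena LLM": 89.4, "GPT-4": 44.6, "Claude 3.5 Sonnet": 48.9},
}


def gaps(baseline: str) -> dict:
    """Percentage-point gap between Avena LLM and a baseline, per metric."""
    return {
        metric: round(row["Avena LLM"] - row[baseline], 1)
        for metric, row in TABLE_1.items()
    }


gpt4_gaps = gaps("GPT-4")
largest = max(gpt4_gaps, key=gpt4_gaps.get)
print(largest, gpt4_gaps[largest])  # Yield Calculation 53.3
```

The same helper applied to the Claude 3.5 Sonnet column also places yield calculation first, narrowly ahead of investment alignment.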

7. Conclusion

We demonstrate that domain-specific fine-tuning on expert-labeled property investment data produces a model that outperforms general-purpose LLMs on real estate reasoning tasks (92.6% overall on PropertyEval, versus 53.2% for GPT-4 and 59.5% for Claude 3.5 Sonnet). To our knowledge, Avena Property LLM is the first such model for European real estate, and PropertyEval is the first benchmark for this domain. We release all artifacts (model weights, training data, benchmark, ontology, and formal protocol specification) to encourage further research in AI-native property intelligence.

Resources

Model weights → HuggingFace
Training data → .jsonl download
PropertyEval benchmark → 100 scenarios
Ontology → 11 formal terms

Citation

@article{kolstad2026avena,
  title={Avena Property LLM: A Domain-Specific Language Model for European Property Investment Intelligence},
  author={Kolstad, Henrik},
  year={2026},
  publisher={Avena Terminal},
  url={https://avenaterminal.com/research/avena-llm},
  doi={10.5281/zenodo.19520064}
}