Training Data Quality
- James W.
- 3 days ago

LinkedIn Post 10: Training Data Quality
Your synthetic comparable algorithm is trained on 10,000 transactions. That sounds robust.
But which transactions? From what time period? Across what geographic scope? Under what data quality filters?
If the training data differs fundamentally from the subject market, the synthetic output is unreliable. Using coastal resort data to value inland agricultural property? Problematic. Using 2008 financial crisis data in a normal market? Concerning. Using transaction data from only one property type to value a different type? Risky.
Before using synthetic comparables, assess the training data quality. Does it appropriately represent the relevant market? Are significant property types or time periods missing? Were exclusions appropriate?
Document your assessment. It becomes part of your due diligence record.
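One way to make that assessment repeatable is a small coverage check. The sketch below is only an illustration, not how any particular synthetic comparable tool works: the `Transaction` fields, the `assess_coverage` function, the region and property type labels, and both thresholds (`max_age_years`, `min_matching`) are all assumptions made up for the example. It counts how many training transactions match the subject's region, property type, and time window, and returns a record you could file with your due diligence notes.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Transaction:
    sale_date: date
    region: str         # hypothetical market label, e.g. "inland"
    property_type: str  # hypothetical type label, e.g. "agricultural"


def assess_coverage(training, subject_region, subject_type,
                    valuation_date, max_age_years=5, min_matching=30):
    """Summarize how well the training set covers the subject market.

    Both thresholds are illustrative assumptions, not industry standards.
    """
    # Oldest sale date still treated as "current market" in this sketch.
    cutoff = date(valuation_date.year - max_age_years, valuation_date.month, 1)

    same_region = sum(t.region == subject_region for t in training)
    same_type = sum(t.property_type == subject_type for t in training)
    recent = sum(t.sale_date >= cutoff for t in training)
    # Transactions that match on region, type, and recency all at once.
    matching = sum(
        t.region == subject_region
        and t.property_type == subject_type
        and t.sale_date >= cutoff
        for t in training
    )

    flags = []
    if same_region == 0:
        flags.append("no transactions from the subject region")
    if same_type == 0:
        flags.append("no transactions of the subject property type")
    if matching < min_matching:
        flags.append("too few recent transactions matching region and type")

    return {
        "total": len(training),
        "same_region": same_region,
        "same_type": same_type,
        "recent": recent,
        "matching": matching,
        "flags": flags,
    }


if __name__ == "__main__":
    training = [
        Transaction(date(2023, 5, 1), "coastal", "resort"),
        Transaction(date(2008, 10, 1), "inland", "agricultural"),
        Transaction(date(2024, 2, 1), "inland", "agricultural"),
    ]
    record = assess_coverage(training, "inland", "agricultural", date(2025, 1, 15))
    print(record)
```

An empty `flags` list does not prove the training data is suitable. It only shows that these three basic checks passed, so questions such as whether the exclusions were appropriate still need your own judgment.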
