When Explainability Tools Mislead
- James W.
- 3 days ago
- 1 min read

SHAP values. Feature importance. Attention maps. These tools make black-box AI models seem explainable.
They tell you *which factors mattered*. They don't tell you *how the model is using those factors*.
Example: an explainability tool reports "LTV is 40% important" for a credit decision.
Does this mean the model treats high LTV as always bad? Bad only in combination with other factors? Non-linearly related to risk?
The tool won't tell you.
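To make that concrete, here's a minimal sketch (synthetic data, hypothetical feature names like `ltv` and `income` — none of this comes from a real credit model): two models where LTV scores high importance, for entirely different underlying reasons.

```python
# Sketch: the same "ltv is important" verdict can hide different mechanisms.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.inspection import permutation_importance

rng = np.random.default_rng(0)
n = 5000
ltv = rng.uniform(0.2, 1.2, n)
income = rng.uniform(20, 200, n)
X = np.column_stack([ltv, income])

# Scenario A: high LTV is always bad (monotone relationship).
y_a = (ltv + rng.normal(0, 0.1, n) > 0.8).astype(int)
# Scenario B: high LTV is bad only when income is low (interaction effect).
y_b = ((ltv > 0.8) & (income < 80)).astype(int)

for name, y in [("always bad", y_a), ("interaction", y_b)]:
    model = GradientBoostingClassifier().fit(X, y)
    imp = permutation_importance(model, X, y, n_repeats=5, random_state=0)
    print(f"{name}: ltv importance = {imp.importances_mean[0]:.3f}")
```

Both scenarios print a large importance score for LTV. The *how* — monotone penalty vs. interaction with income — is invisible in the number.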
Worse: a feature might appear important because the model overfit to its noise, not because it's genuinely predictive.
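This one is easy to demonstrate. A minimal sketch, using pure-noise features and labels that are independent of them: importance measured on training data looks real; measured on held-out data, it collapses.

```python
# Sketch: an overfit model assigns "importance" to features that predict nothing.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 5))        # five features of pure noise
y = rng.integers(0, 2, size=200)     # labels independent of every feature

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
model = RandomForestClassifier(random_state=0).fit(X_tr, y_tr)

train_imp = permutation_importance(model, X_tr, y_tr, n_repeats=10, random_state=0)
test_imp = permutation_importance(model, X_te, y_te, n_repeats=10, random_state=0)
print("train:", train_imp.importances_mean.round(3))  # spuriously nonzero
print("test: ", test_imp.importances_mean.round(3))   # near zero
```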
Regulators seeing explainability output might incorrectly assume the model is working as intended.
ACRGA-EXPLAIN goes deeper: model cards that document limitations, governance committee review that questions model logic, and counterfactual analysis that shows what would actually change a decision (sketched below).
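Counterfactual analysis answers a question importance scores can't: "what is the smallest change to this applicant that flips the decision?" A minimal sketch, assuming an sklearn-style model and a hypothetical helper `ltv_counterfactual` (the feature index and step size are illustrative, not from ACRGA-EXPLAIN's actual tooling):

```python
# Sketch: find the smallest LTV perturbation that flips a model's decision.
import numpy as np

def ltv_counterfactual(model, x, ltv_idx=0, step=0.01, max_steps=100):
    """Search outward from x along the LTV axis; return (counterfactual, delta)
    for the first prediction flip, or None if no flip is found."""
    base = model.predict(x.reshape(1, -1))[0]
    for direction in (-1, 1):
        for k in range(1, max_steps + 1):
            x_cf = x.copy()
            x_cf[ltv_idx] += direction * k * step
            if model.predict(x_cf.reshape(1, -1))[0] != base:
                return x_cf, direction * k * step
    return None
```

Run against the interaction model above, this makes the hidden logic visible: for a low-income applicant, a small LTV reduction flips the decision; for a high-income applicant, no LTV change does.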
Not just transparency theater. Actual governance.
