
What to evaluate
How to evaluate it
Approaches
| Approach | Humans | Tasks |
|---|---|---|
| Application-grounded Evaluation | Real Humans | Real Tasks |
| Human-grounded Evaluation | Real Humans | Simple Tasks |
| Functionally-grounded Evaluation | No Real Humans | Proxy Tasks |




See the taxonomy module for a review of explainability desiderata.
Evaluation is task-specific and context-dependent
It should account for both aspects of XML systems
Overall, it should assess human understanding