Hierarchical Interpretable Vision Reasoning Driven Through a Multi-Modal Large Language Model for Depth Estimation | IEEE Conference Publication