Abstract:
Intelligent virtual agents are used to accomplish complex multi-modal tasks such as human instruction comprehension in mixed-reality environments by increasingly adopting...Show MoreMetadata
Abstract:
Intelligent virtual agents are used to accomplish complex multi-modal tasks such as human instruction comprehension in mixed-reality environments by increasingly adopting richer, energy-intensive sensors and processing pipelines. In such applications, the context for activating sensors and processing blocks required to accomplish a given task instance is usually manifested via multiple sensing modes. Based on this observation, we introduce a novel Commit-and-Switch (CAS) paradigm that simultaneously seeks to reduce both sensing and processing energy. In CAS, we first commit to a low-energy computational pipeline with a subset of available sensors. Then, the task context estimated by this pipeline is used to optionally switch to another energy-intensive DNN pipeline and activate additional sensors. We demonstrate how CAS's paradigm of interweaving DNN computation and sensor triggering can be instantiated principally by constructing multi-head DNN models and jointly optimizing the accuracy and sensing costs associated with different heads. We exemplify CAS via the development of the RealGIN-MH model for multi-modal target acquisition tasks, a core enabler of immersive human-agent interaction. RealGIN-MH achieves 12.9x reduction in energy overheads, while outperforming baseline dynamic model optimization approaches.
Published in: IEEE Robotics and Automation Letters ( Volume: 9, Issue: 11, November 2024)