Embedded system programs tend to spend much of their execution time in small loops. Introducing a very small loop cache into the instruction memory hierarchy has thus been shown to substantially reduce instruction fetch energy. However, loop caches come in many sizes and variations; using the configuration that is best on average may actually increase energy consumption for a specific program. We therefore introduce a loop cache exploration tool that analyzes a particular program's profile, rapidly explores the possible configurations, and generates the configuration with the greatest power savings. We introduce a simulation-based approach and show that a customized loop cache yields good energy savings. We also introduce a fast estimation-based approach that obtains nearly the same results in seconds rather than tens of minutes or hours.