A direct implementation of the bilateral filter requires $O(\sigma_s^2)$ operations per pixel, where $\sigma_s$ is the (effective) width of the spatial kernel. A fast implementation of the bilateral filter that requires $O(1)$ operations per pixel with respect to $\sigma_s$ was recently proposed. This was done by using trigonometric functions for the range kernel of the bilateral filter and exploiting their so-called shiftability property. In particular, a fast implementation of the Gaussian bilateral filter was realized by approximating the Gaussian range kernel using raised cosines. Later, it was demonstrated that this idea could be extended to a larger class of filters, including the popular non-local means filter. As already observed, a drawback of this approach is that the run time depends on the width $\sigma_r$ of the range kernel: for an image with dynamic range $[0,T]$, the run time scales as $O(T^2/\sigma_r^2)$. This makes it difficult to implement narrow range kernels, particularly for images with large dynamic range. In this paper, we discuss this problem and propose some simple steps to accelerate the implementation in general, and for small $\sigma_r$ in particular. We provide experimental results to demonstrate the acceleration achieved using these modifications.
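To make the dependence on the range-kernel width concrete, the following is a minimal Python sketch of a raised-cosine approximation of the Gaussian range kernel. The specific form $[\cos(s/(\sigma_r\sqrt{N}))]^N$ and the order bound $N \geq (2T/(\pi\sigma_r))^2$ are assumptions drawn from the standard raised-cosine construction and are not stated in the text above; the function names `min_order` and `raised_cosine` are hypothetical. The bound keeps the cosine argument within $[-\pi/2, \pi/2]$ over the whole range $[-T, T]$, so that the approximation stays non-negative, and it grows as $T^2/\sigma_r^2$.

```python
import math
import numpy as np

def min_order(T, sigma_r):
    """Smallest integer N with T / (sigma_r * sqrt(N)) <= pi / 2.

    Assumed bound: N >= (2T / (pi * sigma_r))**2, which keeps the
    raised cosine non-negative on [-T, T].
    """
    return math.ceil((2.0 * T / (math.pi * sigma_r)) ** 2)

def raised_cosine(s, sigma_r, N):
    """Assumed raised-cosine approximation of exp(-s**2 / (2 * sigma_r**2))."""
    return np.cos(s / (sigma_r * math.sqrt(N))) ** N

T = 255.0                          # dynamic range [0, T] of an 8-bit image
s = np.linspace(-T, T, 1001)       # all possible intensity differences

for sigma_r in (80.0, 40.0, 20.0, 10.0):
    N = min_order(T, sigma_r)
    gaussian = np.exp(-s**2 / (2.0 * sigma_r**2))
    err = np.max(np.abs(raised_cosine(s, sigma_r, N) - gaussian))
    print(f"sigma_r={sigma_r:5.1f}  N={N:4d}  max abs error={err:.4f}")
```

Under these assumptions, halving $\sigma_r$ roughly quadruples the required order $N$. Since the $N$-th power of a cosine expands into $N+1$ shiftable terms, each of which needs its own spatial filtering pass, this is one way to see why narrow range kernels become expensive.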