Context-based entropy coding is commonly used in image compression. A critical issue for its performance is how to resolve the conflict between the desire for large templates, which model high-order statistical dependencies among pixels, and the problem of context dilution caused by insufficient sample statistics in a given input image. We consider the problem of finding the optimal quantizer $Q$ that maps the $K$-dimensional causal context $C_t = (X_{t-t_1}, X_{t-t_2}, \ldots, X_{t-t_K})$ of a source symbol $X_t$ into one of a set of conditioning states. Optimality of context quantization is defined as the minimum static or minimum adaptive code length for a given data set. For a binary source alphabet, an optimal context quantizer can be computed exactly by a fast dynamic programming algorithm. Faster approximate solutions are also proposed. For an $m$-ary source alphabet, a random variable can be decomposed into a sequence of binary decisions, each of which is coded using an optimal context quantizer designed for the corresponding binary random variable. This optimized coding scheme is applied to digital maps and α-plane sequences. The proposed optimal context quantization technique can also be used to establish a lower bound on the achievable code length, and hence is a useful tool for evaluating the performance of existing heuristic context quantizers.
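As an illustration of the binary case, the following is a minimal sketch of context quantization by dynamic programming. It is not necessarily the algorithm of this work. It assumes a common formulation in which contexts are sorted by their estimated probability of a 1 and each conditioning state is a contiguous run in that order, and it uses the static (empirical-entropy) code length as the cost. The names `cell_cost` and `quantize_contexts` and the input format (a dictionary from context to counts of zeros and ones) are hypothetical choices made for this example.

```python
import math


def cell_cost(n0, n1):
    """Static code length in bits for a state holding n0 zeros and n1 ones."""
    n = n0 + n1
    cost = 0.0
    for k in (n0, n1):
        if k > 0:
            cost -= k * math.log2(k / n)
    return cost


def quantize_contexts(counts, num_states):
    """Merge contexts into at most num_states conditioning states.

    counts: dict mapping context -> (n0, n1); every context is assumed to
    have been observed at least once (n0 + n1 > 0).
    Returns (total_bits, mapping from context to state index).
    """
    # Assumption: an optimal partition is contiguous once contexts are
    # sorted by the estimated probability of a 1.
    ctxs = sorted(counts, key=lambda c: counts[c][1] / sum(counts[c]))
    n = len(ctxs)
    m = min(num_states, n)

    # Prefix sums give the merged counts of any contiguous run in O(1).
    p0, p1 = [0], [0]
    for c in ctxs:
        p0.append(p0[-1] + counts[c][0])
        p1.append(p1[-1] + counts[c][1])

    def run_cost(i, j):
        # Cost of merging sorted contexts i .. j-1 into one state.
        return cell_cost(p0[j] - p0[i], p1[j] - p1[i])

    inf = float("inf")
    # best[k][j]: minimum bits for the first j contexts using k states.
    best = [[inf] * (n + 1) for _ in range(m + 1)]
    back = [[0] * (n + 1) for _ in range(m + 1)]
    best[0][0] = 0.0
    for k in range(1, m + 1):
        for j in range(k, n + 1):
            for i in range(k - 1, j):
                v = best[k - 1][i] + run_cost(i, j)
                if v < best[k][j]:
                    best[k][j] = v
                    back[k][j] = i

    # Walk the back-pointers to recover which state each context maps to.
    mapping = {}
    j = n
    for k in range(m, 0, -1):
        i = back[k][j]
        for c in ctxs[i:j]:
            mapping[c] = k - 1
        j = i
    return best[m][n], mapping
```

Under these assumptions, calling `quantize_contexts({0: (9, 1), 1: (8, 2), 2: (1, 9), 3: (2, 8)}, 2)` groups contexts 0 and 1 into one state and contexts 2 and 3 into the other. The triple loop costs O(M N^2) for N contexts and M states; an adaptive code length would require a different cost function in place of `cell_cost`.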