Inferring the causal structure of a set of random variables from a finite sample of the joint distribution is an important problem in science. The case of two random variables is particularly challenging since no (conditional) independences can be exploited. Recent methods that are based on additive noise models suggest the following principle: Whenever the joint distribution P(X,Y) admits such a model in one direction, e.g., Y = f(X)+N, N ⊥X, but does not admit the reversed model X=g(Y)+Ñ, Ñ ⊥ Y, one infers the former direction to be causal (i.e., X → Y). Up to now, these approaches only dealt with continuous variables. In many situations, however, the variables of interest are discrete or even have only finitely many states. In this work, we extend the notion of additive noise models to these cases. We prove that it almost never occurs that additive noise models can be fit in both directions. We further propose an efficient algorithm that is able to perform this way of causal inference on finite samples of discrete variables. We show that the algorithm works on both synthetic and real data sets.