I. Introduction
Predicting the future is key to decision-making. Accurate prediction requires considering previous observations and their temporal relationships with possible upcoming observations. Temporal observations are typically represented as either time series or discrete sequences [1]. A time series is an ordered list of numbers, generally measured at fixed time intervals. Predicting the next observation of a time series is typically viewed as the problem of finding a function that closely fits the data points, which can be done with techniques such as least-squares linear regression or more complex models. A discrete sequence, on the other hand, is an ordered list of symbols. This paper focuses on discrete sequences, as their prediction is useful in many domains. For instance, it can be used to predict the next word that a user will type on a phone, the next error that will occur in a network, the next purchase of a customer, or the next location to which someone will drive [2]. Because discrete sequences differ substantially in nature from time series, predicting the next symbol of a discrete sequence calls for different techniques than those used for time series.

Artificial neural networks are among the most accurate techniques for event prediction. However, a major drawback is that they mostly operate as black boxes, so a user is often unable to understand why an event is predicted. Yet explainable models are often critical for decision-makers in industry. This is the case, for example, in network fault management, where network technicians wish not only to predict network errors but also to understand the relationships between complex network events in order to prevent errors [3].
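
To make the contrast between time series and discrete sequences concrete, the following minimal Python sketch places the two prediction tasks side by side. It is an illustration only and not a method proposed in this paper: the function names (fit_line, predict_next_value, predict_next_symbol) and the sample data are hypothetical, and the symbol predictor is a deliberately simple frequency-based baseline that returns the most frequent successor of the last observed symbol.

```python
from collections import Counter, defaultdict


def fit_line(values):
    """Least-squares line through the points (0, values[0]), (1, values[1]), ...

    Assumes at least two observations taken at a fixed time interval.
    """
    n = len(values)
    mean_t = (n - 1) / 2
    mean_y = sum(values) / n
    cov = sum((t - mean_t) * (y - mean_y) for t, y in enumerate(values))
    var = sum((t - mean_t) ** 2 for t in range(n))
    slope = cov / var
    intercept = mean_y - slope * mean_t
    return slope, intercept


def predict_next_value(values):
    """Time series: extrapolate the fitted line to the next time step."""
    slope, intercept = fit_line(values)
    return slope * len(values) + intercept


def predict_next_symbol(sequence):
    """Discrete sequence: most frequent successor of the last symbol.

    Illustrative baseline only; returns None if the last symbol was
    never followed by anything in the sequence.
    """
    successors = defaultdict(Counter)
    for current, following in zip(sequence, sequence[1:]):
        successors[current][following] += 1
    counts = successors[sequence[-1]]
    if not counts:
        return None
    return counts.most_common(1)[0][0]


# Hypothetical sample data for illustration.
readings = [2.0, 4.0, 6.0, 8.0]
events = ["login", "browse", "buy", "login", "browse", "buy", "login"]
next_value = predict_next_value(readings)
next_event = predict_next_symbol(events)
```

The numeric predictor relies on arithmetic over the observations, which has no meaning for symbols such as "login" or "buy". The symbol predictor instead counts which symbols have followed which, and its prediction can be traced back to those counts, which loosely illustrates the kind of explainability motivated above.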