I. Introduction
Accurate channel state information (CSI) feedback from the user equipment (UE) to the next generation Node B (gNB) is essential for efficient use of the available radio resources. To maximize throughput at the UE, the gNB allocates resources for future transmissions according to the reported CSI. According to the 3rd Generation Partnership Project (3GPP) 5G New Radio (NR) specification [1], a CSI report consists of multiple indicators, including the channel quality indicator (CQI) and the rank indicator (RI). The CQI indicates the modulation scheme and code rate for the physical downlink shared channel (PDSCH) transmission, while the RI indicates the rank, i.e., the number of spatial layers, for that transmission. In this work, we study the joint estimation of RI and CQI using a shallow reinforcement learning (RL) technique. The same algorithm can also be applied to estimate only RI or only CQI when the other indicator is already given or fixed to some value.
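To make the joint estimation problem concrete, the following minimal sketch enumerates the set of (RI, CQI) pairs a learner could choose from, and shows how that set shrinks when one indicator is fixed. It is an illustration only, not the algorithm studied in this work: the names `CsiCandidate` and `candidate_set`, the maximum rank of 4, and the CQI index range 0 to 15 are assumptions made for this example.

```python
from dataclasses import dataclass
from itertools import product
from typing import List, Optional

# Assumptions for illustration only (not taken from the text):
MAX_RANK = 4              # hypothetical maximum rank supported by the UE
CQI_INDICES = range(16)   # assumes a 4-bit CQI index, 0..15


@dataclass(frozen=True)
class CsiCandidate:
    """One candidate CSI report: a rank indicator and a CQI index."""
    ri: int
    cqi: int


def candidate_set(fixed_ri: Optional[int] = None,
                  fixed_cqi: Optional[int] = None) -> List[CsiCandidate]:
    """Enumerate candidate (RI, CQI) pairs.

    If fixed_ri or fixed_cqi is given, that indicator is held constant and
    only the other one is varied (the RI-only or CQI-only case).
    """
    ris = [fixed_ri] if fixed_ri is not None else list(range(1, MAX_RANK + 1))
    cqis = [fixed_cqi] if fixed_cqi is not None else list(CQI_INDICES)
    return [CsiCandidate(ri, cqi) for ri, cqi in product(ris, cqis)]


joint = candidate_set()               # both RI and CQI are estimated
cqi_only = candidate_set(fixed_ri=2)  # RI given, estimate CQI only
ri_only = candidate_set(fixed_cqi=7)  # CQI given, estimate RI only
print(len(joint), len(cqi_only), len(ri_only))  # 64 16 4
```

Under these assumptions the joint problem has 64 candidates, while fixing either indicator reduces it to a one-dimensional search over the other.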