In this paper, we consider the distributive queue-aware power and subband allocation design for a delay-optimal OFDMA uplink system with one base station, K users and NF independent subbands. Each mobile has an uplink queue with heterogeneous packet arrivals and delay requirements. We model the problem as an infinite horizon average reward Markov decision problem (MDP) where the control actions are functions of the instantaneous channel state information (CSI) as well as the joint queue state information (QSI). To address the distributive requirement and the issue of exponential memory requirement and computational complexity, we approximate the subband allocation Q-factor by the sum of the per-user subband allocation Q-factor and derive a distributive online stochastic learning algorithm to estimate the per-user Q-factor and the Lagrange multipliers (LM) simultaneously and determine the control actions using an auction mechanism. We show that under the proposed auction mechanism, the distributive online learning converges almost surely (with probability 1). For illustration, we apply the proposed distributive stochastic learning framework to an application example with exponential packet size distribution. We show that the delay-optimal power control has the multilevel water-filling structure where the CSI determines the instantaneous power allocation and the QSI determines the water-level. The proposed algorithm has linear signaling overhead and computational complexity O(KNF), which is desirable from an implementation perspective.