This paper applies reinforcement learning (RL) to real-time adaptive control of the tip trajectory and tip deflection of a two-link flexible manipulator handling variable payloads. The proposed adaptive controller consists of a proportional-derivative (PD) tracking loop and an actor-critic-based RL loop that adapts the actor and critic weights in response to payload variations while suppressing tip deflection and tracking the desired trajectory. The actor-critic-based RL loop uses recursive least squares (RLS)-based temporal difference (TD) learning with an eligibility trace and an adaptive memory to estimate the critic weights, and a gradient-based estimator to estimate the actor weights. The tip trajectory tracking and tip deflection suppression performance of the proposed RL-based adaptive controller (RLAC) is compared with that of a nonlinear regression-based direct adaptive controller (DAC) and a fuzzy learning-based adaptive controller (FLAC). Simulation and experimental results show that the RLAC outperforms both the DAC and the FLAC.
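
To make the critic estimator concrete, the following is a minimal sketch of a generic RLS-based TD(λ) update with an eligibility trace and a forgetting factor, written in Python with NumPy. It is not the paper's implementation. The class name `RLSTDCritic`, the parameter names, the default values, and the linear-in-features value function are all assumptions made for illustration. The abstract does not state how the memory is adapted, so the forgetting factor `mu` is held constant here. The usage example is a toy two-state chain, not the manipulator model.

```python
import numpy as np


class RLSTDCritic:
    """Hypothetical linear critic V(x) = w^T phi(x), trained by RLS-TD(lambda)."""

    def __init__(self, n_features, gamma=0.9, lam=0.5, mu=0.99, delta=100.0):
        self.gamma = gamma  # discount factor
        self.lam = lam      # eligibility-trace decay
        self.mu = mu        # forgetting factor (assumed constant in this sketch)
        self.w = np.zeros(n_features)         # critic weights
        self.P = delta * np.eye(n_features)   # inverse-correlation estimate
        self.z = np.zeros(n_features)         # eligibility trace

    def update(self, phi, reward, phi_next):
        """Process one transition and return the TD error before the update."""
        # Accumulate the eligibility trace for the current features.
        self.z = self.gamma * self.lam * self.z + phi
        # Feature difference that appears in the TD fixed-point equation.
        d = phi - self.gamma * phi_next
        # RLS gain with exponential forgetting.
        Pz = self.P @ self.z
        gain = Pz / (self.mu + d @ Pz)
        # TD error: reward + gamma * V(next) - V(current).
        td_error = reward - d @ self.w
        self.w = self.w + gain * td_error
        # Rank-one update of P (Sherman-Morrison), rescaled by the forgetting factor.
        self.P = (self.P - np.outer(gain, d @ self.P)) / self.mu
        return td_error


# Toy usage: a deterministic two-state chain 0 -> 1 -> 0 -> ...
# with reward 1 on leaving state 0 and reward 0 on leaving state 1.
critic = RLSTDCritic(n_features=2)
features = np.eye(2)   # one-hot (tabular) features
rewards = [1.0, 0.0]
state = 0
for _ in range(500):
    next_state = 1 - state
    critic.update(features[state], rewards[state], features[next_state])
    state = next_state
```

For this chain with discount 0.9, the true state values are 1/0.19 ≈ 5.26 and 0.9/0.19 ≈ 4.74, so `critic.w` should approach those numbers. A forgetting factor below 1 discounts old transitions, which is what lets an estimator of this kind follow a change such as a new payload. An adaptive-memory scheme would vary that factor online instead of fixing it.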