We derive an optimization framework for joint view- and rate-scalable coding of multi-view video content represented in the texture-plus-depth format. The optimization enables the sender to select the subset of coded views and their encoding rates such that the aggregate distortion over a continuum of synthesized views is minimized. We construct the view- and rate-embedded bitstream such that it delivers optimal performance simultaneously over a discrete set of transmission rates. In conjunction, we develop a user interaction model that characterizes the view selection actions of the client as a Markov chain over a discrete state space. We exploit this model within our optimization to compute user-action-driven coding strategies that aim to improve the client's latency and video quality. Our optimization outperforms the state-of-the-art H.264 SVC codec, as well as a multi-view wavelet-based coder equipped with a uniform rate allocation strategy, across all scenarios studied in our experiments. Equally important, it achieves an arbitrarily fine granularity of encoding bit rates and provides the novel functionality of view-embedded encoding, neither of which is offered by the other encoding methods that we examined. Finally, we observe that interactivity-aware coding outperforms conventional allocation techniques that do not anticipate the client's view selection actions.
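To make the role of the user interaction model concrete, the following is a minimal sketch, not the paper's actual formulation. It assumes a hypothetical three-view Markov chain with an invented transition matrix `P`, estimates the long-run probability that the client watches each view, and uses those probabilities to weight hypothetical per-view distortions for two candidate rate allocations. The function names, the transition probabilities, and the distortion values are all illustrative assumptions.

```python
# Illustrative sketch only: the chain, the distortion values, and the function
# names are assumptions, not taken from the paper.

def stationary_distribution(P, iters=10_000, tol=1e-12):
    """Approximate the stationary distribution of a row-stochastic matrix P
    by power iteration. Assumes the chain is irreducible and aperiodic."""
    n = len(P)
    pi = [1.0 / n] * n  # start from the uniform distribution
    for _ in range(iters):
        # One step of the chain: nxt[j] = sum_i pi[i] * P[i][j]
        nxt = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
        if sum(abs(a - b) for a, b in zip(nxt, pi)) < tol:
            return nxt
        pi = nxt
    return pi


def expected_distortion(pi, distortions):
    """Distortion averaged over views, weighted by view-selection probability."""
    return sum(p * d for p, d in zip(pi, distortions))


# Hypothetical view-switching model: the client mostly stays on the current
# view and occasionally moves to an adjacent one.
P = [
    [0.8, 0.2, 0.0],
    [0.1, 0.8, 0.1],
    [0.0, 0.2, 0.8],
]
pi = stationary_distribution(P)

# Hypothetical per-view distortions under two rate allocations.
uniform_alloc = [3.0, 3.0, 3.0]   # same rate for every view
aware_alloc = [3.5, 2.0, 3.5]     # more rate for the most-watched view

d_uniform = expected_distortion(pi, uniform_alloc)
d_aware = expected_distortion(pi, aware_alloc)
print(pi, d_uniform, d_aware)
```

In this toy setting the middle view is watched most often, so an allocation that favors it yields a lower expected distortion than the uniform one, which is the intuition behind anticipating the client's view selection actions.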