We consider multicast communications from a single source to multiple destinations through a wireless network with unreliable links. Random linear network coding achieves the min-cut flow capacity; however, additional overhead is needed for end-to-end error protection and to communicate the network coding matrix to each destination. We present a joint coding and training scheme in which training bits are appended to each source packet, and the channel code is applied across both the training bits and the data. This scheme allows each destination to jointly decode the network coding matrix and the data without knowledge of the network topology. It also balances the reliability of communicating the network coding matrices against the reliability of data detection. The throughput of this scheme, accounting for overhead, is characterized as a function of the packet size, channel properties (error and erasure statistics), number of independent messages, and field size. We also compare its performance with that obtained by separate channel coding of the training bits and the data. Numerical results for a grid network illustrate the reduction in throughput due to overhead.
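The role of the training bits can be illustrated with a minimal sketch of the noiseless case. It is not the scheme analyzed here: the channel code, errors, and erasures are omitted, the field is fixed to GF(2), and the function names (`mix`, `solve_gf2`, `demo`) and the use of unit-vector training headers are assumptions made for illustration. Each source packet carries a training header followed by data; the network applies an unknown random linear mixing, and the destination reads the mixing matrix off the training columns and inverts it to recover the data.

```python
import random


def mix(packets, coeffs):
    """Form GF(2) linear combinations of the packets, one per row of coeffs."""
    out = []
    for row in coeffs:
        acc = [0] * len(packets[0])
        for c, p in zip(row, packets):
            if c:
                acc = [a ^ b for a, b in zip(acc, p)]
        out.append(acc)
    return out


def solve_gf2(received, m):
    """Gauss-Jordan elimination on the first m (training) columns.

    Returns the m recovered data parts, or None if the mixing matrix
    seen in the training columns is singular over GF(2).
    """
    rows = [r[:] for r in received]
    for col in range(m):
        pivot = next((i for i in range(col, len(rows)) if rows[i][col]), None)
        if pivot is None:
            return None
        rows[col], rows[pivot] = rows[pivot], rows[col]
        for i in range(len(rows)):
            if i != col and rows[i][col]:
                rows[i] = [a ^ b for a, b in zip(rows[i], rows[col])]
    return [r[m:] for r in rows[:m]]


def demo(m=3, k=8, seed=1):
    """Send m packets of k data bits through a random invertible mixing."""
    rng = random.Random(seed)
    data = [[rng.randint(0, 1) for _ in range(k)] for _ in range(m)]
    # Training header for packet i is the i-th unit vector (an assumption).
    source = [[int(i == j) for j in range(m)] + data[i] for i in range(m)]
    while True:
        coeffs = [[rng.randint(0, 1) for _ in range(m)] for _ in range(m)]
        decoded = solve_gf2(mix(source, coeffs), m)
        if decoded is not None:  # retry if the random matrix was singular
            return data, decoded


if __name__ == "__main__":
    sent, recovered = demo()
    print(sent == recovered)
```

In this toy setting the training header occupies m of the m + k bits in every packet, which is the kind of overhead the throughput characterization accounts for; the destination never needs the topology, only the received training columns.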