This paper investigates cooperative estimation of averaged three-dimensional poses of moving target objects in visual sensor networks. In particular, we consider the situation where multiple vision cameras observe a common target object, but the poses consistent with the visual measurements differ from camera to camera due to a variety of uncertainties. In this situation, we aim to estimate an average of the contaminated poses, for both static and moving target objects, using only local negotiations. For this purpose, we present a cooperative estimation mechanism called the networked visual motion observer. We then derive an upper bound on the ultimate error between the actual average and the estimates produced by the proposed mechanism for both static and moving target objects. Finally, the effectiveness of the networked visual motion observer is demonstrated through simulation.