Object-based attention theory holds that perception selects only one object of interest from the scene at a time, which is then represented for action. This paper therefore presents an autonomous visual perception model for robots that simulates the object-based, bottom-up attention mechanism. With this model, a robot's visual perception begins with attentional selection over the scene, followed by high-level analysis of only the attended object. The proposed model comprises three components: pre-attentive segmentation, bottom-up attentional selection, and post-attentive recognition and learning of the attended object. The model first pre-attentively segments the visual field into discrete proto-objects. Automatic bottom-up competition is then performed to yield a location-based saliency map. Proto-object-based salience is evaluated by combining the location-based salience within each proto-object, and the most salient proto-object is selected for recognition and learning. The model has been applied to the robotic task of automatic object detection, and experimental results on natural and cluttered scenes validate it.
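As a minimal sketch of the proto-object salience step described above, the following code combines location-based salience within each pre-segmented proto-object and selects the most salient one. The abstract does not specify the combination rule, so the mean salience over each region is an assumption here, and the function name and toy arrays are hypothetical illustrations.

```python
import numpy as np

def most_salient_proto_object(saliency_map, label_map):
    """Combine location-based salience within each proto-object
    (mean over the region, one plausible choice) and return the
    label and score of the most salient proto-object.

    saliency_map: 2-D float array, per-pixel bottom-up salience
    label_map:    2-D int array, 0 = background, >0 = proto-object id
    """
    labels = np.unique(label_map)
    labels = labels[labels > 0]          # ignore background
    best_label, best_score = None, -np.inf
    for lab in labels:
        score = saliency_map[label_map == lab].mean()
        if score > best_score:
            best_label, best_score = lab, score
    return best_label, best_score

# Toy example: three proto-objects on a 4x4 visual field
saliency = np.array([[0.1, 0.2, 0.8, 0.9],
                     [0.1, 0.2, 0.9, 0.8],
                     [0.0, 0.0, 0.0, 0.0],
                     [0.3, 0.3, 0.0, 0.0]])
labels = np.array([[1, 1, 2, 2],
                   [1, 1, 2, 2],
                   [0, 0, 0, 0],
                   [3, 3, 0, 0]])
lab, score = most_salient_proto_object(saliency, labels)
# proto-object 2 (mean salience 0.85) would be attended first
```

The selected proto-object would then be passed to the post-attentive recognition and learning stage, while the remaining proto-objects wait for later attention shifts.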