This paper addresses the problem of simultaneously estimating the vehicle ego-motion and the motions of multiple moving objects in the scene—called eoru-motions—using a monocular vehicle-mounted camera. Localizing multiple moving objects and estimating their motions are crucial for autonomous vehicles. Conventional localization and mapping techniques (e.g., Visual Odometry and SLAM) can only estimate the ego-motion of the vehicle, and the capability of robot localization pipelines to deal with multiple motions has not been widely investigated in the literature. We present a theoretical framework for robust estimation of multiple relative motions in addition to the camera ego-motion. The framework is first introduced for general unconstrained motion; it is then adapted to exploit the vehicle's kinematic constraints for greater efficiency. The method is based on projective factorization of the multiple-trajectory matrix: the ego-motion is segmented first, and several hypotheses are then generated for the eoru-motions. All hypotheses are evaluated, and the one with the smallest reprojection error is selected. The proposed framework requires no a priori knowledge of the number of motions and is robust to noisy image measurements. The method with the constrained motion model is evaluated on a popular street-level image dataset collected in urban environments (the KITTI dataset), covering several relative ego-motion and eoru-motion scenarios; a benchmark dataset (Hopkins 155) is used to evaluate the method with the general motion model. The results are compared with those of state-of-the-art methods addressing a similar problem, referred to as Multi-Body Structure from Motion in the computer vision community.
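The hypothesize-and-select step described above can be illustrated with a minimal sketch. It uses an affine camera approximation (a simplification of the paper's projective factorization), under which the feature trajectories of a single rigid motion span a subspace of rank at most four; each hypothesis is a candidate segmentation of the trajectory matrix columns into motion clusters, and the hypothesis whose per-cluster low-rank fits yield the smallest total reprojection error is selected. The function names and the synthetic data are illustrative, not from the paper.

```python
import numpy as np

def fit_error(W, rank=4):
    # Rank-constrained fit of a trajectory block W (2F x P) via truncated SVD.
    # Under an affine camera, one rigid motion gives a rank<=4 block, so the
    # residual of the rank-4 fit serves as the reprojection error.
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    r = min(rank, len(s))
    W_fit = (U[:, :r] * s[:r]) @ Vt[:r]
    return float(np.linalg.norm(W - W_fit))

def evaluate_hypotheses(W, hypotheses, rank=4):
    # Each hypothesis is a list of column-index groups (a candidate
    # segmentation into ego-motion and eoru-motion clusters).
    # Score each by the sum of per-group fit residuals; keep the smallest.
    scores = [sum(fit_error(W[:, idx], rank) for idx in groups)
              for groups in hypotheses]
    best = int(np.argmin(scores))
    return best, scores
```

With synthetic trajectories drawn from two distinct rank-4 subspaces, the correct segmentation fits each cluster almost exactly, while a wrong segmentation mixes subspaces and incurs a large residual, so the correct hypothesis wins.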