This thesis addresses the challenging problem of obtaining 3D models of real environments from stereo images and translational video sequences. The problem is partitioned into two main parts: matching and disparity estimation, which yields depth maps and separate 3D models for different image locations, and the combination of these separate 3D models into a single 3D model of the whole environment. A solution is proposed for each part, and the results of implementing these solutions are presented. The novelty of the solution to the first part, the matching problem, is that it combines the pixel-based and region-based approaches. A hybrid algorithm is developed for d...