When we perceive a visual scene, we usually see an arrangement of multiple cluttered and partly overlapping objects, such as a park with trees and people in it. Spatial attention helps us prioritize relevant portions of such scenes so that we can interact efficiently with our environments. In previous experiments on object recognition, objects were often presented in isolation, and these studies found that object location is encoded early in time (before ∼150 ms) and in early visual cortex or the dorsal stream. In real life, however, objects rarely appear in isolation but are instead embedded in cluttered scenes, and encoding the location of an object in clutter might require fundamentally different neural computations. This dissertation therefore addressed the question of how location representations of objects on cluttered backgrounds are encoded in the human brain. To answer this question, we investigated where in cortical space and when in neural processing time location representations emerge when objects are presented on cluttered backgrounds, and what role spatial attention plays in the encoding of object location. We addressed these questions in two studies, each combining fMRI and EEG experiments. The first study showed that location representations of objects on cluttered backgrounds emerge along the ventral visual stream, peaking in region LOC, with a temporal delay that was linked to recurrent processing. The second study showed that spatial attention modulated these location representations in mid- and high-level regions along the ventral stream and late in time (after ∼150 ms), regardless of whether the backgrounds were cluttered. Together, these findings show that when objects are presented on cluttered backgrounds, location representations emerge at late stages of processing, both in cortical space and in neural processing time, and that they are enhanced by spatial attention. Our results provide a new perspective on visual information processing in the ventral visual stream and on the temporal dynamics of location processing. Finally, we discuss how shared neural substrates of location and category representations in the brain might improve object recognition in real-world vision.