Neural oscillations are an intrinsic property of functional brain organization that supports the tracking of linguistic units at multiple time scales through brain-to-stimulus alignment. This ubiquitous neural principle has been shown to facilitate speech segmentation and word learning based on statistical regularities. However, there is as yet no consensus on whether speech segmentation is mediated by a transition of neural synchronization from the syllable to the word rate, or whether the two time scales are tracked concurrently. Furthermore, it is currently unknown whether syllable transition probability contributes to speech segmentation when lexical stress cues can be used directly to extract word forms. Using inter-trial coherence (ITC) analyses in combination with event-related potentials (ERPs), we showed that speech segmentation based on both statistical regularities and lexical stress cues was accompanied by concurrent neural synchronization to syllables and words. In particular, ITC at the word rate was generally higher in structured than in random sequences, and this effect was especially pronounced in the flat condition. Furthermore, ITC at the syllable rate increased dynamically across the blocks of the flat condition, whereas no comparable modulation was observed in the stressed condition. Notably, in the flat condition ITC at the two time scales was correlated, and changes in neural synchronization were accompanied by a rapid reconfiguration of the P200 and N400 components, with a close relationship between ITC and ERPs. These results highlight distinct computational principles governing neural synchronization to pertinent linguistic units while segmenting speech under different listening conditions.
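The ITC measure referred to above quantifies the consistency of instantaneous phase across trials at each time point: ITC = |mean over trials of exp(i·phase)|, ranging from 0 (random phases) to 1 (perfect phase locking). A minimal illustration with synthetic data is sketched below; the function name, the synthetic 4 Hz "syllable-rate" signal, and all parameters are hypothetical and do not reproduce the study's actual analysis pipeline.

```python
import numpy as np
from scipy.signal import hilbert

def inter_trial_coherence(trials):
    """Compute ITC(t) for a (n_trials, n_times) array of real-valued signals.

    Phases are extracted via the Hilbert transform; the result lies in
    [0, 1], where 1 means perfect phase alignment across trials.
    """
    phases = np.angle(hilbert(trials, axis=1))          # instantaneous phase per trial
    return np.abs(np.mean(np.exp(1j * phases), axis=0)) # length of mean phase vector

rng = np.random.default_rng(0)
t = np.arange(0, 1, 1 / 250)                            # 1 s at 250 Hz sampling

# 30 trials of a phase-locked 4 Hz oscillation plus noise, vs. pure noise
locked = np.sin(2 * np.pi * 4 * t) + 0.5 * rng.standard_normal((30, t.size))
noise = rng.standard_normal((30, t.size))

print(inter_trial_coherence(locked).mean())  # higher: phases repeat across trials
print(inter_trial_coherence(noise).mean())   # lower: phases are random across trials
```

Because the phase-locked signal repeats with the same phase on every trial, its mean ITC is high, whereas the noise-only trials yield values near the chance level expected for 30 independent phases.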