Several real-world systems, such as gene expression networks in biological cells, contain coupled chemical reactions with a time delay between reaction initiation and completion. The non-Markovian kinetics of such reaction networks can be exactly simulated using the delay stochastic simulation algorithm (dSSA). The computational cost of dSSA scales with the total number of reactions in the network. We reduce this cost to scale at most with the smaller number of species by using the concept of partial reaction propensities. The resulting delay partial-propensity direct method (dPDM) is an exact dSSA formulation for well-stirred systems of coupled chemical reactions with delays. We detail dPDM and present a theoretical analysis of its computational cost. Furthermore, we demonstrate the implications of the theoretical cost analysis in two prototypical benchmark applications. The dPDM formulation is shown to be particularly efficient for strongly coupled reaction networks, where the number of reactions is much larger than the number of species.
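The two ingredients named above, partial propensities and delayed reaction completion, can be illustrated with a minimal Python sketch. It is not the dPDM implementation; it rests on several assumptions. Reactions are elementary and of order at most two. Species index 0 is a hypothetical "empty" reservoir with a fixed count of 1. Delayed reactions are assumed to consume their reactants at initiation and to release their products at completion. A pending completion that falls before the tentative next-reaction time is applied first, and that waiting time is discarded, which is valid because of the memorylessness of the exponential distribution. Every propensity is written as a_i = n_j π_i for one reactant species j, grouped by that species, and sampled in two steps: first a species group, then a reaction within it. The names `partial_propensity`, `simulate`, and the reaction tuple format are hypothetical. For readability, the sketch rebuilds all groups at every step, so it does not achieve the cost reduction described above. A cost-efficient formulation would instead update only the partial propensities that depend on species whose counts changed.

```python
import heapq
import math
import random

# Hypothetical reaction format: (reactants, products, rate c, delay tau).
# reactants/products are tuples of species indices; index 0 is the "empty"
# reservoir species whose count is fixed at 1.

def partial_propensity(reaction, n):
    """Return (group species j, partial propensity pi) such that a = n[j] * pi."""
    reactants, _, c, _ = reaction
    if len(reactants) == 0:
        return 0, c                      # source reaction: a = c = n[0] * c
    if len(reactants) == 1:
        return reactants[0], c           # unimolecular: a = c * n_j
    j, k = sorted(reactants)
    if j == k:
        return j, 0.5 * c * (n[j] - 1)   # homodimer: a = c * n_j * (n_j - 1) / 2
    return j, c * n[k]                   # heterodimer: a = c * n_j * n_k, stored in group j

def simulate(n, reactions, t_end, seed=0):
    rng = random.Random(seed)
    n = list(n)
    n[0] = 1
    t = 0.0
    pending = []  # min-heap of (completion time, reaction index)
    while True:
        # Group reactions by species; drop zero entries to keep sampling safe.
        groups = {}
        for idx, rxn in enumerate(reactions):
            j, pi = partial_propensity(rxn, n)
            if n[j] > 0 and pi > 0:
                groups.setdefault(j, []).append((idx, pi))
        totals = [(j, n[j] * sum(pi for _, pi in g)) for j, g in groups.items()]
        a = sum(total for _, total in totals)
        dt = rng.expovariate(a) if a > 0 else math.inf

        if pending and pending[0][0] <= t + dt:
            # A delayed reaction completes first: release its products, discard dt.
            t_done, idx = heapq.heappop(pending)
            if t_done > t_end:
                break
            t = t_done
            for s in reactions[idx][1]:
                n[s] += 1
            continue
        if t + dt > t_end:
            break
        t += dt

        # Step 1: choose species group j with probability n_j * sum(pi) / a.
        target = rng.random() * a
        for j, total in totals:
            if target < total:
                break
            target -= total
        # Step 2: choose reaction i in group j with probability pi_i / sum(pi).
        target /= n[j]
        for idx, pi in groups[j]:
            if target < pi:
                break
            target -= pi

        reactants, products, _, delay = reactions[idx]
        for s in reactants:
            n[s] -= 1                    # consume reactants at initiation
        if delay > 0:
            heapq.heappush(pending, (t + delay, idx))  # schedule completion
        else:
            for s in products:
                n[s] += 1
    return n
```

For example, `simulate([1, 10, 0], [((1,), (2,), 1.0, 1.0)], t_end=100.0)` converts each of the ten molecules of species 1 into species 2 one time unit after its conversion starts. The call returns `[1, 0, 10]` once all scheduled completions have been applied.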