One of the best-known demonstrations of long-term learning through repetition is the Hebb effect: Immediate recall of a memory list repeated amidst nonrepeated lists improves steadily with repetitions. However, previous studies often failed to observe this effect for visuospatial arrays. Souza and Oberauer (2022) showed that the strongest determinant of learning was the difficulty of the test: Learning was consistently observed when participants recalled all items of a visuospatial array (difficult test) but not when only one item was recalled or when recognition procedures were used (less difficult tests). This suggests that long-term learning was promoted by increased testing demands over the short term. Alternatively, it is possible that lower testing demands still led to learning but prevented the application of what was learned. In four preregistered experiments (N = 981), we ruled out this alternative explanation: Changing the type of memory test midway through the experiment from a less demanding test (i.e., single-item recall or recognition) to a more demanding test (i.e., full-item recall) did not reveal hidden learning, and changing it from the more demanding to a less demanding test did not conceal learning. Mixing high- and low-demand tests for nonrepeated arrays, however, eventually produced Hebb learning even in the less demanding testing conditions. We propose that testing affects long-term learning in two ways: Expectations of the test difficulty influence how information is encoded into memory, and retrieval consolidates this information in memory.