Recent breakthroughs suggest that local, approximate gradient descent learning is compatible with Spiking Neural Networks (SNNs). Although SNNs can be scalably implemented using neuromorphic VLSI, an architecture, that can learn in situ as accurately as conventional processors, is still missing. Here, we propose a subthreshold circuit architecture designed through insights obtained from machine learning and computational neuroscience that could achieve such accuracy. Using a surrogate gradient learning framework, we derive local, error-triggered learning dynamics compatible with crossbar arrays and the temporal dynamics of SNNs. The derivation reveals that circuits used for inference and training dynamics can be shared, which simplifies the circuit and suppresses the effects of fabrication mismatch. We present SPICE simulations on XFAB 180nm process, as well as large-scale simulations of the spiking neural networks on event-based benchmarks, including a gesture recognition task. Our results show that the number of updates can be reduced hundred-fold compared to the standard rule while achieving performances that are on par with the state-of-the-art.