Abstract
HIV RNA levels are influenced by genetic characteristics of both the host and the virus. Here we applied machine learning techniques to determine whether plasma-derived HIV-1 amino acid sequences can be used to predict spontaneous virologic control. We studied the relationship between HIV-1 env genotype and viral load (VL) in 20 chronically infected patients undergoing treatment interruptions in the Swiss-Spanish Intermittent Treatment Trial (SSITT) and in 104 patients with primary HIV infection (PHI), sampled before combination antiretroviral therapy (cART) and, where applicable, also after treatment stop. Extensive longitudinal sampling during the interruptions was performed in nine SSITT patients. Sequences obtained from these nine patients during the first virus rebound were used as a training data set and revealed a strong genetic signature (accuracy 98.6% in cross-validation) associated with control of viremia at levels below 5000 copies/mL of viral RNA, maintained for at least 2 months after the final cART stop. The simple sequence pattern 268E/358T in gp120 was confirmed to be predictive of control in clonal sequences obtained from these patients during all subsequent rebounds. Sequences from the remaining 11 SSITT patients, who were sampled less frequently, and from the PHI patients were used for external validation. The patient-wise analysis, which was based on presence of the genetic pattern in all clones, achieved high sensitivities (71-100%) and negative predictive values (80-100%) but low positive predictive values (12-40%). These results suggest that the presence of virus lacking the amino acid pattern 268E/358T is associated with VL >5000 copies/mL at PHI baseline and with a low probability of spontaneous virologic control after treatment stop. Conversely, the presence of 268E/358T does not predict control of viremia. These residues in HIV gp120 might affect in vivo HIV-1 fitness either at the level of Env function or by influencing susceptibility to adaptive or innate immune responses.
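The patient-wise rule and the reported performance measures can be illustrated with a minimal sketch. It is not the analysis code used in the study. The representation of a clone as a mapping from gp120 position to residue, the function names, and the toy data are all hypothetical, and the sketch assumes that positions 268 and 358 have already been located in each clone by alignment.

```python
from typing import Dict, List

# Hypothetical representation: one clone = {gp120 position: amino acid}.
Clone = Dict[int, str]

# Sequence pattern named in the text: 268E together with 358T.
PATTERN = {268: "E", 358: "T"}


def clone_has_pattern(clone: Clone) -> bool:
    """True if the clone carries E at position 268 and T at position 358."""
    return all(clone.get(pos) == aa for pos, aa in PATTERN.items())


def patient_predicted_controller(clones: List[Clone]) -> bool:
    """Patient-wise rule: the pattern must be present in all clones."""
    return len(clones) > 0 and all(clone_has_pattern(c) for c in clones)


def diagnostic_metrics(predicted: List[bool], observed: List[bool]) -> Dict[str, float]:
    """Sensitivity, PPV and NPV from paired predictions and observed outcomes."""
    pairs = list(zip(predicted, observed))
    tp = sum(1 for p, o in pairs if p and o)
    fp = sum(1 for p, o in pairs if p and not o)
    fn = sum(1 for p, o in pairs if not p and o)
    tn = sum(1 for p, o in pairs if not p and not o)

    def ratio(num: int, den: int) -> float:
        # An undefined ratio (empty denominator) is reported as NaN.
        return num / den if den else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
    }


if __name__ == "__main__":
    # Made-up toy patients: (clones, observed control of viremia).
    toy = [
        ([{268: "E", 358: "T"}, {268: "E", 358: "T"}], True),
        ([{268: "E", 358: "T"}], False),
        ([{268: "E", 358: "T"}, {268: "K", 358: "T"}], False),
        ([{268: "E", 358: "A"}], False),
    ]
    predicted = [patient_predicted_controller(clones) for clones, _ in toy]
    observed = [outcome for _, outcome in toy]
    print(diagnostic_metrics(predicted, observed))
```

Requiring the pattern in every clone makes the rule strict. A single clone lacking 268E/358T classifies the patient as a predicted non-controller, which is in line with the high negative predictive values and low positive predictive values reported above.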