Abstract
Black-box adversarial attacks generate adversarial samples that mislead a model by exploring and perturbing its inputs, without requiring any knowledge of the model's internal structure or parameters. Attackers typically observe only the model's outputs to infer and optimise the inputs, identifying data that cause the model to misclassify or make incorrect predictions. Because this setting closely reflects real-world conditions, such attacks are highly threatening. While researchers have made significant progress in applying black-box adversarial attack methods to machine learning, their performance in federated learning has not been thoroughly validated. In this work, widely used black-box adversarial attack methods from machine learning were selected and studied in depth to gain a comprehensive understanding of their attack principles and implementations. These attack methods were then integrated into Fedstellar, the federated learning platform used in this work. By varying the federated learning parameters, the attack methods were evaluated in different environments and the robustness of the federated learning platform was assessed. The experimental results showed significant performance differences between the attack methods. Moreover, their performance depended strongly on the number of federated learning nodes, the datasets, and the federation method, whereas the network topology had minimal impact on attack performance. Based on the analysis of these results, a new potential timing for launching black-box adversarial attacks is proposed, which could be explored in future work to achieve a greater impact on federated learning.