Abstract
Software testing is a crucial component of modern continuous integration development environments.
Ideally, at every commit, all of the system's test cases should be executed and, moreover, new test cases should be generated for the new code.
This is especially true in a Continuous Test Generation (CTG) environment, where the automatic generation of test cases is integrated into the continuous integration pipeline.
Furthermore, developers want to achieve a minimum level of coverage for every build of their systems.
Since neither executing all the test cases nor generating new ones for all the classes is feasible at every commit, developers have to select which subset of classes to test. In this context, knowing a priori the branch coverage that test data generation tools can achieve on a given class might provide useful guidance for this selection.
In this paper, we take the first steps towards the definition of machine learning models to predict the branch coverage achieved by test data generation tools.
We conduct a preliminary study considering well-known code metrics as features.
Despite the simplicity of these features, our results show that using machine learning to predict the branch coverage achieved in automated testing is a viable and promising option.