What is the Vocabulary of Flaky Tests?
MSR - Technical Paper
Flaky tests are tests whose outcomes are non-deterministic. Despite the recent research activity on this topic, no effort has been made to understand the vocabulary of flaky tests (e.g., networking or concurrency identifiers). This work proposes to automatically classify tests as flaky or not. Classification of flaky tests is important, for example, to detect the introduction of flaky tests and to search for flaky tests after they are introduced into test suites. We evaluated the performance of various machine learning algorithms on this problem. We constructed a dataset of flaky and non-flaky tests by running more than 50k test cases, 100 times each. We then used machine learning techniques on the resulting dataset to predict which tests are flaky from their source code. Based on features such as counts of stemmed tokens extracted from source code identifiers, we achieved an F-measure of 0.95 for the identification of flaky tests. The best performance was achieved when using Random Forest and Support Vector Machines for the prediction. In terms of the code identifiers that are most strongly associated with test flakiness, we noted that job, action, and services are commonly associated with flaky tests.
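To make the feature description concrete, the following is a minimal sketch of how counts of stemmed identifier tokens might be derived from a test's source, together with a balanced F-measure. It is not the implementation used in this work: the helper names (split_identifier, crude_stem, token_counts, f_measure), the small keyword list, the suffix-stripping rule that stands in for a real stemmer, and the choice of the F1 form of the F-measure are all assumptions made for illustration.

```python
import re
from collections import Counter

# Illustrative subset of Java keywords to ignore (an assumption, not the paper's list).
JAVA_KEYWORDS = {"public", "private", "void", "new", "return",
                 "throws", "int", "class", "static", "final"}


def split_identifier(identifier):
    """Split a camelCase or snake_case identifier into lowercase words."""
    parts = re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+", identifier)
    return [part.lower() for part in parts]


def crude_stem(word):
    """Strip a few common suffixes; a stand-in for a real stemmer (assumption)."""
    for suffix in ("ing", "ed", "s"):
        if (word.endswith(suffix) and not word.endswith("ss")
                and len(word) - len(suffix) >= 3):
            return word[:-len(suffix)]
    return word


def token_counts(test_source):
    """Count stemmed tokens over all identifiers found in the test source."""
    counts = Counter()
    for identifier in re.findall(r"[A-Za-z_][A-Za-z0-9_]*", test_source):
        if identifier in JAVA_KEYWORDS:
            continue
        for word in split_identifier(identifier):
            counts[crude_stem(word)] += 1
    return counts


def f_measure(true_pos, false_pos, false_neg):
    """Balanced F-measure (F1): harmonic mean of precision and recall."""
    precision = true_pos / (true_pos + false_pos) if true_pos + false_pos else 0.0
    recall = true_pos / (true_pos + false_neg) if true_pos + false_neg else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical test method, invented for this example.
example = "public void testJobServices() { jobService.runAction(); }"
features = token_counts(example)
```

Each resulting count vector would then serve as one row of input to a classifier such as Random Forest. The sketch stops at feature extraction so that it stays self-contained and does not presume any particular learning library.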