Abstract

GUI tests are a common part of today's software projects. Like all tests, they help to maintain software quality and thus to keep the pace of development as steady as possible. The software under test, however, is being actively developed and is therefore subject to constant change; in particular, the GUI and its components change. GUI tests must be able to cope with these changes to the components, otherwise they fail, they break. A frequent reason is that changed components are no longer found and recognized by the GUI test framework in use. In this seminar paper, individual classes of errors in GUI tests are identified on the basis of a case study. The goal is to avoid these errors and thus to keep the maintenance effort for the GUI tests as small as possible. A set of test cases is created for version 2.00 of KeePass. Subsequent development is simulated by applying the tests to the later versions, starting with v2.01. This allows us to determine at which point the tests fail. The failures are then grouped into classes. In summary, by considering persistence in the test environment, trivial renamings, and the properties of the GUI test framework, the vast majority of these errors can be avoided.

Objective

Similar to the dimensions along which different types of tests can be derived, dimensions can also be defined along which the SUT changes. Examples are the version, the language, or the platform. In this seminar paper, only changes introduced by development, i.e. the version dimension, are considered. The goal of this study is to find errors in GUI tests and ways to avoid them. The quality of the GUI tests is thus increased by an additional investment of time during their initial creation, so that they break less frequently later and frequent adaptations are avoided. If the additional time spent at the beginning is less than the sum of the time that would otherwise be spent later on adaptation, the maintenance effort is reduced. This is desirable in all cases.

Interpretation

Persistence in the test environment

The first group of errors is caused by persistence in the test environment. This means that the SUT writes data persistently and reuses it across test executions.

In KeePass, this occurred in the Settings dialog, for example. As of version 2.07, KeePass saves the open tab when the dialog is closed. The next time the dialog is opened, the saved tab is preselected instead of the first tab, as was the case in versions 2.00 to 2.06. The tests were written under the assumption that the first tab is always selected; they therefore failed from v2.07 onwards.

In total, two of the 17 failed tests failed for this reason, possibly among others. This error is very easy to avoid by deleting the persistent data before running the tests. For a self-developed SUT, a function is also conceivable with which the SUT performs a so-called clean start when given a corresponding argument, i.e. it starts without considering any persistent data.
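As a minimal sketch of the first option, a test setup could simply delete the SUT's persisted configuration before each run. The concrete path used here (KeePass.config.xml under %APPDATA%) is an assumption for illustration; where KeePass actually stores its configuration depends on the installation.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class CleanTestEnvironment {

    // Assumed location of the SUT's persistent configuration;
    // adjust to where the installation under test actually stores it.
    private static final Path CONFIG =
            Path.of(System.getenv("APPDATA"), "KeePass", "KeePass.config.xml");

    /** Deletes persisted state so every test run starts from the defaults. */
    public static void resetPersistentState() throws IOException {
        Files.deleteIfExists(CONFIG);
    }
}
```

Called once before every test run, this guarantees that no state saved by an earlier run, such as the remembered Settings tab, can influence the result.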

Trivial renaming

The largest group of errors identified in this case study are trivial renamings. These are renamings that a normal user may not even notice, but that are uncovered by automated GUI tests. Examples from KeePass are the renamings of Find to Find..., of Side-By-Side to Side by Side, or of Lock Windows to Lock Window or Logout.

With ten of the 17 failed tests failing for this reason, possibly among others, this group is the largest of the groups listed here. Avoiding these errors is therefore of great interest.

The possibilities for avoiding these errors are broader than for the first group. For example, matching component names without case sensitivity avoids some of these errors. If a component is already sufficiently characterized by a substring, only this substring needs to be matched. Where a certain error tolerance is acceptable, this tolerance can be realized by a Levenshtein distance; components can then be renamed within the tolerance without breaking the GUI tests. If the GUI test framework allows the use of regular expressions, these can be used as well, with the associated degree of flexibility.
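A minimal sketch of such a tolerant matcher, independent of any particular GUI test framework, could look as follows; the method names and the tolerance value are illustrative assumptions:

```java
public class TolerantMatcher {

    /**
     * A recorded name still matches if it is a case-insensitive
     * substring of the actual name, or within a small Levenshtein
     * distance of it.
     */
    public static boolean matches(String recorded, String actual, int tolerance) {
        String r = recorded.toLowerCase();
        String a = actual.toLowerCase();
        return a.contains(r) || levenshtein(r, a) <= tolerance;
    }

    // Classic dynamic-programming Levenshtein distance, two rows at a time.
    static int levenshtein(String s, String t) {
        int[] prev = new int[t.length() + 1];
        int[] curr = new int[t.length() + 1];
        for (int j = 0; j <= t.length(); j++) prev[j] = j;
        for (int i = 1; i <= s.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= t.length(); j++) {
                int cost = s.charAt(i - 1) == t.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1),
                                   prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[t.length()];
    }
}
```

With such a matcher, the KeePass renamings above no longer break the tests: Find is a substring of Find..., and Side-By-Side is within Levenshtein distance 2 of Side by Side.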

Knowing the properties of the GUI test framework

The last group of errors can be avoided by knowing the properties of the GUI test framework, in this case study QF-Test and its component recognition. QF-Test uses a probability-based approach: based on certain properties, a probability is calculated for each component that it is the component being searched for, and the component with the highest result is finally selected. The unique name of a component, which can be set for example by means of setName(), is weighted particularly strongly.
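The following sketch illustrates the general idea of such weighted recognition; the properties, weights, and scoring formula are invented for illustration and are not QF-Test's actual algorithm:

```java
import java.util.List;
import java.util.Map;

public class ComponentRecognizer {

    // Invented weights; the strongly weighted "name" mirrors the text above.
    private static final Map<String, Double> WEIGHTS =
            Map.of("name", 0.6, "class", 0.2, "label", 0.1, "geometry", 0.1);

    /** Returns the candidate whose properties best match the recorded ones. */
    public static Map<String, String> bestMatch(
            Map<String, String> recorded,
            List<Map<String, String>> candidates) {
        Map<String, String> best = null;
        double bestScore = -1;
        for (Map<String, String> candidate : candidates) {
            double score = 0;
            for (Map.Entry<String, Double> e : WEIGHTS.entrySet()) {
                String value = recorded.get(e.getKey());
                if (value != null && value.equals(candidate.get(e.getKey()))) {
                    score += e.getValue();
                }
            }
            if (score > bestScore) {
                bestScore = score;
                best = candidate;
            }
        }
        return best;
    }
}
```

Under such a weighting, a changed name costs more score than all other properties combined, so recognition can fail even though the component is visually identical.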

As can be seen in figure 2.4, the names of components in KeePass were changed during development. Other properties of these components did not change, however, so the change is not visually recognizable. Tests with QF-Test and the weighting described above nevertheless failed. In total, six of the 17 failed tests failed due to changed names, possibly among other reasons. These errors can also be easily avoided by setting the names of the components consistently from the beginning.
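For a Swing-based SUT, for example, this consistent naming amounts to a single call per component via the setName() method mentioned above; the label and identifier chosen here are, of course, illustrative:

```java
import javax.swing.JButton;

public class NamedComponents {

    public static JButton createLockButton() {
        JButton lockButton = new JButton("Lock Window");
        // Stable technical name, decoupled from the visible label: the
        // label may change in later versions without breaking recognition.
        lockButton.setName("lockWindowButton");
        return lockButton;
    }
}
```

Because the technical name is independent of the visible text, the label can be reworded freely while the strongly weighted name property stays constant.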

Conclusion

The overall result of the case study is clear: 14 of the 17 failed tests failed due to at least one of the three error classes; only the remaining three failed tests fall outside them. Of course, these results cannot be transferred to arbitrary software projects without restrictions. The error classes shown here are very specific and apply only under the conditions of the case study. To what extent the results can be transferred to other software projects is a possible subject of future work. Possible changes to the conditions include, for example, the GUI test framework used, the SUT, the versions, the tests and test cases, the operating system used, and many others.

However, it has clearly been shown that a few errors in GUI tests can break a large percentage of them. The robustness of automated GUI tests can thus be increased by avoiding errors within the identified error classes. As a direct consequence, the maintenance effort for the GUI tests decreases, which is desirable in any case.

The full seminar paper can be found here (PDF).

Seminar paper: Why do my automated GUI tests break? July 2020 - Leonard Husmann, Faculty of Computer Science, Technische Universität München, Germany.

(Original German texts and citations are translated into English.)