This article gives an overview of the approach implemented in an integration testing project to minimize the time required to validate changes within the test suites and their supporting libraries.
Introduction
If you are an automation engineer specializing in large integration or system-level tests, most of the time you work with your own repository, separate from the product code. Sometimes the code of the product under test and the integration test libraries live in the same repository, but the problem remains the same: how do you validate changes within the tests or in a library (framework) closely tied to them? Typically, the automation team simply runs the same test suites that developers use, to ensure that changes in tests won't impact the continuous integration (CI) processes for developers. However, what if these test suites require a substantial amount of infrastructure resources and take a significant amount of time to complete? Is there a way to run only those tests that were affected by the corresponding changes in the tests repository? Indeed, there is. Let's discuss how to achieve this.
In this article, we are focusing on Python-written frameworks and libraries, but the approach is general enough to be applied in other programming languages as well.
So, is it really possible to determine which tests to run in a dynamic language like Python, and to do so reliably? There is a pytest plugin called pytest-testmon that achieves this by collecting code coverage while the tests run. This approach works perfectly for developers and their code. However, when it comes to integration-level test frameworks, collecting such coverage would require running the entire test suite after each change in the test framework just to understand what is called, which would negate its benefits. So, we need a different solution.
The Reliability Question
Regarding reliability, is it truly crucial to determine with 100% accuracy which tests to run, especially when considering changes to the test framework? In the worst-case scenario, if we can't really tell which tests were affected, we can always run the entire test suite and stay safe. However, there is another worst-case scenario: accidentally skipping some genuinely affected tests. Obviously, the probability of that should be minimized as much as possible. Still, this isn't too detrimental when we consider the best-case scenario. For instance, if we modify a method used in just one test, we can simply run that specific test and save a significant amount of time.
IDE to the Rescue
Now, if we don't require a 100% reliable solution, how can we address our problem? Let's take a look at IDEs. If you have VSCode or PyCharm, when you click on a function, the IDE tries to determine where this function is used and provides a list of usages or references. In our approach, we'll do something similar to what IDEs do. Fortunately, there are libraries that can assist us in this regard. In the Python world, we can use Jedi.
Handling Git Changes
But first, we need to parse the Git changes somehow. Another great library for this purpose is GitPython, which helps us obtain a list of diffs for further processing:
from git import Repo

repo = Repo(repo_path)
# Diff the merge base of the current branch and 'main' against the current commit.
diffs = repo.merge_base(repo.head.commit, 'main')[0].diff(repo.head.commit)
Here, repo.head.commit is the commit with the proposed changes, and main is the branch into which we want to merge the newly added code. Each resulting diff provides the following useful fields:
- diff.a_blob
- diff.b_blob
If both diff.a_blob and diff.b_blob exist, the file was changed. If only one of them exists, the file was either removed or added. To process the diff itself, it's better to use the difflib library, as it provides more functionality for our case:
import difflib

diff_res = difflib.unified_diff(
    file_contents_before_change.split("\n"),
    file_contents_after_change.split("\n"),
    lineterm="",
    n=0,
)
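From diff_res we can then recover the line numbers that were actually touched by scanning the hunk headers. The helper below is only a sketch of that step; its name and the way it approximates pure deletions are assumptions made for illustration:

import re

# Hunk headers look like "@@ -old_start,old_count +new_start,new_count @@".
HUNK_RE = re.compile(r"^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@")

def get_changed_lines(diff_res):
    """Collect line numbers (in the changed file) touched by each hunk."""
    changed = set()
    for line in diff_res:
        match = HUNK_RE.match(line)
        if not match:
            continue
        start = int(match.group(3))
        count = int(match.group(4) or "1")
        # A zero count means lines were only removed; still mark that position.
        changed.update(range(start, start + max(count, 1)))
    return sorted(changed)

These line numbers, together with the changed file path, are exactly what we feed into Jedi in the next step.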
Working with Jedi
Now, let's delve a bit into the Jedi-related code to provide an example of how to work with it. First, we need to initialize Jedi's Project object:
project = jedi.Project(path=code_project_path)
Afterward, we need something that represents our changed code:
script = jedi.Script(path=changed_file_path, project=project)
To obtain the context of the exact location of the changed code:
jedi_name = script.get_context(line=changed_line, column=changed_column)
The 'context' here refers to a function name, class name, variable name, etc.: the logical entity behind a piece of code in the file. We will use it to find actual references to this affected entity. There are two methods for finding references. The first one is:
jedi_names = script.get_references(line=changed_line, column=changed_column)
The first method works, but due to the dynamic nature of Python, it may sometimes miss valid references. To overcome this, another method can be used:
project.search(context_name, all_scopes=True)
This method searches for all occurrences of the specified name throughout the entire project. Using both methods together ensures valid references almost all the time; in my project with around 1000 test cases, I did encounter a situation where the affected tests were not run.
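Putting these pieces together, the reference lookup might look roughly like the sketch below; the find_affected_files helper and the way the results of the two lookups are merged are assumptions for the example, not the plugin's exact code:

import jedi

def find_affected_files(code_project_path, changed_file_path, changed_line, changed_column):
    """Return the set of files that reference the entity at the changed position."""
    project = jedi.Project(path=code_project_path)
    script = jedi.Script(path=changed_file_path, project=project)

    # The logical entity (function, class, variable, ...) behind the changed line.
    context = script.get_context(line=changed_line, column=changed_column)

    affected = set()
    # Direct references found by static analysis.
    for name in script.get_references(line=changed_line, column=changed_column):
        if name.module_path is not None:
            affected.add(str(name.module_path))
    # Broader project-wide search by name, to compensate for missed references.
    if context.name:
        for name in project.search(context.name, all_scopes=True):
            if name.module_path is not None:
                affected.add(str(name.module_path))
    return affected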
Handling 'Special' Cases with ast
However, having valid references is not enough. We need to handle some 'special' cases differently, such as a change to a pytest fixture with 'session' scope and the 'autouse' param, or to a pytest hook; for these, we need to run all tests regardless of what changed. To handle this, we can use the ast library: we already know the changed file path and the logical entity that requires processing, so we only need to inspect the parsed source. Calling:
ast.parse(code)
will give us the parsed file where we can search for fixtures, hooks, etc.
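As an illustration, a simplified check for such 'special' entities could look like this; the requires_full_run helper and the exact conditions it checks (only keyword arguments, only a decorator named fixture) are assumptions made for brevity:

import ast

def requires_full_run(code, entity_name):
    """Return True if the changed entity is a pytest hook or a session-scoped autouse fixture."""
    tree = ast.parse(code)
    for node in ast.walk(tree):
        if not isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            continue
        if node.name != entity_name:
            continue
        # Any pytest_* function is treated as a hook.
        if node.name.startswith("pytest_"):
            return True
        # Look for @pytest.fixture(scope="session", autouse=True).
        for decorator in node.decorator_list:
            if not isinstance(decorator, ast.Call):
                continue
            func_name = getattr(decorator.func, "attr", getattr(decorator.func, "id", ""))
            if func_name != "fixture":
                continue
            keywords = {kw.arg: kw.value for kw in decorator.keywords}
            scope = keywords.get("scope")
            autouse = keywords.get("autouse")
            if (
                isinstance(scope, ast.Constant) and scope.value == "session"
                and isinstance(autouse, ast.Constant) and autouse.value is True
            ):
                return True
    return False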
'Rules' Logic
Apart from these 'special' cases, it's also beneficial to introduce some kind of 'rules' logic to specify what to do when non-Python files are changed. In this case, after obtaining Git changes, we just need to look for some 'special' files, and if they are present, decide whether to take action or not.
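For example, the rules could be as simple as a mapping from file patterns to actions; the format below and the decide_for_file helper are purely hypothetical, since the real rules format is project-specific:

from fnmatch import fnmatch

# Hypothetical rules: which non-Python changes trigger which behaviour.
RULES = {
    "requirements*.txt": "run_all",   # dependency changes affect everything
    "configs/*.yaml": "run_all",      # shared configuration
    "docs/*": "skip",                 # documentation never affects tests
}

def decide_for_file(changed_path):
    """Return the first matching action for a changed non-Python file, or None."""
    for pattern, action in RULES.items():
        if fnmatch(changed_path, pattern):
            return action
    return None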
Pytest Plugin
The final step is to create a pytest plugin that uses the functionality described above and implements the pytest_collection_modifyitems hook:
def pytest_collection_modifyitems(session, config, items):
    affected = config.getoption("affected")
    if not affected:
        return
    # Diffs between the current commit and the target branch.
    changes = get_changes_from_git(
        config.getoption("git_path"), config.getoption("git_branch")
    )
    if not changes:
        return
    rules = get_rules(config.getoption("affected_rules"))
    # Test files affected by changed Python code...
    test_filenames = get_affected_test_filenames(config.getoption("project"), changes)
    # ...plus test files selected by the 'rules' for non-Python changes.
    test_filenames.update(
        process_not_python_files(
            config.getoption("git_path"), rules, changes
        )
    )
    selected = []
    deselected = []
    for item in items:
        item_path = item.location[0]
        if any(item_path in test_filename for test_filename in test_filenames):
            selected.append(item)
            continue
        deselected.append(item)
    items[:] = selected
    if deselected:
        config.hook.pytest_deselected(items=deselected)
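For the hook above to read its options, they also need to be registered in the plugin. A minimal sketch of that registration follows; the option names match those used in the hook, while the defaults and help texts are assumptions:

def pytest_addoption(parser):
    group = parser.getgroup("affected")
    group.addoption("--affected", action="store_true", default=False,
                    help="run only tests affected by the current changes")
    group.addoption("--git-path", dest="git_path", default=".",
                    help="path to the Git repository with the changes")
    group.addoption("--git-branch", dest="git_branch", default="main",
                    help="target branch to diff against")
    group.addoption("--affected-rules", dest="affected_rules", default=None,
                    help="path to the rules for non-Python files")
    group.addoption("--project", dest="project", default=".",
                    help="path to the code project analysed by Jedi")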
One last point: if you encounter any exceptions or timeouts (Jedi can be slow at times, especially on large projects, so introducing a timeout is a good idea), you can always fall back to running all tests and be happy.
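One way to implement such a safety net, assuming the Jedi-based analysis is wrapped in a single function, is to run it with a deadline and treat a timeout or error as "select everything"; the helper name and the 60-second limit below are illustrative:

from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as AnalysisTimeout

def safe_get_affected(project_path, changes, timeout_seconds=60):
    """Run the analysis with a deadline; return None to signal 'run all tests'."""
    executor = ThreadPoolExecutor(max_workers=1)
    future = executor.submit(get_affected_test_filenames, project_path, changes)
    try:
        return future.result(timeout=timeout_seconds)
    except AnalysisTimeout:
        return None
    except Exception:
        # Any unexpected failure means we cannot trust the selection.
        return None
    finally:
        # Don't block on a stuck analysis; the worker thread finishes on its own.
        executor.shutdown(wait=False)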
Closing Thoughts
After implementing the above approach in our project with 1000 test cases, the time required to merge our test changes was significantly reduced. In a team of 10 people with limited infrastructure resources, we were previously unable to merge more than 3-5 pull requests per week. Afterwards, we managed to handle 10-20 pull requests per week, speeding up all test automation processes and reducing the time spent on new releases. This, in turn, led to a significant improvement in release times, from 3-6 months down to just one month. Therefore, this approach had a profound impact on all our processes. If you find yourself running many tests without a clear reason, try the approach above and observe how it can benefit you.
History
- 11th October, 2023: Initial version