Why use coverage to find which parts of a python code were executed? (2024)

May 19, 2018

In this post, I’ll walk you through the decision-making process the team behind Vulture went through to come up with a way to deal with false positives in its results.

Let’s start with a brief introduction to Vulture:

Vulture

As the name suggests, Vulture helps find dead code in Python programs. There are many reasons for dead code ending up in a project. The most common is refactoring, but another is misspellings, which are only detected at runtime for dynamic languages. Finding and removing dead code helps keep the code base clean and reduces bugs.

Vulture can detect unused imports, variables, attributes, functions, methods, properties and classes. Beyond these, it can also detect code after return statements and code guarded by a condition that is always false (e.g. if False: or while 0:).

Using vulture

Vulture is a standard Python package that is installed with pip:

(venv) $ pip install vulture

Let us say that you have the following program (say program.py) that you want to analyse:

# program.py
import os

def false_positive_function():
    """
    A function which is vital for your application to function,
    but Vulture reports it as unused.
    """
    pass

def hello_world():
    message = "Hello World"
    print("Hello World")

def main():
    hello_world()

if __name__ == '__main__':
    main()

Analysing the program with Vulture is as simple as running the following command:

(venv) $ vulture program.py

which would produce the following output (on vulture 0.26):

program.py:2: unused import 'os' (90% confidence)
program.py:4: unused function 'false_positive_function' (60% confidence)
program.py:12: unused variable 'message' (60% confidence)

As you can see, along with every result, Vulture also reports a confidence value, which is a measure of how sure Vulture is that the code is unused. This output can be made even more meaningful with flags like --min-confidence and --sort-by-size. Read more about them here.

Owing to Python’s dynamic nature, Vulture is likely to miss some dead code. Also, code that is only used implicitly is reported as unused, such as a method that overrides a parent class method or a method of a C/C++ extension.

Some other examples where vulture may report “useful” code as unused:

API Endpoints - They are meant to be called by users and are never called directly in the source code, which confuses Vulture.
ORM Schema - Again, they aren’t used directly by the program’s source code.

Handling false positives

One way to prevent Vulture from reporting false positives is to explicitly use the code anyway.

> WHAT! - Are you telling me to run my already used code?

Worry not! There is no need to actually call the code. If you create a mock class and access attributes on it whose names exactly match the names of the unused code (variables, functions, classes, anything that can be unused and has a name), you can fool Vulture into believing that the code is being used (because Vulture only keeps track of the names parsed from the AST). This is known as “whitelisting”, and since it is fairly common practice to create such a class for mocking objects, we already ship one with Vulture.

Let me show how you can create your own whitelist with our previous example. (Remember we had a false_positive_function? Let’s whitelist it.)

Note that I am calling this file whitelist_program.py because it’s a good idea to start the name with whitelist so you know what the file does, but this is not required. You can call it whatever you want.

# whitelist_program.py
from vulture.whitelist_utils import Whitelist

awesome_whitelist = Whitelist()
# Access the attribute on the instance: __getattr__ only applies to instances.
awesome_whitelist.false_positive_function

Now, let’s run Vulture using the following command:

(venv) $ vulture program.py whitelist_program.py

And hurray, the output no longer contains the false positive function.

program.py:2: unused import 'os' (90% confidence)
program.py:12: unused variable 'message' (60% confidence)

How did that work?

Since you passed the whitelist file along with the file to be analysed, Vulture created ASTs for both of them and, while parsing those trees, built a common set of names for used and defined objects. Since the name of the false positive function is defined in one file and used in the other, it is no longer treated as unused.

A thing you may find interesting about the Whitelist class is that it does absolutely “nothing”. Its current implementation is as follows:
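To make the idea concrete, here is a minimal sketch of the "common set of names" approach. It is not Vulture's actual implementation: the function names collect_names and unused_functions are made up for illustration, and only function definitions are tracked.

```python
import ast

PROGRAM = '''
def false_positive_function():
    pass

def hello_world():
    print("Hello World")

hello_world()
'''

# The whitelist is only parsed, never executed, so any object works here.
WHITELIST = '''
whitelist = object()
whitelist.false_positive_function
'''

def collect_names(sources):
    """Return (defined, used) name sets gathered from all sources together."""
    defined, used = set(), set()
    for source in sources:
        for node in ast.walk(ast.parse(source)):
            if isinstance(node, ast.FunctionDef):
                defined.add(node.name)
            elif isinstance(node, ast.Name) and isinstance(node.ctx, ast.Load):
                used.add(node.id)
            elif isinstance(node, ast.Attribute):
                used.add(node.attr)
    return defined, used

def unused_functions(sources):
    """Names that are defined somewhere but never used in any source."""
    defined, used = collect_names(sources)
    return sorted(defined - used)

print(unused_functions([PROGRAM]))             # ['false_positive_function']
print(unused_functions([PROGRAM, WHITELIST]))  # []
```

Because both files feed the same two sets, an attribute access in the whitelist is enough to mark the name as used.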

class Whitelist:
    """
    Helper class that allows mocking Python objects.

    Use it to create whitelist files that are not only syntactically
    correct, but can also be executed.
    """
    def __getattr__(self, _):
        pass

Now, since whitelists are so extensively used, Vulture already ships with whitelists for some popular libraries like sys, collections, unittest, etc. These whitelists are automatically “activated” as soon as the user imports that library. The developers at Vulture are working hard (gives a pat on his back) to ship even more of these. Guess what, you can add one for your library, or just open an issue and we will create one for you. PRs are more than welcome! :-)

Some other ways of dealing with false positives:

Mark unused variables by starting them with an “_” (e.g., _x, y = get_pos()).
Use different files for API endpoints, ORM, etc. and exclude them with the help of the --exclude flag.

You can find more information about Vulture in its documentation.

What more does vulture want?

As we saw earlier, the results reported by Vulture sometimes contain false positives. We want to develop a system that can detect whether or not a result given by Vulture is a false positive.

Please note that this discussion originally took place on jendrikseipp/vulture/#109, and this post is a summary of it with some insight into how we finally came to a decision.

Approach 1 - User employed regular expressions

Vulture has different methods for parsing different kinds of nodes in an AST. So, for example, if Vulture encounters a function definition, it triggers the visit_FunctionDef method for parsing that node, and similarly for classes, variables, etc. In those methods, we can easily insert a check that ignores the construct if its name matches any of the user-supplied regular expressions.

The implementation of this was very easy, but there was a whole new problem: how do we present this functionality to the user?

All the inputs to Vulture are supplied through command line arguments and there aren’t any config files or variables (because that would be overkill for such a simple tool). Although passing regexes through a CLI argument is possible, writing such a command would be a non-trivial task for the user, and there would be way too many different permutations of “lists” of regexes for all the different types of constructs (functions, classes, variables, methods, properties, etc.), so we soon dropped the idea.
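A rough sketch of such a check, built on the standard library's ast.NodeVisitor, could look like the following. The class name DefinitionCollector and the ignore_patterns parameter are hypothetical and were never part of Vulture; the sketch only covers functions and classes.

```python
import ast
import re

class DefinitionCollector(ast.NodeVisitor):
    """Collect function and class names, skipping names that match a pattern."""

    def __init__(self, ignore_patterns=()):
        self.ignore = [re.compile(pattern) for pattern in ignore_patterns]
        self.defined = []

    def _ignored(self, name):
        # The whole name must match, not just a prefix.
        return any(pattern.fullmatch(name) for pattern in self.ignore)

    def _record(self, node):
        if not self._ignored(node.name):
            self.defined.append(node.name)
        self.generic_visit(node)  # keep walking into nested definitions

    # ast.NodeVisitor dispatches to visit_<NodeClassName> methods.
    visit_FunctionDef = _record
    visit_ClassDef = _record

SOURCE = '''
def api_list_users():
    pass

def helper():
    pass
'''

collector = DefinitionCollector(ignore_patterns=[r"api_.*"])
collector.visit(ast.parse(SOURCE))
print(collector.defined)  # ['helper']
```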

Approach 2 - Using XML output from `coverage.py`

Now, since we were restricted to minimal user interaction, Jendrik came up with the brilliant idea of automatically generating an initial whitelist and then letting the user adapt it to her needs. This led us to think that we should let users run coverage.py on their code base and export the XML output, which could then be consumed by Vulture to detect which lines are actually used.

So, we tried a basic prototype. We took a sample file, let’s say ab.py:

def a(f):
    print(f)

@a
def b():
    pass

Running coverage.py, the following XML output was observed:

<?xml version="1.0" ?>
<coverage branch-rate="0" line-rate="0.75" timestamp="1524751600489" version="4.3.4">
    <!-- Generated by coverage.py: https://coverage.readthedocs.io -->
    <!-- Based on https://raw.githubusercontent.com/cobertura/web/f0366e5e2cf18f111cbd61fc34ef720a6584ba02/htdocs/xml/coverage-03.dtd -->
    <sources>
        <source>/Users/rahuljha/Documents/test_everything_here</source>
    </sources>
    <packages>
        <package branch-rate="0" complexity="0" line-rate="0.75" name=".">
            <classes>
                <class branch-rate="0" complexity="0" filename="ab.py" line-rate="0.75" name="ab.py">
                    <methods/>
                    <lines>
                        <line hits="1" number="1"/>
                        <line hits="1" number="2"/>
                        <line hits="1" number="4"/>
                        <line hits="0" number="6"/>
                    </lines>
                </class>
            </classes>
        </package>
    </packages>
</coverage>

Now, as you may have observed, the problem with this was that the line numbers were not what we needed (the definition of function b begins before line 6, but according to the report only line 6 is unused) and no information about the name of the unused code was provided. So, we discarded this idea and decided to go even more bare-bones and write a tracer.
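For reference, pulling the unexecuted line numbers out of such a report takes only a few lines with the standard library. This is a sketch against a trimmed copy of the report above; the helper name missed_lines is made up for illustration.

```python
import xml.etree.ElementTree as ET

# Trimmed version of the coverage.py XML report for ab.py.
COVERAGE_XML = """<coverage>
  <packages>
    <package name=".">
      <classes>
        <class filename="ab.py" name="ab.py">
          <lines>
            <line hits="1" number="1"/>
            <line hits="1" number="2"/>
            <line hits="1" number="4"/>
            <line hits="0" number="6"/>
          </lines>
        </class>
      </classes>
    </package>
  </packages>
</coverage>"""

def missed_lines(xml_text):
    """Map each filename to the sorted line numbers reported with hits="0"."""
    root = ET.fromstring(xml_text)
    missed = {}
    for cls in root.iter("class"):
        numbers = [
            int(line.get("number"))
            for line in cls.iter("line")
            if line.get("hits") == "0"
        ]
        missed[cls.get("filename")] = sorted(numbers)
    return missed

print(missed_lines(COVERAGE_XML))  # {'ab.py': [6]}
```

The result contains line numbers only, which is exactly the limitation described above: nothing in the report names the function that line 6 belongs to.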

Approach 3 - Writing a `tracer`

A tracer would allow us to keep track of what is going on in the current Python process, but after trying to develop a simple prototype, I knew that it wasn’t a trivial task. Also, Abdeali, who has previous experience working with such “technology”, held the following opinion:

If vulture starts to maintain tracer and so on it is starting to go towards non-static analysis which I am not sure you guys want to maintain as it can have a bunch of issues :/

So, in this moment of confusion, we decided to ask Ned Batchelder, the guy behind coverage.py himself. Let me quote him:

You will need a trace function. Writing your own doesn’t have to be complicated, though you are right there are details like subprocesses that can be a pain. Writing a file tracer plugin could be a way to piggy-back on coverage.py. Getting function names wouldn’t be hard, since you can inspect the frame object in the trace function to get what you need. Variables are harder. You’ve noted that coverage.py reports line numbers, but then you say it’s hard to go from line numbers to the information you need. But you already are doing AST analysis. Couldn’t you use coverage.py’s line numbers to index into the AST you already have, and then use your AST expertise from there?

So, we were back to square one (well, square two in this case): use the output from coverage.xml.
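To illustrate the "inspect the frame object" remark, here is a minimal trace function built on sys.settrace. It is only a sketch of the idea, not something Vulture or coverage.py ships; record_calls and the sample functions are invented for this example, and it ignores subprocesses and threads.

```python
import sys

def record_calls(func):
    """Run func() and return the names of Python functions called meanwhile."""
    called = set()

    def tracer(frame, event, arg):
        if event == "call":
            # The frame's code object carries the name of the function.
            called.add(frame.f_code.co_name)
        return None  # no line-by-line tracing inside the call

    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        func()
    finally:
        sys.settrace(previous)  # always restore the earlier tracer
    return called

def used():
    return 1

def unused():
    return 2

def main():
    used()

calls = record_calls(main)
print("used" in calls, "unused" in calls)  # True False
```

Function names come almost for free this way, which matches Ned's point. Variables have no comparable event, so they would need a different approach.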

Back to approach 2 - Use XML output from `coverage.py`

We quickly discovered that <line hits="1"> was merely a binary switch indicating whether or not that line was executed and contained no information about “how many times”. This was important because it would always be “1” for the first line of a function (that line is executed when the program is initialised, to store the name of the function in memory), and therefore we couldn’t use this switch as an indication of whether or not the function was actually used. This also explained why the line numbers seemed off when we inspected the output earlier.

But this gave us a workable idea: if Vulture says that a function is unused, we can check whether this is really the case by checking whether the next line is marked as unexecuted in the coverage.py XML file. The only exception would be a function defined on a single line, such as:

def true(): return True

Such functions are neither very common nor very useful and can therefore be neglected.

And that, my dear friends, is how we decided to use coverage.py to detect whether code was actually used or not. It would only enable Vulture to detect unused “functions” (and properties, maybe) and not classes or variables, but it would still be a very useful feature.

P.S. I am excited to work with the team on this feature. It is also one of the goals of my Google Summer of Code project.
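The check described above can be sketched as follows. This is an illustration under assumptions rather than the final Vulture feature: MISSED_LINES stands in for the hits="0" lines taken from a coverage.py report, the name never_executed_functions is made up, and the sketch looks at the first statement of the body instead of literally "the next line", without any special handling of docstrings.

```python
import ast

SOURCE = '''\
def used():
    return 1

def unused():
    return 2

used()
'''

# Hypothetical input: lines coverage.py reported with hits="0" for this file.
MISSED_LINES = {5}

def never_executed_functions(source, missed_lines):
    """Return names of functions whose first body statement was never executed."""
    result = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.FunctionDef):
            first_statement = node.body[0]
            if first_statement.lineno in missed_lines:
                result.append(node.name)
    return result

print(never_executed_functions(SOURCE, MISSED_LINES))  # ['unused']
```

A function written on a single line has its body on the same line as the def, which always counts as executed, so it would never be reported here. That is the exception mentioned above.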
