A common pattern in automated testing is to clear the database state before every test runs. This ensures that the tests remain robust and immune to side-effects of adjacent tests. While a clean database state for every test is beneficial in this sense, the overhead of clearing data can be significant. In one of our largest test suites comprised of thousands of tests interacting with Postgres, we reduced the run time of the suite by over 33% by switching our table-clearing SQL to use 'DELETE' rather than 'TRUNCATE'.
Our team's productivity depends on the performance of this test suite, since they run very frequently and are an integral part of the deployment pipeline. The nearly 3,200 tests included are responsible for the correctness of one of our most critical codebases. The codebase has constant activity, and changes are continuously committed by authors from multiple teams. The entire suite runs before every commit is merged, as well as after each commit lands in master.
We were surprised when we measured the overhead of clearing table data. Before each test runs, no fewer than 32 tables are wiped clean with a `TRUNCATE`. The 'TRUNCATE's accounted for 20-200 milliseconds of overhead for each test. In other words, this added 1 to 5 minutes of overhead for the entire suite when run serially. (In practice, we run groups of tests in parallel, so the perceived total overhead was closer to about 1 minute total.)
When we swapped the 'TRUNCATE's for 'DELETE's, the overhead of each test dropped to 1-3 milliseconds. This reduced the combined overhead of all the tests to a total of about 10 seconds. In the graph below, you can see that average test suite run time dropped by over 33%—from over 3 minutes to about 2 minutes.
We were excited about the substantial boost to our test performance, but curious about why the switch from 'TRUNCATE' to 'DELETE' helped. After all, the result of 'TRUNCATE table_a;' or 'DELETE FROM table_a;' is the same: an empty 'table_a'. So why would one be any faster than the other?
In actuality, 'TRUNCATE' and 'DELETE' in Postgres do different things under the hood. From a database user perspective, the results seem the same: data is deleted; however, internally, there are major differences in the underlying data representing the table. With 'DELETE's, dead rows remain in database pages and their dead pointers are still present in indices. 'TRUNCATE', on the other hand, is a bit more thorough in wiping the slate clean. The resulting table looks almost identical internally to a newly 'CREATE'ed table. There are no dead rows, the indices are empty, and the table statistics are reset. There are quite a few other differences as well, all highlighted by a terrifically detailed post by Craig Ringer.
Also, 'TRUNCATE' is a constant time operation while 'DELETE' complexity depends on the number of rows in the table. 'TRUNCATE' is a fast operation to quickly clear out huge tables, but 'DELETE' is much faster when a table contains just a handful of rows. This is why our tests saw such a significant speed improvement by the change: each test only inserts a handful of rows.
With the large size of this test suite and the heavy traffic from our engineers, a reduction in the run time of the tests has a big impact on developer happiness and productivity. This tweak is perhaps most satisfying because of the combination of significant test speedup and it's relative simplicity. The investigation also offers interesting insights into Postgres internals and a take-away to use 'DELETE'on very small tables and 'TRUNCATE' on everything else.
If you're interested in working with Postgres to Make the World Programmable, you're in luck! We're hiring!