Back and blogging about postgres again! We do other things too, it’s just that postgres is a really cool product that we’ve had time to learn a lot about recently and it’s interesting to write about.
When we configured postgres, we set up logging for slow queries, but we set a really high
log_min_duration_statement, going for 60 seconds. We’ve been running with that configuration ever since, but we’ve decided that 60 seconds is way too high to be useful, since nothing should ever trigger that. We’ve been wanting to lower it so we can be better at keeping track of slow queries. The first step you’re probably going to do when optimizing a query is to find the query plan that was executed.
Postgres comes with a neat module called
auto_explain which can be configured to log the query plans for queries that run for more than
auto_explain.log_min_duration milliseconds. We wanted both to lower
log_min_duration_statement, and at the same time set up auto-explain to the same threshold. We weren’t sure exactly of how low is too low. The reason why this matters is mostly to avoid spamming logs and creating too much I/O on the database server. The overhead of
auto_explain will stay low, as long as you do not enable
auto_explain.log_analyze. The documentation warns you about this, saying the following:
When this parameter is on, per-plan-node timing occurs for all statements executed, whether or not they run long enough to actually get logged. This can have an extremely negative impact on performance. […]
We tried to investigate exactly how large this “extremely negative impact” actually is, but didn’t find too much. People we chatted with said slowdown by about a factor of 2 could happen. We honestly don’t know what that would do to the database server and prefer not to find out. So we chose to not enable
auto_explain.log_analyze at all.
There’s another postgres extension that was really helpful when setting our thresholds.
pg_stat_statements will gather statistics about your most common queries for you, if you
PG_STAT_STATEMENTS in your database. This view contains several really interesting statistics about your queries, for example how often they are called, how long they take to execute (mean, max, min, standard deviation), how often they have cache-misses and how many temporary files they generate (ideally none!)
We decided we could use this to obtain a ballpark estimate of how many queries different thresholds for
auto_explain would catch. We really don’t want to be logging many thousands each day, as that would make it difficult to focus on the big wins, so we’d prefer to start with a lenient threshold and then lower it as we improve queries. Trying to query against
pg_stat_statements also gave us an opportunity to practice some pretty cool SQL.
We based our choice on
calls (total amount of query executions) and investigated what percentage of our total queries that had which average running time. We started by classifying each query in
pg_stat_statements into different buckets depending on its
select calls, case when mean_time > 60000 then 60 when mean_time > 30000 then 30 when mean_time > 10000 then 10 when mean_time > 5000 then 5 else 0 end as bucket_time from pg_stat_statements;
This will classify each query according to its
mean_time, giving it the
bucket_time of 0, 10, 30 or 60 seconds, which is the lower bound of the bucket we put it in. We can build on this query to aggregate some useful information:
with query_times as (select calls, case when mean_time > 60000 then 60 when mean_time > 30000 then 30 when mean_time > 10000 then 10 when mean_time > 5000 then 5 else 0 end as bucket_time from pg_stat_statements) select sum(calls) as calls, bucket_time from query_times group by bucket_time;
with statement here is called a Common Table Expression and you can read more about it here. It’s super useful for structuring queries nicely to make them readable, and it helps with building up queries one part at a time. The query above produces an output much like:
calls | bucket_time ------------+------------- 688 | 30 1131 | 60 4166 | 5 6065699764 | 0 1483 | 10
That’s already useful and will give us our ballpark estimate. We know that it’s probably safe to set our threshold to 5 seconds for a start, because that won’t catch many queries (
688 + 1131 + 4166 + 1483). But wouldn’t it be cool to calculate some percentages so we can see how many percents of queries that fall into each bucket? We can do that with window functions:
with query_times as (select calls, case when mean_time > 60000 then 60 when mean_time > 30000 then 30 when mean_time > 10000 then 10 when mean_time > 5000 then 5 else 0 end as bucket_time from pg_stat_statements), grouping as (select sum(calls) as calls, bucket_time from query_times group by bucket_time) select bucket_time, 100 * calls / (sum(calls) over ()) as percent, calls, rank() over (order by bucket_time) from grouping;
sum(calls) over () part of the query will
calls for each column in the window, which in our case is the entire result-set.
rank() will number the rows according to the ordering of
bucket_time. In total, this produces an output much like:
bucket_time | percent | calls | rank -------------+----------------------------+------------+------ 0 | 99.9998768942781880 | 6066323073 | 1 5 | 0.000068674134583396087978 | 4166 | 2 10 | 0.000024446409406427364011 | 1483 | 3 30 | 0.000011341287708443713041 | 688 | 4 60 | 0.000018643890113735231758 | 1131 | 5
Wouldn’t it be cooler still if we could accumulate the calls and percentages according to
ranks, such that each bucket also contains all the calls from the ranks that are lower than it? That way, we can say that
99.9999999% of all queries complete in less than X seconds. We can do that joining the previous query with a join-condition on the rank:
with query_times as (select calls, case when mean_time > 60000 then 60 when mean_time > 30000 then 30 when mean_time > 10000 then 10 when mean_time > 5000 then 5 else 0 end as bucket_time from pg_stat_statements), grouping as (select sum(calls) as calls, bucket_time from query_times group by bucket_time), stats as (select bucket_time, 100 * calls / (sum(calls) over ()) as percent, calls, rank() over (order by bucket_time) from grouping) select s1.bucket_time, s1.calls, s1.percent, sum(s2.calls) as accumulated_calls, sum(s2.percent) as accumulated_percent from stats s1 inner join stats s2 on s1.rank >= s2.rank group by s1.bucket_time, s1.calls, s1.percent order by s1.bucket_time;
The result will look something like this:
bucket_time | calls | percent | accumulated_calls | accumulated_percent -------------+------------+----------------------------+-------------------+------------------------------ 0 | 6066780909 | 99.9998769035684793 | 6066780909 | 99.9998769035684793 5 | 4166 | 0.000068668952024004314466 | 6066785075 | 99.999945572520503304314466 10 | 1483 | 0.000024444564534709169072 | 6066786558 | 99.999970017085038013483538 30 | 688 | 0.000011340431827295959758 | 6066787246 | 99.999981357516865309443296 60 | 1131 | 0.000018642483134697282684 | 6066788377 | 100.000000000000000006725980
accumulated_percent is larger than
100%? We think this is a combination of rounding errors with floats, and the fact that we’re actually running queries right now, which will add new entries to
pg_stat_statements while we’re executing. Anyway, the
acumulated columns should show how many queries that are at least this fast, on average. Choosing different bucket sizes is easy, only the first common table expression needs to be modified.
The result gives us some pretty good ballpark figures for setting slow query thresholds.