Recently in one of the projects that I was working on, I came across a situation where the SQL query times was getting slower and slower per day, and I had to figure out what was happening to the system.

One of the things that I checked immediately was whether there were any long running queries in the system, that is affecting the other queries, and whoila, there it was - the system had a bunch of rogue queries that had been running for over a day or so.

Running the following command gave me queries that were running for more than 5 seconds.

  now() - pg_stat_activity.query_start AS duration,
FROM pg_stat_activity
WHERE state ='active' AND query NOT ILIKE '%pg_stat_activity%' AND (now() - pg_stat_activity.query_start) > interval '5 seconds'
ORDER BY duration desc;

This would give you a result like:

-[ RECORD 1 ]------------------------------------------------------------
pid      | 6473
duration | 00:00:01.011715
state    | active
query    | ..........

Now you have the pid of the query, lets try to get some more information on the query.

strace -f -s2048 -p 6473

Running a strace on the query gives you more information on what the query is doing.

In my case, it was waiting indefinitely for acquiring a lock, which would never succeed. Solution for me was to terminate the query as it had already gone rogue.

SELECT pg_cancel_backend(6473)

the pg_cancel_backend command kills the running query.