Histograms in PostgreSQL query tool


Assuming you have a table transactions with a balance column:

SELECT  
		WIDTH_BUCKET(balance, 0, 1000, 15) AS "bucket",
		NUMRANGE(min(balance), max(balance), '[]') AS "range",
		COUNT(*) as freq,
		REPEAT('*',(COUNT(*)::float / max(COUNT(*)) over() * 30)::INT) AS "bar"
	FROM transactions
	GROUP BY bucket
	ORDER BY bucket

Notes:

  • 0 is the smallest number you expect
  • 1000 is the largest number you expect
  • 15 is the number of buckets
  • 30 is the length of the longest bar

If you want to see the empty buckets, generate the series and join:

SELECT 
		buckets.bucket,
		range,
		COALESCE(freq, 0) AS "count",
		REPEAT('*',(freq::float / max(freq) over() * 30)::INT) AS "bar"
	FROM 
		(SELECT GENERATE_SERIES::INT AS "bucket" FROM GENERATE_SERIES(0, 21)) AS "buckets"
	LEFT OUTER JOIN
		(SELECT  WIDTH_BUCKET(balance, 0, 2000, 20) AS "bucket",
			NUMRANGE(min(balance), max(balance), '[]') AS "range",
			COUNT(*) as freq
			FROM transactions
			GROUP BY bucket
		) AS histogram
		ON buckets.bucket = histogram.bucket
	ORDER BY buckets.bucket

Note: the GENERATE_SERIES() parameters should be zero and # buckets + 1 if there might be values outside the WIDTH_BUCKET min and max

If you don’t know the min and max in advance:

WITH minmax AS (SELECT MIN(balance), MAX(balance) FROM transactions)
SELECT 
		buckets.bucket,
		range,
		COALESCE(freq, 0) AS "count",
		REPEAT('*',(freq::float / max(freq) over() * 30)::INT) AS "bar"
	FROM 
		(SELECT GENERATE_SERIES::INT AS "bucket" FROM GENERATE_SERIES(1, 15)) AS "buckets"
	LEFT OUTER JOIN
		(SELECT  WIDTH_BUCKET(balance, min, max+1, 15) AS "bucket",
			NUMRANGE(min(balance), max(balance), '[]') AS "range",
			COUNT(*) as freq
			FROM transactions, minmax
			GROUP BY bucket
		) AS histogram
		ON buckets.bucket = histogram.bucket
	ORDER BY buckets.bucket

Note: the +1 for max in WIDTH_BUCKET is because the max value will end up above the bucket range

Source

Date: 2018-03-12

Tags:  postgresql

Share: