Ranking Function in SQL
What are the different types of ranking function in SQL. I will take the example of SQL implemented in Snowflake but other sql engines have similar format.
WITH corn_production AS (
SELECT
v.VALUE[0] AS farmer_ID,
v.VALUE[1] AS state,
v.VALUE[2] AS bushels,
FROM LATERAL FLATTEN([
[1, 'Iowa', 100],
[2, 'Iowa', 110],
[3, 'Kansas', 120],
[4, 'Kansas', 130],
[5, 'Kansas', 130],
[6, 'Kansas', 130]
]) AS v
)
SELECT
farmer_ID, state, bushels,
ROW_NUMBER() OVER (ORDER BY bushels DESC) row_num,
RANK() OVER (ORDER BY bushels DESC) srank,
DENSE_RANK() OVER (ORDER BY bushels DESC) drank,
PERCENT_RANK() OVER (ORDER BY bushels DESC) prank,
CUME_DIST() OVER (ORDER BY bushels DESC) cdist,
FROM corn_production
ORDER BY row_num;
FARMER_ID | STATE | BUSHELS | ROW_NUM | SRANK | DRANK | PRANK | CDIST |
---|---|---|---|---|---|---|---|
4 | "Kansas" | 130 | 1 | 1 | 1 | 0 | 0.5 |
5 | "Kansas" | 130 | 2 | 1 | 1 | 0 | 0.5 |
6 | "Kansas" | 130 | 3 | 1 | 1 | 0 | 0.5 |
3 | "Kansas" | 120 | 4 | 4 | 2 | 0.6 | 0.6667 |
2 | "Iowa" | 110 | 5 | 5 | 3 | 0.8 | 0.8333 |
1 | "Iowa" | 100 | 6 | 6 | 4 | 1 | 1 |
The ranking functions available are ROW_NUMBER
, RANK
, DENSE_RANK
, PERCENTILE_RANK
, and CUME_DIST
.
ROW_NUMBER
, RANK
, DENSE_RANK
provide numerical ranks.
PERCENTILE_RANK
, and CUME_DIST
provide percentile based ranks.
The challenge is when you have column values which are duplicates leading to tie resolution in
If you just want ranks for top K, use ROW_NUMBER
, it will break ties arbitrarily. RANK
will duplicate ranks but will also create gaps in ranks. DENSE_RANK
will duplicate ranks but will not create rank gaps.
PERCENT_RANK
and CUME_DIST
are good if you want to know where the element lies in the distribution.