Ranking Function in SQL

Posted on 21/11/2024 in posts sql


What are the different types of ranking function in SQL. I will take the example of SQL implemented in Snowflake but other sql engines have similar format.

WITH corn_production AS (
SELECT 
v.VALUE[0] AS farmer_ID,
v.VALUE[1] AS state,
v.VALUE[2] AS bushels,
FROM LATERAL FLATTEN([
[1, 'Iowa', 100],
[2, 'Iowa', 110],
[3, 'Kansas', 120],
[4, 'Kansas', 130],
[5, 'Kansas', 130],
[6, 'Kansas', 130]
]) AS v
)
SELECT 
farmer_ID, state, bushels,
ROW_NUMBER() OVER (ORDER BY bushels DESC) row_num,
RANK() OVER (ORDER BY bushels DESC) srank,
DENSE_RANK() OVER (ORDER BY bushels DESC) drank,
PERCENT_RANK() OVER (ORDER BY bushels DESC) prank,
CUME_DIST() OVER (ORDER BY bushels DESC) cdist,
FROM corn_production
ORDER BY row_num;
FARMER_ID STATE BUSHELS ROW_NUM SRANK DRANK PRANK CDIST
4 "Kansas" 130 1 1 1 0 0.5
5 "Kansas" 130 2 1 1 0 0.5
6 "Kansas" 130 3 1 1 0 0.5
3 "Kansas" 120 4 4 2 0.6 0.6667
2 "Iowa" 110 5 5 3 0.8 0.8333
1 "Iowa" 100 6 6 4 1 1

The ranking functions available are ROW_NUMBER, RANK, DENSE_RANK, PERCENTILE_RANK, and CUME_DIST.

ROW_NUMBER, RANK, DENSE_RANK provide numerical ranks. PERCENTILE_RANK, and CUME_DIST provide percentile based ranks.

The challenge is when you have column values which are duplicates leading to tie resolution in

If you just want ranks for top K, use ROW_NUMBER, it will break ties arbitrarily. RANK will duplicate ranks but will also create gaps in ranks. DENSE_RANK will duplicate ranks but will not create rank gaps.

PERCENT_RANK and CUME_DIST are good if you want to know where the element lies in the distribution.