One of the most obvious and useful set of window functions are ranking functions where rows from your result set are ranked according to a certain scheme. There are three ranking functions:
ROW_NUMBER()
RANK()
DENSE_RANK()
The difference is easy to remember. For the examples, let’s assume we have this table (using PostgreSQL syntax):
1
2
3
4
5
CREATE
TABLE
t(v)
AS
SELECT
*
FROM
(
VALUES
(
'a'
),(
'a'
),(
'a'
),(
'b'
),
(
'c'
),(
'c'
),(
'd'
),(
'e'
)
) t(v)
ROW_NUMBER()
… assigns unique numbers to each row within the PARTITION
given the ORDER BY
clause. So you’d get:
1
2
SELECT
v, ROW_NUMBER() OVER()
FROM
t
Note that some SQL dialects (e.g. SQL Server) require an explicit ORDER BY
clause in the OVER()
clause:
1
2
SELECT
v, ROW_NUMBER() OVER(
ORDER
BY
v)
FROM
t
The above query returns:
| V | ROW_NUMBER |
|---|------------|
| a | 1 |
| a | 2 |
| a | 3 |
| b | 4 |
| c | 5 |
| c | 6 |
| d | 7 |
| e | 8 |
RANK()
… behaves like ROW_NUMBER()
, except that “equal” rows are ranked the same. If we substitute RANK()
into our previous query:
1
2
SELECT
v, RANK() OVER(
ORDER
BY
v)
FROM
t
… then the result we’re getting is this:
| V | RANK |
|---|------|
| a | 1 |
| a | 1 |
| a | 1 |
| b | 4 |
| c | 5 |
| c | 5 |
| d | 7 |
| e | 8 |
As you can see, much like in a sports ranking, we have gaps between the different ranks. We can avoid those gaps by using
DENSE_RANK()
Trivially, DENSE_RANK()
is a rank with no gaps, i.e. it is “dense”. We can write:
1
2
SELECT
v, DENSE_RANK() OVER(
ORDER
BY
v)
FROM
t
… to obtain
| V | DENSE_RANK |
|---|------------|
| a | 1 |
| a | 1 |
| a | 1 |
| b | 2 |
| c | 3 |
| c | 3 |
| d | 4 |
| e | 5 |
One interesting aspect of DENSE_RANK()
is the fact that it “behaves like” ROW_NUMBER()
when we add the DISTINCT
keyword.
1
2
SELECT
DISTINCT
v, DENSE_RANK() OVER(
ORDER
BY
v)
FROM
t
… to obtain
| V | DENSE_RANK |
|---|------------|
| a | 1 |
| b | 2 |
| e | 5 |
| d | 4 |
| c | 3 |
In fact, ROW_NUMBER()
prevents you from using DISTINCT
, because ROW_NUMBER()
generates unique values across the partition before DISTINCT
is applied:
1
2
3
SELECT
DISTINCT
v, ROW_NUMBER() OVER(
ORDER
BY
v)
FROM
t
ORDER
BY
1, 2
DISTINCT
has no effect:
| V | ROW_NUMBER |
|---|------------|
| a | 1 |
| a | 2 |
| a | 3 |
| b | 4 |
| c | 5 |
| c | 6 |
| d | 7 |
| e | 8 |
ROW_NUMBER()
RANK()
DENSE_RANK()
1
2
3
4
5
| CREATE TABLE t(v) AS SELECT * FROM ( VALUES ( 'a' ),( 'a' ),( 'a' ),( 'b' ), ( 'c' ),( 'c' ),( 'd' ),( 'e' ) ) t(v) |
PARTITION
given the ORDER BY
clause. So you’d get:
1
2
| SELECT v, ROW_NUMBER() OVER() FROM t |
ORDER BY
clause in the OVER()
clause:
1
2
| SELECT v, ROW_NUMBER() OVER( ORDER BY v) FROM t |
ROW_NUMBER()
, except that “equal” rows are ranked the same. If we substitute RANK()
into our previous query:
1
2
| SELECT v, RANK() OVER( ORDER BY v) FROM t |
DENSE_RANK()
is a rank with no gaps, i.e. it is “dense”. We can write:
1
2
| SELECT v, DENSE_RANK() OVER( ORDER BY v) FROM t |
DENSE_RANK()
is the fact that it “behaves like” ROW_NUMBER()
when we add the DISTINCT
keyword.
1
2
| SELECT DISTINCT v, DENSE_RANK() OVER( ORDER BY v) FROM t |
ROW_NUMBER()
prevents you from using DISTINCT
, because ROW_NUMBER()
generates unique values across the partition before DISTINCT
is applied:
1
2
3
| SELECT DISTINCT v, ROW_NUMBER() OVER( ORDER BY v) FROM t ORDER BY 1, 2 |
DISTINCT
has no effect:Putting it all together
A good way to understand the three ranking functions is to see them all in action side-by-side. Run this query
1
2
3
4
5
6
7
SELECT
v,
ROW_NUMBER() OVER(
ORDER
BY
v),
RANK() OVER(
ORDER
BY
v),
DENSE_RANK() OVER(
ORDER
BY
v)
FROM
t
ORDER
BY
1, 2
… or this one (using the SQL standard WINDOW
clause, to reuse window specifications):
1
2
3
4
5
6
7
SELECT
v,
ROW_NUMBER() OVER(w),
RANK() OVER(w),
DENSE_RANK() OVER(w)
FROM
t
WINDOW w
AS
(
ORDER
BY
v)
… to obtain:
| V | ROW_NUMBER | RANK | DENSE_RANK |
|---|------------|------|------------|
| a | 1 | 1 | 1 |
| a | 2 | 1 | 1 |
| a | 3 | 1 | 1 |
| b | 4 | 4 | 2 |
| c | 5 | 5 | 3 |
| c | 6 | 5 | 3 |
| d | 7 | 7 | 4 |
| e | 8 | 8 | 5 |
Note that unfortunately, the WINDOW
clause is not supported in all databases.
1
2
3
4
5
6
7
| SELECT v, ROW_NUMBER() OVER( ORDER BY v), RANK() OVER( ORDER BY v), DENSE_RANK() OVER( ORDER BY v) FROM t ORDER BY 1, 2 |
WINDOW
clause, to reuse window specifications):
1
2
3
4
5
6
7
| SELECT v, ROW_NUMBER() OVER(w), RANK() OVER(w), DENSE_RANK() OVER(w) FROM t WINDOW w AS ( ORDER BY v) |
WINDOW
clause is not supported in all databases.
Yorumlar
Yorum Gönder