row_number, rank, dense rank and ntile

ROW_NUMBER

A t-sql function, row_number() will put a counter on the output of a query. You supply the table via a select statement, include as a column row_number over (order by id) as Row to the select, and you get a column named ROW and a sequence of numbers for each row. Why do you need the Order by (id) part of the function? Do you want (i don’t care, you can’t have it) the order of output different every time> TH efunction needs to have a definite row to number and you just pick one. ID is safe, how about [Last Name] — and you have ties (you have several last names the same) If you did the select, you WOULD get output – and if you are using ROW_NUMBER over (last name), you get an increasing number for each row.

What if you don’t want to see all the rows, just 50 through 60? This is why you are using row_number to get the row count. You can insert a where clause and the predicate of Where row between 50 and 60

select row_number over (order by last_name) as row
, first_name
, last_name
, dob
, gender from table where row is between 50 and 60

So you have thousands in the list, and you just want old women, like with a dob of ‘1947’? You can get them with the where clause, and just row_number them, but say you are a little more lenient with your age category and don’t want to mess with where dob = ‘1947’ or dob = ‘1948’ or …

You can still select females from the where, and row-count has a partition in addition to order by parameter.

select row_number over (partition by dob order by last_name) as row, first_name, last_name, dob, gender from table order by dob desc

You will get back females with each dob age group having a row_number in it, starting over from 1 as the dob changes. This is an awfully lot like a rank except it is based on last_name within the partition of the dob.

RANK

This works very similarly to row_number with the same over(partition order) arguments, except that the rank of the last name Anderson will be the same for all five andersons (all rank 1) and THEN Baker starts at 6 because it is sixth after the top 5.

DENSE RANK

If you think that unfair and that Baker should be 2, then use DENSE RANK. There are still 5 #1, but then Baker is #2 and Carlson is #3.

NTILE

breaks the ranking up into a number of chunks. If you evoke NTILE(5), you have five groups of 4 out of a row_count of 20. If you had 21, you would have one group of 5 and 4 groups of 4 – rows 1-5 in group 1. 22 would create 5 in 1, 5 in 2 and 4 in groups 3 through 5.

SUMMARY:  Row_number orders the table and assigns a line number to each row – this counts the output.  If you include a partition by … argument, you get each partition counted and the count restarts in a new partition.  The sums of the MAX row_number over all partitions should be the count of the table.

If you want a rank, use rank() over (partition, order) .  Things in the same partition are ranked the same.  The next partition starts with the row_number as the rank in RANK, or the next number as DENSE RANK.  Dense Rank has ranks that are consequent and monotonic.  Rank can have non-constitutive number – where ever two or more are ranked together, the next rank will be missing numbers.

Oh, this should be perfectly obvious, but since i just did it, i will mention that if you partition by a field that does not repeat, every ROW NUMBER will be one.  I mean “1”.

Advertisements