order by needed

If you run a query against a table, and run it again, the output will probably be displayed the same way. This order is not guarenteed (without an order by clause). but face it, its going to happen. Again and again. Unless something changes – Kenneth Fisher wrote a blog about this in SQLServerCentral.

He creates a table, put data in it, and then a primary key on the locationID and a NCI on state+zip code.

A query on (state = ‘TX’) will return data. It happens to be ordered by state+zip code. On a system that doesn’t change, we can reasonably anticipate that this will be the same order for subsequent queries.

However, lets now insert a lot of “state = TX” data. The first query used a seek to get the 10-100 or so TX entries. The seek was using the NCI of State and ZIP – the output was ordered in the order of the index. If we insert enough TX to the table, the Query Optimizer decides that a clustered index scan will be more efficient, and the order of the output table, enforced by the clustered index, is by locationID.

Same query, same schema. Assuming “SET AUTO_UPDATE_STATISTICS ON” is set, then enough inserts will force a stats update, otherwise the Execution Plan changed.  Running the two queries in Show Plan mode will, in this case, show the change of plan. Inserting enough data MIGHT change the output returned, and also might change the order returned.

In our case, we are building tables that will be either used as reports (display matters) or as source for other queries – applications. In the second case, the order should probably be most efficiently handled by the application – less data, multiple queries, possible changes in order anyway. The exception would be when we are debugging or testing. We want to see items that are sorted or ordered together. We should consider removing the order by after that if not needed. Also note that (“select top 1000 *”) is quick. (“select top 1000 * … order by …”) is effectively (select ALL * … order by and FINALLY display first 1000).  Which is about the same speed as (“select *”) – which is slow.

Finally, even if our output table is in the “right order’, the QA will not know that unless there is an index that will indicate the order to the QA.  Angels, and QA, have no memory of how that intermediate table was generated, so doing the extra order by work without the ‘hinting’ index, is not useful.