Apr 30 2007

SQL Alphabetical Sorting Without “The”

Published by Chris McAvoy at 2:33 pm under Python, Ruby, cranky

Hey Lazyweb, is there some clever pattern that addresses passing an ‘order by’ to a database and getting an alphabetically sorted list where “the” doesn’t factor into the sort?

Look at this gigantic list of bands. It’s annoying to have to remember that “The” screws everything up.

Do I create a field “name_without_the” and select * from bands order by name_without_the? I’m assuming this is a common alpha-issue that’s been solved millions of times, but I can’t find a best-practice suggestion via two minutes of Googling.

Please suggest clever ways to do this without sorting in-memory or some junk like that. Let’s keep it in the database folks.


Share and Enjoy:
These icons link to social bookmarking sites where readers can share and discover new web pages.

  • del.icio.us
  • digg
  • Reddit
  • StumbleUpon

6 Responses to “SQL Alphabetical Sorting Without “The””

  1. Peter Feinon 30 Apr 2007 at 6:19 pm

    Oh god, let’s see if I can remember how to do such a thing…

    SELECT name, (IF name ILIKE ‘the *’ THEN name[4:] ELSE name) as sort_name ORDER BY sort_name

    the name[4:] is obviously python & there’s probably a faster way than ILIKE & you may need to stick the IF clause in the order by, but that’s the gist.

  2. Rick Morrisonon 30 Apr 2007 at 6:42 pm

    Most databases have a string function that performs similarly to Python’s string.replace()

    Just order by an expression using this function to remove the “The”.

  3. Peter Harkinson 30 Apr 2007 at 7:27 pm

    SELECT name FROM bands
    ORDER BY CASE
    WHEN LOWER(LEFT(name, 2)) =”a ” THEN SUBSTRING(name, 3)
    WHEN LOWER(LEFT(name, 3)) =”an ” THEN SUBSTRING(name, 4)
    WHEN LOWER(LEFT(name, 4)) =”the ” THEN SUBSTRING(name, 5)
    ELSE name
    END;

    If this isn’t fast enough (and it likely won’t be for a gigantic list because it means scanning every row instead of using an index), definitely create a separate column. I might not create ‘name_without_the’, I might create ‘name_article’ and let my app split off the leading article. It’s generally bad to have interdependent columns, but I think I like it a little more than partial data duplication. I’d have to ponder it a bit.

  4. Chris McAvoyon 01 May 2007 at 8:41 am

    As much as I hate to do it, I think a non-the column is the solution that makes the most sense to me. It does mean data duplication, but in a read-heavy write-light database I think that it isn’t as big a deal. The non-stated part of this whole discussion is that the app is running on Django, so the pain of dropping to pure SQL rather than putting in a “strip the / a / an” hook pre-save seems like it might not be worth it. That said, thanks for the SQL lesson, I’ve never tried to put conditionals in queries, other than basic WHERE stuff. Thanks.

    Chris

  5. […] Chris McAvoy asked how to sort alphabetically so that entries starting with “A”, “An”, or “The” end up in the proper place instead of jumbled into the As and Ts. Do I create a field “name_without_the” and select * from bands order by name_without_the? I’m assuming this is a common alpha-issue that’s been solved millions of times, but I can’t find a best-practice suggestion via two minutes of Googling. […]

  6. johnnnnnnnon 01 May 2007 at 10:09 pm

    What database are you using? If you’re using PostgreSQL, you can do the extra-column solution without actually creating an extra column: create a functional index. Then include the function in your select, and it hits the index instead.

    I’m not sure how that would work through Django, though i do know if you have to drop to SQL, you can still create a QuerySet object from it.

Trackback URI | Comments RSS

Leave a Reply