Lonely Lion

Chris McAvoy likes kites

SQL Alphabetical Sorting Without "The"

with 6 comments

Hey Lazyweb, is there some clever pattern that addresses passing an ‘order by’ to a database and getting an alphabetically sorted list where “the” doesn’t factor into the sort?

Look at this gigantic list of bands. It’s annoying to have to remember that “The” screws everything up.

Do I create a field “name_without_the” and select * from bands order by name_without_the? I’m assuming this is a common alpha-issue that’s been solved millions of times, but I can’t find a best-practice suggestion via two minutes of Googling.

Please suggest clever ways to do this without sorting in-memory or some junk like that. Let’s keep it in the database folks.

Written by Chris

April 30th, 2007 at 2:33 pm

Posted in cranky,Python,Ruby

  • http://www.pobox.com/~pfein/ Peter Fein

    Oh god, let’s see if I can remember how to do such a thing…

    SELECT name, (IF name ILIKE ‘the *’ THEN name[4:] ELSE name) as sort_name ORDER BY sort_name

    the name[4:] is obviously python & there’s probably a faster way than ILIKE & you may need to stick the IF clause in the order by, but that’s the gist.

  • Rick Morrison

    Most databases have a string function that performs similarly to Python’s string.replace()

    Just order by an expression using this function to remove the “The”.

  • http://IconOClock.com Peter Harkins

    SELECT name FROM bands
    ORDER BY CASE
    WHEN LOWER(LEFT(name, 2)) =”a ” THEN SUBSTRING(name, 3)
    WHEN LOWER(LEFT(name, 3)) =”an ” THEN SUBSTRING(name, 4)
    WHEN LOWER(LEFT(name, 4)) =”the ” THEN SUBSTRING(name, 5)
    ELSE name
    END;

    If this isn’t fast enough (and it likely won’t be for a gigantic list because it means scanning every row instead of using an index), definitely create a separate column. I might not create ‘name_without_the’, I might create ‘name_article’ and let my app split off the leading article. It’s generally bad to have interdependent columns, but I think I like it a little more than partial data duplication. I’d have to ponder it a bit.

  • http://lonelylion.com Chris McAvoy

    As much as I hate to do it, I think a non-the column is the solution that makes the most sense to me. It does mean data duplication, but in a read-heavy write-light database I think that it isn’t as big a deal. The non-stated part of this whole discussion is that the app is running on Django, so the pain of dropping to pure SQL rather than putting in a “strip the / a / an” hook pre-save seems like it might not be worth it. That said, thanks for the SQL lesson, I’ve never tried to put conditionals in queries, other than basic WHERE stuff. Thanks.

    Chris

  • Pingback: Alphabetical Sorting SQL Without “The” - Push cx

  • http://phaedrusdeinus.org/ johnnnnnnn

    What database are you using? If you’re using PostgreSQL, you can do the extra-column solution without actually creating an extra column: create a functional index. Then include the function in your select, and it hits the index instead.

    I’m not sure how that would work through Django, though i do know if you have to drop to SQL, you can still create a QuerySet object from it.

Switch to our mobile site