Return a limited but random selection of rows per category

18 May 2011

I manage my MP3 collection with my own hand-written tool because I already have a big database with information about the bands and songs.

One of the features of the playlist manager is to specify custom SQL code to select any of the 30,000+ songs in the database for the playlist.

This allows me to have playlists like:

You get the idea.

For my next big trip I was thinking of which songs to take along. I only have about 40 Gig space for music on my player, which is enough for roughly 10,000 files.

One idea was to take songs from my all-time favourite bands, but the filter already returns over 10,000 songs and I wanted to add some other tracks as well. So I said, lets take just 50 songs from each of my 250 favourite bands. Which songs? I don't care, they can be random, but I don't want a song twice, regardless of the fact that I have duplicate MP3s because they have been released on multiple albums or compilations. The SQL to do this was not super simple, so that's why I documented here.

To keep things simpler, I first created two views, the first one filters out duplicate song titles:

CREATE VIEW UniqueSongs 
AS 
SELECT MIN(FileID) AS FileId, Title, BandKey  
FROM SoundFiles 
GROUP BY Title, BandKey

the second one returns my favourite bands:

CREATE VIEW FavouriteBands
AS 
SELECT BandKey FROM Bands WHERE Rank > 8

Now to the main query, which uses a Common Table Expression:

WITH CTE AS
(
SELECT Row_Number() OVER (Partition BY b.BandKey ORDER BY ABS(CAST(CAST(NEWID() AS VARBINARY) AS INT))) As RowNo, 
u.FileId FROM UniqueSongs u INNER JOIN FavouriteBands b ON b.BandKey = u.BandKey
)
SELECT FileName, RootFolder + Location as Location, BandName, Title, Duration 
FROM SoundFiles 
WHERE FileId IN 
(
SELECT FileId FROM CTE WHERE RowNo <= 50
)

The first SELECT gets all the MP3s for my favourite bands,

ABS(CAST(CAST(NEWID() AS VARBINARY) AS INT)))

just returns a random number which we order by, so we get a different set of up to 50 songs each time we run the query.
The really cool part here is:

Row_Number() OVER (Partition BY b.BandKey ORDER BY

We partition our data by the BandKey in a random order and each row gets an new row number one higher than the previous one. For each new band, the row_number is reset to 1.

Here's how that looks for 3 songs per band:

The second SELECT gets the actual data required to build a playlist, the interesting part is in the sub-select in the WHERE clause:

SELECT FileId FROM CTE WHERE RowNo <= 50

Here we use our common table expression from above to get the first 50 fileIDs per band. Remember the RowNo is a continues increasing number, so filtering for anything less 51, gets us the first 50. Cool!

Pages in this section

Categories

ASP.Net | Community | Development | IIS | IT Pro | Security | SQL (Server) | Tools | Web | Work on the road | Windows