SQL For Fun: Parsing the iTunes XML

What happens if one part of your personality is data-parsing nerd and another part is music lover nerd?

You end up trying to answer questions like, “Which songs did I have rated with 4 stars last year that are lower than that now?”

Or, “How many songs that were added to my library before this year started have been played this year?”

iTunes makes it easy to create smart playlists that can answer a lot of these questions. But iTunes keeps your current play count and your current rating. You can’t get information about the state of your music library in the past and then run queries to see how things have changed.

It does, however, let you export a snapshot of your library data to an XML format, and if we save that XML with time stamps in a SQL database, we can answer these questions ourselves.

To get an XML snapshot of your iTunes music library, just choose File –> Library –> Export Library… from the iTunes menu. It will prompt you for the path and name of the file to save.

The XML looks something like this:

Let’s make each other a deal — I’ll finish writing up this process, but you can’t make fun of my music based on the handful of songs you see here. (But it’s completely fair game to make fun of me based on what I’m actually listening to in real time, which you can see over on my personal web site. ) Agreed? Good. Let’s continue.

This XML is an Apple “property list” document, which is a super-generic format used to pass information around in their software. To make it super-generic, the data is just stored as key-value pairs. If I really wanted to drive clicks to this post, I would call it, “Using SQL Server to parse property list XML document,” or something like that. There must be more business-critical use cases for this stuff.

Before we try to parse our document, let’s get it into SQL.

Here’s a query that will open our file and then display it as an XML document in a result set. You will have to change the path to point it to your exported file, of course, but other than that, this is as easy as advertised.

I’m going a step farther with this. I’m going to create a table named LibrarySnapshots and then load each XML into a different row in that table, along with a date stamp. First, let’s create the table.

Now, we can take our simple import statement and make it insert to our table.

You recognize the first Common Table Expression as our import statement. The second CTE looks at the most recent XML document that’s already in the table. Then, our WHERE clause makes sure that these two XML docs aren’t equal, so that I don’t use up space importing the same XML over and over.

Now, let’s look at that hard part: extracting the information that we want to know from the XML. You may have noticed this, but the format of the XML document makes it hard to parse. We can’t just find an element named “name” and check the value for the name of the song. There’s an xpath axes named “following-sibling” which in normal XML parsing would let us write an xpath that says, “find the element named “key” with the value of “Name”, then give me the value of the element after that.” But SQL Server (at least SQL Server 2016, where I’m doing this work) doesn’t support “following-sibling.” When I discovered that. I almost gave up on this project.

Let me dump my final SELECT statement here for you, and then we can discuss what it’s doing.

Let’s start with the FROM clause. dbo.LibrarySnapshots is our table with the XML documents. We use CROSS APPLY to get a representation of the XML nodes. Each node is the whole collection of elements that represents one song. (If you look, each song is in the XML path — or xpath — of plist/dict/dict/dict.) Each one is its own small XML document, and we can run XML functions on those.

Running another XML function is exactly what we do when we select our columns out of each element. We apply the VALUE function to each node to get the value out that we want.

The xpath expression is what gets messy. I tried several variations, and this is the only way I could find to get it to work. They all have more or less the same structure, so let’s dig in to one.

Here, we assign the variable named $a, the value that means “right here in this node”.

This means, “starting at $a, find an element named ‘key’ whose value is ‘Name'”. In theory, there could be more than one of those (there won’t be in this case, but the XML parser can’t know that), so [1] just means “use the first one of those that you see”.

The stuff in the parentheses is what we just talked about. This additional syntax means “find the element named ‘string’ that follows that bit that you just found one step before”. Again, there could be more than one (there won’t be), so we add [1] to tell it to use the first one found.

That’s our whole xpath. The second parameter is the data type that you want it to have once it’s extracted. For the values that are binary, I look for the element named “true” and wrap it in a CASE statement so that the value in my column is either one or zero.

The result of my select statement is that I get my entire library, divided into columns that I can use my super-data powers to query:

I know this screen shot is super-small pasted here in the blog, but we have a row for each song, along with columns for all of our song’s attributes that we parsed. You can click on the image to see it a little bit bigger.

Now, it’s easy to answer the questions I posed at the beginning of the article. Note that I’ve saved the SELECT statement as a view for easy querying.

Question: How many songs that were added to my library before this year started have been played this year?

Answer: 843


Which songs did I have rated with 4 stars last year that are lower than that now?


Still good songs, but for me, 4 stars is reserved for the ones where you have to crank it up when the song comes on.

If any of you implement any of this code or do something similar but in a different way, please let me know. This is what I do instead of seeing movies set in the Marvel Universe, I guess!

Posted in Apple, iTunes, MS SQL, Transact SQL | Leave a comment

Comments are closed.