SQL For Fun: Parsing the iTunes XML
What happens if one part of your personality is data-parsing nerd and another part is music lover nerd?
You end up trying to answer questions like, “Which songs did I have rated with 4 stars last year that are lower than that now?”
Or, “How many songs that were added to my library before this year started have been played this year?”
iTunes makes it easy to create smart playlists that can answer a lot of these questions. But iTunes keeps your current play count and your current rating. You can’t get information about the state of your music library in the past and then run queries to see how things have changed.
It does, however, let you export a snapshot of your library data to an XML format, and if we save that XML with time stamps in a SQL database, we can answer these questions ourselves.
To get an XML snapshot of your iTunes music library, just choose File –> Library –> Export Library… from the iTunes menu. It will prompt you for the path and name of the file to save.
The XML looks something like this:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 |
Major Version1 Minor Version1 Date2019-03-18T21:30:26Z Application Version12.9.2.5 Features5 Show Content Ratings Music Folderfile:///Users/ben/Sync/Itunes%20Media/ Library Persistent IDADF4A4B1DD12E724 Tracks 7434 Track ID7434 NameSatisfy My Soul ArtistBob Marley & The Wailers AlbumLegend GenreReggae KindMPEG audio file Size4382096 Total Time271568 Track Number12 Year1984 Date Modified2009-09-16T01:27:43Z Date Added2007-05-11T09:13:43Z Bit Rate128 Sample Rate44100 Play Count9 Play Date3513474987 Play Date UTC2015-05-03T10:16:27Z Rating60 Album Rating1 Normalization758 Artwork Count1 Persistent IDB3CA51792568D814 Track TypeFile Locationfile:///Users/ben/Sync/Itunes%20Media/Music/Bob%20Marley%20&%20The%20Wailers/Legend/12%20Satisfy%20My%20Soul.mp3 File Folder Count5 Library Folder Count1 7436 Track ID7436 NameThe Fire Inside ArtistBob Seger Album ArtistBob Seger & the Silver Bullet Band ComposerBob Seger AlbumGreatest Hits (Bob Seger) GenreClassic Rock KindMPEG audio file Size5742282 Total Time355787 Track Number11 Year1994 Date Modified2007-05-11T09:28:01Z Date Added2007-05-11T09:13:43Z Bit Rate128 Sample Rate44100 Comments 00000424 000004D5 000019C9 00002268 00027145 00027145 000076D9 00007297 0004E217 00002755 Play Count1 Play Date3290083450 Play Date UTC2008-04-03T21:04:10Z Rating40 Album Rating1 Normalization1237 Artwork Count1 Sort NameFire Inside Persistent IDB3CA51792568D815 Disabled Track TypeFile Locationfile:///Users/ben/Sync/Itunes%20Media/Music/Bob%20Seger%20&%20the%20Silver%20Bullet%20Band/Greatest%20Hits%20(Bob%20Seger)/11%20The%20Fire%20Inside.mp3 File Folder Count5 Library Folder Count1 7438 Track ID7438 NameLove for Sale ArtistBon Jovi Album ArtistBon Jovi ComposerJon Bon Jovi/Richie Sambora AlbumNew Jersey GenreHard Rock KindMPEG audio file Size3820442 Total Time236721 Track Number12 Year1988 Date Modified2009-03-23T01:03:29Z Date Added2007-05-11T09:13:43Z Bit Rate128 Sample Rate44100 Comments 000000D0 00000158 00000387 0000053F 0001D4EE 0001FBE7 000034F7 000031D5 0001119E 0001119E Play Count29 Play Date3623408307 Play Date UTC2018-10-26T19:18:27Z Skip Count1 Skip Date2016-07-02T23:29:42Z Rating80 Album Rating1 Normalization344 Artwork Count1 Persistent IDB3CA51792568D817 Track TypeFile Locationfile:///Users/ben/Sync/Itunes%20Media/Music/Bon%20Jovi/New%20Jersey/12%20Love%20for%20Sale.mp3 File Folder Count5 Library Folder Count1 |
Let’s make each other a deal — I’ll finish writing up this process, but you can’t make fun of my music based on the handful of songs you see here. (But it’s completely fair game to make fun of me based on what I’m actually listening to in real time, which you can see over on my personal web site. ) Agreed? Good. Let’s continue.
This XML is an Apple “property list” document, which is a super-generic format used to pass information around in their software. To make it super-generic, the data is just stored as key-value pairs. If I really wanted to drive clicks to this post, I would call it, “Using SQL Server to parse property list XML document,” or something like that. There must be more business-critical use cases for this stuff.
Before we try to parse our document, let’s get it into SQL.
1 2 |
SELECT CONVERT(XML, BulkColumn, 2) FROM OPENROWSET(BULK '\\VBOXSVR\itunes_xml\Library.xml', SINGLE_BLOB) rowsetresults |
Here’s a query that will open our file and then display it as an XML document in a result set. You will have to change the path to point it to your exported file, of course, but other than that, this is as easy as advertised.
I’m going a step farther with this. I’m going to create a table named LibrarySnapshots and then load each XML into a different row in that table, along with a date stamp. First, let’s create the table.
1 2 3 4 5 6 7 8 9 10 11 12 13 |
CREATE TABLE [dbo].[LibrarySnapshots]( [LibrarySnapshotID] [int] IDENTITY(1,1) NOT NULL, [LibraryXML] [xml] NOT NULL, [LibraryDate] [date] NOT NULL, CONSTRAINT [PK_LibrarySnapshots] PRIMARY KEY CLUSTERED ( [LibrarySnapshotID] ASC )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY] ) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY] GO ALTER TABLE [dbo].[LibrarySnapshots] ADD CONSTRAINT [DF_LibrarySnapshots_LibraryDate] DEFAULT (getdate()) FOR [LibraryDate] GO |
Now, we can take our simple import statement and make it insert to our table.
1 2 3 4 5 6 7 8 9 10 11 12 13 |
;WITH newXML(XMLDoc) AS ( SELECT CONVERT(XML, BulkColumn, 2) FROM OPENROWSET(BULK '\\VBOXSVR\itunes_xml\Library.xml', SINGLE_BLOB) rowsetresults ), oldXML AS ( SELECT TOP 1 LibraryXML, LibraryDate FROM dbo.LibrarySnapshots ORDER BY LibraryDate DESC ) INSERT dbo.LibrarySnapshots (LibraryXML) SELECT XMLDoc FROM newXML CROSS JOIN oldXML WHERE CAST(newXML.XMLDoc AS nvarchar(max)) <> CAST(oldXML.LibraryXML AS nvarchar(max)) AND oldXML.LibraryDate <> CAST(GETDATE() AS date) |
You recognize the first Common Table Expression as our import statement. The second CTE looks at the most recent XML document that’s already in the table. Then, our WHERE clause makes sure that these two XML docs aren’t equal, so that I don’t use up space importing the same XML over and over.
Now, let’s look at that hard part: extracting the information that we want to know from the XML. You may have noticed this, but the format of the XML document makes it hard to parse. We can’t just find an element named “name” and check the value for the name of the song. There’s an xpath axes named “following-sibling” which in normal XML parsing would let us write an xpath that says, “find the element named “key” with the value of “Name”, then give me the value of the element after that.” But SQL Server (at least SQL Server 2016, where I’m doing this work) doesn’t support “following-sibling.” When I discovered that. I almost gave up on this project.
Let me dump my final SELECT statement here for you, and then we can discuss what it’s doing.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 |
SELECT ls.LibrarySnapshotID, ls.LibraryDate, SongXML.n.VALUE('let $a := . return (./integer[. >> ($a/key[.="Track ID"])[1]])[1]', 'int') as id, SongXML.n.VALUE('let $a := . return (./string[. >> $a/key[.="Persistent ID"][1]])[1]', 'char(16)') AS PersistentID, SongXML.n.VALUE('let $a := . return (./string[. >> $a/key[.="Artist"][1]])[1]', 'nvarchar(1000)') AS Artist, SongXML.n.VALUE('let $a := . return (./string[. >> ($a/key[.="Album"])[1]])[1]', 'nvarchar(1000)') as Album, SongXML.n.VALUE('let $a := . return (./string[. >> ($a/key[.="Name"])[1]])[1]', 'nvarchar(1000)') as TrackName, SongXML.n.VALUE('let $a := . return (./integer[. >> ($a/key[.="Rating"])[1]])[1]', 'int') as Rating, SongXML.n.VALUE('let $a := . return (./integer[. >> ($a/key[.="Play Count"])[1]])[1]', 'int') as Playcount, SongXML.n.VALUE('let $a := . return (./string[. >> ($a/key[.="Genre"])[1]])[1]', 'nvarchar(1000)') as Genre, SongXML.n.VALUE('let $a := . return (./date[. >> ($a/key[.="Play Date UTC"])[1]])[1]', 'datetime2') as LastPlayed, SongXML.n.VALUE('let $a := . return (./date[. >> ($a/key[.="Date Added"])[1]])[1]', 'datetime2') as DateAdded, SongXML.n.VALUE('let $a := . return (./integer[. >> ($a/key[.="Bit Rate"])[1]])[1]', 'int') as BitRate, SongXML.n.VALUE('let $a := . return (./integer[. >> ($a/key[.="Year"])[1]])[1]', 'int') as [Year], CASE WHEN SongXML.n.VALUE('let $a := . return (./true[. >> ($a/key[.="Loved"])[1]])[1]', 'char(1)') IS NOT NULL THEN '1' ELSE '0' END AS Loved, CASE WHEN SongXML.n.VALUE('let $a := . return (./true[. >> ($a/key[.="Disabled"])[1]])[1]', 'char(1)') IS NOT NULL THEN '0' ELSE '1' END AS Checked, SongXML.n.VALUE('let $a := . return (./integer[. >> ($a/key[.="Skip Count"])[1]])[1]', 'int') as SkipCount, SongXML.n.VALUE('let $a := . return (./date[. >> ($a/key[.="Skip Date"])[1]])[1]', 'datetime2') as LastSkipped FROM dbo.LibrarySnapshots ls CROSS APPLY ls.LibraryXML.nodes('plist/dict/dict/dict') AS SongXML(n) |
Let’s start with the FROM clause. dbo.LibrarySnapshots is our table with the XML documents. We use CROSS APPLY to get a representation of the XML nodes. Each node is the whole collection of elements that represents one song. (If you look, each song is in the XML path — or xpath — of plist/dict/dict/dict.) Each one is its own small XML document, and we can run XML functions on those.
Running another XML function is exactly what we do when we select our columns out of each element. We apply the VALUE function to each node to get the value out that we want.
The xpath expression is what gets messy. I tried several variations, and this is the only way I could find to get it to work. They all have more or less the same structure, so let’s dig in to one.
1 |
let $a := . |
Here, we assign the variable named $a, the value that means “right here in this node”.
1 |
($a/key[.="Name"])[1] |
This means, “starting at $a, find an element named ‘key’ whose value is ‘Name'”. In theory, there could be more than one of those (there won’t be in this case, but the XML parser can’t know that), so [1] just means “use the first one of those that you see”.
1 |
./string[. >> ( ... )[1] |
The stuff in the parentheses is what we just talked about. This additional syntax means “find the element named ‘string’ that follows that bit that you just found one step before”. Again, there could be more than one (there won’t be), so we add [1] to tell it to use the first one found.
That’s our whole xpath. The second parameter is the data type that you want it to have once it’s extracted. For the values that are binary, I look for the element named “true” and wrap it in a CASE statement so that the value in my column is either one or zero.
The result of my select statement is that I get my entire library, divided into columns that I can use my super-data powers to query:
I know this screen shot is super-small pasted here in the blog, but we have a row for each song, along with columns for all of our song’s attributes that we parsed. You can click on the image to see it a little bit bigger.
Now, it’s easy to answer the questions I posed at the beginning of the article. Note that I’ve saved the SELECT statement as a view for easy querying.
Question: How many songs that were added to my library before this year started have been played this year?
1 2 3 4 |
SELECT COUNT(*) FROM vwLibrary libEnd2018 JOIN vwLibrary libNow on libEnd2018.LibraryDate = '2018-12-27' AND libNow.LibraryDate = '2019-05-09' AND libEnd2018.PersistentID = libNow.PersistentID WHERE libEnd2018.Playcount < libNow.Playcount |
Answer: 843
Question:
Which songs did I have rated with 4 stars last year that are lower than that now?
1 2 3 4 |
SELECT libEnd2018.Artist, libEnd2018.TrackName, libEnd2018.Rating AS OldRating, libNow.Rating AS CurrentRating FROM vwLibrary libEnd2018 JOIN vwLibrary libNow on libEnd2018.LibraryDate = '2018-12-27' AND libNow.LibraryDate = '2019-05-09' AND libEnd2018.PersistentID = libNow.PersistentID WHERE libEnd2018.Rating >= 80 AND libNow.Rating < 80 -- For some reason, itunes stores these as # of stars * 20 |
Answer:
Still good songs, but for me, 4 stars is reserved for the ones where you have to crank it up when the song comes on.
If any of you implement any of this code or do something similar but in a different way, please let me know. This is what I do instead of seeing movies set in the Marvel Universe, I guess!