Magic: The Gathering Meets Data Science

Magic: The Gathering has been one of my hobbies for years. Its large card base and long history make it a perfect fit for Data Analysis and Machine Learning.

In my Unsupervised Learning tutorial, in case you missed it, I applied K-Means Clustering (an Unsupervised Learning technique) to a Magic: The Gathering Dataset that I scraped myself from mtgtop8 using Python’s Scrapy library.

That article explains the technical side, but doesn’t get into the results, because I didn’t think my readers would be into them.

Since many people have stood up to voice their disagreement, I will now show you some of the things the Algorithm learned.

This will be neither the first nor the last time I say it: unsupervised learning can be spooky with all it learns, even when you know how it works.

The Data

The Dataset I used for this project contained only professional decks from last year, all from the Modern format. I did not include sideboards in this analysis. All of the decks I used for training and visualizations are available, alongside the code, in this GitHub project.

If you know of any good Dataset for casual decks, please let me know in the comments. Otherwise, I may scrape one in the future.

For this analysis, I’m looking at 777 different decks, containing a total of 642 unique cards (counting lands).

The Results

First of all, I strongly encourage you to pull the repository and try the Jupyter Notebook yourself, as there may be some particular insights you find interesting that I may be missing.

That said, if you want to see what the Data say about a particular card and you don’t see it here, ask me in the comments! The only condition is that the card is part of the competitive meta, which, as we’ve seen, is fairly small.

Now, the first question we’ll ask ourselves is…

What does each Magic: The Gathering cluster look like?

Remember, we clustered decks, not cards, so we would expect each cluster to roughly represent an archetype, particularly one seeing play in the Modern meta.
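
To make the "decks, not cards" point concrete, here is a minimal sketch of how a deck could be turned into the kind of numeric vector K-Means works on. The dict-of-counts deck format, the toy decks and the helper names are assumptions made for illustration, not necessarily what the notebook does.

```python
def build_vocabulary(decks):
    """Return a sorted list of every card name seen in any deck."""
    return sorted({card for deck in decks for card in deck})


def deck_to_vector(deck, vocabulary):
    """Encode one deck as a list of copy counts, one slot per known card."""
    return [deck.get(card, 0) for card in vocabulary]


# Toy decks (hypothetical and heavily truncated): card name -> copies played.
decks = [
    {"Llanowar Elves": 4, "Forest": 10, "Elvish Mystic": 4},
    {"The Rack": 4, "Swamp": 12, "Thoughtseize": 4},
    {"Llanowar Elves": 4, "Forest": 9, "Collected Company": 4},
]

vocabulary = build_vocabulary(decks)
vectors = [deck_to_vector(deck, vocabulary) for deck in decks]

for deck_vector in vectors:
    print(deck_vector)
```

Each row is one deck and each column one card, so two decks that share many cards end up close together, which is what lets a cluster line up with an archetype.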

First of all: here are the counts for each cluster. That is, how many decks fell into each.

Quantity of decks that fell on each cluster after applying K-Means Clustering.
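
Counts like these take only a few lines to produce once every deck has a cluster label. The sketch below assumes a plain list of labels, one per deck; the label values and the size threshold are made up for illustration.

```python
from collections import Counter

# Hypothetical cluster labels, one per deck, as a clustering step would assign.
labels = [0, 3, 3, 1, 0, 3, 4, 1, 3]

# Number of decks that fell into each cluster.
cluster_sizes = Counter(labels)
for cluster, size in sorted(cluster_sizes.items()):
    print(f"cluster {cluster}: {size} decks")

# Flag unusually small clusters; the threshold here is arbitrary.
SMALL_CLUSTER_THRESHOLD = 2
small_clusters = sorted(
    cluster for cluster, size in cluster_sizes.items()
    if size < SMALL_CLUSTER_THRESHOLD
)
print("small clusters:", small_clusters)
```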

We can see right off the bat there are two particularly small clusters, with fewer than 30 decks each. Let’s take a closer look.

Cards on each cluster

For cluster number 4, I took, for each deck in it, the set of its 40 most-played cards, and then intersected those sets to see what all the decks had in common. I repeated that procedure for cluster number 6.
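
One way to read that procedure, as a rough sketch: rank each deck's cards by number of copies, keep the top 40, and intersect the resulting sets across the cluster. The deck format (a dict of card name to copies), the toy data and the function names are assumptions here, and the actual notebook may break ties or count cards differently.

```python
from collections import Counter


def top_cards(deck, n=40):
    """Names of the n cards with the most copies in one deck.

    If the deck has fewer than n distinct cards, all of them are kept.
    """
    return {card for card, _ in Counter(deck).most_common(n)}


def common_core(decks, labels, cluster, n=40):
    """Intersect the top-n card sets of every deck assigned to `cluster`."""
    members = [deck for deck, label in zip(decks, labels) if label == cluster]
    if not members:
        return set()
    return set.intersection(*(top_cards(deck, n) for deck in members))


# Toy data (hypothetical): two elf decks in cluster 0, one discard deck in 1.
decks = [
    {"Llanowar Elves": 4, "Forest": 10, "Elvish Mystic": 4},
    {"Llanowar Elves": 4, "Forest": 9, "Collected Company": 4},
    {"The Rack": 4, "Swamp": 12, "Thoughtseize": 4},
]
labels = [0, 0, 1]

core = common_core(decks, labels, cluster=0)
print(sorted(core))
```

Because this is a strict intersection, a single unusual deck in a cluster is enough to shrink the result.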

Cluster number 4:
{'Devoted Druid', 'Horizon Canopy', 'Ezuri, Renegade Leader', 'Forest', 'Elvish Archdruid', 'Pendelhaven', "Dwynen's Elite", 'Llanowar Elves', 'Collected Company', 'Windswept Heath', 'Temple Garden', 'Westvale Abbey', 'Razorverge Thicket', 'Heritage Druid', 'Elvish Mystic', 'Nettle Sentinel', 'Eternal Witness', 'Cavern of Souls', 'Chord of Calling', 'Vizier of Remedies', 'Selfless Spirit'}
Cluster number 6:
{'Funeral Charm', 'Liliana of the Veil', "Raven's Crime", 'Fatal Push', 'Thoughtseize', 'Wrench Mind', 'Bloodstained Mire', 'Smallpox', 'Inquisition of Kozilek', 'Mutavault', 'Urborg, Tomb of Yawgmoth', 'Infernal Tutor', 'Swamp', 'The Rack', "Bontu's Last Reckoning", 'Shrieking Affliction'}

It appears one of them is playing a green deck, using elves and green lands, while the other one is built around discarding, with cards like Liliana of the Veil and Inquisition of Kozilek.

Here’s the result of the same procedure for all of the clusters. See if you can tell which archetype each one belongs to. This also tells us about the distribution of the meta back when I got the data.

The same analysis on a more recent Dataset may even be useful in and of itself, if you’re into competitive tournaments.

Particular Cards

Three cards stood out to me in those lists: “Mutavault”, “Inquisition of Kozilek” and “Llanowar Elves”.

I wonder if they’re more common in other clusters? I didn’t really know Mutavault was so common in competitive play, and I think Llanowar Elves appearing on a deck tells us some stuff about it.

Well, Llanowar Elves is a one-trick pony. Clearly, one of the things that characterizes cluster number 4 is its presence.
With 35 of the 777 decks using it, Mutavault appears in 5 out of 8 clusters. Not bad, though that spread is not so surprising for such a versatile card.
Inquisition of Kozilek appears in half of the clusters, but it’s three times as likely to appear in the first one.
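
For anyone who wants the numbers behind graphs like these, here is a small sketch that computes, for one card, the fraction of decks in each cluster that run at least one copy. The deck format, toy data and function name are assumptions for illustration; the original graphs may show raw counts instead of fractions.

```python
from collections import defaultdict


def card_share_by_cluster(decks, labels, card):
    """Fraction of decks in each cluster that run at least one copy of `card`."""
    totals = defaultdict(int)  # decks per cluster
    hits = defaultdict(int)    # decks per cluster that contain the card
    for deck, label in zip(decks, labels):
        totals[label] += 1
        if deck.get(card, 0) > 0:
            hits[label] += 1
    return {label: hits[label] / totals[label] for label in sorted(totals)}


# Toy data (hypothetical): four truncated decks split into two clusters.
decks = [
    {"Mutavault": 2, "Swamp": 12},
    {"Forest": 10, "Llanowar Elves": 4},
    {"Mutavault": 4, "Island": 8},
    {"Mutavault": 1, "Plains": 9},
]
labels = [0, 0, 1, 1]

mutavault_share = card_share_by_cluster(decks, labels, "Mutavault")
elves_share = card_share_by_cluster(decks, labels, "Llanowar Elves")
print(mutavault_share)
print(elves_share)
```

Using fractions instead of raw counts keeps a large cluster from looking like a card's home just because it has more decks.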

As always, you can generate these graphs for any of the cards, or ask me if you’re interested in a particular one.

Versatile Cards

Lastly, I’ll define a new category of card: a card’s versatility will mean how many different clusters contain at least one deck that uses it.
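
As a sketch of that definition: collect, for every card, the set of clusters in which at least one deck plays it, then take the size of that set. The deck format, the toy data and the `exclude` parameter are assumptions for illustration, as is the choice to skip only the five basic land names.

```python
def versatility(decks, labels, exclude=frozenset()):
    """Map each card to the number of distinct clusters where it appears."""
    clusters_by_card = {}
    for deck, label in zip(decks, labels):
        for card, copies in deck.items():
            if copies > 0 and card not in exclude:
                clusters_by_card.setdefault(card, set()).add(label)
    return {card: len(clusters) for card, clusters in clusters_by_card.items()}


# The five basic land names, used here as an illustrative filter.
BASIC_LANDS = frozenset({"Plains", "Island", "Swamp", "Mountain", "Forest"})

# Toy data (hypothetical): four truncated decks across three clusters.
decks = [
    {"Dismember": 2, "Forest": 10},
    {"Dismember": 1, "Swamp": 12, "Thoughtseize": 4},
    {"Thoughtseize": 4, "Swamp": 10},
    {"Dismember": 3, "Island": 8},
]
labels = [0, 1, 1, 2]

scores = versatility(decks, labels, exclude=BASIC_LANDS)
# Most versatile first; ties are broken alphabetically.
ranking = sorted(scores, key=lambda card: (-scores[card], card))
print(ranking)
```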

Admittedly, that definition could be refined a bit more. For instance, by counting appearances instead of just whether the card is in a deck or not.

However, the results this way are coherent enough, so I don’t think it needs any more tweaking. Here’s a list with the top 10 most versatile cards, after filtering Basic Lands out.

  1. Dismember
  2. Ghost Quarter
  3. Field of Ruin
  4. Cavern of Souls
  5. Thoughtseize
  6. Mutavault
  7. Sacred Foundry
  8. Stomping Ground
  9. Engineered Explosives
  10. Botanical Sanctum

They’re pretty much the ones you’d expect. However, I’m surprised Lightning Bolt didn’t make the cut. I wasn’t sure whether non-Basic Lands should count, but I left them in in the end.

The fact that I have no idea which card “Engineered Explosives” is proves I’m out of touch with the state of the meta, and maybe I should be playing more, but that’s beside the point.

Conclusion

As we expected, Magic: The Gathering can be a fun source of Data, and I think we have all learned a bit by seeing all this.

Personally, I’m still surprised a bit of glorified linear algebra could learn all about the meta of competitive play.

I’d be even more surprised if it learned about archetypes in casual play, where decks are more diverse, though my intuition tells me with enough clusters, even that should be properly characterized.

What do you think? Would you have liked to see any other bits of information? Were you expecting the algorithm to perform well?

And finally, what other domains do you think are fit for proper Statistical Analysis, particularly using other Unsupervised Machine Learning Techniques?

Please let me know any or all of that in the comments!

3 thoughts on “Magic: The Gathering Meets Data Science”

  1. Hey, thanks for the follow-up!

    My knowledge of MtG is rusty, but I’d really like to see how the clusters break down by card colour, and maybe by proportion of different types of cards (land, creatures, spells).

    1. That would be cool! The Dataset didn’t really have metadata about the cards themselves, but I can work something out. I’ll see what I can do!
