AI Generated Pokemon Sprites with GPT-2

by Matthew Rayfield on November 11th 2020

Text Generation for Images

A few months back I started messing around with gpt-2-simple. Inspired by a few of its more unusual uses, like chess playing and MIDI generation, I wanted to find something funky I could do with it myself. Image generation quickly came to mind. I figured that if I converted a bunch of small images (in this case Pokemon) into text files, retrained GPT-2 on them, and then converted the output back into images, I would at least get something. To my surprise, the results were actually pretty good!

Text Format

After a few attempts that resulted in a sea of noisy pixels, I settled on the following format:

32d ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ! ` > ` > > > > > > > > ! > > ! ! > > > > > ! > > ! > > > > ! ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~
33d ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ > > ! > > > ! ! > > > ! > > ! ` ` ! > > > ! ! > ! > > > > > ! ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~
34d ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ! ` ! ` ! > > ! ! > > > ! > ! ! ! ` ` ! > > ! ! ! ! ! > > > > ! ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~
35d ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ! ` ! ` U > > > > > > > > > ` U ! ` ` ! > > ! > > ! ! > ! > > ! ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~
36d ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ! > ! ` U ! > > > > > > > ! ` U ! ` ` ! > > ! > > > > > ! > ! > ! ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~

You could probably already tell, but the above text is Bulbasaur's eyes. The format is simply the line number, the direction of the image (up or down), and finally each pixel in that row represented as a single character, separated by spaces. The pixel color to character transformation is quite simple and can be seen here.
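To make the row format concrete, here is a minimal Python sketch of encoding one row of pixels to text and decoding it back. It is not the project's actual conversion code: the `PALETTE` colors, the function names, and the use of "u" for the up direction are all assumptions made for illustration (only "d" appears in the sample above).

```python
# Hypothetical palette: the real color-to-character mapping is not reproduced here.
PALETTE = {
    "~": (255, 255, 255),
    "!": (0, 0, 0),
    ">": (64, 160, 96),
    "`": (32, 96, 64),
    "U": (200, 40, 40),
}
CHAR_FOR_COLOR = {color: char for char, color in PALETTE.items()}


def encode_row(line_number, direction, pixels):
    """Turn one row of RGB pixels into '<line><direction> <char> <char> ...'."""
    chars = [CHAR_FOR_COLOR[pixel] for pixel in pixels]
    return f"{line_number}{direction} " + " ".join(chars)


def decode_row(text):
    """Parse a text row back into (line_number, direction, pixels)."""
    header, *chars = text.split()
    line_number, direction = int(header[:-1]), header[-1]
    return line_number, direction, [PALETTE[char] for char in chars]


row = [(255, 255, 255), (0, 0, 0), (64, 160, 96), (0, 0, 0)]
line = encode_row(32, "d", row)
print(line)                                # 32d ~ ! > !
print(decode_row(line) == (32, "d", row))  # True
```

Using exactly one character per color keeps every row the same length in tokens, which is presumably part of why the output is easy to validate later.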

Results

Training GPT-2 with a bunch of text is super easy. The trickiest part was writing code to wrangle its output into something usable. My "generation" code does this. Strictly speaking it isn't generating anything itself; it normalizes GPT-2's output and feeds input back to GPT-2.

First it gets a few lines to start off with, keeps the good lines, and throws out the rest. Then it feeds in a few lines and has GPT-2 carry on from there. It repeats these steps until all 64 lines are filled, then converts the result to an image and saves it.
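The loop described above could be sketched roughly like this in Python. This is an illustration under stated assumptions, not the project's real code: `sample` is a hypothetical callable that takes a text prefix and returns only the model's continuation (in practice it would wrap the GPT-2 sampling call), rows are assumed to be numbered from 0, each row is assumed to hold 64 pixels, and a "good" line is assumed to mean one with the expected line number, direction, and pixel count.

```python
import re

WIDTH = 64   # assumed pixels per row (the sample rows above have 64)
HEIGHT = 64  # the article fills 64 lines
ROW_PATTERN = re.compile(r"^(\d+)([ud])((?: \S){%d})$" % WIDTH)


def parse_row(text, expected_line, direction):
    """Return the row's pixel characters if it is well formed, else None."""
    match = ROW_PATTERN.match(text.strip())
    if not match:
        return None
    if int(match.group(1)) != expected_line or match.group(2) != direction:
        return None
    return match.group(3).split()


def generate_sprite(sample, direction="d", context_lines=4, max_calls=1000):
    """Collect HEIGHT valid rows by repeatedly prompting `sample(prefix)`."""
    rows = []
    for _ in range(max_calls):
        if len(rows) == HEIGHT:
            break
        # Prompt with the last few accepted rows so the model carries on from them.
        prefix = "\n".join(rows[-context_lines:])
        for text in sample(prefix).splitlines():
            if len(rows) == HEIGHT or parse_row(text, len(rows), direction) is None:
                break  # throw out this line and everything after it
            rows.append(text.strip())
    if len(rows) < HEIGHT:
        raise RuntimeError("sampler never produced a complete sprite")
    return rows


def fake_sample(prefix):
    """Stand-in for the model: three valid blank rows, then one bad line."""
    lines = prefix.splitlines()
    start = int(lines[-1].split()[0][:-1]) + 1 if lines else 0
    good = [f"{n}d" + " ~" * WIDTH for n in range(start, start + 3)]
    return "\n".join(good + ["garbage line"])


sprite_rows = generate_sprite(fake_sample)
print(len(sprite_rows))  # 64
```

Validating each line before accepting it means one malformed row costs only a retry rather than corrupting the rest of the sprite.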

After quite a few hours of training and generating (on Google Colab) we get stuff like this:

[Image: 6 hand-picked pokes]

The above examples are, of course, hand-picked. Most of the output sprites look pretty odd, but they do usually capture the Pokemon essence. You can see over 3,000 more generated sprites here.

Illustrated

With all the interesting quirks of the generated sprites, I was quite keen on seeing how they would look if "redrawn" as illustrations. I was fortunate to come across the work of Rachel Briggs, who has made a name for herself drawing Pokemon that don't officially exist, like the Space World '97 Pokemon. I reached out to her, and thankfully she was up to the task of drawing these monsters. She absolutely killed it!

I love these and I couldn't be happier with how they turned out! Rachel did an amazing job pulling out details from just a few pixels.

The End

So will you be seeing Pokemon sprites generated with this technique in the next game? No. Because they use 3D now.

All code for this project is available at https://github.com/MatthewRayfield/pokemon-gpt-2. There you will also find links to Google Colab notebooks where you can perform all training and image generation with a free GPU.

NOTE: I started this project in APRIL 2020 but did not finish it until NOVEMBER 2020. Not that it was a lot of work; I'm just slow. BUT in the meantime OpenAI themselves (the creators of GPT-2) developed their own (I'm sure much better) technique for image generation with GPT. More info about that can be found here: https://openai.com/blog/image-gpt/.

If you have questions, comments, or simply want to keep up with my latest tinkerings, check me out on Twitter: @MatthewRayfield. Or subscribe to my almost-never-bothered email list here.
