After my efforts to create a nice database for the class to use with the IMDB database (written about here) fell short due to the size of the database being unrealistic for student machines and the fact that I did not love the structure that was created from the script I used. I decided to go another route with sample databases using a “classicmodels” database I was able to find online. This database had a structure and data that I felt was more adequate for learning, but there was no real fun involved.

In my nightly ventures, I was perusing the web for other possible databases when I came across the IGDb website. IGDb stands for Internet Games Database. This is a data set that is managed by amazon! This could be good!

The site did not offer any kind of database dump, but it did have APIs I could hit to pull down records with a variety of endpoints (as documented here). This got very interesting, very fast!

I had to sign up for a twitch account and then go to the twitch developers portal (along with a few other small steps related to 2 factor authentication) to generate an API key, but then I was off to the races!

I opened up the postman desktop app, used it to generate an access token and then just started firing away at different endpoints to see what was coming back.

Man, was I having fun just exploring! As I was exploring, I was thinking about how fun it could be to create an application using the APIs (I made note of my creds in this article for anyone who wants to explore without the setup). The options seemed endless!…but my current task was about data…specifically relational data, so I has some work to do. I needed to understand the relationships between the endpoints, the objects they were returning and conceive a way to split it up into a normalized database (note: I think this could also be used for NoSQL database work if we should ever explore that).

Anyhow, I had thrown the gauntlet before myself, and I had the data to do it, so there was really no excuse to not find a way to arrive at the database I had been dreaming about from the start. After studying the data available, I created a very basic draft of what a relational database might look like. Here’s what I came up with:

Obviously, there was a lot of important data missing, but I knew I could fill that in as I actually created the database.

The next step I took was to actually pull the data and store it locally so I could parse it and insert it as needed without blowing up my network over and over and rate limiting myself. So, I used a slightly tweaked php cURL code that was created by postman from one of my request to start building a script.

<?php
ini_set('max_execution_time', '0'); 

pullData("artworks");
pullData("character_mug_shots");
pullData("characters");
pullData("covers");
pullData("game_modes");
pullData("game_videos");
pullData("games");
pullData("genres");
pullData("platforms");
pullData("screenshots");
pullData("websites");
function pullData($endpoint){
$keepgoing = true;
$i = 0;
while($keepgoing){
	print $i . "\n";
$offset = $i * 500;
$curl = curl_init();
$fp = fopen("{$endpoint}.json", "a");
$ef =  fopen("errors.json", "a");
curl_setopt_array($curl, array(
  CURLOPT_URL => "https://api.igdb.com/v4/{$endpoint}",
  CURLOPT_RETURNTRANSFER => true,
  CURLOPT_ENCODING => "",
  CURLOPT_MAXREDIRS => 10,
  CURLOPT_TIMEOUT => 60,
  CURLOPT_HTTP_VERSION => CURL_HTTP_VERSION_1_1,
  CURLOPT_CUSTOMREQUEST => "POST",
  CURLOPT_POSTFIELDS => "fields *; sort id; limit 500; offset {$offset};",
  CURLOPT_HTTPHEADER => array(
    "Accept: */*",
    "Accept-Encoding: gzip, deflate",
    "Authorization: Bearer vvg3uxybyoywi5fc8fr49x0frt386v",
    "Cache-Control: no-cache",
    "Client-ID: 9vo7274t10jhnionzqn5w9zwbpajka",
    "Connection: keep-alive",
    //"Content-Length: 42",
    "Content-Type: text/plain"
 
  ),
));
 
$data = curl_exec($curl);
if(curl_error($curl)) {
    fwrite($ef, "index: {$i} \noffset: {($i*500)} \nerror: {curl_error($curl)}\n");
	$keepgoing = false;
}else{
	$jsonarr = json_decode($data);
	foreach($jsonarr as &$rec){
		fwrite($fp, json_encode($rec));
		fwrite($fp, "\n");
	}
    if(empty($jsonarr)){
		$keepgoing = false;
	}
}
curl_close($curl);
fclose($fp);
fclose($ef);
$i+=1;
sleep(2);   
}
}
?>

It’s not the prettiest script in the world, but it was a nice utility for getting me the files. Basically, I just wrote a function that takes an endpoint, creates a .json file with the name of the endpoint and then writes the records to that file. The api had a limit of 500 records per request, so I just had to update the offset for each iteration of the loop. It took some time to run, but by the end, I had all the data I needed to start building and populating the database. For things like this, I don’t actually love rushing through it because it makes it harder to catch and fix errors as they occur.

I returned to my crude ER diagram and then popped open the workbench to start writing the create statements for the tables. As I went, before I created each table, I looked at the data from the related endpoints again and determined what fields, datatypes, relationships and keys the table should have.

I ended up with this:

 /*order of inserts using json files from api*/
/*
genres
platforms
game_modes
games
games -> game_genre
games -> game_platform
games -> game_game_mode
+games -> game_similar
game_video
game_covers
characters
characters -> game_characters
screenshots
websites
*/
create database IGDb;
use IGDb;

drop table if exists `igdb`.`categories`;
CREATE TABLE `igdb`.`categories` (
  `category_id` INT NOT NULL,
  `description` VARCHAR(45) NOT NULL,
  PRIMARY KEY (`category_id`));
/*note: The categories field in IGDb is actually an enumerated type. Compensating here because I don't want to use enums in this*/
Insert into `igdb`.`categories` (description, category_id) values ("main_game",	0);
Insert into `igdb`.`categories` (description, category_id) values ("dlc_addon",	1);
Insert into `igdb`.`categories` (description, category_id) values ("expansion",	2);
Insert into `igdb`.`categories` (description, category_id) values ("bundle",	3);
Insert into `igdb`.`categories` (description, category_id) values ("standalone_expansion",	4);
Insert into `igdb`.`categories` (description, category_id) values ("mod",	5);
Insert into `igdb`.`categories` (description, category_id) values ("episode",	6);
Insert into `igdb`.`categories` (description, category_id) values ("season",	7);
Insert into `igdb`.`categories` (description, category_id) values ("remake",	8);
Insert into `igdb`.`categories` (description, category_id) values ("remaster",	9);
Insert into `igdb`.`categories` (description, category_id) values ("expanded_game",	10);
Insert into `igdb`.`categories` (description, category_id) values ("port",	11);
Insert into `igdb`.`categories` (description, category_id) values ("fork",	12);
Insert into `igdb`.`categories` (description, category_id) values ("pack",	13);
Insert into `igdb`.`categories` (description, category_id) values ("update",	14);

drop table if exists `igdb`.`ratings`;
CREATE TABLE `igdb`.`ratings` (
  `id` INT NOT NULL,
  `description` VARCHAR(45) NOT NULL,
  PRIMARY KEY (`id`));
  
INSERT INTO `igdb`.`ratings` (description, id) values ("Three",1);
INSERT INTO `igdb`.`ratings` (description, id) values ("Seven",2);
INSERT INTO `igdb`.`ratings` (description, id) values ("Twelve",3);
INSERT INTO `igdb`.`ratings` (description, id) values ("Sixteen",4);
INSERT INTO `igdb`.`ratings` (description, id) values ("Eighteen",5);
INSERT INTO `igdb`.`ratings` (description, id) values ("RP",6);
INSERT INTO `igdb`.`ratings` (description, id) values ("EC",7);
INSERT INTO `igdb`.`ratings` (description, id) values ("E",8);
INSERT INTO `igdb`.`ratings` (description, id) values ("E10",9);
INSERT INTO `igdb`.`ratings` (description, id) values ("T",10);
INSERT INTO `igdb`.`ratings` (description, id) values ("M",11);
INSERT INTO `igdb`.`ratings` (description, id) values ("AO",12);
INSERT INTO `igdb`.`ratings` (description, id) values ("CERO_A",13);
INSERT INTO `igdb`.`ratings` (description, id) values ("CERO_B",14);
INSERT INTO `igdb`.`ratings` (description, id) values ("CERO_C",15);
INSERT INTO `igdb`.`ratings` (description, id) values ("CERO_D",16);
INSERT INTO `igdb`.`ratings` (description, id) values ("CERO_Z",17);
INSERT INTO `igdb`.`ratings` (description, id) values ("USK_0",18);
INSERT INTO `igdb`.`ratings` (description, id) values ("USK_6",19);
INSERT INTO `igdb`.`ratings` (description, id) values ("USK_12",20);
INSERT INTO `igdb`.`ratings` (description, id) values ("USK_16",21);
INSERT INTO `igdb`.`ratings` (description, id) values ("USK_18",22);
INSERT INTO `igdb`.`ratings` (description, id) values ("GRAC_ALL",23);
INSERT INTO `igdb`.`ratings` (description, id) values ("GRAC_Twelve",24);
INSERT INTO `igdb`.`ratings` (description, id) values ("GRAC_Fifteen",25);
INSERT INTO `igdb`.`ratings` (description, id) values ("GRAC_Eighteen",26);
INSERT INTO `igdb`.`ratings` (description, id) values ("GRAC_TESTING",27);
INSERT INTO `igdb`.`ratings` (description, id) values ("CLASS_IND_L",28);
INSERT INTO `igdb`.`ratings` (description, id) values ("CLASS_IND_Ten",29);
INSERT INTO `igdb`.`ratings` (description, id) values ("CLASS_IND_Twelve",30);
INSERT INTO `igdb`.`ratings` (description, id) values ("CLASS_IND_Fourteen",31);
INSERT INTO `igdb`.`ratings` (description, id) values ("CLASS_IND_Sixteen",32);
INSERT INTO `igdb`.`ratings` (description, id) values ("CLASS_IND_Eighteen",33);
INSERT INTO `igdb`.`ratings` (description, id) values ("ACB_G",34);
INSERT INTO `igdb`.`ratings` (description, id) values ("ACB_PG",35);
INSERT INTO `igdb`.`ratings` (description, id) values ("ACB_M",36);
INSERT INTO `igdb`.`ratings` (description, id) values ("ACB_MA15",37);
INSERT INTO `igdb`.`ratings` (description, id) values ("ACB_R18",38);
INSERT INTO `igdb`.`ratings` (description, id) values ("ACB_RC",39);

drop table if exists `igdb`.`rating_issuer`;
CREATE TABLE `igdb`.`rating_issuer` (
  `id` INT NOT NULL,
  `description` VARCHAR(45) NOT NULL,
  PRIMARY KEY (`id`));
Insert into `igdb`.`rating_issuer` (description, id) values("ESRB",1);
Insert into `igdb`.`rating_issuer` (description, id) values("PEGI",2);
Insert into `igdb`.`rating_issuer` (description, id) values("CERO",3);
Insert into `igdb`.`rating_issuer` (description, id) values("USK",4);
Insert into `igdb`.`rating_issuer` (description, id) values("GRAC",5);
Insert into `igdb`.`rating_issuer` (description, id) values("CLASS_IND",6);
Insert into `igdb`.`rating_issuer` (description, id) values("ACB",7);

drop table if exists `igdb`.`age_ratings`;
CREATE TABLE `igdb`.`age_ratings` (
  `id` INT NOT NULL,
  `rating_issuer_id` INT NULL,
  `rating_id` INT NULL,
  PRIMARY KEY (`id`),
  INDEX `fk_age_ratings_issuer_idx` (`rating_issuer_id` ASC),
  INDEX `fk_age_ratings_rating_idx` (`rating_id` ASC),
  CONSTRAINT `fk_age_ratings_issuer`
    FOREIGN KEY (`rating_issuer_id`)
    REFERENCES `igdb`.`rating_issuer` (`id`),
  CONSTRAINT `fk_age_ratings_rating`
    FOREIGN KEY (`rating_id`)
    REFERENCES `igdb`.`ratings` (`id`));

drop table if exists `igdb`.`game_age_ratings`;
CREATE TABLE `igdb`.`game_age_ratings` (
  `game_id` INT NOT NULL,
  `age_rating_id` INT NOT NULL,
  PRIMARY KEY (`game_id`, `age_rating_id`),
  INDEX `fk_game_age_ratings_age_idx` (`age_rating_id` ASC),
  CONSTRAINT `fk_game_age_ratings_game`
    FOREIGN KEY (`game_id`)
    REFERENCES `igdb`.`games` (`game_id`)
  ON DELETE CASCADE
  ON UPDATE CASCADE,
  CONSTRAINT `fk_game_age_ratings_age`
    FOREIGN KEY (`age_rating_id`)
    REFERENCES `igdb`.`age_ratings` (`id`));
    
drop table if exists  `igdb`.`games`;
CREATE TABLE `igdb`.`games` (
  `game_id` INT NOT NULL,
  `name` VARCHAR(200) NOT NULL,
  `category` INT NULL,
  `summary` VARCHAR(10000) NULL,
  PRIMARY KEY (`game_id`),
  INDEX `fk_game_categories_idx` (`category` ASC),
  CONSTRAINT `fk_game_categories`
    FOREIGN KEY (`category`)
    REFERENCES `igdb`.`categories` (`category_id`));

drop table if exists `igdb`.`game_covers`;
CREATE TABLE `igdb`.`game_covers` (
  `cover_id` INT NOT NULL,
  `game_id` INT NULL,
  `height` INT NULL,
  `width` INT NULL,
  `url` VARCHAR(200) NULL,
  PRIMARY KEY (`cover_id`),
  INDEX `fk_game_covers_idx` (`game_id` ASC),
  CONSTRAINT `fk_game_covers`
    FOREIGN KEY (`game_id`)
    REFERENCES `igdb`.`games` (`game_id`)
      ON DELETE CASCADE
  ON UPDATE CASCADE);

drop table if exists `igdb`.`game_modes`;  
CREATE TABLE `igdb`.`game_modes` (
  `game_mode_id` INT NOT NULL,
  `name` VARCHAR(45) NOT NULL,
  `url` VARCHAR(200) NULL,
  PRIMARY KEY (`game_mode_id`));  
  
drop table if exists `igdb`.`genres`;  
CREATE TABLE `igdb`.`genres` (
  `genre_id` INT NOT NULL,
  `name` VARCHAR(45) NOT NULL,
  `url` VARCHAR(200) NULL,
  PRIMARY KEY (`genre_id`));   
  
drop table if exists `igdb`.`platforms`; 
CREATE TABLE `igdb`.`platforms` (
  `platform_id` INT NOT NULL,
  `name` VARCHAR(100) NOT NULL,
  `alternative_name` VARCHAR(200) NULL,
  `url` VARCHAR(200) NULL,
  PRIMARY KEY (`platform_id`));
  
drop table if exists `igdb`.`game_video`;
CREATE TABLE `igdb`.`game_video` (
  `video_id` VARCHAR(200) NOT NULL,
  `game_id` INT NOT NULL,
  `name` VARCHAR(200) NULL,
  PRIMARY KEY (`video_id`, `game_id`),
  INDEX `fk_game_video_idx` (`game_id` ASC),
  CONSTRAINT `fk_game_video`
    FOREIGN KEY (`game_id`)
    REFERENCES `igdb`.`games` (`game_id`)
      ON DELETE CASCADE
  ON UPDATE CASCADE);  

drop table if exists `igdb`.`websites`;
CREATE TABLE `igdb`.`websites` (
  `website_id` INT NOT NULL,
  `game_id` INT NOT NULL,
  `trusted` TINYINT NULL,
  `url` VARCHAR(200) NOT NULL,
  PRIMARY KEY (`website_id`, `game_id`),
  INDEX `fk_game_video_idx` (`game_id` ASC),
  CONSTRAINT `fk_game_website`
    FOREIGN KEY (`game_id`)
    REFERENCES `igdb`.`games` (`game_id`)
      ON DELETE CASCADE
  ON UPDATE CASCADE);

drop table if exists `igdb`.`screenshots`;
CREATE TABLE `igdb`.`screenshots` (
  `screenshot_id` INT NOT NULL,
  `game_id` INT NOT NULL,
  `height` INT NULL,
  `width` INT NULL,
  `url` VARCHAR(200) NOT NULL,
  `animated` TINYINT NULL,
  PRIMARY KEY (`screenshot_id`),
  INDEX `fk_screenshots_game_idx` (`game_id` ASC),
  CONSTRAINT `fk_screenshots_game`
    FOREIGN KEY (`game_id`)
    REFERENCES `igdb`.`games` (`game_id`)
      ON DELETE CASCADE
  ON UPDATE CASCADE);  
    
drop table if exists `igdb`.`characters`;
CREATE TABLE `igdb`.`characters` (
  `character_id` INT NOT NULL,
  `name` VARCHAR(200) NOT NULL,
  `url` VARCHAR(200) NULL,
  PRIMARY KEY (`character_id`));
  
drop table if exists `igdb`.`game_game_mode`;
CREATE TABLE `igdb`.`game_game_mode` (
  `game_id` INT NOT NULL,
  `game_mode_id` INT NOT NULL,
  PRIMARY KEY (`game_id`, `game_mode_id`),
  INDEX `fk_game_game_mode_gm_idx` (`game_mode_id` ASC),
  CONSTRAINT `fk_game_game_mode_game`
    FOREIGN KEY (`game_id`)
    REFERENCES `igdb`.`games` (`game_id`)
      ON DELETE CASCADE
  ON UPDATE CASCADE,
  CONSTRAINT `fk_game_game_mode_gm`
    FOREIGN KEY (`game_mode_id`)
    REFERENCES `igdb`.`game_modes` (`game_mode_id`));
    
drop table if exists `igdb`.`game_characters`;
CREATE TABLE `igdb`.`game_characters` (
  `game_id` INT NOT NULL,
  `characters_id` INT NOT NULL,
  PRIMARY KEY (`game_id`, `characters_id`),
  INDEX `fk_game_characters_character_idx` (`characters_id` ASC),
  CONSTRAINT `fk_game_characters_game`
    FOREIGN KEY (`game_id`)
    REFERENCES `igdb`.`games` (`game_id`)
      ON DELETE CASCADE
  ON UPDATE CASCADE,
  CONSTRAINT `fk_game_characters_character`
    FOREIGN KEY (`characters_id`)
    REFERENCES `igdb`.`characters` (`character_id`));

drop table if exists  `igdb`.`game_genre`;    
CREATE TABLE `igdb`.`game_genre` (
  `game_id` INT NOT NULL,
  `genre_id` INT NOT NULL,
  PRIMARY KEY (`game_id`, `genre_id`),
  INDEX `fk_game_genre_genre_idx` (`genre_id` ASC),
  CONSTRAINT `fk_game_genre_game`
    FOREIGN KEY (`game_id`)
    REFERENCES `igdb`.`games` (`game_id`)
      ON DELETE CASCADE
  ON UPDATE CASCADE,
  CONSTRAINT `fk_game_genre_genre`
    FOREIGN KEY (`genre_id`)
    REFERENCES `igdb`.`genres` (`genre_id`));

drop table if exists `igdb`.`game_platform`;    
CREATE TABLE `game_platform` (
  `game_id` int(11) NOT NULL,
  `platform_id` int(11) NOT NULL,
  PRIMARY KEY (`game_id`,`platform_id`),
  KEY `fk_game_platform_platform_idx` (`platform_id`),
  CONSTRAINT `fk_game_platform_game` FOREIGN KEY (`game_id`) REFERENCES `games` (`game_id`)
    ON DELETE CASCADE
  ON UPDATE CASCADE,
  CONSTRAINT `fk_game_platform_platform` FOREIGN KEY (`platform_id`) REFERENCES `platforms` (`platform_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

drop table if exists `igdb`.`game_similar`;  
CREATE TABLE `igdb`.`game_similar` (
  `game_id` INT NOT NULL,
  `similar_game_id` INT NOT NULL,
  PRIMARY KEY (`game_id`, `similar_game_id`),
  INDEX `fk_game_similar_similar_idx` (`similar_game_id` ASC),
  CONSTRAINT `fk_game_similar_game`
    FOREIGN KEY (`game_id`)
    REFERENCES `igdb`.`games` (`game_id`)
      ON DELETE CASCADE
  ON UPDATE CASCADE,
  CONSTRAINT `fk_game_similar_similar`
    FOREIGN KEY (`similar_game_id`)
    REFERENCES `igdb`.`games` (`game_id`)
      ON DELETE CASCADE
  ON UPDATE CASCADE);

This is what the database ended up looking like:

After creating the tables, the last step was to start populating them. I wrote sample inserts in the workbench to start from and then returned to the php side of things to start reading the files and writing the data.

Line by line, I had a php script read each of the json files, convert the json string to a php object and then run an insert on the database. I followed a specific order for the inserts to avoid any foreign key violations. I could have turned foreign key checks off, but I did not want to introduce any kind of broken relationships, preferring instead to let the possibility of a failed insert to occur should a foreign key violation occur due to a previously failed insert. I could have also had this process go much faster if I put multiple inserts per sql execution, but again, I did not want one record to screw up a batch and lead me wondering which of the records caused the issue. Slow and steady was the approach here.

Here’s a sample of the php

function buildGames(){
	global $pdo;
	global $errorlog;
	$handle = fopen("games.json", "r");
	if ($handle) {
		while (($line = fgets($handle)) !== false) {
			// process the line read.
			try{
				$game = json_decode($line);
				$sql = "INSERT INTO `igdb`.`games` (`game_id`, `name`, `category`, `summary`) VALUES (?,?,?,?) ON DUPLICATE KEY UPDATE game_id = game_id";
				$pdo->prepare($sql)->execute([$game->id, $game->name, $game->category, $game->summary]);
			}catch(Exception $e) {
				fwrite($errorlog, "buildGames() {$line} error: {$e->getMessage()}\n");
			}
			
		}
	
		fclose($handle);
	}
}

These scripts took a long time to run. Throughout my days (yes, days), I would frequently check back to see that they were still running and make sure the record counts were still growing. The tables that took the longest were the intersection tables (Eg. game_genre, game_platform, game_similar) because there were about 250,000 games in the db, and for each of those, there were several genres, platforms, and similar titles. Frankly, I probably should have written the php to write the insert statement to file (like a db dump) and then just ran the file. That would have cut back on db transactions and probably made the whole process way faster, but this was a iteration 0 effort. By the time I realized how long it was going to take, I really didn’t want to refactor things midway through the process. I will probably try something like that next time I pull it all.

Anyhow, this was a really fun project I took on, hopefully it will pay off for the students. As I said in the IMDb database article, the hardest thing about exploring SQL is sharing the value and interest of the data. I spent the last week on this because I know most of my students love video games. Perhaps with content like this, they will find joy “swimming” through the data in ways that I always have.

Note: For images within the database, the api returned urls like the following: “//images.igdb.com/igdb/image/upload/t_thumb/co1uhl.jpg” – these are thumbnail images. To change the image to be something of a larger size, just replace the “thumb” with one of the following names (eg. “//images.igdb.com/igdb/image/upload/t_screenshot_huge/co1uhl.jpg“)

NameSizeExtra
cover_small90 x 128Fit
screenshot_med569 x 320Lfill, Center gravity
cover_big264 x 374Fit
logo_med284 x 160Fit
screenshot_big889 x 500Lfill, Center gravity
screenshot_huge1280 x 720Lfill, Center gravity
thumb90 x 90Thumb, Center gravity
micro35 x 35Thumb, Center gravity
720p1280 x 720Fit, Center gravity
1080p1920 x 1080Fit, Center gravity

Here are the results of the work, for import into a mysql database: Includes single file import to get all tables at once, and a project folder for individual table imports.

Leave a Reply

Your email address will not be published. Required fields are marked *