by Steve Lopez

Egad! Who dreamed up the name of this function? ("This function" rhymes with "dysfunction" -- I'm seeing hidden patterns already). "Cannibalize database" is a very useful function with a very unsavory name. This week, we'll chow down on some details (and I'll try to resist bad jokes linking chess to cannibalism, such as mentioning the "Jan Hein Donner Party", etc.).

The purpose of "cannibalize database" is to see if (and how many) games from a new game collection you've downloaded or purchased already exist in your master reference database. In a way, it's a means of killing doubles before they ever get into your database.

The first step in the process is to designate a reference database. This is typically going to be your master game collection or largest database. I have Mega Database 2000 listed as my reference database. To designate a database as the reference database, just right-click on that database's icon, select "Properties" from the menu, and then put a check in the box next to "Reference database".

Next you select the database to be "cannibalized". Left-click once on that database's icon to highlight it with the black box, then go to the Technical menu and select "Cannibalize database". You'll see a dialogue box appear that asks you to confirm the identity of the reference database as well as that of the "victim" database (the term "victim" is actually used here; whoever writes the dialogue box texts is one sick pup). Once you've clicked "OK", the Windows file select dialogue appears, allowing you to create a new database into which the program will copy the games that are not already in the reference database. You can name this database whatever you like. "Cannibal.cbh" works for me. I was going to use "Lechter.cbh" until I realized I didn't know how to spell it.

Once you name the database and click "OK", the process starts immediately. You'll see a status bar appear in the middle of the screen.

The program will chew on the database for a while and eventually spit out a list of games from the cannibalized database which are not already found in the reference database. These are automatically copied into the database you created earlier in the process ("Cannibal.cbh" or whatever you named it); the icon for this database appears in the Database window as a database called "Cannibal result". It has a picture of a wrench on it (I'd have gone for a knife and fork -- I mean, as long as we're going to call this function "cannibalizing" a database, why not crank up the "gross-out factor" to 11 while we're at it?).
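For the programmers in the audience, the idea behind the function can be sketched in a few lines of Python. This is only an illustration of the concept (keep the "victim" games that have no match in the reference database). It is not ChessBase's actual matching method, which isn't described here. The Game record, the game_key() comparison rule, and the cannibalize() function are all hypothetical names made up for this sketch.

```python
from typing import NamedTuple


class Game(NamedTuple):
    """Hypothetical, stripped-down game record for this sketch."""
    white: str
    black: str
    year: int
    moves: str


def game_key(game):
    """Build a comparison key (an assumed rule, not ChessBase's own).

    Names are lower-cased and trimmed, and extra whitespace in the
    move text is collapsed, so trivial differences don't hide a double.
    """
    return (
        game.white.strip().lower(),
        game.black.strip().lower(),
        game.year,
        " ".join(game.moves.split()),
    )


def cannibalize(victim, reference):
    """Return the victim games that are not found in the reference."""
    reference_keys = {game_key(g) for g in reference}
    return [g for g in victim if game_key(g) not in reference_keys]


if __name__ == "__main__":
    reference = [Game("Kasparov", "Karpov", 1985, "1.e4 c5")]
    victim = [
        Game("kasparov ", "Karpov", 1985, "1.e4  c5"),  # a double
        Game("Donner", "Euwe", 1955, "1.d4 d5"),        # a new game
    ]
    for game in cannibalize(victim, reference):
        print(game.white, "-", game.black, game.year)
```

Putting the reference keys into a set means each "victim" game is checked with a single lookup, so in this sketch the work grows with the size of the "victim" database once the reference keys have been built.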

You can drag and drop the whole "Cannibal result" database into your reference database if you like, or you can do some more work on it. For example, if you do an immediate drag and drop, you may not get all the games from the same tournament listed together in the database (the games of the tournament may be scattered around the database). If getting those games together is important to you (for generating crosstables, for example), follow the recipe, errr, I mean instructions given on page 80 of the ChessBase 7 manual (in section 3.5.7). Otherwise you can safely disregard them.
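To picture what "getting those games together" means, here is a small Python sketch that sorts a list of game headers so that each tournament's games sit side by side. It is not the procedure from the manual, just an illustration of the goal. The GameHeader record, its fields, and group_by_tournament() are hypothetical names invented for the example.

```python
from typing import NamedTuple


class GameHeader(NamedTuple):
    """Hypothetical game header holding only the fields this sketch needs."""
    event: str
    year: int
    round_no: int
    white: str
    black: str


def group_by_tournament(games):
    """Return the games ordered by event name, then year, then round.

    Sorting on (event, year, round) places all games of one tournament
    next to each other instead of leaving them scattered.
    """
    return sorted(games, key=lambda g: (g.event.lower(), g.year, g.round_no))


if __name__ == "__main__":
    scattered = [
        GameHeader("Linares", 1999, 2, "Anand", "Kramnik"),
        GameHeader("Wijk aan Zee", 1999, 1, "Kasparov", "Ivanchuk"),
        GameHeader("Linares", 1999, 1, "Kasparov", "Anand"),
    ]
    for g in group_by_tournament(scattered):
        print(g.event, g.year, "round", g.round_no, g.white, "-", g.black)
```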

As for the technical specs, I wouldn't try this function with less than 64 MB RAM. I tried it on a 32 MB machine last week and got an error message saying there wasn't enough RAM. Unfortunately, my 64 MB machine didn't have a large enough hard drive to hold my databases. My solution? I bought a new box, with 64 MB RAM, a faster processor, and a bigger hard disk. Now "Cannibalize database" runs wonderfully well.

How long does the process take? That's a tough question to answer accurately, but I can safely say that this is not a trivial operation. In general, the smaller the "victim" database, the faster the process (just think of that old Kliban cartoon, "Never Eat Anything Bigger Than Your Head"). The first time I ran it, I checked a 55,118-game database against Mega Database 2000 (which has 1.3 million games). The process took about 20 minutes from beginning to end using 64 MB RAM on a Pentium III 800 MHz machine. I subsequently cannibalized a few more large databases the same day and the process was even quicker (the Mega 2000 info was already cached). However, I also ran the function with a "victim" database of 200,000 extremely low-quality games (some with years missing, for example) and a reference database of around 300,000 games; the process took over an hour to complete. This is what leads me to believe that the bigger the "victim", the more time CB7 takes to cannibalize it (interesting imagery -- let's not go there, OK?). The quality of the games may have some bearing here, too, so take all of this with a pinch of salt.

So there's nothing really mysterious about the process; it's a pretty simple and straightforward way to "pre-screen" a new database for duplicate games. Now, if you'll excuse me, I'm going to dig into a bucket of chicken and watch that Ken Burns documentary about the Donner Party. Until next week, bon appetit and have fun!

You can e-mail me with your comments, suggestions, and analysis for Electronic T-Notes. If you love gambits and sacrificial play, stop by my Chess Kamikaze Home Page and the Yahoo Chess Kamikazes Club.