ELECTRONIC T-NOTES


CHESSBASE USA'S WEEKLY ON-LINE NEWSLETTER


FOR THE WEEK OF NOVEMBER 4, 2001


THE USE OF DATABASES -- PART THREE

by Steve Lopez

The single biggest advantage to using an electronic database is speed. Two weeks ago in ETN, I mentioned a Doug Root game from a back issue of the Chess Informant. He'd played 4.Qg4 in the French Advance (one of my favorite gambit lines) in a US tournament back in the 1970's. If I wanted to dig the game out of the printed version of the Informants, I'd easily spend a half-hour or more looking for it, pulling books off the shelf one by one, riffling the pages to the C02 section, and visually scanning the pages in search of the game. Using ChessBase, I can do a search in my Informant database and pull the game up in just over one second on my PIII machine.

That's actually not too tough a search to do manually, considering that I know the name of the White player, the approximate time period when the game was played, and the ECO code for the opening in question. But if I didn't remember the player name or time period and only knew that one 4.Qg4 C02 game was played in some Informant back issue, I'd spend an hour or two hunting it down.

That's still not too labor-intensive (relatively speaking) compared to other searches one might want to do. Let's say that you want to find all of the games in which a particular maneuver was played, or in which White has four pawns against Black's three in the endgame. Such searches are practically impossible to do manually using printed books -- you'd have to play through every move of every game to find the information you seek. But it's a snap with an electronic database, and you can often do these searches in less time than it would take you to walk to the bookshelf, thumb through a single volume, find a game, and set up a chessboard to start playing through it.

Let's look at an example of such a search (and learn a few useful facts about database searches along the way). We'll fire up ChessBase 8 and use the Big or Mega Database 2001 to see what turns up.

For those coming in late, Big Database 2001 and Mega Database 2001 contain the same number of games: 1,687,181. The difference is that the games in Big 2001 are entirely unannotated, while Mega 2001 contains between 40,000 and 50,000 annotated games. Either way, nobody in their right mind expects anyone to play through all of those games. I'm sometimes asked "Why would I need all of those games?" That's exactly the question that we'll address in this article. The answer is that a large database is an excellent reference tool for finding illustrative games which we can use to improve our own chess play and find answers to those burning chess questions we all ask (as discussed two weeks ago in ETN). You don't play through all of the games in a huge reference database. In fact, I'd be surprised if anyone plays through more than a few thousand of them in the course of their chess career. You use the search functions of ChessBase to find the games that matter to you.

Here's a common example, suggested by conversations I've had with several users down through the years. A common motif that occurs in attacking the castled King position is the sacrifice of a Bishop for a pawn on the h-file (either h7 or h2, depending on which player is tossing his Bishop overboard). The Bishop sacs itself for the h-pawn, blows open the opposing pawn structure, and uncovers the enemy King (hopefully launching a furious attack in the process). There's more to the attack than just pitching the Bishop back into the box (otherwise we'd all be doing it all the time) -- the followup is crucial. So how do we find games in the database to illustrate the followup to the Bishop sac?

This is really one of the easiest maneuver searches to execute in ChessBase. Bring up the search mask for the Big/Mega Database (I'll be using Mega 2001 in my examples and will refer to Mega from now on -- but be aware that you can use Big 2001 as well and get the same games to come up). Right-click on the Mega Database's icon and select "Search" from the popup menu. When the Search mask appears, click the Manoeuvers tab.


We'll start by finding all of the games in which White sacked his Bishop for a pawn on h7 and gave check in the process. You start defining a maneuver by selecting the color of the moving side, so click the radio button next to "W" (for White). In the window to the left of it, you'll see a "W" appear followed by a series of question marks; this window shows the maneuver(s) you've defined. Next click on the pulldown window immediately below the "W" radio button to get a list of piece abbreviations; this part of the dialogue lets you select the moving piece. Select "B" for Bishop. The next white box to the right lets you type in the starting square for the moving piece. We really don't care which square the Bishop starts from in this maneuver, so we'll type a pair of question marks into this box.

The next step is to define the Bishop's ending square, so we type "h7" (without the quotes) into the next white box to the right. The next box is a small square; selecting this box means that a capture is required on the destination square. The next box lets us define the piece to be captured. Since we want games in which White sacs his Bishop for a pawn on h7, we select "P" from the pulldown menu.

So now we've asked ChessBase 8 to list all games in which a White Bishop captures a Black pawn on h7. But a key ingredient is still missing -- the check of the Black King. This is easily designated by checking the box next to "Check". If you've successfully followed the steps so far, your dialogue should look like this:


Note the code in the large white box: "wB??h7xP+". There's nothing mysterious about these symbols; in fact, it's just a modified form of standard chess notation. Reading these abbreviations and symbols from left to right, they simply mean "White's Bishop (starting on any square) moves to h7, capturing a pawn with check". Simple as that. Note that we don't check the "Sacrifice" box. Putting a check in this box invokes a special additional algorithm that looks for sacrifices. Using it will cause the search to take a lot longer and, besides, it's really not necessary to check it here. In the criteria we've provided, the Bishop is being sacrificed almost by definition (the King will almost certainly recapture, unless the Bishop is guarded by another piece). So leave the "Sacrifice" box unchecked.
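For readers who think in code, here is a rough Python sketch of how a string like "wB??h7xP+" breaks down into its parts. The pattern is inferred only from the two maneuver strings that appear in this article; the names MANEUVER, Maneuver, and parse_maneuver are made up for illustration and are not part of ChessBase.

```python
import re
from typing import NamedTuple, Optional

# Hypothetical pattern, inferred from "wB??h7xP+" and "bKg8h7xB" only.
MANEUVER = re.compile(
    r"^(?P<side>[wb])"                # moving side: w = White, b = Black
    r"(?P<piece>[KQRBNP])"            # moving piece
    r"(?P<start>\?\?|[a-h][1-8])"     # starting square, ?? = any square
    r"(?P<end>\?\?|[a-h][1-8])"       # destination square
    r"(?:x(?P<captured>[QRBNP]))?"    # optional capture and captured piece
    r"(?P<check>\+)?$"                # optional check marker
)


class Maneuver(NamedTuple):
    side: str
    piece: str
    start: Optional[str]      # None means "any square"
    end: Optional[str]
    captured: Optional[str]   # None means "no capture specified"
    check: bool


def parse_maneuver(text: str) -> Maneuver:
    """Split a maneuver string into its fields (illustrative only)."""
    m = MANEUVER.match(text)
    if m is None:
        raise ValueError(f"not a recognized maneuver string: {text!r}")

    def square(s: str) -> Optional[str]:
        return None if s == "??" else s

    return Maneuver(
        side=m["side"],
        piece=m["piece"],
        start=square(m["start"]),
        end=square(m["end"]),
        captured=m["captured"],
        check=m["check"] is not None,
    )


print(parse_maneuver("wB??h7xP+"))
```

Read left to right, the fields line up with the plain-English description above: side, piece, starting square, destination square, captured piece, check.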

But we're still not finished. Performing a maneuver search requires ChessBase to look at every position in every game in the database to see if the designated maneuver occurs. While a computer can do this frighteningly quickly, it still will take a bit of time to accomplish. We can cut ol' ChessBase some slack by limiting the search to a certain span of moves (and thus cutting down the amount of work the program needs to do). Have a look at the following portion of the Manoeuver dialogue:


This is where we limit the search to a certain span of moves and the time period in which the maneuver occurs. In the illustration, I've chosen to place "10" in the "First" box and "35" in the "Last" box. This means that ChessBase will only look at positions that occurred between moves ten and thirty-five in each game of the database, ignoring all other moves of the games. So our search criteria can now be defined (in plain English) as "all games in which a White Bishop (starting on any square) moved to h7 and captured a pawn with check between moves ten and thirty-five". The "Length" value isn't a factor here, as we're looking at a single move maneuver (but keep this part of the dialogue in mind, as we'll come back to it later).
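To get a feel for how much work the "First" and "Last" values can save, here is a small arithmetic sketch in Python. It assumes that both limits are inclusive and that each full move consists of two plies (White's half-move, then Black's); the function names are hypothetical and say nothing about how ChessBase stores games internally.

```python
def move_number(ply_index: int) -> int:
    """Full-move number for a 0-based ply index (ply 0 is White's first move)."""
    return ply_index // 2 + 1


def plies_in_window(num_plies: int, first: int, last: int) -> list:
    """Ply indexes whose move number lies in [first, last], both inclusive (assumed)."""
    return [p for p in range(num_plies) if first <= move_number(p) <= last]


# A 40-move game has 80 plies. With First = 10 and Last = 35,
# only the plies belonging to moves 10 through 35 need to be examined.
window = plies_in_window(80, first=10, last=35)
print(len(window), "of 80 plies examined")
```

For a 40-move game, that window covers 52 of the 80 plies, so roughly a third of the positions never have to be looked at.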

Now that we have our values entered, we can click "OK" and start the search. The Search Results window opens and displays the games found in the search. The lower right corner of this window is an information pane telling you how much of the database has been searched and approximately how much longer the search will take. Obviously, times will vary from machine to machine with the processor speed being the main determinant. On a Pentium III 800 MHz machine, the search should take about two minutes, considerably less time than it would take me to haul my butt out of my desk chair and set up a chess set on the card table in my office.

When the search is completed, scroll the search results list down to the bottom. If you're using Big or Mega 2001 and you set up the parameters as I did, you should see a total of 12,657 games that contain the required Bishop sac. That's a whopping great pile of games. Even though you can just single-click on a game in the list and play it on the chessboard provided in the search results window, you'd only get about a hundred or so viewed before your eyes began to bleed. So how do we cut down further on the material discovered?

I don't want to get too technical here, but I have to drag Boolean algebra (kicking and screaming) into the discussion. A search in ChessBase 8 is an "and" search, not an "or" search. In other words, if you do a search and provide a player name, a year, and a maneuver, what you'd get back from the program would be all the games in which that player performed that particular maneuver in the given year. You would not get all the games of that player, plus all the games from that year, plus all the games in which the maneuver was played. So, using the following criteria:

...you'd get all the games in which that player participated AND played that maneuver AND played it in the year you specified, as opposed to all games in which that player was a participant OR in which that maneuver was played OR that were played in the specified year. See the difference? The OR search would yield a huge amount of unrelated material. The AND search (the way ChessBase does it) gives less material, but all of it will be related.

If you're unusually bright (or unnaturally precognitive), you already see where this is headed: the more stuff you put in the Search mask, the fewer games you find in the search (and as an extra bonus, when some brainy mug starts babbling about "Boolean AND searches" you'll know what he's prattling about, thanks to me. And Math was always my worst subject. I'm a History guy. Go figure). So if you find that a search yields too much information, you can cut the material down to a more manageable level by adding criteria to your search.
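The AND-versus-OR distinction is easy to see in a few lines of Python. The five game records below are invented toy data (the players, years, and flags are placeholders, not real games), and the field names are hypothetical. Each search criterion is a small test; an AND search keeps a game only if every test passes, while an OR search keeps it if any test passes.

```python
# Toy records, made up purely for illustration.
games = [
    {"white": "PlayerA", "year": 1925, "bxh7": True},
    {"white": "PlayerA", "year": 1925, "bxh7": False},
    {"white": "PlayerA", "year": 1930, "bxh7": True},
    {"white": "PlayerB", "year": 1925, "bxh7": True},
    {"white": "PlayerB", "year": 1930, "bxh7": False},
]

# Three criteria: a player name, a year, and a maneuver.
criteria = [
    lambda g: g["white"] == "PlayerA",
    lambda g: g["year"] == 1925,
    lambda g: g["bxh7"],
]

# AND search: every criterion must hold.
and_hits = [g for g in games if all(test(g) for test in criteria)]

# OR search: any single criterion is enough.
or_hits = [g for g in games if any(test(g) for test in criteria)]

# Adding criteria one at a time to an AND search can only shrink the result.
shrinking = [
    len([g for g in games if all(test(g) for test in criteria[:n])])
    for n in range(1, len(criteria) + 1)
]

print(len(and_hits), len(or_hits), shrinking)
```

With this toy data the AND search returns one game, the OR search returns four of the five, and the AND count drops from three to two to one as each criterion is added.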

Let's look at this in action. 12,000+ is a few too many games to comfortably study in an organized manner, so let's get even more specific. Let's add a second move to our maneuver. Click the "Insert/New" button, which tells ChessBase that you want to add a new move to the maneuver. You'll see a duplicate of the previous maneuver appear highlighted in the white box -- don't worry, since we're about to edit this entry. This time, we'll start by clicking the radio button next to "B", showing that it's Black's turn to move. Moving to the right, select "K" (for "King") from the pulldown list. In the next box type in "g8" (instead of question marks); since the White maneuver was Bxh7+, the Black King must (by definition) be on g8. Type "h7" in the next box. Put a check in the small square and then select "B" (for "Bishop") from the pulldown list in the next box to the right. Then uncheck the box next to the word "Check". Your final result should read: "bKg8h7xB" (meaning that the Black King on g8 moves to h7 and captures the Bishop).

But we're not quite done yet. In the "Length" box, set a value of "2", since the maneuver will take two plies to complete. And, off to the left, put a check in the box next to "Check move order". If you've followed the steps carefully, your Manoeuver search mask should look like this:


Now we're ready to see how adding criteria to a search will reduce the number of games ChessBase finds. You'll recall that in our first search we looked for games in which "a White Bishop (starting on any square) moved to h7 and captured a pawn with check between moves ten and thirty-five", for which ChessBase discovered 12,657 games. By adding an additional maneuver, we've redefined the search -- it can now be defined as "all games in which a White Bishop (starting on any square) moved to h7 and captured a pawn with check, and in which Black replied with ...Kxh7 between moves ten and thirty-five".
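Here is one way to picture the two-move maneuver in Python. This is a sketch under stated assumptions, not ChessBase's actual algorithm: it reads "Length 2" with "Check move order" as "the two steps must occur on consecutive plies, in the order given", and it treats an unchecked "Check" box as "don't care". The Ply and Step types and both functions are hypothetical.

```python
from typing import NamedTuple, Optional, Sequence


class Ply(NamedTuple):
    """One half-move as actually played."""
    side: str
    piece: str
    start: str
    end: str
    captured: Optional[str]
    check: bool


class Step(NamedTuple):
    """One line of the maneuver; None means 'any' / 'don't care'."""
    side: str
    piece: str
    start: Optional[str]
    end: str
    captured: Optional[str]
    check: Optional[bool]


def step_matches(step: Step, ply: Ply) -> bool:
    return (
        step.side == ply.side
        and step.piece == ply.piece
        and step.start in (None, ply.start)
        and step.end == ply.end
        and step.captured == ply.captured
        and step.check in (None, ply.check)
    )


def has_maneuver(plies: Sequence[Ply], steps: Sequence[Step]) -> bool:
    """True if the steps occur on consecutive plies, in order (assumed reading)."""
    n = len(steps)
    return any(
        all(step_matches(s, p) for s, p in zip(steps, plies[i:i + n]))
        for i in range(len(plies) - n + 1)
    )


bxh7 = Step("w", "B", None, "h7", "P", True)     # wB??h7xP+
kxh7 = Step("b", "K", "g8", "h7", "B", None)     # bKg8h7xB

accepted = [Ply("w", "B", "d3", "h7", "P", True),
            Ply("b", "K", "g8", "h7", "B", False)]
declined = [Ply("w", "B", "d3", "h7", "P", True),
            Ply("b", "K", "g8", "h8", None, False)]

print(has_maneuver(accepted, [bxh7, kxh7]), has_maneuver(declined, [bxh7, kxh7]))
```

Both fragments satisfy the one-step search for Bxh7+, but only the one in which Black actually takes the Bishop satisfies the two-step version, which is why the second search returns fewer games than the first.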

When you're ready to start the search, click "OK". After a couple of minutes, ChessBase will return the results: 5,245 games in which this pair of maneuvers was played. It's readily apparent that the addition of this second maneuver drastically decreased the number of games found in the database.

It's obvious that over 5000 games is still too many to play through. But there are other uses for such a database search besides trying to play through individual games. You could merge the games into a tree to study the statistics -- for example, you could see what openings frequently lead to the maneuver. You could copy the games into a separate database and then use the Player Index to see what players found themselves in this situation more than once.

But if your main goal is to find games to replay, you'll likely want to find another way to pare down the results by adding even more criteria to the search. Leaving the Manoeuver tab as it's pictured above, click on the Game data tab and look for the Result section:


Put a check in the box next to "1-0" as shown in the illustration. Note that when you do, a check appears automatically in the "Game data" box (in the lower left of the illustration). You'll notice that there are checks in two of these boxes: Game data and Manoeuvers. This is done to remind you that you're performing a combined search, using parameters that have been set in both the Game data and Manoeuver sections of the Search mask.

We've just added an additional criterion to our search: games won by White. So our search can now be defined as "all games in which a White Bishop (starting on any square) moved to h7 and captured a pawn with check, and in which Black replied with ...Kxh7 between moves ten and thirty-five, and which White ultimately won".

Give "OK" a click and, after a minute or so, you'll discover that there are 2,488 games in which this occurred. You'll notice that ChessBase took a lot less time to perform this search compared to the previous two searches. Why is this? After all, we did add an additional parameter. So why is the search faster? The answer's pretty easy if you think a bit about it. Notice that we added a Game data parameter to the search and that dang near everything in the Game data section of the Search mask applies to information found in the headers of games. This points us right to a handy tip to remember when using ChessBase: the program always searches for header information first, before looking for positions, maneuvers, and material balances. This speeds up searches considerably. In our first two searches, ChessBase had to look at moves ten through thirty-five in every game in the database to see if the maneuver criteria applied. In this most recent search, it only looked at moves ten through thirty-five of the games that White won, ergo there was much less material that had to be sifted through.
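The "headers first" idea can be sketched in a few lines of Python. This only illustrates the principle described above; it is not ChessBase's code, and the record layout, the function names, and the four toy games are all invented. The cheap header test runs first, and the expensive move scan is performed only on games that survive it.

```python
def search(games, header_ok, moves_ok):
    """Return (hits, number of games whose moves had to be scanned)."""
    hits = []
    scanned = 0
    for game in games:
        if not header_ok(game["header"]):
            continue              # rejected cheaply; moves never examined
        scanned += 1              # only now do we pay for the move scan
        if moves_ok(game["moves"]):
            hits.append(game)
    return hits, scanned


# Toy data, made up for illustration.
games = [
    {"header": {"result": "1-0"}, "moves": ["Bxh7+", "Kxh7"]},
    {"header": {"result": "0-1"}, "moves": ["Bxh7+", "Kxh7"]},
    {"header": {"result": "1-0"}, "moves": ["Nf3", "Nf6"]},
    {"header": {"result": "1/2-1/2"}, "moves": ["Bxh7+", "Kh8"]},
]


def has_bxh7(moves):
    return "Bxh7+" in moves


# No header criterion: every game's moves must be scanned.
all_hits, all_scanned = search(games, lambda h: True, has_bxh7)

# Header criterion "White won": only the 1-0 games are scanned.
won_hits, won_scanned = search(games, lambda h: h["result"] == "1-0", has_bxh7)

print(len(all_hits), all_scanned, len(won_hits), won_scanned)
```

In the toy example, adding the result criterion cuts the number of move scans from four to two, which is the same effect (on a much smaller scale) that makes the "1-0" search finish faster than the two before it.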

All right, let's say that you want to narrow down the search even further. Let's pick a player and see how he does with this maneuver. Leaving all other parameters as they were for the last search, we'll go to the top of the Game data section of the Search mask, type "Alekhine" for the White player and uncheck the "Ignore colors" box:


Note that the Player fields are case-sensitive, i.e. they recognize capital and lower-case letters -- so make sure that the "A" in "Alekhine" is capitalized. Also make sure that it's typed in the left-hand box, since ChessBase lists players with their last names first and first initials or names second. You don't need to put anything in the right-hand box. You could put an "A" (for "Alexander") in this box, but you'd then miss all Alekhine games in which no first name or initial was given (which is not a problem with Big/Mega Database, but could be critical with other databases, especially those you've downloaded from the Internet). Likewise, you could type "Alexander" in the right-hand box, but this would miss all games in which the name was given as "Alekhine, A" or just plain "Alekhine". Remember the rule of thumb we've been discussing: the more information you put in the Search mask, the less information you tend to get back from the program.
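The effect of the two name boxes can be mimicked with a short Python sketch. It assumes that stored names look like "Last, First" and that each box is compared as a case-sensitive prefix of the corresponding part; that reading is inferred from the behavior described above, and the function name_matches and the sample names are hypothetical.

```python
def name_matches(stored: str, last: str, first: str = "") -> bool:
    """Case-sensitive prefix match on last name and, optionally, first name."""
    stored_last, _, stored_first = stored.partition(",")
    return (
        stored_last.strip().startswith(last)
        and stored_first.strip().startswith(first)
    )


# Sample spellings of the kind that turn up in different databases.
names = ["Alekhine, Alexander", "Alekhine, A", "Alekhine", "alekhine, A"]

last_only = [n for n in names if name_matches(n, "Alekhine")]
with_initial = [n for n in names if name_matches(n, "Alekhine", "A")]
with_first = [n for n in names if name_matches(n, "Alekhine", "Alexander")]

print(len(last_only), len(with_initial), len(with_first))
```

Filling in only the last name catches three of the four spellings (the lower-case entry is missed because the match is case-sensitive), adding the initial drops that to two, and typing the full first name leaves just one.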

Keep in mind, too, the reason we unchecked "Ignore colors": we want only games in which Alekhine played the White pieces.

Our new search can be defined as "All games in which Alekhine played White, and a White Bishop (starting on any square) moved to h7 and captured a pawn with check, and in which Black replied with ...Kxh7 between moves ten and thirty-five, and which White ultimately won".

Click "OK" to see what turns up. In considerably less than a minute (on a PIII 800), you'll see that there are six games in which this occurred -- all of which were won by Alekhine.

All right, I think that's plenty for one week. We've examined:

That last point is crucial to your understanding of how to use a database. I've answered dozens of e-mails from players who go hog wild when putting parameters into the Search mask and who then complain of the paucity of information returned: "I did a search for all annotated Fritz Saemisch games as Black in the Catalan in which a White Bishop was on the a8-h1 diagonal, a Black Rook controlled the open c-file, Black had a f7-g6-h7 pawn structure, White sacrificed a Queen on a6, with a Bishop vs. Knight endgame with four pawns for each player, and which Black won in under 50 moves, and I got nothing back. There should have been thousands of games." Yeah, right. I hope this week's column has provided the answer to that particular problem. It's not the program, it's the search.

Until next week, have fun!

You can e-mail me with your comments, suggestions, and analysis for Electronic T-Notes.