by Steve Lopez

Last week we put together a database of games on the selected opening (our example uses the Queen's Gambit Accepted, ECO codes D20 through D29). This week we'll merge those games into an opening tree so we can do some statistical analysis.

Follow the instructions for creating a new opening book/tree (the terms are used interchangeably) in ChessBase 8 as given in Part 5 of this series (ETN, March 4, 2001). When you're done, you'll have a new icon for the opening book. Then you just drag and drop the icon for the opening database you created in last week's ETN over to the icon for the opening tree. You'll see a dialogue that looks like this:

The "Games" fields are self-explanatory. In the graphic, the program will merge games 1 through 22940 (the whole database) into the tree. If, for some reason, you only want to merge a part of the database you can use these fields to configure what games will be merged into the tree.

"Length" requires a bit of explanation. A box allows you to type in the number of plies (half-moves) that will be merged into the tree. The default is "20", meaning that the lines of the opening book will be 20 plies (10 moves) long. This might be sufficient for some openings. Some heavily analyzed theoretical openings (such as the Ruy Lopez Zaitsev) might require a much higher value, such as 40.

"Absolute length" means that every variation imported into the tree will be the length you set in the box, regardless of whether it's a main line or an oddball branch variation. You'll typically use this for openings that cover a single ECO code.

"ECO relative length" means that frequently-played main lines will result in variations that are longer than the less-often played branch variations. With the ply value set for 30, the branch variations might be cut off after 8 or 9 moves while the main lines will be included out to move 15.

It's your choice, ultimately. If you're only interested in the tried and true theoretical lines, use the "ECO relative length" setting. If you want it all, no matter how odd, unsound, or poor the variation, use "Absolute length". My personal preference is to usually select "Absolute length". I frequently use these statistical trees in correspondence games; I find that my opponents often play sub-optimal opening moves. Having a larger, more inclusive tree lets me see these variations in action.

The final setting is "Include variations". If you have a lot of annotated games in your database, you'll have a lot of illustrative variations included by the annotators. These might be additional main line theory or they might be crappy little pitfalls and traps which are included to show ways in which a player can go wrong and how these errors can be punished. Checking this box includes the variations in the tree. Again, as a correspondence player, I live by these variations. I'm the "cheapo king" in correspondence chess -- I win about 25% of my games right in the opening, mainly due to these variations that I include in the trees I construct. Again, it's ultimately your choice.

Once you've made your selections, click "OK" and the program will go to work constructing a tree. The speed of the process depends on the number of games, whether or not you choose to allow variations, the depth you select for the tree, and the speed of your processor. With 22940 games, set for an absolute length 30 ply tree with variations included, it took exactly 40 seconds to generate a tree on a Pentium III 800 MHz machine.
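
If you're curious what a merge like this amounts to, here's a rough Python sketch of the "Absolute length" idea. It's a toy built on my own assumptions, not how ChessBase actually stores a tree: the `build_tree` function, the `games` list, and the result strings are all made up for illustration. It also keys each node by move sequence rather than by board position, so it won't join transpositions the way a real opening tree does.

```python
from collections import defaultdict


def build_tree(games, max_plies=20):
    """Merge games into a move-sequence tree, cutting every line at max_plies.

    games: iterable of (moves, result) pairs, where moves is a list of move
    strings and result is "1-0", "1/2-1/2" or "0-1" (assumed format).
    Returns a dict mapping each move-sequence prefix (a tuple) to the
    result counts of the games that passed through it.
    """
    tree = defaultdict(lambda: {"1-0": 0, "1/2-1/2": 0, "0-1": 0})
    for moves, result in games:
        # Absolute length: every game contributes at most max_plies half-moves,
        # whether it follows a main line or an oddball sideline.
        for ply in range(1, min(len(moves), max_plies) + 1):
            tree[tuple(moves[:ply])][result] += 1
    return tree


# Three invented mini-games, just to exercise the function.
games = [
    (["d4", "d5", "c4", "dxc4", "Nf3", "Nf6"], "1-0"),
    (["d4", "d5", "c4", "dxc4", "e4", "e5"], "1/2-1/2"),
    (["d4", "d5", "c4", "dxc4", "Nf3", "a6"], "0-1"),
]
tree = build_tree(games, max_plies=5)
```

With `max_plies=5`, nothing past White's third move makes it into the tree, which is the same cutoff behavior the "Length" box controls.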

In Fritz6, you create a new empty tree by going to the File menu, selecting "New", and "Openings book". Choose a folder for the book and change the filename from "noname.ctg" to something that makes a bit more sense to you. Once the book is created, click on the "Openings book" tab in the "Notation" pane, and look under the little picture of a tree to make sure the filename is correct (in other words, make sure you're in the correct book). Then go to the Edit menu, select "Openings book", and "Import games". The Windows file select dialogue appears; use it to select the database you created (in last week's ETN), and then follow the steps as given above for setting the parameters.

In ChessBase 8, you open the tree by double-clicking on the icon for it. In Fritz6, you can load an opening book by going to the File menu, selecting "Open", and then "Openings book" and making your choice from the Windows file select dialogue, followed by clicking the "Openings book" tab in the Notation pane. You should then see a display that looks something like this:

Here's what we first see in the tree: a menu of White's first moves in the Queen's Gambit Accepted. Immediately we have questions. Why are there moves other than 1.d4 displayed? Transpositions. The games that began with these other moves reached the QGA by a different move order. This can be very useful information if you're looking for a way to hoodwink your opponent into unwittingly playing your pet opening by going a different route to get there.

Why is the "50%" after 1.e3 shown in half-tone? It was only played in three games, which isn't enough to get a valid statistical sampling, so the half tone is there as a reminder to you to take this number with a pinch of salt.

The columns in this tree are explained in detail in the ETN issue for February 20th, 2000, so I won't go into major explanations here. I'll just give a quick review of what the columns signify:

You can sort the moves in the list by clicking on a column header. For example, clicking on the "%" symbol re-sorts the moves so that they're listed in order of success, best to worst.
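
Here's a small Python sketch of that kind of sort, with a flag for thin samples like the three-game 1.e3 above. I'm assuming the "%" column is the usual score percentage (a win counts as one point, a draw as half). The rows are invented numbers, and `MIN_GAMES` is a cutoff I picked for illustration; I don't know the threshold the program really uses for its half-tone display.

```python
def score_percent(wins, draws, losses):
    """Score from the mover's point of view: a win counts 1, a draw 1/2."""
    games = wins + draws + losses
    return 100.0 * (wins + 0.5 * draws) / games


# Hypothetical (move, wins, draws, losses) rows, not taken from the real tree.
rows = [
    ("d4", 40, 45, 15),
    ("Nf3", 10, 8, 2),
    ("e3", 1, 1, 1),
]

MIN_GAMES = 10  # assumed cutoff for a usable sample, chosen for illustration

# Sort best to worst by score, as clicking the "%" header does.
ranked = sorted(rows, key=lambda r: score_percent(*r[1:]), reverse=True)
for move, w, d, l in ranked:
    n = w + d + l
    flag = "" if n >= MIN_GAMES else " (small sample)"
    print(f"{move:4} N={n:3} {score_percent(w, d, l):.1f}%{flag}")
```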

There are some additional tweaks to the tree that can be performed by right-clicking in it and selecting "Properties":

"Np" shows the total number of unique positions contained in the tree.

Checking "Unplayed transpositions" will show transpositional move orders that never appeared in any of the games in the database. Since an opening tree contains thousands of positions, and these are "threaded" together, you can sometimes find strange never-played move orders that lead to the current board position.

"Retromoves" causes the tree to display the played moves just prior to the current board position (the moves that led to this point).

Unchecking "Show Elo numbers" causes the "Av" and "Perf" columns to become blank.

Checking "Statistics" displays a bar graph at the bottom of the opening tree pane. We'll cover this in a minute.

You'll recall way back in Part Two that we were researching a specific line of the Queen's Gambit Accepted:

1.d4 d5 2.c4 dxc4 3.Nf3 Nf6 4.e3 e6 5.Bxc4 c5 6.0-0 a6 7.a4 Nc6 8.Qe2 cxd4 9.Rd1 Be7 10.exd4 0-0 11.Nc3 Nd5

We can easily navigate through the tree to get to the position after 11...Nd5. Just click on a move with the mouse to cause it to be played. You can also use the cursor keys on the keyboard. The up and down arrow keys move the cursor up and down through the current move choices, while the left and right arrow keys step backwards and forwards through the tree. Hitting the right arrow key plays the currently highlighted move, and the menu of moves refreshes to show you the replies to that move. Hitting the left arrow key takes back the last-played move and returns to the previous board position.

Let's step through the moves and get to the display after 11...Nd5:

Looking at the top of the display, we see that this position (after 11...Nd5) occurred in 334 games in the database. White (the moving side) scored 50% from this position. The average Elo rating of the White players was 2448, while their performance rating was slightly lower at 2436.

White has a lot of choices here. Rather than look at individual choices, let's just talk about the display right now. Some of the moves are marked with the "!?" notation symbol. This means that there is at least one game in the database in which an annotator has marked the move with that symbol. There is no rating information after 12.h4; this means that there were no ratings available for either the White or Black player for games in which that move was played. The rating info after 12.Nxd5 might look a bit weird at first -- why is there a performance rating but no average rating? That's simple enough: no rating information was available for the White player in that one game, but there was a rating available for the Black player.

Beneath the main list, we see two Black moves: 11...Nf6-d5 and 11...0-0. This shows us that the position was reached by two different move orders by Black, so we know that there are some minor transpositional possibilities here.

At the bottom of the display, we see the bar graph I referred to earlier (the one you switch on by checking "Statistics" in the Properties dialogue). This graph is specific to whatever move is currently highlighted in the main display. In this example, these stats apply to the games in which 12.Qe4 was played. If we use the down arrow key to bump the cursor down to 12.Bd3, we'll see the stats display change accordingly.

There are three bars on the graph. The bright green one shows White wins, the khaki one shows draws, and the red one shows Black wins. The length of the bars shows us at a glance the relative proportion of these results. If we want more specific info, the actual numbers are shown to the right of the bar graph. In the illustration, White won 31 games (26% of the total number in which 12.Qe4 is played), 60 games (52%) were draws, and Black won 24 games (20%). "N=" shows us the total number of games in which 12.Qe4 was played, 115 in this case. Take this next part with some salt, please. I think the parenthetical number indicates that the move appears an additional eleven times in game annotations, as opposed to the moves as they were actually played in the game.
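
If you want to check the arithmetic behind these figures yourself, here's a quick Python sketch using the numbers quoted above. The variable names are mine. Note that the displayed percentages (26, 52, 20) match what you get by dropping the decimals rather than rounding, which would explain why they add up to 98 instead of 100; that's my inference from the numbers, not something I've seen documented. The last line works out White's overall score with 12.Qe4, counting a draw as half a point.

```python
wins, draws, losses = 31, 60, 24   # figures quoted above for 12.Qe4
total = wins + draws + losses      # should agree with the "N=" value


def pct(count, total):
    """Share of the total as a percentage, with decimals kept."""
    return 100.0 * count / total


shares = {
    "white": pct(wins, total),
    "draw": pct(draws, total),
    "black": pct(losses, total),
}

# Dropping the decimals reproduces the figures shown next to the bar graph.
displayed = {name: int(value) for name, value in shares.items()}

# Overall score for White: a win counts 1, a draw 1/2.
score = 100.0 * (wins + 0.5 * draws) / total

print(total, displayed, round(score, 1))
```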

The last two figures deal with the average ratings for both White and Black. We see that White's average rating was 2450, based on 89 games in which ratings were available for the White player. Black's rating average was 2442, based on 92 games.
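
The "based on 89 games" part is worth a second look: the average is taken only over the games where a rating was actually available. Here's a minimal Python sketch of that idea, assuming a missing rating is stored as `None`. The `average_rating` function and the sample ratings are invented for illustration, not pulled from the database.

```python
def average_rating(ratings):
    """Average only the known ratings; return (average, games_used)."""
    known = [r for r in ratings if r is not None]
    if not known:
        # No rated players at all: nothing to show in the column.
        return None, 0
    return round(sum(known) / len(known)), len(known)


# Five hypothetical games; two have no rating for the White player.
white_ratings = [2500, None, 2400, 2450, None]
avg, n = average_rating(white_ratings)
print(avg, n)
```

An empty result here corresponds to the blank rating columns we saw after 12.h4.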

Last week we created a database and this week we merged all the games together into a tree. So what's next? We need to look into how to use these two information sources in combination to increase our knowledge of our selected opening. That one's going to be a long extended rant, so we'll save it for the next ETN. Until next week, have fun!

You can e-mail me with your comments, suggestions, and analysis for Electronic T-Notes.