by Steve Lopez

We've put together a database and opening book on our chosen opening. It's time to link them up and get to work looking at some numbers.

The first step is to take the new database we created (see ETN March 11, 2001) and designate it as the reference database. Right-click on the database's icon in ChessBase 8 and select "Properties" from the popup menu. Put a check in the box next to "Reference database" and ignore any notification saying that the reference database should be larger. I change reference databases multiple times a day. I usually have it set for Mega Database 2001, but if I'm doing work on a specific opening I'll switch to a smaller database on just that opening.

You'll recall that the sample opening line I've been using is in the Queen's Gambit Accepted and goes as follows:

1.d4 d5 2.c4 dxc4 3.Nf3 Nf6 4.e3 e6 5.Bxc4 c5 6.0-0 a6 7.a4 Nc6 8.Qe2 cxd4 9.Rd1 Be7 10.exd4 0-0 11.Nc3 Nd5

So what we want to do now in ChessBase 8 is open up the tree and get to this position. Double-click on the icon for the tree and step through the moves until you reach the position after 11...Nd5:

You'll recall that we looked at this display in last week's ETN. We're seeing a breakdown of every move in the database that was played after 11...Nd5, along with statistical information about each of those moves. I have the moves ranked by number of games played, but (as described in an earlier installment) this can be changed to any of the other three columns just by clicking on the column header (for example, we can change the listing to reflect the relative success rate for White by clicking on the "%" column header).

We have a whole lot of raw information here. What do we do with it now? How do we interpret it? We first need to look at the nature of statistics.

I'll valiantly fight off the impulse to quote the old cliché about "three kinds of lies", but it's certainly a true enough statement (which, I suspect, is how it became a cliché). Statistics will lie like a cheap rug (another cliché -- no extra charge). You can never take statistical information at face value. Advertisers used to toss stats around pretty freely in their product campaigns. For example, "four out of five dentists surveyed recommend BreemCreem toothpaste over the leading competitor". OK, fair enough. But what exactly does that mean? The line at face value seems to indicate that 80% of all dentists think that BreemCreem is pretty hot stuff.

But let's look a bit more closely. Note that it says "four out of five dentists surveyed" -- this could mean a lot of different stuff. It's possible that this was a legitimate survey in which hundreds of dentists were queried as to their toothpaste preference. However, it's also possible that only five dentists were surveyed, two of whom own stock in BreemCreem, one of whom has a good-for-nothing brother-in-law who works in BreemCreem's shipping department, and the fourth dentist was actually making an honest recommendation. Another interpretation involves the use of the phrase "over the leading competitor". This could mean that those four dentists think that BreemCreem is utter crap but they still think it's better than FroomCreem, which sells better than BreemCreem (probably because FroomCreem's marketing mooks do a better job at manipulating stats than the BreemCreem guys). But maybe the survey asked dentists to rank several different brands and BreemCreem ranked higher than FroomCreem in four of the five surveys, but still finished way behind BlamCreem (the #1 toothpaste on all five surveys).

This is precisely the reason why you see a lot of small print in American TV ads -- the fine print explains a lot of this stuff (and it makes for pretty interesting reading if you've taped a TV show and use the pause button when this stuff pops up in the ads. You'll be positively amazed by some of it).

We don't even have to look at advertising to see examples of bad stats in action; you'll sometimes see this stuff appear in everyday life. I used to work in auto parts and I'd often hear some interesting comments from cow-workers. "Man, the driver's door handles on the new Zoommobiles must be total garbage! Look how many we've replaced under warranty compared to the other Zoommobile handles". To which the only sensible reply is "Duh!" The guy making the comment wasn't thinking about what he was saying. The driver's door handle is the one that's used most often (not too many people get into the car through the passenger rear door and climb over the seat to get to the steering wheel). Give a moment's thought to the ratios (i.e. number of breakages per handle compared to the number of uses per handle) and you find out that the handles aren't any more liable to breakage than any other door handle on the car.

The same thing applied to overall auto performance. "I'd never own one of these Zoommobiles," a technician once said. "Look how many of them come into the shop". Sure -- that's because Zoom Motor Company sells twice as many Zoommobiles as any other car they offer. More cars on the road equals more cars coming in for service. If we were going strictly by his criterion when purchasing a car, we'd all be driving $74,000 WhizzerSharks instead of $12,000 Zoommobiles.

The lesson is this: when confronted by stats, we need to look a bit deeper than the raw numbers. And this certainly applies to chess tree statistics.

When looking at this particular tree, we need to immediately consider a few things. Should we even be looking at the last four moves on the tree? The success percentages for them are in half-tone, meaning that these moves haven't been played often enough to get an accurate statistical gauge of their ultimate success or lack thereof. We'll come back to them later. For now, let's just stick to the top eight moves on the list.

The move 12.Qe4 has been played the most often. The success rate isn't bad (53%, which makes it pretty average). Looking at the bar graph at the bottom of the display, we see that 52% of the games were drawn. Of the decisive results, White has the edge: 26% to 20%. But look at the number of games instead of the stats. In this case, White's edge doesn't look nearly as good: 31 games to 24 -- not a huge margin by any means. Looking at the rating information, we see that the average rating of the players who made this move is 2450, with a performance rating of 2463 -- impressive enough for those of us down here in the fishpond.
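The 53% figure can be reproduced from the game counts. The following is a minimal sketch, assuming the tree uses the usual chess scoring convention (one point for a win, half a point for a draw); whether ChessBase 8 computes it exactly this way is an assumption. The function name is made up for illustration, and the draw count of 60 is derived from the 115 games minus the 31 White wins and 24 Black wins.

```python
def success_rate(wins, draws, losses):
    """Return White's score as a fraction of the points available.

    Assumes the usual chess scoring: 1 point per win, 0.5 per draw.
    """
    games = wins + draws + losses
    if games == 0:
        raise ValueError("no games to score")
    return (wins + 0.5 * draws) / games


# 12.Qe4: 115 games, 31 White wins, 24 Black wins, so 60 draws (derived).
qe4 = success_rate(wins=31, draws=115 - 31 - 24, losses=24)
print(f"12.Qe4 scores {qe4:.0%} for White")  # prints "12.Qe4 scores 53% for White"
```

Seen this way, a lot of that percentage is made up of half points from draws, which is why the raw win and loss counts are worth a look on their own.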

But there's another snag here: what if the move did fairly well for a number of years, but has recently been "busted" by a new move for Black? The stats will show that the move looks pretty good. It's been played in 115 games in the database. But what happens if it did well for 114 games and suddenly got smashed in one game? In that case, the move 12.Qe4 would be toast and nobody would play it anymore. Hmmm...

Another criterion to consider is the straight percentage info. In this case, 12.Bd2 gets the nod. It only appeared in 18 games but has a phenomenal success rate: 67%. The bar graph shows eight wins for White, eight draws, and only two wins for Black. But, once again, one of those wins might be a "bust", so we'll need to look at the games more closely.

Yet another evaluation criterion is the average rating of the players who chose a particular move. The move 12.Bb3 scores best here, with an average rating of 2512, a success rate of 57%, and a 5 to 2 win ratio for White (with 16 draws). But the fly in the ointment with this move is exactly the criterion we used to select it -- the rating average. The average rating of 2512 is pretty high; what if the move requires precise tactical or positional skills to exploit, skills possessed by a grandmaster but sadly lacking in Steve the Patzer?
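To see how the three criteria pull in different directions, here is a small sketch that ranks the three moves discussed so far by each column in turn. The figures are the ones quoted in the text; the 23 games for 12.Bb3 is derived from its 5 wins, 2 losses, and 16 draws, and the average rating for 12.Bd2 is left empty because the text doesn't give one. The data layout and the function name are assumptions made for illustration only.

```python
# Figures quoted in the text for three candidate moves after 11...Nd5.
# avg_rating is None where the text does not give one.
moves = [
    {"move": "12.Qe4", "games": 115, "pct": 53, "avg_rating": 2450},
    {"move": "12.Bd2", "games": 18, "pct": 67, "avg_rating": None},
    {"move": "12.Bb3", "games": 23, "pct": 57, "avg_rating": 2512},  # 5 + 2 + 16 games
]


def best_by(criterion):
    """Return the move with the highest value in one column, skipping gaps."""
    known = [m for m in moves if m[criterion] is not None]
    return max(known, key=lambda m: m[criterion])["move"]


for criterion in ("games", "pct", "avg_rating"):
    print(f"{criterion:>10}: {best_by(criterion)}")
```

Each criterion picks a different move, which is exactly why none of them can be taken at face value on its own.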

The good news is that there are some tools provided in ChessBase 8 that we can use to at least partially answer these numerous questions. Let's examine these various approaches:

1) The Opening Report. This is a pretty handy tool, as we saw in ETN a few weeks ago (February 25, 2001). In this particular case, we want to examine the bar graph displaying when the move was played. If a move was very popular over a span of years but its usage suddenly drops off to nearly nothing, it's possible that the move's been busted. It's also possible that the move has just gone "out of style". What you'll want to do next is a database search on that position. In this case, we'll check out 12.Qe4. Highlight the move in the game tree and then either single-click on it or give the right cursor key a whack to make the move on the chessboard. Right-click in an empty spot on the game tree and select "Search games":

Looking at the board, we see that it's the correct position, so we just click "OK" to have ChessBase 8 search the reference database (you'll recall that we designated our database on the QGA as the reference database back at the beginning of this article). We get a small pane showing the games found in the search. Right-click on the first game to get a popup menu, select "Edit", and then "Select all" to highlight all of the games found in the search. Then right-click again, select "Edit", and then "Copy to clip database" to copy the games to the ChessBase 8 clipboard. Click on the "ChessBase 8" icon on your Windows taskbar to bring the database window back up on top and then double-click on the "Clip database" icon to open it up. Right-click on the first game in the database to get a popup menu, select "Sort", and then "Date" from the submenu. The clipboard list will be sorted to display the games in chronological order.

Scrolling down to the bottom of the list, we see that 12.Qe4 was played five times in the year 2000. There were two White wins, two draws, and a Black win. The win for Black took 67 moves, which seems to indicate that there's no immediate bust here. However, it's best to open up the game and play through it to make sure. For those of us down here in the fishpond, it's also a pretty good idea to have a chess engine running as we play through the game; we'll keep an eye on the evaluations provided by the engine to see when the swing to Black's favor occurred in the game. Double-click on the game in the list to open it, hit F3 to select an engine, and then start playing through the game. You can use the VCR buttons provided to do this; these work just like your TV's VCR (except that there's no blinking "12:00").

In the game in question (Hanko-Demina, Zvolen 2000), we see that White was rated an Expert while Black was rated at Master level. The position stays pretty even until White slips up on move 27 and gives Black a slight advantage. Black increases it to a definite edge by the 28th move. White makes another questionable move at move 39 and the evaluation shows that Black has the game in the bag. (The game also leads to a nice Rook vs. Bishop endgame that's worth replaying, by the way). So there's no bust here.
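The "sort by date and look at the recent results" check can also be sketched outside of ChessBase. This is only an illustration, assuming you have the search results as a plain list of (year, result) pairs; the list format and the function name are hypothetical. The five entries are the year-2000 games described above, and a real list would hold every game found by the search.

```python
from collections import Counter, defaultdict

# Results are from White's point of view: "1-0", "1/2", "0-1".
# Only the five year-2000 games described in the text are listed here.
games = [
    (2000, "1-0"),
    (2000, "1-0"),
    (2000, "1/2"),
    (2000, "1/2"),
    (2000, "0-1"),
]


def results_by_year(games):
    """Group results by year so a sudden run of losses stands out."""
    tally = defaultdict(Counter)
    for year, result in games:
        tally[year][result] += 1
    return dict(tally)


for year, counts in sorted(results_by_year(games).items()):
    print(year, dict(counts))
```

A tally like this only tells you where to look; a suspicious year still means opening the games and playing through them.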

A handy reminder: the CB8 clipboard is not cleared automatically when you close it, so if you copy another set of games to it, they'll be added to the batch currently on the clipboard. To clear the clipboard, right-click on its icon and select "Erase". This does not affect the games in the original database -- it just empties the clipboard itself.

2) The Book Analysis window. This is another useful tool for further statistical analysis. With 12.Qe4 highlighted in the tree, go to the Window menu, select "Panes", and then "Book analysis window" from the submenu:

ChessBase 8 works for a few moments and then displays the above information pane. CB8 "peers ahead" at the possibilities in the tree and presents us with the most often played variations, the ones the program thinks we should look at based on the statistical results in the tree. These will not necessarily include the highlighted move in the tree -- the display is based on the current board position, not on the highlighted move. Once again, this display is based on statistics and should be treated accordingly, but it does provide some valuable pointers.

For example, we immediately see that the top line presented is 12.Qe4 Nf6 13.Qe2 Nb4. But look at the figures given in parentheses: White has a 62% success rate, but the number of games in the database has jumped to 232 (up from 115 as displayed in the tree). So we know right away that there are transpositional possibilities after 12.Qe4.

The book analysis window can offer even more display options. Right-click in the window and select "Min %". This brings up a dialogue in which we enter a number. We can use this dialogue to have CB8 display a greater or fewer number of variations, depending on a percentage of the games in the tree in which a variation was played. The display above was generated with this figure set to "5", meaning that a variation had to appear in at least 5% of the games for it to be displayed. If we bump this up to "10", we get variations that appeared in at least 10% of the games (and this cuts the number of displayed variations down to two). This is useful for eliminating odd side variations from this display.
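The effect of the "Min %" setting can be pictured as a simple threshold filter. This is a rough sketch of the idea, not CB8's actual implementation; the function name, the variation labels, and the game counts are all invented for illustration.

```python
def filter_variations(counts, total_games, min_pct):
    """Keep variations played in at least min_pct percent of the games."""
    if total_games <= 0:
        raise ValueError("total_games must be positive")
    return {
        line: n
        for line, n in counts.items()
        if 100 * n / total_games >= min_pct
    }


# Hypothetical game counts per variation, invented purely for illustration.
counts = {"line A": 60, "line B": 25, "line C": 9, "line D": 6}

print(filter_variations(counts, 100, 5))   # keeps all four lines
print(filter_variations(counts, 100, 10))  # keeps only line A and line B
```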

Right-clicking and selecting "Variation board" gives us a mini-board with some VCR buttons. We can use this to play through the displayed variation without disturbing the position on the main board and in the tree display. Just highlight a variation, activate this board, and you'll see the position at the end of the highlighted variation.

A potentially very useful command from the popup menu is "Critical line". This will display the most successful line for the moving side, based on a statistical analysis of the games in the tree. Or, as the CB8 Help file puts it: "The critical line is the one in which both White and Black make the statistically most promising moves. It is displayed in red at the bottom of the book analysis window." In this case, we see that 12.Bd3 Ncb4 13.Be4 Nf6 14.Ne5 scores only 25% for Black, based on two games in the database. Two games are a very small sampling upon which to base a decision, so it would be best to play through both of these games (with an engine running in the background) before deciding on a course of action.
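One way to picture what the Help file describes is a walk down the tree in which each side in turn picks its statistically best move. The sketch below is only a guess at the idea, not CB8's actual algorithm: it assumes each move carries a single success percentage for White, that White picks the highest figure and Black the lowest, and it ignores how many games stand behind each number. The tree, the placeholder move names, and the percentages are all invented.

```python
def critical_line(node, white_to_move=True):
    """Follow the statistically best reply for each side in turn.

    node maps a move to (white_pct, child_node). White picks the highest
    White percentage, Black the lowest.
    """
    line = []
    while node:
        pick = max if white_to_move else min
        move, (white_pct, child) = pick(node.items(), key=lambda kv: kv[1][0])
        line.append((move, white_pct))
        node = child
        white_to_move = not white_to_move
    return line


# Hypothetical tree with placeholder move names and invented percentages.
tree = {
    "W-a": (55, {"B-a": (60, {}), "B-b": (48, {})}),
    "W-b": (62, {"B-c": (70, {}), "B-d": (58, {"W-c": (75, {})})}),
}

print(critical_line(tree))
```

Because a walk like this never asks how many games support each percentage, a line resting on two games can come out on top, which is the reason for playing through those games before trusting it.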

3) Stepping through the tree. This can be a very useful procedure. In some ways, functions like "Critical line" in the book analysis window eliminate some of the need for this, but "Critical line" is used as a shortcut far too often. Even if you use "Critical line", you should step through the game tree a move at a time and try to discern the reasons why each move was played to increase your understanding of the variation. Starting with the position after 11...Nd5, have a look at some of the variations and positions farther along in the tree, based on criteria of your own choosing: success rate, number of games played, etc. What you want to accomplish here is a deeper understanding of these variations and the possibilities that arise once Black has played the Knight to d5. At key points in this examination, you might want to right-click in the tree and select "Search games" to look at some full games in which a particular position appeared. If you get stuck on a particular position when trying to figure out why a move was or wasn't played, consult a chess engine.

4) Playing through the games. Suggestions 1 and 2 provide some nice shortcuts, but nothing beats actually playing through the games themselves. Highlight 12.Qe4, give the right-cursor key a whack to make the move, right-click, and select "Search games" to get a list of the games in which that move appeared. Copy them to the clipboard (as in procedure #1 above), open the clipboard, and look for annotated games (the ones that have a "V" or "C" in the rightmost column). You can easily play through the games, play through the variations, and read the commentator's notes. You might even run a chess engine in the background to double-check the annotator's analysis (keeping in mind the positional limitations of chessplaying programs, of course). In many cases, you'll also see middlegame and endgame notes that definitely increase your understanding of the variation in question.

Now let's return to the four "bottom" variations in the game tree. 12.Ba2 just looks crummy (25% for White) but it was only played in four games, so it's an easy enough matter to play through the games and see if there's a decent idea hidden somewhere in the resulting positions. 12.h4 scores a whopping 75%, but it only appeared in two games (in which ratings aren't available for the participants), so again we need to look at the games. As always, the point is not to be driven by the numbers in the tree, but to look at the possibilities (or lack of them) that each of these moves affords. You can use any (or all) of the four procedures offered, but remember that the key is always to play through some games to verify the statistical information. The statistics can provide you with guideposts, but don't allow your thinking to be dominated by the numbers.

Note that in every one of the four procedures I offered, playing through games is repeatedly stressed. This is no accident. ChessBase is not and never has been a "magic answer machine". It is a data storage and retrieval system. It can eliminate literally hours of work when you're looking for information. You don't have to dig through book after book looking for all of the games in which a position appeared or in which a variation was played -- ChessBase will pull that information up for you in a flash. ChessBase will even offer suggestions as to what should be played in a position, based on the database information provided to the program when you invoke a function like the Opening Report or when you create a tree of games. In many cases, this information will be spot on. But there are sometimes instances in which a slavish devotion to the raw numbers generated by a statistical function will steer you wrong. If you memorize an opening based strictly on numbers and percentages, you learn nothing. But the statistical information in a game tree, or generated by an opening report or the "Critical line" feature, will offer you guidelines on what you should be studying. ChessBase isn't some kind of Delphic oracle. It's just a tool (albeit a very powerful one) and should be considered as such. It can't do your thinking for you. The process of assimilating the information ChessBase provides is all up to you -- and, as I've said numerous times earlier in this series, understanding beats memorization hands down every time.

The big things you want to do when playing through games you've uncovered in a search are to understand the positions/variations and to see if you're comfortable with them. A particular variation might be theoretically sound and score big statistically, but if the resulting positions lead to closed strategic games and you're a fiend for wide-open tactical positions, you might not like the positions you get from a certain variation (and you might even completely screw them up when confronted by them in your own games). On the plus side, if you find yourself weak in matters of long-term strategy, such positions might move you to start studying strategic themes in an effort to become a more well-rounded player (the old "when life gives you a lemon, make lemonade" cliché). As always, it's a matter of what you choose to do with the information you uncover using ChessBase 8.

A couple of side notes before I close off ETN for another week. Fritz6 owners can do database searches right from a tree as outlined in procedures #3 and #4 above. In this case, you'd load your Queen's Gambit Accepted database (F12 to open the database window, then go to the File menu and select "Open/Database" to load it). Leave this window open, but go back to the main chessboard screen by clicking the Fritz6 button on your Windows taskbar. Load your QGA opening book by going to the File menu and selecting "Open/Openings book". Then click on the "Openings book" tab in the Notation pane to display the book's contents. When you see an interesting position and want to play through some games, right-click on an empty spot in the tree and select "Search games" from the popup menu. This brings the database (game list) window back up on top and will display the search results after a few moments. Note, though, that procedures #1 and #2 ("Opening Report" and "Book analysis window") are not available in Fritz6.

And, finally, a subject near and dear to my heart. All of the above procedures involving using a chess engine are directed at players who are doing "home preparation" on an opening to get ready to use it in a future game. If you're a correspondence player researching moves in response to an opponent's move, PLEASE do not use a chess engine at ANY point in your research unless your particular correspondence federation allows such usage or the rules of the particular tournament you're playing in allow it. Database usage is allowed in just about every correspondence event, as is the use of statistical game trees. Even functions such as "Critical line" are allowed (because, in theory, given enough time and determination, you could actually derive the information yourself by doing the bean counting manually). But using a chess engine to generate or "double-check" a move is almost universally disallowed. It's unfair, unethical, morally wrong, completely despicable, and you're scum if you do it. And the argument that "everyone's doing it" just doesn't wash. Two wrongs don't make a right (I've done the math). If you cheat by consulting a chessplaying program during a correspondence or online game, you're a lowlife, plain and simple. I can't make it any more plain than that.

Guess what? We're still not finished with this series. More tips are forthcoming next week. Until then, have fun!

You can e-mail me with your comments, suggestions, and analysis for Electronic T-Notes.