WELCOME!

Welcome to Electronic T-Notes, the weekly on-line newsletter of ChessBase USA. In Electronic T-Notes, you'll find the latest news and information about ChessBase software, along with tech tips and articles about the world of chess information that ChessBase provides.

Please allow me to introduce myself. I'm Steve Lopez, editor of ChessBase USA's on-line publications. If you're a long-time ChessBase user you might remember me from a few years ago: I was Director of Customer Services for ChessBase USA from 1993 to 1996. But now I've returned to my first love -- writing -- and I'll be your host in this forum over the coming weeks and months.

Please keep in mind that Electronic T-Notes is your forum. Please don't hesitate to send me suggestions for articles or tech questions that you'd like to see answered here. You can send your ideas, questions, and comments to Don Maddox at luzhin@intrepid.net and he'll forward them to me. Your feedback is greatly appreciated, so please let me know your ideas on how to improve Electronic T-Notes!


THE TYRANNY OF NUMBERS

(PART ONE)

by Steve Lopez

This series of articles is based on "An Introduction to Statistical Game Trees", which appeared in Touching Base #1-3. I've recently received a number of requests from readers to provide them with copies of the old article, so I thought that a revised version would be a good start for our new publication.

Poor CBTree! It's the most misunderstood program that ChessBase offers. It's the James Dean of our inventory. There are a couple of apparent reasons for this lack of understanding (and neither one involves driving cars off a cliff):

1) The concept of a game "tree" is not widely known by beginning to intermediate players;

2) Many people are unsure how to interpret the statistical data that CBTree generates.

Over the next several weeks, I propose to tear away the veil of mystery that surrounds statistical tree functions and make you an expert at interpreting the data that CBTree provides. (And the general statistical principles we'll examine apply to any statistical chess tree software, not just CBTree; we'll call it an "added bonus.")

To understand statistics generated by a tree program, you must first understand what a chess tree is. A lot of chess writers have gotten plenty of mileage out of describing the tree concept (most notably Alexander Kotov in Think Like a Grandmaster and Andy Soltis in The Inner Game of Chess). It's actually a pretty simple idea and one that relates directly to something you do every time you play a game of chess.

We'll start with a board position. Here's the one that arises after 1.e4 e5 2.Nf3 Nc6 3.Bb5 a6 4.Bxc6:


After White has captured the Knight on c6, Black must decide what to play next. He studies the board and decides on a couple of moves that he might play. These are known as candidate moves. In this particular position, he has only two moves that make sense: 4...bxc6 and 4...dxc6. Anything else loses material to White.

If Black is smart, he'll try to figure out White's possible responses to either of his two candidate moves before he actually makes his move on the board. If he's feeling particularly energetic, he'll then try to visualize his replies to each of White's moves. If we make a diagram out of the moves that Black considers, we might wind up with something that looks like this:


At the bottom of the diagram, you'll notice the move that White actually played. Following the lines upward, you'll come to the moves that Black is considering playing. Above these, we see White's responses to those possible Black moves.

If we extended the diagram further, we'd see Black's 5th move responses to each of the White 5th moves, and above those we'd come to White's 6th move responses to each of Black's 5th moves. As we progressed to each new level, we'd see greater and greater numbers of moves spreading farther and farther out to the sides. This spreading effect makes the diagram look a lot like the branches of a tree. Consequently, we call a set of candidate moves, along with their responses, a game tree.
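
For readers who like to see an idea spelled out, here is a minimal Python sketch of a candidate move tree stored as nested dictionaries. Black's two candidate moves come from the text above; the White fifth moves are illustrative placeholders, not a record of what the diagram showed, and the helper name count_by_depth is made up for this example.

```python
# Candidate-move tree for the position after 1.e4 e5 2.Nf3 Nc6 3.Bb5 a6 4.Bxc6.
# Each key is a move; each value is the dict of replies considered to that move.
# The White fifth moves below are illustrative placeholders only.
candidate_tree = {
    "4.Bxc6": {
        "4...dxc6": {"5.O-O": {}, "5.d4": {}, "5.Nc3": {}},
        "4...bxc6": {"5.O-O": {}, "5.Nxe5": {}},
    }
}


def count_by_depth(tree, depth=0, counts=None):
    """Return a list where counts[d] is the number of moves at level d."""
    if counts is None:
        counts = []
    # Open a new level the first time we reach it with moves to count.
    if tree and len(counts) <= depth:
        counts.append(0)
    for replies in tree.values():
        counts[depth] += 1
        count_by_depth(replies, depth + 1, counts)
    return counts


print(count_by_depth(candidate_tree))  # [1, 2, 5]
```

Even in this tiny example each level is wider than the one before it, which is the spreading effect that gives the tree its name.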

Notice that this is not the complete game tree. Nobody in his right mind tries to look at every possible move he could play in a position, much less all of the possible responses, etc. In a typical middlegame, the moving side may have as many as 35 possible moves available to him. Assuming that his opponent has an equal number of responses to each of the moving player's options, there would be 1,225 positions for the moving player to evaluate after just one move for each player. Add on 35 responses for the moving player to each of his opponent's moves and the number suddenly jumps to 42,875 board positions that would have to be evaluated. Comparatively speaking, our candidate move tree is just a pleasant little bush!
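
The arithmetic behind those figures is just repeated multiplication, and a few lines of Python will check it. This is a sketch that assumes a flat 35 moves at every turn, as in the paragraph above; the names BRANCHING and positions_after are made up for the example.

```python
BRANCHING = 35  # moves available at each turn, the middlegame figure used above


def positions_after(half_moves, branching=BRANCHING):
    """Positions to evaluate after the given number of half-moves (plies)."""
    return branching ** half_moves


for half_moves in (1, 2, 3):
    print(half_moves, positions_after(half_moves))
# 1 35
# 2 1225
# 3 42875
```

Try a larger number of half-moves and you'll see why nobody calculates the complete tree.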

So you can see that the idea of a game tree is not terribly difficult to understand. When it's applied to a database of games, it's even easier to comprehend. A tree created from a database of games takes into account all of the moves that were actually played (rather than those that could have been played) in a given position.

Here's where CBTree comes in. What CBTree does is take a batch of games and make a game tree out of them. You can start at the starting position for a game of chess and see all of White's first moves that were actually played in the games of that database. After choosing one of those initial White moves, the program will then show you all of Black's responses played in the games of that database. In effect, you're examining all the games in the database at once rather than looking at them one at a time as you normally do in ChessBase or Fritz. CBTree also catches transpositions (positions arrived at by more than one move order), so it's really a positional database rather than a games database. Be aware, though, that CBTree only examines main lines and not the variations that may be included in some of the games in the database.
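
To make the idea concrete, here is a minimal Python sketch of building a tree from a batch of games and listing the moves played at a given point. It is not how CBTree itself is written. It keys each point in the tree by the sequence of moves played so far, which means it does not merge transpositions the way CBTree does; a real positional database would key on the board position instead. The four games are toy data and the function names are made up for the example.

```python
from collections import defaultdict


def build_move_tree(games):
    """Map each move-sequence prefix to a tally of the next moves played.

    games: list of move lists (main lines only, no variations).
    Keys are tuples of the moves played so far, so transpositions
    are NOT merged in this simplified version.
    """
    tree = defaultdict(lambda: defaultdict(int))
    for moves in games:
        for i, move in enumerate(moves):
            tree[tuple(moves[:i])][move] += 1
    return tree


def show(tree, prefix=()):
    """Return (move, frequency, percent) rows for the point reached by prefix."""
    tally = tree.get(tuple(prefix), {})
    total = sum(tally.values())
    rows = [(move, n, round(100 * n / total)) for move, n in tally.items()]
    return sorted(rows, key=lambda row: -row[1])


# Toy data: four made-up games that all reach the Ruy Lopez Exchange.
games = [
    ["e4", "e5", "Nf3", "Nc6", "Bb5", "a6", "Bxc6", "dxc6"],
    ["e4", "e5", "Nf3", "Nc6", "Bb5", "a6", "Bxc6", "dxc6"],
    ["e4", "e5", "Nf3", "Nc6", "Bb5", "a6", "Bxc6", "bxc6"],
    ["e4", "e5", "Nf3", "Nc6", "Bb5", "a6", "Bxc6", "dxc6"],
]

tree = build_move_tree(games)
print(show(tree))                # [('e4', 4, 100)]
print(show(tree, games[0][:7]))  # [('dxc6', 3, 75), ('bxc6', 1, 25)]
```

Every game contributes to the tally at each point it passes through, which is what lets you look at the whole batch at once instead of one game at a time.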

And here's the really crucial feature of CBTree: it gives you statistical analysis of every move in the database. "Statistics" is an ugly, scary word to some people and that's not necessarily a bad thing. Fear breeds caution. Numbers don't lie but they can sure be misleading. Pointing out the pitfalls of blindly following that "tyranny of numbers" is what this series of articles is all about.

We've already defined "game tree" and looked at what CBTree does, so it's time to see this baby in action. First we need a database of games. CBTree will work on a database of up to 4096 games.

Allow me to digress for just a moment. Yes, the 4096 game ceiling is a programming limit and one that the program has taken a certain amount of criticism over. But this limit is not necessarily a bad thing. Submitted for your approval is Steve's Rule #1 for using a chess tree program: to avoid massive confusion and aggravation, don't run any tree program on a large batch of unrelated games. There are several reasons for this, among them:

1) It's very easy to lose your place and completely forget what you're looking at in a large games tree. Back in the old days of DOS Fritz and ChessBase, while studying a large survey game on a Fritz PowerBook disk, I frequently forgot where I was in the labyrinth of variations and had to hit the [N] key to reorient myself by calling up the complete gamescore;

2) Large mishmashes of unrelated info in tree form are harder to work with than a smaller database of related information. If you have 1,800 Austrian Attack games in your million-game database, the tree program will have 1,800 games to chew on whether they're in a million-game tree or in a smaller tree of just games using that variation. The difference? Your computer will create a tree more quickly and likely access those 1,800 games faster if the tree is created on a smaller database rather than on your complete collection;

3) Tree files take up disk space. CBTree is the only chess tree program I've seen that isn't a major storage hog. Plus it's a bit silly to create a huge tree of millions of positions if you're only going to be looking at a fraction of them anyway.

So how do you use the 4096 game limit to your advantage? Simple: when setting up a database to convert into tree form, concentrate on games that follow a common theme. Perhaps you want to look at games of a certain ECO code or variation. Maybe you want to study the repertoire of a particular player. Or you might want to run a tree on the games of a recent tournament, to get a good overview of what openings are currently in vogue. Maybe you'd like to expand that to make a tree of all the games from a volume of the Informant or ChessBase Magazine.
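
The selection step can be sketched in a few lines of Python. This is only an illustration of "pick a theme, then check the batch fits"; it has nothing to do with how ChessBase stores or searches games. The record fields ("white", "black", "eco"), the player names, and the function select_games are all assumptions made up for the example, and the ECO codes are just sample values.

```python
MAX_GAMES = 4096  # the CBTree ceiling discussed above


def select_games(games, predicate, limit=MAX_GAMES):
    """Return the games matching predicate, refusing to exceed the limit."""
    chosen = [game for game in games if predicate(game)]
    if len(chosen) > limit:
        raise ValueError(f"{len(chosen)} games selected; limit is {limit}")
    return chosen


# Toy records; the field names are assumptions for this sketch.
games = [
    {"white": "Player A", "black": "Player B", "eco": "C68"},
    {"white": "Player C", "black": "Player A", "eco": "B09"},
    {"white": "Player A", "black": "Player D", "eco": "C69"},
]

# Theme 1: games with certain ECO codes.
by_opening = select_games(games, lambda g: g["eco"] in {"C68", "C69"})

# Theme 2: the repertoire of one player, with either color.
by_player = select_games(games, lambda g: "Player A" in (g["white"], g["black"]))

print(len(by_opening), len(by_player))  # 2 3
```

Raising an error when the batch is too big, rather than quietly trimming it, is a deliberate choice here: if the theme is too broad, it's better to narrow it yourself.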

I'm sure you see the idea. You find a common thread and run CBTree on the games that fit that thread. It gives you a base to work from and fulfills Steve's Corollary to Rule #1: confusion is bad -- cut corners wherever possible to avoid brain hemorrhages. Trust me on this one; experience is a harsh teacher.

OK -- let's get down to some concrete examples. I've assembled a database of Ruy Lopez Exchange games, killed the duplicates, ended up with 1,729 games, and run CBTree on them to create a game tree. Now I fire up the "tree viewer" part of the program and see what I've got.

On the left side of my screen I see a chessboard set up to begin a game of chess. On the right side of the screen is a chart of all of White's first moves from the database. Since the Ruy Lopez Exchange is a King's Pawn opening, there's obviously just one move listed. The listing looks like this:

MOVE   FREQUENCY   %      RESULT   ELO
1.e4   1729        100%   0.54     2287

What exactly do these numbers mean? The move 1.e4 was played in 1729 games (which makes up 100% of the tree at that point). The average Elo rating of the players who chose to play 1.e4 is 2287. The 0.54 refers to the average success of the move. How this number is derived could stand a little explanation.

The success rate (0.54 in this case) is a number that tells you whether the move was better for White or for Black, as judged by the results of the games in which that move was played. In CBTree, it's always a number between 0.00 and 1.00 and is always expressed from White's point of view (1.00 means that White won all the games in which the move was played while 0.00 means that Black won all the games in which a move appears). The success rate is an average of the final results of all of the games in which the move was played. To derive this number, White wins are averaged in as "1.00", Black wins as "0.00", and draws as "0.50".
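
The averaging described above is easy to sketch in Python. This is a bare-bones illustration, not CBTree's own code: it assumes results are written in the usual "1-0", "1/2-1/2", "0-1" form and handles decisive games and draws only. The 100 results are a made-up sample chosen to land on 0.54; they are not the breakdown of the 1,729 games in the example database.

```python
# Value of each result from White's point of view.
SCORE = {"1-0": 1.0, "1/2-1/2": 0.5, "0-1": 0.0}


def success_rate(results):
    """Average the game results, counted from White's point of view."""
    scores = [SCORE[result] for result in results]
    return sum(scores) / len(scores)


# Made-up sample: 40 White wins, 28 draws, 32 Black wins.
results = ["1-0"] * 40 + ["1/2-1/2"] * 28 + ["0-1"] * 32
print(round(success_rate(results), 2))  # 0.54
```

Note that very different mixes of results can produce the same number: 54 White wins and 46 losses with no draws at all would also come out to 0.54.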

I can already hear you protesting: "Some of the games in the database are just analysis lines that end in evaluation symbols! How do they factor in?"

Lines that end in analysis symbols (such as "+-" for "White has a won game") are assigned a numerical value. I must apologize at this point: there are no ASCII characters for some of these evaluation symbols. We have to use a close approximation: two symbols separated by a slash should be read as though the first symbol is over the second symbol. Thus "+/=" should be read as though the plus sign was over the equal sign (the symbol for "White is slightly ahead"). Here's a list of the symbols and the numerical values that CBTree assigns them:

Unclear or unevaluated lines get the benefit of the doubt (as they could go to either player) and are rated as 0.50. As previously stated, CBTree identifies all of the games in which a position appeared, averages their final outcomes, and presents you with a number that is an evaluation of the relative strength of that move. (If you're a real statistics junkie, these relative values are interval measurements, not ratio measurements; i.e., 0.70 is not "twice as good" as 0.35.)

Returning to our example, we see that 1.e4, played in 100% of the games, averages out to an evaluation of 0.54. What this means simply is that the Ruy Lopez Exchange variation, when played competently, doesn't offer White a particularly large advantage over Black; in fact, White and Black seem to split the points fairly evenly.

Please note, though, the use of the term "played competently". As is the case with many openings there is plenty of room for error in the Ruy Exchange. Helping you find these pitfalls before you tumble into them is what CBTree is all about. We'll look at how CBTree provides you with that help in next week's Electronic T-Notes.