What customer loyalty has to do with chess – AI in practice

The Reinforcement Learning method not only solves chess problems, but also helps companies achieve customer loyalty through optimized customer journeys.

“All that matters on the chessboard is good moves.” This quote is from Bobby Fischer, the winner of the “Match of the Century”, in which he defeated his Soviet opponent Boris Spassky in 1972 to become the eleventh world chess champion. But what makes a move a “good move” or a bad one? And what does that have to do with customer loyalty?


Let this post clarify that for you. Without the hassle of math, you’ll learn about a special method of artificial intelligence called reinforcement learning. 

This method not only solves the chess problem, but also helps companies optimize customer experiences to maximize customer loyalty and ultimately customer lifetime value.

Customer loyalty - a chain of next best actions


Decisions in chess: short-term disadvantage, long-term advantage

So what makes moves “good moves”? For example, is it bad to lose a piece? In the immediate aftermath, certainly. However, the loss can yield a later advantage that more than compensates for it, for example by destroying favorable enemy structures.

So there are situations in which decisions should not be evaluated solely on the basis of their immediate benefit, but must be placed in a longer-term context. What matters then is to make the chain of decisions as good as possible as a whole, rather than to optimize a series of isolated individual decisions.

Optimizing a chain of decisions as a whole

How do you formally achieve something like that? In other words, what is the rationale behind algorithms that allow us to make a chain of decisions optimally? The answer is very simple: if the result of a decision was good, one must credit not only this decision, but also the decisions made before it. Winning a game of chess is not only the result of the final checkmating move. If, for example, a pawn was sacrificed beforehand, this decision probably also contributed to the success.

If you find yourself in the same or a similar situation in a later game, you should consider this pawn sacrifice again. Very experienced players have often been in the same or comparable situations and know which moves make sense in the long run and can bring them a bit closer to victory.
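The idea of crediting earlier decisions for a later success can be sketched in a few lines of Python. This is only an illustration under assumed numbers: the rewards per move and the discount factor `gamma` are hypothetical, and the function name `discounted_returns` is ours. Each move receives its own immediate reward plus a discounted share of everything that followed.

```python
def discounted_returns(rewards, gamma=0.9):
    """For each step, sum its reward and the discounted rewards that follow."""
    returns = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards so every step inherits credit from all later steps.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


# Hypothetical game: pawn sacrifice (-1), two quiet moves (0), checkmate (+10).
rewards = [-1.0, 0.0, 0.0, 10.0]
returns = discounted_returns(rewards)

for move, (reward, ret) in enumerate(zip(rewards, returns), start=1):
    print(f"move {move}: immediate reward {reward:+.1f}, long-term value {ret:+.2f}")
```

Although the pawn sacrifice has a negative immediate reward, its long-term value in this toy example is clearly positive (roughly 6.29), because part of the credit for the win flows back to it.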

A Pawn in a Customer Relationship?! An example from the telecommunications industry

What does this have to do with customer loyalty? Here, too, there is the proverbial pawn. Let’s assume a customer is about to renew his mobile communications contract. Let’s also assume that he has almost used up his monthly data volume and faces throttling of his transmission rate. In line with the mobile operator’s strategy, he receives a text message with an overpriced offer for an additional data package.

Although the customer may accept the offer, especially since he urgently needs to do some research, he finds it impertinent. When it comes time to renew the contract, the customer decides to cancel it. The provider misses out on important future sales.

So what went wrong here? Could it be that a pawn sacrifice, i.e. a discounted data package, would have made more sense? Indeed.

A better approach would have been for the provider to proceed as follows. In a test, it could give a discount to some of its customers and then offset the short-term losses against the long-term gains. In doing so, the provider would find that the customer from the above example and customers from the same customer segment (comparable customers) appreciate a little more generosity, renew their contracts more frequently, and thus secure sustainable revenues.
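Offsetting short-term losses against long-term gains boils down to simple arithmetic. The sketch below compares the two test groups; every number in it (margins, renewal rates, renewal value) is a made-up placeholder rather than a measured result, and the helper `value_per_customer` is a hypothetical name.

```python
def value_per_customer(offer_margin, renewal_rate, renewal_value):
    """Short-term margin on the data package plus expected renewal revenue."""
    return offer_margin + renewal_rate * renewal_value


# Hypothetical test results for one customer segment.
full_price = value_per_customer(offer_margin=8.0, renewal_rate=0.60, renewal_value=240.0)
discounted = value_per_customer(offer_margin=2.0, renewal_rate=0.70, renewal_value=240.0)
uplift = discounted - full_price

print(f"full price: {full_price:.2f}, discounted: {discounted:.2f}, uplift: {uplift:+.2f}")
```

With these assumed figures, the discounted offer earns less on the data package itself but more per customer overall, because the higher renewal rate outweighs the lost margin.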

Prevent cancellations - churn prevention and customer retention tools

Early, discounted offers can therefore also be seen as an instrument of customer loyalty and churn prevention. The problem of churn does not normally arise at the end of the contract, but is often the result of a sequence of earlier wrong decisions.

Figure 1 shows a typical customer relationship. It should not be forgotten that individual touchpoints in an increasing number of channels (social, push, e-mail, SMS/MMS, etc.) combine to form an overall experience. Companies are therefore faced with the challenge of orchestrating touchpoints across a multitude of channels in order to increase customer loyalty. In this context, we also talk about the next-best-action (NBA) problem.

Here, the word “next” indicates the sequential nature of the decision problem. As often as this expression is used, in our experience it is rarely understood and implemented correctly. The reason for this may be that reinforcement learning, a modern artificial intelligence tool for solving sequential decision problems, is initially difficult to access and therefore daunting.


Figure 1: A typical customer relationship in a mobile company

Coordinating all measures in customer relationships, some of which have lasted for years, across many channels and within a highly dynamic market presents companies with massive challenges. Whether for lack of knowledge or against their better judgment, companies have so far optimized decisions in isolation, often on the basis of fixed rules. With the help of reinforcement learning, decisions can be optimized in their entirety and adapted dynamically.

Besides mobile communications providers, the target group for such a solution includes companies from industries with long-term customer relationships, such as banks, insurance companies, and energy providers.

Customer loyalty and the art of generalization

The previous section talked about comparable situations on the chessboard and comparable customer behavior. Having a notion of comparability is of enormous advantage in both contexts. The reason is that with the multitude of possible courses of play and the diversity of customer behavior, even with a huge wealth of experience, there are always situations that are different from those seen up to that point. Those who can then draw comparisons with what they have seen before, who are able to generalize previous experience, as it were, will be able to make a well-founded decision despite the new situation.

This formally means identifying invariant features of situations that require the same optimal decision. In this context, we also speak of pattern recognition. In chess, it may be that the exact position of a bishop that is neither itself threatened nor threatens another piece is negligible for the next best move. In the customer example, it may be that the effectiveness of, say, a particular upsell campaign does not depend on the age of the customer, but on whether the last invoice amount was comparatively high. The invoice amount would therefore be one of these sought-after invariant characteristics.

Identify features using Deep Learning

Another method of artificial intelligence, deep learning, has the outstanding property of independently identifying those features in data that allow the best possible generalization. This has made it possible to achieve performance on a par with human capabilities in some fields of application, such as the diagnosis of leukemias[1]. These algorithms were often able to detect and exploit features that were hidden from humans. 

However, in addition to complex strategic tasks, humans are likely to outperform artificial intelligence for a long time to come in one particular task: knowledge transfer. Humans are outstandingly good at transferring knowledge acquired in one context to another. Software has had a hard time with this task so far[2].

Still, algorithms can be extremely useful in well-defined problems. Deep learning finds application, for example, in the design of drugs, the recognition of language, or the restoration of images. The combination with reinforcement learning allows users to enjoy the advantages of both worlds. Corresponding algorithms make it possible to make sustainable decisions even in unknown situations by means of generalization. 

Deep learning, however, is not always the best way to generalize. In fact, in our experience, a lot of money has been sunk in recent years into projects that would have been better implemented with less complex algorithms. Deep learning methods all require large amounts of data to create value. 

Not every application meets this requirement. That’s why you shouldn’t rely on a single algorithm, but always on a set of complementary algorithms that can take over depending on the data.

The video below shows how a robotic hand using deep learning and reinforcement learning solves the Rubik’s cube.

Ability to adapt in customer loyalty programs

Besides sustained action and the ability to generalize, a third characteristic is needed to become a good chess player or to substantially improve customer experiences: the ability to adapt quickly to changing conditions. When playing against an unknown opponent, for example, the strategy painstakingly learned in previous games can suddenly become less effective.

For example, in the case of a mobile operator, a new competitor may enter the market, or existing competitors may initiate a new strategy. Both potentially influence the behavior of regular customers. The world is changing and it seems to do it faster and faster. It also happens that special marketing campaigns attract customers who show behavior that is not very similar to existing customers. The buying experience is then promising, but the subsequent customer relationship is unsatisfactory.

Just relying on an existing strategy and simply exploiting what has been learned up to that point would be negligent in this case. Instead, the strategy should be further developed through careful experimentation in line with the changes. 

Formally, this means not always taking the action that is optimal according to current knowledge, but occasionally trying a supposedly suboptimal decision, because this is the only way to find out whether it has in the meantime become the optimum being sought.

Decisions that were suboptimal for customer retention management yesterday may be optimal today, and vice versa. This can only be found out through appropriate, timely tests.

A good solution strikes a subtle balance between exploiting existing knowledge and exploring new situations. Such software is de facto self-learning, recognizes trends early on and prevents losses in sales. See also Figure 2.


Figure 2: Self-learning software can learn in real time from the interaction with each customer and apply this knowledge to the next interaction. In contrast, efforts to keep pace with a rapidly changing market through iterative improvement and the participation of multiple teams often do not lead to the hoped-for success.
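The balance between exploiting and exploring described above is often illustrated with a so-called epsilon-greedy rule. The following is a minimal sketch under stated assumptions, not a description of any particular product: the two actions, their renewal probabilities, the exploration rate `epsilon` and the `step_size` are all hypothetical. A constant step size is used so that recent observations weigh more than old ones, which lets the estimates follow a changing market.

```python
import random


def choose_action(estimates, epsilon, rng):
    """Explore a random action with probability epsilon, otherwise exploit."""
    if rng.random() < epsilon:
        return rng.randrange(len(estimates))
    return max(range(len(estimates)), key=lambda a: estimates[a])


def simulate(renewal_probs, steps=5000, epsilon=0.1, step_size=0.05, seed=0):
    """Simulate customer interactions and learn one value estimate per action."""
    rng = random.Random(seed)
    estimates = [0.0] * len(renewal_probs)
    counts = [0] * len(renewal_probs)
    for _ in range(steps):
        action = choose_action(estimates, epsilon, rng)
        # Simulated customer response: 1.0 if the customer renews, else 0.0.
        reward = 1.0 if rng.random() < renewal_probs[action] else 0.0
        counts[action] += 1
        # Move the estimate a small step towards the latest observation.
        estimates[action] += step_size * (reward - estimates[action])
    return estimates, counts


# Hypothetical actions: 0 = full-price offer, 1 = discounted offer.
estimates, counts = simulate([0.60, 0.70])
print("estimated renewal rates:", [round(e, 2) for e in estimates])
print("times each action was tried:", counts)
```

Most of the time the rule picks the action that currently looks best, but a small share of interactions is reserved for testing the alternatives, so a measure that has become attractive in the meantime still gets a chance to show it.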

Churn scores - useful metric, but not a universal remedy


A lot of companies still rely on so-called scores. A churn score, for example, is proportional to the probability that a customer will cancel his contract within a certain period of time.

In this case, the underlying probability is derived from statistical models of termination behavior. These models may often be cleverly designed, but there is a problematic disconnect here: The business side must decide what measures to take to retain customers depending on the level of the churn score.

A high score, for example, does not indicate the best countermeasure to prevent cancellations. If a bonus is set too high, the company gives away money; if it is too low, it loses the customer. The GoodMoves software closes this important gap by directly suggesting optimal measures for each customer, taking business logic into account.
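To see why a score alone does not tell you what to do, consider a toy calculation. It is only a sketch and does not describe how GoodMoves works internally: the retention probabilities per bonus level, the customer value, and the assumption that the bonus is only paid out if the customer stays are all hypothetical.

```python
def expected_value(bonus, retention_prob, customer_value):
    """Expected profit if the bonus is only paid when the customer stays."""
    return retention_prob * (customer_value - bonus)


def best_bonus(retention_by_bonus, customer_value):
    """Pick the bonus level with the highest expected value for one customer."""
    return max(
        retention_by_bonus,
        key=lambda bonus: expected_value(bonus, retention_by_bonus[bonus], customer_value),
    )


# Hypothetical response of one high-churn customer to different bonus levels.
retention_by_bonus = {0: 0.40, 20: 0.60, 50: 0.70, 100: 0.72}
choice = best_bonus(retention_by_bonus, customer_value=300.0)
print("bonus with the highest expected value:", choice)
```

In this example the medium bonus of 50 comes out ahead: the largest bonus barely improves retention but costs twice as much, while no bonus at all loses too many customers. The churn score (here, a 60% chance of cancellation without any measure) would be the same in all four cases.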


Conclusion and outlook

To summarize, playing chess and retaining customers at a high level both require three types of skills:
  • first, to act sustainably,
  • second, to generalize,
  • and third, to be able to adapt independently.

In solid software, all three of these capabilities should come together in a single artificial intelligence.

The best software solutions currently available allow NBA decisions to be automated to the greatest extent possible. However, humans are still needed to devise the candidate actions; the software then simply selects one of them according to the situation.

It can be expected that software will increasingly be able to take over creative tasks in the future. Initial examples in music and the visual arts provide impressive proof of this.



[1] “An Artificial Neural Network Providing Highly Reliable Decision Support in a Routine Setting for Classification of B-Cell Neoplasms Based on Flow Cytometric Raw Data”
Wolfgang Kern, MD, Franz Elsner, PhD, Max Zhao, Nanditha Mallesh, Richard Schabath, PhD, Claudia Haferlach, MD, Peter Krawitz, MD, Hannes Lüling, PhD, Torsten Haferlach, MD

[2] “Knowledge Transfer between Artificial Neural Networks for Different Multicolor Flow Cytometry Protocols Improves Classification Performance for Rare B-Cell Neoplasm Subtypes”
Nanditha Mallesh, Max Zhao, Franz Elsner, PhD, Hannes Lüling, PhD, Richard Schabath, PhD, Claudia Haferlach, MD, Torsten Haferlach, MD, Peter Krawitz, MD, Wolfgang Kern, MD


About Me

As a Data Scientist, I have been working for DAX companies and startups for many years. 

With my Munich-based company, res mechanica GmbH, we started GoodMoves in 2020. It is a software service for NBA (Next Best Action).

The technology behind it achieved the highest score ever awarded in the prestigious “EXIST” grant from the German Federal Ministry for Economic Affairs and Energy.

Do you have any questions or comments? Then I am looking forward to hearing from you.

