
Universal Communication

How might we allow communication between any agents over any medium?
By Luc Caspar, April 22, 2024
A picture of the Nukabot at the National Museum of Emerging Science and Innovation.
Credit: The National Museum of Emerging Science and Innovation

As a prequel to the upcoming workshop "After Babel: The Quest for Universal Communication", we thought it would be interesting to share our progress on one of our own projects. As you might expect from its association with such an event, the overarching goal of this research can be broadly stated as: "to allow communication between any agents over any medium". Put even more succinctly, we want to build a 'Universal Translator' (think of the 'Babel Fish' from The Hitchhiker's Guide to the Galaxy). Aside from letting you appreciate alien poetry, such a tool would, for example, enable you to converse with the organs in your body and inquire about their current state. This framing puts a heavy emphasis on the mechanisms that could allow such communication, while temporarily leaving aside other issues, such as common ground or the technological requirements (e.g., antennae, satellites, sensors, or actuators) for the actual transmission.

A Proof of Concept: Talking with Microbes Over a Game of Go

While Douglas Adams's Babel fish can feed on information and excrete perfectly translated speech, such a miraculous organism has yet to be found. Consequently, before diving head first into such a colossal endeavor, we needed to make sure it was not a fool's errand. To build a proof of concept, the problem was "simplified" (so to speak) to unidirectional communication (i.e., a single emitter and a single receiver). In this scenario, we assumed that a human would always play the role of the receiver, decoding any transmitted information. This left us with the task of designing a process for encoding the data to be sent, as well as defining the communication medium.

Jumping right into it, we chose the game of Go as the means of propagating information between sender and receiver. Several reasons guided this choice. To start with, Chess and Go are currently the two classical games that have garnered the most public attention. As a result, communities within the machine learning domain have formed around them, developing and training models whose performance is on par with architectures such as DeepMind's AlphaZero. Furthermore, prior research by McEwan had already investigated the plausibility of using games as a communication medium between human players, giving us hope that the idea would generalize to different types of agents. The final decision between Go and Chess was guided by our cultural background (Cross Labs is a Japanese company, after all) and the expertise of the people working on the project.

Moving on to the agent responsible for sending data. As mentioned above, we already assume that a human will be on the receiving end. But what could fill the other seat as their opponent? A neural architecture, or another artificial agent, would have been enough to fill those shoes and would have made the whole process easier. However, to really put our ideas to the test, we went down the biological route and elected to use the 'Nukabot'. The Nukabot is best described as an interface developed by Dominique Chen and his colleagues at Waseda University[^nuka_1] [^nuka_2] [^nuka_3] to facilitate interactions between a human caretaker and their nukadoko (fermented rice bran used to pickle vegetables in Japan).

A picture of the Nukabot at the National Museum of Emerging Science and Innovation. On the right, a view of the nukadoko and its sensory package.

Taking inspiration from McEwan, rather than having the Nukabot's microbes try to convey arbitrary information, we chose to focus on personality, which we expected the human opponent to find easier to infer from the Nukabot's play style and strategy. Now, how do you define the personality of a population of microbes? The long answer is: by using various sensors to measure their state, and translating those measurements using one of the many dimensional models of emotion. The short answer is: by using Mehrabian and Russell's Pleasure-Arousal-Dominance (PAD) scale. The PAD scale was originally developed as a model to describe a person's emotional state. In it, an emotion is represented by a single point in a three-dimensional space. Through semantic differential analysis, Mehrabian and Russell identified pleasure, arousal, and dominance as independent characteristics of all emotions. In this context, pleasure relates to how positive or negative an experience is. Arousal corresponds to the amount of cognitive or physical activity a stimulus elicits in the individual. Finally, dominance relates to how in control an individual feels, over the situation or over others. The scale was later extended to encompass other affective phenomena such as mood, temperament, and personality.

With most of the puzzle now in place, the only missing piece is a system allowing the Nukabot's personality to influence the strategy used for playing a game of Go. This system can be further split into two components. The first is KataGo, an open-source implementation of AlphaZero with many improvements, trained by the community, which provides an 'optimal' (or as close to it as possible) policy. This saves us the trouble of teaching an agent how to play Go from scratch. This optimal policy is then fed to the second component, which influences the move selection process based on the Nukabot's personality by following a few handcrafted rules:

  • If the level of 'Arousal' is high (close to 1), the player is dedicating a lot of cognitive resources to the game and is therefore unlikely to make a mistake. Conversely, an unfocused player has a low arousal level (close to -1), which makes them prone to blunders. In the scoring mechanism, this translates into adding Gaussian noise to the optimal policy: the less aroused the player, the more noise is added.
  • The concept of 'Dominance' was trickier to express in terms of strategy. However, a hallmark of beginners is that they tend to follow wherever their opponent is playing. Expressed in terms of context, a dominant or expert player takes the board state in its entirety into account when deciding what to play next, whereas a beginner or non-dominant player only considers the local context. To mimic this behaviour, we computed the Euclidean distance between each possible move in the current board state and the opponent's last move. This distance measure is then merged with the noisy policy from the previous 'arousal' step, yielding a score for each possible position on the board. For dominant players (close to 1), the score ignores any notion of distance, whereas for non-dominant contestants (close to -1) it is heavily weighted toward positions close to the opponent's last move.
  • Finally, 'Pleasure' influences how the agent's next action is selected from the sorted list of possible positions. Perhaps counter-intuitively, the higher a player's level of pleasure, the more lax they will be with their opponent, opting instead to experiment and enjoy the game to its fullest. On the contrary, a bored or annoyed individual will try to end the game as quickly as possible, resulting in a strategy that leaves few openings. Consequently, if the pleasure level is high (close to 1), the mechanism selects a position among the top five to ten. Otherwise, for a low pleasure level (close to -1), the action is chosen within the top five.
  • While the agent's personality is stable for the duration of a game, this selection process must be repeated on each turn.
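The three rules above can be sketched in a few lines of Python with NumPy. This is a minimal illustration under our own assumptions, not the code used in the study: the function name `select_move`, the linear mapping from arousal to noise level, subtracting a normalized distance penalty as the way of "merging" distance with the policy, and reading "top five to ten" as a candidate pool whose size grows from 5 to 10 with pleasure are all hypothetical choices. The policy can be any 19x19 array of move scores, such as one taken from KataGo's policy output; the pass move is ignored here.

```python
import numpy as np


def select_move(policy, legal, last_opponent_move, pad, rng, max_noise_std=0.3):
    """Pick a board position from a policy, biased by a PAD personality.

    policy: (N, N) array of move scores (e.g. a Go engine's policy; pass ignored).
    legal: (N, N) boolean mask, True where a stone may be played.
    last_opponent_move: (row, col) of the opponent's last move, or None.
    pad: (pleasure, arousal, dominance), each in [-1, 1].
    Returns (row, col), or None if no legal position exists.
    """
    pleasure, arousal, dominance = pad
    peak = policy.max()
    scores = policy / peak if peak > 0 else policy.astype(float)

    # Arousal (assumed mapping): Gaussian noise whose std falls linearly from
    # max_noise_std (arousal = -1, unfocused) to 0 (arousal = 1, fully focused).
    noise_std = max_noise_std * (1.0 - arousal) / 2.0
    scores = scores + rng.normal(0.0, noise_std, size=scores.shape)

    # Dominance (assumed merge): subtract a normalized distance penalty whose
    # weight goes from 0 (dominance = 1, distance ignored) to 1 (dominance = -1).
    if last_opponent_move is not None:
        rows, cols = np.indices(scores.shape)
        r0, c0 = last_opponent_move
        dist = np.hypot(rows - r0, cols - c0)
        scores = scores - (1.0 - dominance) / 2.0 * dist / dist.max()

    # Illegal positions get -inf so they rank last and are never picked.
    scores = np.where(legal, scores, -np.inf)
    n_legal = int(legal.sum())
    if n_legal == 0:
        return None

    # Pleasure (assumed reading): choose uniformly among the top-k positions,
    # where k grows from 5 (pleasure = -1, play tight) to 10 (pleasure = 1).
    k = int(round(5 + 5 * (pleasure + 1.0) / 2.0))
    ranked = np.argsort(scores, axis=None)[::-1][: min(k, n_legal)]
    flat = rng.choice(ranked)
    return tuple(int(i) for i in np.unravel_index(flat, scores.shape))
```

For example, `select_move(policy, legal, (3, 15), (0.2, -0.5, 0.8), np.random.default_rng(0))` returns a (row, col) tuple. Since the personality stays fixed for a whole game, only the policy, the legal mask, and the opponent's last move change from one turn to the next.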

To connect the Nukabot to the whole system, we extracted three measures from the sensory dataset gathered by Chen and his team over a period of months. The concentration of ammonia inside the Nukabot is taken to be inversely proportional to its level of pleasure. Since yeast activity is estimated from the amount of ethanol in the mixture, it made sense to link it directly to the level of arousal. Dominance was derived from the oxidation-reduction potential (ORP) of the nukadoko. In general, a solution with a higher ORP has the potential to oxidize (i.e., take electrons from) solutions with a lower ORP, which makes it a fitting representation of relative dominance between players.

It is finally time to put all the pieces together and play Go against microbes. Who will come out on top? Wait, that's not the point. Here is an overview of the setup adopted for this experiment. First, the levels of ammonia, ethanol, and ORP are measured by the Nukabot. Those values are then transmitted to our system, which translates them into a corresponding personality vector. This vector, along with KataGo's optimal policy, is fed into the action selection mechanism described above. The result is the next position to play on the Go board. Repeat this enough times, and you have microbes playing Go against humans.
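The sensor-to-personality step can be sketched as follows. The post does not specify how raw readings were scaled, so this sketch assumes a linear min-max rescaling of each reading into [-1, 1] using ranges chosen from the recorded dataset, with the sign of ammonia flipped to express its inverse relation to pleasure (a simple linear stand-in for "inversely proportional"). The names `PAD`, `to_unit_interval`, and `sensors_to_pad`, as well as the range values in the usage note, are hypothetical placeholders, not measured values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PAD:
    """A point in Pleasure-Arousal-Dominance space, each axis in [-1, 1]."""
    pleasure: float
    arousal: float
    dominance: float


def to_unit_interval(value: float, lo: float, hi: float) -> float:
    """Linearly map value from [lo, hi] to [-1, 1], clipping outside the range."""
    if hi <= lo:
        raise ValueError("hi must be greater than lo")
    x = 2.0 * (value - lo) / (hi - lo) - 1.0
    return max(-1.0, min(1.0, x))


def sensors_to_pad(ammonia: float, ethanol: float, orp: float,
                   ranges: dict[str, tuple[float, float]]) -> PAD:
    """ranges maps each sensor name to a hypothetical (min, max) from the dataset."""
    return PAD(
        # More ammonia -> less pleasure (assumed linear inverse mapping).
        pleasure=-to_unit_interval(ammonia, *ranges["ammonia"]),
        # Ethanol as a proxy for yeast activity -> arousal.
        arousal=to_unit_interval(ethanol, *ranges["ethanol"]),
        # Higher ORP -> more dominant.
        dominance=to_unit_interval(orp, *ranges["orp"]),
    )
```

With placeholder ranges `{"ammonia": (0.0, 10.0), "ethanol": (0.0, 10.0), "orp": (-100.0, 100.0)}`, readings of ammonia 0.0, ethanol 10.0, and ORP 0.0 give `PAD(pleasure=1.0, arousal=1.0, dominance=0.0)`, which could then be passed on to the action selection step for the whole game.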

An overview of the experimental setup used for this study.

A very small-scale study was conducted in which participants were asked to play 10 games against the Nukabot and, after each match, provide feedback regarding their opponent's personality. Although only five volunteers answered, the experiment still produced noteworthy lessons.

Experimental results projected in the PAD space.
The same results as above, but with corresponding personality labels.

Even if the participants were not able to infer the exact personality of the Nukabot (more on that in a moment), the answers still occupy similar regions of the PAD space, and some of them even agree with each other (overlapping markers). In addition to personality, we also asked the players to provide free-form feedback on the Nukabot's strategy. Here, the response was almost unanimous: the microbes were seen as beginner-level players whose actions were hard to grasp and at times even seemed random. This explains why no one was able to pinpoint the Nukabot's personality.

Still, those results gave us enough confidence to move forward with this project. However, one thing was certain: the handcrafted rules for action selection had to be replaced with a more effective mechanism.

A Medium and Agent Agnostic Communication Framework

After some deliberation, we landed on the following framework:

A unidirectional overview of the communication process using the suggested framework.

As with any other translator, messages from the sending side need to be converted into tokens that a neural architecture or other machine learning algorithm can easily manipulate. Next, the tokens themselves have to be encoded so that they can be transmitted through the chosen medium. Finally, on the receiving end, the whole process is repeated in reverse: decode the data into tokens, then translate those tokens into the receiver's target language. It should be noted that, although the figure above seems to imply unidirectional communication, this system has been designed with bidirectional communication in mind. In fact, simplifying the system's overview further makes it obvious how a discussion might occur.
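To make the flow of data concrete, here is a toy sketch of the framework in Python. It is only an illustration under strong simplifying assumptions: the 'common embedding' is reduced to integer concept IDs shared by both languages, each language is a hypothetical word list, and the medium is a string of 8-bit binary codes assumed to propagate without loss. None of the names (`Language`, `Medium`, `send`, `word_language`) come from the project itself.

```python
from dataclasses import dataclass
from typing import Callable

Tokens = list[int]  # toy stand-in for the shared 'common embedding'


@dataclass
class Language:
    """An agent's language plus translators to and from shared tokens."""
    to_tokens: Callable[[str], Tokens]
    from_tokens: Callable[[Tokens], str]


@dataclass
class Medium:
    """Encodes shared tokens into a signal the medium can carry, and back."""
    encode: Callable[[Tokens], str]
    decode: Callable[[str], Tokens]


def send(message: str, sender: Language, medium: Medium, receiver: Language) -> str:
    tokens = sender.to_tokens(message)     # sender's language -> shared tokens
    signal = medium.encode(tokens)         # shared tokens -> medium signal
    received = medium.decode(signal)       # medium signal -> shared tokens (lossless here)
    return receiver.from_tokens(received)  # shared tokens -> receiver's language


def word_language(words: list[str]) -> Language:
    """Toy language where the i-th word expresses shared concept i."""
    index = {w: i for i, w in enumerate(words)}
    return Language(
        to_tokens=lambda text: [index[w] for w in text.split()],
        from_tokens=lambda toks: " ".join(words[t] for t in toks),
    )


# Toy medium: every token becomes an 8-bit binary code.
binary = Medium(
    encode=lambda toks: "".join(format(t, "08b") for t in toks),
    decode=lambda sig: [int(sig[i:i + 8], 2) for i in range(0, len(sig), 8)],
)

english = word_language(["hello", "friend", "how", "are", "you"])
french = word_language(["bonjour", "ami", "comment", "vas", "tu"])
```

Here `send("hello friend", english, binary, french)` returns `"bonjour ami"`. Because `send` only depends on the two languages and the medium, replying is just a matter of swapping the sender and receiver: `send("bonjour ami", french, binary, english)` returns `"hello friend"`, which mirrors how the framework accommodates bidirectional communication.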

A simplified overview of the communication process using the suggested framework, showing how bidirectional communication might occur.

At this point, the intermediate steps need some justification. To start with, rather than encoding the sender's message directly into data that can be transmitted through the medium, we adopted a two-stage process. This has the advantage of decoupling the communication mechanism from the underlying communication channel. Concretely, each mode of transmission will be associated with a single auto-encoder converting information from the 'common embedding' to the 'medium embedding' and back, assuming no loss occurs during propagation. It also gives us the opportunity to use the intermediate common embedding as 'common ground' between the sender's and receiver's languages. Here, common ground is understood as a multi-dimensional space in which shared concepts are represented as vectors (usually referred to as embeddings). Consequently, it would be possible to use off-the-shelf translation tools whenever available, and otherwise to train the translator and the medium-specific auto-encoder separately. In theory, this decoupling would also make the training and inference processes more efficient.
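As an illustration of a medium-specific auto-encoder, the sketch below uses the simplest possible case: a linear auto-encoder fitted in closed form. Under squared reconstruction error, an optimal linear auto-encoder projects onto the subspace spanned by the top principal components of the data, so it can be computed with an SVD rather than gradient descent. This is our own stand-in, not the architecture the project uses, and a real medium encoder would also have to respect the medium's constraints (legal moves, for instance), which this sketch ignores. The class name `LinearMediumAutoencoder` and the random toy embeddings are hypothetical.

```python
import numpy as np


class LinearMediumAutoencoder:
    """Hypothetical codec between a d-dim common embedding and an m-dim medium embedding."""

    def __init__(self, medium_dim: int):
        self.medium_dim = medium_dim

    def fit(self, common: np.ndarray) -> "LinearMediumAutoencoder":
        # common: (n_samples, d) array of common embeddings.
        self.mean = common.mean(axis=0)
        # Rows of vt are principal directions, sorted by decreasing singular value.
        _, _, vt = np.linalg.svd(common - self.mean, full_matrices=False)
        self.components = vt[: self.medium_dim]  # (m, d), orthonormal rows
        return self

    def encode(self, common: np.ndarray) -> np.ndarray:
        """Common embedding -> medium embedding."""
        return (common - self.mean) @ self.components.T

    def decode(self, medium: np.ndarray) -> np.ndarray:
        """Medium embedding -> common embedding."""
        return medium @ self.components + self.mean


# One codec per medium, all sharing the same common embedding space.
rng = np.random.default_rng(0)
common = rng.normal(size=(200, 3)) @ rng.normal(size=(3, 16))  # toy 16-d embeddings of rank 3
codecs = {
    "go": LinearMediumAutoencoder(medium_dim=3).fit(common),
    "chess": LinearMediumAutoencoder(medium_dim=5).fit(common),
}
```

Adding a new medium only requires fitting a new codec on the same common embeddings, leaving the translators untouched. When the medium dimension is at least the rank of the centered embeddings, as with these toy data, the round trip `decode(encode(x))` reproduces `x` up to floating-point error; with a smaller medium dimension it becomes lossy.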

Where to Next?

This is all well and good, you might say, but how can this framework be used for actual communication? This is exactly the question this next step strives to answer. Wait, let's walk that back a little: remember how the introduction mentioned that this was a work in progress? Well, it turns out that we are approaching the edge of that progress right here.

In fact, this next part will look very similar to the proof of concept described above. The setting is almost the same (unidirectional communication, human receiver, ...), save for a few details. Rather than using Go as the medium, we switched to Chess. Given Chess's greater worldwide popularity compared to Go, datasets of games played by individuals of various skill levels are more readily available for training. In addition, we temporarily abandoned the idea of conveying personality through gameplay, opting instead to focus on play style. Doing so not only brings this experiment closer to the study performed by McEwan, but also allows us to concentrate on designing a better action selection mechanism that does not rely on handcrafted rules.

To that end, we have at least two questions to answer:

  • How can we characterize a person's play style, based solely on the games they played?
  • How do we go about transferring a given play style onto an optimal policy?

The keen-eyed among you will have noticed that the answer to the first question will define the model encoding the sender's message into the 'common embedding' space, while the answer to the second will influence the design of the system mapping the common embedding space into the 'medium embedding' space. You might also be wondering why we only want to transfer a play style, instead of training a medium-specific auto-encoder as mentioned above. This is another point this step has in common with the proof of concept: our intention is to use Leela Chess Zero as the provider of an 'optimal' policy for the action selection mechanism, which spares us from teaching an agent how to play chess and saves us precious time.

And this is where we stand at the moment. We have started exploring various solutions to both questions, but have not found any satisfactory answers yet. Therefore, stay tuned for the next part of this adventure.