Vectorizing Biological Feedback: Redefining Interaction with Text

Maxim Pletner
8 min read · Sep 19, 2024

--

The goal of this series of articles is to show how Artificial Intelligence (AI) and neural networks can evolve from high-level logic-driven systems into more emotionally intelligent entities. The next frontier involves leveraging biological feedback — data collected from the human body in real time, including heart rate, eye movement, facial expressions, and posture. By tracking sensory information, we can redefine how AI systems interact with users, transforming physical signals into measurable data that will add a new level of supervised feedback.

In this article, we explore how biological feedback can be quantified, integrated into AI learning models, and used to label information differently, rather than relying solely on the heavy mental post-processing every human performs before simply hitting 👍 (is it really “like,” “amazing,” “cool,” or 👌?).

Introduction: Beyond the Thumbs-Up

In our digital age, interactions with AI often boil down to simplistic feedback mechanisms: clicking a “like” button or giving a star rating. Does such a click translate into excitement, curiosity, or mere acknowledgment?

To move beyond this oversimplification, we must delve into the biological signals that underlie our interactions. I will simply lay out what I have in mind; the neurosciences certainly offer more features (and more sensory-tracking instruments), but my goal is to find the simplest and quickest way to implement the idea of biological feedback.

What Can We Measure?

Advancements in sensor technology have made it possible to track a variety of physiological signals. Below is my proposal for what should be tracked while the user reads the LLM output:

User Resonance Tracking

Zooming Into the Diagram: Resonance Tracking

Let’s break down each section of the diagram to better understand how biological feedback can be tracked and vectorized into AI systems:

Eye Movement

Eye movement, captured by camera systems, offers real-time data about where and for how long a user looks at various sections of the content. This can be critical for understanding focus and attention when reading or interacting with text (a rough code sketch follows the list below):

  • Scanning Speed: Measuring how fast the eyes move across the screen from top to bottom or in other directions. Faster scanning could indicate skimming, while slower scanning may signal deeper engagement or processing.
  • Readout Frequency: Tracks the frequency and duration of blinks or periods when the eyes are closed. This metric can provide insights into cognitive load or moments of information processing when the user needs to pause and reflect (or even meditate).
  • Attention Focus: By identifying in-screen or out-of-screen “hot spots”, this component reveals what parts of the text or image are capturing the user’s focus. Additionally, if the user looks away or breaks focus frequently, it may signal disengagement or distraction.
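
To make these metrics concrete, here is a minimal sketch of how a stream of raw gaze samples could be aggregated into scanning speed, off-screen time, and attention hot spots. The `GazeSample` format and the coarse 3×3 hot-spot grid are illustrative assumptions, not the output of any particular eye tracker.

```python
from dataclasses import dataclass


@dataclass
class GazeSample:
    t: float         # seconds since the user started reading
    x: float         # normalized screen coordinates in [0, 1]
    y: float
    on_screen: bool  # False when the gaze leaves the screen


def eye_movement_metrics(samples: list) -> dict:
    """Aggregate raw gaze samples into coarse eye-movement metrics."""
    if len(samples) < 2:
        return {"scanning_speed": 0.0, "off_screen_ratio": 0.0, "hot_spots": []}

    # Scanning speed: vertical distance covered per second (roughly screens/second).
    duration = samples[-1].t - samples[0].t
    vertical_travel = sum(abs(b.y - a.y) for a, b in zip(samples, samples[1:]))
    scanning_speed = vertical_travel / max(duration, 1e-6)

    # Attention focus: share of samples spent off-screen, plus the most
    # visited cells of a coarse 3x3 grid as "hot spots".
    off_screen_ratio = sum(1 for s in samples if not s.on_screen) / len(samples)
    grid = {}
    for s in samples:
        if s.on_screen:
            cell = (min(int(s.x * 3), 2), min(int(s.y * 3), 2))
            grid[cell] = grid.get(cell, 0) + 1
    hot_spots = sorted(grid, key=grid.get, reverse=True)[:3]

    return {
        "scanning_speed": scanning_speed,
        "off_screen_ratio": off_screen_ratio,
        "hot_spots": hot_spots,
    }
```

Readout frequency would plug into the same loop as another per-sample flag aggregated over the reading window.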

Facial Expression Recognition

By capturing facial expressions, AI systems can better understand emotional reactions to specific parts of the text or message. Advanced systems use emotion-detection algorithms to categorize expressions like smiles, frowns, or surprise, making them valuable feedback for understanding how well the content resonates emotionally (a sketch in code follows the list):

  • Emotional Reaction Tagging: This process utilizes advanced algorithms to detect and categorize emotional responses such as happiness (smile, upward movement of the mouth corners, or “crow’s feet”), boredom (drooping eyelids, “poker face”), sadness (frowning, downward movement of the mouth corners), frustration (tightened lips, furrowed brows, or gritted teeth), surprise (raised eyebrows, wide eyes, and sometimes an open mouth), etc.
  • Zone Tracking: Focuses on specific areas of the face, such as the lips, eyes, and cheeks. Tracking these zones provides a more detailed understanding of the user’s emotions. For instance, when someone’s cheeks rise while they smile or their eyebrows furrow, it adds nuance to the system’s emotional tagging. Tension around the lips and mouth may signal deep thought or concentration. Covering the face, such as touching the nose or scratching the chin or ears, can signal discomfort or a poor bond with the text.
  • Uncategorized Fluctuations: Refers to facial movements or micro-expressions that are less well understood but still hold significance. These could include fleeting eyebrow raises, quick side glances, or subtle lip movements that aren’t immediately clear in meaning but could offer important insights over time. Micro-expressions here are brief, involuntary facial expressions that reveal true emotions and may last for only a fraction of a second. Over time, AI can learn to recognize and categorize these more ambiguous reactions, building a personalized library of patterns needed to customize the model for a particular user.
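
A minimal sketch of the emotional-reaction tagging step might look like the following. The per-frame emotion scores are assumed to come from whatever off-the-shelf facial-expression model is available; `classify_frame` is a hypothetical stand-in for it, and the threshold values are arbitrary.

```python
# Label set the stand-in classifier is expected to cover.
EMOTIONS = ["happiness", "boredom", "sadness", "frustration", "surprise"]


def classify_frame(frame) -> dict:
    """Hypothetical stand-in: return {emotion: probability} for one video frame."""
    raise NotImplementedError("plug in a real facial-expression model here")


def tag_reactions(frames, threshold: float = 0.6, min_frames: int = 5) -> list:
    """Emit an emotion tag only when it dominates several consecutive frames,
    keeping brief micro-expressions apart from stable reactions."""
    tags = []
    current, run = None, 0
    for frame in frames:
        scores = classify_frame(frame)
        top = max(scores, key=scores.get)
        if scores[top] >= threshold and top == current:
            run += 1
        elif scores[top] >= threshold:
            current, run = top, 1
        else:
            current, run = None, 0
        if run == min_frames:            # stable expression -> emit one tag
            tags.append(current)
    return tags
```

Zone tracking and uncategorized micro-expressions would feed the same loop with additional label sets, gradually building the per-user pattern library mentioned above.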

Hands Control

Hand movements and input patterns, whether through a mouse, keyboard, or touchscreen, offer insights into how users physically interact with content. These metrics are useful for identifying engagement levels, as well as frustration or hesitation when navigating content (a sketch follows the list below):

  • Mouse Controls: Measures cursor dynamics, scrolling speed, and clicking frequency. For instance, slow or erratic mouse movements may indicate uncertainty or disengagement, while rapid, smooth movements suggest familiarity with the content.
  • Keyboard Controls: Tracks typing patterns and the usage of keyboard shortcuts, which can reveal productivity or frustration. Frequent backspacing may suggest difficulty understanding instructions.
  • Hand Gesture Classification: Categorizes common hand movements like shoulder or palm positioning, looking for repeated movements that may indicate anxiety, restlessness, or excitement.
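
The mouse and keyboard metrics can be reduced to a handful of numbers per reading window. The sketch below assumes hypothetical event formats, (t, x, y) for the mouse and (t, key) for the keyboard, coming from the UI layer.

```python
def hands_metrics(mouse_events: list, key_events: list, window_s: float) -> dict:
    """mouse_events: list of (t, x, y); key_events: list of (t, key)."""
    # Mouse controls: average cursor speed, plus jerkiness as a rough proxy
    # for erratic movements that may indicate uncertainty.
    speeds = []
    for (t0, x0, y0), (t1, x1, y1) in zip(mouse_events, mouse_events[1:]):
        dt = max(t1 - t0, 1e-6)
        speeds.append(((x1 - x0) ** 2 + (y1 - y0) ** 2) ** 0.5 / dt)
    avg_speed = sum(speeds) / len(speeds) if speeds else 0.0
    jerkiness = (sum(abs(b - a) for a, b in zip(speeds, speeds[1:])) / len(speeds)
                 if len(speeds) > 1 else 0.0)

    # Keyboard controls: typing rate and backspace ratio
    # (frequent backspacing may suggest difficulty or frustration).
    typing_rate = len(key_events) / max(window_s, 1e-6)
    backspace_ratio = (sum(1 for _, k in key_events if k == "backspace") / len(key_events)
                       if key_events else 0.0)

    return {
        "mouse_avg_speed": avg_speed,
        "mouse_jerkiness": jerkiness,
        "typing_rate": typing_rate,
        "backspace_ratio": backspace_ratio,
    }
```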

Body Control

This section focuses on overall physical engagement, using sensors in smart monitors, chairs, or wearables. It tracks both internal physiological signals (measured indirectly but precisely) and external body movements; a sketch in code follows the list:

  • Internal Parameters: Includes heart rate, respiratory rate, and blood pressure, which are key indicators of emotional and cognitive responses such as stress, excitement, or relaxation. These are measured with smartwatches, rings, and other IoT devices that can be kept in sync with the main processing algorithm.
  • External Positions: Monitors how users move, sit, or stand, detecting posture changes that could signify shifts in focus or discomfort. This is done with camera sensors or other detectors embedded in chairs, tables, etc. (installed once and used as part of the same IoT system).
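
A rough sketch of pooling wearable and posture readings into one body-state tag is shown below; the field names and thresholds are illustrative assumptions relative to a per-user baseline, not clinical values.

```python
from dataclasses import dataclass


@dataclass
class BodySnapshot:
    heart_rate_bpm: float
    respiratory_rate: float   # breaths per minute
    posture: str              # e.g. "upright", "slouched", "leaning_in"


def body_state_tag(snapshot: BodySnapshot, baseline_hr: float) -> str:
    """Map raw readings to a coarse tag relative to the user's own baseline."""
    if snapshot.heart_rate_bpm > baseline_hr * 1.15:
        return "aroused"       # excitement or stress; ambiguous on its own
    if snapshot.posture == "slouched" and snapshot.heart_rate_bpm < baseline_hr:
        return "disengaged"
    if snapshot.posture == "leaning_in":
        return "engaged"
    return "neutral"
```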

Sound Control

Captured by microphone systems, sound control focuses on vocal patterns and ambient noises that provide additional context for the user’s reaction (a sketch follows the list):

  • Speech Recognition: Captures direct reactions or comments from the user, allowing AI to understand spoken feedback in addition to written or facial cues.
  • User Sounds: Tracks vocalizations like sighs, laughter, or even ambient sounds in the user’s environment. These sounds can help detect levels of focus or disengagement (e.g., external noise might indicate distraction).
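
A sketch of the audio path could simply split segments into spoken feedback and non-verbal user sounds after a voice-activity-detection step; `transcribe` and `classify_sound` are hypothetical placeholders for whatever speech-recognition and audio-event models are plugged in.

```python
def transcribe(chunk) -> str:
    """Hypothetical stand-in for a speech-recognition model."""
    raise NotImplementedError("plug in an ASR model here")


def classify_sound(chunk) -> str:
    """Hypothetical stand-in for an audio-event classifier (sigh, laughter, noise...)."""
    raise NotImplementedError("plug in an audio-event model here")


def process_audio_segments(segments: list) -> dict:
    """segments: list of (audio_chunk, is_speech) pairs from a voice-activity detector."""
    feedback = {"speech": [], "sounds": []}
    for chunk, is_speech in segments:
        if is_speech:
            feedback["speech"].append(transcribe(chunk))       # spoken comments
        else:
            feedback["sounds"].append(classify_sound(chunk))   # sighs, laughter, ambient noise
    return feedback
```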

Transforming Feedback into Actionable Data

Incorporating real-time feedback into AI systems allows for a deeper understanding of user interactions beyond simple binary responses. The diagram below illustrates how measured sensory information can be transformed into actionable data, enhancing AI’s ability to interact with users in a more dynamic and personalized manner; a compressed code sketch follows the numbered steps:

Redefining LLM-Feedback with Emotional Reactions Measurement
  1. Text Input Interface: The user begins the interaction by entering text into the system, which is then sent as an API call to the LLM in the traditional manner.
  2. LLM Processing: The LLM processes the input and generates a response, which is displayed to the user through the graphically enriched text interface.
  3. Interaction with User: As the user interacts with the generated response, their real-time reactions are monitored using various sensors, such as eye-tracking, heart rate, and facial expressions.
  4. Reaction Measurement: Sensory data is captured and categorized based on response times. Real-time reactions are processed almost instantaneously, while more complex post-processing of reactions may take longer to analyze.
  5. High-Level Responses: In this traditional feedback mechanism, the user is required to read the output to the end (or scroll through it from top to bottom) and choose from preset reactions associated with predefined tags and standard labels.
  6. Storing Measurement Data: The sensory data is stored in a database, creating a collection of “digitized” responses that reflect the user’s emotional and cognitive states during the interaction.
  7. Conversion into Tags: The measured sensory information, representing direct measurements of the user’s physical reactions, is converted into a set of tags. These tags are quantitative (especially for body measurements) or indicate previously recognized patterns. Traditional labeled text reactions are also integrated into this section, though they may take longer to process and appear.
  8. Processing Tags: These tags are then used for further analysis and to label content in future interactions. This allows the LLM to adjust the text tone (utilizing a multi-tone LLM) or modify the interface (such as changing colors, screen size, or even experimenting with ambient music) based on the user’s feedback.
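
Compressing steps 4–8 into code, the flow below shows how captured measurements could become stored records, records become tags, and tags steer the next generation. The storage layer (a plain list standing in for the database), the threshold values, and the tone/layout parameters are all assumptions for illustration.

```python
import time


def measurements_to_tags(measurements: dict) -> list:
    """Turn quantitative measurements into coarse tags (thresholds illustrative)."""
    tags = []
    if measurements.get("off_screen_ratio", 0.0) > 0.3:
        tags.append("distracted")
    if measurements.get("scanning_speed", 0.0) > 1.0:
        tags.append("skimming")
    tags.extend(measurements.get("emotion_tags", []))  # from facial tagging
    return tags


def store_and_label(db: list, response_id: str, measurements: dict) -> list:
    """Steps 6-7: persist the digitized reaction and return its tags."""
    tags = measurements_to_tags(measurements)
    db.append({"response_id": response_id, "t": time.time(),
               "measurements": measurements, "tags": tags})
    return tags


def next_generation_params(tags: list) -> dict:
    """Step 8: map tags to adjustments for the next LLM call or UI render."""
    params = {"tone": "neutral", "layout": "default"}
    if "frustration" in tags:
        params["tone"] = "simpler, step-by-step"
    if "skimming" in tags:
        params["layout"] = "shorter paragraphs with highlights"
    return params
```

In practice the output of `next_generation_params` would be merged into the system prompt or the rendering settings of the next API call.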

Moving Beyond Simplistic Feedback

Creating emotionally resonant AI requires more than traditional binary feedback mechanisms. It involves the ability to automatically measure user reactions and trace them back to the model in real time. Traditional models rely on large datasets for training, but with the integration of real-time feedback we can dynamically label specific parts of a response as it is being read. This means that no extensive retraining is needed.
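
As a sketch of what labelling specific parts of a response “as it is being read” could mean in practice, the snippet below attaches the reaction tags recorded in a time window to the paragraph that was on screen during that window; the timing information is assumed to come from the rendering layer.

```python
def label_response_parts(paragraphs: list, reaction_events: list) -> list:
    """paragraphs: list of (start_t, end_t, text) for when each part was visible;
    reaction_events: list of (t, tag), e.g. (12.4, "frustration")."""
    labelled = []
    for start, end, text in paragraphs:
        tags = [tag for t, tag in reaction_events if start <= t < end]
        labelled.append({"text": text, "tags": tags})
    return labelled
```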

One well-known approach that could be sped up with this tracking of physical user responses is enhancing computer-user interaction by experimenting with how the text is displayed and presented. Beyond static text, dynamic elements such as background music could also be incorporated, adjusting to the reader’s mood and making the reading experience more personalized.

Our next step will be to delve deeper into the intricacies of the Resonance Tracking system, exploring the specific types of resonance that can be achieved. It’s not just about fine-tuning the model, but also about fine-tuning the process of interaction with the model to better mirror the user’s state or elicit the desired reaction.

We will also explore how these resonating effects can potentially increase the isolation of users within information bubbles — and, more importantly, how they can help navigate and move between these bubbles more safely.

--


Maxim Pletner

Engineer-physicist, inventing new architectures and approaches mainly in computer science and electrical engineering for 15 years.