Open Brain: Decoding Human Emotions Through Novel Brain Embedding


Open Brain is a brain-embedding library from BrainVivo that transforms visual features into a shared "Common-Brain" cognitive space. This cognitive space is built from multiple recorded brain activations paired with visual data collected using BrainVivo's proprietary fMRI protocols, yielding cognition-aware vectors that mimic real-time emotional and perceptual responses.

Introduction

Open Brain is trained on functional MRI data to predict the neural activation patterns that images elicit across 1,024 distinct brain regions. These predicted responses serve as brain embeddings, encoding the emotional reactions each image evokes within a specific cohort.

Features:
  • Emotion-Based Image Clustering: cluster images according to the emotional responses they elicit (see the sketch after this list).
  • Emotional Transformation: modify an image's emotional tone by adjusting activations in specific brain regions.
  • Media Characterization: evaluate any visual content by analyzing the emotions it provokes.
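
As a concrete illustration of the emotion-based clustering feature, the sketch below groups images by their predicted brain responses using k-means. The get_brain_embedding helper is a hypothetical stand-in, since Open Brain's actual prediction call is not shown in this document.

```python
# Minimal sketch: cluster images by the brain responses they are predicted to evoke.
import numpy as np
from sklearn.cluster import KMeans

def get_brain_embedding(image_path: str) -> np.ndarray:
    # Hypothetical stand-in for the Open Brain call that predicts a 1,024-dim
    # brain response for an image; here it just returns placeholder random values.
    rng = np.random.default_rng(abs(hash(image_path)) % (2**32))
    return rng.standard_normal(1024)

image_paths = ["img_001.jpg", "img_002.jpg", "img_003.jpg", "img_004.jpg"]  # example names
embeddings = np.stack([get_brain_embedding(p) for p in image_paths])

# Images whose predicted brain responses (and thus evoked emotions) are similar
# end up in the same cluster.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(embeddings)
for path, label in zip(image_paths, kmeans.labels_):
    print(f"{path} -> emotional cluster {label}")
```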



Brain embedding generation

Refer to Figure 1: visual stimuli (1) are first converted into image embeddings (2); these embeddings are paired with their measured brain responses using proprietary MRI scanning protocols (3) to train a supervised model that predicts the brain activation pattern (4) for any new image.

Figure 1: Brain embedding creation process.
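
The text above does not specify the model used in step (4), so the sketch below stands in with a simple ridge regression from image embeddings to recorded brain responses; the arrays are random placeholders rather than BrainVivo's data.

```python
# Sketch of step (4) in Figure 1, assuming a simple linear read-out model.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

n_stimuli, image_dim, n_regions = 2000, 1024, 1024
rng = np.random.default_rng(0)
X = rng.standard_normal((n_stimuli, image_dim))   # stand-in for image embeddings (2)
Y = rng.standard_normal((n_stimuli, n_regions))   # stand-in for measured brain responses (3)

X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.2, random_state=0)
model = Ridge(alpha=10.0).fit(X_train, Y_train)

# For a new image, the predicted 1,024-region activation pattern is its brain embedding (4).
brain_embedding = model.predict(X_test[:1])
print(brain_embedding.shape)  # (1, 1024)
```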

The Valence-Arousal plane and the emotional separation in brain-embedding space

The first shared human cognitive ability is emotion, characterized along two orthogonal dimensions: (1) Valence — how positive or negative the feeling is, and (2) Arousal — its intensity or activation level. This framing traces back to Russell (1980), whose circumplex model of affect arranges emotion terms in a circular space defined by these axes. Building on that foundation, Bradley and Lang (1994) introduced the Self‑Assessment Manikin (SAM), a nonverbal pictorial scale in which simple cartoon figures convey gradations of pleasure (happy ↔ unhappy) and arousal (excited ↔ calm). Researchers have since extended the framework to language by compiling normative affective ratings for words. In a landmark effort, Warriner et al. (2013) collected valence and arousal judgments for nearly 14,000 English lemmas, producing a lexicon of average “pleasantness” and “intensity” scores. Figure 2 depicts the density of these words on the valence‑arousal plane, with select emotion‑related terms highlighted.

Figure 2: Density of words from Warriner et al. (2013) on the valence-arousal plane, with selected emotion-related words highlighted.
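
A Figure 2-style plot can be reproduced from the published Warriner et al. (2013) norms. The file name and column names below follow the paper's supplementary CSV and are assumptions that may need adjusting for your copy of the data.

```python
# Sketch: density of the ~14,000 lemmas on the valence-arousal plane, with a few
# emotion-related words highlighted. File and column names are assumed.
import matplotlib.pyplot as plt
import pandas as pd

norms = pd.read_csv("BRM-emot-submit.csv")        # Warriner et al. (2013) norms (assumed file name)
valence, arousal = norms["V.Mean.Sum"], norms["A.Mean.Sum"]

fig, ax = plt.subplots(figsize=(6, 5))
ax.hexbin(valence, arousal, gridsize=40, cmap="viridis")
ax.set_xlabel("Valence (1 = unpleasant, 9 = pleasant)")
ax.set_ylabel("Arousal (1 = calm, 9 = excited)")

for word in ["happy", "sad", "angry", "calm"]:    # highlight a few emotion-related words
    row = norms.loc[norms["Word"] == word]
    if not row.empty:
        ax.annotate(word, (row["V.Mean.Sum"].iloc[0], row["A.Mean.Sum"].iloc[0]), color="white")
plt.show()
```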

In parallel, several standardized image databases have been developed to study emotional responses within the valence-arousal paradigm. The International Affective Picture System (IAPS; Lang et al., 1997) was among the first, offering over 1,100 photographs that span the spectrum from highly pleasant to intensely unpleasant, each accompanied by normative valence and arousal ratings. The Nencki Affective Picture System (NAPS; Marchewka et al., 2014) expanded this repertoire with 1,356 high-resolution images across categories such as people, animals, objects, and landscapes, all rated by hundreds of participants. More recently, the Open Affective Standardized Image Set (OASIS; Kurdi et al., 2017) introduced 900 color images covering a broad array of everyday themes, whose valence and arousal norms were collected via crowdsourcing.

We employed the OASIS dataset to demonstrate Open Brain's capabilities. For each image, we generated a predicted brain response vector, termed the "brain embedding", by processing its ImageBind embedding through our model, yielding a 1,024-dimensional representation. Both the original ImageBind embeddings and the brain embeddings were then projected into two dimensions using t-SNE. Each point in the resulting scatterplot was colored according to the image's mean valence and arousal scores from OASIS. Figure 3 juxtaposes the t-SNE visualization of brain embeddings with that of the raw image embeddings, revealing that the brain embeddings exhibit a more distinct separation of emotional categories.
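
The procedure behind Figure 3 can be sketched as follows. The embeddings and ratings are random placeholders standing in for the Open Brain outputs and the OASIS norms, and the valence/arousal-to-color mapping is one simple choice among many.

```python
# Sketch: project brain embeddings to 2-D with t-SNE and color each point by
# its (placeholder) valence and arousal ratings.
import matplotlib.pyplot as plt
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
n_images = 900                                            # OASIS contains 900 images
brain_embeddings = rng.standard_normal((n_images, 1024))  # placeholder for Open Brain outputs
valence = rng.uniform(1, 7, n_images)                     # placeholder for OASIS mean valence
arousal = rng.uniform(1, 7, n_images)                     # placeholder for OASIS mean arousal

coords = TSNE(n_components=2, perplexity=5, random_state=0).fit_transform(brain_embeddings)

# Simple color mapping: valence drives the red channel, arousal the blue channel.
colors = np.stack([
    (valence - valence.min()) / np.ptp(valence),
    np.zeros(n_images),
    (arousal - arousal.min()) / np.ptp(arousal),
], axis=1)

plt.scatter(coords[:, 0], coords[:, 1], c=colors, s=10)
plt.title("t-SNE of brain embeddings (perplexity = 5)")
plt.show()
```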




Figure 3: Two-dimensional t-SNE visualizations for the OASIS image set.

Top left: brain embeddings (perplexity = 5). Top right: ImageBind embeddings (perplexity = 5). Bottom left: brain embeddings (perplexity = 10). Bottom right: OpenCLIP embeddings (perplexity = 10). Each point is colored according to its image's mean valence-arousal coordinates from the OASIS norms, illustrating how emotional content clusters more distinctly in the brain-embedding space.



Perceptor

Perceptor analyzes media content (videos and images) using a brain response prediction model adapted for continuous stimuli. The tool helps content creators understand potential audience reactions through quantitative metrics; an illustrative sketch follows the feature list below.

Features:
  • Emotional Analysis: Temporal breakdown of emotional responses (sadness, happiness, etc.)
  • Valence-Arousal Data: Plots showing emotional intensity and sentiment changes
  • Perception Score: Numerical evaluation of content-persona alignment
  • Brain Region Mapping: Visualization of activated brain regions
  • LLM Integration: Text-based interface for content improvement questions
  • Cognitive Dimension Analysis: Multi-factor evaluation of memory, attention, and decision-making responses
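
Perceptor's own API is not documented here, but the kind of temporal valence-arousal breakdown listed above can be sketched with hypothetical per-second predictions:

```python
# Illustrative sketch only: the arrays stand in for per-second valence/arousal
# predictions that a brain response model might produce for a 2-minute clip.
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
seconds = np.arange(120)
valence = rng.standard_normal(120).cumsum() * 0.05   # placeholder valence trace
arousal = rng.standard_normal(120).cumsum() * 0.05   # placeholder arousal trace

def smooth(x: np.ndarray, window: int = 5) -> np.ndarray:
    # Moving average to make the temporal trend easier to read.
    return np.convolve(x, np.ones(window) / window, mode="same")

plt.plot(seconds, smooth(valence), label="valence")
plt.plot(seconds, smooth(arousal), label="arousal")
plt.xlabel("Time (s)")
plt.ylabel("Predicted score (arbitrary units)")
plt.legend()
plt.show()
```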


Perceptor (tutorial)



Transformations in brain-embedding space

Image embedding spaces are learned by computer vision models to represent images in a compact, semantic form. In such spaces, each image is mapped to a point (vector) such that images with similar semantic content lie close together. Recent advances have shown that these embedding spaces not only capture high-level semantics but also support semantic transformations via simple vector operations. In other words, by moving in certain directions in the embedding space, one can meaningfully change attributes or concepts in the image representation. This phenomenon, often called embedding arithmetic, was first popularized in the word embedding domain (e.g., the famous example vector(“King”) – vector(“Man”) + vector(“Woman”) ≈ vector(“Queen”)). Researchers have since explored analogous behavior in vision and vision-language models. Radford et al. (2015) observed that the latent space of a Generative Adversarial Network (GAN) learned on images supports arithmetic manipulations. In their DCGAN experiments, specific algebraic combinations of latent codes produced meaningful changes in the output image (e.g., arithmetic operations on face latent vectors could change attributes like adding glasses). Subsequent research has formalized and expanded the discovery of semantic directions in image embeddings. For example, InterFaceGAN (Shen et al., 2020) showed that for pretrained GANs, one can find a linear boundary in latent space for attributes such as gender, age, or smile by training a simple classifier, and then use the normal to that boundary as a direction to manipulate the attribute in generated images. They found that latent spaces are surprisingly well-behaved: moving a latent code in the “gender” direction smoothly transitions a face from male to female, while keeping other aspects intact, up to a point. Similarly, Härkönen et al. (2020) introduced GANSpace, which uses Principal Component Analysis (PCA) on latent vectors to discover major axes of variation. These principal directions were found to correspond to semantic changes like lighting without any supervision.
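
The InterFaceGAN-style recipe described above can be sketched in a few lines: fit a linear classifier on latent codes labeled with a binary attribute and use the unit normal of its decision boundary as the editing direction. The latent codes and labels below are synthetic stand-ins rather than outputs of a real GAN.

```python
# Sketch: recover an attribute direction in a latent space from labeled latent codes.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
latent_dim = 512
true_direction = rng.standard_normal(latent_dim)            # synthetic "attribute" axis
latents = rng.standard_normal((2000, latent_dim))           # stand-in for GAN latent codes
labels = (latents @ true_direction > 0).astype(int)         # stand-in attribute labels (e.g., smiling)

clf = LogisticRegression(max_iter=1000).fit(latents, labels)
direction = clf.coef_[0] / np.linalg.norm(clf.coef_[0])     # unit normal of the decision boundary

# Moving a latent code along the direction increases the attribute; moving against it decreases it.
alpha = 3.0
edited_latent = latents[0] + alpha * direction
```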

Our brain responds differently to each visual image. While all visual images increase neural activity in regions dedicated to visual processing, emotionally salient images also engage brain areas associated with emotion regulation and can shift attention toward the emotional content. Furthermore, two images of similar emotional intensity may elicit distinct patterns of neural activation depending on their valence (e.g., positive vs. negative): an image that evokes happiness in the viewer engages specific brain areas differently than an image that evokes sadness. Building on these principles, the Brain Embedding Explorer allows users to upload an image and shift its emotional profile in a more positive or negative direction. This transformation is performed by offsetting the brain embedding along a fixed direction in brain-embedding space. The transformation vector itself is obtained by subtracting the average brain embedding of low-valence images from the average brain embedding of high-valence images. This subtraction cancels out cognitive activity shared by both groups, such as early visual processing, yielding a transformation vector that isolates perceived emotional valence. Offsetting any brain embedding along this vector modulates perceived valence while leaving lower-level visual representations largely unchanged. The transformation is depicted in Figure 4.
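
A minimal sketch of this valence transformation, with random placeholder arrays in place of real brain embeddings:

```python
# Sketch: build a valence direction as the difference of group means and use it
# to shift an image's brain embedding. All arrays are placeholders.
import numpy as np

rng = np.random.default_rng(0)
high_valence = rng.standard_normal((100, 1024))   # brain embeddings of high-valence images (placeholder)
low_valence = rng.standard_normal((100, 1024))    # brain embeddings of low-valence images (placeholder)

# Activity shared by both groups (e.g., early visual processing) cancels in the difference.
valence_direction = high_valence.mean(axis=0) - low_valence.mean(axis=0)
valence_direction /= np.linalg.norm(valence_direction)

def shift_valence(brain_embedding: np.ndarray, strength: float) -> np.ndarray:
    # Positive strength pushes the embedding toward a more positive emotional profile,
    # negative strength toward a more negative one.
    return brain_embedding + strength * valence_direction

shifted = shift_valence(rng.standard_normal(1024), strength=2.0)
```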

Unlike purely semantic embedding models such as CLIP or ImageBind, which capture conceptual or content-based similarities between images, the transformation in brain-embedding space directly alters how a human might perceive the emotional dimension of the image. Our method derives its transformation vector from actual neural responses (fMRI), thus reflecting not only the semantic aspects of the image but also the affective processes at play in the human brain. By harnessing this neural-level information, the Brain Embedding Explorer contributes something new: it provides a means to shift an image's perceived emotional quality, rather than merely rearranging its semantic attributes.

Figure 4: Transformation in brain-embedding space. (2) The brain embedding is calculated from the image embedding using Open Brain. (3a), (3c) The brain response is modified in specific areas using a pre-calculated transformation vector. (5a), (5b) Similar images are queried in brain-embedding space.





Brain Embedding Explorer

The Brain Embedding Explorer app allows you to freely explore how images map onto brain embeddings. To get started, choose a LanceDB table (the default uses the OASIS dataset, as shown in the first screenshot). Next, upload an image and transform the emotion portrayed in the image by using the emotions slider, then click the “Modify Emotion” button. This transforms the brain embedding accordingly and displays five images from the OASIS dataset that share the most similar brain embedding (see second screenshot). Finally, delve deeper into the brain response by viewing which regions are most responsive and learning about their cognitive roles in perception (see third screenshot). These results can also be downloaded as a CSV file for further analysis.
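
The nearest-neighbor retrieval step can be sketched with LanceDB's Python API. The database URI, table name, and the random placeholder query are assumptions; the app's actual configuration is not shown here.

```python
# Sketch: find the five stored images whose brain embeddings are closest to a query.
import lancedb
import numpy as np

# Placeholder query: in practice this would be the (optionally emotion-shifted)
# brain embedding of the uploaded image.
query = np.random.default_rng(0).standard_normal(1024).astype(np.float32)

db = lancedb.connect("data/brain_embeddings.lancedb")   # placeholder URI
table = db.open_table("oasis")                          # placeholder table name

results = table.search(query).limit(5).to_pandas()      # five most similar brain embeddings
print(results.head())
```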



Brain Embedding Explorer (tutorial)

References

  • Russell, J. A. (1980). A circumplex model of affect. Journal of Personality and Social Psychology, 39(6), 1161-1178.
  • Bradley, M. M., & Lang, P. J. (1994). Measuring emotion: The Self-Assessment Manikin and the Semantic Differential. Journal of Behavior Therapy and Experimental Psychiatry, 25(1), 49-59.
  • Lang, P. J., Bradley, M. M., & Cuthbert, B. N. (1997). International Affective Picture System (IAPS): Technical manual and affective ratings. NIMH Center for the Study of Emotion and Attention, 1(39-58), 3.
  • Warriner, A. B., Kuperman, V., & Brysbaert, M. (2013). Norms of valence, arousal, and dominance for 13,915 English lemmas. Behavior Research Methods, 45, 1191-1207.
  • Marchewka, A., Żurawski, Ł., Jednoróg, K., & Grabowska, A. (2014). The Nencki Affective Picture System (NAPS): Introduction to a novel, standardized, wide-range, high-quality, realistic picture database. Behavior Research Methods, 46, 596-610.
  • Radford, A., Metz, L., & Chintala, S. (2015). Unsupervised representation learning with deep convolutional generative adversarial networks. arXiv:1511.06434.
  • Kurdi, B., Lozano, S., & Banaji, M. R. (2017). Introducing the Open Affective Standardized Image Set (OASIS). Behavior Research Methods, 49(2), 457-470.
  • Shen, Y., Gu, J., Tang, X., & Zhou, B. (2020). Interpreting the latent space of GANs for semantic face editing. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (pp. 9243-9252).
  • Härkönen, E., Hertzmann, A., Lehtinen, J., & Paris, S. (2020). GANSpace: Discovering interpretable GAN controls. Advances in Neural Information Processing Systems, 33, 9841-9850.