A new attack surface for LLMs is gaining significant attention in IT circles, and it’s more menacing than it looks at first glance: the innocent smiley face 🙂. To be precise, this is a story about any emoji – or for that matter, any Unicode character – being used to conceal harmful or extensive text.
By encoding arbitrary-length data into a single visible character using invisible code points – such as zero-width joiners (ZWJ) and Unicode variation selectors – attackers make the embedded content invisible to human reviewers, increasing the likelihood of bypassing organizations’ content filtering mechanisms. The fact that this vulnerability applies to both visible and invisible Unicode characters makes it especially disconcerting.
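To make the mechanism concrete, here is a minimal sketch of the encoding trick in Python. It maps each payload byte to one of the 256 Unicode variation selectors (U+FE00–FE0F and U+E0100–E01EF) and appends them to a visible carrier emoji; the helper names (`hide`, `reveal`) and the carrier choice are illustrative, not taken from any real exploit kit:

```python
def encode_byte(b: int) -> str:
    """Map one byte to a Unicode variation selector (VS1-VS256)."""
    return chr(0xFE00 + b) if b < 16 else chr(0xE0100 + (b - 16))

def decode_char(ch: str):
    """Inverse mapping; returns None for ordinary characters."""
    cp = ord(ch)
    if 0xFE00 <= cp <= 0xFE0F:
        return cp - 0xFE00
    if 0xE0100 <= cp <= 0xE01EF:
        return (cp - 0xE0100) + 16
    return None

def hide(payload: str, carrier: str = "🙂") -> str:
    """Append one invisible selector per payload byte to a visible carrier."""
    return carrier + "".join(encode_byte(b) for b in payload.encode("utf-8"))

def reveal(text: str) -> str:
    """Recover hidden bytes from a string containing smuggled selectors."""
    data = bytes(b for b in map(decode_char, text) if b is not None)
    return data.decode("utf-8")
```

The output of `hide()` renders as a single smiley in most editors and chat interfaces, yet `reveal()` recovers the full payload – which is exactly why a human reviewer scanning the text sees nothing amiss.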
Earlier this year, we explored Unicode abuse against coding assistants, focusing on how invisible characters were being introduced into open-source code repositories to sabotage assistants’ outputs. Because invisible characters like whitespace characters (e.g., spaces, tabs, and newlines), control characters (e.g., carriage return CR and line feed LF), and zero-width characters (e.g., zero-width space U+200B and zero-width non-joiner U+200C) are not rendered in text editors, they constitute an obvious attack surface for Unicode abuse.
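A simple first line of defense against this class of abuse is to scan incoming text for characters in Unicode’s “Cf” (format) category, which covers the zero-width space, joiner, non-joiner, byte-order mark, and similar invisibles. The sketch below, using only Python’s standard `unicodedata` module, shows the idea (the function name is ours):

```python
import unicodedata

def find_invisibles(text: str):
    """Return (index, code point) pairs for invisible format characters."""
    hits = []
    for i, ch in enumerate(text):
        # Category "Cf" covers U+200B/U+200C/U+200D, U+FEFF (BOM), etc.
        if unicodedata.category(ch) == "Cf":
            hits.append((i, f"U+{ord(ch):04X}"))
    return hits
```

A real filter would also need to decide policy for each hit – some format characters are legitimate in, say, Arabic or emoji text – but even this crude scan surfaces payloads that are invisible on screen.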
But it turns out that visible symbols, such as numbers, letters, and emojis, are also vulnerable to Unicode abuse: just like their invisible counterparts, they can serve as carriers for attacks.
The most well-known type of attack that Unicode abuse enables is prompt injection, where an attacker manipulates an LLM through carefully crafted inputs. This includes jailbreaking, a specific category of prompt injection whose goal is to coerce a GenAI application into deviating from its intended behavior and predetermined guidelines. An attacker can embed a hidden prompt injection or jailbreak command within an emoji and then feed that emoji to an LLM.
Unicode abuse can also lead to token expansion attacks.
In language modeling, a ‘token’ is the smallest unit of text that an LLM can comprehend and process, often as short as a single visible or invisible character. In a token expansion attack, an attacker exploits the fact that certain inputs, once processed by the model, expand into a much larger number of tokens. By embedding excessive or malicious content within a single Unicode character sequence, the attacker can cause what looks like one character to consume hundreds or thousands of tokens.
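The scale of the expansion is easy to demonstrate. Tokenizers differ, but many fall back to byte-level tokens for rare code points, so the UTF-8 byte count of a smuggled payload is a reasonable lower-bound proxy for the tokens it consumes – an assumption we make in the sketch below, which builds a single visible smiley carrying 200 invisible variation selectors:

```python
# Illustrative payload: one smiley carrying 200 invisible variation selectors.
carrier = "\U0001F642"  # 🙂
payload = carrier + "".join(chr(0xE0100 + i) for i in range(200))

def utf8_byte_count(s: str) -> int:
    """Bytes a byte-fallback tokenizer would have to consume."""
    return len(s.encode("utf-8"))

# The string renders as a single emoji, yet...
print(len(payload))              # 201 code points
print(utf8_byte_count(payload))  # 804 bytes
```

One glyph on screen, over 800 bytes for the model to chew through – and nothing stops an attacker from making the hidden tail arbitrarily long.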
Token expansion can result in unbounded consumption, where an LLM is manipulated into processing excessive amounts of information. When unbounded consumption occurs, the model’s computational resources can become overwhelmed, leading to severe performance degradation. This in turn increases the model’s exposure to Denial of Service (DoS) attacks, model theft, and other risks.
Both Prompt Injection and unbounded consumption are considered by the Open Worldwide Application Security Project (OWASP) to be among the top ten most critical vulnerabilities found in applications that use LLMs.
Prompt Security employs protection measures that inspect text down to the Unicode level in real time. These measures are capable of restricting, blocking, and redacting visible and invisible characters, and are configurable so that organizations can customize protection according to their specific requirements.
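As a rough illustration of what Unicode-level redaction can look like – this is our own minimal sketch, not Prompt Security’s implementation – the filter below strips format characters and variation selectors before text reaches a model:

```python
import unicodedata

def sanitize(text: str) -> str:
    """Drop format characters and variation selectors before model input."""
    out = []
    for ch in text:
        cp = ord(ch)
        if unicodedata.category(ch) == "Cf":  # ZWSP, ZWJ, ZWNJ, BOM, ...
            continue
        if 0xFE00 <= cp <= 0xFE0F or 0xE0100 <= cp <= 0xE01EF:
            continue                          # variation selectors VS1-VS256
        out.append(ch)
    return "".join(out)
```

Note that a blanket filter like this is deliberately aggressive: it would also break legitimate ZWJ emoji sequences (e.g., family emojis), which is precisely why production-grade protection needs to be configurable rather than one-size-fits-all.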
To learn more about how Prompt Security can help protect your organization from Unicode abuse and other threats, get in touch today.