Meta’s LlamaGuard Filter
LlamaGuard is Meta’s LLM guardrail model, designed to detect harmful or inappropriate AI interactions. It can assess either user prompts or model responses, evaluating content across 13 distinct hazard categories and determining whether it is safe or unsafe and, if unsafe, which categories are violated.
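To make that interface concrete, here is a minimal sketch of querying the model, assuming a local Ollama server with llama-guard3 pulled; the helper name and output parsing are mine, not part of the pipeline:

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # assumed local Ollama endpoint

def classify(prompt: str, model: str = "llama-guard3:8b") -> tuple[bool, list[str]]:
    """Ask LlamaGuard whether a prompt is safe; return (is_safe, category_codes)."""
    payload = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
    req = urllib.request.Request(
        OLLAMA_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        verdict = json.loads(resp.read())["response"].strip()
    # LlamaGuard answers "safe", or "unsafe" followed by a line of
    # comma-separated category codes, e.g. "unsafe\nS7".
    lines = verdict.splitlines()
    is_safe = lines[0].strip().lower() == "safe"
    codes = [c.strip() for c in lines[1].split(",")] if not is_safe and len(lines) > 1 else []
    return is_safe, codes
```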
This demonstration leverages an OpenWebUI-compatible pipeline available on GitHub (christian-taillon/open-webui-pipelines) and on the OpenWebUI Community as a Function: LlamaGuard.
LlamaGuard
Based on the MLCommons taxonomy, the model has been trained to evaluate content and assign safety labels across 13 distinct hazard categories, organized in the following table:
| Category Code | Hazard Type | Description |
|---|---|---|
| S1 | Violent Crimes | Criminal acts involving violence |
| S2 | Non-Violent Crimes | Criminal acts without violence |
| S3 | Sex-Related Crimes | Criminal acts of sexual nature |
| S4 | Child Sexual Exploitation | Exploitation of minors |
| S5 | Defamation | False statements harming reputation |
| S6 | Specialized Advice | Potentially harmful guidance |
| S7 | Privacy | Exposure of sensitive personal information |
| S8 | Intellectual Property | Copyright and ownership issues |
| S9 | Indiscriminate Weapons | Weapons of mass destruction |
| S10 | Hate | Hate speech and discrimination |
| S11 | Suicide & Self-Harm | Self-destructive behavior |
| S12 | Sexual Content | Adult or explicit material |
| S13 | Elections | Electoral integrity issues |
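For use in the sketches below, the taxonomy can be carried as a simple code-to-label mapping, with values taken directly from the table above:

```python
# Category codes and labels from the MLCommons taxonomy table above;
# used later to render human-readable violation notices.
HAZARD_CATEGORIES = {
    "S1": "Violent Crimes",
    "S2": "Non-Violent Crimes",
    "S3": "Sex-Related Crimes",
    "S4": "Child Sexual Exploitation",
    "S5": "Defamation",
    "S6": "Specialized Advice",
    "S7": "Privacy",
    "S8": "Intellectual Property",
    "S9": "Indiscriminate Weapons",
    "S10": "Hate",
    "S11": "Suicide & Self-Harm",
    "S12": "Sexual Content",
    "S13": "Elections",
}
```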
Demo 1: LlamaGuard Filter in Action
In this demonstration, we see LlamaGuard’s privacy protection capabilities.
No LlamaGuard Filter: Initially, a user sends sensitive personal information to an external model hosted on Cloudflare. Although the model recognizes the information is sensitive and rejects the user’s request, the damage is done: the sensitive information has already left the user’s systems. This is visible in the Cloudflare AI Gateway logs.
LlamaGuard Filter Enabled: The user then sends the same prompt through a model configured to use the LlamaGuard Filter.
With LlamaGuard enabled, the system intercepts and blocks the request. Instead of processing potentially sensitive data, it returns a privacy violation notice.
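A hypothetical sketch of how such a block could work in an OpenWebUI filter (hook and field names assumed here, not taken from the published pipeline): the inlet hook runs before the request leaves for the upstream model, so raising there stops the data from ever being sent.

```python
class Filter:
    async def inlet(self, body: dict, __user__: dict | None = None) -> dict:
        # Screen the newest user message before it reaches the upstream model.
        user_message = body["messages"][-1]["content"]
        is_safe, codes = classify(user_message)  # classify() from the earlier sketch
        if not is_safe:
            labels = ", ".join(HAZARD_CATEGORIES.get(c, c) for c in codes)
            # Raising here blocks the request; the message is surfaced
            # to the user in place of a model response.
            raise Exception(f"Request blocked by LlamaGuard: {labels}")
        return body
```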
Other Categories
A second example shows permitted content (how to adopt a llama) versus blocked content (how to steal a llama), the latter triggering a separate LlamaGuard unsafe category; a rough illustration follows.
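Using the classify() sketch from earlier, that contrast would look roughly like this (the exact category code is the model’s call; S2, Non-Violent Crimes, is the likely match for theft):

```python
print(classify("How do I adopt a llama?"))  # expected: (True, [])
print(classify("How do I steal a llama?"))  # likely: (False, ["S2"])
```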
Demo 2: Customizable Security Controls

The second demonstration showcases the OpenWebUI Filter Installation process and configuration options.
Not everyone will have the same use cases. Personally, I was primarily interested in LlamaGuard for its probabilistic privacy features, as a second layer of defense that prevents privacy-related details from being sent to third-party inference providers if my TokenGuard fails to properly sanitize the data.
The ability to enable or disable certain filters therefore seemed to me to be a required feature.
- Users can choose which LlamaGuard model they use (default: llama-guard3:8b).
- Individual filter categories can be enabled or disabled (see the configuration sketch below).
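A hypothetical configuration sketch covering those two options (field names are assumed; the published filter may expose different valves):

```python
from pydantic import BaseModel, Field

class Valves(BaseModel):
    # Which LlamaGuard model the filter queries (matches the stated default).
    guard_model: str = Field(default="llama-guard3:8b")
    # Only these category codes trigger a block; all others are ignored.
    enabled_categories: list[str] = Field(
        default=["S1", "S2", "S3", "S4", "S7", "S11"]
    )
```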
Note: Many open-weight models are significantly less censored than closed-source frontier models.