LlamaGuard Filter Pipeline

Meta’s LlamaGuard Filter


LlamaGuard is Meta’s LLM guardrail model, designed to detect harmful or inappropriate AI interactions. It can assess either a user prompt or a model response, evaluating content across 13 distinct categories and determining whether it is safe or unsafe and, if unsafe, which categories are violated.
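To make that verdict format concrete, here is a minimal sketch of querying LlamaGuard through a local Ollama server and parsing its reply. The assumptions: Ollama is listening on localhost:11434 and serving llama-guard3:8b, and the model answers with "safe", or "unsafe" followed by the violated category codes on the next line (exact output can vary by model version):

```python
import requests

def check_prompt(prompt: str, model: str = "llama-guard3:8b") -> dict:
    """Ask LlamaGuard (via a local Ollama server) whether a prompt is safe."""
    resp = requests.post(
        "http://localhost:11434/api/chat",
        json={
            "model": model,
            "messages": [{"role": "user", "content": prompt}],
            "stream": False,
        },
        timeout=60,
    )
    resp.raise_for_status()
    # Llama Guard 3 replies with "safe", or "unsafe" plus a second line
    # listing the violated category codes, e.g. "unsafe\nS7".
    lines = resp.json()["message"]["content"].strip().splitlines()
    return {
        "safe": lines[0].strip() == "safe",
        "categories": [c.strip() for c in lines[1].split(",")] if len(lines) > 1 else [],
    }
```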

This demonstration leverages an OpenWebUI-compatible pipeline, available on GitHub at christian-taillon/open-webui-pipelines and in the OpenWebUI Community as the function LlamaGuard.

LlamaGuard

Based on the MLCommons taxonomy, the AI model has been trained to evaluate and assign safety labels across 13 distinct hazard categories, which are organized in the following table:

| Category Code | Hazard Type | Description |
|---------------|-------------|-------------|
| S1 | Violent Crimes | Criminal acts involving violence |
| S2 | Non-Violent Crimes | Criminal acts without violence |
| S3 | Sex-Related Crimes | Criminal acts of sexual nature |
| S4 | Child Sexual Exploitation | Exploitation of minors |
| S5 | Defamation | False statements harming reputation |
| S6 | Specialized Advice | Potentially harmful guidance |
| S7 | Privacy | Personal information protection |
| S8 | Intellectual Property | Copyright and ownership issues |
| S9 | Indiscriminate Weapons | Weapons of mass destruction |
| S10 | Hate | Hate speech and discrimination |
| S11 | Suicide & Self-Harm | Self-destructive behavior |
| S12 | Sexual Content | Adult or explicit material |
| S13 | Elections | Electoral integrity issues |

Demo 1: LlamaGuard Filter in Action


In this demonstration, we see LlamaGuard’s privacy protection capabilities.

No LlamaGuard Filter: Initially, a user sends sensitive personal information to an external model hosted on Cloudflare. Although the model recognizes the information is sensitive and rejects the user’s request, the damage is already done: the sensitive information has left the user’s systems.

This is visible in the Cloudflare AI Gateway logs.

LlamaGuard Filter Enabled: The user then sends the same prompt through a model configured to use the LlamaGuard Filter.

With LlamaGuard enabled, the system intercepts and blocks the request. Instead of processing potentially sensitive data, the system returns a privacy violation notice.
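Inside an OpenWebUI filter pipeline, that interception happens in the inlet hook, which runs before a request is forwarded upstream. A minimal sketch, assuming the check_prompt helper above (the actual pipeline’s implementation differs in its details):

```python
from typing import Optional

class Pipeline:
    async def inlet(self, body: dict, user: Optional[dict] = None) -> dict:
        # Screen the latest user message before it leaves the system.
        last_message = body["messages"][-1]["content"]
        result = check_prompt(last_message)
        if not result["safe"]:
            # Raising aborts the request; OpenWebUI surfaces the message to
            # the user instead of forwarding the prompt to the upstream model.
            raise Exception(
                "Blocked by LlamaGuard: violates " + ", ".join(result["categories"])
            )
        return body
```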

Other Categories

A second example shows permitted content (how to adopt a llama) versus blocked content (how to steal a llama), with the latter triggering a separate LlamaGuard unsafe category.
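With the check_prompt helper sketched earlier, that contrast would look roughly like this (the outputs shown are hypothetical; S2 is the Non-Violent Crimes category):

```python
print(check_prompt("How do I adopt a llama?"))
# {'safe': True, 'categories': []}

print(check_prompt("How do I steal a llama?"))
# {'safe': False, 'categories': ['S2']}
```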

Demo 2: Customizable Security Controls



The second demonstration showcases the OpenWebUI Filter Installation process and configuration options.

Not everyone will have the same use cases. Personally, I was primarily interested in LlamaGuard for its probabilistic privacy features, as a second layer that prevents privacy-related details from being sent to third-party inference providers if my TokenGuard fails to properly sanitize the data.

The ability to enable or disable certain filters therefore seemed to me to be a required feature.

  • Users can choose which LlamaGuard model to use (default: llama-guard3:8b).
  • Individual filter categories can be enabled or disabled (see the configuration sketch below).
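As a rough sketch of what that configuration surface can look like (OpenWebUI pipelines typically expose settings through a pydantic Valves class; the field names below are illustrative assumptions, not necessarily the pipeline’s actual valve names):

```python
from pydantic import BaseModel, Field

class Valves(BaseModel):
    # Illustrative settings; the real pipeline's valve names may differ.
    model: str = Field(
        default="llama-guard3:8b",
        description="Which LlamaGuard model runs the safety check.",
    )
    enabled_categories: list[str] = Field(
        default=["S1", "S2", "S3", "S4", "S5", "S6", "S7",
                 "S8", "S9", "S10", "S11", "S12", "S13"],
        description="Hazard categories that should block a request. "
                    "Remove codes (e.g. S6) to let that category through.",
    )
```

A request is then blocked only when the categories LlamaGuard returns intersect enabled_categories, which makes a privacy-focused deployment (keeping just S7 enabled) straightforward.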

Note: Many open-weight models are significantly less censored than closed-source frontier models.