Why I Treat Chrome in Puppeteer as a Security Boundary, Not a Dependency
If you run Puppeteer against untrusted pages, you are not just automating a browser.
You are exposing a browser to hostile input on your own infrastructure.
And once I started looking at it that way, a lot of “normal” decisions stopped looking normal to me.
Things like:
- launching Chrome with --no-sandbox because it is easier in Docker
- inheriting whatever browser version happens to be inside a base image
- disabling isolation features because a few pages behave badly
- assuming “non-root in a container” is good enough
All of that is much riskier than it looks.
I think many teams still treat headless Chrome as if it were just a rendering engine. It is not. It is one of the most exposed and complicated parts of the system. If your service opens attacker-controlled pages, then Chrome is part of your security perimeter.
That is the mental model I would recommend adopting.
The Real Threat Model
A typical browser automation service does something simple:
- accepts a URL or raw HTML
- opens it in Chrome
- waits for it to load
- takes a screenshot, PDF, or extracts data
From a product point of view, that sounds harmless.
From a security point of view, it means I am willingly asking my backend to load hostile JavaScript, hostile HTML, hostile WebAssembly, hostile media, hostile frames, and everything else modern websites can throw at a browser.
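To make that surface concrete, here is a deliberately minimal sketch of such a worker. The function name is mine, and the browser object is injected so the hostile-input path is easy to see; in a real service it would come from `puppeteer.launch()`:

```javascript
// Minimal sketch of an untrusted-page render worker (names hypothetical).
// Everything after goto() resolves has already run attacker-controlled
// JS/HTML/Wasm inside Chrome on your infrastructure.
async function renderUntrusted(browser, url) {
  const page = await browser.newPage();
  try {
    // The moment this resolves, hostile content has executed in the renderer.
    await page.goto(url, { waitUntil: "networkidle0", timeout: 30000 });
    return await page.screenshot({ type: "png" });
  } finally {
    await page.close();
  }
}
```

Four lines of product logic, and every one of them hands the browser hostile input.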
So the question is not whether browser exploits are relevant.
They are.
The real question is what happens after the browser is compromised.
Does Chrome contain the exploit the way it was designed to?
Or did I quietly remove the exact protections that were supposed to save me?
The Worst Mistake I See: Disabling the Sandbox
The single most dangerous mistake in a Puppeteer deployment is treating these flags as normal:
--no-sandbox
--disable-setuid-sandbox
I want to be very direct here:
I would not ship them in production for a service that renders untrusted pages.
Chrome’s sandbox is not some optional hardening extra. It is one of the main reasons a renderer compromise does not immediately become a system compromise.
The practical difference is straightforward:
- with the sandbox enabled, a browser exploit usually still needs another step to escape into the container or host
- with --no-sandbox, the attacker is much closer to native code execution in your backend environment
That is not a subtle difference. That is the difference between “the browser got compromised” and “my worker got owned”.
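One cheap way to enforce this in code is to fail fast if a sandbox-disabling flag ever sneaks into the launch arguments, whether directly or through shared config. A sketch, not an official Puppeteer API; the function name is mine:

```javascript
// Refuse to launch Chrome if any sandbox-disabling flag is present.
const FORBIDDEN_FLAGS = ["--no-sandbox", "--disable-setuid-sandbox"];

function assertSandboxFlags(args) {
  const bad = args.filter((a) =>
    FORBIDDEN_FLAGS.some((f) => a === f || a.startsWith(f + "="))
  );
  if (bad.length > 0) {
    throw new Error(`refusing to launch Chrome with: ${bad.join(", ")}`);
  }
  return args;
}

// Hypothetical usage: puppeteer.launch({ args: assertSandboxFlags(args) })
```

The point is that the guard runs in production, not just in a review checklist.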
And yes, I know exactly why people use these flags.
Chrome fails to start in containers. Someone finds a workaround. The workaround works. The team moves on.
But that convenience is expensive. In this case, it can erase the main boundary Chrome gives you.
My Rule: If Chrome Only Works Without the Sandbox, the Runtime Is Wrong
This is the principle I came away with:
If Chrome cannot start sandboxed in the environment, I should fix the environment. I should not disable the sandbox and call it done.
That usually means the problem is somewhere in the runtime:
- user namespaces
- seccomp
- container capabilities
- AppArmor
- Kubernetes settings
- Docker host configuration
Not in Chrome.
And the fix is usually operational, not application-level.
“Sandbox Enabled” Is Necessary, But It Is Not the Same as “Safe”
There is another mistake that shows up right after teams re-enable the sandbox:
they assume the problem is solved.
I do not think that is the right conclusion either.
If the Chrome build itself is vulnerable, then the browser may still be exploitable with the sandbox on. The sandbox changes the blast radius. It does not magically patch the browser bug.
So I think the correct requirement is:
- keep Chrome sandboxed
- keep Chrome patched
Both matter.
If I am running a vulnerable browser build and relying on the sandbox alone, I am still accepting unnecessary risk. The exploit path may be harder. The damage may be smaller. But the bug is still there.
Site Isolation Is Not the Main Boundary, But I Still Would Not Disable It
Another thing I see people disable too casually is site isolation:
IsolateOrigins
site-per-process
To be precise, these are not the same thing as --no-sandbox. Disabling them does not by itself mean “remote page gets shell”.
But I still would not keep them disabled by default in a production browser worker that opens attacker-controlled content.
Why?
Because site isolation is part of Chrome’s blast-radius reduction story.
When it is enabled, different sites are more strongly separated. When it is disabled, a compromised renderer can end up with a broader in-browser reach, more useful memory layout, and more damage potential across origins.
So my view is simple:
- --no-sandbox is the catastrophic mistake
- disabling site isolation is the unnecessary mistake
One is worse. Both are bad.
The Docker and Kubernetes Layer Is Where Good Intentions Often Go to Die
This is probably the most practical lesson I took away from working through Chrome hardening.
A lot of teams do want to run Chrome securely. They remove the unsafe flags. They update the image. They deploy. And then Chrome does not start.
At that point, the security decision gets replaced by an operational one.
And if the operational answer is “just add --no-sandbox”, all the good intentions disappear.
What I have learned is that containerized Chrome lives or dies by the runtime details:
- whether the host allows the namespace operations Chrome needs
- whether seccomp blocks them
- whether AppArmor gets in the way
- whether Kubernetes actually supports the features you think it supports
- whether your service launches Chrome with flags that silently conflict with sandboxed startup
That last point is easy to miss.
For example, a minimal smoke test may show that Chrome starts fine, but the real application still fails because it passes a flag like --no-zygote, which works under one process model and conflicts with sandboxed startup under another.
That is why I do not trust toy checks alone anymore. I want to test the exact launch path the service uses in production.
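One way to avoid the divergence is to build the launch options in exactly one place and point both the smoke test and the service at it. A sketch, with a module layout that is mine (in a real codebase this would live in something like a shared launch-options module):

```javascript
// Single source of truth for Chrome launch options.
// Both the production service and the smoke test must call this,
// so the smoke test exercises the exact launch path production uses.
function buildLaunchOptions() {
  return {
    headless: true,
    args: [
      "--disable-gpu",
      "--hide-scrollbars",
      // Deliberately absent: --no-sandbox, --disable-setuid-sandbox,
      // --no-zygote. If the smoke test launches with different args,
      // it is not testing the production path.
    ],
  };
}
```

A smoke test then does the equivalent of `puppeteer.launch(buildLaunchOptions())` rather than `puppeteer.launch()` with its own ad-hoc flags.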
I Try Very Hard to Avoid SYS_ADMIN
One of the most tempting shortcuts in containerized Chrome setups is giving the container SYS_ADMIN.
It often makes things work.
I still do not like it.
SYS_ADMIN is broad. Too broad. If I can avoid it, I will.
What I prefer instead is:
- keep Chrome sandboxed
- avoid SYS_ADMIN
- make the runtime support the sandbox properly
In practice, that can mean:
- user namespaces where available
- a seccomp profile that allows the namespace syscalls Chrome actually needs
- validating the host and cluster behavior instead of assuming it
This is slower than the shortcut, but it is the right tradeoff for a service that renders hostile pages all day.
Browser Versioning Has to Be Explicit
I have become increasingly skeptical of browser versioning by accident.
What I mean by that:
- Chrome comes from a base image
- the base image uses latest
- Puppeteer is pinned somewhere else
- nobody is fully sure which browser is actually in production
That setup is fragile operationally and dangerous from a security perspective.
What I want instead is very boring:
- an explicit browser version in the image
- an explicit Puppeteer version in the app
- a deliberate decision that those versions belong together
If Chrome is security-critical, then I do not want it hiding in the background as inherited image state.
I want to own it in the Dockerfile.
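Owning it in the Dockerfile can look like the sketch below. The version value is a placeholder, not a recommendation, and the snippet assumes a Debian-based image; the URL follows the layout of Google's deb pool for google-chrome-stable:

```dockerfile
# Install a pinned Chrome explicitly, instead of inheriting one from the
# base image. CHROME_VERSION is a placeholder; choose and track a real
# version deliberately, alongside the Puppeteer version in the app.
ARG CHROME_VERSION="140.0.7339.80-1"
RUN apt-get update \
 && apt-get install -y --no-install-recommends wget ca-certificates \
 && wget -q -O /tmp/chrome.deb \
      "https://dl.google.com/linux/chrome/deb/pool/main/g/google-chrome-stable/google-chrome-stable_${CHROME_VERSION}_amd64.deb" \
 && apt-get install -y --no-install-recommends /tmp/chrome.deb \
 && rm /tmp/chrome.deb
```

Now the browser version is a reviewable line of code, not inherited image state.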
There Is an Annoying Detail Most Teams Eventually Hit
Pinned browser packages age out.
This is one of those problems that looks obvious only after it breaks your build.
You pin a specific Chrome Debian package version. Everything is stable. Weeks or months later, you rebuild. The package is gone from the upstream repo. Suddenly your build fails even though your application code did not change at all.
That is not a reason to stop pinning.
It is a reason to treat pinning as an active process.
In practice, I think you need one of these strategies:
- refresh pins regularly
- mirror the exact packages you depend on
- accept that external package history is not permanent
The important part is not being surprised by it.
Base Images Hide More State Than People Think
Another operational lesson I think is worth calling out: custom base images tend to accumulate hidden package-manager state.
If a base image already ships Chrome, it may also already ship:
- a Google apt source
- a signing key
- a package preference
- duplicated repo entries
Then later, when you try to “take control” of Chrome versioning in your own Dockerfile, apt starts failing with confusing repository errors.
I have seen this kind of problem waste a lot of time because the first guess is usually wrong. People assume the package version is wrong, but the real issue is conflicting repo configuration inherited from the base image.
My takeaway is simple:
If I decide to own browser installation in the application image, I need to own it completely. That means removing conflicting inherited repo state, not layering more state on top of it.
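In Dockerfile terms, that means deleting the inherited apt state before adding your own. The paths below are the usual locations for Google's repo configuration; inspect your actual base image to confirm what it ships, since layouts differ:

```dockerfile
# Remove inherited Google apt state before configuring Chrome installation
# ourselves. Paths are the common locations, not an exhaustive list.
RUN rm -f /etc/apt/sources.list.d/google-chrome.list \
          /etc/apt/sources.list.d/google.list \
          /usr/share/keyrings/google-chrome.gpg \
          /etc/apt/trusted.gpg.d/google-chrome.gpg \
 && apt-get update
```

Layering a second Google repo entry on top of an inherited one is exactly how the confusing apt errors described above get created.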
“It Runs as Non-Root” Is Good, But It Is Not the End of the Discussion
I am strongly in favor of running the browser as a non-root user. That should absolutely be the default.
But I do not think it is enough to declare victory.
If an exploit can still execute arbitrary commands in the container as that non-root user, the worker is still compromised.
And a compromised browser worker can still do a lot:
- read secrets available to the process
- exfiltrate rendered content
- call internal services
- abuse network access
- persist within the lifespan of the worker
- attack adjacent systems
So the question I care about is not only:
“Did I avoid root?”
It is:
“Did I keep attacker-controlled browser execution inside the browser boundary?”
That is the bar.
What I Would Do
If I were setting up or reviewing a Puppeteer service that renders untrusted pages, this is the baseline I would want:
- patched Chrome
- Puppeteer aligned to the Chrome line in use
- no --no-sandbox
- no --disable-setuid-sandbox
- no extra flags that quietly break sandboxed startup
- site isolation left enabled
- explicit browser installation in the application image
- explicit browser version pinning
- non-root runtime user
- no unnecessary Linux capabilities
- allowPrivilegeEscalation: false where compatible
- restrictive seccomp and AppArmor policies
- restricted egress for browser workers
- short-lived workers where practical
- isolated worker pools for browser jobs
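On Kubernetes, much of that baseline maps onto the container's securityContext. A sketch with illustrative values; it assumes the image defines a non-root user at UID 1000 and that your nodes support unprivileged user namespaces (verify both, since allowPrivilegeEscalation: false breaks the setuid sandbox fallback):

```yaml
# Sketch of a browser-worker container securityContext; values are
# illustrative and must be validated against your cluster and image.
securityContext:
  runAsNonRoot: true
  runAsUser: 1000            # whatever non-root user the image defines
  allowPrivilegeEscalation: false
  capabilities:
    drop: ["ALL"]            # in particular: no SYS_ADMIN
  seccompProfile:
    type: RuntimeDefault     # swap for a tuned profile if it blocks the
                             # clone/unshare calls the sandbox needs
```

If Chrome fails to start under this, the fix is to adjust the runtime (seccomp profile, user namespaces), not to start deleting these lines.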
If the environment supports stronger isolation, I would also seriously look at:
- gVisor
- microVM-style isolation
- stronger worker segregation by trust level
None of these replace patching Chrome.
They just give me better odds when something still goes wrong.
What I Would Not Do
And this is the shorter list I would keep in front of me:
- I would not ship --no-sandbox to production.
- I would not keep --disable-setuid-sandbox just because it once fixed startup.
- I would not disable site isolation globally as a default compatibility setting.
- I would not let Chrome float invisibly through a base image.
- I would not assume Docker or Kubernetes “naturally” preserve Chrome’s security model.
- I would not stop at “the page rendered” as proof that the deployment is secure.
That last one matters a lot.
A successful render is not a security signal.
It only proves that the browser launched.
The Main Shift in Thinking
The biggest mindset shift for me is this:
I no longer think of Chrome in Puppeteer as a library dependency that happens to run web pages.
I think of it as a security boundary that happens to render web pages.
That change in perspective affects everything:
- how I deploy it
- how I upgrade it
- how I test it
- which flags I allow
- what I treat as a blocker
Once I started thinking about it that way, a lot of tradeoffs became much clearer.
And honestly, I think that is the real takeaway.
If your service opens untrusted pages, then securing Chrome is not optional infrastructure polish.
It is core product security work.