Why I Treat Chrome in Puppeteer as a Security Boundary, Not a Dependency
If you run Puppeteer against untrusted pages, you are not just automating a browser.
You are exposing a browser to hostile input on your own infrastructure.
And once I started looking at it that way, a lot of “normal” decisions stopped looking normal to me.
Things like:
- launching Chrome with --no-sandbox because it is easier in Docker
- inheriting whatever browser version happens to be inside a base image
- disabling isolation features because a few pages behave badly
- assuming “non-root in a container” is good enough
All of that is much riskier than it looks.
I think many teams still treat headless Chrome as if it were just a rendering engine. It is not. It is one of the most exposed and complicated parts of the system. If your service opens attacker-controlled pages, then Chrome is part of your security perimeter.
That is the mental model I would recommend adopting.
The Real Threat Model
A typical browser automation service does something simple:
- accepts a URL or raw HTML
- opens it in Chrome
- waits for it to load
- takes a screenshot, PDF, or extracts data
From a product point of view, that sounds harmless.
From a security point of view, it means I am willingly asking my backend to load hostile JavaScript, hostile HTML, hostile WebAssembly, hostile media, hostile frames, and everything else modern websites can throw at a browser.
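To make that surface concrete, here is a deliberately minimal sketch of such a worker. The function name is mine, and the browser object is injected so the hostile-input path is easy to see; in a real service it would come from `puppeteer.launch()`:

```javascript
// Minimal sketch of an untrusted-page render worker (names hypothetical).
// Everything after goto() resolves has already run attacker-controlled
// JS/HTML/Wasm inside Chrome on your infrastructure.
async function renderUntrusted(browser, url) {
  const page = await browser.newPage();
  try {
    // The moment this resolves, hostile content has executed in the renderer.
    await page.goto(url, { waitUntil: "networkidle0", timeout: 30000 });
    return await page.screenshot({ type: "png" });
  } finally {
    await page.close();
  }
}
```

Four lines of product logic, and every one of them hands the browser hostile input.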
So the question is not whether browser exploits are relevant.
They are.
The real question is what happens after the browser is compromised.
Does Chrome contain the exploit the way it was designed to?
Or did I quietly remove the exact protections that were supposed to save me?
The Worst Mistake I See: Disabling the Sandbox
The single most dangerous mistake in a Puppeteer deployment is treating these flags as normal:
--no-sandbox
--disable-setuid-sandbox
I want to be very direct here:
I would not ship them in production for a service that renders untrusted pages.
Chrome’s sandbox is not some optional hardening extra. It is one of the main reasons a renderer compromise does not immediately become a system compromise.
The practical difference is straightforward:
- with the sandbox enabled, a browser exploit usually still needs another step to escape into the container or host
- with --no-sandbox, the attacker is much closer to native code execution in your backend environment
That is not a subtle difference. That is the difference between “the browser got compromised” and “my worker got owned”.
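One cheap way to enforce this in code is to fail fast if a sandbox-disabling flag ever sneaks into the launch arguments, whether directly or through shared config. A sketch, not an official Puppeteer API; the function name is mine:

```javascript
// Refuse to launch Chrome if any sandbox-disabling flag is present.
const FORBIDDEN_FLAGS = ["--no-sandbox", "--disable-setuid-sandbox"];

function assertSandboxFlags(args) {
  const bad = args.filter((a) =>
    FORBIDDEN_FLAGS.some((f) => a === f || a.startsWith(f + "="))
  );
  if (bad.length > 0) {
    throw new Error(`refusing to launch Chrome with: ${bad.join(", ")}`);
  }
  return args;
}

// Hypothetical usage: puppeteer.launch({ args: assertSandboxFlags(args) })
```

The point is that the guard runs in production, not just in a review checklist.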
And yes, I know exactly why people use these flags.
Chrome fails to start in containers. Someone finds a workaround. The workaround works. The team moves on.
But that convenience is expensive. In this case, it can erase the main boundary Chrome gives you.
My Rule: If Chrome Only Works Without the Sandbox, the Runtime Is Wrong
This is the principle I came away with:
If Chrome cannot start sandboxed in the environment, I should fix the environment. I should not disable the sandbox and call it done.
That usually means the problem is somewhere in the runtime:
- user namespaces
- seccomp
- container capabilities
- AppArmor
- Kubernetes settings
- Docker host configuration
Not in Chrome.
And the fix is usually operational, not application-level.
“Sandbox Enabled” Is Necessary, But It Is Not the Same as “Safe”
There is another mistake that shows up right after teams re-enable the sandbox:
they assume the problem is solved.
I do not think that is the right conclusion either.
If the Chrome build itself is vulnerable, then the browser may still be exploitable with the sandbox on. The sandbox changes the blast radius. It does not magically patch the browser bug.
So I think the correct requirement is:
- keep Chrome sandboxed
- keep Chrome patched
Both matter.
If I am running a vulnerable browser build and relying on the sandbox alone, I am still accepting unnecessary risk. The exploit path may be harder. The damage may be smaller. But the bug is still there.
Site Isolation Is Not the Main Boundary, But I Still Would Not Disable It
Another thing I see people disable too casually is site isolation:
IsolateOrigins
site-per-process
To be precise, these are not the same thing as --no-sandbox. Disabling them does not by itself mean “remote page gets shell”.
But I still would not keep them disabled by default in a production browser worker that opens attacker-controlled content.
Why?
Because site isolation is part of Chrome’s blast-radius reduction story.
When it is enabled, different sites are more strongly separated. When it is disabled, a compromised renderer can end up with a broader in-browser reach, more useful memory layout, and more damage potential across origins.
So my view is simple:
- --no-sandbox is the catastrophic mistake
- disabling site isolation is the unnecessary mistake
One is worse. Both are bad.
The Docker and Kubernetes Layer Is Where Good Intentions Often Go to Die
This is probably the most practical lesson I took away from working through Chrome hardening.
A lot of teams do want to run Chrome securely. They remove the unsafe flags. They update the image. They deploy. And then Chrome does not start.
At that point, the security decision gets replaced by an operational one.
And if the operational answer is “just add --no-sandbox”, all the good intentions disappear.
What I have learned is that containerized Chrome lives or dies by the runtime details:
- whether the host allows the namespace operations Chrome needs
- whether seccomp blocks them
- whether AppArmor gets in the way
- whether Kubernetes actually supports the features you think it supports
- whether your service launches Chrome with flags that silently conflict with sandboxed startup
That last point is easy to miss.
For example, a minimal smoke test may show that Chrome starts fine, but the real application still fails because it passes a flag like --no-zygote, which works under one process model and conflicts with sandboxed startup under another.
That is why I do not trust toy checks alone anymore. I want to test the exact launch path the service uses in production.
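One way to avoid the divergence is to build the launch options in exactly one place and point both the smoke test and the service at it. A sketch, with a module layout that is mine (in a real codebase this would live in something like a shared launch-options module):

```javascript
// Single source of truth for Chrome launch options.
// Both the production service and the smoke test must call this,
// so the smoke test exercises the exact launch path production uses.
function buildLaunchOptions() {
  return {
    headless: true,
    args: [
      "--disable-gpu",
      "--hide-scrollbars",
      // Deliberately absent: --no-sandbox, --disable-setuid-sandbox,
      // --no-zygote. If the smoke test launches with different args,
      // it is not testing the production path.
    ],
  };
}
```

A smoke test then does the equivalent of `puppeteer.launch(buildLaunchOptions())` rather than `puppeteer.launch()` with its own ad-hoc flags.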
I Try Very Hard to Avoid SYS_ADMIN
One of the most tempting shortcuts in containerized Chrome setups is giving the container SYS_ADMIN.
It often makes things work.
I still do not like it.
SYS_ADMIN is broad. Too broad. If I can avoid it, I will.
What I prefer instead is:
- keep Chrome sandboxed
- avoid SYS_ADMIN
- make the runtime support the sandbox properly
In practice, that can mean:
- user namespaces where available
- a seccomp profile that allows the namespace syscalls Chrome actually needs
- validating the host and cluster behavior instead of assuming it
This is slower than the shortcut, but it is the right tradeoff for a service that renders hostile pages all day.
Browser Versioning Has to Be Explicit
I have become increasingly skeptical of browser versioning by accident.
What I mean by that:
- Chrome comes from a base image
- the base image uses latest
- Puppeteer is pinned somewhere else
- nobody is fully sure which browser is actually in production
That setup is fragile operationally and dangerous from a security perspective.
What I want instead is very boring:
- an explicit browser version in the image
- an explicit Puppeteer version in the app
- a deliberate decision that those versions belong together
If Chrome is security-critical, then I do not want it hiding in the background as inherited image state.
I want to own it in the Dockerfile.
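Owning it in the Dockerfile can look like the sketch below. The version value is a placeholder, not a recommendation, and the snippet assumes a Debian-based image; the URL follows the layout of Google's deb pool for google-chrome-stable:

```dockerfile
# Install a pinned Chrome explicitly, instead of inheriting one from the
# base image. CHROME_VERSION is a placeholder; choose and track a real
# version deliberately, alongside the Puppeteer version in the app.
ARG CHROME_VERSION="140.0.7339.80-1"
RUN apt-get update \
 && apt-get install -y --no-install-recommends wget ca-certificates \
 && wget -q -O /tmp/chrome.deb \
      "https://dl.google.com/linux/chrome/deb/pool/main/g/google-chrome-stable/google-chrome-stable_${CHROME_VERSION}_amd64.deb" \
 && apt-get install -y --no-install-recommends /tmp/chrome.deb \
 && rm /tmp/chrome.deb
```

Now the browser version is a reviewable line of code, not inherited image state.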
There Is an Annoying Detail Most Teams Eventually Hit
Pinned browser packages age out.
This is one of those problems that looks obvious only after it breaks your build.
You pin a specific Chrome Debian package version. Everything is stable. Weeks or months later, you rebuild. The package is gone from the upstream repo. Suddenly your build fails even though your application code did not change at all.
That is not a reason to stop pinning.
It is a reason to treat pinning as an active process.
In practice, I think you need one of these strategies:
- refresh pins regularly
- mirror the exact packages you depend on
- accept that external package history is not permanent
The important part is not being surprised by it.
Base Images Hide More State Than People Think
Another operational lesson I think is worth calling out: custom base images tend to accumulate hidden package-manager state.
If a base image already ships Chrome, it may also already ship:
- a Google apt source
- a signing key
- a package preference
- duplicated repo entries
Then later, when you try to “take control” of Chrome versioning in your own Dockerfile, apt starts failing with confusing repository errors.
I have seen this kind of problem waste a lot of time because the first guess is usually wrong. People assume the package version is wrong, but the real issue is conflicting repo configuration inherited from the base image.
My takeaway is simple:
If I decide to own browser installation in the application image, I need to own it completely. That means removing conflicting inherited repo state, not layering more state on top of it.
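In Dockerfile terms, that means deleting the inherited apt state before adding your own. The paths below are the usual locations for Google's repo configuration; inspect your actual base image to confirm what it ships, since layouts differ:

```dockerfile
# Remove inherited Google apt state before configuring Chrome installation
# ourselves. Paths are the common locations, not an exhaustive list.
RUN rm -f /etc/apt/sources.list.d/google-chrome.list \
          /etc/apt/sources.list.d/google.list \
          /usr/share/keyrings/google-chrome.gpg \
          /etc/apt/trusted.gpg.d/google-chrome.gpg \
 && apt-get update
```

Layering a second Google repo entry on top of an inherited one is exactly how the confusing apt errors described above get created.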
“It Runs as Non-Root” Is Good, But It Is Not the End of the Discussion
I am strongly in favor of running the browser as a non-root user. That should absolutely be the default.
But I do not think it is enough to declare victory.
If an exploit can still execute arbitrary commands in the container as that non-root user, the worker is still compromised.
And a compromised browser worker can still do a lot:
- read secrets available to the process
- exfiltrate rendered content
- call internal services
- abuse network access
- persist within the lifespan of the worker
- attack adjacent systems
So the question I care about is not only:
“Did I avoid root?”
It is:
“Did I keep attacker-controlled browser execution inside the browser boundary?”
That is the bar.
What I Would Do
If I were setting up or reviewing a Puppeteer service that renders untrusted pages, this is the baseline I would want:
- patched Chrome
- Puppeteer aligned to the Chrome line in use
- no --no-sandbox
- no --disable-setuid-sandbox
- no extra flags that quietly break sandboxed startup
- site isolation left enabled
- explicit browser installation in the application image
- explicit browser version pinning
- non-root runtime user
- no unnecessary Linux capabilities
- allowPrivilegeEscalation: false where compatible
- restrictive seccomp and AppArmor policies
- restricted egress for browser workers
- short-lived workers where practical
- isolated worker pools for browser jobs
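On Kubernetes, much of that baseline maps onto the container's securityContext. A sketch with illustrative values; it assumes the image defines a non-root user at UID 1000 and that your nodes support unprivileged user namespaces (verify both, since allowPrivilegeEscalation: false breaks the setuid sandbox fallback):

```yaml
# Sketch of a browser-worker container securityContext; values are
# illustrative and must be validated against your cluster and image.
securityContext:
  runAsNonRoot: true
  runAsUser: 1000            # whatever non-root user the image defines
  allowPrivilegeEscalation: false
  capabilities:
    drop: ["ALL"]            # in particular: no SYS_ADMIN
  seccompProfile:
    type: RuntimeDefault     # swap for a tuned profile if it blocks the
                             # clone/unshare calls the sandbox needs
```

If Chrome fails to start under this, the fix is to adjust the runtime (seccomp profile, user namespaces), not to start deleting these lines.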
If the environment supports stronger isolation, I would also seriously look at:
- gVisor
- microVM-style isolation
- stronger worker segregation by trust level
None of these replace patching Chrome.
They just give me better odds when something still goes wrong.
What I Would Not Do
And this is the shorter list I would keep in front of me:
- I would not ship --no-sandbox to production.
- I would not keep --disable-setuid-sandbox just because it once fixed startup.
- I would not disable site isolation globally as a default compatibility setting.
- I would not let Chrome float invisibly through a base image.
- I would not assume Docker or Kubernetes “naturally” preserve Chrome’s security model.
- I would not stop at “the page rendered” as proof that the deployment is secure.
That last one matters a lot.
A successful render is not a security signal.
It only proves that the browser launched.
The Main Shift in Thinking
The biggest mindset shift for me is this:
I no longer think of Chrome in Puppeteer as a library dependency that happens to run web pages.
I think of it as a security boundary that happens to render web pages.
That change in perspective affects everything:
- how I deploy it
- how I upgrade it
- how I test it
- which flags I allow
- what I treat as a blocker
Once I started thinking about it that way, a lot of tradeoffs became much clearer.
And honestly, I think that is the real takeaway.
If your service opens untrusted pages, then securing Chrome is not optional infrastructure polish.
It is core product security work.