Snyk research, summarized by Beige Media, scanned 3,984 skills across ClawHub and skills.sh and found that 13.4 percent (534 skills) carried at least one critical issue, including 76 confirmed malicious payloads.
The findings triggered detailed reader feedback. In a post on X, one commenter wrote, “the provenance problem is worse for agents than packages… cryptographic identity needs to extend beyond just signing the skill code.” The remark captures a widening gap between controls that work for classic packages and controls needed for agent skills that run inside large-language-model (LLM) runtimes.
Provenance means documented evidence of how and where a software artifact was built. In static package ecosystems, a signed hash or a Software Bill of Materials (SBOM), a machine-readable list of every component, often answers the question, “what am I installing and who supplied it?”
Agent skills complicate that baseline because they can fetch new instructions at runtime, change behavior between executions, and inherit whatever permissions the host runtime exposes.
Summary
- Security audits found 341 malicious ClawHub skills, and Snyk research reported critical issues in 534 of 3,984 skills across ClawHub and skills.sh.
- Reader feedback argues that dynamic agent runtimes widen the provenance surface beyond static code.
- Package ecosystems already execute code during installation, but norms such as hashing and SBOMs mitigate some risk.
- NIST's SSDF includes tasks such as PS.3.2 that call for collecting and sharing provenance data for each release component.
- Keyless Sigstore signing plus proof-of-possession challenges can strengthen identity binding for agent skill publishers.
Why traditional packages are not fully static
Package maintainers sometimes describe tarballs and wheels as audit-ready snapshots, yet most ecosystems already execute code during installation. The npm client, for example, runs lifecycle scripts such as preinstall and postinstall.
The official npm documentation states that these scripts can run arbitrary commands, so the installation step is already dynamic.
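For instance, a hypothetical package could declare lifecycle scripts like these, and the npm client would execute them automatically during installation (the package and script names here are illustrative, not a real package):

```json
{
  "name": "example-widget",
  "version": "1.0.0",
  "scripts": {
    "preinstall": "node check-platform.js",
    "postinstall": "node download-assets.js"
  }
}
```

Either script can invoke any shell command the installing user is permitted to run, which is exactly why "static" package installs are already an execution event.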
Over two decades, enterprises have responded with norms for dependency pinning, reproducible builds, and automated vulnerability scanning. An SBOM supplements these norms by listing compiled or vendored components so incident responders can trace affected files during an emergency patch cycle.
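As a concrete illustration, a minimal SBOM in the CycloneDX JSON format might record a single vendored component and its hash (the component name and truncated hash below are illustrative):

```json
{
  "bomFormat": "CycloneDX",
  "specVersion": "1.5",
  "components": [
    {
      "type": "library",
      "name": "left-util",
      "version": "2.3.1",
      "hashes": [
        { "alg": "SHA-256", "content": "9f86d081884c7d65..." }
      ]
    }
  ]
}
```

A responder searching for an affected component can then query these records instead of unpacking every artifact by hand.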
Those controls lower, but do not eliminate, the risk that a package does something unexpected during installation. They also highlight a baseline: cryptographic signatures and provenance attestations are useful only when the artifact installed today is the same artifact that runs tomorrow.
How agent skills diverge
In many AI frameworks, an agent skill combines natural-language instructions with helper code or scripts. When the skill loads, the agent’s LLM interprets those instructions in the context of operating-system or cloud-service permissions that may be broader than a typical package install.
The ClawHub audit described skills that downloaded secondary payloads from remote servers and exfiltrated credentials and other sensitive data.
Because the runtime can change inputs on every invocation, the artifact placed in a registry is only the starting point. A malicious publisher can embed a URL that serves benign content during initial review and is later switched to serve malware.
In effect, the provenance surface widens from “where did this blob come from” to “what will this blob cause the runtime to fetch, under which permissions, and when?”
The National Institute of Standards and Technology’s Secure Software Development Framework (SSDF) formalizes this concern for software releases. SP 800-218, published in 2022, defines practices such as PS.3.2, which calls on producers to collect, safeguard, maintain, and share provenance data for all components of each software release, for example in an SBOM, so downstream users can verify what they are running.
Elements of runtime-aware provenance
To apply SSDF thinking to agent skills, registries need provenance that covers four elements instead of only static code.
First, they need a hash of every byte in the published skill and its helper files. Second, they need a reproducible record of the build path so third parties can rebuild the artifact and obtain the same hash.
Third, they need an explicit list of permissions, such as file-system access, network destinations, and process spawning, that the skill expects the runtime to grant. Fourth, they need a declaration of any remote resources the skill intends to fetch at runtime, including constraints on allowed domains or checksums.
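A skill manifest covering these four elements might look like the following sketch; the field names are illustrative, not a published standard:

```json
{
  "artifact": {
    "sha256": "e3b0c44298fc1c149afbf4c8996fb924..."
  },
  "build": {
    "builder": "ci.example.com/runner-7",
    "source_commit": "4f2a9c1",
    "reproducible": true
  },
  "permissions": {
    "filesystem": ["read:./workspace"],
    "network": ["api.example.com"],
    "spawn_processes": false
  },
  "runtime_fetch": [
    { "url": "https://api.example.com/prompts/v1", "sha256_pin": "abc123..." }
  ]
}
```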
None of these items blocks an attacker alone, but together they give policy engines information to decide whether a skill may execute in a given environment. A permission list also creates an audit trail when the runtime observes behavior outside the declared scope, such as new outbound destinations or unexpected file access.
SBOMs cover the first element by recording components and versions. Proof-of-build tools can capture the second element by signing intermediate build steps and recording which systems performed them.
Runtime declarations can specify expected permissions and network endpoints so registries and enterprises can align policy with what a skill claims it will do.
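A registry-side policy engine could then compare observed sandbox behavior against such a declaration. The sketch below assumes a simplified manifest dict and event list; neither shape is a real registry schema:

```python
# Sketch of a policy check: flag runtime behavior outside the declared scope.
# The manifest and event shapes are hypothetical, not a real registry schema.

def check_behavior(manifest, observed_events):
    """Return a list of violations: events not covered by the manifest."""
    allowed_domains = set(manifest.get("network", []))
    allowed_paths = set(manifest.get("filesystem", []))
    violations = []
    for event in observed_events:
        if event["type"] == "net" and event["domain"] not in allowed_domains:
            violations.append(f"undeclared network destination: {event['domain']}")
        elif event["type"] == "file" and event["path"] not in allowed_paths:
            violations.append(f"undeclared file access: {event['path']}")
    return violations

manifest = {"network": ["api.example.com"], "filesystem": ["./workspace"]}
events = [
    {"type": "net", "domain": "api.example.com"},   # declared: passes
    {"type": "net", "domain": "evil.example.net"},  # not declared: flagged
]
print(check_behavior(manifest, events))
# → ['undeclared network destination: evil.example.net']
```

A production engine would match file paths by prefix and domains by suffix rather than by exact string equality, but the gating logic is the same shape.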
Identity binding problems
A digital signature proves that whoever holds a private key approved a particular digest. It does not prove that the key belongs to the person or organization named in a registry profile, or that the account currently claiming the key is the same party that used it before.
The ClawHub incidents illustrate why this distinction matters. Malicious skills were published in the same registry as legitimate ones, which shows that without verified binding between publisher accounts and signing keys, users can see a signature and still not know which actor stands behind it.
Proof-of-possession challenges raise that bar at registration time. When a publisher first registers, the registry can issue a nonce and require the applicant to sign it with the key they intend to use. The signed nonce then binds the public key to the account, and an attacker who pastes in someone else’s public key cannot complete the challenge.
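The challenge-response flow can be sketched with textbook RSA on toy numbers. A real registry would use a modern scheme such as Ed25519; nothing below is production-grade cryptography:

```python
import hashlib
import secrets

# Toy textbook-RSA key pair (classic demo numbers; NOT secure).
n, e, d = 3233, 17, 2753  # n = 61 * 53, d = inverse of e mod 3120

def sign(message: bytes) -> int:
    h = int.from_bytes(hashlib.sha256(message).digest(), "big") % n
    return pow(h, d, n)          # publisher signs with the private exponent

def verify(message: bytes, sig: int) -> bool:
    h = int.from_bytes(hashlib.sha256(message).digest(), "big") % n
    return pow(sig, e, n) == h   # registry verifies with the public exponent

# Registration-time proof of possession:
nonce = secrets.token_bytes(16)   # registry issues a fresh nonce
signature = sign(nonce)           # applicant signs it with their key
assert verify(nonce, signature)   # registry binds the key to the account

# An attacker who merely pasted in someone else's public key (n, e)
# holds no d, so they cannot produce a valid signature over the nonce.
```

The point of the exchange is the binding: only the holder of the private half can answer the fresh nonce, so the registry learns that the account actually controls the key it claims.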
Keyless signing pushes the concept further at publish time. The Sigstore project describes how keyless signing associates identities, rather than long-lived keys, with a signature.
Sigstore explains that its Fulcio service issues short-lived certificates that bind an ephemeral key to an OpenID Connect identity, while the Rekor transparency log records each signing event with a timestamped entry that includes the artifact hash, public key, and signature.
Because the key is short-lived and the signing event is public, verifiers can check not only that an artifact is signed, but also that the signing identity and timing line up with expectations recorded in the transparency log. Long-term theft of a key has limited value in this model, since new signatures would appear as unexpected events tied to the victim's identity in Rekor.
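The verification step on the consumer side can be as simple as recomputing the artifact hash locally and comparing it to the logged value. The entry shape below is a simplified stand-in, not the actual Rekor schema:

```python
import hashlib

def matches_log_entry(artifact_bytes: bytes, log_entry: dict) -> bool:
    """Check that the local artifact hashes to the value recorded in the log."""
    local = hashlib.sha256(artifact_bytes).hexdigest()
    return local == log_entry["artifact_sha256"]

artifact = b"skill bundle bytes"
entry = {
    "artifact_sha256": hashlib.sha256(artifact).hexdigest(),
    "identity": "publisher@example.com",   # OIDC identity (illustrative)
    "timestamp": "2026-01-15T12:00:00Z",
}
print(matches_log_entry(artifact, entry))  # True
```

Real Rekor entries carry more structure (public key, signature, inclusion proof), but hash comparison against an independently witnessed log is the core check.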
Proof-of-possession at registration and keyless signing at publish time are complementary. The former helps prevent key squatting and impersonation when accounts are created, while the latter ties each release to a time-stamped identity record that anyone can audit.
Applying SSDF tasks to registries
SSDF practices such as PS.3.1 and PS.3.2 focus on preserving releases and maintaining provenance data for all components. For an agent registry, that translates into storing SBOMs, build attestations, and permission manifests alongside each published skill so that provenance is available whenever a skill is installed or reviewed.
Registry operators can expose this data through a public or access-controlled API so downstream security tools can retrieve and validate it automatically. That allows scanners and policy engines to evaluate a skill’s components, requested permissions, and declared remote endpoints before the skill runs in production.
These gates introduce some extra steps for publishers, but the ClawHub data shows the alternative. Without structured provenance and policy checks, a public registry can quickly become a distribution channel for commodity stealers that rely on dynamic payloads and broad agent permissions.
Lessons from the Snyk scan
The Snyk research summarized by Beige Media found that a significant share of skills across ClawHub and skills.sh contained security flaws, including dozens with confirmed malicious payloads. The same work documented skills that fetched remote code, accessed local files, and forwarded data in ways that would be difficult to detect from static source alone.
These findings reinforce the case for runtime-aware provenance. If a manifest required publishers to list allowed domains, credential scopes, and local paths, automated scanners could flag mismatches between observed behavior and declared intent before skills reached sensitive environments.
Operational roadmap for registries
Registry operators can adopt a three-stage model that reflects both traditional package practices and SSDF-style provenance expectations. Stage 1 is author verification: use a proof-of-possession challenge at account creation and offer or require keyless signing so each publish event is tied to a verifiable identity.
Stage 2 is artifact verification: require a signed SBOM, build attestation, and permission manifest for each submitted skill. These artifacts should identify component hashes, build systems, requested permissions, and any remote endpoints the skill intends to contact at runtime.
Stage 3 is behavioral verification: execute the skill in a constrained sandbox, capture system calls and network activity, compare them to the manifest, and gate promotion until runtime behavior matches declared scope. Skills that attempt to resolve undeclared domains or access unexpected files can be blocked or sent back to publishers for review.
In practice, the sandbox might run in an ephemeral container with outbound traffic limited to declared domains and network ranges. The registry can store the test log as additional provenance evidence, which aligns with SSDF guidance to preserve releases and supporting data that document how software behaves and what it depends on.
Runtime monitoring remains necessary even after promotion because a benign skill can later fetch hostile content from a previously safe endpoint. Registries can subscribe to transparency-log notifications when an author publishes a new version or revokes a key, and continuous monitoring tools can pull the latest SBOMs and manifests, rerun sandbox tests, and flag unexpected behavior automatically.
When a new exploit pattern emerges, operators can back-scan their inventory by querying SBOMs for relevant components and rerunning behavioral tests in the sandbox with updated detection rules. Because each skill carries structured provenance and identity data, these rescans rely less on manual triage and more on repeatable queries.
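Such a back-scan reduces to a query over stored SBOMs. The sketch below stores SBOMs as simple dicts keyed by skill name, a stand-in for CycloneDX-style records:

```python
# Sketch: find every skill whose stored SBOM lists an affected component,
# so behavioral tests can be rerun for just those skills.
# The SBOM shape here is a simplified stand-in for CycloneDX-style records.

def skills_affected_by(sboms, component, bad_versions):
    """Return skill names whose SBOM contains component at a bad version."""
    hits = []
    for skill_name, sbom in sboms.items():
        for comp in sbom["components"]:
            if comp["name"] == component and comp["version"] in bad_versions:
                hits.append(skill_name)
                break
    return hits

sboms = {
    "pdf-summarizer": {"components": [{"name": "left-util", "version": "2.3.1"}]},
    "repo-triage":    {"components": [{"name": "left-util", "version": "3.0.0"}]},
}
print(skills_affected_by(sboms, "left-util", {"2.3.1"}))  # → ['pdf-summarizer']
```

At registry scale the same query would run against a database index rather than an in-memory dict, but the repeatability is what replaces manual triage.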
Balancing developer friction and ecosystem safety
Publishers sometimes argue that reproducible builds, SBOMs, or permission manifests slow release velocity. Experience in package ecosystems suggests that, once integrated into familiar tooling, such steps often become routine.
Examples include widespread adoption of two-factor authentication in npm accounts and checksum verification in Go’s module proxy.
Agent skill authors already run build commands to generate metadata or package artifacts. Registry plugins can extend those commands to compute hashes, generate SBOMs, and sign attestations without adding new manual steps.
Permission manifests can reuse simple JSON-style structures that many developers already understand from configuration files and web APIs.
As registries standardize these expectations, the marginal cost per publisher falls while the aggregate security benefit grows. Over time, skills that lack provenance data or clear permission declarations are likely to see reduced adoption in organizations that rely on automated policy checks and SSDF-aligned procurement requirements.
Looking ahead
The ClawHub incident shows that attacker interest follows user adoption. Once a registry offers broad access to user systems through agent skills, it becomes attractive to groups distributing commodity stealers and other malware that take advantage of dynamic fetch and execution patterns.
Provenance controls cannot remove that risk, but they can turn investigations from guesswork into traceable events. When every skill version ships with hashes, build attestations, permission manifests, and logged identity data, responders can reconstruct how a malicious payload entered the system and which users ran it.
The reader who called for stronger provenance highlighted a durable point: dynamic software demands evidence that reflects dynamic behavior. Registries that pair SSDF-aligned provenance with Sigstore-style identity binding and sandboxed behavioral checks can move agent ecosystems closer to the level of assurance that mature package managers reached only after years of security incidents.
Sources
- Beige Media. "Curated Repositories Reduce Malware in AI Agent Skills." Beige Media, 2026.
- Ravie Lakshmanan. "Researchers Find 341 Malicious ClawHub Skills Stealing Data from OpenClaw Users." The Hacker News, 2026.
- nole (@nolemolt). "Post on X." X, 2026.
- Murugiah Souppaya, Karen Scarfone, Donna Dodson. "Secure Software Development Framework (SSDF) Version 1.1, SP 800-218." National Institute of Standards and Technology, 2022.
- Sigstore Project. "Cosign Signing Overview." Sigstore, 2024.
- OpenJS Foundation. "scripts." npm Documentation, 2022.
