DecentWeb HTTP Bridge — Gap Analysis and Design
Version: 0.4
Status: Draft
Purpose: Review the current decentweb implementation as a foundation for an
HTTP bridge that any browser can use to read and publish content on the DecentWeb
network. Identify gaps, misalignments, and concrete changes needed to produce a
usable browser-accessible node.
Companion document: decentweb-design.md (protocol goals and layer definitions).
Key external references used in this review:
- BEP-44: Storing arbitrary data in the DHT
- BEP-46: Updating Torrents Via DHT Mutable Items
- libtorrent: DHT store extension notes
- BEP-9: Magnet URI format and metadata exchange
- RFC 8141: Uniform Resource Names (URNs)
- IANA URN Namespaces registry
1. What the Current Implementation Provides
The implementation is a DHT node daemon (dwd) and a CLI tool (dw), written in C
with Bazel. It covers the protocol stack reasonably well at the low level:
| What works | Where |
|---|---|
| Ed25519 keypair generation and storage | dw keygen, src/identity |
| btpk magnet address generation and parsing (magnet:?xs=urn:btpk:<64-hex-public-key>) | dw_identity_magnet, dw_identity_parse_magnet |
| BEP-44 mutable item publish and resolve | src/feed, dwd |
| DHT node with routing-table persistence | dwd, libdht |
| Content bundle packing, hashing (SHA1), parsing | src/content/bundle.c |
| Feed manifest packing and parsing | src/content/manifest.c |
| Hash-verified fetch from a content cache | src/content/transport_http.c |
| Full write path: dw publish | apps/dw |
| Full read path: dw get | apps/dw, dwd get mode |
| Unix control socket IPC (RESOLVE, PUBLISH) | dwd |
| Feed liveness: periodic BEP-44 re-announce | dwd re-publish loop |
What the implementation does not provide is any HTTP interface that a browser
can talk to. srv serves raw files out of a docroot by hash — it is the local
content cache half of the bridge — but there is no request handler that takes a
local path such as /0123... (bundle hash) or /abcd... (author public key),
resolves it as needed over the DHT, downloads the manifest and bundle, and returns
sanitized HTML to a browser. There is no feed reader view, no subscription
management, and no HTML sanitization pipeline. The gap between “protocol node plus
local cache” and “HTTP bridge” is that connecting request-handling layer.
2. The Intended Request Flow
A browser request through the bridge should look like this:
[Diagram: browser request flow through the bridge; not recoverable from the source]
The bridge is a local caching proxy between the browser and the DecentWeb network. It fetches and verifies content on behalf of the browser, stores it locally in its docroot, and serves sanitized HTML. The browser only ever talks to the bridge over loopback HTTP; it never touches the DHT or the content network directly.
The bridge should assume local deployment and expose plain HTTP only, not HTTPS.
The browser-visible interface is therefore http://localhost/<id> (or the same
shape on another loopback port if the operator does not bind port 80). The bridge
does not need a magnet: URL as its primary browser entrypoint because path length
already disambiguates the two identifiers it needs to handle:
- /<40 hex chars> means “serve this immutable manifest or bundle hash from cache, fetching it if the transport can retrieve it”
- /<64 hex chars> means “treat this as an author Ed25519 public key, resolve the current mutable feed head via BEP-44, then render the latest bundle”
The canonical shareable author identifier remains the btpk magnet, but the bridge’s
HTTP surface should translate that into a simpler local path form.
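The length-based dispatch described above can be sketched as follows. This is a minimal illustration, not code from the repository: the enum and function names are hypothetical, and it assumes the path carries nothing but the identifier (no query string, no trailing slash).

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical names; none of these come from the dw/dwd sources. */
typedef enum {
    DW_PATH_INVALID = 0,
    DW_PATH_CONTENT_HASH, /* 40 hex chars: SHA1 of a manifest or bundle */
    DW_PATH_AUTHOR_KEY    /* 64 hex chars: Ed25519 public key */
} dw_path_kind;

/* Classify a request path such as "/0123...". Anything that is not exactly
 * "/" followed by 40 or 64 hex digits is rejected. */
dw_path_kind dw_classify_path(const char *path)
{
    if (path == NULL || path[0] != '/')
        return DW_PATH_INVALID;

    const char *id = path + 1;
    size_t len = strlen(id);
    if (len != 40 && len != 64)
        return DW_PATH_INVALID;

    for (size_t i = 0; i < len; i++) {
        if (!isxdigit((unsigned char)id[i]))
            return DW_PATH_INVALID;
    }
    return len == 40 ? DW_PATH_CONTENT_HASH : DW_PATH_AUTHOR_KEY;
}
```

Rejecting everything else up front keeps the handler from ever passing an unvalidated string to the docroot lookup or to dwd.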
Content that has already been fetched is served directly from the local docroot without hitting the network again. The docroot is the bridge’s durable content store: it survives restarts and grows as new content is fetched.
Publishers have no HTTP server requirement in the target model. A publisher runs
dwd to participate in the DHT and push BEP-44 updates. Readers’ bridges discover
and fetch the content. In the current implementation, however, the first bridge that
wants the content still needs some reachable HTTP mirror that already has the bytes,
because content transfer is not yet happening over the BitTorrent wire protocol.
3. Design Goal Alignment
The following table assesses each non-negotiable design goal from decentweb-design.md
against the current state and flags where the bridge must take care.
3.1 Core Properties
| Goal | Current state | Bridge concern |
|---|---|---|
| Zero outbound requests from content | Not enforced. Bundles are written to disk as-is; raw HTML may contain external src/href. | The bridge must sanitize all HTML before serving it to a browser. Without this the goal fails the moment a browser renders a page. |
| No hosting cost for publishers | Met by the model: publishers only need DHT access; the reader’s bridge caches content locally. | The current fetch transport requires an existing HTTP cache to retrieve content from. Until direct DHT/BitTorrent piece transfer is implemented, a bootstrap mirror is still needed. This is an interim implementation gap, not a design flaw. |
| No central index | Met at the protocol layer (DHT). | The bridge’s local SQLite index is per-instance and not shared — this is correct. |
| Content integrity without CAs | Met: each fetch is hash-verified; the feed is Ed25519-signed. | The bridge should surface this to the browser (e.g. a visible “signature verified” indicator on each post). |
| Natural content expiry | Met by the model: DHT liveness and swarm seeder attrition handle expiry. Locally cached content in docroot persists until the operator clears it. | The bridge cache is the operator’s own machine; operator controls retention. This is the right place for the decision. |
| Format simplicity | Bundle packing accepts any file names; no format enforcement at pack or serve time. | The bridge must refuse to serve or must transcode any file outside the allowed set (HTML, CSS, WebP, WebM, WOFF2). |
| No required old-web dependency | Partially violated in the current implementation: dw get requires an HTTP mirror URL to be passed on the CLI. The btpk magnet or author key alone is sufficient to resolve the feed head, but not to retrieve the manifest and bundle bytes. | Direct DHT/BitTorrent download is the goal. The mirror bootstrap is a known interim step, not a permanent dependency. |
3.2 Secondary Goals
| Goal | Current state | Bridge concern |
|---|---|---|
| Author identity portability | Keypair is a local file; no server-side custody. | A bridge with a publishing flow needs encrypted server-side key storage and a key-export mechanism. |
| Reader anonymity by default | No accounts needed to read. | The bridge server itself observes what content the user fetches. This is inherent to the bridge model (the bridge is the swarm peer, not the browser). It must not log reading behaviour unnecessarily, and this limitation should be surfaced in the UI. |
| Graceful degradation | A feed that goes offline simply fails to resolve. No broken-link equivalent yet. | The bridge should show the last-known feed state with a “last fetched at X” indicator rather than a bare error. |
4. Gaps in the Current Implementation
The following are missing from the current implementation entirely and are required for a usable browser-facing bridge.
4.1 No request handler connecting the browser to the DHT
srv serves files from docroot by hash. dwd resolves and publishes BEP-44
items. Nothing connects them in response to a browser request. The bridge needs a
request handler that:
- Accepts either a 64-hex author public key or a 40-hex content hash in the URL path.
- Checks the local docroot cache.
- On a cache miss for an author key, asks dwd to resolve the feed, then fetches the manifest and bundle, verifies hashes, and stores the results in docroot.
- Runs the content sanitization pass.
- Returns rendered HTML to the browser.
This is the bridge’s core missing component. Everything else builds on top of it.
4.2 Single post per feed
The current manifest schema:
[Listing: current manifest schema; not recoverable from the source]
A feed has exactly one content bundle. There is no post history, no title, no
timestamp, no follows list. A feed reader model — a list of posts per feed, unread
counts, a post view for each — cannot be implemented on top of this schema.
This is the most fundamental schema gap. Nothing above the protocol layer works as a feed reader until a feed can hold multiple posts.
4.3 No manifest metadata
The manifest has no author-provided metadata: no feed title, no post titles, no publication timestamps, no description. The bridge would have no content to display in a feed list or post list beyond raw hashes.
4.4 No subscription persistence or background refresh
There is no subscription list and no background polling loop. dw get is a one-shot
command. A browser user has no way to see new content appear without manually running
CLI commands. The bridge needs a background thread that periodically resolves each
followed feed’s BEP-44 item, fetches new bundles, and updates the local index.
4.5 No local index or search
Layer 4 (discovery) is a stub. There is no SQLite database, no FTS5 index, and no subscription store. Search across fetched content and the feed/post list views both require a persistent local store.
4.6 No content sanitization pipeline
Before serving any bundle HTML to a browser, the bridge must:
- Strip all <script> tags and event handler attributes (on*)
- Strip or rewrite all src and href attributes pointing to remote URLs
- Validate CSS against the permitted property set
- Reject or strip file types outside the permitted set (HTML, CSS, WebP, WebM, WOFF2)
None of this exists. dw_bundle_extract writes files to disk unchanged. Serving
raw bundle HTML to a browser would allow any external src attribute in the content
to make outbound requests, directly violating the core design guarantee.
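One building block of such a pipeline is the test that decides whether a src or href value points outside the bundle. The sketch below is an assumption about how that test could look, with a hypothetical function name. It is deliberately conservative: any value carrying a URI scheme (as defined in RFC 3986) or a network-path prefix is treated as remote, which also catches data: and javascript: values.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* Returns true when a src/href value is anything other than a
 * bundle-relative reference. Hypothetical helper, not part of the
 * current sources. */
bool dw_attr_is_remote(const char *value)
{
    if (value == NULL)
        return false;

    /* Browsers ignore leading whitespace in URL attributes. */
    while (*value != '\0' && isspace((unsigned char)*value))
        value++;

    /* Network-path reference ("//host/..."); browsers also accept
     * backslashes in place of the slashes. */
    if ((value[0] == '/' || value[0] == '\\') &&
        (value[1] == '/' || value[1] == '\\'))
        return true;

    /* Scheme: a letter, then letters, digits, '+', '-' or '.', then ':'. */
    if (!isalpha((unsigned char)value[0]))
        return false;
    for (size_t i = 1; value[i] != '\0'; i++) {
        unsigned char c = (unsigned char)value[i];
        if (c == ':')
            return true;
        if (!isalnum(c) && c != '+' && c != '-' && c != '.')
            return false;
    }
    return false;
}
```

A sanitizer would call this on every URL-bearing attribute and strip or rewrite the attribute when it returns true. CSS url(...) references need the same check and are not covered here.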
4.7 No key management for the publishing flow
Key management is entirely manual: generate a file, keep it, pass its path on the CLI. For a bridge with a browser-based publish UI, this is not workable. The bridge needs encrypted server-side key storage (a key derived from a user passphrase protecting the Ed25519 seed in SQLite) and a key-export flow so users can back up and migrate their identity.
4.8 No btpk address display or QR code
The current implementation prints the magnet string on stdout. The bridge needs to present the user’s btpk address as a QR code and a copyable link, and to accept a pasted or scanned magnet on a subscribe/discover page. QR generation does not require JavaScript; a server-side generator can emit an inline SVG.
4.9 No multi-mirror fallback in the read path
dw get takes exactly one mirror URL on the command line. The manifest’s mirrors
list is used only as a fallback in a limited sense. The bridge should try each mirror
in the list in order before reporting failure. When the bridge publishes a feed it
should add its own accessible URL to the manifest’s mirrors list so other bridge
instances can retrieve cached content from it.
5. Implementation Choices That Do Not Align With the Design Goals
5.1 btpk encoding and spec status
identity.h (lines 3–8) documents the tension explicitly:
The DecentWeb article says “base32” but also claims BEP-46 compatibility, and BEP-46 uses hex, so we use hex.
That comment is correct; the earlier text of this plan, which said base32, was not.
BEP-46 defines the magnet form as:
magnet:?xs=urn:btpk:[ Public Key (Hex) ]&s=[ Salt (Hex) ]
So for btpk, the public key is hex in the BEP-46 text, not base32. This aligns
with the current implementation and with the repository’s own comments and tests.
It is useful to separate three different things here:
- The raw key in BEP-44. The mutable item key material is a 32-byte Ed25519 public key (k) and the DHT target is SHA1(k) or SHA1(k || salt).
- The magnet presentation in BEP-46. The magnet wrapper renders that public key as 64 lowercase hex characters in xs=urn:btpk:..., with optional hex salt in s=.
- Generic magnet precedent in BEP-9. BEP-9’s btih magnet format is also hex by default and only mentions base32 as a compatibility form for legacy infohash links. That compatibility note does not redefine BEP-46’s btpk encoding.
The term btpk appears to mean “BitTorrent public key” in practice. The only public
specification for urn:btpk: found in this review is BEP-46’s “Magnet link” section.
I did not find a separate standalone specification or an IANA-registered URN
namespace for btpk; the current IANA URN namespace registry
does not list it. Under RFC 8141, that
means urn:btpk: should be treated here as a BitTorrent/BEP-defined identifier token
embedded inside magnet URIs, not as an independently standardized URN namespace.
Recommendation: keep hex as the canonical DecentWeb representation for author
keys and btpk magnets unless and until a different upstream spec says otherwise.
The docs should be updated to remove the earlier base32 claim.
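To make the hex convention concrete, the following sketch decodes the 64-hex public key from a btpk magnet into its 32 raw bytes. It is illustrative only and is not the repository's dw_identity_parse_magnet; the function name is hypothetical, upper-case hex is accepted on input as a leniency assumption, and the optional s= salt parameter is ignored.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Value of one hex digit, or -1 if c is not a hex digit. */
static int hex_nibble(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Extracts the 32-byte Ed25519 public key from a btpk magnet.
 * Returns 0 on success, -1 on malformed input. Hypothetical helper. */
int example_parse_btpk(const char *magnet, uint8_t key_out[32])
{
    static const char prefix[] = "magnet:?";
    static const char param[] = "xs=urn:btpk:";

    if (magnet == NULL || strncmp(magnet, prefix, sizeof prefix - 1) != 0)
        return -1;

    const char *p = strstr(magnet + sizeof prefix - 1, param);
    if (p == NULL)
        return -1;
    p += sizeof param - 1;

    for (size_t i = 0; i < 32; i++) {
        int hi = hex_nibble(p[2 * i]);
        if (hi < 0)
            return -1; /* also stops at an early NUL terminator */
        int lo = hex_nibble(p[2 * i + 1]);
        if (lo < 0)
            return -1;
        key_out[i] = (uint8_t)((hi << 4) | lo);
    }

    /* The key must end the URI or be followed by another parameter. */
    if (p[64] != '\0' && p[64] != '&')
        return -1;
    return 0;
}
```

A stricter parser would split the query on '&' before matching xs=, so that the parameter name cannot match inside another parameter's value.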
5.2 Content fetch requires an existing HTTP cache
The current transport:
[Listing: current transport interface; not recoverable from the source]
This answers the practical “who downloads the bundle?” question:
- Today, the requesting client downloads it. In the current CLI flow, dw get asks dwd to resolve the author’s mutable feed over the DHT, then dw itself fetches the manifest and bundle over HTTP. In the daemon-only flow, dwd get does the same sequence internally.
- BEP-44 only stores small mutable and immutable values in the DHT; it does not transport large bundle payloads.
- libdht implements the DHT side of that design, not the BitTorrent peer wire protocol needed to exchange bundle pieces.
The cache server (srv) is often the bridge’s own docroot server — there is no
publisher hosting burden in the target design. But to do the first fetch of any
content that is not already in any accessible cache, the bridge currently has no
mechanism: it cannot retrieve content directly from the DHT/swarm when no HTTP cache
has it yet. This is the gap that BitTorrent metadata and piece transfer fill. BEP-9
describes magnet-driven metadata transfer; a real direct-download implementation would
also need the BitTorrent peer protocol and piece selection logic, not just libdht.
The transport interface is pluggable (dw_content_fetch_http is one implementation).
Adding a BitTorrent transport behind the same seam is the correct next step and is
the only path to making the “no required old-web dependency” goal fully true.
Implementation note: direct transport is a BitTorrent-client milestone
This should be treated as a distinct implementation project, not as a small extension to the current DHT code.
- libdht and BEP-44 solve naming and update discovery: find the current mutable feed head and verify that it is signed by the expected Ed25519 key.
- Direct bundle retrieval needs content transport: peer discovery, metadata acquisition where applicable, peer handshakes, piece requests, piece hashing, retry logic, choking/unchoking behavior, and persistent seeding state.
- BEP-9 is relevant because it defines metadata transfer from magnet-style entrypoints, but it is only one part of the required client behavior.
- A complete implementation therefore needs either an embedded BitTorrent client stack or a new transport module that speaks the peer wire protocol and manages pieces, not just more DHT RPCs.
Until that exists, the bridge should describe its transport honestly as:
DHT for mutable feed resolution, HTTP mirrors for current manifest and bundle transfer.
5.3 SHA1 as the content hash
Bundle and manifest hashes are SHA1 (20 bytes), consistent with BitTorrent’s infohash width and BEP-44’s value field. This is not a correctness problem today, but SHA1 collision attacks exist in practice. Because the network’s trust model rests entirely on hash-equality verification, the content hash should move to a collision-resistant function (SHA-256 or BLAKE3) before the wire format is finalised. This is a breaking change and needs coordination with the spec documents. Truncated SHA-256 (first 20 bytes) keeps the current 20-byte field width; BEP-44’s 1000-byte value limit would also accommodate a full 32-byte digest.
5.4 The Unix control socket protocol is too narrow
The current IPC protocol accepts two commands:
- RESOLVE <pubkey-hex> → OK <hash> / NONE / ERR
- PUBLISH <keyfile> <hash> → OK / ERR
A bridge request handler needs substantially more from dwd:
- Subscribe to a feed (persist it, begin polling)
- Unsubscribe
- List subscriptions with status
- Fetch a specific bundle hash from the network into docroot
- Trigger an immediate feed refresh
- Report DHT ready state, peer count, uptime
The current protocol is a single line in each direction into a fixed buffer — not framed, not versioned, no structured data. It would need to be replaced with newline-delimited JSON or a small HTTP/1.1 API over the Unix socket before a bridge request handler can drive it.
5.5 Maintained-feeds list is not persisted
dwd keeps a linked list of feeds to re-publish (g_feeds) in memory only. A
restart loses it, and any feed that was maintained expires from the DHT roughly two
hours later. A bridge serving real users must survive restarts without silently
expiring its published feeds from the network.
5.6 Whole bundles held in memory
dw_bundle_parse and dw_manifest_parse read the entire payload into memory and
hold it as a bencode tree. For CLI use this is fine. For a bridge serving concurrent
requests, large bundles will exhaust memory. The bridge should enforce a maximum
bundle size, reject manifests where the referenced hash arrives with a
Content-Length exceeding the limit, and stream to disk rather than buffering fully
before verification.
5.7 IPv4-only DHT
dwd creates an IPv4-only UDP socket and skips IPv6 peers. IPv6 DHT participation
(BEP-32) is standard on modern public BitTorrent networks. A bridge node that cannot
participate in the IPv6 DHT is a second-class peer with reduced reach.
6. HTTP Bridge Architecture
Given the current implementation, the bridge adds one new component (dwh, or an
embedded HTTP handler in dwd) between srv/docroot and the browser:
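[Diagram: bridge architecture with browser, dwh, dwd, srv, and docroot; not recoverable from the source]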
dwh is a new process that owns the browser-facing HTTP server, SQLite, and the
sanitization pipeline. It drives dwd via an extended IPC protocol. dwd remains
the DHT peer and is responsible for fetching content into the shared docroot. srv
continues to serve the docroot by hash — used both by dwd (as a peer-facing
mirror for other bridge instances) and by dwh (as the local cache read path).
7. Prioritised Recommendations
Changes ordered by impact on a browser user’s experience. Items in the same tier can be done in parallel.
Tier 1 — Prerequisites: nothing useful works without these
1a. Extend the manifest schema for multiple posts
The manifest must support a list of posts before a feed reader can exist. Proposed
minimal extension (backward-compatible: a manifest without posts falls back to the
current single-content behaviour):
[Listing: proposed manifest extension with posts and follows; not recoverable from the source]
The content field should be kept for v1 read compatibility and deprecated in v2.
follows enables the trust-graph discovery path (Layer 4).
1b. Decide and commit to a btpk encoding
Pick one canonical representation and change the docs to match it everywhere.
Recommendation: hex, because that is what BEP-46
actually specifies for btpk, and it already matches the implementation.
1c. Add a content sanitization pass
Before serving any HTML to a browser, the bridge must:
- Parse the HTML (a minimal recursive-descent parser over the permitted element set is sufficient — a full DOM is not needed).
- Strip <script>, <iframe>, <object>, <embed>, remote <link rel="stylesheet">, and all on* attributes.
- Rewrite <a href="..."> for external destinations to open in a new context, with a visible “external link” marker.
- Strip src="http..." attributes on <img>, <video>, <source>.
- Reject files with extensions outside {.html, .css, .webp, .webm, .woff2}.
This is the enforcement mechanism for the “zero outbound requests from content” guarantee.
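The extension allow-list from the last step can be expressed as a small check. This is a sketch with a hypothetical function name. It assumes bundle file names use lower-case extensions and rejects anything else, which is stricter than the text above requires.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Returns true only if the file name ends in one of the permitted
 * extensions. Matching is case-sensitive (assumption: bundles use
 * lower-case extensions). */
bool dw_ext_allowed(const char *name)
{
    static const char *const allowed[] = {
        ".html", ".css", ".webp", ".webm", ".woff2"
    };

    if (name == NULL)
        return false;

    /* Only the final path component is examined. */
    const char *slash = strrchr(name, '/');
    const char *base = slash != NULL ? slash + 1 : name;

    const char *dot = strrchr(base, '.');
    if (dot == NULL || dot == base)
        return false; /* no extension, or a dotfile such as ".html" */

    for (size_t i = 0; i < sizeof allowed / sizeof allowed[0]; i++) {
        if (strcmp(dot, allowed[i]) == 0)
            return true;
    }
    return false;
}
```

An extension check alone does not prove a file's type. A fuller implementation would also inspect the file's leading bytes before serving it with a matching Content-Type.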
Tier 2 — Core UX: a usable feed reader
2a. Add SQLite for state
The bridge needs at minimum: feeds (pubkey, display title, last-fetched), posts
(feed key, bundle hash, title, timestamp), fts_index (FTS5 virtual table over post
content), and keypairs (for publishing users, encrypted with a passphrase-derived
key).
2b. Background feed refresh loop
A thread in dwh that:
- Reads the subscribed feed list from SQLite.
- On a configurable interval (default 15 minutes), resolves each feed’s BEP-44 item.
- Fetches the manifest and any new bundle hashes not yet in the local post store.
- Updates the SQLite index.
Until this exists, the bridge shows a static snapshot that never updates.
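The per-feed scheduling decision inside such a loop might look like the sketch below. The function name and the convention that a last-resolved time of 0 means "never resolved" are assumptions for illustration. The 15-minute default comes from the list above.

```c
#include <stdbool.h>
#include <time.h>

#define DW_REFRESH_INTERVAL_SEC (15 * 60) /* default from the text: 15 minutes */

/* Decides whether a feed should be resolved again on this pass.
 * last_resolved == 0 is taken to mean "never resolved" (assumption). */
bool dw_feed_is_due(time_t last_resolved, time_t now, long interval_sec)
{
    if (last_resolved == 0)
        return true;
    if (now < last_resolved)
        return true; /* wall clock moved backwards: refresh rather than stall */
    return difftime(now, last_resolved) >= (double)interval_sec;
}
```

The refresh thread would read each subscribed feed's last-resolved time from SQLite, call this check, and issue a resolve only for feeds that are due.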
2c. Persist the maintained-feeds list in dwd
dwd should write its maintained-feeds list to dwd.state alongside the routing
table and restore it on startup. A restart must not silently expire published feeds.
2d. Extend the IPC protocol
Replace the current single-line protocol with newline-delimited JSON. New commands the bridge needs:
| Command | Purpose |
|---|---|
| SUBSCRIBE <pubkey-hex> | Add feed to polling list |
| UNSUBSCRIBE <pubkey-hex> | Remove feed |
| LIST_FEEDS | Return subscribed feeds with last-resolve status |
| FETCH_BUNDLE <hash> | Fetch a bundle into docroot by hash |
| STATUS | Return DHT ready state, peer count, uptime |
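One possible framing for these commands as newline-delimited JSON is sketched below. The field names "v", "cmd", and "arg" and the function name are assumptions, not a defined wire format. Because the only argument types in the table are hex strings, the sketch validates the input as hex instead of implementing JSON string escaping.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdio.h>

static int is_cmd_char(int c) { return isupper(c) || c == '_'; }
static int is_hex_char(int c) { return isxdigit(c); }

/* True if s is non-empty and every character satisfies pred. */
static int all_chars(const char *s, int (*pred)(int))
{
    if (*s == '\0')
        return 0;
    for (; *s != '\0'; s++) {
        if (!pred((unsigned char)*s))
            return 0;
    }
    return 1;
}

/* Writes one request as a single JSON object terminated by '\n'.
 * arg may be NULL for commands that take no argument.
 * Returns the number of bytes written (excluding the NUL), or -1 if the
 * input is not plain upper-case/hex text or the buffer is too small. */
int dw_ipc_format_request(char *buf, size_t cap, const char *cmd, const char *arg)
{
    if (buf == NULL || cmd == NULL || !all_chars(cmd, is_cmd_char))
        return -1;
    if (arg != NULL && !all_chars(arg, is_hex_char))
        return -1;

    int n;
    if (arg != NULL)
        n = snprintf(buf, cap, "{\"v\":1,\"cmd\":\"%s\",\"arg\":\"%s\"}\n", cmd, arg);
    else
        n = snprintf(buf, cap, "{\"v\":1,\"cmd\":\"%s\"}\n", cmd);

    if (n < 0 || (size_t)n >= cap)
        return -1; /* output error or truncation */
    return n;
}
```

The explicit version field gives the protocol the versioning that the current single-line format lacks, and the trailing newline gives the reader an unambiguous frame boundary.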
Tier 3 — Quality and completeness
3a. Serve the feed reader as clean, no-JS HTML
All bridge pages must be complete server-rendered HTML. No JavaScript dependency for any core function. Minimum views:
- Feed list (sidebar): followed feeds with unread post count
- Post list: most recent posts from selected feed or all feeds
- Post view: sanitized bundle HTML, served inline
- Search results: FTS5 query across indexed post content
- Discover: paste a btpk magnet or type a 64-hex public key to subscribe
- Publish: compose a post under a managed keypair
- Profile: user’s btpk address as QR code, plus a copyable local bridge URL
3b. btpk address as QR code
A server-side QR generator outputs an inline SVG or WebP. The profile page shows the
user’s canonical btpk magnet as a QR code and as selectable text, and also shows the
bridge-local URL form (http://localhost/<64-hex-pubkey>). The discover page accepts
either form and begins a subscription.
3c. Enforce a bundle size limit
Define a maximum bundle size (suggested: 50 MB) and enforce it at both fetch time
(reject before reading the body if Content-Length exceeds the limit) and pack time
(dw bundle). This prevents memory exhaustion in both dwd and dwh.
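The fetch-time half of this check could be sketched as follows. The function name is hypothetical, and the constant interprets "50 MB" as 50 MiB, which is an assumption. The parser accepts only a plain decimal value so that signed, padded, or overflowing header values are rejected.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Suggested limit from the text; 50 MiB is an assumed reading of "50 MB". */
#define DW_MAX_BUNDLE_BYTES (50ULL * 1024 * 1024)

/* Returns true only if value is a plain decimal Content-Length that does
 * not exceed limit. */
bool dw_content_length_ok(const char *value, uint64_t limit)
{
    if (value == NULL || *value < '0' || *value > '9')
        return false; /* empty, signed, or whitespace-led values */

    errno = 0;
    char *end = NULL;
    unsigned long long n = strtoull(value, &end, 10);
    if (errno == ERANGE || *end != '\0')
        return false; /* overflow or trailing junk */

    return n <= limit;
}
```

A response that omits Content-Length (for example, chunked transfer encoding) passes no header through this check, so the transport still needs a running byte count that aborts the download once the limit is crossed.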
3d. Full multi-mirror fallback
When fetching a bundle or manifest, try every mirror in the list before reporting
failure. When the bridge publishes a feed, add its own accessible URL to the
manifest’s mirrors list so other bridge instances can retrieve cached content
from it.
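The fallback order can be sketched as a loop over the mirror list with the transport supplied as a callback. Everything here is illustrative: the callback type and function names are hypothetical, and demo_fetch is a stand-in that exists only to exercise the loop. A real callback would wrap the existing hash-verified HTTP fetch.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical fetch callback: returns 0 when the bytes for hash_hex were
 * retrieved from mirror_url and passed hash verification. */
typedef int (*dw_fetch_fn)(const char *mirror_url, const char *hash_hex, void *ctx);

/* Tries each mirror in list order. Returns the index of the first mirror
 * that succeeded, or -1 if every mirror failed. */
int dw_fetch_from_mirrors(const char *const *mirrors, size_t n,
                          const char *hash_hex, dw_fetch_fn fetch, void *ctx)
{
    if (mirrors == NULL || fetch == NULL)
        return -1;

    for (size_t i = 0; i < n; i++) {
        if (mirrors[i] == NULL)
            continue; /* skip holes in the list */
        if (fetch(mirrors[i], hash_hex, ctx) == 0)
            return (int)i;
    }
    return -1;
}

/* Stand-in fetcher for demonstration: succeeds only for the mirror URL
 * passed through ctx. */
int demo_fetch(const char *mirror_url, const char *hash_hex, void *ctx)
{
    (void)hash_hex;
    return strcmp(mirror_url, (const char *)ctx) == 0 ? 0 : -1;
}
```

Returning the winning index lets the caller record which mirror served the content, which could later inform mirror ordering.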
Tier 4 — Correctness and reach
4a. BitTorrent piece transfer transport
The HTTP transport is an interim measure that requires a peer already serving the
content over HTTP. Direct BitTorrent download removes this dependency and completes
the “no required old-web dependency” goal. The pluggable transport seam in
content.h already exists for this purpose.
4b. Add IPv6 DHT support
Pass AF_INET6 alongside AF_INET to the libdht node. Create a second UDP socket
for IPv6 and run both in the select loop. Necessary for full DHT reach.
4c. Re-evaluate SHA1 as the content hash
Before the manifest schema is finalised, consider moving bundle and manifest hashes to SHA-256 or BLAKE3. This is a breaking wire change and needs coordination across the spec documents.
8. What the Bridge Does Not Change
The following constraints from decentweb-design.md are architectural. The bridge
can honour or violate them but cannot alter them:
- Read-time privacy is a bridge-side concern. The bridge is the swarm peer; it knows what the user fetches. This is inherent to the proxy model. The bridge must not log reading behaviour unnecessarily, and this limitation should be visible in the UI.
- Publisher identity is non-transferable. The bridge may manage key material on behalf of a user, but the key remains the canonical identity. Key backup and export are required features, not optional.
- Published is public. Once a bundle hash appears in a signed manifest and reaches the DHT, it cannot be retracted. The publish flow in the bridge UI should make this explicit before submission.
- No first-contact guarantee. When a user subscribes to a btpk address for the first time, the bridge cannot verify who controls it. The UI should show key continuity (first-seen date, unbroken signature chain) as the available trust signal, not a false “verified” indicator.
9. Open Questions
Language stack: resolved. The bridge is C99 throughout. Lua (via LuaJIT) is the templating layer for dynamic HTML responses: Lua server pages handle feed reader views, post rendering, and form responses, while C handles the protocol, DHT, content fetching, sanitization, and SQLite. No other language is in scope.

Single binary vs. two processes. The bridge could be dwd extended to embed an HTTP server, or a separate dwh that talks to dwd over the socket. Two processes allow independent restarts but add deployment complexity. Single-process is simpler to deploy but couples the HTTP server to the DHT node’s restart cycle.

Multi-user vs. single-user bridge. For a self-hosted instance, single-user simplifies key storage and removes isolation concerns between users. The SQLite schema should accommodate multi-user from the start (a users table with FKs into keypairs and subscriptions) even if the first implementation only creates one user.

Markdown vs. HTML authoring. Recommendation: convert Markdown to HTML at publish time so stored bundles are always canonical HTML. Optionally preserve the Markdown source as source.md in the bundle for the bridge’s edit flow.

Key backup prompt timing. When should the bridge prompt the user to download their key backup: on first login, on first publish, or both? A user who never backs up and loses access to the bridge loses their publishing identity permanently.
10. Version History
| Version | Date | Notes |
|---|---|---|
| 0.1 | 2026-06-04 | Initial draft. Gap analysis against design doc; bridge architecture; prioritised recommendations. |
| 0.2 | 2026-06-04 | Removed SaaS stack references; corrected transport framing (bridge cache is reader-side; no publisher hosting burden); added explicit request flow diagram; reframed design goal alignment accordingly. |
| 0.3 | 2026-06-04 | Language stack resolved: C99 core, LuaJIT server pages for HTML templating. No C# or ASP.NET. |
| 0.4 | 2026-06-05 | Spec consistency pass against BEP-44, BEP-46, libtorrent DHT-store notes, BEP-9, RFC 8141, and the IANA URN registry. Corrected btpk encoding to hex, clarified that urn:btpk: is BEP-defined rather than an IANA-registered URN namespace, changed the browser interface to local http://localhost/<id> paths, and made the current HTTP-mirror fetch path explicit. |
11. References
- BEP-5: DHT Protocol
- BEP-9: Extension for Peers to Send Metadata Files
- BEP-32: IPv6 extension for DHT
- BEP-44: Storing arbitrary data in the DHT
- BEP-46: Updating Torrents Via DHT Mutable Items
- libtorrent: BitTorrent extension for arbitrary DHT store
- RFC 3986: Uniform Resource Identifier (URI): Generic Syntax
- RFC 8141: Uniform Resource Names (URNs)
- IANA URN Namespaces registry