# deny.sh — Honey Mode — Byte-Exact Cross-SDK Specification (v1)

_Spec version: `deny-sh/honey/v1`. Authored 2026-05-31 (Phase 2). Reference implementation: TypeScript SDK (`src/`). This document is the **normative cross-language contract**. Rust, Python, and Go ports MUST reproduce every byte of every step below, or honey decrypt diverges across languages and the deniability illusion breaks (a record honeyed in one SDK would decrypt to a different fake in another)._

> Scope: this spec covers ONLY the Honey Mode determinism surface — the honey seed derivation, the seeded DRBG, the uniform-integer rejection rule, and the per-type generator draw order. The underlying envelope (Argon2id KDF, AES-256-CTR, XOR control-data, 4-byte LE length prefix, bucket bands) is specified by `src/core.ts` and is unchanged by Honey Mode; it is summarised here only where the honey path reads from it.

---

## 0. Conformance test = shared KAT vectors

A port is conformant iff, for every vector in `server/decoy-engine/kat/honey-kat.json` (Phase 2 deliverable), it produces the **exact** honey string for the given `(type, decryptBytes, salt, realLengthHint)`. Vectors are generated from the TS reference. No port ships until it is green against the full vector set. The vector file pins:

- `deriveHoneySeed` output (32-byte hex) for sample `(decryptBytes, salt, typeTag)` triples.
- `SeededByteSource` keystream (first 96 bytes hex) for sample seeds.
- `sourcedInt(src, max)` sequences for sample `(seed, max)`.
- `generateHoneyDecoy` final string for every v1-eligible type at its canonical length.

---

## 1. Envelope constants (read-only context, from `core.ts`)

| Constant | Value |
|---|---|
| `SALT_LENGTH` | 32 |
| `IV_LENGTH` | 16 |
| `HEADER_LENGTH` | 48 (`salt ‖ iv`, unencrypted) |
| `KEY_LENGTH` | 32 (AES-256) |
| `ALGORITHM` | `aes-256-ctr` |
| `LENGTH_PREFIX` | 4 (LE uint32 inside the encrypted zone) |
| Argon2id | `t=3`, `m=65536 KiB`, `p=1`, `hashLength=32`, type=Argon2id, output=binary |
| KDF password input | `Argon2id(SHA256(p1) ‖ SHA256(p2), salt)` |
| `BUCKET_BANDS` | `[64, 256, 1024, 4096, 16384]`, then `ceil(n/16384)*16384` above top |
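
The band rule in the last row reads as "smallest band that fits, else round up to a multiple of 16384". Below is a minimal Python sketch under that reading. `bucketed_payload_length` is a hypothetical name modelled on the `bucketedPayloadLength` mentioned in §9. The boundary behaviour is an assumption, not something taken from `core.ts`: a frame that exactly fills a band stays in that band.

```python
# Hypothetical sketch of the bucket rule; boundary handling is an ASSUMPTION.
BUCKET_BANDS = [64, 256, 1024, 4096, 16384]
TOP_BAND = BUCKET_BANDS[-1]


def bucketed_payload_length(n: int) -> int:
    """Return the padded length for an n-byte frame (length prefix included)."""
    for band in BUCKET_BANDS:
        if n <= band:  # assumed: a frame that exactly fills a band stays in it
            return band
    # Above the top band: integer ceil(n / 16384) * 16384.
    return -(-n // TOP_BAND) * TOP_BAND


# A 20-byte secret plus the 4-byte length prefix pads to the 64-byte band.
print(bucketed_payload_length(20 + 4))  # 64
```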

`decryptBytes` (the honey seed input) = the recovered inner payload **before** `extractPayload` trimming, i.e. `XOR(AES-256-CTR-decrypt(encryptedData, key, iv), controlData[0:len])`. For a wrong password this is uniform-pseudorandom. It is exactly what `core.decryptToPayload` returns as `payload`.

---

## 2. Honey seed derivation — `deriveHoneySeed`

```
HONEY_DOMAIN = "deny-sh/honey/v1"   (UTF-8, 16 bytes, no NUL terminator)

seed = SHA256(
          HONEY_DOMAIN_bytes
       ‖  0x00
       ‖  decryptBytes
       ‖  0x00
       ‖  salt
       ‖  0x00
       ‖  typeTag_bytes
       )
```

- `‖` is byte concatenation. The three `0x00` bytes are literal single-byte domain separators.
- `typeTag` = the `DecoyType` string (UTF-8), e.g. `"stripe-live-key"`, `"bip39-phrase"`. Exact strings in §7.
- `decryptBytes` and `salt` are raw bytes (32-byte salt; `decryptBytes` length = bucket band).
- Output is the full 32-byte SHA-256 digest. **Do not truncate.**
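
The layout above transcribes directly into Python, one of the target port languages. This is an illustrative sketch, not the TS source. The function name `derive_honey_seed` is ours, and conformance is still judged only against the §0 KAT vectors.

```python
import hashlib

HONEY_DOMAIN = "deny-sh/honey/v1".encode("utf-8")  # 16 bytes, no NUL terminator
SEP = b"\x00"  # literal single-byte domain separator


def derive_honey_seed(decrypt_bytes: bytes, salt: bytes, type_tag: str) -> bytes:
    """SHA256(domain || 0x00 || decryptBytes || 0x00 || salt || 0x00 || typeTag)."""
    h = hashlib.sha256()
    for part in (HONEY_DOMAIN, SEP, decrypt_bytes, SEP, salt, SEP, type_tag.encode("utf-8")):
        h.update(part)
    return h.digest()  # full 32-byte digest; never truncated


seed = derive_honey_seed(bytes(64), bytes(32), "stripe-live-key")
print(seed.hex())
```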

Seed hygiene (security invariant, not just determinism): the seed depends ONLY on `{decryptBytes, salt, typeTag}`. `decryptBytes` for a wrong password is independent of the real plaintext, so the same wrong password always lands the same fake (no "answer changes on retry" honeypot tell) while leaking nothing about the real secret. Ports MUST NOT mix in any other material.

---

## 3. Seeded DRBG — `SeededByteSource`

A counter-mode SHA-256 hash DRBG. Blocks are 32 bytes, consumed left-to-right, never seeked back.

```
block_i = SHA256( seed ‖ be32(i) )          for i = 0, 1, 2, ...
keystream = block_0 ‖ block_1 ‖ block_2 ‖ ...
```

- `seed` = the 32-byte output of §2 (copied defensively; immutable for the lifetime of the source).
- `be32(i)` = the 4-byte **big-endian** encoding of the unsigned 32-bit counter `i`. (Big-endian here is the cross-language contract; do not confuse with the little-endian 4-byte reads in §4.)
- Counter starts at `0`, increments by `1` after each block is produced, wraps mod 2^32 (`(i+1) >>> 0`). Wrap is unreachable for honey-sized draws but pinned for exactness.

`bytes(n)` returns the next `n` keystream bytes:
- `n <= 0` → empty.
- Maintain a buffer + buffer-position. When the buffer is exhausted, produce the next `block_i` and reset position to 0.
- Copy `min(remaining_requested, remaining_in_buffer)` at a time until `n` bytes are delivered.
- **Critical**: a single `bytes(n)` call may span multiple blocks; partial-block state carries forward to the next `bytes()` call. The keystream is one continuous stream, NOT one block per call.
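
The stream semantics can be sketched in Python with nothing beyond `hashlib`. This is an illustration, not the TS `SeededByteSource`. The property worth testing in any port is the one in the last bullet: consecutive `bytes()` calls of arbitrary sizes must yield the same continuous keystream as a single large call.

```python
import hashlib


class SeededByteSource:
    """Counter-mode SHA-256 DRBG: block_i = SHA256(seed || be32(i))."""

    def __init__(self, seed: bytes) -> None:
        self._seed = bytes(seed)  # defensive copy; never mutated afterwards
        self._counter = 0         # index of the next block to produce
        self._buf = b""           # current 32-byte block (empty before first use)
        self._pos = 0             # read position inside the current block

    def _next_block(self) -> None:
        counter_be = self._counter.to_bytes(4, "big")  # be32(i)
        self._buf = hashlib.sha256(self._seed + counter_be).digest()
        self._pos = 0
        self._counter = (self._counter + 1) & 0xFFFFFFFF  # wrap mod 2^32

    def bytes(self, n: int) -> bytes:
        """Return the next n keystream bytes; partial blocks carry over between calls."""
        if n <= 0:
            return b""
        out = bytearray()
        while len(out) < n:
            if self._pos >= len(self._buf):
                self._next_block()
            take = min(n - len(out), len(self._buf) - self._pos)
            out += self._buf[self._pos:self._pos + take]
            self._pos += take
        return bytes(out)
```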

### `RandomByteSource` (non-honey path — informational)
The curated-decoy flow uses a `randomBytes`-backed source. It is NOT part of the honey determinism contract. Ports only need `SeededByteSource` for honey.

---

## 4. Uniform integer — `sourcedInt(src, max)`

Rejection sampling over 4-byte **little-endian** reads. This MUST match the legacy `randInt` rejection rule exactly.

```
function sourcedInt(src, max):
    if max <= 0: return 0
    limit = floor(0x1_0000_0000 / max) * max     # largest multiple of max <= 2^32
    repeat up to 128 times:
        b = src.bytes(4)                          # 4 bytes from the DRBG stream
        v = b[0] | (b[1]<<8) | (b[2]<<16) | (b[3]<<24)   # unsigned LE uint32
        if v < limit: return v % max
    throw "sourcedInt: rejection sampling exceeded bound"
```

- **Draw unit is always 4 bytes per attempt**, regardless of `max`. On rejection, 4 more bytes are consumed (the rejected bytes are NOT reused).
- `v` is an **unsigned** 32-bit value (LE). In languages without `>>> 0`, mask to `u32`.
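- The 128-attempt cap is part of the contract. The per-attempt rejection probability is `(2^32 mod max) / 2^32`, which is always below 0.5 and below 2.3e-8 for every alphabet in §5d, so the cap is unreachable in practice. It exists to turn a broken source into a throw rather than an infinite loop.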

This is the ONLY integer primitive the generators use for index selection. Every `chars(alphabet, len)` draw is `len` independent `sourcedInt(src, alphabet.length)` calls.
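
The rejection rule is easiest to check against a scripted byte source. The Python sketch below uses `ScriptedSource`, a hypothetical test double that replays fixed bytes. Any object with a `bytes(n)` method, such as a `SeededByteSource`, behaves the same way. With `max = 10`, `limit = 4294967290`, so the all-`0xff` word is rejected, its 4 bytes are discarded, and the next word is used.

```python
class ScriptedSource:
    """Hypothetical test double: replays a fixed byte string through bytes(n)."""

    def __init__(self, data: bytes) -> None:
        self._data = data
        self.consumed = 0

    def bytes(self, n: int) -> bytes:
        chunk = self._data[self.consumed:self.consumed + n]
        if len(chunk) != n:
            raise RuntimeError("scripted source exhausted")
        self.consumed += n
        return chunk


def sourced_int(src, max_value: int) -> int:
    if max_value <= 0:
        return 0
    limit = (0x1_0000_0000 // max_value) * max_value  # largest multiple of max_value <= 2^32
    for _ in range(128):
        v = int.from_bytes(src.bytes(4), "little")  # unsigned LE uint32
        if v < limit:
            return v % max_value
        # Rejected: these 4 bytes are discarded, never reused.
    raise RuntimeError("sourcedInt: rejection sampling exceeded bound")


src = ScriptedSource(bytes.fromhex("ffffffff" "07000000"))
print(sourced_int(src, 10), src.consumed)  # 7 8
```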

---

## 5. Generator primitives (seeded path)

When Honey Mode runs, the ambient byte source is a `SeededByteSource`. The generators draw through two entry points; both pull from the same single DRBG stream **in call order**:

### 5a. `randInt(max)` → `sourcedInt(AMBIENT_SOURCE, max)`
(See §4.) Used for every character index and every small integer choice.

### 5b. `sourceBytes(n)` → `AMBIENT_SOURCE.bytes(n)`
Direct keystream bytes. Used ONLY by the three "real-crypto-material" generators: bip39 entropy, bitcoin WIF payload, solana 64-byte key. These bypass `sourcedInt` and read raw bytes straight from §3.

### 5c. Character helpers (exact semantics)
```
chars(alphabet, len):           # alphabet is a fixed string (see §6 tables)
    out = ""
    for i in 0..len-1:
        out += alphabet[ sourcedInt(src, len(alphabet)) ]
    return out

digits(len)  = chars("0123456789", len)
```

`chars` draws **left to right, one `sourcedInt` per character**. Draw order across a whole generator is exactly the source-code statement order in §7. Any reordering changes the output.

### 5d. Fixed alphabets (byte-exact — copy verbatim)
```
ALNUM        = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789"   (62)
ALNUM_UPPER  = "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789"                             (36)
BASE64URL    = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789_-" (64)
HEX          = "0123456789abcdef"                                                 (16)
BASE58       = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"       (58)
PRINTABLE    = chars 0x20..0x7E inclusive                                         (95)
NI_FIRST     = "ABCEGHJKLMNOPRSTWXYZ"                                             (20)
NI_SECOND    = "ABCEGHJKLMNPRSTWXYZ"                                              (19)
```
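`BASE58` ordering MUST match the Base58Check alphabet (`base58Encode`/`base58CheckEncode` in `validators.ts`). `BASE64URL` ends in `_-`, the reverse of the RFC 4648 §5 URL-safe order (`-_`), so ports MUST copy it from this table rather than from a library constant. The bip39 wordlist is the canonical 2048-word English list; the word index is the 11-bit value per §7-bip39.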
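
Transcription slips in these tables are the cheapest divergence to make and the hardest to spot by eye. The Python self-check below is a sketch. The sizes come from the table above. The `string`-module reconstructions are an added cross-check and say nothing about how `validators.ts` or the TS generators build the alphabets.

```python
import string

ALPHABETS = {
    # name: (alphabet, expected size from §5d)
    "ALNUM": ("ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789", 62),
    "ALNUM_UPPER": ("ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789", 36),
    "BASE64URL": ("ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789_-", 64),
    "HEX": ("0123456789abcdef", 16),
    "BASE58": ("123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz", 58),
    "PRINTABLE": ("".join(chr(c) for c in range(0x20, 0x7F)), 95),  # 0x20..0x7E inclusive
    "NI_FIRST": ("ABCEGHJKLMNOPRSTWXYZ", 20),
    "NI_SECOND": ("ABCEGHJKLMNPRSTWXYZ", 19),
}


def check_alphabets() -> None:
    for name, (alphabet, size) in ALPHABETS.items():
        assert len(alphabet) == size, f"{name}: wrong length"
        assert len(set(alphabet)) == size, f"{name}: duplicate symbol"
    # Independent reconstructions of the ordered alphabets.
    letters_digits = string.ascii_uppercase + string.ascii_lowercase + string.digits
    assert ALPHABETS["ALNUM"][0] == letters_digits
    assert ALPHABETS["ALNUM_UPPER"][0] == string.ascii_uppercase + string.digits
    assert ALPHABETS["HEX"][0] == string.digits + "abcdef"
    base58 = "".join(
        c for c in string.digits + string.ascii_uppercase + string.ascii_lowercase
        if c not in "0OIl"
    )
    assert ALPHABETS["BASE58"][0] == base58
    assert ALPHABETS["BASE64URL"][0] == letters_digits + "_-"  # "_-", not RFC 4648's "-_"


check_alphabets()
print("alphabets OK")
```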

### 5e. Length helpers
```
boundedLen(realLen, prefixLen, minBody, fixedBody?):
    if fixedBody given:
        assert prefixLen + fixedBody <= realLen   # else throw "exceeds real value length"
        return fixedBody
    bodyLen = max(minBody, realLen - prefixLen)
    assert prefixLen + bodyLen <= realLen
    return bodyLen

token(prefix, realLen, alphabet=ALNUM, minBody=1, fixedBody?):
    return prefix + chars(alphabet, boundedLen(realLen, len(prefix), minBody, fixedBody))
```
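On the honey path, `realLen` is `dummyReal.length` (§8), NOT the real secret's length. For every v1 type except `bip39-phrase` this equals `realLengthHint ?? defaultLengthForType(type)`; the bip39 dummy is a fixed phrase whose length does not depend on the hint. Determinism therefore depends on the **dummyReal length per type** (§8) being identical across SDKs.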
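
Plugging the §8 default lengths into `boundedLen` gives the body length each v1 token type draws. The table below uses the prefixes chosen when the dummy is all `x` (§8). This is a Python sketch. The error message on the non-fixed branch is assumed to match the fixed-branch message, because the pseudocode above does not pin it.

```python
from typing import Optional


def bounded_len(real_len: int, prefix_len: int, min_body: int,
                fixed_body: Optional[int] = None) -> int:
    if fixed_body is not None:
        if prefix_len + fixed_body > real_len:
            raise ValueError("exceeds real value length")
        return fixed_body
    body_len = max(min_body, real_len - prefix_len)
    if prefix_len + body_len > real_len:  # only possible when min_body does not fit
        raise ValueError("exceeds real value length")  # ASSUMED message
    return body_len


# type: (realLen, prefix chosen for an all-"x" dummyReal, minBody, fixedBody)
V1_TOKEN_SHAPES = {
    "stripe-test-key": (32, "sk_test_", 24, None),
    "stripe-live-key": (107, "sk_live_", 24, None),
    "github-pat-classic": (40, "ghp_", 36, 36),
    "github-pat-fine": (93, "github_pat_", 60, None),
    "openai-key": (51, "sk-", 40, None),
    "anthropic-key": (108, "sk-ant-", 80, None),
    "aws-access-key": (20, "AKIA", 16, 16),
    "ethereum-private-key": (64, "", 64, 64),
}

for name, (real_len, prefix, min_body, fixed) in V1_TOKEN_SHAPES.items():
    print(name, bounded_len(real_len, len(prefix), min_body, fixed))
```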

---

## 6. Honey entry point — `generateHoneyDecoy`

```
generateHoneyDecoy(type, decryptBytes, salt, realLengthHint?):
    assert isHoneyEligible(type)             # refuse "generic", "freeform-secret"
    lenHint   = realLengthHint ?? defaultLengthForType(type)     # §8 table
    dummyReal = dummyRealForType(type, lenHint)                  # §8 rules
    seed      = deriveHoneySeed(decryptBytes, salt, type)        # §2
    with AMBIENT_SOURCE = SeededByteSource(seed):                # §3
        return generateLocalDecoy(dummyReal, type)               # §7 draw order
```

`isHoneyEligible(type)` = `type ∉ {"generic", "freeform-secret"}`.

The ambient-source swap is scoped to one synchronous `generateLocalDecoy` call. Ports without a mutable ambient may instead thread the `SeededByteSource` explicitly into the generator; the **draw order and primitives must be identical**.
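
In Python, `contextvars` provides the same "swap for one call, then restore" scoping as the TS ambient swap. It also keeps concurrent threads or async tasks from seeing each other's source. This is a sketch only. `ambient_source` and `current_source` are hypothetical helper names, and explicit threading remains the simpler option.

```python
import contextvars
from contextlib import contextmanager

# Hypothetical ambient slot; generators would call current_source().bytes(n).
_AMBIENT_SOURCE: contextvars.ContextVar = contextvars.ContextVar("AMBIENT_SOURCE")


@contextmanager
def ambient_source(src):
    """Install src as the ambient byte source for the duration of the with-block."""
    token = _AMBIENT_SOURCE.set(src)
    try:
        yield src
    finally:
        _AMBIENT_SOURCE.reset(token)  # restore whatever was installed before


def current_source():
    return _AMBIENT_SOURCE.get()  # raises LookupError when nothing is installed
```

The variable deliberately has no default. A generator that runs outside `generateHoneyDecoy` therefore fails loudly instead of silently drawing from some other source.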

---

## 7. Per-type generator draw order (v1 launch subset)

`realLen` below = `dummyReal.length` from §8. Each row gives the exact sequence of draws. Where a generator inspects the `realValue` prefix (`startsWith`), it inspects `dummyReal` (§8 makes those checks deterministic).

| `DecoyType` (typeTag) | Construction (draws in order) |
|---|---|
| `stripe-test-key`   | `token("sk_test_", realLen, ALNUM, minBody=24)` |
| `stripe-live-key`   | `token("sk_live_", realLen, ALNUM, minBody=24)` |
| `github-pat-classic`| `token("ghp_", realLen, ALNUM, minBody=36, fixedBody=36)` → `chars(ALNUM,36)` |
| `github-pat-fine`   | `token("github_pat_", realLen, ALNUM+"_", minBody=60)` |
| `openai-key`        | prefix = `dummyReal.startsWith("sk-proj-") ? "sk-proj-" : "sk-"`; `token(prefix, realLen, BASE64URL, minBody=40)` |
| `anthropic-key`     | prefix = `dummyReal.startsWith("sk-ant-api03-") ? "sk-ant-api03-" : "sk-ant-"`; `token(prefix, realLen, BASE64URL, minBody=80)` |
| `aws-access-key`    | prefix = `dummyReal.startsWith("ASIA") ? "ASIA" : "AKIA"`; then `chars(ALNUM_UPPER, boundedLen(realLen,4,16,16))` = `chars(ALNUM_UPPER,16)` |
| `bip39-phrase`      | `randomWords(wordCount, budget=realLen)` — see §7-bip39 |
| `ethereum-private-key` | `dummyReal.startsWith("0x") ? token("0x", realLen, HEX, 64, 64) : chars(HEX, boundedLen(realLen,0,64,64))` |
| `bitcoin-wif`       | `randomBitcoinWif(realLen)` — see §7-wif |
| `solana-private-key`| loop ≤16×: `enc = base58Encode(sourceBytes(64))`; return first `enc` with `87 ≤ len(enc) ≤ realLen` — see §7-solana |

### §7-bip39 — `randomWords(count, budget)`
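`count` = `dummyReal.trim().split(/\s+/).filter(non-empty).length` and `budget` = `realLen` = `dummyReal.length`. For the canonical dummy (§8) these are **12** words and **93** characters. The bip39 dummy ignores `lenHint`, so the 200 in §8's default-length table does not change the budget.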
```
if count in {12,15,18,21,24}:
    entBytes = (count*11 - (count*11)/33) / 8     # 12→16, 15→20, 18→24, 21→28, 24→32
    repeat up to 64 times:
        entropy = sourceBytes(entBytes)            # raw DRBG bytes, §5b
        phrase  = bip39FromEntropy(entropy, count) # entropy ‖ SHA256(entropy)[:count*11/33 bits], 11-bit word indices
        if len(phrase) <= budget: return phrase
    throw "exceeds real value length"
# (non-standard counts use a length-filtered word pool via sourcedInt; NOT in v1 subset — dummyReal is always 12 words)
```
`bip39FromEntropy`: `bits = bin(entropy) ‖ bin(SHA256(entropy))[0:csBits]` where `csBits = entBits/32`; word `i` = `BIP39_WORDS[ int(bits[i*11 : i*11+11], 2) ]`. Big-endian bit order within each byte. Output = words joined by single space.
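
The bit slicing is the step most likely to go wrong in a port. The Python sketch below returns word indices rather than words, and the 2048-word list is omitted. `bip39_word_indices` and `ent_bytes` are illustrative names. All-zero 16-byte entropy yields indices `[0]*11 + [3]`, which is the `abandon ... about` phrase used as the §8 dummy.

```python
import hashlib


def ent_bytes(count: int) -> int:
    """Entropy length in bytes for a standard word count (12/15/18/21/24)."""
    return (count * 11 - (count * 11) // 33) // 8


def bip39_word_indices(entropy: bytes, count: int) -> list:
    ent_bits = len(entropy) * 8
    cs_bits = ent_bits // 32                    # 4..8 bits for the standard sizes
    if ent_bits + cs_bits != count * 11:
        raise ValueError("entropy length does not match word count")
    digest = hashlib.sha256(entropy).digest()
    checksum = digest[0] >> (8 - cs_bits)       # top cs_bits of SHA256(entropy), MSB first
    bits = (int.from_bytes(entropy, "big") << cs_bits) | checksum
    total = count * 11
    # Word i takes bits [11*i, 11*i + 11) counting from the most significant end.
    return [(bits >> (total - 11 * (i + 1))) & 0x7FF for i in range(count)]


print(bip39_word_indices(bytes(16), 12))  # [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 3]
```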

### §7-wif — `randomBitcoinWif(realLen)`
```
assert realLen >= 51
compressed = (realLen >= 52)
payload = new zero-filled byte array of length (compressed ? 34 : 33)   # allocation, NOT a DRBG read
payload[0] = 0x80
if compressed: payload[33] = 0x01
repeat up to 17 times:                            # 1 initial attempt + up to 16 retries
    payload[1:33] = sourceBytes(32)               # raw DRBG bytes into offsets 1..32 (half-open slice), §5b
    wif = base58CheckEncode(payload)              # payload ‖ doubleSHA256(payload)[0:4], Base58
    if len(wif) <= realLen: return wif
throw
```
`base58CheckEncode(payload)` = `base58Encode(payload ‖ SHA256(SHA256(payload))[0:4])`. `base58Encode` = the standard big-integer Base58 with leading-zero-byte → leading `'1'` handling (algorithm in `validators.ts`).
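
The Python sketch below covers `base58Encode`, `base58CheckEncode` and the WIF payload layout. It assumes the standard big-integer algorithm named above and is not a copy of `validators.ts`. Because the payload always starts with `0x80`, the encoded length is fixed: 51 characters uncompressed (leading `5`) and 52 compressed (leading `K` or `L`). The §7-wif retry loop is therefore effectively unreachable.

```python
import hashlib

BASE58 = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"


def base58_encode(data: bytes) -> str:
    n = int.from_bytes(data, "big")
    digits = []
    while n > 0:
        n, rem = divmod(n, 58)
        digits.append(BASE58[rem])
    leading_zeros = len(data) - len(data.lstrip(b"\x00"))
    return "1" * leading_zeros + "".join(reversed(digits))  # each leading 0x00 -> '1'


def base58check_encode(payload: bytes) -> str:
    checksum = hashlib.sha256(hashlib.sha256(payload).digest()).digest()[:4]
    return base58_encode(payload + checksum)


def wif_payload(key: bytes, compressed: bool) -> bytes:
    if len(key) != 32:
        raise ValueError("key must be 32 bytes")
    payload = bytearray(34 if compressed else 33)  # zero-filled buffer
    payload[0] = 0x80                              # mainnet private-key version byte
    payload[1:33] = key                            # raw DRBG bytes on the honey path
    if compressed:
        payload[33] = 0x01
    return bytes(payload)


key = (1).to_bytes(32, "big")
print(base58check_encode(wif_payload(key, compressed=False)))
```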

### §7-solana
```
repeat up to 16 times:
    enc = base58Encode(sourceBytes(64))           # 64 raw DRBG bytes → Base58
    if 87 <= len(enc) <= realLen: return enc
throw "exceeds real value length"
```

> **v1 subset only.** The remaining ~27 structured types (`jwt-token`, `iban`, `credit-card`, `private-key-pem`, URIs, slack/discord tokens, the long-tail provider PATs, NHS/SSN/NI/phone) are expected to follow the same pattern: the §3, §4 and §5 primitives plus a per-type draw order and `dummyReal`. They will be specified once the v1 pattern is audited green across all four SDKs. Their draw orders live in `src/record-decoy-generators.ts` and will be appended to this table in a v1.1 spec pass. Ports SHOULD stub them to throw `unsupported honey type (post-v1)` until then so a divergence can't ship silently.

---

## 8. `dummyReal` + `defaultLengthForType` (determinism inputs)

Honey generation feeds the generator a synthetic `dummyReal` (NOT the real secret) so the generator's length/prefix logic is deterministic and real-secret-independent. Ports MUST reproduce both tables.

`dummyRealForType(type, lenHint)`:
- `bip39-phrase` → `"abandon abandon abandon abandon abandon abandon abandon abandon abandon abandon abandon about"` (12 words, fixed; gives `count=12`).
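- (other v1 types) → `"x".repeat(lenHint)` (the `default` branch). None of the v1 prefix checks (`sk-proj-`, `sk-ant-api03-`, `ASIA`, `0x`) match `"xxx…"`, so each resolves to its **else** branch deterministically: `openai-key` uses `sk-`, `anthropic-key` uses `sk-ant-`, `aws-access-key` uses `AKIA`, and `ethereum-private-key` takes the bare 64-hex path with no `0x` prefix.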
- (jwt/credit-card/iban/uris/phone have bespoke dummyReal builders — post-v1.)

`defaultLengthForType` (v1 subset):
| type | len | type | len |
|---|---|---|---|
| stripe-test-key | 32 | anthropic-key | 108 |
| stripe-live-key | 107 | aws-access-key | 20 |
| github-pat-classic | 40 | bip39-phrase | 200 |
| github-pat-fine | 93 | ethereum-private-key | 64 |
| openai-key | 51 | bitcoin-wif | 51 |
| | | solana-private-key | 88 |

(Full 40-type table in `record-decoy-generators.ts::defaultLengthForType`.)
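
Both §8 rules fit in a few lines. The Python sketch below derives each v1 type's `realLen` and confirms that no v1 prefix check fires on either dummy. `dummy_real_for_type`, `V1_DEFAULT_LENGTHS` and `V1_PREFIX_CHECKS` are illustrative names.

```python
V1_DEFAULT_LENGTHS = {
    "stripe-test-key": 32, "stripe-live-key": 107, "github-pat-classic": 40,
    "github-pat-fine": 93, "openai-key": 51, "anthropic-key": 108,
    "aws-access-key": 20, "bip39-phrase": 200, "ethereum-private-key": 64,
    "bitcoin-wif": 51, "solana-private-key": 88,
}
BIP39_DUMMY = " ".join(["abandon"] * 11 + ["about"])  # fixed 12-word phrase
V1_PREFIX_CHECKS = ("sk-proj-", "sk-ant-api03-", "ASIA", "0x")


def dummy_real_for_type(type_tag: str, len_hint: int) -> str:
    if type_tag == "bip39-phrase":
        return BIP39_DUMMY  # len_hint is not used for bip39
    return "x" * len_hint   # the `default` branch


real_lens = {
    t: len(dummy_real_for_type(t, n)) for t, n in V1_DEFAULT_LENGTHS.items()
}
print(real_lens["bip39-phrase"])  # 93
```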

---

## 9. Decrypt branch (Approach A) — host-side contract

Honey decrypt does NOT change the envelope. It adds a verdict + fallback around the existing decrypt:

```
decryptHoney(ciphertext, controlData, {p1,p2}, honeyType, band):
    assert isHoneyEligible(honeyType)
    {payload, salt, wellFormed, plaintext} = core.decryptToPayload(ciphertext, {p1,p2,controlData}, expectedBand=band)
    if wellFormed:  return { value: utf8(plaintext), branch: "real" }     # real or decoy slot
    else:           fake = generateHoneyDecoy(honeyType, decryptBytes=payload, salt)
                    return { value: fake, branch: "honey" }
```

`isWellFormedFrame(payload, expectedBand)`:
```
if len(payload) < 4: return false
length = LE_uint32(payload[0:4])
fitsPayload = (length <= len(payload) - 4)
if expectedBand is None: return fitsPayload
fitsBand = (length <= expectedBand - 4)
return fitsPayload and fitsBand
```
- `band` is stored in record metadata (the bucket the payload was padded to at `encryptHoney` time = `bucketedPayloadLength(secretLen + 4)`).
- Accidental-well-formed-frame probability for a wrong key ≈ `(band - 3) / 2^32` (≈ 1.4e-8 for the 64-byte band, ≈ 6e-8 for 256). Below any practical concern; pinned so ports test the same window.
- **Constant shape**: both branches return a `{value: string, branch}` of identical external shape. `branch` is internal telemetry/test only and MUST NOT be surfaced to an end user or attacker. No timing/length/format divergence between branches.
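
The `(band - 3) / 2^32` figure comes from counting. With `len(payload) = band`, a uniformly random length prefix passes both checks iff it lies in `0..band-4`, which is `band - 3` of the `2^32` possible values. The arithmetic, as a small Python sketch (`accidental_frame_probability` is an illustrative name):

```python
from fractions import Fraction


def accidental_frame_probability(band: int) -> Fraction:
    # Accepted prefixes: 0, 1, ..., band - 4 (assumes len(payload) == band).
    accepted = band - 3
    return Fraction(accepted, 2**32)


for band in (64, 256, 1024, 4096, 16384):
    print(band, f"{float(accidental_frame_probability(band)):.2e}")
```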

`encryptHoney` always sets `padToBucket: true` so `band` is well-defined and the decrypt-side band check has a known target.
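
Below is the `isWellFormedFrame` verdict in Python, as a sketch against the pseudocode above. It includes one frame of each class named in §10: well-formed, off-band and short. The frames are invented for illustration and are not the KAT frame set. `frame` is a hypothetical helper.

```python
from typing import Optional


def is_well_formed_frame(payload: bytes, expected_band: Optional[int] = None) -> bool:
    if len(payload) < 4:
        return False
    length = int.from_bytes(payload[0:4], "little")  # LE uint32 length prefix
    fits_payload = length <= len(payload) - 4
    if expected_band is None:
        return fits_payload
    fits_band = length <= expected_band - 4
    return fits_payload and fits_band


def frame(length: int, total: int) -> bytes:
    """Length prefix followed by zero padding up to `total` bytes (illustrative only)."""
    return length.to_bytes(4, "little") + bytes(total - 4)


print(is_well_formed_frame(frame(5, 64), 64))      # True: well-formed
print(is_well_formed_frame(frame(100, 256), 64))   # False: fits payload, not band
print(is_well_formed_frame(b"\x05\x00\x00", 64))   # False: short
```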

---

## 10. Port acceptance checklist (per language)

A port (Rust/Python/Go) is done when ALL hold:
1. `deriveHoneySeed` matches TS hex for every KAT triple.
2. `SeededByteSource` keystream (≥96 bytes) matches TS for every KAT seed.
3. `sourcedInt` sequence matches TS for every KAT `(seed, max)`.
4. `generateHoneyDecoy` final string matches TS for every v1-eligible type × ≥3 distinct seeds.
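5. Each honey output passes its own type validator, i.e. the fake is itself plausible. For the v1 subset this includes the bip39 checksum (`bip39-phrase`) and Base58Check (`bitcoin-wif`); Luhn and IBAN-mod97 apply once the post-v1 types land.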
6. Stability: same `(type, decryptBytes, salt)` → identical output on repeat (trivially true if 1–4 hold).
7. Distinctness: different `decryptBytes` → different output (statistical; KAT includes a near-collision pair).
8. Ineligible refusal: `generic`/`freeform-secret` throw the documented error.
9. `isWellFormedFrame` + band-consistency matches TS verdicts on the KAT frame set (well-formed, off-band, short).
10. No real-secret material reachable in any honey-path input (code review; seed inputs are exactly `{decryptBytes, salt, typeTag}`).

Mirror the TS test obligations in `src/test/honey-decrypt.test.ts` (real/decoy/honey branches, stability, distinctness, ineligible refusal, no-length-oracle, accidental-frame-rate, coarse timing-overlap).

---

## 11. Versioning

Any change to §2 (seed layout), §3 (DRBG), §4 (rejection rule), §5 (primitives and alphabets), §7 (draw order), or §8 (`dummyReal` and default lengths) is a **breaking** change to the honey contract, because it makes a record honeyed under v1 code decrypt to a different fake under the new code. Such a change MUST bump `HONEY_DOMAIN` to `deny-sh/honey/v2` (and the spec version) so the break is explicit rather than silent. The envelope version byte is separate (owned by `core.ts`); the honey domain string is the honey-determinism version. Records persist `honeyType` + `band`; a future format may also persist the honey domain version for forward-compat reads.
