Benchmarks & reproducibility

Every number quoted on the landing page comes from running Substrat against one of the pcaps below. This page lets you verify each claim — download the pcap, run the exact command, compare output.

Setup: download the Substrat binary for your platform, then make it executable (chmod +x substrat-*). On macOS, also run xattr -d com.apple.quarantine substrat-macos-arm64. The numbers below were measured on a MacBook (Apple Silicon, M-series).

Summary

Pcap	Size	Packets	Mode	Coverage / strategy	Anomalies	CPU time
modbus_synth.pcap	7.0 KB	85	binary (positional)	94%	5/5 injected	0.034s
tlv_synth.pcap (v1.2.2)	5.9 KB	100	binary TLV	99%	5/5 injected	0.04s
nested_tlv_synth.pcap (new, v1.2.3)	6.0 KB	95	binary TLV nested	100%	5/5 (incl. 2 inner-recursive)	0.04s
dns.pcap	4.3 KB	38 (3 flows)	binary	94–100%	7 structural	0.017s
http_synth.pcap	8.8 KB	60 (2 flows)	text	template	—	0.18s
telnet.pcap	24.3 KB	117	text	template	—	0.30s
tftp.pcap	9.2 KB	23	text	FTP tokens (CRLF)	—	0.03s

All binary-mode analyses complete in well under a second on a 2024 MacBook Pro. tlv_synth.pcap exercises the new TLV engine (v1.2.2) that handles chained Type-Length-Value protocols such as IEC 60870-5-104, DNP3, BACnet, and DHCP options — protocols the positional-only engine couldn't model.

New in v1.4.0: two additions driven by four empirical research studies (see the research/ folder). First, substrat diagnose --readout=soft exposes a soft readout that finds the longest parseable substring of each line and raw-codes the residual. It recovers up to about 14% bpc (bits per character) on noisy arithmetic corpora without changing the induced grammar. The trade-off is that anomaly F1 drops by about 10 points, so hard readout remains the default. Second, substrat learn --fragility and substrat diagnose --fragility now report structural fragility scores (an O(|G|) computation) that predict how much the grammar will degrade under noise. The scores are based on four grammar features with Spearman ρ ≥ 0.75 on a 10-corpus benchmark.
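The soft readout idea can be illustrated with a toy example. The sketch below is not Substrat's implementation: it uses a brute-force search and a hypothetical stand-in recognizer for the a^n b^n language (parses_anbn), where a real system would call the induced grammar's parser. It only shows how a line splits into a parseable core and a raw-coded residual.

```python
from typing import Callable, Optional, Tuple


def parses_anbn(s: str) -> bool:
    """Toy recognizer (assumption): accepts a^n b^n with n >= 1."""
    n = len(s) // 2
    return len(s) >= 2 and len(s) % 2 == 0 and s == "a" * n + "b" * n


def longest_parseable_span(
    line: str, parses: Callable[[str], bool]
) -> Optional[Tuple[int, int]]:
    """Return (start, end) of the longest substring accepted by `parses`.

    Brute force: try the longest candidates first, leftmost first.
    """
    for size in range(len(line), 0, -1):
        for start in range(len(line) - size + 1):
            if parses(line[start:start + size]):
                return start, start + size
    return None


def soft_split(line: str, parses: Callable[[str], bool]) -> Tuple[str, str]:
    """Split a line into (parsed core, residual to be raw-coded)."""
    span = longest_parseable_span(line, parses)
    if span is None:
        return "", line  # nothing parses: the whole line is residual
    start, end = span
    return line[start:end], line[:start] + line[end:]


print(soft_split("aaabbbx", parses_anbn))
print(soft_split("xaabbbb", parses_anbn))
```

A hard readout would reject both noisy lines outright; the soft split keeps the structured part and pays raw cost only for the leftover characters.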

In v1.3.2: --prune now preserves parse coverage — a production is only removed if bpc strictly improves AND the parse-failure count on the corpus does not grow. v1.3.1 applied an MDL-only criterion that could drop wrap productions and silently sacrifice 40-60% of test parses. Docker end-to-end demo: on anbn, --prune now correctly removes the redundant S→SS (3→2 productions, bpc 0.352→0.349, 100% coverage preserved).
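The acceptance rule described for v1.3.2 can be written down in a few lines. This is a minimal sketch of the stated criterion only: GrammarStats and both function names are hypothetical, and the "lossy" candidate uses made-up numbers to show the case the MDL-only rule would have accepted.

```python
from dataclasses import dataclass


@dataclass
class GrammarStats:
    """Hypothetical summary of one grammar evaluated on the corpus."""
    bpc: float            # bits per character
    parse_failures: int   # number of corpus lines that fail to parse


def accept_prune_mdl_only(before: GrammarStats, after: GrammarStats) -> bool:
    """v1.3.1-style rule as described: accept whenever bpc improves."""
    return after.bpc < before.bpc


def accept_prune(before: GrammarStats, after: GrammarStats) -> bool:
    """v1.3.2-style rule as described: bpc must strictly improve AND
    the parse-failure count must not grow."""
    return (after.bpc < before.bpc
            and after.parse_failures <= before.parse_failures)


# Figures from the anbn demo: bpc 0.352 -> 0.349, coverage stays at 100%.
before = GrammarStats(bpc=0.352, parse_failures=0)
after = GrammarStats(bpc=0.349, parse_failures=0)
print(accept_prune(before, after))

# Made-up candidate: better bpc, but it breaks parses.
lossy = GrammarStats(bpc=0.340, parse_failures=12)
print(accept_prune_mdl_only(before, lossy), accept_prune(before, lossy))
```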

In v1.3.1: substrat anomaly now uses a double-threshold detector — a line is flagged STAT if its NLL z-score, length z-score, or intra-string entropy z-score exceeds the threshold. Catches too-short, too-long, and too-homogeneous lines that NLL alone missed (e.g. "0" or "((((((" slipping through a composed-expression detector). Also: substrat learn, compress, and diagnose accept --prune, an MDL-based post-pass that removes productions whose deletion improves bpc (slower, more compact grammar).
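To make the multi-signal idea concrete, here is a small sketch that z-scores three features per line and flags a line if any of them exceeds the threshold. It is an illustration under assumptions, not Substrat's code: the class name, the threshold of 3.0, the use of population standard deviation, and the training data are all invented, and the NLL is passed in because computing it requires the learned grammar.

```python
import math
from collections import Counter
from statistics import mean, pstdev


def char_entropy(line: str) -> float:
    """Shannon entropy (bits per character) of the characters in one line."""
    if not line:
        return 0.0
    n = len(line)
    return -sum(c / n * math.log2(c / n) for c in Counter(line).values())


class MultiZDetector:
    """Flags a line when its NLL, length, or entropy z-score is too large."""

    def __init__(self, train_lines, train_nlls, threshold=3.0):
        features = {
            "nll": list(train_nlls),
            "length": [len(l) for l in train_lines],
            "entropy": [char_entropy(l) for l in train_lines],
        }
        # (mean, std) per feature, estimated on the training corpus
        self.stats = {k: (mean(v), pstdev(v)) for k, v in features.items()}
        self.threshold = threshold

    def flag(self, line: str, nll: float) -> list:
        """Return the names of the features whose |z| exceeds the threshold."""
        observed = {"nll": nll, "length": len(line),
                    "entropy": char_entropy(line)}
        flagged = []
        for name, value in observed.items():
            mu, sigma = self.stats[name]
            z = (value - mu) / sigma if sigma > 0 else 0.0
            if abs(z) > self.threshold:
                flagged.append(name)
        return flagged


train = ["(1+2)*3", "(4-5)/6", "7*(8+9)", "(1+1)*2", "(12+34)*5", "2+3"]
nlls = [10.0, 11.0, 10.5, 9.5, 12.0, 6.0]  # invented NLL values
detector = MultiZDetector(train, nlls)
print(detector.flag("((((((", nll=10.0))
```

In this toy setup the line "((((((" has an unremarkable NLL and length, so only the entropy signal catches it, which is the failure mode the paragraph above describes.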

In v1.3.0: substrat diagnose command + fair MDL comparison protocol. Every baseline (n-gram, gzip, raw) now pays its own description length in bits, and the verdict is one of exact | lost-to-baseline | under-approx | over-approx | no-grammar. The JSON output feeds CI, SIEM, or regression suites. Full test suite runs in Docker in ~30s; release workflow now runs 104 tests on every tag.

In v1.2.4: adaptive anomaly threshold — substrat anomaly now tracks a sliding window of recent NLLs (default 500) and z-scores incoming lines against the rolling baseline. Legitimate drift (day/night cycles, firmware updates, seasonal shifts) no longer floods false positives; a dedicated drift event fires when the rolling mean crosses K × initial_std from the training baseline. Opt-out with --no-adaptive.
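A rolling baseline of this kind can be sketched as follows. The class, parameter names, one-sided z-test, and the choice to append every observation to the window are assumptions made for illustration; only the window size of 500 and the K × initial_std drift rule come from the description above.

```python
from collections import deque
from statistics import mean, pstdev


class AdaptiveThreshold:
    """Z-scores each NLL against a sliding window of recent NLLs."""

    def __init__(self, train_nlls, window=500, z_threshold=3.0, drift_k=3.0):
        self.base_mean = mean(train_nlls)    # training baseline
        self.base_std = pstdev(train_nlls)   # "initial_std"
        self.window = deque(train_nlls[-window:], maxlen=window)
        self.z_threshold = z_threshold
        self.drift_k = drift_k

    def observe(self, nll: float):
        """Return (is_anomaly, drift_event) for one incoming line."""
        mu, sigma = mean(self.window), pstdev(self.window)
        z = (nll - mu) / sigma if sigma > 0 else 0.0
        is_anomaly = z > self.z_threshold    # assumption: only high NLL counts
        self.window.append(nll)              # assumption: always update
        # Drift: rolling mean has moved K * initial_std away from training.
        drift = abs(mean(self.window) - self.base_mean) > self.drift_k * self.base_std
        return is_anomaly, drift


detector = AdaptiveThreshold([9.0, 11.0] * 100)
anomalies, drifts = 0, 0
for i in range(2000):
    # Slow upward drift on top of a stable +/-1 oscillation (synthetic data).
    nll = 10.0 + (1.0 if i % 2 else -1.0) + 0.005 * i
    is_anomaly, drift = detector.observe(nll)
    anomalies += is_anomaly
    drifts += drift
print(anomalies, drifts > 0)
```

With a fixed threshold taken from the training data, the later part of this synthetic stream would be flagged wholesale; with the rolling baseline, the drift is reported once as a drift condition and individual lines stay unflagged.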

In v1.2.3: nested TLV detection (constructed Types whose Value is itself a TLV chain — BACnet APDU, OPC-UA partials), TLV-aware fuzz corpus generator, TLV-aware Wireshark .lua dissector (loop-based, Expert Info for prefix / type / length / overrun).

In v1.2.2: TCP reassembly on by default (multi-segment HTTP/FTP/SSH), non-Ethernet pcap support (Linux SLL, raw IP, loopback), live capture (substrat sniff), JSON output on all analysis commands.

1. MODBUS/TCP — 5/5 injected anomalies detected

SCADA protocol with deliberately corrupted packets appended to a clean trace. Substrat discovers the grammar from the first 80 packets and flags every corrupted packet.

Download modbus_synth.pcap (7 KB)

$ ./substrat re modbus_synth.pcap --wireshark ./dissectors --fuzz ./fuzz

Pcap: 85 packets, 1 services (1 analyzable)

--- Flow: TCP|192.168.1.10:502 (85 packets, BINARY mode) ---
Coverage: 94%

Protocol grammar discovered:
  MSG -> MAGIC_0 DATA_1 FIXED_2 TYPE_3 DATA_4 FIXED_5 DATA_6
  MAGIC_0 -> 0x00
  FIXED_2 -> 0x00000006
  TYPE_3  -> 0x01 | 0x02 | 0x03
  FIXED_5 -> 0x00

Anomalies found: 5/85 packets
  #80..#84: FIXED_2 mismatch (got 0x00010006), FIXED_5 mismatch (got 0xde)

Fuzz corpus: 500 samples -> ./fuzz/TCP_192.168.1.10_502/
Wireshark dissector: ./dissectors/TCP_192.168.1.10_502.lua
Total CPU: 0.034s
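For readers who want to sanity-check the reported mismatches by hand, the following sketch applies a positional grammar to raw payload bytes. The field names and constant values are taken from the output above, but the byte offsets and widths are guesses (the output does not print them), so treat LAYOUT as a hypothetical layout rather than what Substrat actually inferred.

```python
# (name, offset, width in bytes, allowed values or None for free data).
# Offsets and widths are ASSUMED for illustration.
LAYOUT = [
    ("MAGIC_0", 0, 1, {0x00}),
    ("DATA_1", 1, 1, None),
    ("FIXED_2", 2, 4, {0x00000006}),
    ("TYPE_3", 6, 1, {0x01, 0x02, 0x03}),
    ("DATA_4", 7, 1, None),
    ("FIXED_5", 8, 1, {0x00}),
]


def check_packet(payload: bytes) -> list:
    """Return one issue string per field that violates the layout."""
    issues = []
    for name, offset, width, allowed in LAYOUT:
        chunk = payload[offset:offset + width]
        if len(chunk) < width:
            issues.append(f"{name} truncated")
            continue
        value = int.from_bytes(chunk, "big")
        if allowed is not None and value not in allowed:
            digits = 2 * width  # hex digits needed to print the full field
            issues.append(f"{name} mismatch (got 0x{value:0{digits}x})")
    return issues


clean = bytes.fromhex("000100000006010300000 00a".replace(" ", ""))
corrupt = bytes.fromhex("00500001000601 03deadbeef".replace(" ", ""))
print(check_packet(clean))
print(check_packet(corrupt))
```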

2. TLV-structured industrial — IEC-104 style (new in v1.2.2)

Synthetic pcap that mirrors the structure of IEC 60870-5-104: a 2-byte APCI-style prefix (0x68 0x04) followed by a chain of Type-Length-Value records — temperature (type 0x01, 2-byte value), humidity (type 0x03, 1-byte value), pressure (type 0x05, 4-byte value), and an optional battery reading (type 0x07, 1-byte value, present in about 40% of messages). Five anomalies are injected: two unknown types (0x99), two humidity fields with out-of-range length, one truncated chain.

The new TLV engine auto-detects the format, recovers the fixed prefix and the legitimate type vocabulary, and flags each anomaly class with an actionable reason.

Download tlv_synth.pcap (5.9 KB)

$ ./substrat re tlv_synth.pcap --json | jq '.flows[0]'

{
  "mode": "binary_tlv",
  "packets": 100,
  "coverage": 0.99,
  "tlv_format": {
    "prefix": "6804",
    "t_size": 1, "l_size": 1, "l_endian": "big",
    "types": [1, 3, 5, 7]
  },
  "grammar": [
    "MSG -> PREFIX TLV_CHAIN",
    "PREFIX -> 0x6804",
    "TLV_CHAIN -> TLV | TLV TLV_CHAIN",
    "TLV -> TYPE LENGTH VALUE",
    "TYPE -> 0x01 | 0x03 | 0x05 | 0x07",
    "LENGTH -> <uint8>",
    "VALUE -> <bytes[LENGTH]>"
  ],
  "anomalies": [
    { "index": 95, "issues": ["unknown TYPE 0x99"] },
    { "index": 96, "issues": ["unknown TYPE 0x99"] },
    { "index": 97, "issues": ["length out of range for TYPE 0x03: got 8, expected [1,1]"] },
    { "index": 98, "issues": ["length out of range for TYPE 0x03: got 8, expected [1,1]"] },
    { "index": 99, "issues": ["TLV chain does not parse"] }
  ]
}

Total CPU: 0.04s
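The recovered template is simple enough to re-implement as an independent cross-check. The sketch below walks a chain using the format reported in tlv_format (prefix 6804, 1-byte type, 1-byte length) and the per-type value sizes from the pcap description. The function name is hypothetical and the issue strings merely imitate the ones in the JSON above; this is not Substrat's detector.

```python
PREFIX = bytes.fromhex("6804")
# Expected value length per type, from the description of tlv_synth.pcap.
EXPECTED_LEN = {0x01: 2, 0x03: 1, 0x05: 4, 0x07: 1}


def check_tlv_message(payload: bytes) -> list:
    """Walk PREFIX + TLV chain and collect template violations."""
    if not payload.startswith(PREFIX):
        return ["prefix mismatch"]
    issues = []
    pos = len(PREFIX)
    while pos < len(payload):
        # Need one type byte and one length byte.
        if pos + 2 > len(payload):
            return issues + ["TLV chain does not parse"]
        rec_type, length = payload[pos], payload[pos + 1]
        pos += 2
        # The declared value must fit inside the packet.
        if pos + length > len(payload):
            return issues + ["TLV chain does not parse"]
        if rec_type not in EXPECTED_LEN:
            issues.append(f"unknown TYPE 0x{rec_type:02x}")
        elif length != EXPECTED_LEN[rec_type]:
            want = EXPECTED_LEN[rec_type]
            issues.append(
                f"length out of range for TYPE 0x{rec_type:02x}: "
                f"got {length}, expected [{want},{want}]"
            )
        pos += length
    return issues


clean = bytes.fromhex("6804" "010200fa" "03012d" "05040001869f")
print(check_tlv_message(clean))
print(check_tlv_message(bytes.fromhex("6804" "990100")))
print(check_tlv_message(bytes.fromhex("6804" "0308" "0011223344556677")))
print(check_tlv_message(bytes.fromhex("6804" "010200")))
```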

3. Nested TLV — constructed records (new in v1.2.3)

Some real protocols carry records whose Value is itself a TLV chain: BACnet APDU, OPC-UA partial encodings, ASN.1 short-form. Substrat's nested detector runs after the flat template is built, probes every observed Type, and recursively discovers a sub-template when a Type's Value parses cleanly as a chain in the same (T, L, endian) format.

The synthetic pcap has an outer prefix 0x53, a sensor_id field (type 0x01), and a constructed type 0x10 whose Value chains three inner readings: temperature (0x20), humidity (0x22), and pressure (0x24). Five anomalies are injected: three at the outer level (unknown type 0xcc) and two at the inner level (unknown inner type 0xff, with the outer length left intact).

Download nested_tlv_synth.pcap (6.0 KB)

$ ./substrat re nested_tlv_synth.pcap --json | jq '.flows[0]'

{
  "mode": "binary_tlv",
  "packets": 95,
  "coverage": 1.0,
  "tlv_format": {
    "prefix": "53", "t_size": 1, "l_size": 1, "l_endian": "big",
    "types": [1, 16],
    "nested_types": [16]
  },
  "grammar": [
    "MSG -> PREFIX TLV_CHAIN",    "PREFIX -> 0x53",
    "TYPE -> 0x01 | 0x10",
    "VALUE -> <bytes[LENGTH]>   # primitive types",
    "VALUE (when TYPE=0x10) -> T10_MSG   # nested TLV",
    "# nested grammar for TYPE=0x10",
    "T10_TYPE -> 0x20 | 0x22 | 0x24"
  ],
  "anomalies": [
    { "index": 90, "issues": ["unknown TYPE 0xcc"] },
    { "index": 93, "issues": ["TYPE=0x10→unknown TYPE 0xff"] }
  ]
}

Total CPU: 0.04s

Notice the recursive anomaly path TYPE=0x10→unknown TYPE 0xff — the operator sees exactly which nested record violated the template. The fuzz corpus and Wireshark .lua dissector are now also generated for TLV flows (--fuzz / --wireshark flags), with a loop-based dissector that walks the chain at runtime and emits Expert Info warnings for prefix mismatch, unknown types, out-of-range lengths, and packet overruns.
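The recursive anomaly path falls out naturally from a recursive walk. The following sketch uses the type vocabularies shown in the JSON above (outer 0x01 and 0x10, inner 0x20, 0x22, 0x24) and carries a path prefix down into constructed values. Function names and the sample payloads are invented for illustration, and the real detector may work differently.

```python
OUTER_TYPES = {0x01, 0x10}
# Constructed type -> vocabulary of the TLV chain carried in its Value.
NESTED = {0x10: {0x20, 0x22, 0x24}}


def walk(chain: bytes, known: set, path: str = "") -> list:
    """Walk a 1-byte-type, 1-byte-length TLV chain, recursing into
    constructed types and prefixing issues with the path taken."""
    issues = []
    pos = 0
    while pos < len(chain):
        if pos + 2 > len(chain) or pos + 2 + chain[pos + 1] > len(chain):
            issues.append(f"{path}TLV chain does not parse")
            break
        rec_type, length = chain[pos], chain[pos + 1]
        value = chain[pos + 2:pos + 2 + length]
        if rec_type not in known:
            issues.append(f"{path}unknown TYPE 0x{rec_type:02x}")
        elif rec_type in NESTED:
            issues += walk(value, NESTED[rec_type],
                           f"{path}TYPE=0x{rec_type:02x}→")
        pos += 2 + length
    return issues


def check_nested_message(payload: bytes) -> list:
    if payload[:1] != b"\x53":
        return ["prefix mismatch"]
    return walk(payload[1:], OUTER_TYPES)


inner_ok = "200200fa" "22012d" "24040001869f"    # 13 bytes = 0x0d
inner_bad = "200200fa" "ff012d" "24040001869f"   # inner type 0x22 replaced by 0xff
print(check_nested_message(bytes.fromhex("53" "010107" "100d" + inner_ok)))
print(check_nested_message(bytes.fromhex("53" "010107" "100d" + inner_bad)))
print(check_nested_message(bytes.fromhex("53" "cc0100")))
```

Because the corrupted inner record keeps its declared length, the outer chain still parses; only the recursion into the Value of type 0x10 exposes the problem.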

4. DNS — real capture, 3 flows auto-separated

Real DNS traffic. Substrat identifies three distinct flows (queries to different resolvers) and discovers a grammar per flow. The Transaction ID, Flags, Questions count, and Authority fields are recovered separately.

Download dns.pcap (4 KB)

$ ./substrat re dns.pcap

Pcap: 38 packets, 2 services (2 analyzable)

--- Flow: UDP|192.168.170.20:53 [binary] (16 packets) ---
Coverage: 94%
Anomalies found: 2/16 packets

--- Flow: UDP|192.168.170.20:53 [text] (12 packets) ---
Coverage: 100%
Anomalies found: 1/12 packets

--- Flow: UDP|217.13.4.24:53 (10 packets) ---
Coverage: 100%
Anomalies found: 4/10 packets

Total CPU: 0.017s
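To compare the recovered fields against ground truth, the fixed DNS header can be decoded directly. This sketch relies only on the standard header layout from RFC 1035 (six big-endian 16-bit fields); the function name, the dictionary keys, and the sample bytes are illustrative and are not taken from dns.pcap.

```python
import struct


def parse_dns_header(payload: bytes) -> dict:
    """Decode the fixed 12-byte DNS header (RFC 1035, section 4.1.1)."""
    if len(payload) < 12:
        raise ValueError("a DNS header needs at least 12 bytes")
    txid, flags, qdcount, ancount, nscount, arcount = struct.unpack(
        "!6H", payload[:12]
    )
    return {
        "transaction_id": txid,
        "flags": flags,
        "is_response": bool(flags & 0x8000),  # QR bit
        "questions": qdcount,
        "answers": ancount,
        "authority": nscount,
        "additional": arcount,
    }


# Hand-built query header: ID 0x1a2b, RD flag set, one question.
sample = bytes.fromhex("1a2b01000001000000000000")
print(parse_dns_header(sample))
```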

5. Telnet — auto-split binary commands from text data

Real Telnet session. Substrat detects the IAC escape prefix 0xff and splits binary command packets from text data packets in the same TCP flow, then builds a grammar for each.

Download telnet.pcap (24 KB)

$ ./substrat re telnet.pcap

--- Flow: TCP|192.168.0.1:23<>192.168.0.2:1254 [binary] (19 packets) ---
Grammar: MSG -> DELIM_0xff SEGMENT
Anomalies found: 2/19 packets

--- Flow: TCP|192.168.0.1:23<>192.168.0.2:1254 [text] (117 packets) ---
Strategy: template, Tokens: []
Parse failures: 0/39

Total CPU: 0.30s
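The split itself can be approximated with a one-line rule. In the sketch below a payload is treated as a command packet when it starts with the Telnet IAC byte 0xff (RFC 854). That first-byte test is an assumption about how such a split might work, not a description of Substrat's detection logic, and the payloads are made up.

```python
IAC = 0xFF  # Telnet "Interpret As Command" byte (RFC 854)


def split_by_iac(payloads):
    """Partition payloads into (binary command packets, text packets)."""
    binary, text = [], []
    for payload in payloads:
        if payload[:1] == bytes([IAC]):
            binary.append(payload)
        else:
            text.append(payload)
    return binary, text


payloads = [
    b"\xff\xfd\x18",   # IAC DO TERMINAL-TYPE
    b"login: ",
    b"\xff\xfb\x01",   # IAC WILL ECHO
    b"ls -la\r\n",
]
binary, text = split_by_iac(payloads)
print(len(binary), len(text))
```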

6. HTTP / FTP / TFTP — text-mode tokenization

For text protocols, Substrat discovers delimiter tokens (CRLF, status codes, keywords) and builds a slot-based grammar separating requests from responses.

Download: http_synth.pcap, tftp.pcap, nb6.pcap

$ ./substrat re http_synth.pcap

--- Flow: TCP|10.0.0.1:80<>10.0.0.2:54321 (30 packets, TEXT mode) ---
Strategy: template
Tokens discovered: ['HTTP', '1.1', '200', 'OK\r\nContent-Type:', 'application']
Parse failures: 0/10
bpc: 1.4113

Total CPU: 0.18s

7. Live capture throughput (v1.2.2+)

Measured with the docker/stress harness bundled in the Substrat source tree: one MQTT broker, 10 synthetic sensors pushing 50 messages per second each (5% injected anomaly rate), and one sniffer running the pipeline that underlies substrat sniff. Everything runs in containers on a 2024 MacBook Pro.

$ cd docker/stress && docker compose up -d
$ docker logs -f stress-sniffer-1

LEARNING from 200 messages...
  Strategy : chain
  bpc      : 3.2334
  CPU      : 0.005s
  MONITOR phase starting...

[MONITOR] 2,324 msgs | 464 msg/s | latency avg=0.4ms p99=1.6ms | anomalies=118 (5.1%)
[MONITOR] 4,576 msgs | 457 msg/s | latency avg=0.5ms p99=2.1ms | anomalies=248 (5.4%)
[MONITOR] 6,832 msgs | 455 msg/s | latency avg=0.4ms p99=1.6ms | anomalies=378 (5.5%)
[MONITOR] 9,095 msgs | 454 msg/s | latency avg=0.4ms p99=1.9ms | anomalies=492 (5.4%)

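Sustained throughput sits around 460 msg/s, against a nominal offered load of 500 msg/s (10 sensors × 50 msg/s), with a p99 per-message latency of roughly 2 ms (1.6 to 2.1 ms in the samples above). The measured anomaly rate (about 5.4%) is close to the injected rate (5%), which suggests the detector is reasonably well calibrated.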
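The summary figures can be recomputed from the four [MONITOR] lines shown above. This sketch parses them with a regular expression written for exactly that line format; it is a checking aid, and the pattern would need adjusting if the log format differs.

```python
import re

LOG = """\
[MONITOR] 2,324 msgs | 464 msg/s | latency avg=0.4ms p99=1.6ms | anomalies=118 (5.1%)
[MONITOR] 4,576 msgs | 457 msg/s | latency avg=0.5ms p99=2.1ms | anomalies=248 (5.4%)
[MONITOR] 6,832 msgs | 455 msg/s | latency avg=0.4ms p99=1.6ms | anomalies=378 (5.5%)
[MONITOR] 9,095 msgs | 454 msg/s | latency avg=0.4ms p99=1.9ms | anomalies=492 (5.4%)
"""

PATTERN = re.compile(
    r"\[MONITOR\] ([\d,]+) msgs \| (\d+) msg/s \| "
    r"latency avg=([\d.]+)ms p99=([\d.]+)ms \| anomalies=(\d+)"
)

rows = [
    {
        "msgs": int(m.group(1).replace(",", "")),
        "rate": int(m.group(2)),
        "p99_ms": float(m.group(4)),
        "anomalies": int(m.group(5)),
    }
    for m in PATTERN.finditer(LOG)
]

offered_load = 10 * 50                                   # sensors x msg/s each
mean_rate = sum(r["rate"] for r in rows) / len(rows)     # msg/s
worst_p99 = max(r["p99_ms"] for r in rows)               # ms
final_anomaly_pct = 100 * rows[-1]["anomalies"] / rows[-1]["msgs"]

print(offered_load, mean_rate, worst_p99, round(final_anomaly_pct, 1))
```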

For a real-world deployment use substrat sniff -i <iface> --filter '<BPF>' --learn-packets 500 --json; the same LEARN-then-MONITOR pipeline runs against live traffic on a network interface instead of MQTT.

Hardware & notes

Measured on a MacBook Pro (Apple Silicon). Substrat runs a Rust core; no GPU, no cloud API, no training data, nothing leaves your machine.

Numbers will vary with CPU. Expect sub-second analysis on any modern laptop for pcaps of up to a few hundred packets. Larger captures scale roughly linearly with packet count in binary mode and with corpus size in text mode.

Found a pcap where Substrat does something unexpected? Send it to us — every edge case improves the tool.