
Substrat Documentation

Everything you need to use Substrat.

Table of contents

  1. Installation
  2. Quick start
  3. substrat re — Protocol reverse-engineering
  4. substrat learn — Grammar discovery
  5. substrat anomaly — Anomaly detection
  6. substrat compress — Compression
  7. substrat generate — Data generation
  8. Wireshark integration
  9. Using the fuzz corpus
  10. Supported protocols

1. Installation

Download the binary for your platform from the download page.

# macOS: remove quarantine flag
xattr -d com.apple.quarantine substrat-macos-arm64

# Make executable
chmod +x substrat-macos-arm64

# Verify
./substrat-macos-arm64 info

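You should see version information and the list of available commands printed to the terminal. Later sections shorten the binary name to ./substrat; substitute the name of the file you downloaded.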

2. Quick start

The most common use case: analyze a network capture file.

# Analyze a pcap — get grammar + anomalies + Wireshark plugin + fuzz corpus
./substrat-macos-arm64 re capture.pcap --wireshark ./dissectors --fuzz ./fuzz

That's it. Substrat reads the pcap, discovers the protocol structure, flags anomalies, and generates a Wireshark dissector plus a fuzz corpus of 500 test packets.

3. substrat re — Protocol reverse-engineering

The main command for penetration testing and forensics.

# Basic: grammar + anomalies
./substrat re capture.pcap

# With Wireshark dissector export
./substrat re capture.pcap --wireshark ./dissectors

# With fuzz corpus generation
./substrat re capture.pcap --fuzz ./fuzz_output

# Everything together, 1000 fuzz samples
./substrat re capture.pcap --wireshark ./dissectors --fuzz ./fuzz --fuzz-count 1000

# Limit to 50000 packets (for very large pcaps)
./substrat re capture.pcap --max-packets 50000

What it does, step by step

  1. Reads the pcap (pcap or pcapng, TCP/UDP/ICMP, IPv4/IPv6)
  2. Groups packets by flow or by service port (auto-selected)
  3. Detects the mode for each group:
    • Text (HTTP, FTP, SMTP...): token discovery + template matching
    • Binary (DNS, MODBUS, proprietary...): magic bytes, length fields, type fields, variable zones
    • Mixed (Telnet...): automatic binary/text split within the same flow
  4. Produces the grammar of the protocol
  5. Detects anomalies (corrupted magic bytes, unknown types, truncated packets)
  6. Generates a fuzz corpus (3 strategies: field_flip, boundary, structural)
  7. Exports a Wireshark dissector (.lua) for each binary flow
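
Step 3 does not spell out how the mode is chosen. As a rough way to preview how a payload file might be classified, the sketch below counts non-printable bytes. The helper name guess_mode and the 10% threshold are assumptions made for illustration; this is not Substrat's actual detection logic.

```shell
# Hypothetical helper, not part of Substrat: guess whether a payload file
# looks like text or binary from its share of non-printable bytes.
guess_mode() {  # guess_mode FILE
  total=$(wc -c < "$1" | tr -d ' ')
  [ "$total" -eq 0 ] && { echo "empty"; return; }
  # Count bytes that are neither printable ASCII nor whitespace.
  odd=$(LC_ALL=C tr -d '[:print:][:space:]' < "$1" | wc -c | tr -d ' ')
  # Assumed threshold: more than 10% non-printable bytes means binary.
  if [ $((odd * 10)) -gt "$total" ]; then
    echo "BINARY"
  else
    echo "TEXT"
  fi
}
```

A Mixed flow such as Telnet would need a per-segment decision, which this single-file check does not attempt.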

Example output (MODBUS/TCP)

Pcap: 85 packets, 1 services (1 analyzable)

--- Flow: TCP|192.168.1.10:502 (85 packets, BINARY mode) ---
Coverage: 94%

Protocol grammar discovered:
  MSG -> MAGIC_0 DATA_1 FIXED_2 TYPE_3 DATA_4 FIXED_5 DATA_6
  MAGIC_0 -> 0x00
  DATA_1 -> <bytes[1]>
  FIXED_2 -> 0x00000006
  TYPE_3 -> 0x01 | 0x02 | 0x03
  DATA_4 -> <bytes[1]>
  FIXED_5 -> 0x00
  DATA_6 -> <bytes[3]>

Anomalies found: 5/85 packets
  #80: FIXED_2 mismatch: expected 0x00000006, got 0x00010006
  #81: FIXED_5 mismatch: expected 0x00, got 0xde

Wireshark dissector: ./dissectors/TCP_192.168.1.10_502.lua
Fuzz corpus: 500 samples -> ./fuzz/TCP_192.168.1.10_502/

Total CPU: 0.03s
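
To make the grammar above concrete, here is a minimal sketch that checks one raw packet file against its three constant fields. The byte offsets are an assumption derived from the field widths shown (MAGIC_0 and DATA_1 one byte each, FIXED_2 four bytes, and so on); hex_at and check_packet are hypothetical helper names, not Substrat commands.

```shell
# Sketch: test one raw packet against the constant fields of the grammar.
# Offsets are assumed from the field widths in the example output.
hex_at() {  # hex_at FILE OFFSET LENGTH -> lowercase hex string
  od -An -tx1 -j"$2" -N"$3" "$1" | tr -d ' \n'
}

check_packet() {  # check_packet FILE
  [ "$(hex_at "$1" 0 1)" = "00" ]       || { echo "MAGIC_0 mismatch"; return 1; }
  [ "$(hex_at "$1" 2 4)" = "00000006" ] || { echo "FIXED_2 mismatch"; return 1; }
  [ "$(hex_at "$1" 8 1)" = "00" ]       || { echo "FIXED_5 mismatch"; return 1; }
  echo "ok"
}
```

A packet whose bytes 2 to 5 read 00 01 00 06 would be reported as a FIXED_2 mismatch, the same kind of deviation as anomaly #80 in the output.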

4. substrat learn — Grammar discovery

Discover the grammar of any structured text file (one sample per line).

./substrat learn data.txt
Corpus: 100 lines, alphabet=6 chars
Strategy: wrap
Grammar (3 productions):
  S->(S)
  S->()
  S->SS
bpc: 1.6030
Parse failures: 0/34
CPU: 0.04s
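
The three productions shown describe non-empty strings of balanced parentheses, so membership can be tested with a simple depth counter. The sketch below illustrates what the learned grammar accepts; in_language is a hypothetical helper, and it covers only the productions printed above, not whatever else the corpus may contain.

```shell
# Sketch: accept exactly the strings derivable from S->(S), S->(), S->SS,
# i.e. non-empty balanced parentheses.
in_language() {  # in_language STRING; exit status 0 if accepted
  printf '%s' "$1" | awk '{
    depth = 0
    for (i = 1; i <= length($0); i++) {
      c = substr($0, i, 1)
      if (c == "(") depth++
      else if (c == ")") { if (--depth < 0) { bad = 1; break } }
      else { bad = 1; break }          # any other character is rejected
    }
    ok = (!bad && depth == 0 && length($0) > 0)
  } END { exit (ok ? 0 : 1) }'
}

in_language "(()())" && echo "accepted"
in_language "(()"    || echo "rejected"
```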

5. substrat anomaly — Anomaly detection

# Auto split: train on 2/3, test on 1/3
./substrat anomaly data.txt

# Separate train and test files
./substrat anomaly test.txt --train train.txt
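
If you want to control the split yourself, the sketch below produces a 2/3 train and 1/3 test file by hand. It assumes a plain head/tail split in file order; Substrat's own auto split may select lines differently. split_corpus is a hypothetical helper name.

```shell
# Sketch: first two thirds of the lines go to TRAIN_OUT, the rest to TEST_OUT.
split_corpus() {  # split_corpus INPUT TRAIN_OUT TEST_OUT
  n=$(wc -l < "$1" | tr -d ' ')
  k=$((n * 2 / 3))                     # integer division: 100 lines -> 66
  head -n "$k" "$1" > "$2"
  tail -n "+$((k + 1))" "$1" > "$3"
}

# Usage, followed by the documented two-file form of the command:
# split_corpus data.txt train.txt test.txt
# ./substrat anomaly test.txt --train train.txt
```

For a 100-line file this yields 66 training lines and 34 test lines.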

Two types of anomalies:

6. substrat compress — Compression

./substrat compress data.txt
Raw: 2720 bits (340 chars x 8)
Compressed: 545 bits (1.6030 bpc)
Ratio: 20.0%
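
The three figures are related by simple arithmetic: raw size is 8 bits per character, compressed size is the character count times the bits-per-character (bpc) value, and the ratio is compressed over raw. A small sketch that recomputes them (compression_stats is a hypothetical helper, not a Substrat command):

```shell
# Sketch: recompute the compress output from a character count and a bpc value.
compression_stats() {  # compression_stats CHARS BPC
  awk -v chars="$1" -v bpc="$2" 'BEGIN {
    raw  = chars * 8
    comp = chars * bpc
    printf "Raw: %d bits\n", raw
    printf "Compressed: %.0f bits\n", comp
    printf "Ratio: %.1f%%\n", 100 * comp / raw
  }'
}

compression_stats 340 1.6030
```

Because the raw size is always 8 bits per character, the ratio is simply bpc / 8.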

7. substrat generate — Data generation

./substrat generate data.txt --n 20

Learns the grammar from data.txt and generates N valid samples (20 in the example above) with controlled depth.

8. Wireshark integration

Install the dissector

# macOS
cp ./dissectors/*.lua ~/.config/wireshark/plugins/

# Linux
cp ./dissectors/*.lua ~/.local/lib/wireshark/plugins/

# Windows
copy dissectors\*.lua %APPDATA%\Wireshark\plugins\

Then reload the Lua plugins in Wireshark with Ctrl+Shift+L (Analyze > Reload Lua Plugins), or restart Wireshark.
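
The copy commands fail if the plugin directory does not exist yet. The sketch below wraps the macOS and Linux paths listed above in a small helper that creates the directory first. plugin_dir and install_dissectors are hypothetical names, and the paths are taken as given from this page; your Wireshark version may use a different folder.

```shell
# Sketch: pick the plugin directory from the paths on this page by OS name.
plugin_dir() {  # plugin_dir OS_NAME, e.g. plugin_dir "$(uname -s)"
  case "$1" in
    Darwin) echo "$HOME/.config/wireshark/plugins" ;;
    Linux)  echo "$HOME/.local/lib/wireshark/plugins" ;;
    *)      echo "unsupported OS: $1" >&2; return 1 ;;
  esac
}

install_dissectors() {  # install_dissectors SOURCE_DIR
  dest=$(plugin_dir "$(uname -s)") || return 1
  mkdir -p "$dest"            # create the directory if it is missing
  cp "$1"/*.lua "$dest"/
}

# install_dissectors ./dissectors
```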

What the dissector does

9. Using the fuzz corpus

The .bin files in the fuzz directory are raw binary packets, ready to send:

# With ncat
for f in ./fuzz/TCP_192.168.1.10_502/*.bin; do
  ncat 192.168.1.10 502 < "$f"
done
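
Before sending anything to a live target, it can help to look at what the corpus contains. The sketch below lists each sample with its size in bytes and its first eight bytes in hex. summarize_corpus is a hypothetical helper name, not a Substrat command.

```shell
# Sketch: print "name size leading-bytes" for every .bin file in a directory.
summarize_corpus() {  # summarize_corpus DIR
  for f in "$1"/*.bin; do
    [ -e "$f" ] || continue            # directory has no .bin files
    size=$(wc -c < "$f" | tr -d ' ')
    head_hex=$(od -An -tx1 -N8 "$f" | tr -d ' \n')
    printf '%s %s %s\n' "$(basename "$f")" "$size" "$head_hex"
  done
}

# summarize_corpus ./fuzz/TCP_192.168.1.10_502
```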

Three mutation strategies are used: field_flip, boundary, and structural.

10. Supported protocols

Protocol     Type             Result
DNS          Binary (real)    Transaction ID, Flags, Questions, Authority detected
MODBUS/TCP   Binary (SCADA)   Proto ID, Unit ID, Length. 5/5 anomalies detected
Telnet       Mixed (real)     Auto-split IAC binary + text commands
FTP          Text (real)      Token \r\n, FTP commands, 0 parse failures
HTTP         Text             Tokens HTTP, GET. Requests/responses separated
DHCP         Binary (real)    4 packets (too few to analyze)

Substrat works on any protocol — not just the ones listed above. If your protocol has a repeating structure, Substrat will find it. If you find a protocol that doesn't work well, send us the pcap.