MHR Simulations & Real-World Dataset

This folder contains the routing simulations of the MHR fork and the real CoreScope dataset used to calibrate them — so that anyone can reproduce the results or build their own analyses.

Prerequisites

pip install numpy networkx matplotlib
# to decompress the raw data: xz (macOS: brew install xz, Debian/Ubuntu: apt install xz-utils)

Dataset (data/)

| File | Contents | Size |
|---|---|---|
| data/packets.jsonl.xz | complete raw data: 109,980 real packets (1 JSON object per line), each with path_json (hop chain), snr, rssi, observer_id, route_type, payload_type, timestamp | 16.6 MB (188 MB decompressed) |
| data/nodes.json | 1,962 nodes: public_key, lat/lon, role, advert_count, relay_count_*, usefulness_score, … | 1.3 MB |
| data/observers.json | 38 observers (receiving nodes) | 24 KB |
| data/topology_edges.json | observed relay edges derived from the paths (aggregated) | 0.7 MB |
| data/neighbor_graph.json | best link model: server-resolved real neighbor topology (1,034 nodes, 1,956 edges) with full pubkeys, avg_snr per edge and ambiguous flag (only 8.8% ambiguous) | 0.6 MB |
| data/hash_sizes.json | hash-size distribution (1/2/3-byte) in the network | 0.5 MB |
| data/snr_calibration.json, data/real_detour_stats.json | derived metrics (SNR fit, real detour statistics) | small |
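
As a starting point for working with data/neighbor_graph.json, the sketch below turns an edge list into an adjacency map and computes the share of ambiguous edges. Only avg_snr and the ambiguous flag are named in this README; the top-level "edges" key and the "source"/"target" field names are assumptions made for illustration, so check them against the actual file before relying on this.

```python
import json
from collections import defaultdict


def load_graph(path):
    """Read a neighbor-graph JSON file from disk."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def build_adjacency(graph, drop_ambiguous=True):
    """Turn an edge list into {node: {neighbor: avg_snr}}.

    Assumed layout (hypothetical): {"edges": [{"source": ..., "target": ...,
    "avg_snr": ..., "ambiguous": ...}, ...]}.
    """
    adjacency = defaultdict(dict)
    for edge in graph["edges"]:
        if drop_ambiguous and edge.get("ambiguous"):
            continue  # skip edges that were not resolved uniquely
        a, b = edge["source"], edge["target"]
        adjacency[a][b] = edge["avg_snr"]
        adjacency[b][a] = edge["avg_snr"]  # treat links as undirected
    return dict(adjacency)


def ambiguous_share(graph):
    """Fraction of edges flagged as ambiguous (0.0 for an empty edge list)."""
    edges = graph["edges"]
    if not edges:
        return 0.0
    return sum(1 for edge in edges if edge.get("ambiguous")) / len(edges)


# Tiny inline sample so the sketch runs without the real file.
sample = {"edges": [
    {"source": "aa11", "target": "bb22", "avg_snr": 7.5, "ambiguous": False},
    {"source": "bb22", "target": "cc33", "avg_snr": -3.0, "ambiguous": True},
]}
adjacency = build_adjacency(sample)
share = ambiguous_share(sample)
```

With the real file, `build_adjacency(load_graph("data/neighbor_graph.json"))` would be the equivalent call, provided the assumed field names match.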

Richer CoreScope Endpoints (for more accurate / new simulations)

The CoreScope API offers more than the basic endpoints used initially. Particularly useful for a real link model:

  • /api/analytics/neighbor-graph: ready-made real link topology with avg_snr per edge and an ambiguous flag (source of data/neighbor_graph.json). Using it in place of the geometric v3 model gives real measured edges, which is significantly more accurate.
  • /api/packets?expand=observations: returns ALL observer receptions per packet (each with its own snr/rssi) instead of just one measurement, giving much denser link-quality data.
  • /api/resolve-hops (POST) and /api/paths/inspect (POST): server-side hash→node resolution (bypasses the 1-byte hash ambiguity).
  • /api/nodes/{pubkey}, /api/analytics/hash-sizes, and time pagination (?before=/?since=).
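
A minimal sketch of how these endpoints could be queried with the standard library. The endpoint paths and the expand=observations parameter come from the list above; the helper names, the User-Agent value, and the assumption that responses are plain JSON are illustrative and not taken from the repo's collector.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "https://corescope.meshrheinland.de"  # host of the public CoreScope API


def build_url(endpoint, **params):
    """Join base URL, endpoint path and optional query parameters."""
    url = BASE_URL + endpoint
    if params:
        url += "?" + urllib.parse.urlencode(params)
    return url


def fetch_json(url, timeout=30):
    """GET a URL and decode the JSON body (performs a real network request)."""
    # The header value is an assumption; any browser-like string may do.
    request = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


neighbor_graph_url = build_url("/api/analytics/neighbor-graph")
observations_url = build_url("/api/packets", expand="observations")
# graph = fetch_json(neighbor_graph_url)  # uncomment to query the live API
```

Keeping URL construction separate from the network call makes it easy to check the request URLs without touching the live service.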

Source: the public CoreScope API at https://corescope.meshrheinland.de/api/{nodes,observers,packets} (MeshCore Rheinland). The dataset is a snapshot taken on 2026-05-30. The raw data is deliberately shipped as .xz: xz compresses the JSON Lines file about 11× (188 MB → 16.6 MB), which keeps it below GitHub's 100 MB per-file limit.

Quick Start — Reproducing the v3 Real-Data Simulation from the Repo

The scripts expect the raw data under /tmp/cs_data/. One-time preparation:

mkdir -p /tmp/cs_data
xz -dkc data/packets.jsonl.xz > /tmp/cs_data/packets.jsonl     # decompress (~188 MB)
cp data/nodes.json data/observers.json /tmp/cs_data/
python3 mhr_sim_real_v3.py        # → sim_results_v3.json, data/*, fig_v3_*.png
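
On systems without the xz command-line tool, the preparation steps can also be done in Python. This is a sketch of an equivalent using only the standard library; the function name prepare_data is hypothetical and not part of the repo.

```python
import lzma
import shutil
from pathlib import Path


def prepare_data(repo_data_dir, target_dir):
    """Decompress packets.jsonl.xz and copy the node/observer files."""
    repo_data_dir, target_dir = Path(repo_data_dir), Path(target_dir)
    target_dir.mkdir(parents=True, exist_ok=True)
    # Stream-decompress in chunks so the ~188 MB file never sits in memory.
    with lzma.open(repo_data_dir / "packets.jsonl.xz", "rb") as src, \
            open(target_dir / "packets.jsonl", "wb") as dst:
        shutil.copyfileobj(src, dst)
    for name in ("nodes.json", "observers.json"):
        shutil.copy(repo_data_dir / name, target_dir / name)
    return target_dir
```

Calling `prepare_data("data", "/tmp/cs_data")` from the simulation folder should leave the same three files in place as the shell commands above.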

Fetching Completely Fresh Data (instead of the repo snapshot)

python3 mhr_collect_corescope.py            # downloads current data to /tmp/cs_data/ (paginated ~110k packets)
python3 mhr_sim_real_v3.py

The collector uses urllib with a browser-style User-Agent header and offset-based pagination (see the header comment at the top of mhr_collect_corescope.py).
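
The general shape of offset pagination can be sketched as follows. This is not the collector's actual code: fetch_page is a hypothetical callable, and the page size and stop condition (a short page means the end was reached) are assumptions about how such an API typically behaves.

```python
def collect_all(fetch_page, page_size=1000, max_items=None):
    """Collect items via offset pagination until a short or empty page arrives.

    fetch_page(offset, limit) is a hypothetical callable returning a list;
    in a real collector it would wrap an HTTP request to the packets endpoint.
    """
    items = []
    offset = 0
    while max_items is None or len(items) < max_items:
        page = fetch_page(offset, page_size)
        items.extend(page)
        if len(page) < page_size:
            break  # last page reached
        offset += page_size
    return items if max_items is None else items[:max_items]


# Stand-in for the API: 2,500 fake packets served in slices.
fake_packets = [{"id": i} for i in range(2500)]
requested_offsets = []


def fake_fetch(offset, limit):
    requested_offsets.append(offset)
    return fake_packets[offset:offset + limit]


collected = collect_all(fake_fetch, page_size=1000)
```

Injecting the fetch function keeps the loop testable offline; the fake fetcher here is asked for offsets 0, 1000 and 2000 before the short third page ends the loop.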

Overview of the Simulations

| Script | What | Data basis |
|---|---|---|
| mhr_sim.py | v1: synthetic Rhineland topology, MeshCore flood vs. MHR | synthetic (illustrative; the assumed SNR model is refuted by the real data) |
| mhr_sim_real.py | v1 on 25 real nodes | small, assumed link model |
| mhr_sim_v2.py | stress scenarios (churn, link failure, partition) | real 25-node topology |
| mhr_sim_real_v3.py | v3: calibrated on the full real dataset; measures real topology, SNR fit, real detours (median 2.1×), simulates baseline vs. MHR | data/ + raw packets |
| ../study/study_sim.py | mechanism × adoption sweep (1 node → all), safety invariant | real 776-node topology |

All simulations are reproducible with seed 42. Results are written next to the scripts as *_results*.json and fig_*.png.
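
How the scripts apply the seed is not shown in this README. One common pattern, sketched here with a toy stand-in for a simulation run, is to give each run its own seeded generator so that results do not depend on global random state. The function run_trial and its hop-count range are purely illustrative.

```python
import random


def run_trial(seed, n_packets=5):
    """Toy stand-in for a simulation run: draw per-packet hop counts."""
    rng = random.Random(seed)  # private generator, independent of global state
    return [rng.randint(1, 8) for _ in range(n_packets)]


first = run_trial(42)
second = run_trial(42)  # same seed, so the same sequence of draws
```

For NumPy-based code the analogous construct would be `numpy.random.default_rng(42)`.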

Building Your Own Analyses

  • Stream packets (never load them all into RAM): for line in open('/tmp/cs_data/packets.jsonl'): p = json.loads(line). To read directly from the .xz without decompressing to disk, iterate the same way (after import lzma, json): for line in lzma.open('data/packets.jsonl.xz', 'rt'): p = json.loads(line).
  • Node→hash mapping: a node's MeshCore hash is the first hash_size bytes of its public_key, i.e. the first 2 × hash_size hex characters. 2-byte hashes are ~99% unique; 1-byte hashes collide heavily. For 1-byte hashes, disambiguate by geography or count the hop as ambiguous (do not guess).
  • path_json is the real hop chain of a flood packet; each pair of consecutive hashes is one real edge.
  • Methodology and limitations are documented in MeshCore_Simulation_v3_Realdaten.md and ../study/MeshCore_Routing_Study.md.
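
The hash mapping and edge extraction described in the list above can be sketched as follows. Several details are assumptions: path_json is treated as a JSON-encoded list of hex hash strings, all hops in the sample use the same hash_size, and the public keys are made-up short values. Hops that match no node or more than one node are counted rather than guessed.

```python
import json
from collections import Counter, defaultdict


def build_hash_index(public_keys, hash_size):
    """Map each hex hash prefix to all public keys that share it."""
    index = defaultdict(list)
    for key in public_keys:
        prefix = key[:2 * hash_size].lower()  # 1 byte = 2 hex characters
        index[prefix].append(key)
    return index


def resolve(index, hop_hash):
    """Return the unique public key for a hash, or None if unknown or ambiguous."""
    candidates = index.get(hop_hash.lower(), [])
    return candidates[0] if len(candidates) == 1 else None


def edges_from_path(path_json):
    """Return consecutive hop pairs; accepts a JSON string or a decoded list."""
    hops = json.loads(path_json) if isinstance(path_json, str) else path_json
    return list(zip(hops, hops[1:]))


# Made-up keys: two of them share the 1-byte prefix "a1".
keys = ["a1b2c3d4", "a1ff0000", "7e010203", "c4050607"]
index = build_hash_index(keys, hash_size=1)

packets = [{"path_json": '["7e", "c4", "a1"]'}]
edge_counts = Counter()
unresolved_edges = 0
for packet in packets:
    for src_hash, dst_hash in edges_from_path(packet["path_json"]):
        src, dst = resolve(index, src_hash), resolve(index, dst_hash)
        if src is None or dst is None:
            unresolved_edges += 1  # count it, do not guess
        else:
            edge_counts[(src, dst)] += 1
```

In this sample the 7e→c4 hop resolves to a unique pair of nodes, while c4→a1 is counted as unresolved because two keys share the prefix a1. On real data, the per-node hash sizes from data/hash_sizes.json would need to be taken into account.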

Note: The v3 link model is geometrically idealised (real terrain / antenna height are absent) — relative mechanism comparisons are more robust than absolute airtime figures.