MHR-Simulationen & Realdatensatz
Dieser Ordner enthält die Routing-Simulationen des MHR-Forks und den echten CoreScope-Datensatz, mit dem sie kalibriert wurden — damit jede:r die Ergebnisse reproduzieren oder eigene Auswertungen bauen kann.
Voraussetzungen
pip install numpy networkx matplotlib
# zum Entpacken der Rohdaten: xz (macOS: brew install xz, Debian/Ubuntu: apt install xz-utils)
Datensatz (data/)
| Datei | Inhalt | Größe |
|---|---|---|
| `data/packets.jsonl.xz` | vollständige Rohdaten: 109.980 reale Pakete (1 JSON/Zeile), je mit `path_json` (Hop-Kette), `snr`, `rssi`, `observer_id`, `route_type`, `payload_type`, `timestamp` | 16,6 MB (entpackt 188 MB) |
| `data/nodes.json` | 1962 Knoten: `public_key`, `lat`/`lon`, `role`, `advert_count`, `relay_count_*`, `usefulness_score`, … | 1,3 MB |
| `data/observers.json` | 38 Observer (Empfangs-Knoten) | 24 KB |
| `data/topology_edges.json` | aus den Pfaden abgeleitete beobachtete Relay-Kanten (aggregiert) | 0,7 MB |
| `data/neighbor_graph.json` | bestes Link-Modell: server-aufgelöste reale Nachbar-Topologie (1034 Knoten, 1956 Kanten) mit vollen Pubkeys, `avg_snr` pro Kante und `ambiguous`-Flag (nur 8,8 % ambig) | 0,6 MB |
| `data/hash_sizes.json` | Hash-Größen-Verteilung (1/2/3-Byte) im Netz | 0,5 MB |
| `data/snr_calibration.json`, `data/real_detour_stats.json` | abgeleitete Kennzahlen (SNR-Fit, reale Detour-Statistik) | klein |
Reichere CoreScope-Endpoints (für genauere/neue Simulationen)
Die CoreScope-API bietet mehr als die anfangs genutzten Basis-Endpoints — für ein echtes Link-Modell sind besonders nützlich:
- /api/analytics/neighbor-graph — fertige reale Link-Topologie mit avg_snr pro Kante + ambiguous-Flag (Quelle von data/neighbor_graph.json). Ersetzt das geometrische v3-Modell durch echte gemessene Kanten → deutlich genauer.
- /api/packets?expand=observations — liefert pro Paket ALLE Observer-Empfänge (je eigenes snr/rssi), statt nur einer Messung → viel dichtere Link-Qualitäts-Daten.
- /api/resolve-hops (POST) / /api/paths/inspect (POST) — server-seitige Hash→Knoten-Auflösung (umgeht die 1-Byte-Hash-Ambiguität).
- /api/nodes/{pubkey}, /api/analytics/hash-sizes, time-Pagination (?before=/?since=).
Quelle: öffentliche CoreScope-API https://corescope.meshrheinland.de/api/{nodes,observers,packets}
(MeshCore Rheinland). Der Datensatz ist eine Momentaufnahme (30.05.2026). Rohdaten bewusst als
.xz eingebunden (xz packt JSON-Lines ~11×; 188 MB → 16,6 MB, unter GitHubs Limit).
Schnellstart — v3-Realdaten-Sim aus dem Repo reproduzieren
Die Skripte erwarten die Rohdaten unter /tmp/cs_data/. Einmalig vorbereiten:
mkdir -p /tmp/cs_data
xz -dkc data/packets.jsonl.xz > /tmp/cs_data/packets.jsonl # entpacken (~188 MB)
cp data/nodes.json data/observers.json /tmp/cs_data/
python3 mhr_sim_real_v3.py # → sim_results_v3.json, data/*, fig_v3_*.png
Vollständig frische Daten holen (statt Repo-Snapshot)
python3 mhr_collect_corescope.py # lädt aktuelle Daten nach /tmp/cs_data/ (paginiert ~110k Pakete)
python3 mhr_sim_real_v3.py
Der Collector nutzt urllib mit Browser-User-Agent + offset-Pagination (siehe Kopf der Datei).
Die Simulationen im Überblick
| Skript | Was | Datenbasis |
|---|---|---|
| `mhr_sim.py` | v1: synthetische Rheinland-Topologie, MeshCore-Flood vs. MHR | synthetisch (illustrativ; das angenommene SNR-Modell wird von den Realdaten widerlegt) |
| `mhr_sim_real.py` | v1 auf 25 echten Knoten | klein, angenommenes Linkmodell |
| `mhr_sim_v2.py` | Stress-Szenarien (Churn, Linkausfall, Partition) | reale 25-Knoten-Topologie |
| `mhr_sim_real_v3.py` | v3: auf vollem Realdatensatz kalibriert; misst reale Topologie, SNR-Fit, reale Detours (Median 2,1×), simuliert Baseline vs. MHR | `data/` + Rohpakete |
| `../study/study_sim.py` | Mechanismus × Adoptions-Sweep (1 Knoten → alle), Safety-Invariante | reale 776-Knoten-Topologie |
Alle reproduzierbar mit Seed 42. Ergebnisse landen als *_results*.json + fig_*.png daneben.
Eigene Auswertungen bauen
- Pakete streamen (nie komplett in den RAM): `for line in open('/tmp/cs_data/packets.jsonl'): p = json.loads(line)`. Direkt aus dem `.xz` ohne Entpacken: `import lzma, json`, dann `for l in lzma.open('data/packets.jsonl.xz', 'rt'): p = json.loads(l)`.
- Node→Hash-Mapping: der MeshCore-Hash eines Knotens = die ersten `hash_size` Bytes des `public_key` (Hex-Präfix). 2-Byte-Hashes sind ~99 % eindeutig, 1-Byte kollidieren stark; bei 1-Byte über Geografie disambiguieren oder als ambig zählen (nicht raten).
- `path_json` ist die reale Hop-Kette eines Flood-Pakets; aufeinanderfolgende Hashes = reale Kante.
- Methodik & Limitierungen ehrlich dokumentiert in `MeshCore_Simulation_v3_Realdaten.md` und `../study/MeshCore_Routing_Study.md`.
Hinweis: Das v3-Linkmodell ist geometrisch idealisiert (reales Gelände/Antennenhöhe fehlen) — relative Mechanismus-Vergleiche sind robuster als absolute Airtime-Zahlen.
🇬🇧 English Translation
MHR Simulations & Real-World Dataset
This folder contains the routing simulations of the MHR fork and the real CoreScope dataset used to calibrate them — so that anyone can reproduce the results or build their own analyses.
Prerequisites
pip install numpy networkx matplotlib
# to decompress the raw data: xz (macOS: brew install xz, Debian/Ubuntu: apt install xz-utils)
Dataset (data/)
| File | Contents | Size |
|---|---|---|
| `data/packets.jsonl.xz` | complete raw data: 109,980 real packets (one JSON object per line), each with `path_json` (hop chain), `snr`, `rssi`, `observer_id`, `route_type`, `payload_type`, `timestamp` | 16.6 MB (188 MB decompressed) |
| `data/nodes.json` | 1,962 nodes: `public_key`, `lat`/`lon`, `role`, `advert_count`, `relay_count_*`, `usefulness_score`, … | 1.3 MB |
| `data/observers.json` | 38 observers (receiving nodes) | 24 KB |
| `data/topology_edges.json` | observed relay edges derived from the paths (aggregated) | 0.7 MB |
| `data/neighbor_graph.json` | best link model: server-resolved real neighbor topology (1,034 nodes, 1,956 edges) with full pubkeys, `avg_snr` per edge and `ambiguous` flag (only 8.8% ambiguous) | 0.6 MB |
| `data/hash_sizes.json` | hash-size distribution (1/2/3-byte) in the network | 0.5 MB |
| `data/snr_calibration.json`, `data/real_detour_stats.json` | derived metrics (SNR fit, real detour statistics) | small |
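As a starting point for working with `data/neighbor_graph.json`, the following is a minimal sketch that turns an edge list into an adjacency map and leaves out ambiguous links. Only `avg_snr` and `ambiguous` are named in the table above; the top-level key `edges` and the endpoint keys `source` and `target` are assumptions made for illustration, as is treating every link as undirected. Inspect the real file and adapt the names before relying on it.

```python
import json
from collections import defaultdict


def build_adjacency(graph, skip_ambiguous=True):
    """Return {node: {neighbor: avg_snr}} from a neighbor-graph dict.

    ASSUMPTION: the layout {"edges": [{"source", "target", "avg_snr",
    "ambiguous"}, ...]} is hypothetical; adapt the key names to the file.
    Links are stored in both directions (treated as undirected).
    """
    adjacency = defaultdict(dict)
    for edge in graph.get("edges", []):
        if skip_ambiguous and edge.get("ambiguous"):
            continue  # do not guess: leave ambiguous links out
        a, b = edge["source"], edge["target"]
        adjacency[a][b] = edge.get("avg_snr")
        adjacency[b][a] = edge.get("avg_snr")
    return dict(adjacency)


def load_neighbor_graph(path="data/neighbor_graph.json"):
    """Read the JSON file and build the adjacency map from it."""
    with open(path, encoding="utf-8") as fh:
        return build_adjacency(json.load(fh))
```

Passing `skip_ambiguous=False` to `build_adjacency` keeps the flagged edges, which may be useful for checking how much they change a result.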
Richer CoreScope Endpoints (for more accurate / new simulations)
The CoreScope API offers more than the basic endpoints used initially — particularly useful for a real link model:
- /api/analytics/neighbor-graph — ready-made real link topology with avg_snr per edge + ambiguous flag (source of data/neighbor_graph.json). Replaces the geometric v3 model with real measured edges → significantly more accurate.
- /api/packets?expand=observations — returns ALL observer receptions per packet (each with its own snr/rssi), instead of just one measurement → much denser link-quality data.
- /api/resolve-hops (POST) / /api/paths/inspect (POST) — server-side hash→node resolution (bypasses the 1-byte hash ambiguity).
- /api/nodes/{pubkey}, /api/analytics/hash-sizes, time pagination (?before=/?since=).
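To try the GET endpoints listed above, a small helper along the following lines may be enough. It is a sketch, not the collector's actual code: the base URL is taken from the source line of this README, the User-Agent value is an arbitrary browser-like string, and nothing is assumed about the JSON schema of the responses. The POST endpoints are left out because their request bodies are not described here.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "https://corescope.meshrheinland.de"
# ASSUMPTION: any browser-like User-Agent string; this value is illustrative.
USER_AGENT = "Mozilla/5.0 (X11; Linux x86_64)"


def build_request(path, params=None):
    """Build a GET request for an API path with optional query parameters."""
    url = BASE_URL + path
    if params:
        url += "?" + urllib.parse.urlencode(params)
    return urllib.request.Request(url, headers={"User-Agent": USER_AGENT})


def fetch_json(path, params=None, timeout=30):
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(build_request(path, params), timeout=timeout) as resp:
        return json.load(resp)
```

For example, `fetch_json("/api/analytics/neighbor-graph")` would request the link topology, and `fetch_json("/api/packets", {"expand": "observations"})` the packets with all observer receptions.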
Source: public CoreScope API https://corescope.meshrheinland.de/api/{nodes,observers,packets}
(MeshCore Rheinland). The dataset is a snapshot taken on 2026-05-30. The raw data is deliberately shipped as
.xz (xz compresses the JSON Lines file about 11×, from 188 MB to 16.6 MB, which keeps it under GitHub's 100 MB per-file limit).
Quick Start — Reproducing the v3 Real-Data Simulation from the Repo
The scripts expect the raw data under /tmp/cs_data/. One-time preparation:
mkdir -p /tmp/cs_data
xz -dkc data/packets.jsonl.xz > /tmp/cs_data/packets.jsonl # decompress (~188 MB)
cp data/nodes.json data/observers.json /tmp/cs_data/
python3 mhr_sim_real_v3.py # → sim_results_v3.json, data/*, fig_v3_*.png
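On systems without the `xz` command-line tool, the preparation steps above can be approximated with Python's standard library. This is a sketch under the assumption that it runs from the directory containing `data/`; the function name `prepare_data` is made up for illustration and is not part of the repo.

```python
import lzma
import shutil
from pathlib import Path


def prepare_data(repo_data="data", target="/tmp/cs_data"):
    """Decompress packets.jsonl.xz and copy the node/observer files to target."""
    src, dst = Path(repo_data), Path(target)
    dst.mkdir(parents=True, exist_ok=True)
    # Stream-decompress in chunks so the ~188 MB file never sits in RAM at once.
    with lzma.open(src / "packets.jsonl.xz", "rb") as fin, \
            open(dst / "packets.jsonl", "wb") as fout:
        shutil.copyfileobj(fin, fout)
    for name in ("nodes.json", "observers.json"):
        shutil.copy(src / name, dst / name)
    return dst
```

Calling `prepare_data()` once should leave `/tmp/cs_data/` in the same state as the `mkdir`, `xz` and `cp` commands.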
Fetching Completely Fresh Data (instead of the repo snapshot)
python3 mhr_collect_corescope.py # downloads current data to /tmp/cs_data/ (paginated ~110k packets)
python3 mhr_sim_real_v3.py
The collector uses urllib with a browser user agent + offset pagination (see the top of the file).
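The offset pagination mentioned above can be sketched independently of the HTTP layer. The generator below is an illustration rather than the collector's real code: `fetch_page` is a hypothetical callable that wraps one request, and the parameter names `offset` and `limit` are assumptions about how the API is queried.

```python
def paginate(fetch_page, page_size=1000, max_items=None):
    """Yield items from an offset-paginated source.

    fetch_page(offset, limit) is a hypothetical callable returning a list.
    Iteration stops at the first page shorter than page_size, or once
    max_items items have been yielded.
    """
    offset = 0
    yielded = 0
    while max_items is None or yielded < max_items:
        page = fetch_page(offset, page_size)
        for item in page:
            yield item
            yielded += 1
            if max_items is not None and yielded >= max_items:
                return
        if len(page) < page_size:
            return  # short or empty page: nothing more to fetch
        offset += len(page)
```

Because it is a generator, packets can be written to disk one by one while the download is still running, which keeps memory use flat for the roughly 110k packets.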
Overview of the Simulations
| Script | What | Data basis |
|---|---|---|
| `mhr_sim.py` | v1: synthetic Rhineland topology, MeshCore flood vs. MHR | synthetic (illustrative; the assumed SNR model is refuted by the real data) |
| `mhr_sim_real.py` | v1 on 25 real nodes | small, assumed link model |
| `mhr_sim_v2.py` | stress scenarios (churn, link failure, partition) | real 25-node topology |
| `mhr_sim_real_v3.py` | v3: calibrated on the full real dataset; measures real topology, SNR fit, real detours (median 2.1×), simulates baseline vs. MHR | `data/` + raw packets |
| `../study/study_sim.py` | mechanism × adoption sweep (1 node → all), safety invariant | real 776-node topology |
All runs are reproducible with seed 42. Results are written as `*_results*.json` and `fig_*.png` next to the scripts.
Building Your Own Analyses
- Stream packets (never load them all into RAM): `for line in open('/tmp/cs_data/packets.jsonl'): p = json.loads(line)`. Directly from the `.xz` without decompressing: `import lzma, json`, then `for l in lzma.open('data/packets.jsonl.xz', 'rt'): p = json.loads(l)`.
- Node→hash mapping: the MeshCore hash of a node is the first `hash_size` bytes of its `public_key` (hex prefix). 2-byte hashes are ~99% unique; 1-byte hashes collide heavily, so for 1-byte hashes either disambiguate by geography or count the hop as ambiguous (do not guess).
- `path_json` is the real hop chain of a flood packet; each pair of consecutive hashes is a real edge.
- Methodology and limitations are documented in `MeshCore_Simulation_v3_Realdaten.md` and `../study/MeshCore_Routing_Study.md`.
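The streaming and edge rules from the list above can be combined into a short sketch. It assumes that `path_json` holds either a JSON-encoded string or an already decoded list of hash strings, which should be verified against a real line of the file; the helper names are made up for illustration.

```python
import json
import lzma
from collections import Counter


def iter_packets(path="data/packets.jsonl.xz"):
    """Yield one packet dict per line without loading the whole file."""
    opener = lzma.open if str(path).endswith(".xz") else open
    with opener(path, "rt", encoding="utf-8") as fh:
        for line in fh:
            if line.strip():
                yield json.loads(line)


def hop_chain(packet):
    """Return the hop hashes of a packet as a list.

    ASSUMPTION: path_json is a JSON-encoded string or a decoded list of
    hash strings; missing or empty values give an empty chain.
    """
    raw = packet.get("path_json") or []
    return json.loads(raw) if isinstance(raw, str) else list(raw)


def count_edges(packets):
    """Count directed (hop, next_hop) pairs over all packets."""
    edges = Counter()
    for packet in packets:
        hops = hop_chain(packet)
        edges.update(zip(hops, hops[1:]))
    return edges
```

`count_edges(iter_packets())` then yields a `Counter` of observed relay edges keyed by raw hashes; resolving those hashes to nodes is a separate step because of the collisions described above.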
Note: The v3 link model is geometrically idealised (real terrain and antenna heights are not modelled), so relative comparisons between mechanisms are more robust than absolute airtime figures.
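The node→hash rule described in this section (a hash is the first `hash_size` bytes of the `public_key`, and colliding hashes should be counted as ambiguous rather than guessed) can be sketched as follows. The sketch assumes that public keys and hop hashes are hex strings, so one byte corresponds to two characters; the function names are illustrative only.

```python
from collections import defaultdict


def build_hash_index(public_keys, hash_size):
    """Map each hex prefix of hash_size bytes to the pubkeys sharing it."""
    index = defaultdict(list)
    for key in public_keys:
        # Two hex characters encode one byte.
        index[key[: 2 * hash_size].lower()].append(key)
    return index


def resolve_hash(index, hop_hash):
    """Return the unique pubkey for a hash, or None if unknown or ambiguous."""
    candidates = index.get(hop_hash.lower(), [])
    return candidates[0] if len(candidates) == 1 else None
```

Hops for which `resolve_hash` returns `None` can be tallied separately, which makes the share of ambiguous 1-byte hashes visible instead of silently attributing them to one node.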