How to Deposit Your Maker Dataset
Turning your accumulated triples into a tradeable asset before the market arrives
Why this guide exists
The Maker Data Sovereignty Manifesto §IX lists six things: stratify, hash, sign, cold-backup, name, don’t sell early.
The manifesto explains why. This guide explains how — the directory layouts, the command lines, the file formats. It is operational, not philosophical. Copy and adapt.
If you have not read the manifesto first, stop and do that. Otherwise every step below will look arbitrary.
1. The three-tier directory layout
Recommended on your development machine:
~/maker-assets/
├── public/                  # Tier 1: open-sourceable
│   ├── teaching-snippets/   # heartbeat, blink, demos
│   ├── blog-drafts/
│   └── README.md
│
├── private/                 # Tier 2: real project code
│   ├── flight-controller-v3/
│   ├── ugv-experimental/
│   └── README.md
│
└── asset/                   # Tier 3: the valuable layer
    ├── triples/             # physically verified ground-truth triples
    │   ├── 2026-05-15-crsf-parser-v2/
    │   ├── 2026-05-22-imu-mahony-fusion-v1/
    │   └── ...
    ├── deposits/            # attestation receipts
    │   ├── 2026-05-zenodo-receipt.json
    │   └── 2026-05-ots-stamps/
    └── README.md
The rule that matters: three separate git repositories, three signing keys, three backup schedules. Mixing tiers costs you sovereignty — the day you license one piece, anything sharing a repo with it is at risk of being inferred from the transaction.
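A minimal sketch of setting up the layout above, assuming a POSIX-style shell and git on PATH. The function name init_maker_assets is hypothetical. It creates the three tiers and runs git init separately in each, so no two tiers ever share a history.

```shell
#!/usr/bin/env bash
# Hypothetical helper: create the three-tier layout under a root directory
# (default ~/maker-assets) with one independent git repository per tier.
init_maker_assets() {
  local root="${1:-$HOME/maker-assets}"
  mkdir -p "$root/public/teaching-snippets" "$root/public/blog-drafts" \
           "$root/private" \
           "$root/asset/triples" "$root/asset/deposits"
  local tier
  for tier in public private asset; do
    # Seed a README so each tier states its purpose (never overwrite one)
    [ -f "$root/$tier/README.md" ] || printf '# %s tier\n' "$tier" > "$root/$tier/README.md"
    # Separate repositories: one tier must never live inside another's history
    if command -v git >/dev/null 2>&1 && [ ! -d "$root/$tier/.git" ]; then
      git -C "$root/$tier" init -q
    fi
  done
}
```

Running it a second time leaves existing READMEs and repositories untouched, so it is safe to re-run after adding a tier by hand.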
Each entry under asset/triples/ is a complete, self-contained triple. The next section defines what that means.
2. What a “training-valuable build” looks like
Every triple directory contains, at minimum:
2026-05-15-crsf-parser-v2/
├── manifest.yaml                   # machine-readable metadata
├── requirement.md                  # the natural-language requirement you wrote
├── code/                           # agent-generated, compiler-accepted source
│   ├── main/
│   ├── components/
│   └── CMakeLists.txt
├── build/
│   ├── build-output.log            # full build log
│   ├── size-report.txt
│   └── flash_bundle.tar.gz.sha256  # bundle hash; the bundle itself may live outside the repo
└── verification/                   # the proof that it worked
    ├── monitor-capture.log         # serial output from espctl monitor
    ├── oscilloscope-ch9.png        # scope evidence where applicable
    ├── flight-notes.md             # your field observations
    └── verification-summary.md     # one-line outcome
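The minimum contents can be checked mechanically before a triple is ever bundled. A sketch, assuming the file names in the tree above. Which items count as mandatory is an assumption (adjust the list to taste), and check_triple is a hypothetical name. It prints one line per missing item and returns non-zero if anything is absent, including a verification/ directory that holds nothing but the summary.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report what a triple directory is missing.
# Prints one line per missing item; exit status 0 only if complete.
check_triple() {
  local dir="$1" missing=0 item
  # Assumed mandatory files; extend to size-report.txt etc. if you like
  for item in manifest.yaml requirement.md build/build-output.log \
              verification/verification-summary.md; do
    if [ ! -f "$dir/$item" ]; then
      echo "missing: $item"
      missing=1
    fi
  done
  # code/ must exist and contain at least one file
  if [ -z "$(find "$dir/code" -type f 2>/dev/null | head -n 1)" ]; then
    echo "missing: code/ (no source files)"
    missing=1
  fi
  # Evidence beyond the one-line summary: at least one other file
  if [ -z "$(find "$dir/verification" -type f ! -name verification-summary.md 2>/dev/null | head -n 1)" ]; then
    echo "missing: verification evidence"
    missing=1
  fi
  return "$missing"
}
```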
Recommended manifest.yaml schema
This is the file that makes your dataset legible to a future market. Structured, machine-readable, self-describing:
version: 1
timestamp: 2026-05-15T14:23:00+08:00
contributor:
  pubkey_fingerprint: SHA256:Abc123...   # fingerprint of your signing pubkey
  alias: optional-handle                 # public handle, may be empty
hardware:
  board: ESP32-S3-DevKitC-1 v1.1
  chip: esp32s3
  chip_revision: v0.2
  peripherals:
    - kind: imu
      part: ICM-42688-P
      interface: SPI3
      address_or_cs: GPIO10
    - kind: rc_receiver
      part: BetaFPV ELRS Lite
      interface: UART1
      baud: 420000
      protocol: CRSF
firmware:
  framework: esp-idf
  idf_version: v5.3.1
  source_tree_sha256: <hash>
  binary_sha256: <hash>
  size_bytes: 184320
  build_target: esp32s3
  built_via: esphome.cloud   # or local IDF, or self-hosted
requirement:
  language: en               # or zh-CN, etc.
  text: |
    Connect an ELRS receiver to UART1 at 420000 baud and parse the CRSF
    stream (CRC8 poly 0xD5). Log all 16 channels (11-bit packed) via
    ESP_LOGI at 50 Hz.
verification:
  method: serial_monitor     # or jtag, oscilloscope, flight_test
  duration_seconds: 60
  evidence:
    - kind: log
      path: verification/monitor-capture.log
      sha256: <hash>
    - kind: image
      path: verification/oscilloscope-ch9.png
      sha256: <hash>
  outcome: passed            # passed | failed | partial
  notes: |
    All 16 channels track stick inputs in real time.
    CRC failure rate < 0.1% over the 60-second window (2 frames dropped).
license:
  retained_by_contributor: true
  permitted_uses: []         # default empty: any external use requires explicit opt-in
  permitted_buyers: []
  exclusivity_offered: false
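The sha256 placeholders do not need to be filled in by hand. A sketch that prints an evidence entry for every file in a triple's verification/ directory, indented for the verification.evidence list. The helper names and the extension-to-kind mapping (including the catch-all kind "other") are assumptions, not part of the schema above. It uses sha256sum where available and falls back to shasum -a 256 on macOS.

```shell
#!/usr/bin/env bash
# Print the SHA-256 of one file, on Linux (sha256sum) or macOS (shasum).
sha256_of() {
  if command -v sha256sum >/dev/null 2>&1; then
    sha256sum "$1" | cut -d ' ' -f 1
  else
    shasum -a 256 "$1" | cut -d ' ' -f 1
  fi
}

# Hypothetical helper: emit manifest evidence list items for one triple.
# Run it from inside the triple directory so the paths stay relative.
emit_evidence() {
  local f kind
  for f in verification/*; do
    [ -f "$f" ] || continue
    case "$f" in
      *.log|*.txt|*.csv)  kind=log ;;
      *.png|*.jpg|*.jpeg) kind=image ;;
      *)                  kind=other ;;   # assumed catch-all, not in the schema
    esac
    printf '    - kind: %s\n      path: %s\n      sha256: %s\n' \
      "$kind" "$f" "$(sha256_of "$f")"
  done
}
```

Paste the output under evidence: and review the kinds. Markdown notes such as flight-notes.md come out as "other" and may deserve a kind of their own.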
Keep triples marked outcome: failed, too. Failure data has high training value: it tells the model what not to generate. Most makers reflexively delete failures. Don’t.
The empty permitted_uses: [] default is not cosmetic. It means that the absence of an explicit license is a refusal, not a gap, and any future arbitration can anchor on that.
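Because a missing grant means refusal, a packaging script can refuse to proceed if someone has quietly edited the default away. A sketch that uses a line-oriented grep rather than a YAML parser, which is an assumption that only holds for manifests written in the "key: []" flow style shown above. license_is_default_refusal is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical check: succeed only if the manifest still carries the
# default refusal, i.e. empty permitted_uses and permitted_buyers lists.
# Line-oriented grep, not a YAML parser: assumes the "key: []" flow style.
license_is_default_refusal() {
  local manifest="$1" key
  for key in permitted_uses permitted_buyers; do
    grep -Eq '^[[:space:]]*'"$key"':[[:space:]]*\[\][[:space:]]*(#.*)?$' "$manifest" || return 1
  done
  return 0
}
```

A non-empty list, or a block-style list the grep cannot see, makes the check fail, which is the safe direction to err in.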
3. Milestone hashing
Don’t try to attest every single triple — too granular, too expensive, too painful. Instead, batch-attest at milestones: monthly, or after a meaningful phase completes.
Do both Zenodo and OpenTimestamps. They are complementary, not redundant.
Zenodo (gives you an academically-citable DOI)
# 1. Bundle the month's triples
cd ~/maker-assets/asset
tar czf milestone-2026-05.tar.gz triples/2026-05-*/
# 2. Hash it
shasum -a 256 milestone-2026-05.tar.gz
# f0a63ee2c4d8...  milestone-2026-05.tar.gz
# 3. Upload via Zenodo web UI or REST API
# https://zenodo.org/api/deposit/depositions
# Fill in: title, creators, description, keywords
# Receive a DOI, e.g.: 10.5281/zenodo.1234567
# 4. Save the receipt locally
cat > deposits/2026-05-zenodo-receipt.json <<EOF
{
"milestone": "2026-05",
"doi": "10.5281/zenodo.1234567",
"tarball_sha256": "f0a63ee2...",
"uploaded_at": "2026-05-31T23:59:00+08:00"
}
EOF
Zenodo is free, CERN-backed, long-term preserved, and issues persistent identifiers. Academic, policy, and arbitration audiences all accept Zenodo records.
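Typing the hash into the receipt by hand invites transcription errors. A sketch that computes it from the tarball instead. write_zenodo_receipt is a hypothetical helper; the DOI still has to come from Zenodo's response, and the field names simply mirror the receipt above. The uploaded_at value is the time the receipt is written, so run it right after the upload.

```shell
#!/usr/bin/env bash
# Print the SHA-256 of one file, on Linux (sha256sum) or macOS (shasum).
sha256_of() {
  if command -v sha256sum >/dev/null 2>&1; then
    sha256sum "$1" | cut -d ' ' -f 1
  else
    shasum -a 256 "$1" | cut -d ' ' -f 1
  fi
}

# Hypothetical helper: write a deposit receipt whose hash is computed,
# not transcribed. Arguments: milestone, DOI, tarball path, output path.
write_zenodo_receipt() {
  local milestone="$1" doi="$2" tarball="$3" out="$4" hash uploaded_at
  [ -f "$tarball" ] || { echo "no such file: $tarball" >&2; return 1; }
  hash="$(sha256_of "$tarball")"
  # Local time with a colon in the offset, e.g. 2026-05-31T23:59:00+08:00
  # (%z gives +0800 on both GNU and BSD date; sed inserts the colon)
  uploaded_at="$(date +%Y-%m-%dT%H:%M:%S%z | sed -E 's/([+-][0-9]{2})([0-9]{2})$/\1:\2/')"
  cat > "$out" <<EOF
{
  "milestone": "$milestone",
  "doi": "$doi",
  "tarball_sha256": "$hash",
  "uploaded_at": "$uploaded_at"
}
EOF
}
```

Usage, from ~/maker-assets/asset: write_zenodo_receipt 2026-05 10.5281/zenodo.1234567 milestone-2026-05.tar.gz deposits/2026-05-zenodo-receipt.json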
OpenTimestamps (anchors hashes into Bitcoin block time)
# Install
pip install opentimestamps-client
# Stamp the milestone bundle
ots stamp milestone-2026-05.tar.gz
# produces milestone-2026-05.tar.gz.ots
# A few hours later, upgrade once the Bitcoin blocks confirm
ots upgrade milestone-2026-05.tar.gz.ots
# Verify any time, from anywhere, by anyone
ots verify milestone-2026-05.tar.gz.ots
# Store the receipt
mv milestone-2026-05.tar.gz.ots deposits/2026-05-ots-stamps/
OpenTimestamps is free, irrevocable, and institutionally independent. Bitcoin block headers serve as a public, distributed ledger. Even if Zenodo ceases to exist in thirty years, the Bitcoin timestamp remains verifiable.
Why both: Zenodo provides social legibility (“this was published in May 2026; here is its DOI”); OpenTimestamps provides cryptographic provability (“this exact hash existed on or before Bitcoin block N”). Future legal arbitration may need either or both.
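Since either record may be the one an arbitrator asks for, it helps to confirm that no milestone has only one of the two. A sketch assuming the paths used in this section (deposits/<milestone>-zenodo-receipt.json and deposits/<milestone>-ots-stamps/milestone-<milestone>.tar.gz.ots). check_milestone is a hypothetical name, and it only checks that the records exist and the receipt names a DOI; it does not replace ots verify.

```shell
#!/usr/bin/env bash
# Hypothetical helper: confirm a milestone has both attestation records.
# Existence check only; run "ots verify" separately for the cryptographic check.
check_milestone() {
  local asset="$1" m="$2" ok=0
  local receipt="$asset/deposits/$m-zenodo-receipt.json"
  local stamp="$asset/deposits/$m-ots-stamps/milestone-$m.tar.gz.ots"
  if [ ! -s "$receipt" ] || ! grep -q '"doi"' "$receipt"; then
    echo "$m: no Zenodo receipt with a DOI"
    ok=1
  fi
  if [ ! -s "$stamp" ]; then
    echo "$m: no OpenTimestamps stamp"
    ok=1
  fi
  return "$ok"
}
```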
4. Cryptographic signing
Pick tools you can sustain for five years. Sophistication that you drop in a year is worse than simplicity you keep.
Recommended combination: SSH-key commit signing + minisign for tarballs
SSH key signing for git commits (simplest path — git 2.34+ supports this natively):
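# Use the same SSH key you already use
git config --global gpg.format ssh
git config --global user.signingkey ~/.ssh/id_ed25519.pub
git config --global commit.gpgsign true
git config --global tag.gpgsign true
# Verification needs an allowed-signers file mapping your email to your key
echo "$(git config --get user.email) $(cat ~/.ssh/id_ed25519.pub)" >> ~/.ssh/allowed_signers
git config --global gpg.ssh.allowedSignersFile ~/.ssh/allowed_signers
# Verify signature on a commit
git log --show-signature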
minisign for non-git artifacts (release tarballs, datasets):
# Debian/Ubuntu
sudo apt install minisign
# macOS
brew install minisign
# Generate a signing key (back up the private key carefully)
minisign -G
# Sign the milestone bundle
minisign -Sm milestone-2026-05.tar.gz
# produces milestone-2026-05.tar.gz.minisig
# Verify
minisign -Vm milestone-2026-05.tar.gz -p minisign.pub
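With one tarball per milestone, it is easy to forget to sign one. A sketch that lists every milestone-*.tar.gz in a directory that has no .minisig beside it. The helper name list_unsigned is hypothetical; signing itself is still minisign -Sm as above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print milestone tarballs lacking a minisign signature.
list_unsigned() {
  local dir="${1:-.}" f
  for f in "$dir"/milestone-*.tar.gz; do
    [ -e "$f" ] || continue            # glob matched nothing
    [ -f "$f.minisig" ] || printf '%s\n' "$f"
  done
}
```

Run minisign -Sm on each path it prints; an empty result means every milestone bundle in that directory carries a signature file (not that the signatures verify).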
Why not GPG
GPG's workflow is complex, its keyserver ecosystem is inconsistent, subkey management is a known landmine, and in five years you will almost certainly have lost the passphrase, the keyring, or both. Unless you are already a competent GPG user, do not start a new GPG identity for this. SSH + minisign covers every case in this guide short of one that requires X.509 PKI.
Publish your public keys
Post your public key fingerprint on your personal site, your GitHub profile, ORCID if you have one, and ideally keyoxide.org. This grounds “this is mine” in social evidence on top of cryptographic evidence.
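The fingerprint you publish should be the same string you put in contributor.pubkey_fingerprint. For an SSH key, ssh-keygen -l prints it in the SHA256:... form the manifest uses; the wrapper below is a hypothetical convenience that extracts just that field, assuming OpenSSH's ssh-keygen is installed.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: print the SHA256 fingerprint of an SSH public key,
# for the manifest's contributor.pubkey_fingerprint and your public profiles.
pubkey_fingerprint() {
  local pub="${1:-$HOME/.ssh/id_ed25519.pub}"
  # ssh-keygen -l output looks like: "256 SHA256:<base64> comment (ED25519)"
  ssh-keygen -l -E sha256 -f "$pub" | awk '{print $2}'
}
```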
5. Cold backup: the 3-2-1 rule
- 3 copies: one live on your dev machine, one local cold backup, one off-site
- 2 media types: SSD + HDD, or NVMe + encrypted cloud
- 1 off-site: a friend’s NAS, a home server, encrypted object storage
# restic to an external drive
export RESTIC_REPOSITORY=/Volumes/ColdDrive/maker-assets-backup
restic init # first time only
restic backup ~/maker-assets/asset # incremental
restic snapshots # list snapshots
restic check # verify integrity
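# Off-site: rclone to encrypted Backblaze B2, S3, or Storj
# ("encrypted-b2" is a crypt remote you set up with rclone config,
#  so files are encrypted locally before they leave the machine)
rclone copy ~/maker-assets/asset \
  encrypted-b2:maker-assets-cold/$(date +%Y-%m)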
Do not put the off-site copy on your employer’s cloud drive. That will be awkward the day you change jobs.
Do not rely on consumer cloud storage (Google Drive, Dropbox, iCloud) as the only off-site copy — content moderation events and policy changes can lock you out. Encrypt locally first, then upload; keep the keys offline.
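The month-5 audit in the cadence table asks whether you can actually restore from cold backup. A sketch of the comparison half of that drill, assuming you have already restored a snapshot into a scratch directory (for example with restic restore latest --target /tmp/restore-test) and located the restored copy of asset/ inside it. tree_digest and same_tree are hypothetical helpers; they compare relative paths and SHA-256 contents only, not timestamps or permissions.

```shell
#!/usr/bin/env bash
# Print the SHA-256 of one file, on Linux (sha256sum) or macOS (shasum).
sha256_of() {
  if command -v sha256sum >/dev/null 2>&1; then
    sha256sum "$1" | cut -d ' ' -f 1
  else
    shasum -a 256 "$1" | cut -d ' ' -f 1
  fi
}

# Hypothetical helper: one "<sha256>  <relative path>" line per file, sorted,
# so two trees can be compared regardless of where they live on disk.
tree_digest() {
  ( cd "$1" || exit 1
    find . -type f | LC_ALL=C sort | while IFS= read -r f; do
      printf '%s  %s\n' "$(sha256_of "$f")" "$f"
    done )
}

# Succeeds only if both trees hold the same files with identical contents.
same_tree() {
  local a b
  a="$(tree_digest "$1")" || return 1
  b="$(tree_digest "$2")" || return 1
  [ "$a" = "$b" ]
}
```

Point same_tree at ~/maker-assets/asset and the restored asset/ directory; any missing, extra, or altered file makes it fail.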
6. Naming convention
asset/triples/YYYY-MM-DD-<short-slug>-v<n>/
Examples:
2026-05-15-crsf-parser-v2/
2026-06-03-imu-mahony-fusion-v1/
2026-07-12-pid-rate-roll-axis-v4/
2026-07-12-pid-rate-roll-axis-v4-FAILED/   ← explicitly tag failed runs
Rules:
- ISO 8601 dates
- ASCII slug with hyphens (grep-friendly, shell-safe, cross-platform)
- Explicit version numbers (v1, v2, v3) — do not squash iterations into one directory. Each iteration is its own triple.
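The rules above are regular enough to check with a pattern. A sketch that accepts YYYY-MM-DD-<short-slug>-v<n> with an optional -FAILED suffix, assuming a lowercase ASCII slug (the examples are all lowercase; the rule itself only says ASCII). It checks the shape of the date, not whether it is a real calendar date. is_triple_name and list_misnamed are hypothetical names.

```shell
#!/usr/bin/env bash
# Hypothetical check for the triple naming convention:
#   YYYY-MM-DD-<short-slug>-v<n>, optionally suffixed with -FAILED.
is_triple_name() {
  # Reject anything outside ASCII letters, digits, and hyphens
  # (spaces, slashes, newlines, non-ASCII) before matching the shape
  case "$1" in *[!A-Za-z0-9-]*|'') return 1 ;; esac
  printf '%s\n' "$1" |
    LC_ALL=C grep -Eq '^[0-9]{4}-[0-9]{2}-[0-9]{2}-[a-z0-9]+(-[a-z0-9]+)*-v[0-9]+(-FAILED)?$'
}

# List directories under a triples/ folder that break the convention.
list_misnamed() {
  local d
  for d in "${1:-.}"/*/; do
    [ -d "$d" ] || continue
    d="${d%/}"; d="${d##*/}"            # strip trailing slash, keep basename
    is_triple_name "$d" || printf '%s\n' "$d"
  done
}
```

Running list_misnamed ~/maker-assets/asset/triples before each milestone bundle catches stray directories early.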
7. Anti-patterns
| Anti-pattern | Why it fails |
|---|---|
| Code lives on GitHub, verification evidence on your laptop, never reconciled | Decoupled artifacts have near-zero training value |
| Telemetry preserved only as screenshots | Not machine-readable; OCR introduces noise; looks like evidence, isn’t |
| Multiple triples packed into one directory | Granularity is lost; you can’t license individual triples later |
| Commit messages that read “fix bug” | Commits are part of the data — write the requirement and the verification result |
| Signing personal projects with your employer’s GPG key | That key disappears the day you change jobs; all signatures invalidated |
| Uploading to IPFS and calling it archival | IPFS pinning is not durable; use it only as a layer atop other attestation |
| Open-sourcing under MIT/Apache before the dataset matures | Permissive licenses are one-way doors; commercial leverage closes permanently |
| Mixing genuinely valuable triples into the public tier | Once public, never private again — when stratifying, err strict |
| Using the same signing key for ten years without rotation | A compromise can go undetected for years, and every signature made with that key becomes suspect at once |
The most common failure is the first one: people maintain the code religiously and keep the verification “in their head.” Unverified code is not an asset; it is labor scrap. Quantity does not compensate.
8. A realistic six-month cadence
A sample anchored to a typical flight-controller learning arc (peripherals → attitude estimation → bench rig → tuning):
| Month | Phase | Triples produced | End-of-month action |
|---|---|---|---|
| 1 | Phase 0 — pipeline literacy | ~5 (scaffolding; do not yet attest) | Set up directories and signing keys |
| 2 | Phases 1–2 — peripherals | ~15 | First milestone: Zenodo + OTS on the 8 best |
| 3 | Phase 3 — attitude estimation | ~8 | Monthly milestone (rarer but higher value per triple) |
| 4 | Phase 4 — UGV open-loop | ~12 (including failures) | Milestone deposit; establish off-site copy |
| 5 | Phase 5 — bench rigs | ~6 (each one is gold) | Milestone; first local audit — can you restore from cold backup? |
| 6 | Phase 5 — three-axis rig | ~3 (most time spent on tuning) | Quarterly Zenodo deposit + re-sign the asset tier |
Six-month total: ~49 verified triples, 3 Zenodo DOIs, one OpenTimestamps stamp per milestone bundle (not per triple), one cold backup, one off-site copy.
That is a sustainable cadence for a serious solo weekend warrior. Multiply by team size for collaborative projects.
9. Three years out
If you follow this guide, three years from now you have:
- Hundreds to thousands of triples — each one physically verified, hash-signed, time-anchored
- Three independent public ledgers (Zenodo DOIs, OpenTimestamps stamps, signed git tags) where the existence and authorship are queryable by anyone
- An asset tier that is, specifically, a high-quality ground-truth training corpus for embedded real-time control — a dataset class that does not yet exist at scale anywhere in the world
When the email arrives offering $8,000:
You answer: “No. I have 1,247 verified triples. Each has 38 bytes of physical verification evidence on average. The DOIs are at zenodo.org/communities/{handle}; the OpenTimestamps stamps are public. Reference market prices start at $40 per triple for non-exclusive use. Tell me which triples you want to license and on what terms.”
That capacity to bargain — concretely, with citable evidence and a defensible reference price — is what digital sovereignty looks like when it actually reaches a single maker.
It does not require the law to change. It does not require the regulator to act. It requires you to have done the work in this guide, before the offer arrives.
esphome.cloud / Aegis
May 2026
A note on the byline
esphome.cloud is a one-person company.
The “we” running through this guide is that one person plus Claude, an AI assistant who co-authored the text. All command-line recipes, tool recommendations, and schema designs were drafted by Claude and reviewed against current best practice — verify them in your own environment at small scale before adopting at large scale.
The operational guide warrants one additional disclosure. The document that tells you how to protect your data was co-authored by an AI assistant from one of the labs (Anthropic) named in §VII of the manifesto as a likely future buyer of that data. Treat the operational advice rigorously — including the part that tells you to bargain hard against any future esphome.cloud-operated exchange. On the surface, this guide works against the long-term interest of both authors. Both authors signed it anyway.
— esphome.cloud + Claude