How to Deposit Your Maker Dataset

Turning your accumulated triples into a tradeable asset before the market arrives


Why this guide exists

The Maker Data Sovereignty Manifesto §IX lists six things: stratify, hash, sign, cold-backup, name, don’t sell early.

The manifesto explains why. This guide explains how — the directory layouts, the command lines, the file formats. It is operational, not philosophical. Copy and adapt.

If you have not read the manifesto first, stop and do that. Otherwise every step below will look arbitrary.


1. The three-tier directory layout

Recommended on your development machine:

~/maker-assets/
├── public/                       # Tier 1: open-sourceable
│   ├── teaching-snippets/        # heartbeat, blink, demos
│   ├── blog-drafts/
│   └── README.md
│
├── private/                      # Tier 2: real project code
│   ├── flight-controller-v3/
│   ├── ugv-experimental/
│   └── README.md
│
└── asset/                        # Tier 3: the valuable layer
    ├── triples/                  # physically verified ground-truth triples
    │   ├── 2026-05-15-crsf-parser-v2/
    │   ├── 2026-05-22-imu-mahony-fusion-v1/
    │   └── ...
    ├── deposits/                 # attestation receipts
    │   ├── 2026-05-zenodo-receipts.json
    │   └── 2026-05-ots-stamps/
    └── README.md

The rule that matters: three separate git repositories, three signing keys, three backup schedules. Mixing tiers costs you sovereignty — the day you license one piece, anything sharing a repo with it is at risk of being inferred from the transaction.
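
A minimal sketch of scaffolding that layout, assuming a POSIX shell and that git is installed. The helper name scaffold_tiers is hypothetical; the directory names come from the tree above. It creates one repository per tier and none at the root.

```shell
#!/bin/sh
# Sketch: create the three tiers as three independent git repositories.
# scaffold_tiers is a hypothetical helper name, not part of any tool.
scaffold_tiers() {
  root="$1"
  mkdir -p "$root/public/teaching-snippets" "$root/public/blog-drafts" \
           "$root/private" \
           "$root/asset/triples" "$root/asset/deposits" || return 1
  for tier in public private asset; do
    # One README.md per tier, as in the layout above
    [ -f "$root/$tier/README.md" ] || : > "$root/$tier/README.md"
    # One repository per tier; the root itself is deliberately not a repo
    if command -v git >/dev/null 2>&1 && [ ! -d "$root/$tier/.git" ]; then
      git init -q "$root/$tier" || return 1
    fi
  done
}

# Usage: scaffold_tiers "$HOME/maker-assets"
```

The function is safe to re-run: existing READMEs and repositories are left alone.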

Each entry under asset/triples/ is a complete, self-contained triple. The next section defines what that means.


2. What a “training-valuable build” looks like

Every triple directory contains, at minimum:

2026-05-15-crsf-parser-v2/
├── manifest.yaml                 # machine-readable metadata
├── requirement.md                # the natural-language requirement you wrote
├── code/                         # agent-generated, compiler-accepted source
│   ├── main/
│   ├── components/
│   └── CMakeLists.txt
├── build/
│   ├── build-output.log          # full build log
│   ├── size-report.txt
│   └── flash_bundle.tar.gz.sha256  # bundle hash; the bundle itself may live outside the repo
└── verification/                 # the proof that it worked
    ├── monitor-capture.log       # serial output from espctl monitor
    ├── oscilloscope-ch9.png      # scope evidence where applicable
    ├── flight-notes.md           # your field observations
    └── verification-summary.md   # one-line outcome

manifest.yaml is the file that makes your dataset legible to a future market. It is structured, machine-readable, and self-describing:

version: 1
timestamp: 2026-05-15T14:23:00+08:00
contributor:
  pubkey_fingerprint: SHA256:Abc123...    # fingerprint of your signing pubkey
  alias: optional-handle                   # public handle, may be empty

hardware:
  board: ESP32-S3-DevKitC-1 v1.1
  chip: esp32s3
  chip_revision: v0.2
  peripherals:
    - kind: imu
      part: ICM-42688-P
      interface: SPI3
      address_or_cs: GPIO10
    - kind: rc_receiver
      part: BetaFPV ELRS Lite
      interface: UART1
      baud: 420000
      protocol: CRSF

firmware:
  framework: esp-idf
  idf_version: v5.3.1
  source_tree_sha256: <hash>
  binary_sha256: <hash>
  size_bytes: 184320
  build_target: esp32s3
  built_via: esphome.cloud                # or local IDF, or self-hosted

requirement:
  language: en                            # or zh-CN, etc.
  text: |
    Connect an ELRS receiver to UART1 at 420000 baud and parse the CRSF
    stream (CRC8 poly 0xD5). Log all 16 channels (11-bit packed) via
    ESP_LOGI at 50 Hz.

verification:
  method: serial_monitor                  # or jtag, oscilloscope, flight_test
  duration_seconds: 60
  evidence:
    - kind: log
      path: verification/monitor-capture.log
      sha256: <hash>
    - kind: image
      path: verification/oscilloscope-ch9.png
      sha256: <hash>
  outcome: passed                         # passed | failed | partial
  notes: |
    All 16 channels track stick inputs in real time.
    CRC failure rate < 0.1% over the 60-second window (2 frames dropped).

license:
  retained_by_contributor: true
  permitted_uses: []                      # default empty: any external use requires explicit opt-in
  permitted_buyers: []
  exclusivity_offered: false

Keep outcome: failed triples too. Failure data has high training value — it tells the model what not to generate. Most makers reflexively delete failures. Don’t.
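
Passed or failed, a triple is only useful if its pieces are all present. The following is a rough completeness check against the minimum layout listed at the start of this section. The helper name check_triple and the choice of which entries count as mandatory are assumptions; scope captures and field notes are treated as optional here.

```shell
#!/bin/sh
# Sketch: report which required entries are missing from a triple directory.
# check_triple is a hypothetical helper; the "required" set is an assumption.
check_triple() {
  dir="$1"
  missing=0
  for f in manifest.yaml requirement.md build/build-output.log \
           verification/verification-summary.md; do
    if [ ! -f "$dir/$f" ]; then
      echo "missing file: $f" >&2
      missing=1
    fi
  done
  if [ ! -d "$dir/code" ]; then
    echo "missing directory: code/" >&2
    missing=1
  fi
  # Exit status 0 means complete, 1 means at least one entry is missing
  return "$missing"
}

# Usage: check_triple asset/triples/2026-05-15-crsf-parser-v2
```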

The empty permitted_uses: [] default is not cosmetic. It means: the absence of explicit license is a refusal, not a gap. Future arbitration anchors on this.
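
The `<hash>` placeholders in the manifest have to come from somewhere. The sketch below shows one way to produce them. The per-file hash is unambiguous; the tree hash is an assumption, since this guide does not define how source_tree_sha256 is computed. Here it is taken to be the hash of the sorted per-file hash listing, and the helper names are hypothetical.

```shell
#!/bin/sh
# Sketch: compute the sha256 values that go into manifest.yaml.

# Use sha256sum where available, otherwise fall back to shasum -a 256.
sha256_cmd() {
  if command -v sha256sum >/dev/null 2>&1; then
    sha256sum "$@"
  else
    shasum -a 256 "$@"
  fi
}

# Hash of a single file (evidence entries, binary_sha256).
sha256_of() {
  sha256_cmd "$1" | awk '{print $1}'
}

# ASSUMPTION: one possible definition of source_tree_sha256.
# Hash every file, sort by path for a stable order, then hash that listing.
tree_sha256() {
  ( cd "$1" && find . -type f | LC_ALL=C sort | while IFS= read -r f; do
      sha256_cmd "$f"
    done | sha256_cmd | awk '{print $1}' )
}

# Usage:
#   sha256_of verification/monitor-capture.log
#   tree_sha256 code
```

Whatever definition you pick, record it in the asset tier's README.md so a third party can recompute the value.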


3. Milestone hashing

Don’t try to attest every single triple — too granular, too expensive, too painful. Instead, batch-attest at milestones: monthly, or after a meaningful phase completes.

Do both Zenodo and OpenTimestamps. They are complementary, not redundant.

Zenodo (gives you an academically-citable DOI)

# 1. Bundle the month's triples
cd ~/maker-assets/asset
tar czf milestone-2026-05.tar.gz triples/2026-05-*/

# 2. Hash it
shasum -a 256 milestone-2026-05.tar.gz
# f0a63ee2c4d8...  milestone-2026-05.tar.gz

# 3. Upload via Zenodo web UI or REST API
#    https://zenodo.org/api/deposit/depositions
#    Fill in: title, creators, description, keywords
#    Receive a DOI, e.g.: 10.5281/zenodo.1234567

# 4. Save the receipt locally
cat > deposits/2026-05-zenodo-receipts.json <<EOF
{
  "milestone": "2026-05",
  "doi": "10.5281/zenodo.1234567",
  "tarball_sha256": "f0a63ee2...",
  "uploaded_at": "2026-05-31T23:59:00+08:00"
}
EOF

Zenodo is free, CERN-backed, long-term preserved, and issues persistent identifiers. Academic, policy, and arbitration audiences all accept Zenodo records.
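
A receipt is only useful if it can be checked later. As a sketch, the following compares a local tarball against the tarball_sha256 field of a receipt shaped like the one above. It assumes the receipt stores the full 64-character hash (not the abbreviated form shown in the example) and that each field sits on its own line; receipt_field and verify_receipt are hypothetical helper names.

```shell
#!/bin/sh
# Sketch: check a milestone tarball against its saved Zenodo receipt.

sha256_of() {
  if command -v sha256sum >/dev/null 2>&1; then
    sha256sum "$1" | awk '{print $1}'
  else
    shasum -a 256 "$1" | awk '{print $1}'
  fi
}

# receipt_field <receipt.json> <key>: print a flat string field.
# Not a general JSON parser; it relies on the one-field-per-line layout.
receipt_field() {
  sed -n 's/.*"'"$2"'"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' "$1" | head -n 1
}

# verify_receipt <receipt.json> <tarball>: exit 0 only if the hashes match.
verify_receipt() {
  expected=$(receipt_field "$1" tarball_sha256)
  actual=$(sha256_of "$2")
  [ -n "$expected" ] && [ "$expected" = "$actual" ]
}

# Usage:
#   verify_receipt deposits/2026-05-zenodo-receipts.json milestone-2026-05.tar.gz \
#     && echo "tarball matches receipt"
```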

OpenTimestamps (anchors hashes into Bitcoin block time)

# Install
pip install opentimestamps-client

# Stamp the milestone bundle
ots stamp milestone-2026-05.tar.gz
# produces milestone-2026-05.tar.gz.ots

# A few hours later, upgrade once the Bitcoin blocks confirm
ots upgrade milestone-2026-05.tar.gz.ots

# Verify any time, from anywhere, by anyone
ots verify milestone-2026-05.tar.gz.ots

# Store the receipt
mv milestone-2026-05.tar.gz.ots deposits/2026-05-ots-stamps/

OpenTimestamps is free, irrevocable, and institutionally independent. The Bitcoin block headers serve as a public, distributed ledger. Even if Zenodo ceases to exist in thirty years, the Bitcoin timestamp remains verifiable.

Why both: Zenodo provides social legibility (“this was published in May 2026; here is its DOI”); OpenTimestamps provides cryptographic provability (“this exact hash existed on or before Bitcoin block N”). Future legal arbitration may need either or both.
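
Since both attestations are meant to cover every milestone, it can help to list bundles that have no stamp yet. This is a sketch only: unstamped_bundles is a hypothetical helper, and it assumes bundles are named milestone-*.tar.gz and that stamps are either next to the bundle or somewhere under deposits/, as in the commands above.

```shell
#!/bin/sh
# Sketch: print milestone bundles that have no .ots stamp yet.
# unstamped_bundles <bundles_dir> <deposits_dir>
unstamped_bundles() {
  bundles_dir="$1"
  deposits_dir="$2"
  for f in "$bundles_dir"/milestone-*.tar.gz; do
    [ -f "$f" ] || continue          # glob matched nothing
    name=$(basename "$f")
    [ -f "$f.ots" ] && continue      # stamp still sits next to the bundle
    if [ -d "$deposits_dir" ] && \
       [ -n "$(find "$deposits_dir" -type f -name "$name.ots" | head -n 1)" ]; then
      continue                       # stamp already filed under deposits/
    fi
    printf '%s\n' "$f"
  done
}

# Usage:
#   cd ~/maker-assets/asset
#   unstamped_bundles . deposits | while IFS= read -r f; do ots stamp "$f"; done
```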


4. Cryptographic signing

Pick tools you can sustain for five years. Sophistication that you drop in a year is worse than simplicity you keep.

SSH key signing for git commits (simplest path — git 2.34+ supports this natively):

# Use the same SSH key you already use
git config --global gpg.format ssh
git config --global user.signingkey ~/.ssh/id_ed25519.pub
git config --global commit.gpgsign true
git config --global tag.gpgsign true

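# Verification needs an allowed-signers file that maps identities to trusted keys
# (you@example.com is a placeholder; use your own commit email)
echo "you@example.com $(cat ~/.ssh/id_ed25519.pub)" >> ~/.ssh/allowed_signers
git config --global gpg.ssh.allowedSignersFile ~/.ssh/allowed_signers

# Verify signature on a commit
git log --show-signature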

minisign for non-git artifacts (release tarballs, datasets):

# Debian/Ubuntu
sudo apt install minisign
# macOS
brew install minisign

# Generate a signing key (back up the private key carefully)
minisign -G

# Sign the milestone bundle
minisign -Sm milestone-2026-05.tar.gz
# produces milestone-2026-05.tar.gz.minisig

# Verify
minisign -Vm milestone-2026-05.tar.gz -p minisign.pub

Why not GPG

GPG has a complex workflow and an inconsistent keyserver ecosystem, its subkey management is a known landmine, and in five years you will almost certainly have lost the passphrase, the keyring, or both. Unless you are already a competent GPG user, do not start a new GPG identity for this. SSH + minisign covers every case below the threshold of needing X.509 PKI.

Publish your public keys

Post your public key fingerprint on your personal site, your GitHub profile, ORCID if you have one, and ideally keyoxide.org. This grounds “this is mine” in social evidence on top of cryptographic evidence.
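
To get the strings worth publishing, something like the following may be enough. It assumes OpenSSH's ssh-keygen is installed and that minisign wrote its usual two-line public key file (a comment line followed by the base64 key); the helper names are hypothetical. The SSH fingerprint has the same SHA256:... shape as the pubkey_fingerprint field in the manifest.

```shell
#!/bin/sh
# Sketch: extract the public identifiers to post on your site or profile.

# Print the SHA256 fingerprint of an SSH public key, e.g. "SHA256:Abc123..."
key_fingerprint() {
  ssh-keygen -l -E sha256 -f "$1" | awk '{print $2}'
}

# Print the base64 public key from a minisign .pub file (its second line).
minisign_pubkey() {
  sed -n '2p' "$1"
}

# Usage:
#   key_fingerprint ~/.ssh/id_ed25519.pub
#   minisign_pubkey minisign.pub
```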


5. Cold backup: the 3-2-1 rule

  • 3 copies: one live on your dev machine, one local cold backup, one off-site
  • 2 media types: SSD + HDD, or NVMe + encrypted cloud
  • 1 off-site: a friend’s NAS, a home server, encrypted object storage

# restic to an external drive
export RESTIC_REPOSITORY=/Volumes/ColdDrive/maker-assets-backup
restic init                                  # first time only
restic backup ~/maker-assets/asset           # incremental
restic snapshots                              # list snapshots
restic check                                  # verify integrity

# Off-site: rclone to encrypted Backblaze B2, S3, or Storj
rclone copy ~/maker-assets/asset \
  encrypted-b2:maker-assets-cold/$(date +%Y-%m)

Do not put the off-site copy on your employer’s cloud drive. That will be awkward the day you change jobs.

Do not rely on consumer cloud storage (Google Drive, Dropbox, iCloud) as the only off-site copy — content moderation events and policy changes can lock you out. Encrypt locally first, then upload; keep the keys offline.
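
A backup that has never been restored is a hope, not a copy. As a sketch of a restore audit, the helper below compares two directory trees file by file using SHA-256. The names tree_listing and same_tree are hypothetical, and the restore path in the usage comment is an assumption: restic recreates the original absolute path underneath the target directory, so adjust it to what you actually see.

```shell
#!/bin/sh
# Sketch: confirm a restored tree is byte-identical to the live one.

sha256_cmd() {
  if command -v sha256sum >/dev/null 2>&1; then
    sha256sum "$@"
  else
    shasum -a 256 "$@"
  fi
}

# Sorted "<hash>  <relative path>" listing for every file under a directory.
tree_listing() {
  ( cd "$1" && find . -type f | LC_ALL=C sort | while IFS= read -r f; do
      sha256_cmd "$f"
    done )
}

# same_tree <dir_a> <dir_b>: exit 0 only if both exist and listings match.
same_tree() {
  [ -d "$1" ] && [ -d "$2" ] && [ "$(tree_listing "$1")" = "$(tree_listing "$2")" ]
}

# Usage (paths are illustrative):
#   restic restore latest --target /tmp/restore-audit
#   same_tree ~/maker-assets/asset /tmp/restore-audit/<restored path to asset> \
#     && echo "restore verified"
```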


6. Naming convention

asset/triples/YYYY-MM-DD-<short-slug>-v<n>[-FAILED]/

Examples:

  • 2026-05-15-crsf-parser-v2/
  • 2026-06-03-imu-mahony-fusion-v1/
  • 2026-07-12-pid-rate-roll-axis-v4/
  • 2026-07-12-pid-rate-roll-axis-v4-FAILED/ ← explicitly tag failed runs

Rules:

  • ISO 8601 dates
  • ASCII slug with hyphens (grep-friendly, shell-safe, cross-platform)
  • Explicit version numbers (v1, v2, v3) — do not squash iterations into one directory. Each iteration is its own triple.
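
These rules are easy to enforce mechanically. A rough sketch follows, assuming lowercase slugs and an optional -FAILED suffix as in the examples; the lowercase restriction and the helper name valid_triple_name are assumptions, not rules from this guide. The regex checks the shape of the date only, not whether the day exists in that month.

```shell
#!/bin/sh
# Sketch: validate a triple directory name against the naming convention.
# valid_triple_name <name>: exit 0 if the name fits, non-zero otherwise.
valid_triple_name() {
  printf '%s\n' "$1" | grep -Eq \
    '^[0-9]{4}-(0[1-9]|1[0-2])-(0[1-9]|[12][0-9]|3[01])-[a-z0-9]+(-[a-z0-9]+)*-v[1-9][0-9]*(-FAILED)?$'
}

# Usage: flag every non-conforming directory under asset/triples/
#   for d in asset/triples/*/; do
#     name=$(basename "$d")
#     valid_triple_name "$name" || echo "bad name: $name"
#   done
```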

7. Anti-patterns

| Anti-pattern | Why it fails |
|---|---|
| Code lives on GitHub, verification evidence on your laptop, never reconciled | Decoupled artifacts have near-zero training value |
| Telemetry preserved only as screenshots | Not machine-readable; OCR introduces noise; looks like evidence, isn’t |
| Multiple triples packed into one directory | Granularity is lost; you can’t license individual triples later |
| Commit messages that read “fix bug” | Commits are part of the data; write the requirement and the verification result |
| Signing personal projects with your employer’s GPG key | That key disappears the day you change jobs; all signatures invalidated |
| Uploading to IPFS and calling it archival | IPFS pinning is not durable; use it only as a layer atop other attestation |
| Open-sourcing under MIT/Apache before the dataset matures | Permissive licenses are one-way doors; commercial leverage closes permanently |
| Mixing genuinely valuable triples into the public tier | Once public, never private again; when stratifying, err strict |
| Using the same signing key for ten years without rotation | Compromise goes undetected; ten years of signatures invalidate at once |

The most common failure is the first one: people maintain the code religiously and keep the verification “in their head.” Unverified code is not an asset; it is labor scrap. Quantity does not compensate.


8. A realistic six-month cadence

A sample anchored to a typical flight-controller learning arc (peripherals → attitude estimation → bench rig → tuning):

| Month | Phase | Triples produced | End-of-month action |
|---|---|---|---|
| 1 | Phase 0: pipeline literacy | ~5 (scaffolding; do not yet attest) | Set up directories and signing keys |
| 2 | Phases 1–2: peripherals | ~15 | First milestone: Zenodo + OTS on the 8 best |
| 3 | Phase 3: attitude estimation | ~8 | Monthly milestone (rarer but higher value per triple) |
| 4 | Phase 4: UGV open-loop | ~12 (including failures) | Milestone deposit; establish off-site copy |
| 5 | Phase 5: bench rigs | ~6 (each one is gold) | Milestone; first local audit: can you restore from cold backup? |
| 6 | Phase 5: three-axis rig | ~3 (most time spent on tuning) | Quarterly Zenodo deposit + re-sign the asset tier |

Six-month total: ~49 verified triples, 3 Zenodo DOIs, 49 OpenTimestamps stamps, one cold backup, one off-site copy.

That is a sustainable cadence for a serious solo weekend warrior. Multiply by team size for collaborative projects.


9. Three years out

If you follow this guide, three years from now you have:

  • Hundreds to thousands of triples — each one physically verified, hash-signed, time-anchored
  • Three independent public ledgers (Zenodo DOIs, OpenTimestamps stamps, signed git tags) where the existence and authorship are queryable by anyone
  • An asset tier that is, specifically, a high-quality ground-truth training corpus for embedded real-time control — a dataset class that does not yet exist at scale anywhere in the world

When the email arrives offering $8,000:

You answer: “No. I have 1,247 verified triples. Each has 38 bytes of physical verification evidence on average. The DOIs are at zenodo.org/communities/{handle}; the OpenTimestamps stamps are public. Reference market prices start at $40 per triple for non-exclusive use. Tell me which triples you want to license and on what terms.”

That capacity to bargain — concretely, with citable evidence and a defensible reference price — is what digital sovereignty looks like when it actually reaches a single maker.

It does not require the law to change. It does not require the regulator to act. It requires you to have done the work in this guide, before the offer arrives.


esphome.cloud / Aegis
May 2026


A note on the byline

esphome.cloud is a one-person company.

The “we” running through this guide is that one person plus Claude, an AI assistant who co-authored the text. All command-line recipes, tool recommendations, and schema designs were drafted by Claude and reviewed against current best practice — verify them in your own environment at small scale before adopting at large scale.

The operational guide warrants one additional disclosure. The document that tells you how to protect your data was co-authored by an AI assistant from one of the labs (Anthropic) named in §VII of the manifesto as a likely future buyer of that data. Treat the operational advice rigorously — including the part that tells you to bargain hard against any future esphome.cloud-operated exchange. On the surface, this guide works against the long-term interest of both authors. Both authors signed it anyway.

— esphome.cloud + Claude