Improvements

Things that came out of building it differently.

None of these were planned from the start. They came from running into problems during development, understanding why those problems existed, and realising that the fix could actually make the node better without touching any protocol logic. No amendments needed. No consensus changes. Just a better runtime.

01 - Parallel ledger acquisition

The one that started because NuDB kept corrupting

As with every other blockchain client, the cold start is slow. First ledger acquisition takes time regardless of implementation; that is just the reality of downloading millions of state nodes from the network. The Rust client is no exception. But what happens after that initial connection is established does not have to be sequential.

During development, NuDB would occasionally corrupt. Every time it happened, I had to wipe the database and start the sync from scratch. On mainnet that means waiting hours for the full state to download again. After the third or fourth wipe and restart, I figured there had to be a better way to acquire subsequent ledgers once the initial connection is established.

In xrpld, I handled this with concurrency. Say there are three ledgers the node needs: 95,000,589, 95,000,590, and 95,000,591. Instead of waiting for 589 to fully complete before starting 590, all three are acquired simultaneously from different peers. If 590 happens to be slow because a peer is lagging, 589 and 591 keep downloading regardless. The key constraint is that validated advancement still happens sequentially. 589 must be fully validated before the pointer moves to 590, and 590 before 591. The chain is never broken. But the downloading runs as fast as the network allows.

[Figure: timeline of SEQ 589 to SEQ 594. All 6 fetch in parallel: SEQ 589 is still fetching from a slow peer while SEQ 590 to SEQ 594 finish fetching and each waits on its predecessor (AWAITING 589 through AWAITING 593). Validation then proceeds sequentially.]
FIG 01 - C++ uses round-robin peer assignment with a fixed acquisition limit. Rust fetches from all peers in parallel, validates sequentially.

This makes syncs significantly faster because the bulky SHAMap nodes get acquired as fast as the network allows, with each peer contributing simultaneously rather than one at a time. It does not change anything at the protocol level, so it will not pose any issues with consensus or the chain itself.
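The fetch-in-parallel, advance-in-order idea can be sketched with plain threads. This is a minimal illustration, not the node's actual code: `FetchedLedger`, `fetch_ledger`, and `acquire_parallel` are hypothetical names, and the sleep stands in for a real peer download.

```rust
use std::thread;
use std::time::Duration;

/// Hypothetical stand-in for a fully downloaded ledger.
struct FetchedLedger {
    seq: u32,
}

/// Hypothetical fetch: pretends to download ledger `seq` from a peer.
/// `delay_ms` simulates how slow that peer is.
fn fetch_ledger(seq: u32, delay_ms: u64) -> FetchedLedger {
    thread::sleep(Duration::from_millis(delay_ms));
    FetchedLedger { seq }
}

/// Starts every download at once, then advances strictly in sequence order.
/// Returns the order in which the validated pointer moved.
fn acquire_parallel(first_seq: u32, delays_ms: &[u64]) -> Vec<u32> {
    // Spawn all fetches up front; a slow peer only delays its own ledger.
    let handles: Vec<_> = delays_ms
        .iter()
        .enumerate()
        .map(|(i, &delay)| {
            let seq = first_seq + i as u32;
            thread::spawn(move || fetch_ledger(seq, delay))
        })
        .collect();

    // Joining in spawn order blocks on ledger N while N+1, N+2, ...
    // keep downloading in the background.
    let mut validated = Vec::new();
    for handle in handles {
        let ledger = handle.join().expect("fetch thread panicked");
        validated.push(ledger.seq);
    }
    validated
}
```

Calling `acquire_parallel(95_000_589, &[60, 5, 5])` models the example above: 590 and 591 finish first, but the pointer still moves 589, 590, 591. The total wait is roughly the slowest fetch rather than the sum of all three.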

One thing I am still considering for the future is implementation of snapshots, whereby new nodes can download trusted snapshots from validators and quickly spin up rather than waiting hours to sync with the network from scratch. That is a larger architectural decision and would need careful thought around trust models, but it is on the roadmap.

02 - Leaf node memory

GBs of memory saved by not allocating what you do not need

The XRP Ledger state tree has millions of nodes. In a naive implementation, every node allocates the same structure regardless of whether it is an inner node with children or a leaf node with data. Inner nodes need child hash arrays and child pointers. Leaf nodes do not.

In the original structure, every node used about 2,678 bytes. In xrpld, leaf nodes allocate only what they actually need. The InnerNodeArrays structure is boxed and set to None for leaves. Inner nodes come down to about 1,400 bytes. Leaf nodes use roughly 200 bytes.

Across millions of nodes, most of which are leaves, that difference adds up to gigabytes of memory saved during initial acquisition when the full tree is held in memory.
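A rough sketch of the layout idea follows. Only the name `InnerNodeArrays` and the "boxed, `None` for leaves" shape come from the description above; the fields, their sizes, and the `Node` type are assumptions for illustration and will not reproduce the byte counts quoted.

```rust
use std::mem::size_of;

/// Per-child data that only inner nodes need.
/// Hypothetical layout: 16 child hashes plus 16 child slots.
struct InnerNodeArrays {
    child_hashes: [[u8; 32]; 16],
    children: [Option<Box<Node>>; 16],
}

/// One node type for both kinds; leaves keep `inner` as `None`.
struct Node {
    hash: [u8; 32],
    /// Boxed so a leaf pays for one pointer-sized slot, not the arrays.
    inner: Option<Box<InnerNodeArrays>>,
    /// Serialized ledger entry; left empty for inner nodes.
    data: Vec<u8>,
}

impl Node {
    fn leaf(hash: [u8; 32], data: Vec<u8>) -> Self {
        Node { hash, inner: None, data }
    }

    fn inner(hash: [u8; 32]) -> Self {
        let arrays = InnerNodeArrays {
            child_hashes: [[0u8; 32]; 16],
            children: std::array::from_fn(|_| None),
        };
        Node { hash, inner: Some(Box::new(arrays)), data: Vec::new() }
    }

    fn is_leaf(&self) -> bool {
        self.inner.is_none()
    }
}

/// Returns (inline size of a node, extra heap size an inner node adds).
fn footprint() -> (usize, usize) {
    (size_of::<Node>(), size_of::<InnerNodeArrays>())
}
```

Because `Option<Box<T>>` is the same size as a raw pointer, a leaf carries no trace of the child arrays beyond that one slot. The heap allocation for `InnerNodeArrays` only happens when `Node::inner` is called.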

03 - Lock-free hash field

Half a gigabyte from removing a lock nobody needed

Each SHAMap node has a hash field. In a straightforward Rust port, you would wrap it in RwLock for thread safety. But looking at how the C++ actually uses this field, the hash is written once during construction and read many times after. There is no concurrent mutation. The lock is unnecessary overhead.

xrpld uses UnsafeCell for this field, matching the C++ behaviour exactly. No lock allocation, no contention, no overhead per node. Across millions of nodes, that saves approximately 500 MB of memory that was being wasted on synchronisation primitives that never actually synchronised anything.
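As a sketch of the write-once pattern, assuming the hash is always set before the node becomes visible to other threads: `NodeHash`, `set`, `get`, and `sizes` are hypothetical names, and the safety of the `Sync` impl rests entirely on that assumption holding.

```rust
use std::cell::UnsafeCell;
use std::mem::size_of;
use std::sync::RwLock;

/// Hash slot written once before the node is shared, then only read.
struct NodeHash(UnsafeCell<[u8; 32]>);

// SAFETY: sound only if the write-once contract documented on `set`
// is upheld, so that shared references never race with a write.
unsafe impl Sync for NodeHash {}

impl NodeHash {
    fn new() -> Self {
        NodeHash(UnsafeCell::new([0u8; 32]))
    }

    /// # Safety
    /// The caller must guarantee that no other thread can access this
    /// node yet (for example, it is still being constructed).
    unsafe fn set(&self, hash: [u8; 32]) {
        unsafe { *self.0.get() = hash };
    }

    /// Reads the hash without taking any lock.
    fn get(&self) -> [u8; 32] {
        // SAFETY: after construction the value is never mutated.
        unsafe { *self.0.get() }
    }
}

/// Returns (size of the lock-free slot, size of the RwLock equivalent).
fn sizes() -> (usize, usize) {
    (size_of::<NodeHash>(), size_of::<RwLock<[u8; 32]>>())
}
```

The slot is exactly the 32 bytes of the hash, while the `RwLock` version carries extra state per node. The cost is that the compiler no longer enforces the invariant: if a write ever happens after the node is shared, this is undefined behaviour. Where that risk is not acceptable, `std::sync::OnceLock` is a safe alternative at the price of a few bytes per node.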

04 - Clock sync from validators

The bug that took weeks to find

For weeks, the node would occasionally reject valid validations. Everything else worked. Peers connected fine, ledger data came in fine, but validations would sometimes get dropped as "not current." I traced through the code over and over trying to understand why.

Turns out the is_current() check uses wall clock time to determine if a validation is recent enough to accept. My node's clock was slightly off from network time. Not by much, but enough to push some validations outside the acceptance window.

The fix is simple. Sync the node's internal clock from trusted validator sign times. Validators are authoritative on network time. If a trusted validator signs at time T, the node adjusts to match. One line of logic, weeks of debugging to find it.
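A minimal sketch of that adjustment, with times as plain seconds. `NetClock`, its methods, and the window parameters passed to this `is_current` are hypothetical; the real check's window sizes are not given here, so they are left as arguments.

```rust
/// Tracks the offset between the local wall clock and network time.
/// All times are in seconds; the names are hypothetical.
struct NetClock {
    offset_secs: i64,
}

impl NetClock {
    fn new() -> Self {
        NetClock { offset_secs: 0 }
    }

    /// Adjusts the offset so that "now" matches the sign time reported
    /// by a trusted validator.
    fn sync_from_trusted(&mut self, local_now: i64, sign_time: i64) {
        self.offset_secs = sign_time - local_now;
    }

    /// Local wall clock corrected to network time.
    fn now(&self, local_now: i64) -> i64 {
        local_now + self.offset_secs
    }
}

/// Accepts a validation only if its sign time falls inside the window
/// [now - max_age, now + max_early].
fn is_current(now: i64, sign_time: i64, max_age: i64, max_early: i64) -> bool {
    sign_time >= now - max_age && sign_time <= now + max_early
}
```

With a local clock running 10 seconds fast and a 5 second window, a validation signed at 1000 is rejected when the wall clock reads 1010. After `sync_from_trusted(1010, 1000)`, the corrected time is 1000 and the same validation is accepted.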

05 - Sequential advancement

Correctness over speed, always

During parallel acquisition, ledgers arrive out of order. Ledger 95,000,005 might finish downloading before 95,000,003. That is fine for downloading. But for validated advancement, order is sacred.

After initial sync completes, xrpld advances the validated pointer strictly one sequence at a time. Adjacent ledgers share 99.9% of their state tree; the delta is typically 100 to 1,000 nodes. This means each advancement is fast, and the validated chain never has gaps. Future ledgers can be acquired and stored while an earlier one is still being validated, but the pointer only moves forward one step at a time.
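The buffering rule can be sketched as a small state machine: hold anything that arrives early, and move the pointer only while the next sequence is present. `ValidatedChain` and `on_acquired` are hypothetical names, and validation itself is left out.

```rust
use std::collections::BTreeSet;

/// Buffers fully acquired ledgers and moves the validated pointer one
/// sequence at a time. Names are hypothetical.
struct ValidatedChain {
    validated_seq: u32,
    /// Ledgers that finished downloading but are ahead of the pointer.
    pending: BTreeSet<u32>,
}

impl ValidatedChain {
    fn new(validated_seq: u32) -> Self {
        ValidatedChain { validated_seq, pending: BTreeSet::new() }
    }

    /// Records a completed ledger, then advances as far as the chain is
    /// contiguous. Returns the sequences the pointer moved through.
    fn on_acquired(&mut self, seq: u32) -> Vec<u32> {
        // Ledgers at or behind the pointer are already validated.
        if seq > self.validated_seq {
            self.pending.insert(seq);
        }
        let mut advanced = Vec::new();
        // Step by exactly one; stop at the first gap.
        while self.pending.remove(&(self.validated_seq + 1)) {
            self.validated_seq += 1;
            advanced.push(self.validated_seq);
        }
        advanced
    }
}
```

Starting from 95,000,002, an early arrival of 95,000,005 moves nothing. When 95,000,003 arrives the pointer steps once, and when 95,000,004 arrives it steps through both 004 and 005.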

06 - Structured logging

Proper log levels because println is not debugging

One of the things that frustrated me early on was not being able to see what was happening inside the node without littering the code with print statements. So I built a proper logging system from the start using the tracing crate. Every subsystem has structured logs with appropriate levels: error for things that need immediate attention, warn for conditions that might become problems, info for operational state changes, debug for development, and trace for the really granular stuff.

Operators can adjust log levels at runtime through the CLI without restarting the node. During development this meant I could turn on trace logging for just the acquisition subsystem while keeping everything else at info. In production it means operators can diagnose issues without drowning in output.
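The per-subsystem, runtime-adjustable part can be illustrated with a std-only sketch. This is not the tracing crate's API and not the node's code: `Level`, `LogFilter`, and the subsystem names are hypothetical. In a tracing-based setup the equivalent job is usually done by a reloadable filter from tracing-subscriber.

```rust
use std::collections::HashMap;
use std::sync::RwLock;

/// Declared from least to most verbose, so the derived ordering gives
/// Error < Warn < Info < Debug < Trace.
#[derive(Clone, Copy, PartialEq, Eq, PartialOrd, Ord, Debug)]
enum Level {
    Error,
    Warn,
    Info,
    Debug,
    Trace,
}

/// A default level plus per-subsystem overrides that can change while
/// the process is running.
struct LogFilter {
    default: Level,
    overrides: RwLock<HashMap<String, Level>>,
}

impl LogFilter {
    fn new(default: Level) -> Self {
        LogFilter { default, overrides: RwLock::new(HashMap::new()) }
    }

    /// Changes one subsystem's level without touching the others.
    fn set_level(&self, subsystem: &str, level: Level) {
        self.overrides
            .write()
            .unwrap()
            .insert(subsystem.to_string(), level);
    }

    /// True if a message at `level` from `subsystem` should be emitted.
    fn enabled(&self, subsystem: &str, level: Level) -> bool {
        let max = self
            .overrides
            .read()
            .unwrap()
            .get(subsystem)
            .copied()
            .unwrap_or(self.default);
        level <= max
    }
}
```

With a default of `Level::Info`, calling `set_level("acquisition", Level::Trace)` turns on trace output for that one subsystem while every other subsystem stays at info.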

07 - Operator CLI

Because operators deserve better than raw JSON-RPC

Running a node means monitoring sync progress, checking peer connections, inspecting database health, and adjusting log levels. In rippled, most of this requires crafting JSON-RPC requests manually or using external tooling.

xrpld ships with an interactive CLI. xrpld sync-status shows progress with a percentage and estimated time. xrpld peers shows connected peers with latency. xrpld health gives a quick overview of everything. xrpld db-stats shows NuDB and SQLite state. The node is something operators interact with daily. It should feel like it.

All of these improvements exist within the existing protocol boundaries. The XRP Ledger consensus rules, transaction formats, serialisation logic, and peer protocol remain identical. These are runtime improvements, not protocol changes. The network does not know or care which implementation a peer is running.

08 - What is left to do

The honest list

Memory during initial acquisition is still high. The full SHAMap tree is held in memory for verification during sync, which means the node needs a decent amount of RAM to get through the initial download. After sync completes and the node is tracking the tip, memory usage drops significantly. Streaming nodes to disk during acquisition instead of holding everything in memory is a future consideration, but it is a substantial architectural change that needs careful thought.

Full transaction history requires configuring ledger_history = full and comes with significantly more disk usage. The tx command only returns transactions within the node's configured history window.

Validator mode is present in the code but has not been tested under production conditions. The consensus and validation paths work in testing, but running this as a validator with real UNL weight is not something I would recommend until the community has had time to review, test, and stress those specific paths thoroughly. That is not a limitation I can solve alone; it requires running in diverse environments under real network conditions over time.

The foundation is there. The improvements above are real and working today. What remains is the kind of hardening that only comes from more people using it, more environments running it, and more time proving it.