Blockchain OnDevice Federated Learning
Definitions
Federated Learning
Machine learning technique involving the training of algorithms across multiple decentralized devices holding local data samples — data is not shared between these machines.
Notes
Abstract
A proposed blockchain federated learning (BlockFL) architecture where local model updates are exchanged and verified, enabling ondevice machine learning without centralized training data or coordination by utilizing consensus mechanisms.
Proposes an endtoend latency model of BlockFL, characterizing optimal block generation rate considering communication, computation, and consensus delays.

Latency is minimized by adjusting PoW difficulty.
Ondevice machine learning advances low latency, reliable wireless systems as highquality models can be stored ondevice, enabling agency (decisionmaking) even when disconnected.
Training these ondevice models would require more data than each device’s local samples and necessitates exchanges with other devices.

Local data is owned by each device, and thus exchanges should keep raw data private from other devices.

Each device exchanges its local model update, the weight and gradient parameters from which raw data cannot be derived.

In Google’s vanilla FL, a centralized server aggregates and takes an average of all local model updates, yielding a global model update to be downloaded by each device. The centralized nature of this method somewhat defeats the purpose of decentralization.

Vanilla FL’s training exchange is lengthy.
4. Device contribution is not rewarded in Vanilla FL, that is, a device providing a larger number of data samples has no compensation incentive to federate with devices possessing fewer data samples.
An onchain solution is proposed in lieu of a central server, enabling exchanging devices' local model updates, verification, and corresponding rewards.
The system consists of devices AND miners (devices collect data and execute the model, and miners can be external nodes).
This model aims to solve a regression (logistic regression analysis) problem in a parallel manner (distributed computing).
Aiming to minimize a loss function (a measure of how well the model can predict the next outcome, the lower loss is better) for a global weight vector (values which alter the statistical significance of data points in the model, giving them more or less ‘weight’ on how much they sway predictions).

To minimize this loss function, the model of a device is locally trained using a stochastic variancereduced gradient algorithm.

This is one of the most effective methods currently used for optimizing machine learning problems. It works best for convex problems (convex functions over convex sets, a line connecting any two points on the function’s curve falls above the curve), which are more generalized and easier to solve than linear problems (functions that curve up and down) Link to Optimization Problem Types — Convex Optimization.
SVRG aims to reduce random variance by storing an average of the function’s gradients and updating over epochs (each run over a training set). There is, however, some criticism of the effectiveness of SVRG in modern deep learning optimization, particularly as it falls short for nonconvex optimization problems.
Link: On the Ineffectiveness of Variance Reduced Optimization for Deep Learning

Local model updates are aggregated via a distributed approximate newton method. A distributed approximation variety of Newton’s method for finding the roots of a differentiable function by iteratively fitting the curvature of the function by locating its critical points.

BlockFL is designed to exchange local model updates through a distributed ledger.

Each ledger is divided into body and header parts. The body, a local module update at the nth epoch, as well as computation time T and header, a pointer to the previous block, block generation rate, and PoW output value (for determining rewards).
2. Each miner has a candidate block (a block that a mining node (miner) is trying to mine in order to receive the block reward), after PoW, the miner randomly generates a hash (using input numbers until it reaches a hash smaller than a target).
3. Block generation rate (the rate at which blocks can be created) can be controlled by altering the PoW difficulty (computability/complexity).
4. BlockFL is also capable of operating on other consensus algorithms such as PoS and Byzantinefaulttolerance but may require a more complex operation.

Byzantinefaulttolerance: A solution to the two generals' problem, where two agents have imperfect information as to whether or not something has worked properly (i.e. confirmation of a transaction). Bitcoin uses PoW consensus to handle BFT. Idempotency tokens are also often used to stop duplicate instructions between APIs.
5. If one miner succeeds in generating a block, other miners may unintendedly apply the wrong global model to their local ledgers, i.e. they will fork the chain.

Forking frequency increases with the block generation rate.
6. Devices are given data rewards from their associated miner proportional to the amount and quality of data supplied.

Data quality control incentives aim to stop untruthful devices from inflating their data for undeserved rewards.

Miners verify local updates by comparing the sample size against computation time — this can be done using Intel’s software guard extensions.
7. Miners are given inventive rewards in the conventional blockchain fashion.
OneEpoch BlockFL Operation
1. Local model update from device.
2. Local model upload, device randomly associates with a miner and uploads the local model and computation time to the miner.
3. Miners broadcast obtained local model updates from their associated devices and cross verify, recording updates in the device’s candidate block.
4. Miners run PoW until locating a nonce (number only used once, i.e. unique hash) or a block is generated.
5. Miners broadcast finding a nonce, and a candidate block is generated and broadcast to all miners to avoid forking.
6. Device downloads the global model from its associated miner.
7. Device D locally computes a global model update.
Latency

Communication delays.

Learning completion latency T.

Epoch time L.

Block generation rate λ.

BlockFL actually begins with initially higher accuracy than VanillaFL but converges after a number of iterations, the standard model without federation performs significantly worse.

Higher block generation rate (low PoW difficulty) increases forking frequency, which increases the learning completion latency.

Low block generation rate (high PoW difficulty) incurs its own overheads with excessive latency. This is why the formula was derived to find a convex function (remembering SVRG) optimal block generation rate.

More miners can also increase latency as the crossverification and block propagation times scale proportionally.