Federated Learning Protocol Secured by EigenLayer

Introduction

This post outlines the general idea behind a Federated Learning Protocol secured by EigenLayer, focusing on data privacy, data compensation, and the use of distributed computing resources in the context of large language model (LLM) training and inference. The proposed protocol aims to enable users to collaboratively train an on-chain Foundational Large Language Model (yes, like ChatGPT…) using off-chain computational resources (for Local Model training) and off-chain private data (locally available data).

Protocol Architecture

The Federated Learning Protocol would be implemented in the following stages:

  1. Global Model Initialization: A Foundational Large Language Model (hereafter referred to as the “Global Model”) is initialized and its starting parameters (weights and biases) are stored through EigenDA or another similar data availability layer. The Global Model serves as a starting point for Local Model training.
  2. Global Model Parameter Sharing: The parameters of the Global Model are shared with all restakers who have opted into the Federated Learning protocol.
  3. Local Model Training by Restakers: Each restaker obtains the parameters of the Global Model and uses them to initialize their Local Model, then trains that Local Model on their own local data. While each restaker’s Local Model has the same architecture and size as the Global Model that lives on Ethereum (or an L2), the key difference lies in the size of the training dataset: each restaker trains the latest version of the Global Model on only their local data, rather than on the entire dataset used to create the Global Model, so local training is vastly less computationally intensive. This is a fundamental principle of federated learning and makes it more feasible for restakers to train on their own hardware. Furthermore, given the continued rapid advances in open-source LLM compression in recent months, the computational and capital resources required to train a powerful LLM locally should keep falling. For example, Alpaca 7B, released by Stanford just last week, performs comparably to text-davinci-003 and cost less than $600 to train, with no special hardware and no high-end graphics card.
  4. Local Model Parameter Updates: After local training on a given restaker’s machine is complete, the restaker computes an update to the Local Model parameters based on the local training results. This update is represented as a gradient. For those unfamiliar with deep learning and the gradient descent optimization algorithm, the gradient is a vector of partial derivatives of the loss function with respect to the model parameters (i.e., the weights and biases); it represents the changes needed in the parameters to minimize the loss function, which quantifies the difference between the model’s predictions and the actual target values in the training data. In this proposed Federated Learning Protocol, the restaker encrypts their newly computed gradient vector using an additive secret sharing scheme: the locally computed gradient is split into multiple random shares which, when combined, reveal the original gradient (a toy sketch of this scheme follows this list). Additionally, the restaker submits a zero-knowledge proof (ZKP) alongside their gradient to prove that they followed the training procedure prescribed in the protocol’s slashing contracts and computed the gradients accurately, while keeping the data used in training private.
  5. Aggregation: The restaker sends one share of their split gradient to each of the other restakers participating in the aggregation process (the protocol can be designed such that all, or only some, restakers participate in parameter aggregation). Each participating restaker should now hold one share of every other restaker’s gradient. Every restaker then sums the shares they have received, producing a new share that is a partial sum of the encrypted gradients. These partial sums are sent to the Aggregator (a smart contract or a designated restaker), which combines them to obtain the aggregated gradient while ensuring that no individual restaker’s update is exposed during aggregation.
  6. Global Model Update: The Aggregator uses the aggregated gradient to update the Global Model parameters on-chain, resulting in an improved version of the Global Model.
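
To make the loop above concrete, here is a minimal, self-contained Python sketch on a toy linear model. All names, dimensions, and the learning rate are my own illustration; the ZKP and on-chain pieces are omitted; and real additive secret sharing operates over a finite field rather than over floats:

```python
# Toy end-to-end sketch of the protocol steps; illustrative only.
import numpy as np

rng = np.random.default_rng(42)
DIM, N_RESTAKERS = 4, 3

# Step 1/2: Global Model parameters, published via a data availability layer.
global_params = rng.standard_normal(DIM)

def local_gradient(params: np.ndarray, X: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Step 3/4: gradient of a squared-error loss on a restaker's local data."""
    residual = X @ params - y             # prediction error on private data
    return 2.0 * X.T @ residual / len(y)  # vector of partial derivatives

def split_into_shares(grad: np.ndarray, n: int) -> list[np.ndarray]:
    """Step 4: additive secret sharing -- n random shares summing to grad."""
    masks = [rng.standard_normal(grad.shape) for _ in range(n - 1)]
    return masks + [grad - sum(masks)]

# Each restaker trains on its own private data and splits its gradient.
grads = []
for _ in range(N_RESTAKERS):
    X = rng.standard_normal((10, DIM))    # private local dataset (never shared)
    y = rng.standard_normal(10)
    grads.append(local_gradient(global_params, X, y))
shares = [split_into_shares(g, N_RESTAKERS) for g in grads]

# Step 5: restaker j sums the shares it received from everyone (a partial
# sum that reveals no individual gradient), then sends it to the Aggregator.
partial_sums = [sum(shares[i][j] for i in range(N_RESTAKERS))
                for j in range(N_RESTAKERS)]
aggregated = sum(partial_sums) / N_RESTAKERS   # average of local gradients

# Step 6: the Aggregator applies the aggregated gradient to the Global Model.
learning_rate = 0.1
global_params = global_params - learning_rate * aggregated

assert np.allclose(aggregated, sum(grads) / N_RESTAKERS)
```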

The updated Global Model parameters are then shared again (Step 2 above), and the cycle repeats in perpetuity, with the Global Model continually improving from the constant addition of new private data on which restakers locally train.

Protocol Participants

The proposed Federated Learning Protocol will consist of the following participants:

EigenLayer Restakers: ETH validator nodes responsible for providing computational resources for training the Global Model locally off-chain, using a vastly smaller dataset than that used to produce the Global Model’s parameters. Restakers are compensated for the use of their computational resources and for updating the state of the Global Model’s on-chain parameters.

Aggregator: A designated validator node or smart contract responsible for securely aggregating Local Model gradient updates from restakers to make a state update to the on-chain Global Model parameters.

Global Model Users: Individuals and entities that interact with the on-chain Global Model in exchange for a usage fee.

Compensation for Computational Resources and Slashing Risk

As with all middleware services built on EigenLayer, Restakers would receive ETH compensation for the computational resources they provide to the protocol (the processing power, memory, and storage needed to continually improve the Global Model’s loss over time), as well as for bearing the risk of unintended or malicious slashing.

Compensation for Private and Proprietary Information

Aside from compensation for the critical function of providing computational resources, an arguably more interesting possibility is compensating Restakers for the value of the locally stored private data/information they contribute to Local Model training.

Individuals and entities have access to an inordinate amount of personal and/or proprietary information that is highly private or confers some competitive advantage over their peers. This information is often stored locally on their devices and/or shared only with specific trusted third parties.

To incentivize Restakers to include locally stored high-quality, private, and/or proprietary data in their Local Model training dataset, the Federated Learning Protocol could implement a fingerprinting system whereby Restakers are further compensated based on the influence their gradient has on the Global Model update, in a way that still preserves the privacy of their data.

To implement this Data Compensation System, the protocol would need to:

  1. Generate a unique “fingerprint” for each Restaker when they opt into the protocol. This fingerprint could be derived from the Restaker’s wallet address.
  2. Attach the fingerprint as metadata to that restaker’s gradient submission when they send their Local Model’s gradient update to the Aggregator. Note that the gradient computed by the restaker is based on whatever private data they elect to train on, so it inherently serves as a proxy for the data included in training their Local Model.
  3. The Aggregator receives gradient updates from restakers, associates each update with its corresponding fingerprint, and aggregates the updates such that individual gradients are not exposed. Fingerprints are preserved through secure aggregation, while the privacy of the underlying gradients is maintained.
  4. Once the aggregation process is completed, the Aggregator updates the on-chain parameters of the Global Model with the aggregated result of the Local Model gradients. At this point, the fingerprints can be used to determine the influence each restaker’s gradient had on the updated state of the Global Model. This information can be stored and used later to distribute rewards to restakers based on their contributions.
  5. Implement a compensation mechanism that rewards restakers based on the influence of their gradient updates on the Global Model. This can be done by analyzing the impact of each update on the Global Model’s performance (e.g., reduction in the loss function) and/or by tracking usage of the Global Model and attributing a portion of the revenue it generates to each contributing restaker based on their fingerprint (a hypothetical sketch of this flow follows this list).
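
Below is a hypothetical sketch of this compensation flow. The SHA-256 fingerprint derivation, the dot-product “influence” proxy, and the reward pool are all my own stand-ins rather than part of the proposal. One caveat: measuring per-restaker influence requires some visibility into individual contributions, which is in tension with the secure aggregation described earlier, so a real design would need an influence metric that composes with it.

```python
# Hypothetical sketch of the Data Compensation System; names and values
# (fingerprint_of, the influence proxy, POOL_ETH) are illustrative only.
import hashlib
import numpy as np

def fingerprint_of(wallet_address: str) -> str:
    """Step 1: a unique fingerprint derived from the restaker's address."""
    return hashlib.sha256(wallet_address.encode()).hexdigest()[:16]

# Step 2: each submission carries the fingerprint as metadata with the
# gradient ("0xRestakerA"/"0xRestakerB" are placeholder addresses).
submissions = [
    {"fp": fingerprint_of("0xRestakerA"), "gradient": np.array([0.3, -0.1])},
    {"fp": fingerprint_of("0xRestakerB"), "gradient": np.array([0.9, 0.4])},
]

# Steps 3/4: score each contribution. One crude influence proxy is the
# alignment of a gradient with the aggregated update.
aggregated = sum(s["gradient"] for s in submissions) / len(submissions)
influence = {s["fp"]: max(float(np.dot(s["gradient"], aggregated)), 0.0)
             for s in submissions}

# Step 5: split a reward pool (e.g. a cut of Global Model usage fees)
# in proportion to influence.
POOL_ETH = 1.0
total = sum(influence.values()) or 1.0
rewards = {fp: POOL_ETH * score / total for fp, score in influence.items()}
print(rewards)
```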

Disclaimer: I am far from a technical person, as can likely be seen from the contents of this post. I have practically zero background in machine learning, distributed systems, or cryptography. With that said, the feasibility of building the system I propose above should (and I’m sure will) be called into question by those who are more technically minded. The purpose of this post is simply to get the juices flowing!

Cheers and looking forward to some more discussion!

Thanks so much @mdesim01 for proposing this interesting federated learning on EigenLayer idea! The idea that blockchain systems, particularly highly decentralized trust systems, will need to play a major role in democratizing both AI training and AI inference is a thesis we deeply believe in.

There are some technical questions that need to be answered before building this: (i) is EigenLayer being used for getting economic security for federated learning, or simply for getting decentralization security? If the former, we need to understand the slashability conditions. If the latter, we can simply choose lightweight home stakers, etc., to run the AI training (because this can ensure collusion resistance). One issue here, though, is how to ensure that the task is lightweight enough for a decentralized quorum to participate. (ii) How are the EigenLayer nodes getting the local data points for federated learning? One tantalizing possibility here is to use this very same system for inference (clients query random EL restaked nodes for ML inference queries), which preserves some amount of privacy for the querying node: instead of a single OpenAI system getting access to all query information, each client can send each query to random nodes to get some amount of privacy. Now the private data available at each node becomes the feeding point for the federated learning.

My high-level feeling is that federated ML systems which are byzantine resistant (resistant to a threshold number of byzantine nodes) may not yet be efficient enough to train very large models across a large number of nodes.

Thanks again @mdesim01 for proposing this great idea. ML-on-EL is one of our favorite topics inside the team! Looking forward to more ideas and brainstorming on this topic!

Agree with @sreeramkannan here; the design is not technically feasible. One federated learning / verifiable compute idea I found interesting from a while ago is [2102.05188] CaPC Learning: Confidential and Private Collaborative Learning; maybe it could be helpful to @mdesim01.

However, this does raise an interesting question around FL: instead of using cryptographic/statistical solutions to align the different participants and avoid malicious behaviors, where can economic security fit in to nudge behaviors? I think the catch here is accountability (slashability), which may just push the question back to the crypto/stats proving part.

All in all, I personally think we are still a long way from FL being usable at a larger scale, mainly because of a lack of performance given the security and privacy constraints.

Thank you for your feedback @sreeramkannan! After hearing Ilya Polosukhin from NEAR on The Chopping Block yesterday, I felt really stupid for having even posted this idea of decentralized training in the first place, as it seems so far outside the realm of possibility! It seems the consensus view is that it’s preposterous to expect a large set of personal laptops with no specialized GPUs to replace the need for large, centralized entities with access to massive clusters of NVIDIA A100s. Nonetheless, here’s my best attempt at answering your questions:

(i) is EigenLayer being used for getting economic security for federated learning, or simply for getting decentralization security? If the former, we need to understand the slashability conditions. If the latter, we can simply choose lightweight home stakers, etc., to run the AI training (because this can ensure collusion resistance). One issue here, though, is how to ensure that the task is lightweight enough for a decentralized quorum to participate.

I was initially envisioning that EigenLayer would be used for both economic security and decentralization security.

Economic security would be realized by slashing restakers if they fail to provide accurate gradient updates based on fine-tuning with their own private data; this would require generation of a ZK proof along with their gradient update submission, in order to prove that the restaker followed the training procedure prescribed by the AVS and computed the gradients accurately without revealing their private training data. My understanding now, however, is that generating a ZKP for training is so insanely computationally expensive that it is a pipe dream and likely to remain that way (would be happy to hear otherwise from someone with knowledge in the field of ZKML).

Decentralization security would be realized by choosing lightweight home stakers, but as you rightly said, ensuring that the training task is lightweight enough for home stakers to perform might be quite a challenge. In my previous post, I implied that training a local model on a small dataset would be a lightweight task by alluding to the fact that the recently released Alpaca 7B LLM used a simple training procedure seeded with only 175 human-written instruction examples, requiring limited resources available to an individual with no specialized hardware…but after some further reading, I subsequently found out that even this lightweight training process involved fine-tuning on 640GB of GPU memory (8 NVIDIA A100 80GB GPUs) rented from a cloud computing provider, which of course requires trust and centralization (I guess the AVS could purchase its own specialized GPU cluster owned and operated by its tokenholders and restakers who opt into it :slight_smile: !?). In short, it seems that decentralized training comes with a whole lot of challenges that may make it untenable, at least in the near to medium term.

(ii) How are the EigenLayer nodes getting the local data points for federated learning?

My thinking was that restakers would only train their Local Models on their own locally available data, not on anyone else’s. In that case, there would be no need to route data from external data providers to restakers; each restaker would simply use their own local data for training, which inherently preserves that data’s privacy since it never leaves the restaker’s local environment. This also simplifies the federated learning protocol, as there would be no need for additional privacy-preserving mechanisms to secure a data provider’s data during transmission or storage at the restakers’ end. The gradients from restakers’ locally trained models, trained only on their own data, would be aggregated to influence the parameters of the Global Model, which would then be used for the next iteration of local training. In this way, owners of valuable data would be incentivized to enter the restaking market so that they can better monetize their data by allowing it to influence the Global Model’s parameters, without making the data accessible to anyone (since the training is all done locally). Once again, though, actually proving that the local training was done correctly without revealing the data itself seems to be a major challenge.

One tantalizing possibility here is to use this very same system for inference (clients query random EL restaked nodes for ML inference queries), which preserves some amount of privacy for the querying node: instead of a single OpenAI system getting access to all query information, each client can send each query to random nodes to get some amount of privacy. Now the private data available at each node becomes the feeding point for the federated learning.

Decentralized inference absolutely seems like a far more feasible entry into the AI space, and routing client queries to random restaked nodes sounds like a really interesting way to prevent a single centralized entity from obtaining all of that private query information. I would think such a system would require ZK inference in order to prove that a given output was created by a given model; I know there are a number of people/teams making progress on that front (WorldCoin and Modulus Labs both come to mind).

I’d be curious to hear more of the EL Team’s ideas around how you guys were thinking a decentralized inference system on EigenLayer would get implemented:

  1. How would a restaker prove proper inference?
  2. Would each restaker generate outputs from a locally-stored model, so as to avoid reliance on the API of a single centralized entity like OpenAI?
  3. Would every restaker’s model be identical or would their models be tailored for certain tasks/queries that would require special routing depending upon the query being made by a client?
  4. Assuming restakers could operate using differing models, would there be any value in having multiple restakers receive a query, provide their outputs, and have those outputs synthesized into a single response returned to the client? Alternatively, could restakers’ outputs be scored so that only the “best” is returned, resulting in a gradation of rewards and slashings based on output scores (e.g., for a given output, the top 90th percentile split 50% of the reward, the 89th to 50th percentile split the remaining 50%, the bottom 10th percentile are slashed, and all others receive nothing; a rough sketch of this gradation follows this list)? This raises the question of how output scores are decided: by the client’s feedback, or by some method of internal scoring performed via consensus among the restakers’ models? I know BitTensor is working on a scoring system that relies on nodes scoring one another’s outputs.
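
For what it’s worth, here is a rough Python sketch of that gradation, using the example percentile cutoffs above. The `settle` function, pool size, and slash amount are my own illustrative names/values, and the scoring function itself (client feedback vs. peer consensus) is left open:

```python
# Rough sketch of percentile-based reward/slashing gradation; illustrative only.
import numpy as np

def settle(scores: dict[str, float], reward_pool: float, slash_amount: float):
    """Map each restaker's output score to a reward (+) or slash (-)."""
    vals = np.array(list(scores.values()))
    p90, p50, p10 = np.percentile(vals, [90, 50, 10])
    top = [r for r, s in scores.items() if s >= p90]       # split 50% of pool
    mid = [r for r, s in scores.items() if p50 <= s < p90]  # split other 50%
    bottom = [r for r, s in scores.items() if s <= p10]     # slashed
    payouts = {r: 0.0 for r in scores}                      # others get nothing
    for r in top:
        payouts[r] = 0.5 * reward_pool / len(top)
    for r in mid:
        payouts[r] = 0.5 * reward_pool / len(mid)
    for r in bottom:
        payouts[r] = -slash_amount
    return payouts

print(settle({"a": 0.95, "b": 0.8, "c": 0.6, "d": 0.4, "e": 0.1},
             reward_pool=1.0, slash_amount=0.2))
```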

…There are so many questions!

Anyway, thanks again for your feedback!

Thanks so much for writing this up.
