Compromises and benefits of Ethereum's switch to Proof-of-Stake

Ethereum's migration from Proof-of-Work to Proof-of-Stake is considered an extremely important milestone because developers regard it as a key prerequisite for several subsequent development goals. But the Merge and future sharding upgrades involve some trade-offs with regard to decentralisation.

Even before the network launched in 2015, Ethereum developers had a stated ambition to replace its Proof-of-Work (PoW) consensus mechanism with an alternative: Proof-of-Stake (PoS). While it was deemed too technically risky to start the network with anything other than PoW, the eventual migration to PoS has been a major development goal of Ethereum developers and a highly anticipated milestone on their roadmap.

Though subject to some changes, the development roadmap itself predates the launch of Ethereum and has existed since the network was only in its testnet phase. Many of the changes described in the evolving roadmap have already been implemented, but the migration to PoS, one of the most challenging and involved modifications, has yet to be completed, and is now delayed.

Proof-of-Stake delays in the past

The reasons for the delays are many and complex, but they more or less reduce to PoS proving much more technically challenging to safely implement than developers first thought. Many prototypes have been proposed and evaluated, but problems have kept emerging, necessitating ongoing multi-year bouts of bug-fixes and redesigns.

We can even observe the serial delays of the migration, often referred to as the Merge, in blockchain data. On the assumption that a PoS implementation would always be near at hand, developers gave the Ethereum protocol a hard-coded exponential increase in mining difficulty. This mechanism is called the "difficulty bomb" and is designed to make mining difficulty rise independently of mining revenues, forcing miners to abandon the PoW chain and leaving the alternative PoS chain as the only viable one.
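To make the exponential term concrete, here is a minimal Python sketch of a difficulty bomb. It follows the commonly described shape of Ethereum's rule, in which an extra amount of difficulty doubles every 100,000 blocks, but it is a simplification rather than client code: the function name `bomb_term`, the `delay_blocks` parameter and the example block heights are our own illustrative assumptions.

```python
PERIOD_LENGTH = 100_000  # blocks per doubling period (assumed for this sketch)


def bomb_term(block_number: int, delay_blocks: int = 0) -> int:
    """Extra difficulty contributed by the bomb at a given block height.

    delay_blocks is a hypothetical knob modelling a postponement: it makes
    the bomb behave as if the chain were that many blocks younger.
    """
    effective_block = max(0, block_number - delay_blocks)
    period = effective_block // PERIOD_LENGTH
    # No contribution during the first two periods, then doubling each period.
    return 2 ** (period - 2) if period >= 2 else 0


if __name__ == "__main__":
    # Compare the bomb with and without a (hypothetical) 3,000,000-block delay.
    for height in (1_000_000, 2_000_000, 3_000_000, 4_000_000):
        print(height, bomb_term(height), bomb_term(height, delay_blocks=3_000_000))
```

In this sketch, raising `delay_blocks` shrinks the term back to a negligible size, and the exponential growth then simply starts again from a later block. That is the pattern of repeated build-up and reset described in this section.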

Previous attempts at shifting Ethereum to Proof-of-Stake / Source: Bloomberg, CoinShares

Three separate "detonations" of the difficulty bomb can be observed in the above figure. They are visible both as exponential increases in block times and as rapid divergences between hashrate and difficulty. However, since PoS has never been ready to implement at the time of a detonation (or an upcoming detonation), Ethereum developers have rolled back the difficulty bomb on five separate occasions.

All this being said, no specific date exists for the Merge at the time of writing, and the Merge has indeed just been delayed again, from H1 2022 tentatively to H2 2022. Even so, there are emerging signs that perhaps warrant cautious optimism that the PoS implementation might actually be forthcoming this time around.

Future sharding also rests on the Merge

In addition to PoS (the Merge), the second major part of Ethereum’s next phase is the introduction of sharding. Sharding is a blockchain scalability technique whereby the protocol increases its throughput by splitting the blockchain into many blockchains (shards) and allowing individual computers to choose which of them to work on. Sharding allows the total throughput of the protocol to increase without increasing the computational demand on the individual computers working on it. In other words, Ethereum will be able to process a lot more information while still hoping to rely on relatively casual users providing distributed processing power through regular consumer computers.

Vertical and horizontal scaling split up the chain in different ways / Source: CoinShares

In computer science, sharding is referred to as a horizontal scaling technique. Horizontal scaling is characterised by increasing throughput/capability by adding more individual computers to a network. Its alternative is vertical scaling, whereby increased scale is achieved only by increasing the throughput/capability of the individual network computers.

In the world of blockchain protocols, increasing block sizes or increasing block frequencies (reducing the targeted time between blocks) are examples of vertical scaling, as they require every participating computer to be very powerful (which is expensive). Conversely, sharding adds throughput/capability, meaning the ability to process many more transactions and smart contracts per second at much lower cost, by adding more network participants, on the assumption that those participants spread themselves across separate shards.
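The difference between the two approaches can be sketched with a toy model. The Python below is an illustration under our own assumptions, not a description of Ethereum's design: the `Network` class, its field names and the figure of 15 transactions per second are all hypothetical and chosen only to show where the extra load lands.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Network:
    shards: int           # number of parallel chains (1 means unsharded)
    tps_per_chain: float  # transactions per second handled by each chain

    @property
    def total_tps(self) -> float:
        # Throughput of the whole protocol across all chains.
        return self.shards * self.tps_per_chain

    @property
    def load_per_single_shard_node(self) -> float:
        # A node that follows one chain only processes that chain's traffic.
        return self.tps_per_chain


def scale_vertically(net: Network, factor: float) -> Network:
    """Bigger or faster blocks: every chain, and so every node, does more."""
    return Network(net.shards, net.tps_per_chain * factor)


def scale_horizontally(net: Network, factor: int) -> Network:
    """More shards: more chains, same per-chain workload."""
    return Network(net.shards * factor, net.tps_per_chain)


if __name__ == "__main__":
    base = Network(shards=1, tps_per_chain=15.0)  # arbitrary starting point
    for label, net in (("vertical", scale_vertically(base, 4)),
                       ("horizontal", scale_horizontally(base, 4))):
        print(label, net.total_tps, net.load_per_single_shard_node)
```

Both scaled networks reach the same total throughput in this model, but only vertical scaling raises the workload of each individual node. The horizontal case keeps that workload flat precisely because each node is assumed to follow a single shard.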

Proof-of-Stake re-introduces the need for trust

Discontinuing PoW mining incurs important trade-offs in return for a drastic reduction in energy consumption, which, under our current global electricity production stack, also means massively reduced carbon emissions. Broadly summarised, Ethereum will suffer a reduction in, or the elimination of, censorship resistance, trust minimisation and decentralisation as a result of implementing PoS. It will also suffer a large increase in its attack surface due to the increased complexity of its code. Attackers will have more potential exploits to look for.

PoS reintroduces the requirement to trust other network participants when joining or re-joining the network. This is because stake is a quantity internal to the blockchain network. That is, you cannot know who has what stake unless you know which blockchain is the correct one. This means that before a user can validate whether the blockchain before them has been correctly executed, they must first trust someone else to tell them what the blockchain is in the first place. This is a problem if a new or returning user is faced with a choice between multiple conflicting blockchains presented to them by a malicious actor. Since a PoS blockchain costs nothing to create, fake histories that are otherwise valid can be created and presented to outsiders en masse by dishonest participants.

Work, on the other hand, is external to the system. This means that if two conflicting blockchains are presented to a new or returning user, they can trivially check for themselves which blockchain is the correct one simply by looking at the amount of accumulated work (the one with the most accumulated work is by definition the correct one). In a PoS system the only way to get around this is to introduce checkpoints, which, again, require trusting other participants to tell you what the correct blockchain was at various times in the past. PoS therefore creates a need to trust other network participants through multiple new avenues, which it must trade off against its benefits.
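The accumulated-work check described above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions: each candidate chain is represented as a plain list of per-block difficulty values, the function names are our own, and a real client would also verify that every block's proof of work and contents are valid before counting it.

```python
from typing import List, Sequence


def accumulated_work(chain: Sequence[int]) -> int:
    """Total work in a chain, taken here as the sum of per-block difficulties."""
    return sum(chain)


def heaviest_chain(candidates: List[Sequence[int]]) -> Sequence[int]:
    """Pick the candidate with the most accumulated work, not the most blocks."""
    return max(candidates, key=accumulated_work)


if __name__ == "__main__":
    honest = [100, 100, 100]  # 3 blocks, 300 units of work
    forged = [10] * 20        # 20 blocks, but only 200 units of work
    print(heaviest_chain([forged, honest]) is honest)
```

The comparison needs no input from any other participant: the newcomer only has to add up work that was demonstrably expended outside the system. The forged chain in the example is longer, yet it loses because producing it cost less work.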

Censorship resistance and centralisation

PoS also trades off its ability to resist censorship. Censorship resistance, in this context and as the term is frequently used by participants in the crypto ecosystem, means the ability of the network to resist a network participant trying to prevent some or all transactions from being entered into the transaction record. The only effective way to censor the network in this manner is to control more than 50% of block production, meaning miners in a PoW system and stakers in a PoS system. An entity controlling a majority of block production can simply refuse to enter some or all transactions into the blockchain, effectively censoring any or all parties.

In a PoW system, miners need to consume a resource external to the system and also require external capital (hardware). Both can be procured without the majority miner knowing anything about it, meaning that there exists a mechanism by which a censor can lose its place as the majority miner: competitors can bring enough new mining capacity online to outweigh it.

In a PoS system, no such recourse exists within the protocol rules. As soon as an entity achieves a majority stake in the system, it will perpetually increase its proportion of the total stake, and nothing can force it to sell any of that stake, meaning that its position is impossible to dislodge. The only way to recover from a situation like this is by recourse to a social consensus hard fork, which is just another way of saying centralised management by a select committee, which is by definition the opposite of decentralised.
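A rough numerical sketch shows why such a position entrenches itself. The Python below rests on a strong assumption that we state plainly: the majority staker uses its control of block production to collect every reward while other stakers earn nothing. The function name, the flat reward per epoch and the starting stakes are all hypothetical.

```python
from typing import List


def majority_share_over_time(majority_stake: float,
                             other_stake: float,
                             reward_per_epoch: float,
                             epochs: int) -> List[float]:
    """Majority's fraction of total stake at the start of each epoch.

    Assumption for illustration only: the majority captures the whole
    reward every epoch, and nobody adds or withdraws stake.
    """
    shares = []
    for _ in range(epochs):
        shares.append(majority_stake / (majority_stake + other_stake))
        majority_stake += reward_per_epoch  # rewards compound into stake
    return shares


if __name__ == "__main__":
    for epoch, share in enumerate(majority_share_over_time(51.0, 49.0, 1.0, 5)):
        print(epoch, round(share, 4))
```

Under these assumptions the majority's share rises every epoch and approaches, without ever reaching, 100%. Nothing inside the model can reverse the trend, which is why the only remedy lies outside the protocol rules.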

Sharding further decreases decentralisation

The trade-off for both horizontal and vertical scaling of a blockchain network, versus keeping throughput low, is that the network as a whole becomes more like a client-server network than a peer-to-peer network, losing out on important decentralisation benefits. Why is that? Briefly explained, in order to be a full peer in a blockchain network - that is, someone who participates in the network without the need to trust any other network participant - a user must be able to fully verify every single event that happens on the network.

With a multitude of blockchains (or a single huge one) to verify, the computational and bandwidth resources required to be a full peer increase dramatically, making fewer and fewer users able to afford the privilege of being full peers. This results in the reintroduction of trust, as all users who are now unable to verify all shards (or a huge single blockchain) must trust other users to tell them the truth about what happened on other shards (or on the huge blockchain they can no longer afford to self-verify).

A high level of decentralisation is a sought-after yet traditionally hard-to-define quality of peer-to-peer networks. The reason for its desirability is that a network with as many peers as possible becomes practically impossible to shut down, because every one of its many participants must be disabled for the network to be fully extinguished. So the more costly it is to be a full peer, the less decentralised a network will generally be, which leads to lower security in exchange for higher scalability, a trade-off also known as the Blockchain Trilemma.