What is distributed computing? | Quanta Magazine

No device is an island: Your everyday computing needs depend on more than just the microprocessors in your computer or phone. Our modern world relies on “distributed computing,” which divides the computational load among many different machines. The technique moves data back and forth in a complex choreography of digital beats – a dance that has shaped the past, present and likely future of the Internet.

In 1973, Xerox engineers invented Ethernet, a connection that allowed the first personal computers to communicate with a shared printer. Ethernet gave rise to local area networks (LANs), which let users share files in their homes and offices by the late 1970s.

Around the same time, the Advanced Research Projects Agency (ARPA) was developing a more expansive distributed network. ARPANET, as it was called, could distribute information over telephone lines, creating a much wider network than a LAN. But it had limitations: the machines involved had to be compatible, and all the connections were wired. Defense officials wanted tanks, planes and ships to be able to communicate wirelessly, and scientists wanted to bring networking capabilities to the masses. The challenge was to establish rules that would standardize how any type of machine communicates with any other over any type of connection.

In 1974, Vinton Cerf and Robert Kahn documented their proposal for a set of rules they called the Transmission Control Protocol/Internet Protocol, or TCP/IP. These instructions made it possible to break information into small “packets,” send them across the network, and reassemble them at their destination.

To this day, all Internet data transfers begin with a three-way TCP handshake. The sender’s machine sends a “sync” packet to the receiver, as if to say “listen.” The recipient responds with an acknowledgment, and the sender confirms with an acknowledgment of its own. Then the message begins in earnest. The TCP portion of the protocol divides the outgoing data into a sequence of packets, carefully labeling each one so that they can be reassembled later. IP then determines the best route for each packet. On the recipient’s machine, TCP reassembles the information and checks it for accuracy, ensuring that no packets arrive out of order or go missing.
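To make the sequencing idea concrete, here is a minimal Python sketch, not real TCP but an illustration of the bookkeeping described above: a message is split into numbered packets, the packets arrive in an arbitrary order (standing in for independent routes), and the receiver uses the sequence numbers to restore the original and confirm nothing is missing. The packet size and function names are invented for the example.

```python
import random

def packetize(message: bytes, size: int = 8):
    """Split a message into numbered 'packets': (sequence number, payload)."""
    return [(seq, message[i:i + size])
            for seq, i in enumerate(range(0, len(message), size))]

def reassemble(packets):
    """Sort packets by sequence number and verify none are missing or duplicated."""
    ordered = sorted(packets, key=lambda p: p[0])
    if [seq for seq, _ in ordered] != list(range(len(ordered))):
        raise ValueError("missing or duplicate packet detected")
    return b"".join(payload for _, payload in ordered)

message = b"No device is an island: packets find their own way home."
packets = packetize(message)
random.shuffle(packets)                  # packets may take different routes and arrive out of order
print(reassemble(packets) == message)    # True: the receiver restores the original message
```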

In the early 1980s, physicists at Europe’s CERN particle physics laboratory showed that distributed computing could be about much more than just communication. Their computational problems were too complex for a single device to handle effectively. Hundreds of CERN scientists would collaborate on a single experiment, and the giant detectors they used produced a huge amount of raw data that had to be analyzed. Distributed computing enabled these researchers to efficiently divide the computational load between giant mainframe computers and individual workstations, even when the machines came from different manufacturers and ran different operating systems. This type of work paved the way for reliable intercontinental collaboration and, ultimately, email.
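The underlying pattern is simple: split a large dataset into independent chunks, let separate workers analyze each chunk, then combine the partial results. The Python sketch below uses a single machine’s process pool as a stand-in for a network of workstations; the chunk sizes, worker count and the “analysis” itself are placeholder choices for illustration.

```python
from concurrent.futures import ProcessPoolExecutor

def analyze(chunk):
    """Stand-in for an expensive analysis step on one slice of the raw data."""
    return sum(x * x for x in chunk)

def split(data, n_workers):
    """Divide the dataset into roughly equal chunks, one per worker."""
    size = max(1, len(data) // n_workers)
    return [data[i:i + size] for i in range(0, len(data), size)]

if __name__ == "__main__":
    data = list(range(1_000_000))                      # pretend this is detector output
    chunks = split(data, n_workers=4)
    with ProcessPoolExecutor(max_workers=4) as pool:
        partial_results = pool.map(analyze, chunks)    # each worker handles one chunk
    print(sum(partial_results))                        # combine the partial results
```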

Decades later, as global business moved online, the ultimate distributed computing system was born: the cloud. Cloud servers are basically other people’s computers that you can rent to store and process data for you. Thanks to these systems, technology companies no longer needed their own servers; they could simply rent computing power, on demand, from a distributed network. Cloud servers are a bit like taxis: you use them only when you need them, which is often more efficient than owning a car that sits unused 95% of the time.

Distributed computing also enables better cryptographic systems. An encryption key can be more secure when it is generated by a network of computers in which no single device ever knows the entire secret. This collaborative approach also prevents data manipulation on the blockchain, a technology that stores transactions redundantly across many network nodes.
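One simple way to see how a secret can be held by many machines without any single one of them knowing it is additive (XOR-based) secret sharing. The sketch below illustrates that general idea, not the specific protocol any particular system uses; the share count and key length are arbitrary choices for the example.

```python
import secrets

def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))

def split_secret(secret: bytes, n_shares: int):
    """Split a secret into n random-looking shares that XOR back to the secret."""
    shares = [secrets.token_bytes(len(secret)) for _ in range(n_shares - 1)]
    last = secret
    for s in shares:
        last = xor_bytes(last, s)
    return shares + [last]

def recover(shares):
    """Combine all shares to reconstruct the original secret."""
    out = shares[0]
    for s in shares[1:]:
        out = xor_bytes(out, s)
    return out

key = secrets.token_bytes(16)            # a 128-bit secret key
shares = split_secret(key, n_shares=3)   # one share per machine
assert recover(shares) == key            # only the full set of shares recovers the key
# Any single share looks like random noise and reveals nothing about the key on its own.
```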

These benefits—performance, security, and reliability—suggest that our computers will likely continue to distribute their computations for the foreseeable future.