What is IPFS?

Advanced

10m March 23, 2023

IPFS (InterPlanetary File System) is a distributed, peer-to-peer system for storing and accessing content. Unlike traditional Internet services, it is not run by any central institution; it is decentralized. Instead, anyone can join the system as a node, contributing idle storage space on their own devices to store files and share them with others.

For example, when you look up a term on Wikipedia, your computer asks Wikipedia's server to send you a particular web page. In IPFS, by contrast, your computer asks other computers around the world to share the data with you.

IPFS was launched by Protocol Labs in 2015. The team is striving to build a system that works across places as disconnected or as far apart as planets. That is where the name InterPlanetary File System comes from.

Mechanism of Search on IPFS: Content Addressing

The main difference between IPFS and the traditional web is that IPFS uses content addressing rather than location addressing. That is, users identify content by what it is rather than by where it is located.

For example, when you search for 'Bitcoin' on TokenInsight, you are taken to the URL (Uniform Resource Locator) https://tokeninsight.com/en/coins/bitcoin/overview, which is a path to a specific page on TokenInsight's domain (location addressing). In IPFS, you instead ask other participants who have the page to share it with you. Content in IPFS can be files, websites, applications, or metadata.

Here is an example from the IPFS documentation. When you look for a book in the library, you often ask for it by the title; that's content addressing because you're asking for what it is. If you were using location addressing to find that book, you'd ask for it by where it is: "I want the book that's on the second floor, first stack, third shelf from the bottom, four books from the left." If someone moved that book, you'd be out of luck!
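
The library analogy can be sketched in a few lines of Python. This is only an illustration: the shelf path, the book text, and the `content_key` helper are made up for the example, and a plain SHA-256 digest stands in for a real IPFS identifier.

```python
import hashlib

# Location addressing: the key is a place, so the lookup breaks if the content moves.
shelves = {"floor2/stack1/shelf3/slot4": b"Moby-Dick, full text"}


def content_key(data: bytes) -> str:
    """Hypothetical helper: derive the lookup key from the content itself."""
    return hashlib.sha256(data).hexdigest()


# Content addressing: the key depends only on what the book is.
book = b"Moby-Dick, full text"
library = {content_key(book): book}

# Someone moves the book to another shelf: the old location no longer resolves...
shelves["floor1/stack9/shelf1/slot1"] = shelves.pop("floor2/stack1/shelf3/slot4")
found_by_location = shelves.get("floor2/stack1/shelf3/slot4")

# ...but the content key still finds it, wherever it is stored.
found_by_content = library.get(content_key(book))
```

After the move, `found_by_location` is `None`, while `found_by_content` is still the book.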

Operation Mechanism of IPFS

When a user adds content to IPFS, the content is given a unique identifier called a CID and made available to the rest of the network. To keep the network fast, IPFS may split the content into pieces and store each piece separately under its own CID. These pieces are organized and linked together through a data structure called a Merkle DAG.

To retrieve specific content, a user requests it by its CID (content addressing). The system then finds which participants hold the content and asks them to deliver it to the requester.

Content Storage on IPFS

Content Identifier

A Content Identifier (CID) is a unique identifier for every piece of content in the IPFS network. It is a cryptographic hash derived from the content itself. Therefore, the same content added to IPFS by different participants produces the same CID (provided it is imported with the same settings), and any change to the content results in a different CID.
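
A minimal sketch of this property, assuming a plain SHA-256 hex digest as a stand-in for a CID. Real CIDs carry extra information such as the hash function and encoding used, so `toy_cid` is a simplification for illustration, not the actual CID format.

```python
import hashlib


def toy_cid(data: bytes) -> str:
    """Simplified stand-in for a CID: the SHA-256 digest of the content, in hex."""
    return hashlib.sha256(data).hexdigest()


alice_copy = b"hello ipfs"   # added by one participant
bob_copy = b"hello ipfs"     # the same bytes added by another participant
edited = b"hello ipfs!"      # one character appended

same = toy_cid(alice_copy) == toy_cid(bob_copy)    # identical content, identical ID
changed = toy_cid(alice_copy) != toy_cid(edited)   # any change yields a new ID
```

Both `same` and `changed` evaluate to `True`: the identifier depends only on the bytes, not on who added them.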

Merkle Directed Acyclic Graph

To organize the CIDs in the network, IPFS uses a data structure called a Merkle DAG (Merkle Directed Acyclic Graph), which is a special kind of DAG.

What is a Merkle DAG?
A graph is a structure that shows the relationships between objects. It consists of objects (also known as nodes) and edges connecting them (see the graph below). In a directed graph, each edge has a direction. Acyclic means that the graph contains no loops or cycles. A Merkle DAG is a DAG in which each node has an identifier that is the result of hashing the node's contents.

To keep the network fast, IPFS usually splits content into small pieces (e.g., 256 KB), and each piece is stored in one block (not to be confused with a 'block' in a distributed ledger) with its own CID. When a user requests the content, their computer fetches the blocks and reassembles them into the original content.

IPFS uses the Merkle DAG to represent the relationship between a piece of content and its blocks, and also to organize content hierarchically (much like files and folders).
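
The splitting and reassembly step can be sketched as follows. This assumes fixed-size 256 KB chunks and again uses a SHA-256 hex digest as a simplified identifier; `split_into_blocks` and `reassemble` are hypothetical helper names, not IPFS functions.

```python
import hashlib

CHUNK_SIZE = 256 * 1024  # 256 KB, the example size mentioned above


def split_into_blocks(data: bytes, chunk_size: int = CHUNK_SIZE) -> list:
    """Cut data into fixed-size chunks and pair each chunk with its own identifier."""
    blocks = []
    for start in range(0, len(data), chunk_size):
        chunk = data[start:start + chunk_size]
        blocks.append((hashlib.sha256(chunk).hexdigest(), chunk))
    return blocks


def reassemble(blocks: list) -> bytes:
    """Join the chunks back together in their original order."""
    return b"".join(chunk for _, chunk in blocks)


content = bytes(range(256)) * 2500        # 640,000 bytes of sample data
blocks = split_into_blocks(content)       # two full blocks plus one partial block
restored = reassemble(blocks)
```

Here the 640,000-byte input yields three blocks, and `restored` is byte-for-byte equal to `content`.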

The features of Merkle DAG:

Merkle DAGs can only be constructed from bottom to top (from leaf node to root node).
Any change in a node will alter its CID and, in turn, the CIDs of all its ancestors in the DAG.
A node can have more than one parent, so the same leaf node may be reachable from multiple root nodes.
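
The first two features can be illustrated with a toy one-level folder. This is a sketch under stated assumptions: `node_id` and `build_folder` are hypothetical helpers, nodes are serialized as JSON purely for the example, and the identifiers are SHA-256 hex digests rather than real CIDs.

```python
import hashlib
import json


def node_id(data: bytes, links: list) -> str:
    """Identifier of a node: a hash over its own data and the IDs of its children."""
    payload = json.dumps({"data": data.hex(), "links": sorted(links)}).encode()
    return hashlib.sha256(payload).hexdigest()


def build_folder(files: dict) -> tuple:
    """Build bottom-up: leaves are hashed first because the root needs their IDs."""
    leaf_ids = {name: node_id(data, []) for name, data in files.items()}
    root_id = node_id(b"", list(leaf_ids.values()))
    return root_id, leaf_ids


root_v1, leaves_v1 = build_folder({"a.txt": b"alpha", "b.txt": b"beta"})
root_v2, leaves_v2 = build_folder({"a.txt": b"alpha", "b.txt": b"beta v2"})

unchanged_leaf = leaves_v1["a.txt"] == leaves_v2["a.txt"]   # untouched file keeps its ID
changed_leaf = leaves_v1["b.txt"] != leaves_v2["b.txt"]     # edited file gets a new ID
changed_root = root_v1 != root_v2                           # and so does its ancestor
```

Editing `b.txt` changes its identifier and the root's, while `a.txt` keeps the same identifier and could be shared by both versions of the folder.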

How is Content found and shared on IPFS

Distributed Hash Table

Once content is stored, IPFS uses a DHT (Distributed Hash Table) to help users locate it, namely to find which peers hold the content you want and where those peers are. A hash table is a database of keys and values in which each value is matched with a unique key, so that a value can be looked up by its key. In IPFS, the keys are CIDs and the values record which peers can provide the corresponding content. A distributed hash table spreads this table across many peers, each of which holds part of it, and you query these peers by key to find the content.

A DHT is well suited to finding information in a massive data set because the keys have a consistent format. In addition, because the data is partitioned across peers, a lookup only has to contact a small subset of them, which speeds up the search.
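
A toy model of the idea, not the real IPFS DHT. It assumes that each key is assigned to the peer whose hashed name is closest to the key by XOR distance, and that the stored values are sets of provider names. The `ToyDHT` class, the peer names, and the example CIDs are all hypothetical, and a real DHT would route queries between peers over the network rather than look everything up in one process.

```python
import hashlib


def key_of(text: str) -> int:
    """Map a peer name or a CID string into the same numeric key space."""
    return int.from_bytes(hashlib.sha256(text.encode()).digest(), "big")


class ToyDHT:
    def __init__(self, peer_names):
        # Each peer holds its own small table: key -> set of provider names.
        self.tables = {name: {} for name in peer_names}

    def responsible_peer(self, key: int) -> str:
        # The peer whose hashed name is closest to the key by XOR distance.
        return min(self.tables, key=lambda name: key_of(name) ^ key)

    def provide(self, cid: str, provider: str) -> None:
        # Record on the responsible peer that `provider` holds the content.
        key = key_of(cid)
        self.tables[self.responsible_peer(key)].setdefault(key, set()).add(provider)

    def find_providers(self, cid: str) -> set:
        # Only the responsible peer's table needs to be consulted.
        key = key_of(cid)
        return self.tables[self.responsible_peer(key)].get(key, set())


dht = ToyDHT(["peer-a", "peer-b", "peer-c", "peer-d"])
dht.provide("example-cid-1", "peer-c")
dht.provide("example-cid-1", "peer-a")
providers = dht.find_providers("example-cid-1")   # the two peers that announced it
missing = dht.find_providers("example-cid-2")     # nobody announced this one
```

Because every participant computes the same responsible peer for a given key, a lookup goes straight to the part of the table that can answer it instead of asking every peer.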

Once users find the location of the content they want, they can connect to the peer, send a request, and receive the content (this exchange is handled by an IPFS module called Bitswap). After receiving the content, they can verify it by hashing it and comparing the result with the CID.
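
The verification step might look like the sketch below. It does not use Bitswap or any real IPFS API: a peer is modeled as a plain dictionary, `fetch_verified` is a hypothetical helper, and a SHA-256 hex digest stands in for the CID.

```python
import hashlib


def toy_cid(data: bytes) -> str:
    """Simplified stand-in for a CID: the SHA-256 digest of the content, in hex."""
    return hashlib.sha256(data).hexdigest()


def fetch_verified(cid: str, peer_store: dict) -> bytes:
    """Fetch content from a peer and accept it only if it hashes to the requested ID."""
    data = peer_store[cid]
    if toy_cid(data) != cid:
        raise ValueError("content does not match the requested identifier")
    return data


original = b"a page of content"
cid = toy_cid(original)

honest_peer = {cid: original}
tampering_peer = {cid: b"a page of altered content"}

received = fetch_verified(cid, honest_peer)

try:
    fetch_verified(cid, tampering_peer)
    tamper_detected = False
except ValueError:
    tamper_detected = True
```

Since the identifier is derived from the content, the requester does not have to trust the peer: altered data fails the check and is rejected.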

Application of IPFS

Several Web3 projects in different fields use IPFS as storage infrastructure, such as Filecoin (storage service), Audius (decentralized music service), Pinata (NFT hosting service), OpenBazaar (peer-to-peer e-commerce platform), and Morpheus.Network (supply chain network service).

