Visualization of the Farcaster Network

In a recent Moment of Zen podcast episode, Balaji Srinivasan noted that there aren't really maps for social networks the way there are for physical spaces. I think this is because some features of geographic maps make instinctive sense but have no counterparts in the social network world (think distance, borders, or direction). With the right definitions, though, you can bring some of these concepts into social network land.

In a social network, the concept of ‘distance’ is pretty abstract, since nodes (users) are related to each other through hugely complicated directed graphs of interactions. These interactions can be positive, negative, or entirely neutral, and they can also differ by type (think ‘comment’ vs. ‘like’). Ideally, our map would position users so that similar ones sit near one another while different ones sit farther apart. This description of ‘distance’ still isn’t great, though, since ‘similar’ and ‘different’ are themselves substantial abstractions: how can one define those terms?
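To make the ‘directed graph of interactions’ idea concrete, here is a minimal Python sketch: each interaction is a (source, target, kind) record, and the records are collapsed into directed, weighted edges. The per-type weights in `KIND_WEIGHTS` are hypothetical, made up only to show how a ‘comment’ and a ‘like’ could count differently; ‘recast’ is Farcaster’s equivalent of a retweet and falls back to a default weight here.

```python
from collections import defaultdict

# Hypothetical per-type weights, for illustration only. Setting every weight
# to 1.0 reduces this to plain interaction counts.
KIND_WEIGHTS = {"comment": 1.0, "like": 0.5}


def build_interaction_graph(events):
    """Collapse (source, target, kind) events into directed, weighted edges.

    Returns {source: {target: total_weight}}. Edges are directed: A commenting
    on B adds weight to A -> B only, never to B -> A.
    """
    graph = defaultdict(lambda: defaultdict(float))
    for source, target, kind in events:
        if source == target:
            continue  # skip self-interactions
        graph[source][target] += KIND_WEIGHTS.get(kind, 1.0)  # unknown kinds count as 1.0
    return {src: dict(dsts) for src, dsts in graph.items()}


events = [
    ("alice", "bob", "comment"),
    ("alice", "bob", "like"),
    ("bob", "alice", "like"),
    ("carol", "bob", "recast"),
    ("carol", "carol", "like"),
]
graph = build_interaction_graph(events)
print(graph)
# {'alice': {'bob': 1.5}, 'bob': {'alice': 0.5}, 'carol': {'bob': 1.0}}
```

Note the asymmetry: alice’s edge to bob is heavier than bob’s edge back to alice, which is exactly why a single ‘distance’ between two users has to be defined rather than read off the graph.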

Consider two users, A and B. One idea is that if A and B interact with each other often, they are similar and should therefore be close together on our map. However, consider a scenario in which they never talk to each other but post very similar content (cat videos). To an outsider, the two users’ pages look almost identical, so they arguably should be considered similar too. So which definition is correct?

I think the best way to resolve this is to examine how distributed the network is. For example, a YouTube-style website revolves around ‘parent’ users who publish content and then receive hundreds or thousands of views, comments, and likes. You might describe a social network of this kind as centralized, since the content is largely generated by a few very prolific users, and every post one of these ‘power users’ publishes receives orders of magnitude more interactions than a typical user’s post. Here, I think it’s best to base similarity on similarity of content, not of interactions. Even then, ‘similarity of content’ is very ambiguous, but that is beyond the scope of this post.

Contrast this with something like WhatsApp, where the average post reaches tens to hundreds of people at most. You would describe this kind of social network as very distributed; it doesn’t make sense for someone to be a ‘WhatsApp influencer.’ A single post won’t receive huge numbers of interactions, and posts are made at higher frequencies. Here it likely makes more sense to group users by their interactions, not by the content of their conversations.
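One rough way to put a number on ‘how distributed’ a network is (not something this post actually computes) is the share of all received interactions that goes to the top 1% of users. The function name `top_share` and the 1% cutoff are illustrative choices, and the two toy networks below are invented to mirror the YouTube-style and WhatsApp-style extremes.

```python
import math


def top_share(received_counts, top_fraction=0.01):
    """Fraction of all received interactions that go to the top `top_fraction` of users.

    Values near 1.0 suggest a centralized, YouTube-like network; values near
    `top_fraction` suggest interactions are spread evenly, WhatsApp-style.
    """
    if not received_counts:
        raise ValueError("need at least one user")
    counts = sorted(received_counts, reverse=True)
    total = sum(counts)
    if total == 0:
        return 0.0
    k = max(1, math.ceil(top_fraction * len(counts)))  # always include at least one user
    return sum(counts[:k]) / total


centralized = [10_000] + [10] * 99  # one creator, 99 small accounts
distributed = [50] * 100            # everyone gets the same attention
print(round(top_share(centralized), 3))  # 0.91
print(top_share(distributed))            # 0.01
```

A Twitter- or Farcaster-style network would be expected to land somewhere between these two extremes.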

Finally, there are social networks like Twitter (or decentralized alternatives like Farcaster) that lie somewhere in the middle. Posts and interactions can be quite frequent, but a single post can still reach thousands or millions of users. Content here is inherently interaction-based, so a huge chunk of even a power user’s posts will themselves be interactions (replies to other people’s posts, for instance). For this kind of network, a hybrid approach would probably work best.
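A hybrid score could be as simple as a weighted blend of the two notions of similarity from the cat-video example. This is a hypothetical sketch: the blend weight `alpha`, the topic vectors, and the use of cosine similarity for content are all assumptions, since this post deliberately leaves ‘similarity of content’ undefined and the Farcaster map itself uses interactions only.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def hybrid_similarity(interaction_score, content_u, content_v, alpha=0.5):
    """Blend an interaction-based score with content-based cosine similarity.

    alpha = 1.0 -> interactions only (WhatsApp-style network)
    alpha = 0.0 -> content only (YouTube-style network)
    Both inputs are assumed to already be on comparable scales.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    content_score = cosine_similarity(content_u, content_v)
    return alpha * interaction_score + (1 - alpha) * content_score


# Two users who never interact but post nearly identical content (cat videos).
# Hypothetical topic mix: [cats, crypto, sports]
cat_fan_1 = [0.9, 0.1, 0.0]
cat_fan_2 = [0.8, 0.2, 0.0]
print(round(hybrid_similarity(0.0, cat_fan_1, cat_fan_2, alpha=0.0), 3))  # 0.991
print(hybrid_similarity(0.0, cat_fan_1, cat_fan_2, alpha=1.0))            # 0.0
```

The same pair of users is nearly identical under a content-only view and completely unrelated under an interaction-only view; `alpha` is the knob that decides where a given network sits between the two.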

How This Works

This visualization shows the ~1,000 most prolific users on Farcaster. If you aren’t familiar, Farcaster is a ‘protocol for decentralized social apps,’ the most popular of which is a Twitter-like social media platform. Because the network is decentralized, its underlying data is easy to access, which made it the perfect social network to try something like this out. Each node on the map is a distinct user, and the distance between two users signifies how similar they are (closer = more similar). I know I just said that both content and interactions should define similarity for this kind of network, but this map is really just a proof of concept: I based similarity solely on the number of interactions two users had with each other, relative to their total number of outgoing interactions.

If you’re curious, here’s the actual formula I used to calculate the similarity of User A to User B:

$$\text{similarity}_{AB} = \ln\left(1 + \frac{5i}{r}\right)$$

Here, i is the number of interactions user A had with user B, and r is the total number of interactions A had with anyone. Because this score is directional (similarity_AB generally differs from similarity_BA), I then combined the two scores into a weighted average, weighted according to the total number of interactions anyone had with A versus the total number anyone had with B. Without this step, more ‘famous’ users would be drowned out (meaning they wouldn’t be similar to many people) by users who only talk within small groups. Finally, I compare this combined value to a fixed, somewhat arbitrary threshold, which gives a boolean value signifying whether the two users are similar.
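Here is a minimal Python sketch of the scoring pipeline: the directional formula, the weighted average of both directions, and the threshold. The exact weighting is not spelled out above, so this assumes each direction is weighted by the incoming-interaction total of the user it points at; `SIMILARITY_THRESHOLD = 0.6` is likewise a hypothetical stand-in for the ‘arbitrary number’, and the example counts are invented.

```python
import math

# Hypothetical cutoff: the post only says the combined score is compared to
# "some arbitrary number".
SIMILARITY_THRESHOLD = 0.6


def directed_similarity(i, r):
    """similarity_AB = ln(1 + 5i / r).

    i: number of interactions A had with B
    r: total number of interactions A had with anyone
    """
    if r == 0:
        return 0.0  # A never interacted with anyone, so A is similar to no one
    return math.log(1 + 5 * i / r)


def pair_similarity(i_ab, r_a, i_ba, r_b, in_a, in_b):
    """Weighted average of similarity_AB and similarity_BA.

    in_a, in_b: total interactions anyone had with A and with B.
    ASSUMPTION: each direction is weighted by the incoming total of the user
    it points at, so a fan's score toward a famous user dominates.
    """
    s_ab = directed_similarity(i_ab, r_a)
    s_ba = directed_similarity(i_ba, r_b)
    total_in = in_a + in_b
    if total_in == 0:
        return (s_ab + s_ba) / 2  # no incoming data: fall back to a plain average
    return (in_b * s_ab + in_a * s_ba) / total_in


def are_similar(i_ab, r_a, i_ba, r_b, in_a, in_b):
    """Boolean label later used as training data for the embedding step."""
    return pair_similarity(i_ab, r_a, i_ba, r_b, in_a, in_b) >= SIMILARITY_THRESHOLD


# A fan (A) spends 40 of 100 interactions on a famous account (B);
# B spends 1 of 5,000 interactions on A.
args = dict(i_ab=40, r_a=100, i_ba=1, r_b=5000, in_a=10, in_b=20_000)
print(round(directed_similarity(40, 100), 3))  # 1.099 (ln 3)
print(round(pair_similarity(**args), 3))       # 1.098
print(are_similar(**args))                     # True
```

A plain, unweighted average of the two directions here would be about 0.55, below the threshold, which is the ‘famous users get drowned out’ effect the weighting is meant to counter.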

Once I had a list of which pairs of users are similar and which aren’t, I used a Siamese neural net with a contrastive loss function to generate positional embeddings for each user. In a Siamese network, two users are fed through the same model (with shared weights) at the same time, producing one positional embedding for each. If the two users are similar, the loss is simply the Euclidean distance between their embeddings. If they aren’t, the loss is the margin (the maximum distance we care to see two unlike users pushed apart) minus the Euclidean distance.
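The loss described above can be sketched with NumPy. The network’s architecture and optimizer aren’t specified, so this assumes the simplest possible Siamese setup: both branches share one learnable embedding table (one 64-dimensional vector per user), updated with plain stochastic gradient descent. The margin, learning rate, epoch count, toy data, and the floor at zero for dissimilar pairs already beyond the margin are all assumptions; the commonly used formulation of contrastive loss also squares both terms, which this sketch does not.

```python
import numpy as np


def contrastive_loss(e1, e2, similar, margin=1.0):
    """Distance for similar pairs; (margin - distance), floored at 0, otherwise."""
    d = float(np.linalg.norm(e1 - e2))
    return d if similar else max(0.0, margin - d)


def train_embeddings(n_users, pairs, dim=64, margin=1.0, lr=0.05, epochs=200, seed=0):
    """Learn one dim-dimensional embedding per user from (a, b, similar) pairs.

    ASSUMPTION: both Siamese branches share a single embedding table (the same
    weights embed both users), trained with plain SGD on the loss above.
    """
    rng = np.random.default_rng(seed)
    emb = rng.normal(scale=0.1, size=(n_users, dim))  # shared by both branches
    for _ in range(epochs):
        for idx in rng.permutation(len(pairs)):
            a, b, similar = pairs[idx]
            diff = emb[a] - emb[b]
            d = np.linalg.norm(diff) + 1e-9  # avoid division by zero
            if similar:
                grad = diff / d          # gradient of distance w.r.t. emb[a]
            elif d < margin:
                grad = -diff / d         # gradient of (margin - distance) w.r.t. emb[a]
            else:
                continue                 # dissimilar and already beyond the margin
            emb[a] -= lr * grad
            emb[b] += lr * grad          # the gradient w.r.t. emb[b] is -grad
    return emb


# Toy data: users 0-2 form one clique, users 3-5 another.
pairs = [(a, b, (a < 3) == (b < 3)) for a in range(6) for b in range(a + 1, 6)]
emb = train_embeddings(6, pairs)
total = sum(contrastive_loss(emb[a], emb[b], s) for a, b, s in pairs)
print(emb.shape, round(total, 2))
```

After training, users within a clique end up much closer to each other than to users in the other clique, which is the property the 2D map relies on.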

Once I have a 64-dimensional positional embedding for each user, I can use a dimensionality-reduction algorithm like t-SNE to reduce the embeddings to 2D coordinates. Then I plot them, and voilà!
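For the final step, here is a minimal sketch using scikit-learn’s `TSNE`, one common t-SNE implementation (which one was actually used isn’t stated). Random vectors stand in for the trained 64-dimensional embeddings, and the perplexity, initialization, and seed are illustrative choices, not the settings behind the actual map.

```python
import numpy as np
from sklearn.manifold import TSNE


def project_to_2d(embeddings, perplexity=30.0, seed=0):
    """Reduce (n_users, 64) embeddings to (n_users, 2) map coordinates with t-SNE.

    Perplexity roughly sets how many neighbours each point pays attention to;
    scikit-learn requires it to be smaller than the number of samples.
    """
    n = embeddings.shape[0]
    # Conservative cap so perplexity stays well below the number of users.
    perplexity = min(perplexity, max(1.0, (n - 1) / 3))
    tsne = TSNE(n_components=2, perplexity=perplexity, init="pca", random_state=seed)
    return tsne.fit_transform(embeddings)


# Stand-in data: 100 random 64-d vectors in place of real user embeddings.
rng = np.random.default_rng(0)
fake_embeddings = rng.normal(size=(100, 64))
coords = project_to_2d(fake_embeddings)
print(coords.shape)  # (100, 2)
```

The two columns of `coords` are the x and y positions to scatter-plot. Because t-SNE mainly preserves local neighborhoods, distances between nearby users on the resulting map are more meaningful than distances between far-apart clusters.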

Let me know what you think!