From POH (Proof Of Humanity) to POI (Proof Of Intelligence)
In today's Web3 ecosystem, "Proof of Humanity" has received significant attention because Sybil attacks undermine the integrity of retrospective airdrops. Greedy actors create large numbers of fake accounts to unfairly claim more airdropped tokens, which harms honest users and defeats the project's intention of incentivizing its community.
The structure of this section is as follows. We first introduce Trusta's two-phase, AI- and machine-learning-powered framework, which systematically analyzes on-chain data to identify suspicious Sybil clusters. We then introduce the Test of Humanity, a Turing Test-like proactive method for identity verification through Q&A with the verifier. Finally, we introduce our agentic system, which organically combines all of these verification approaches into a holistic agent aimed at addressing the problem of Proof of Intelligence.
Proof of Humanity: AI and Data-Driven Framework
Sybil attackers automate interactions across their accounts using bots and scripts, which causes those accounts to cluster together as malicious communities. Trusta's two-phase AI/ML framework identifies such Sybil communities using clustering algorithms:
Phase 1 analyzes asset transfer graphs (ATGs) with community detection algorithms like Louvain and K-Core to detect densely connected and suspicious Sybil groups.
Phase 2 computes user profile and activity features for each address. A K-means procedure then refines the clusters by screening out dissimilar addresses, reducing false positives from Phase 1.
In summary, Trusta first uses graph mining algorithms to identify coordinated Sybil communities. Then additional user analysis filters outliers to improve precision, combining connectivity and behavioral patterns for robust Sybil detection.
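To make the two-phase pipeline concrete, the skeleton below sketches how the phases could be chained. The function names, parameters, and the use of networkx's Louvain implementation are illustrative assumptions, not Trusta's internal API; more detailed sketches for each phase appear in the corresponding subsections below.

```python
# A minimal skeleton of the two-phase pipeline (illustrative only).
import networkx as nx
from networkx.algorithms.community import louvain_communities


def phase1_community_detection(transfer_graph: nx.Graph) -> list[set[str]]:
    """Phase 1: mine densely connected candidate Sybil communities from the ATG."""
    return [set(c) for c in louvain_communities(transfer_graph, seed=42)]


def phase2_refinement(communities: list[set[str]], address_features: dict) -> list[set[str]]:
    """Phase 2 placeholder: screen out behaviorally dissimilar addresses (see Phase II)."""
    return communities


def detect_sybil_clusters(transfer_graph: nx.Graph, address_features: dict) -> list[set[str]]:
    candidates = phase1_community_detection(transfer_graph)
    return phase2_refinement(candidates, address_features)
```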
Phase I: Community Detection on ATGs
Trusta analyzes asset transfer graphs (ATGs) between EOA accounts. Entity addresses such as bridges, exchanges, and smart contracts are removed to focus on user-to-user relationships. Trusta has developed proprietary analytics to detect and remove hub addresses from the graphs. Two ATGs are generated:
The general transfer graph with edges for any token transfer between addresses.
The gas provision network where edges show the first gas provision to an address.
The initial gas transfer activates new EOAs, forming a sparse graph structure that is ideal for analysis. It also represents a strong relationship, as new accounts depend on their gas provider. The gas network's sparsity and importance make it valuable for Sybil resistance: complex algorithms can mine the network, while gas provision links highlight meaningful account activation relationships.
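As a rough illustration of this graph construction, the sketch below builds the two ATGs from raw transfer records. The record schema ("from", "to", "token", "block"), the use of native-token transfers as gas provisions, and the degree-based hub cutoff are simplifying assumptions for illustration, not Trusta's production logic.

```python
import networkx as nx

ENTITY_ADDRESSES = set()        # known bridges, exchanges, and contracts (curated externally)
HUB_DEGREE_THRESHOLD = 1000     # illustrative cutoff for hub removal, not Trusta's actual rule


def build_atgs(transfers: list[dict]) -> tuple[nx.DiGraph, nx.DiGraph]:
    """Build the general transfer graph and the gas provision graph from raw transfers."""
    general, gas = nx.DiGraph(), nx.DiGraph()
    activated = set()
    for t in sorted(transfers, key=lambda x: x["block"]):
        src, dst = t["from"], t["to"]
        if src in ENTITY_ADDRESSES or dst in ENTITY_ADDRESSES:
            continue                             # keep only EOA-to-EOA relationships
        general.add_edge(src, dst)               # edge for any token transfer
        if t["token"] == "ETH" and dst not in activated:
            gas.add_edge(src, dst)               # first gas provision activates the new EOA
            activated.add(dst)
    for g in (general, gas):                     # crude hub filter on extreme degree
        hubs = [n for n, d in g.degree() if d > HUB_DEGREE_THRESHOLD]
        g.remove_nodes_from(hubs)
    return general, gas
```

Keeping only the first inbound native-token transfer per address is what makes the gas graph sparse and well suited to community mining.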
Trusta analyzes the asset transfer graphs to detect Sybil clusters using community detection algorithms such as Louvain, together with several known attack patterns shown in the diagram (see the sketch after the following list):
The star-like divergence attacks: Addresses funded by the same source
The star-like convergence attacks: Addresses sending funds to the same target
The tree-structured attacks: Funds distributed in a tree topology
The chain-like attacks: Sequential fund transfers from one address to the next in a chain topology.
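The sketch below illustrates how Louvain community detection on the gas-provision ATG could be combined with simple degree-based heuristics to label the patterns above. The thresholds, minimum community size, and pattern rules are invented for illustration and are not Trusta's proprietary detection logic.

```python
import networkx as nx
from networkx.algorithms.community import louvain_communities


def flag_pattern(atg: nx.DiGraph, community: set[str]) -> str:
    """Heuristically label a community with one of the attack topologies above."""
    sub = atg.subgraph(community)
    out_deg, in_deg = dict(sub.out_degree()), dict(sub.in_degree())
    n = len(community)
    if max(out_deg.values()) >= n - 1:
        return "star-divergence"      # one source funds (almost) every member
    if max(in_deg.values()) >= n - 1:
        return "star-convergence"     # (almost) every member sends funds to one target
    if all(d <= 1 for d in out_deg.values()) and all(d <= 1 for d in in_deg.values()):
        return "chain"                # sequential one-to-one transfers
    return "tree-or-mixed"


def detect_candidate_clusters(gas_graph: nx.DiGraph, min_size: int = 10):
    communities = louvain_communities(gas_graph.to_undirected(), seed=42)
    return [(c, flag_pattern(gas_graph, c)) for c in communities if len(c) >= min_size]
```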
Phase 1 yields preliminary Sybil clusters based solely on asset transfer relations. Trusta further refines results in Phase 2 by analyzing account behavior similarities.
Phase II: K-Means Refinement Based on Behaviour Similarities
Transaction logs reveal address activity patterns. Sybils may exhibit similarities such as interacting with the same contracts and methods, with comparable timing and amounts. Trusta validates Phase 1 clusters by analyzing on-chain behaviors across two variable types:
Transactional variables: These variables are derived directly from on-chain actions and include information such as the first and latest transaction dates and the protocols or smart contracts interacted with.
Profile variables: These variables provide aggregated statistics on behaviors such as interaction amount, frequency, and volume.
To refine the preliminary Sybil clusters using these multi-dimensional representations of address behaviors, Trusta employs a K-means-like procedure: addresses are assigned to the nearest cluster centroid, and the centroids are then recomputed from their assigned addresses. These two steps are performed iteratively until convergence, resulting in refined clusters of Sybils. A minimal sketch of this refinement follows.
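The sketch below assumes scikit-learn's KMeans and a simple distance-to-centroid cutoff as the screening rule; both are illustrative choices rather than Trusta's exact procedure.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler


def refine_community(features: np.ndarray, addresses: list[str],
                     n_clusters: int = 2, keep_quantile: float = 0.9) -> list[str]:
    """features: one row per address; columns are transactional and profile variables."""
    X = StandardScaler().fit_transform(features)
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit(X)
    # distance of each address to its assigned centroid
    dists = np.linalg.norm(X - km.cluster_centers_[km.labels_], axis=1)
    # screen out the most dissimilar addresses (likely Phase 1 false positives)
    cutoff = np.quantile(dists, keep_quantile)
    return [addr for addr, d in zip(addresses, dists) if d <= cutoff]
```

Standardizing the features first keeps variables with large ranges (e.g., transaction volume) from dominating the distance computation.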
The clustering-based algorithms for Sybil resistance are the optimal choice at this stage for several reasons:
Relying solely on historical Sybil lists like HOP and OP Sybils is insufficient because new rollups and wallets continue to emerge. Merely using previous lists cannot account for these new entities.
In 2022, there were no benchmark Sybil labelled data sets available to train a supervised model. Training on static Sybil/non-Sybil data raises concerns about the precision and recall of the model. Since a single dataset cannot encompass all Sybil patterns, the recall is limited. Additionally, misclassified users have no means to provide feedback, which hampers the improvement of precision.
Anomaly detection is not suitable for identifying Sybils since they behave similarly to regular users.
Therefore, we conclude that a clustering-based framework is the most suitable approach for the current stage. However, as more addresses are labeled, Trusta will certainly explore supervised learning algorithms such as deep neural network-based classifiers.
Test of Humanity: Knowledge-Based Authentication
Based on our TOH (Test of Humanity) work on TON, Trusta's project "t-TON: The Trustworthy and Open Network" won the TON Hackathon championship in winter 2024 [6]. The number of TON accounts has surged from over 10 million to over 100 million in the past six months, raising concerns about Sybil attacks within the ecosystem. Here, Sybil attacks refer to dishonest actors using scripts to create and control numerous fake Telegram/TON accounts that perform in-app and on-chain activities to unfairly gain more airdropped tokens.
In the context of the TON ecosystem, TOH's idea can be understood like this: If your TON wallet has actively played Catizen and received the $CATI airdrop, you will easily answer a question like, “Which token did you receive in the airdrop? (A) $NOT, (B) $CATI, (C) $DOGS.” In contrast, a scripted bot would struggle to summarize on-chain information and respond quickly, facing difficulties with such personalized questions.
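As a purely hypothetical illustration, the sketch below assembles one such personalized question from a wallet's airdrop history. The token pool, data access, and distractor selection are simplified assumptions; in practice, as noted below, question design is AI-assisted.

```python
import random

KNOWN_AIRDROP_TOKENS = ["$NOT", "$CATI", "$DOGS", "$HMSTR"]   # hypothetical option pool


def build_airdrop_question(received_tokens: list[str]) -> dict:
    """Build one multiple-choice question from a wallet's (assumed) airdrop history."""
    correct = random.choice(received_tokens)
    distractors = random.sample(
        [t for t in KNOWN_AIRDROP_TOKENS if t not in received_tokens], 2)
    return {
        "question": "Which token did you receive in the airdrop?",
        "options": random.sample([correct] + distractors, 3),   # shuffled option order
        "answer": correct,
    }


# Example: a wallet that actively played Catizen and received the $CATI airdrop
print(build_airdrop_question(["$CATI"]))
```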
Inspired by the Turing Test [7], Trusta.AI has developed a personalized Test of Humanity (TOH) system based on each user's TON activities. We gather TON data to create a tailored questionnaire with specifically designed questions, and your humanity is then determined by your responses and performance on that questionnaire. Our Test of Humanity system is simple, interactive, and personalized, featuring:
Interactive yet Simple: Users only need to answer a few straightforward questions, avoiding heavy methods like iris or facial recognition.
Personalized and Secure: The questionnaire is tailored based on each account's individual behavior, making it harder to conduct bulk attacks.
Privacy-Preserving: Questions are generated solely from on-chain data, minimizing concerns about personal privacy.
AI-Enhanced User Experience: We utilize ChatGPT to assist in the design of questions, including misleading answer options.
Our verification isn't just about checking whether the answers are correct. We also evaluate response time, answer patterns, and other factors to make a comprehensive judgment on authenticity. This multi-dimensional approach strengthens our detection of bots. Additionally, we leverage AI models to introduce question variations, making it significantly more difficult for attackers to predict or game the system. Together, these measures keep our KBA system robust and adaptive.
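The sketch below shows one way such multi-dimensional scoring could combine correctness with timing signals. The weights, the 1.5-second speed threshold, and the variance cutoff are invented for illustration and are not Trusta's production parameters.

```python
from dataclasses import dataclass


@dataclass
class Response:
    correct: bool
    seconds_to_answer: float


def humanity_score(responses: list[Response]) -> float:
    """Penalize bot-like timing signals against raw questionnaire accuracy (score in [0, 1])."""
    n = len(responses)
    accuracy = sum(r.correct for r in responses) / n
    # answers returned faster than a human could plausibly read the question
    too_fast = sum(r.seconds_to_answer < 1.5 for r in responses) / n
    # suspiciously uniform timing across questions hints at scripting
    times = [r.seconds_to_answer for r in responses]
    mean_t = sum(times) / n
    variance = sum((t - mean_t) ** 2 for t in times) / n
    uniform_timing = 1.0 if variance < 0.05 else 0.0
    return max(0.0, accuracy - 0.3 * too_fast - 0.2 * uniform_timing)
```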
The Trusta Identity Verification Agent System
It is crucial to distinguish among human intelligence, unintelligent robots (bots), and artificial intelligence. Tackling this complicated task requires the collaboration of many modules.
The figure above shows how multiple such modules could collaborate. Building on this, Trusta has begun exploring a multi-agent orchestration approach to build an Identity Verification Agent System; its architecture is shown in the figure below. We are continuously improving this framework so that, in the AI era, different entities are assigned the correct identities.
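As a purely illustrative sketch of such an orchestration, the snippet below routes an address through pluggable verification modules (e.g., the POH graph analysis and the TOH questionnaire described above) and maps their scores to an identity label. The module names, thresholds, and decision rule are assumptions, not Trusta's agent architecture.

```python
from typing import Protocol


class VerificationModule(Protocol):
    """Any verification approach (POH graph analysis, TOH questionnaire, ...) exposing a confidence score."""
    name: str

    def verify(self, address: str) -> float:        # returns a confidence in [0, 1]
        ...


def classify_identity(address: str, modules: list[VerificationModule]) -> str:
    scores = {m.name: m.verify(address) for m in modules}
    if scores.get("poh_graph", 0.0) < 0.5:
        return "suspected-bot"                      # clustered with a Sybil community
    if scores.get("toh_questionnaire", 0.0) < 0.5:
        return "suspected-ai-agent"                 # passes graph checks but fails the personalized Q&A
    return "human"
```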