cyberd: Computing the knowledge from web3

@hipster · 2019-01-27 02:12 · web3

The original post has been updated based on community input to remove confusion.

Notes on cyber release of cyber:// protocol reference implementation using Go.

cyber•Congress: @xhipster, @litvintech, @hleb-albau, @arturalbov

Abstract

A consensus computer allows computing provably relevant answers without opinionated blackbox intermediaries such as Google, Youtube, Amazon or Facebook. Stateless content-addressable peer-to-peer communication networks such as IPFS and stateful consensus computers such as Ethereum provide part of the solution, but at least three problems remain. The first problem is the subjective nature of relevance. The second is that it is hard to scale a consensus computer to a huge knowledge graph. The third is that the quality of such a knowledge graph suffers from attack surfaces such as sybil attacks and the selfish behaviour of interacting agents. In this paper, we (1) define a protocol for provable consensus computing of relevance between IPFS objects, based on Tendermint consensus over cyber•rank computed on GPU, (2) discuss implementation details and (3) design a distribution and incentive scheme based on our experience. We believe the minimalistic architecture of the protocol is critical for the formation of a network of domain-specific knowledge consensus computers. As a result of our work, some applications that never existed before emerge. We expand the work with our "after Genesis" vision of features and apps.

Introduction to web3

Original protocols of the Internet, such as TCP/IP, DNS, URL, and HTTPS, brought the web to the point where it is now. Along with all the benefits they created, they also brought more problems to the table. Globality, a vital property of the web since its inception, is under real threat. The speed of connections degrades as the network grows and as governments ubiquitously intervene in the privacy and security of web users. One property, not evident in the beginning, has become important with everyday usage of the Internet: the ability to exchange permanent hyperlinks, so that they do not break after time has passed. Reliance on a "one at a time ISP" architecture allows governments to censor packets effectively. It is the last straw in the conventional web stack for every engineer who is concerned about the future of our children.

Other properties, while not so critical, are very desirable: offline and real-time operation. An average internet user who is offline must be able to work with the state they have, and after acquiring a connection, sync with the global state and continue verifying the state's validity in real time while the connection lasts. Today these properties are offered at the application level, while they should be integrated into lower-level protocols.

The emergence of a web3 stack creates an opportunity for a new kind of Internet. The community calls it web3. We call it "The Great Web", as we expect some low-level conventions to become immutable and remain unchanged for decades, e.g. immutable content links. It promises to remove the problems of the conventional protocol stack and to add better speed and more accessible connections to the web. However, as usual in the story of a new stack, new problems emerge. One such problem is general-purpose search. Existing general-purpose search engines are restrictive centralized databases that everybody is forced to trust. These search engines were designed primarily for a client-server architecture based on TCP/IP, DNS, URL, and HTTPS. Web3 creates a challenge and an opportunity for a search engine based on these developing technologies and specifically designed for them. Surprisingly, the permissionless blockchain architecture itself allows organizing a general-purpose search engine in a way inaccessible to previous architectures.

On adversarial examples problem

The conventional architecture of search engines, where one entity processes and ranks all the content, suffers from one hard and particular problem that has still not been solved even by brilliant Google scientists: the adversarial examples problem. The problem Google acknowledges is that it is rather hard to algorithmically reason whether a particular sample is adversarial or not, independently of how cool the learning technology is. Obviously, a cryptoeconomic approach can change the beneficiaries in this game, effectively removing possible sybil attack vectors and moving the decisions on example crawling and meaning extraction from one entity to the whole world. A sybil-resistant learning model will probably lead to orders of magnitude more predictive results.

Cyber protocol at cyber

  • compute cyber inception of cyber protocol based on the Genesis distribution rules
  • define the knowledge graph state
  • take cyberlinks
  • check the validity of signatures
  • check bandwidth limit
  • check the validity of CIDv0
  • if the signatures, bandwidth limit, and CIDv0 are valid, then the cyberlink is valid
  • every round calculate cyber•rank deltas for the knowledge graph
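The per-cyberlink validity checks above can be sketched in Go. Note that the `Cyberlink` type, its field names, and the cost parameters here are illustrative, not the actual cyberd API, and real signature verification is stubbed out with a boolean:

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// Cyberlink is an illustrative message: an agent links two CIDv0 content addresses.
type Cyberlink struct {
	Agent    string // account address (format assumed for illustration)
	From, To string // CIDv0 content addresses
	ValidSig bool   // stands in for real signature verification
}

// looksLikeCIDv0 is a cheap structural check: a CIDv0 is a base58
// multihash that always starts with "Qm" and is 46 characters long.
func looksLikeCIDv0(cid string) bool {
	return len(cid) == 46 && strings.HasPrefix(cid, "Qm")
}

// validate mirrors the steps from the list above: signature,
// bandwidth limit, and CIDv0 validity must all pass.
func validate(link Cyberlink, remainingBandwidth, linkCost int64) error {
	if !link.ValidSig {
		return errors.New("invalid signature")
	}
	if remainingBandwidth < linkCost {
		return errors.New("bandwidth limit exceeded")
	}
	if !looksLikeCIDv0(link.From) || !looksLikeCIDv0(link.To) {
		return errors.New("invalid CIDv0")
	}
	return nil // valid cyberlink: apply it to the knowledge graph
}

func main() {
	link := Cyberlink{
		Agent:    "agent-address", // placeholder
		From:     "QmdvsvrVqdkzx8HnowpXGLi88tXZDsoNrGhGvPvHBQB6sH",
		To:       "Qme4z71Zea9xaXScUi6pbsuTKCCNFp5TAv8W5tjdfH7yuH",
		ValidSig: true,
	}
	fmt.Println(validate(link, 1000, 100)) // <nil>
}
```

Only links that pass all three checks enter the graph and take part in the cyber•rank delta calculation each round.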

Knowledge graph

We represent a knowledge graph as a weighted graph of directed links between content addresses: content identifiers, CIDs, or simply IPFS hashes. In this paper, we use these terms as synonyms.

knowledge_graph.png

Content addresses are essentially web3 links. Instead of using a non-obvious and mutable link:

https://github.com/cosmos/cosmos/blob/master/WHITEPAPER.md

we can use the exact thing itself:

Qme4z71Zea9xaXScUi6pbsuTKCCNFp5TAv8W5tjdfH7yuH
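For intuition, a CIDv0 like the one above is just a base58-encoded SHA-256 multihash. Here is a stdlib-only Go sketch of the encoding mechanics; note that real IPFS hashes the chunked UnixFS DAG node rather than the raw file bytes, so this sketch will not reproduce actual IPFS CIDs for files:

```go
package main

import (
	"crypto/sha256"
	"fmt"
	"math/big"
)

const b58Alphabet = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"

// base58 encodes bytes with the Bitcoin/IPFS alphabet.
func base58(b []byte) string {
	n := new(big.Int).SetBytes(b)
	radix := big.NewInt(58)
	mod := new(big.Int)
	var out []byte
	for n.Sign() > 0 {
		n.DivMod(n, radix, mod)
		out = append([]byte{b58Alphabet[mod.Int64()]}, out...)
	}
	// each leading zero byte maps to '1'
	for _, v := range b {
		if v != 0 {
			break
		}
		out = append([]byte{'1'}, out...)
	}
	return string(out)
}

// cidv0 builds a CIDv0 from a raw block: base58(0x12 0x20 || sha256(block)).
// 0x12 is the multihash code for sha2-256, 0x20 is the digest length (32 bytes).
func cidv0(block []byte) string {
	digest := sha256.Sum256(block)
	mh := append([]byte{0x12, 0x20}, digest[:]...)
	return base58(mh)
}

func main() {
	cid := cidv0([]byte("hello web3"))
	fmt.Println(cid) // always starts with "Qm" and is 46 characters long
}
```

Because the multihash prefix is fixed, every CIDv0 starts with "Qm", which is why the links throughout this post share that prefix.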

By using content addresses to build a knowledge graph, we get the much-needed superpowers of IPFS-like p2p protocols for a search engine:

  • mesh-network future proof
  • interplanetary
  • tolerant
  • accessible
  • technology agnostic

Web3 agents generate our knowledge graph. Web3 agents include themselves in the knowledge graph by transacting only once. Thereby they prove the existence of the private keys for the content addresses of the revealed public keys. Using these basic proof mechanics, a consensus computer can provably differentiate between subjects and objects in the knowledge graph.

Our cyber implementation is based on cosmos-sdk identities and CIDv0 content addresses.

Web3 agents generate the knowledge graph by applying cyberlinks.

Cyberlinks

To understand cyberlinks, we need to understand the difference between a URL link, aka a hyperlink, and an IPFS link. A URL link points to the location of content, while an IPFS link points to the content itself. The difference between web architectures based on location links and on content links is drastic, hence requires new approaches.

A cyberlink is an approach to link two content addresses, or IPFS links, semantically:

QmdvsvrVqdkzx8HnowpXGLi88tXZDsoNrGhGvPvHBQB6sH . Qme4z71Zea9xaXScUi6pbsuTKCCNFp5TAv8W5tjdfH7yuH

This cyberlink means that the cyberd presentation at cyberc0n references the Cosmos whitepaper. The concept of a cyberlink is a convention around a simple semantics of a communication format in any peer-to-peer network:

<content address X> . <content address Y>

You can see that a cyberlink represents a link between two links. Easy peasy!

A cyberlink is a simple yet powerful semantic construction for building a predictive model of the universe. That is, using cyberlinks instead of hyperlinks gives us superpowers inaccessible to previous architectures of general-purpose search engines.

Cyberlinks can be extended, e.g. they can form link chains if there exists a series of two cyberlinks from one agent in which the second link in the first cyberlink is equal to the first link in the second cyberlink:

<content address X> . <content address Y>
<content address Y> . <content address Z>

Using this simple principle, all interacting agents can reach consensus on interpreting clauses. So link chains are helpful for interpreting rich communications around relevance.

link_chains.png
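The chain rule above can be sketched in Go. The `link` type and the shortened CID placeholders are illustrative only:

```go
package main

import "fmt"

// link is an ordered pair of content addresses made by one agent.
type link struct{ agent, from, to string }

// chains finds the link chains described above: two cyberlinks from the
// same agent where the tail of the first equals the head of the second.
func chains(links []link) [][2]link {
	var out [][2]link
	for _, a := range links {
		for _, b := range links {
			if a.agent == b.agent && a.to == b.from && a != b {
				out = append(out, [2]link{a, b})
			}
		}
	}
	return out
}

func main() {
	ls := []link{
		{"agent1", "QmA", "QmB"}, // hypothetical shortened CIDs
		{"agent1", "QmB", "QmC"},
	}
	for _, c := range chains(ls) {
		fmt.Printf("%s -> %s -> %s\n", c[0].from, c[0].to, c[1].to)
	}
}
```

Note the same-agent condition: a chain is a claim by one agent, so links from different agents that merely happen to share a CID do not form a chain.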

Also, using the following link: QmNedUe2wktW65xXxWqcR8EWWssHVMXm3Ly4GKiRRSEBkn, one can signal the start and stop of execution in the knowledge graph. A lot of cool stuff can be done using cyberlinks.

If web3 agents expand native IPFS links with something semantically richer, such as DURA links, then web3 agents can more easily reach consensus on the rules for program execution. Indeed, the DURA protocol is a proper implementation of the cyberlink concept.

The cyber implementation of cyberlinks based on the DURA specification is available in the .cyber app of the cyb browser.

Based on cyberlinks, we can compute the relevance of subjects and objects in the knowledge graph. That is why we need a consensus computer.

Notion of consensus computer

A consensus computer is an abstract computing machine that emerges from agents' interactions.

A consensus computer has a capacity in terms of fundamental computing resources such as memory and computation. To interact with agents, the computer needs bandwidth.

An ideal consensus computer is a computer in which:

the sum of all computations and memory available for individuals
is equal to
the sum of all verified computations and memory of a *consensus computer*

We know that:

verifications of computations < computations + verifications of computations

Hence we will never be able to achieve an ideal consensus computer. The CAP theorem and the scalability trilemma also prove this statement.

However, this theory can work as a performance indicator of a consensus computer.

After 6 years of investment in consensus computers, we found that Tendermint consensus has a good balance between coolness for our task and readiness for production. So we decided to implement the cyber protocol using Tendermint, which is very close to the Cosmos Hub setting.

The cyber implementation is a 64-bit Tendermint consensus computer of relevance for a 64-byte string space, which is at best 1/146 of ideal, because we have 146 validators who all verify the same computation over a knowledge graph of the same size.

We must bind the computational, storage and bandwidth supply of the consensus computer to the maximized demand for queries. Computation and storage, in the case of a basic relevance machine, can be easily predicted based on bandwidth, but bandwidth requires a limiting mechanism.

Bandwidth

Bonded stake - stake that is deducted from your account's coins and put up as a deposit to take part in consensus. Due to the passive inflation model and slashing, the deposit does not match the final reward 1-to-1. So, for example, stakeholders may wish to set up a script that periodically withdraws and re-bonds rewards to increase their bonded stake.

Active stake - not-bonded stake, currently available for direct transfer.

Bandwidth stake = active stake + bonded stake.

Cyberd uses a very simple bandwidth model. The main goal of this model is to reduce daily network growth to a given constant, say 3 GB per day.

Thus, here we introduce resource credits, or RC. Each message type has an assigned RC cost. The constant DesirableNetworkBandwidthForRecoveryPeriod determines the desired RC spend for one RecoveryPeriod. RecoveryPeriod defines how fast an agent can recover their bandwidth from 0 to the agent's maximum bandwidth. An agent has maximum RC proportional to their stake by the formula:

agent_max_rc = bandwidth_stake * DesirableNetworkBandwidthForRecoveryPeriod

There is a period AdjustPricePeriod, over which the amount of RC spent is summed as AdjustPricePeriodTotalSpent. Also, there is a constant AdjustPricePeriodDesiredSpent, used to calculate network load.

The ratio AdjustPricePeriodTotalSpent / AdjustPricePeriodDesiredSpent is called the fractional reserve ratio. If network usage is low, the fractional reserve ratio adjusts message cost (by simple multiplication) to allow agents with a lower stake to do more transactions. If resource demand increases, the fractional reserve ratio goes above 1, thus increasing message cost and limiting the final tx count over some long-term period (RC recovery becomes slower than RC spending).
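The mechanics above can be sketched in Go. The constant names follow the post, but their values and the pricing logic are illustrative simplifications, not the actual cyberd parameters:

```go
package main

import "fmt"

// Constants named as in the bandwidth model above; values are illustrative.
const (
	DesirableNetworkBandwidthForRecoveryPeriod = 2000
	AdjustPricePeriodDesiredSpent              = 1000
)

// agentMaxRC: an agent's maximum resource credits are proportional
// to their bandwidth stake (active + bonded), here given as a share
// of total stake.
func agentMaxRC(bandwidthStake float64) float64 {
	return bandwidthStake * DesirableNetworkBandwidthForRecoveryPeriod
}

// adjustedCost scales a message's base RC cost by the fractional
// reserve ratio: cheaper when the network is idle, more expensive
// when it is loaded.
func adjustedCost(baseCost, adjustPricePeriodTotalSpent float64) float64 {
	ratio := adjustPricePeriodTotalSpent / AdjustPricePeriodDesiredSpent
	return baseCost * ratio
}

func main() {
	fmt.Println(agentMaxRC(0.01))        // 1% stake share -> 20 RC
	fmt.Println(adjustedCost(100, 500))  // idle network: cost halved
	fmt.Println(adjustedCost(100, 2000)) // loaded network: cost doubled
}
```

The key property is that the same stake buys more messages when the network is underused, and fewer when demand exceeds the desired spend.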

There are only two ways to change account bandwidth stake:

  1. Direct coins transfer.
  2. When distribution payouts occur. For example, when a validator changes his commission rate, all delegations are automatically unbonded. Another example: a delegator unbonds some part or the full share.

So agents must have CYB tokens in accordance with their will to learn the knowledge graph. However, the proposed mechanics of CYB tokens work not only as spam protection but also as an economic regulation mechanism to align validators' ability to process the knowledge graph with the market demand for processing.

Relevance machine

We define a relevance machine as a machine that transitions the knowledge graph state based on the will of agents to learn the knowledge graph. The more agents learn the knowledge graph, the more valuable the graph becomes. This machine enables a simple construction for querying search questions and delivering answers.

The will is projected onto every agent's cyberlink. A simple rule prevents abuse by agents: one content address can be voted for by a coin only once. So the number of accounts you vote from does not matter for ranking; only the sum of their balances matters.

A useful property of a relevance machine is that it must have the inductive reasoning property, or follow the blackbox principle:

She must be able to infer predictions
without any knowledge about objects
except who cyberlinked, when they cyberlinked and what was cyberlinked.

If we assume that a consensus computer must have some information about linked objects, the complexity of such a model grows unpredictably, and with it the computer's requirements for memory and computation. That is, deduction of meaning inside the consensus computer is expensive, thus our design depends on the blindness assumption. Instead of deducing meaning inside the consensus computer, we design a system in which meaning extraction is incentivized, because agents need CYB to compute relevance.

Also, thanks to content addressing, a relevance machine that follows the blackbox principle does not need to store the data but can still effectively operate on it.

Human intelligence is organized in such a way that it prunes non-relevant and non-important memories as time passes. A relevance machine can do the same. Also, one useful property of a relevance machine is that it needs to store neither the past state nor the full current state to remain useful, or more precisely: relevant. So a relevance machine can implement aggressive pruning strategies, such as pruning the whole history of knowledge graph formation or forgetting links that become non-relevant.
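The last pruning strategy can be sketched as a simple filter over ranked links. The types and the cutoff value here are illustrative:

```go
package main

import "fmt"

// rankedLink pairs a directed link with its current rank.
type rankedLink struct {
	from, to string
	rank     float64
}

// pruneIrrelevant keeps only links whose rank stays above a cutoff,
// implementing a "forget what became non-relevant" strategy.
func pruneIrrelevant(links []rankedLink, cutoff float64) []rankedLink {
	kept := links[:0] // filter in place, reusing the backing array
	for _, l := range links {
		if l.rank >= cutoff {
			kept = append(kept, l)
		}
	}
	return kept
}

func main() {
	links := []rankedLink{
		{"QmA", "QmB", 0.9},   // hypothetical shortened CIDs
		{"QmA", "QmC", 0.001}, // fell below the relevance cutoff
	}
	fmt.Println(len(pruneIrrelevant(links, 0.01))) // 1
}
```

Because the machine is blind to content, the pruning decision depends only on rank, never on what the linked objects mean.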

The cyber implementation of the relevance machine is based on the most straightforward mechanism, which is called cyber•Rank.

cyber•Rank

Ranking using a consensus computer is hard because consensus computers bring serious resource bounds; e.g. Nebulas still fails to deliver anything useful on-chain. First, we must ask ourselves why we need to compute and store the rank on-chain, and not go the Colony or Truebit way.

If the rank is computed inside the consensus computer, you get easy content distribution of the rank as well as an easy way to build provable applications on top of it. Hence we decided to follow a more cosmic architecture. In the next section, we describe the proof of relevance mechanism, which allows the network to scale with the help of domain-specific relevance machines that work in parallel thanks to the IBC protocol.

Eventually, the relevance machine needs (1) a deterministic algorithm that allows computing the rank for a continuously appended network, to scale the consensus computer to orders of magnitude that of Google. The perfect algorithm (2) must have linear memory and computation complexity. Most importantly, it must have (3) the highest provable prediction capability for the existence of relevant cyberlinks.

After some research, we realized that we cannot find a silver bullet here. So we decided to start with a more basic, bulletproof way to bootstrap the network: the rank from which Larry and Sergey bootstrapped the previous network. The key problem with the original PageRank is that it is not resistant to sybil attacks.

Token-weighted PageRank limited by token-weighted bandwidth does not have the inherent problems of naive PageRank and is resistant to sybil attacks. For the time being, we will call it cyber•Rank, until something better emerges.
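To make the idea concrete, here is a toy token-weighted PageRank over a three-node graph, where each link's weight is the linking agents' stake. This is an illustration of the concept only, not the cyber•Rank implementation, which runs on GPU over the full knowledge graph:

```go
package main

import "fmt"

// edge: agents with `stake` tokens cyberlink `from` to `to`.
type edge struct {
	from, to string
	stake    float64
}

// weightedPageRank runs a fixed number of power iterations in which each
// outgoing link is weighted by the linking agents' stake, so a sybil
// swarm has no more influence than a single account with the same balance.
func weightedPageRank(nodes []string, edges []edge, damping float64, iters int) map[string]float64 {
	// total outgoing stake-weight per node, for normalization
	outWeight := map[string]float64{}
	for _, e := range edges {
		outWeight[e.from] += e.stake
	}
	rank := map[string]float64{}
	for _, n := range nodes {
		rank[n] = 1.0 / float64(len(nodes))
	}
	for i := 0; i < iters; i++ {
		next := map[string]float64{}
		for _, n := range nodes {
			next[n] = (1 - damping) / float64(len(nodes))
		}
		for _, e := range edges {
			next[e.to] += damping * rank[e.from] * e.stake / outWeight[e.from]
		}
		rank = next
	}
	return rank
}

func main() {
	nodes := []string{"QmA", "QmB", "QmC"} // hypothetical shortened CIDs
	edges := []edge{
		{"QmA", "QmB", 100}, // heavy stake behind A -> B
		{"QmA", "QmC", 10},  // light stake behind A -> C
		{"QmB", "QmC", 50},
	}
	rank := weightedPageRank(nodes, edges, 0.85, 30)
	fmt.Printf("%.3f %.3f %.3f\n", rank["QmA"], rank["QmB"], rank["QmC"])
}
```

Splitting a balance across many accounts does not change the stake behind a link, which is exactly the sybil resistance property claimed above.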

At the center of the spam protection system is the assumption that write operations can be executed only by those who have a vested interest in the evolutionary success of the relevance machine. Every 1% of stake in the consensus computer gives the ability to use 1% of the possible network bandwidth and computing capabilities.

As nobody uses all of their possessed bandwidth, we can safely use up to 100x fractional reserves with a 2-minute recalculation target. This mechanism offers a discount for cyberlinking, thus effectively maximizing demand for it.

We would especially love to discuss the problem of vote buying. Vote buying by itself is not that bad. The problem with vote buying appears in systems where voting affects the allocation of inflation, like Steem or any state-based system. There, vote buying can easily become profitable for an adversary employing a zero-sum game, without any necessity to add value. Our original idea of a decentralized search was based on this approach, but we rejected it, completely removing consensus-level incentives for knowledge graph formation. In our setting, in which every participant must bring some value to the system to affect the predictive model, vote buying becomes an NP-hard problem and hence is useful for the system.

We understand that the ranking mechanism will always remain a red herring. That is why we expect to rely on an on-chain governance mechanism to define the winning one. We would love to switch from one algorithm to another based on economic A/B testing through hard spoons of domain-specific relevance machines.

The current implementation of the consensus computer based on the relevance machine for cyber•Rank can answer and deliver relevant results for any given search request in the 64-byte CID space. However, to build a network of domain-specific relevance machines, it is

#web3 #cyber #cosmos #ipfs #ethereum