Avalanche Networking
Overview
Avalanche is a decentralized p2p (peer-to-peer) network of nodes that work together to run the Avalanche blockchain protocol.
The network package implements the networking layer of the protocol, which allows a node to discover, connect to, and communicate with other peers.
All connections are authenticated using TLS. However, there is no reliance on any certificate authorities. The network package identifies peers by the public key in the leaf certificate.
Peers
Peers are defined as members of the network that communicate with one another to participate in the Avalanche protocol.
Peers communicate by enqueuing messages between one another. Each peer on either side of the connection asynchronously reads and writes messages to and from the remote peer. Messages include both application-level messages used to support the Avalanche protocol, as well as networking-level messages used to implement the peer-to-peer communication layer.
sequenceDiagram
actor Morty
actor Rick
loop
Morty->>Rick: Write outbound messages
Rick->>Morty: Read incoming messages
end
loop
Rick->>Morty: Write outbound messages
Morty->>Rick: Read incoming messages
end
Message Handling
All messages are prefixed with their length. Reading a message first reads the 4-byte message length from the connection. The rate-limiting logic then waits until there is sufficient capacity to read these bytes from the connection.
A peer will then read the full message and attempt to parse it into either a networking message or an application message. If the message is malformed, the connection is not dropped; the peer simply continues to the next message.
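The length-prefixed framing described above can be sketched as follows. This is a minimal illustration, not the package's implementation: the big-endian byte order, the maxMessageSize bound, and the helper names writeFrame, readFrame, and roundTrip are all assumptions made for this example, and the rate-limiting wait is only marked by a comment.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"errors"
	"fmt"
	"io"
)

// maxMessageSize is a hypothetical upper bound used only in this sketch.
const maxMessageSize = 2 << 20

var errMessageTooLarge = errors.New("message length exceeds maximum")

// writeFrame writes a 4-byte length prefix followed by the payload.
// Big-endian byte order is an assumption of this sketch.
func writeFrame(w io.Writer, payload []byte) error {
	if len(payload) > maxMessageSize {
		return errMessageTooLarge
	}
	var prefix [4]byte
	binary.BigEndian.PutUint32(prefix[:], uint32(len(payload)))
	if _, err := w.Write(prefix[:]); err != nil {
		return err
	}
	_, err := w.Write(payload)
	return err
}

// readFrame reads the 4-byte length prefix, then exactly that many bytes.
func readFrame(r io.Reader) ([]byte, error) {
	var prefix [4]byte
	if _, err := io.ReadFull(r, prefix[:]); err != nil {
		return nil, fmt.Errorf("reading length prefix: %w", err)
	}
	size := binary.BigEndian.Uint32(prefix[:])
	if size > maxMessageSize {
		return nil, errMessageTooLarge
	}
	// A real reader would wait here until the rate limiter grants
	// capacity for size bytes before reading them.
	payload := make([]byte, size)
	if _, err := io.ReadFull(r, payload); err != nil {
		return nil, fmt.Errorf("reading payload: %w", err)
	}
	return payload, nil
}

// roundTrip frames every payload into one buffer and reads them back in order.
func roundTrip(payloads ...[]byte) ([][]byte, error) {
	var conn bytes.Buffer // stands in for the network connection
	for _, p := range payloads {
		if err := writeFrame(&conn, p); err != nil {
			return nil, err
		}
	}
	out := make([][]byte, 0, len(payloads))
	for range payloads {
		p, err := readFrame(&conn)
		if err != nil {
			return nil, err
		}
		out = append(out, p)
	}
	return out, nil
}
```

Because each frame carries its own length, a reader that fails to parse one payload can still find the start of the next message, which is what allows a malformed message to be skipped without dropping the connection.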
Peer Handshake
Upon connection to a new peer, a handshake is performed between the node attempting to establish the outbound connection to the peer and the peer receiving the inbound connection.
When attempting to establish the connection, the first message that the node sends is a Handshake message describing the configuration of the node. If the Handshake message is successfully received and the peer decides that it will allow a connection with this node, it replies with a PeerList message that contains metadata about other peers that allows a node to connect to them. See PeerList Gossip.
As an example, nodes that are attempting to connect with an incompatible version of AvalancheGo or a significantly skewed local clock are rejected.
sequenceDiagram
actor Morty
actor Rick
Note over Morty,Rick: Connection Created
par
Morty->>Rick: AvalancheGo v1.0.0
and
Rick->>Morty: AvalancheGo v1.11.4
end
Note right of Rick: v1.0.0 is incompatible with v1.11.4.
Note left of Morty: v1.11.4 could be compatible with v1.0.0!
par
Rick-->>Morty: Disconnect
and
Morty-XRick: Peerlist
end
Note over Morty,Rick: Handshake Failed
Nodes that mutually desire the connection will both respond with PeerList messages and complete the handshake.
sequenceDiagram
actor Morty
actor Rick
Note over Morty,Rick: Connection Created
par
Morty->>Rick: AvalancheGo v1.11.0
and
Rick->>Morty: AvalancheGo v1.11.4
end
Note right of Rick: v1.11.0 is compatible with v1.11.4!
Note left of Morty: v1.11.4 could be compatible with v1.11.0!
par
Rick->>Morty: Peerlist
and
Morty->>Rick: Peerlist
end
Note over Morty,Rick: Handshake Complete
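The accept-or-reject decision in the handshake can be sketched as a pure function. This is a hedged illustration only: the version and handshakePolicy types, the "minimum compatible version" rule, and the values in examplePolicy are assumptions made for this example and do not describe the package's actual compatibility logic.

```go
package main

import (
	"errors"
	"time"
)

// version is a hypothetical semantic version used only in this sketch.
type version struct{ Major, Minor, Patch int }

// less reports whether v orders strictly before o.
func (v version) less(o version) bool {
	if v.Major != o.Major {
		return v.Major < o.Major
	}
	if v.Minor != o.Minor {
		return v.Minor < o.Minor
	}
	return v.Patch < o.Patch
}

// handshakePolicy groups the local node's acceptance rules (hypothetical).
type handshakePolicy struct {
	MinCompatible      version
	MaxClockDifference time.Duration
}

// examplePolicy holds illustrative values only.
var examplePolicy = handshakePolicy{
	MinCompatible:      version{Major: 1, Minor: 11, Patch: 0},
	MaxClockDifference: time.Minute,
}

var (
	errIncompatibleVersion = errors.New("peer version is below the minimum compatible version")
	errClockSkew           = errors.New("peer clock differs too much from the local clock")
)

// checkHandshake returns nil if the peer's advertised version and clock are
// acceptable. Times are Unix seconds, as they might appear in a message.
func checkHandshake(p handshakePolicy, peerVersion version, peerUnix, localUnix int64) error {
	if peerVersion.less(p.MinCompatible) {
		return errIncompatibleVersion
	}
	skew := time.Unix(localUnix, 0).Sub(time.Unix(peerUnix, 0))
	if skew < 0 {
		skew = -skew
	}
	if skew > p.MaxClockDifference {
		return errClockSkew
	}
	return nil
}
```

With these example values, a peer advertising v1.0.0 is rejected and a peer advertising v1.11.0 is accepted, mirroring the two diagrams above. A node would reply with a PeerList message only when the check returns nil.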
Ping-Pong Messages
Peers periodically send Ping messages containing perceived uptime information. This information can be used to monitor how the node is considered to be performing by the network. A node is expected to reply to a Ping message with a Pong message.
sequenceDiagram
actor Morty
actor Rick
Note left of Morty: Send Ping
Morty->>Rick: I think your uptime is 95%
Note right of Rick: Send Pong
Rick->>Morty: ACK
Peer Discovery
When starting an Avalanche node, the node needs to initiate some process that eventually allows it to become a participating member of the network. In traditional web2 systems, it's common to use a web service by resolving the service's DNS name and being routed to an available server behind a load balancer. In decentralized p2p systems, however, connecting to a node is more complex as no single entity owns the network. Avalanche consensus requires a node to repeatedly sample peers in the network, so each node needs some way of discovering and connecting to every other peer to participate in the protocol.
Inbound Connections
It is expected for Avalanche nodes to allow inbound connections. If a validator does not allow inbound connections, its observed uptime may be reduced.
Outbound Connections
Avalanche nodes that have identified the IP:Port pair of a node they want to connect to will initiate outbound connections to this IP:Port pair. If the connection is not able to complete the Peer Handshake, the connection will be re-attempted with an Exponential Backoff.
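The reconnection schedule can be sketched as a capped doubling of the delay. This is a minimal sketch, assuming plain doubling with no randomization; the function names nextDelay and backoffSchedule are hypothetical, and the two bounds play the roles of InitialReconnectDelay and MaxReconnectDelay from DelayConfig.

```go
package main

import "time"

// nextDelay returns the delay to use after prev: the first attempt uses
// initial, and each later attempt doubles the delay, capped at maxDelay.
func nextDelay(prev, initial, maxDelay time.Duration) time.Duration {
	if prev < initial {
		return initial
	}
	next := prev * 2
	if next > maxDelay || next < prev { // next < prev guards against overflow
		return maxDelay
	}
	return next
}

// backoffSchedule returns the delays for the first attempts reconnections.
func backoffSchedule(initial, maxDelay time.Duration, attempts int) []time.Duration {
	delays := make([]time.Duration, 0, attempts)
	var prev time.Duration
	for i := 0; i < attempts; i++ {
		prev = nextDelay(prev, initial, maxDelay)
		delays = append(delays, prev)
	}
	return delays
}
```

For an initial delay of 1s and a maximum of 5s, backoffSchedule yields 1s, 2s, 4s, 5s, 5s. A production implementation would typically add jitter so that many nodes do not retry in lockstep.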
A node should initiate outbound connections to an IP:Port pair that is believed to belong to another node that is not connected and meets at least one of the following conditions:
- The peer is in the initial bootstrapper set.
- The peer is in the default bootstrapper set.
- The peer is a Primary Network validator.
- The peer is a validator of a tracked Subnet.
- The peer is a validator of a Subnet and the local node is a Primary Network validator.
IP Authentication
To ensure that outbound connections are being made to the correct IP:Port pair of a node, all IP:Port pairs sent by the network are signed by the node that is claiming ownership of the pair. To prevent replays of these messages, the signature is over the Timestamp in addition to the IP:Port pair.
The Timestamp guarantees that nodes that are provided an IP:Port pair can track the most up-to-date IP:Port pair of a peer.
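The idea of signing the IP:Port pair together with a Timestamp can be sketched as below. This is an illustration under stated assumptions: it uses Ed25519 from the standard library purely for brevity, whereas the Config in this package carries a TLS key and a BLS key for signing IPs. The byte layout in signedBytes and every name in the block (signedIP, sign, verify, shouldReplace, newKeyPair, addr) are hypothetical.

```go
package main

import (
	"crypto/ed25519"
	"encoding/binary"
	"net/netip"
)

// signedIP is a hypothetical claim of ownership over an IP:Port pair.
type signedIP struct {
	AddrPort  netip.AddrPort
	Timestamp uint64
	Signature []byte
}

// signedBytes serializes the fields covered by the signature. Including the
// timestamp means an old signature cannot be replayed as a newer claim.
func signedBytes(ap netip.AddrPort, ts uint64) []byte {
	ip := ap.Addr().As16()
	buf := make([]byte, 0, len(ip)+2+8)
	buf = append(buf, ip[:]...)
	buf = binary.BigEndian.AppendUint16(buf, ap.Port())
	buf = binary.BigEndian.AppendUint64(buf, ts)
	return buf
}

// sign produces a signed claim for ap at time ts.
func sign(key ed25519.PrivateKey, ap netip.AddrPort, ts uint64) signedIP {
	return signedIP{
		AddrPort:  ap,
		Timestamp: ts,
		Signature: ed25519.Sign(key, signedBytes(ap, ts)),
	}
}

// verify checks the signature against the claimed IP:Port and Timestamp.
func (s signedIP) verify(pub ed25519.PublicKey) bool {
	return ed25519.Verify(pub, signedBytes(s.AddrPort, s.Timestamp), s.Signature)
}

// shouldReplace reports whether candidate is a valid, strictly newer claim
// than current, so that the tracked IP:Port only ever moves forward in time.
func shouldReplace(current, candidate signedIP, pub ed25519.PublicKey) bool {
	return candidate.verify(pub) && candidate.Timestamp > current.Timestamp
}

// newKeyPair generates a throwaway key pair for this sketch.
func newKeyPair() (ed25519.PublicKey, ed25519.PrivateKey) {
	pub, priv, err := ed25519.GenerateKey(nil) // nil selects crypto/rand
	if err != nil {
		panic(err)
	}
	return pub, priv
}

// addr parses an IP:Port string and panics on malformed input.
func addr(s string) netip.AddrPort {
	return netip.MustParseAddrPort(s)
}
```

Because shouldReplace requires a strictly larger Timestamp, a third party that replays an old, validly signed IP:Port pair cannot roll a node's view of a peer back to a stale address.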
Bootstrapping
In Avalanche, nodes connect to an initial set (this is user-configurable) of bootstrap nodes.
PeerList Gossip
Once connected to an initial set of peers, a node can use these connections to discover additional peers.
Peers are discovered by receiving PeerList messages during the Peer Handshake. These messages quickly provide a node with knowledge of peers in the network. However, they offer no guarantee that the node will connect to and maintain connections with every peer in the network.
To provide an eventual guarantee that all peers learn of one another, nodes periodically send a GetPeerList message to a randomly selected Primary Network validator with the node's current Bloom Filter and Salt.
Gossipable Peers
The peers that a node may include in a GetPeerList message are considered gossipable.
Trackable Peers
The peers that a node would attempt to connect to if included in a PeerList message are considered trackable.
Bloom Filter
A Bloom Filter is used to track which nodes are known.
The parameterization of the Bloom Filter is based on the number of desired peers.
Entries in the Bloom Filter are determined by a locally calculated Salt along with the NodeID and Timestamp of the most recently known IP:Port. The Salt is added to prevent griefing attacks where malicious nodes intentionally generate hash collisions with other virtuous nodes to reduce their connectivity.
The Bloom Filter is reconstructed if there are more entries than expected to avoid increasing the false positive probability. It is also reconstructed periodically. When reconstructing the Bloom Filter, a new Salt is generated.
To prevent a malicious node from arbitrarily filling this Bloom Filter, only 2 entries are added to the Bloom Filter per node. If a node's IP:Port pair changes once, it will immediately be added to the Bloom Filter. If a node's IP:Port pair changes more than once, it will only be added to the Bloom Filter after the Bloom Filter is reconstructed.
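A salted Bloom Filter of this kind can be sketched as follows. This is a simplified illustration, not the package's filter: the use of SHA-256 with double hashing, the sizing parameters, and the names saltedBloom, newSaltedBloom, add, contains, and needsReset are all assumptions of this example.

```go
package main

import (
	"crypto/sha256"
	"encoding/binary"
)

// saltedBloom is a hypothetical Bloom filter whose entries depend on a salt.
type saltedBloom struct {
	bits      []uint64
	numHashes int
	salt      []byte
	count     int // entries added so far
	maxCount  int // expected number of entries before a reset is needed
}

// newSaltedBloom allocates a filter with at least numBits bits.
func newSaltedBloom(numBits, numHashes, maxCount int, salt []byte) *saltedBloom {
	return &saltedBloom{
		bits:      make([]uint64, (numBits+63)/64),
		numHashes: numHashes,
		salt:      salt,
		maxCount:  maxCount,
	}
}

// indexes derives the bit positions for an entry from the salt, the node ID,
// and the timestamp of the node's most recently known IP:Port.
func (b *saltedBloom) indexes(nodeID []byte, timestamp uint64) []uint64 {
	h := sha256.New()
	h.Write(b.salt)
	h.Write(nodeID)
	var ts [8]byte
	binary.BigEndian.PutUint64(ts[:], timestamp)
	h.Write(ts[:])
	sum := h.Sum(nil)

	// Double hashing: position i is h1 + i*h2 modulo the filter size.
	h1 := binary.BigEndian.Uint64(sum[0:8])
	h2 := binary.BigEndian.Uint64(sum[8:16]) | 1
	n := uint64(len(b.bits) * 64)
	out := make([]uint64, b.numHashes)
	for i := range out {
		out[i] = (h1 + uint64(i)*h2) % n
	}
	return out
}

// add records an entry in the filter.
func (b *saltedBloom) add(nodeID []byte, timestamp uint64) {
	for _, i := range b.indexes(nodeID, timestamp) {
		b.bits[i/64] |= uint64(1) << (i % 64)
	}
	b.count++
}

// contains reports whether the entry may have been added. False positives
// are possible; false negatives are not.
func (b *saltedBloom) contains(nodeID []byte, timestamp uint64) bool {
	for _, i := range b.indexes(nodeID, timestamp) {
		if b.bits[i/64]&(uint64(1)<<(i%64)) == 0 {
			return false
		}
	}
	return true
}

// needsReset reports whether more entries were added than expected, at which
// point the filter should be rebuilt with a freshly generated salt.
func (b *saltedBloom) needsReset() bool {
	return b.count > b.maxCount
}
```

Since the salt is chosen locally and regenerated on every reconstruction, an attacker cannot precompute node IDs that collide with honest nodes in a given node's filter.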
GetPeerList
A GetPeerList message contains the Bloom Filter of the currently known peers along with the Salt that was used to add entries to the Bloom Filter. Upon receipt of a GetPeerList message, a node is expected to respond with a PeerList message.
PeerList
PeerList messages are expected to contain IP:Port pairs that satisfy all of the following constraints:
- The Bloom Filter sent when requesting the PeerList message does not contain the node claiming the IP:Port pair.
- The node claiming the IP:Port pair is currently connected.
- The node claiming the IP:Port pair is either in the default bootstrapper set, is a current Primary Network validator, is a validator of a tracked Subnet, or is a validator of a Subnet and the peer is a Primary Network validator.
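The three constraints above can be sketched as a filter over candidate peers. This is a hedged sketch: the peer struct, its boolean fields, and selectPeerList are hypothetical, the requester's Bloom Filter is abstracted as a callback, and the sketch reads "the peer" in the last constraint as the peer that requested the PeerList, which is an interpretation rather than something the text states outright.

```go
package main

// peer is a hypothetical summary of what the responder knows about a node.
type peer struct {
	NodeID                 string
	Connected              bool
	DefaultBootstrapper    bool
	PrimaryValidator       bool
	TrackedSubnetValidator bool
	SubnetValidator        bool
}

// selectPeerList returns the candidates that may be sent to a requester.
// requesterKnows stands in for a lookup in the requester's Bloom Filter.
func selectPeerList(
	candidates []peer,
	requesterKnows func(nodeID string) bool,
	requesterIsPrimaryValidator bool,
) []peer {
	var out []peer
	for _, c := range candidates {
		// Constraint 1: the requester's Bloom Filter must not contain the node.
		if requesterKnows(c.NodeID) {
			continue
		}
		// Constraint 2: the node must be currently connected.
		if !c.Connected {
			continue
		}
		// Constraint 3: the node must fall into one of the eligible categories.
		eligible := c.DefaultBootstrapper ||
			c.PrimaryValidator ||
			c.TrackedSubnetValidator ||
			(c.SubnetValidator && requesterIsPrimaryValidator)
		if eligible {
			out = append(out, c)
		}
	}
	return out
}
```

Once the requester adds a returned node to its Bloom Filter, the first constraint excludes that node from later responses, which is what keeps repeated GetPeerList exchanges cheap.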
Avoiding Persistent Network Traffic
To avoid persistent network traffic, it must eventually hold that the set of gossipable peers is a subset of the trackable peers for all nodes in the network.
For example, say there are 3 nodes: Rick, Morty, and Summer.
First we consider the case that Rick and Morty consider Summer gossipable and trackable, respectively.
sequenceDiagram
actor Morty
actor Rick
Note left of Morty: Not currently tracking Summer
Morty->>Rick: GetPeerList
Note right of Rick: Summer isn't in the bloom filter
Rick->>Morty: PeerList - Contains Summer
Note left of Morty: Track Summer and add to bloom filter
Morty->>Rick: GetPeerList
Note right of Rick: Summer is in the bloom filter
Rick->>Morty: PeerList - Empty
This case is ideal, as Rick only notifies Morty about Summer once, and never uses bandwidth for their connection again.
Now we consider the case that Rick considers Summer gossipable, but Morty does not consider Summer trackable.
sequenceDiagram
actor Morty
actor Rick
Note left of Morty: Not currently tracking Summer
Morty->>Rick: GetPeerList
Note right of Rick: Summer isn't in the bloom filter
Rick->>Morty: PeerList - Contains Summer
Note left of Morty: Ignore Summer
Morty->>Rick: GetPeerList
Note right of Rick: Summer isn't in the bloom filter
Rick->>Morty: PeerList - Contains Summer
This case is suboptimal, because Rick told Morty about Summer multiple times. If this case were to happen consistently, Rick may waste a significant amount of bandwidth trying to teach Morty about Summer.
Documentation
¶
Constants ¶
const (
ConnectedPeersKey = "connectedPeers"
TimeSinceLastMsgReceivedKey = "timeSinceLastMsgReceived"
TimeSinceLastMsgSentKey = "timeSinceLastMsgSent"
SendFailRateKey = "sendFailRate"
)
Types ¶
type Config ¶ added in v1.4.10
type Config struct {
HealthConfig `json:"healthConfig"`
PeerListGossipConfig `json:"peerListGossipConfig"`
TimeoutConfig `json:"timeoutConfigs"`
DelayConfig `json:"delayConfig"`
ThrottlerConfig ThrottlerConfig `json:"throttlerConfig"`
ProxyEnabled bool `json:"proxyEnabled"`
ProxyReadHeaderTimeout time.Duration `json:"proxyReadHeaderTimeout"`
DialerConfig dialer.Config `json:"dialerConfig"`
TLSConfig *tls.Config `json:"-"`
TLSKeyLogFile string `json:"tlsKeyLogFile"`
MyNodeID ids.NodeID `json:"myNodeID"`
MyIPPort *utils.Atomic[netip.AddrPort] `json:"myIP"`
NetworkID uint32 `json:"networkID"`
MaxClockDifference time.Duration `json:"maxClockDifference"`
PingFrequency time.Duration `json:"pingFrequency"`
AllowPrivateIPs bool `json:"allowPrivateIPs"`
SupportedACPs set.Set[uint32] `json:"supportedACPs"`
ObjectedACPs set.Set[uint32] `json:"objectedACPs"`
// The compression type to use when compressing outbound messages.
// Assumes all peers support this compression type.
CompressionType compression.Type `json:"compressionType"`
// TLSKey is this node's TLS key that is used to sign IPs.
TLSKey crypto.Signer `json:"-"`
// BLSKey is this node's BLS key that is used to sign IPs.
BLSKey *bls.SecretKey `json:"-"`
// TrackedSubnets of the node.
// It must not include the primary network ID.
TrackedSubnets set.Set[ids.ID] `json:"-"`
Beacons validators.Manager `json:"-"`
// Validators are the current validators in the Avalanche network
Validators validators.Manager `json:"-"`
UptimeCalculator uptime.Calculator `json:"-"`
// UptimeMetricFreq marks how frequently this node will recalculate the
// observed average uptime metrics.
UptimeMetricFreq time.Duration `json:"uptimeMetricFreq"`
// UptimeRequirement is the fraction of time a validator must be online and
// responsive for us to vote that they should receive a staking reward.
UptimeRequirement float64 `json:"-"`
// RequireValidatorToConnect require that all connections must have at least
// one validator between the 2 peers. This can be useful to enable if the
// node wants to connect to the minimum number of nodes without impacting
// the network negatively.
RequireValidatorToConnect bool `json:"requireValidatorToConnect"`
// MaximumInboundMessageTimeout is the maximum deadline duration in a
// message. Messages sent by clients setting values higher than this value
// will be reset to this value.
MaximumInboundMessageTimeout time.Duration `json:"maximumInboundMessageTimeout"`
// Size, in bytes, of the buffer that we read peer messages into
// (there is one buffer per peer)
PeerReadBufferSize int `json:"peerReadBufferSize"`
// Size, in bytes, of the buffer that we write peer messages into
// (there is one buffer per peer)
PeerWriteBufferSize int `json:"peerWriteBufferSize"`
// Tracks the CPU/disk usage caused by processing messages of each peer.
ResourceTracker tracker.ResourceTracker `json:"-"`
// Specifies how much CPU usage each peer can cause before
// we rate-limit them.
CPUTargeter tracker.Targeter `json:"-"`
// Specifies how much disk usage each peer can cause before
// we rate-limit them.
DiskTargeter tracker.Targeter `json:"-"`
}
type DelayConfig ¶ added in v1.6.1
type DelayConfig struct {
// InitialReconnectDelay is the minimum amount of time the node will delay a
// reconnection to a peer. This value is used to start the exponential
// backoff.
InitialReconnectDelay time.Duration `json:"initialReconnectDelay"`
// MaxReconnectDelay is the maximum amount of time the node will delay a
// reconnection to a peer.
MaxReconnectDelay time.Duration `json:"maxReconnectDelay"`
}
type HealthConfig ¶ added in v1.2.1
type HealthConfig struct {
// Marks if the health check should be enabled
Enabled bool `json:"-"`
// MinConnectedPeers is the minimum number of peers that the network should
// be connected to be considered healthy.
MinConnectedPeers uint `json:"minConnectedPeers"`
// MaxTimeSinceMsgReceived is the maximum amount of time since the network
// last received a message to be considered healthy.
MaxTimeSinceMsgReceived time.Duration `json:"maxTimeSinceMsgReceived"`
// MaxTimeSinceMsgSent is the maximum amount of time since the network last
// sent a message to be considered healthy.
MaxTimeSinceMsgSent time.Duration `json:"maxTimeSinceMsgSent"`
// MaxPortionSendQueueBytesFull is the maximum percentage of the pending
// send byte queue that should be used for the network to be considered
// healthy. Should be in (0,1].
MaxPortionSendQueueBytesFull float64 `json:"maxPortionSendQueueBytesFull"`
// MaxSendFailRate is the maximum percentage of send attempts that should be
// failing for the network to be considered healthy. This does not include
// send attempts that were not made due to benching. Should be in [0,1].
MaxSendFailRate float64 `json:"maxSendFailRate"`
// SendFailRateHalflife is the halflife of the averager used to calculate
// the send fail rate percentage. Should be > 0. Larger values mean that the
// fail rate is affected less by recently dropped messages.
SendFailRateHalflife time.Duration `json:"sendFailRateHalflife"`
}
HealthConfig describes parameters for network layer health checks.
type Network ¶
type Network interface {
// All consensus messages can be sent through this interface. Thread safety
// must be managed internally in the network.
sender.ExternalSender
// Has a health check
health.Checker
peer.Network
// StartClose this network and all existing connections it has. Calling
// StartClose multiple times is handled gracefully.
StartClose()
// Should only be called once, will run until either a fatal error occurs,
// or the network is closed.
Dispatch() error
// Attempt to connect to this IP. The network will never stop attempting to
// connect to this ID.
ManuallyTrack(nodeID ids.NodeID, ip netip.AddrPort)
// PeerInfo returns information about peers. If [nodeIDs] is empty, returns
// info about all peers that have finished the handshake. Otherwise, returns
// info about the peers in [nodeIDs] that have finished the handshake.
PeerInfo(nodeIDs []ids.NodeID) []peer.Info
// NodeUptime returns given node's [subnetID] UptimeResults in the view of
// this node's peer validators.
NodeUptime(subnetID ids.ID) (UptimeResult, error)
}
Network defines the functionality of the networking library.
func NewNetwork ¶
func NewNetwork(
config *Config,
minCompatibleTime time.Time,
msgCreator message.Creator,
metricsRegisterer prometheus.Registerer,
log logging.Logger,
listener net.Listener,
dialer dialer.Dialer,
router router.ExternalHandler,
) (Network, error)
NewNetwork returns a new Network implementation with the provided parameters.
func NewTestNetwork ¶ added in v1.9.8
func NewTestNetwork(
log logging.Logger,
networkID uint32,
currentValidators validators.Manager,
trackedSubnets set.Set[ids.ID],
router router.ExternalHandler,
) (Network, error)
type PeerListGossipConfig ¶ added in v1.6.1
type PeerListGossipConfig struct {
// PeerListNumValidatorIPs is the number of validator IPs to gossip in every
// gossip event.
PeerListNumValidatorIPs uint32 `json:"peerListNumValidatorIPs"`
// PeerListPullGossipFreq is the frequency that this node will attempt to
// request signed IPs from its peers.
PeerListPullGossipFreq time.Duration `json:"peerListPullGossipFreq"`
// PeerListBloomResetFreq is how frequently this node will recalculate the
// IP tracker's bloom filter.
PeerListBloomResetFreq time.Duration `json:"peerListBloomResetFreq"`
}
type ThrottlerConfig ¶ added in v1.6.1
type ThrottlerConfig struct {
InboundConnUpgradeThrottlerConfig throttling.InboundConnUpgradeThrottlerConfig `json:"inboundConnUpgradeThrottlerConfig"`
InboundMsgThrottlerConfig throttling.InboundMsgThrottlerConfig `json:"inboundMsgThrottlerConfig"`
OutboundMsgThrottlerConfig throttling.MsgByteThrottlerConfig `json:"outboundMsgThrottlerConfig"`
MaxInboundConnsPerSec float64 `json:"maxInboundConnsPerSec"`
}
type TimeoutConfig ¶ added in v1.6.1
type TimeoutConfig struct {
// PingPongTimeout is the maximum amount of time to wait for a Pong response
// from a peer we sent a Ping to.
PingPongTimeout time.Duration `json:"pingPongTimeout"`
// ReadHandshakeTimeout is the maximum amount of time to wait for the peer's
// connection upgrade to finish before starting the p2p handshake.
ReadHandshakeTimeout time.Duration `json:"readHandshakeTimeout"`
}
type UptimeResult ¶ added in v1.7.0
type UptimeResult struct {
// RewardingStakePercentage shows what percent of network stake thinks we're
// above the uptime requirement.
RewardingStakePercentage float64
// WeightedAveragePercentage is the average perceived uptime of this node,
// weighted by stake.
// Note that this is different from RewardingStakePercentage, which shows
// the percent of the network stake that thinks this node is above the
// uptime requirement. WeightedAveragePercentage is weighted by uptime.
// i.e If uptime requirement is 85 and a peer reports 40 percent it will be
// counted (40*weight) in WeightedAveragePercentage but not in
// RewardingStakePercentage since 40 < 85
WeightedAveragePercentage float64
}
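The difference between the two UptimeResult fields can be illustrated with a small calculation. This is a sketch under assumptions: uptimeReport and summarize are hypothetical names, uptimes and the requirement are expressed as percentages in the range 0 to 100 to match the comment above, and a report counts toward the rewarding stake when it is greater than or equal to the requirement.

```go
package main

// uptimeReport is a hypothetical view of one peer validator's opinion.
type uptimeReport struct {
	Weight uint64  // stake weight of the reporting validator
	Uptime float64 // perceived uptime of this node, in percent
}

// summarize returns the percentage of stake that reports an uptime at or
// above requirement, and the stake-weighted average of all reported uptimes.
func summarize(reports []uptimeReport, requirement float64) (rewarding, weightedAvg float64) {
	var totalWeight, rewardingWeight, weightedSum float64
	for _, r := range reports {
		w := float64(r.Weight)
		totalWeight += w
		weightedSum += w * r.Uptime
		if r.Uptime >= requirement {
			rewardingWeight += w
		}
	}
	if totalWeight == 0 {
		return 0, 0
	}
	return 100 * rewardingWeight / totalWeight, weightedSum / totalWeight
}
```

For example, with a requirement of 85, one validator of weight 1 reporting 40 and one of weight 3 reporting 90, the rewarding stake is 75 percent while the weighted average is 77.5 percent: the report of 40 lowers the average but contributes nothing to the rewarding stake.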