README
Trimaran: Load-aware scheduling plugins
Trimaran is a collection of load-aware scheduler plugins described in Trimaran: Real Load Aware Scheduling.
Currently, the collection consists of the following plugins.
TargetLoadPacking
: Implements a packing policy up to a configured CPU utilization, then switches to a spreading policy among the hot nodes. (Supports CPU resource.)
LoadVariationRiskBalancing
: Equalizes the risk, defined as a combined measure of average utilization and variation in utilization, among nodes. (Supports CPU and memory resources.)
The Trimaran plugins utilize a load-watcher to access resource utilization data via metrics providers. Currently, the load-watcher supports three metrics providers: Kubernetes Metrics Server, Prometheus Server, and SignalFx.
There are two modes for a Trimaran plugin to use the load-watcher: as a service or as a library.
load-watcher as a service
In this mode, the Trimaran plugin uses a deployed load-watcher service in the cluster as depicted in the figure below. A watcherAddress configuration parameter is required to define the load-watcher service endpoint. For example,
watcherAddress: http://xxxx.svc.cluster.local:2020
Instructions on how to build and deploy the load-watcher can be found here. The load-watcher service may also be deployed in the same scheduler pod, following the tutorial here.
load-watcher as a library
In this mode, the Trimaran plugin embeds the load-watcher as a library, which in turn accesses the configured metrics provider. In this case, we have three configuration parameters: metricProvider.type, metricProvider.address, and metricProvider.token.
The configuration parameters should be set as follows.
metricProvider.type
: the type of the metrics provider, one of:
  KubernetesMetricsServer (default)
  Prometheus
  SignalFx
metricProvider.address
: the address of the metrics provider endpoint, if needed. For the Kubernetes Metrics Server, this parameter may be ignored. For the Prometheus Server, an example setting is http://prometheus-k8s.monitoring.svc.cluster.local:9090
metricProvider.token
: set only if an authentication token is needed to access the metrics provider.
The selection of the load-watcher mode is based on the existence of a watcherAddress parameter. If it is set, then the load-watcher is in the 'as a service' mode; otherwise, it is in the 'as a library' mode.
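The selection rule above can be sketched in Go as follows. This is a minimal illustration, not the plugin's actual code: the `PluginArgs` and `MetricProvider` types and the `WatcherMode` and `ProviderType` helpers are hypothetical names introduced here, and only the parameter names and the documented default come from this README.

```go
package main

import "fmt"

// MetricProvider mirrors the metricProvider.* parameters described above.
// The struct itself is an illustrative assumption, not the plugin's type.
type MetricProvider struct {
	Type    string // KubernetesMetricsServer (default), Prometheus, or SignalFx
	Address string // endpoint of the metrics provider, if needed
	Token   string // set only when an authentication token is required
}

// PluginArgs is a hypothetical container for the common Trimaran parameters.
type PluginArgs struct {
	WatcherAddress string
	MetricProvider MetricProvider
}

// WatcherMode returns "service" when watcherAddress is set, "library" otherwise.
func WatcherMode(args PluginArgs) string {
	if args.WatcherAddress != "" {
		return "service"
	}
	return "library"
}

// ProviderType applies the documented default when no type is configured.
func ProviderType(args PluginArgs) string {
	if args.MetricProvider.Type == "" {
		return "KubernetesMetricsServer"
	}
	return args.MetricProvider.Type
}

func main() {
	asService := PluginArgs{WatcherAddress: "http://xxxx.svc.cluster.local:2020"}
	asLibrary := PluginArgs{MetricProvider: MetricProvider{
		Type:    "Prometheus",
		Address: "http://prometheus-k8s.monitoring.svc.cluster.local:9090",
	}}
	fmt.Println(WatcherMode(asService), WatcherMode(asLibrary), ProviderType(asLibrary))
}
```

In this sketch, an empty `WatcherAddress` is treated as "not set", which matches the rule that the mode depends only on whether the parameter exists.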
In addition to the above configuration parameters, the Trimaran plugin may have its own specific parameters.
Following is an example scheduler configuration.
apiVersion: kubescheduler.config.k8s.io/v1beta2
kind: KubeSchedulerConfiguration
leaderElection:
  leaderElect: false
profiles:
- schedulerName: trimaran
  plugins:
    score:
      enabled:
      - name: LoadVariationRiskBalancing
  pluginConfig:
  - name: LoadVariationRiskBalancing
    args:
      metricProvider:
        type: Prometheus
        address: http://prometheus-k8s.monitoring.svc.cluster.local:9090
      safeVarianceMargin: 1
      safeVarianceSensitivity: 2
Configure Prometheus Metric Provider under different environments
- Invalid self-signed SSL connection error for the Prometheus metric queries
The Prometheus metric queries may fail with an invalid self-signed SSL connection error when the cluster environment disables the skipInsecureVerify option for HTTPS. In this case, you can configure insecureSkipVerify: true for metricProvider to skip the SSL verification.
args:
  metricProvider:
    type: Prometheus
    address: http://prometheus-k8s.monitoring.svc.cluster.local:9090
    insecureSkipVerify: true
- OpenShift Prometheus authentication without tokens.
OpenShift clusters do not allow non-verified clients to access their Prometheus metrics. To run the Trimaran plugin on OpenShift, set the environment variable ENABLE_OPENSHIFT_AUTH=true for your Trimaran scheduler deployment when running load-watcher as a library.
A note on multiple plugins
The Trimaran plugins have different, potentially conflicting, objectives. Thus, it is recommended not to enable them concurrently. As such, each plugin is designed to have its own load-watcher.
Documentation
Constants ¶
const (
	// MegaFactor : Mega unit multiplier
	MegaFactor = float64(1. / 1024. / 1024.)
)
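As a small illustration of how such a multiplier can be applied, the sketch below converts a byte count into mega units by multiplying with MegaFactor. The `BytesToMega` helper is a hypothetical name introduced for this example; whether the package uses the constant in exactly this way is an assumption.

```go
package main

import "fmt"

// MegaFactor is copied from the constant above: 1 / (1024 * 1024).
const MegaFactor = float64(1. / 1024. / 1024.)

// BytesToMega is a hypothetical helper that scales a byte count to mega units.
func BytesToMega(bytes int64) float64 {
	return float64(bytes) * MegaFactor
}

func main() {
	// 512 * 1024 * 1024 bytes scaled by MegaFactor.
	fmt.Println(BytesToMega(512 * 1024 * 1024)) // prints 512
}
```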
Variables ¶
This section is empty.
Functions ¶
func GetMuSigma ¶
func GetMuSigma(rs *ResourceStats) (float64, float64)
GetMuSigma : get average and standard deviation from statistics
func GetResourceData ¶
func GetResourceData(metrics []watcher.Metric, resourceType string) (avg float64, stDev float64, isValid bool)
GetResourceData : get data from measurements for a given resource type
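The following sketch shows one way a function with this shape could extract an average and a standard deviation from a list of measurements. It is not the package's implementation: the `Metric` type is a simplified stand-in for watcher.Metric, and its field names and the "AVG" and "STD" operator values are assumptions made for illustration.

```go
package main

import "fmt"

// Metric is a simplified stand-in for watcher.Metric. The field names and
// the operator values used below are assumptions made for this sketch.
type Metric struct {
	Type     string  // resource type, for example "CPU" or "Memory"
	Operator string  // aggregation applied, for example "AVG" or "STD"
	Value    float64 // measured value
}

// resourceData scans metrics for the given resource type and returns its
// average and standard deviation. isValid is true only if an average was found;
// a missing standard deviation is reported as zero.
func resourceData(metrics []Metric, resourceType string) (avg, stDev float64, isValid bool) {
	for _, m := range metrics {
		if m.Type != resourceType {
			continue
		}
		switch m.Operator {
		case "AVG":
			avg = m.Value
			isValid = true
		case "STD":
			stDev = m.Value
		}
	}
	return avg, stDev, isValid
}

func main() {
	metrics := []Metric{
		{Type: "CPU", Operator: "AVG", Value: 40},
		{Type: "CPU", Operator: "STD", Value: 10},
		{Type: "Memory", Operator: "AVG", Value: 55},
	}
	avg, stDev, ok := resourceData(metrics, "CPU")
	fmt.Println(avg, stDev, ok)
}
```

Returning a validity flag rather than an error lets a caller skip a node with missing measurements without aborting the scoring cycle.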
func GetResourceRequested ¶
func GetResourceRequested(pod *v1.Pod) *framework.Resource
GetResourceRequested : calculate the resource demand of a pod (CPU and Memory)
Types ¶
type Collector ¶
type Collector struct {
	// contains filtered or unexported fields
}
Collector : get data from load watcher, encapsulating the load watcher and its operations
Trimaran plugins have different, potentially conflicting, objectives. Thus, it is recommended not to enable them concurrently. As such, each plugin is currently designed to have its own Collector. If a need arises in the future to enable multiple Trimaran plugins, restructuring them to share a single Collector may be beneficial for performance reasons.
func NewCollector ¶
func NewCollector(trimaranSpec *pluginConfig.TrimaranSpec) (*Collector, error)
NewCollector : create an instance of a data collector
func (*Collector) GetNodeMetrics ¶
func (collector *Collector) GetNodeMetrics(nodeName string) ([]watcher.Metric, *watcher.WatcherMetrics)
GetNodeMetrics : get metrics for a node from watcher
type PodAssignEventHandler ¶
type PodAssignEventHandler struct {
	// Maintains the node-name to podInfo mapping for pods successfully bound to nodes
	ScheduledPodsCache map[string][]podInfo
	sync.RWMutex
}
This event handler watches assigned Pods and caches them locally
func New ¶
func New() *PodAssignEventHandler
Returns a new instance of PodAssignEventHandler, after starting a background goroutine for cache cleanup
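The pattern described here, a mutex-guarded node-to-pods cache that a background goroutine prunes periodically, can be sketched as below. This is an illustration under stated assumptions rather than the handler's code: `podEntry`, `podCache`, and all method names are hypothetical, and the rule of dropping entries older than a maximum age is assumed.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// podEntry is a hypothetical stand-in for the unexported podInfo type.
type podEntry struct {
	Name      string
	Timestamp time.Time // when the pod was bound to the node
}

// podCache maps node names to recently bound pods, guarded by a RWMutex.
type podCache struct {
	sync.RWMutex
	byNode map[string][]podEntry
}

func newPodCache() *podCache {
	return &podCache{byNode: make(map[string][]podEntry)}
}

// add records a pod bound to nodeName at time ts.
func (c *podCache) add(nodeName, podName string, ts time.Time) {
	c.Lock()
	defer c.Unlock()
	c.byNode[nodeName] = append(c.byNode[nodeName], podEntry{Name: podName, Timestamp: ts})
}

// count returns the number of cached pods for nodeName.
func (c *podCache) count(nodeName string) int {
	c.RLock()
	defer c.RUnlock()
	return len(c.byNode[nodeName])
}

// cleanup drops entries older than maxAge (an assumed eviction rule).
func (c *podCache) cleanup(now time.Time, maxAge time.Duration) {
	c.Lock()
	defer c.Unlock()
	for node, pods := range c.byNode {
		kept := pods[:0] // filter in place, reusing the backing array
		for _, p := range pods {
			if now.Sub(p.Timestamp) <= maxAge {
				kept = append(kept, p)
			}
		}
		if len(kept) == 0 {
			delete(c.byNode, node)
		} else {
			c.byNode[node] = kept
		}
	}
}

// startCleanup runs cleanup on every tick until stop is closed.
func (c *podCache) startCleanup(interval, maxAge time.Duration, stop <-chan struct{}) {
	go func() {
		ticker := time.NewTicker(interval)
		defer ticker.Stop()
		for {
			select {
			case <-ticker.C:
				c.cleanup(time.Now(), maxAge)
			case <-stop:
				return
			}
		}
	}()
}

func main() {
	cache := newPodCache()
	now := time.Now()
	cache.add("node-a", "old-pod", now.Add(-10*time.Minute))
	cache.add("node-a", "new-pod", now)
	cache.cleanup(now, 5*time.Minute) // called directly here to keep the demo deterministic
	fmt.Println(cache.count("node-a")) // only the recently bound pod remains
}
```

Passing the current time into `cleanup` keeps the eviction logic testable without waiting on a ticker; `startCleanup` is the part that corresponds to the background goroutine.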
func (*PodAssignEventHandler) AddToHandle ¶
func (p *PodAssignEventHandler) AddToHandle(handle framework.Handle)
AddToHandle : add event handler to framework handle
type ResourceStats ¶
type ResourceStats struct {
	// average used (absolute)
	UsedAvg float64
	// standard deviation used (absolute)
	UsedStdev float64
	// req of pod
	Req float64
	// node capacity
	Capacity float64
}
ResourceStats : statistics data for a resource
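To illustrate how these absolute values might be turned into utilization fractions, the sketch below normalizes them by node capacity. The `normalizedMuSigma` helper is hypothetical, and its formula (average usage plus the pod request, divided by capacity, clamped to [0, 1]) is an assumption for this example; it is not confirmed here to be what GetMuSigma computes.

```go
package main

import (
	"fmt"
	"math"
)

// ResourceStats mirrors the exported fields listed above.
type ResourceStats struct {
	UsedAvg   float64 // average used (absolute)
	UsedStdev float64 // standard deviation used (absolute)
	Req       float64 // req of pod
	Capacity  float64 // node capacity
}

// clamp01 restricts x to the interval [0, 1].
func clamp01(x float64) float64 {
	return math.Max(0, math.Min(1, x))
}

// normalizedMuSigma is a hypothetical helper. It expresses the expected load
// (average usage plus the pod's request) and the standard deviation as
// fractions of capacity. The formula is an assumption made for this sketch.
func normalizedMuSigma(rs *ResourceStats) (mu, sigma float64) {
	if rs == nil || rs.Capacity <= 0 {
		return 0, 0
	}
	mu = clamp01((rs.UsedAvg + rs.Req) / rs.Capacity)
	sigma = clamp01(rs.UsedStdev / rs.Capacity)
	return mu, sigma
}

func main() {
	// Example values in CPU millicores.
	rs := &ResourceStats{UsedAvg: 2000, UsedStdev: 500, Req: 1000, Capacity: 4000}
	mu, sigma := normalizedMuSigma(rs)
	fmt.Println(mu, sigma) // prints 0.75 0.125
}
```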
func CreateResourceStats ¶
func CreateResourceStats(metrics []watcher.Metric, node *v1.Node, podRequest *framework.Resource,
	resourceName v1.ResourceName, watcherType string) (rs *ResourceStats, isValid bool)
CreateResourceStats : get resource statistics data from measurements for a node
Directories
Path | Synopsis |
---|---|
loadvariationriskbalancing | Package loadvariationriskbalancing plugin attempts to balance the risk in load variation across the cluster. |