controlled-job

command module
v0.0.0-...-1670326 Latest Latest
Published: Sep 10, 2024 License: Apache-2.0 Imports: 13 Imported by: 0

README

ControlledJob CRD

For workloads which need to run only during particular periods of the day

A ControlledJob is a resource which specifies:

  • the definition of a Job to be run (a plain K8s JobSpec)
  • the schedule when we want that Job to be run. For example 'every weekday between 9am and 5pm, in the London timezone'

During the specified schedule, the controlled-job-operator will ensure that a Job object with a matching spec exists, and when the schedule says to stop, the Job is deleted.
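
The rule above reduces to a small decision per reconcile pass. The following is a minimal sketch of that rule only, not the operator's actual code: the `Action` type and `decide` function are hypothetical names, and the sketch leaves out spec changes and the suspended-start behaviour described under Features.

```go
package main

import "fmt"

// Action is what a reconcile pass would do next (hypothetical type).
type Action string

const (
	CreateJob Action = "create"
	DeleteJob Action = "delete"
	NoOp      Action = "noop"
)

// decide returns the action that brings the cluster in line with the schedule.
// The Job's own state (starting, running, failed, succeeded) is deliberately
// not an input: only its existence matters.
func decide(inSchedule, jobExists bool) Action {
	switch {
	case inSchedule && !jobExists:
		return CreateJob
	case !inSchedule && jobExists:
		return DeleteJob
	default:
		return NoOp
	}
}

func main() {
	fmt.Println(decide(true, false))  // inside the schedule, no Job yet
	fmt.Println(decide(true, true))   // inside the schedule, Job present
	fmt.Println(decide(false, true))  // outside the schedule, Job present
	fmt.Println(decide(false, false)) // outside the schedule, nothing to do
}
```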

Features:

  • Control over what happens when the JobSpec on the ControlledJob changes while a Job is running. Either stop the old Job and start a new one with the new spec, or ignore the change until the next scheduled run
  • The ability to override the schedule manually. If a Job is manually created with the correct metadata, it will become managed by the matching ControlledJob. This allows use cases where the starting of a Job depends on external conditions (perhaps the successful completion of a batch job that prepares data for the Job) or when there's a need to start a Job earlier one day for some reason, but we still want the ongoing monitoring, restarting, and stopping to be handled according to the schedule
  • Strong guarantees about exclusive running of the Job. If a Job is restarted for any reason, the controlled-job-operator will start it in a suspended state, and only unsuspend it when it's sure any previous Job can no longer be running.
  • Pessimistic error handling. The system will not automatically retry failing Jobs, or restart Jobs that have exited cleanly during their scheduled time. This leaves the user free to choose how those cases are handled: settings on the JobSpec provided by Kubernetes already allow configuration of how to handle restarts and failures of a Job (e.g. retry up to 3 times before giving up). The logic from the ControlledJob side is simple: ensure a Job exists (in any state: starting, running, failed, succeeded) during the scheduled period, and is deleted outside of that period. The user can trigger a restart of a ControlledJob simply by deleting the current Job, which causes the controlled-job-operator to create a brand new Job in its place.
  • Comprehensive status conditions, that can be used to drive alerting and health checks
  • The ability to mutate the new Job specification at creation time, for example to look up an image tag dynamically or to substitute the current date into an env var on the created Pod. Specify the URL of a service that behaves like a standard K8s mutating webhook for Jobs, and it will be called before any Job is created.
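
As an illustration of the last feature, here is a minimal sketch of a mutating service that injects the current date into an env var. It assumes the operator sends a standard Kubernetes `AdmissionReview` request and accepts a base64-encoded JSONPatch in the response, which is how ordinary mutating webhooks work but is not confirmed here for this operator. The struct definitions are trimmed-down stand-ins for the real `admission.k8s.io/v1` types, and `mutate`, `decodePatch`, `handler`, and the `RUN_DATE` variable are hypothetical names.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"time"
)

// Minimal stand-ins for the admission.k8s.io/v1 types (assumed wire format).
type admissionReview struct {
	APIVersion string             `json:"apiVersion"`
	Kind       string             `json:"kind"`
	Request    *admissionRequest  `json:"request,omitempty"`
	Response   *admissionResponse `json:"response,omitempty"`
}

type admissionRequest struct {
	UID    string          `json:"uid"`
	Object json.RawMessage `json:"object"` // the Job about to be created
}

type admissionResponse struct {
	UID       string `json:"uid"`
	Allowed   bool   `json:"allowed"`
	PatchType string `json:"patchType,omitempty"`
	Patch     []byte `json:"patch,omitempty"` // encoding/json base64-encodes []byte
}

type patchOp struct {
	Op    string      `json:"op"`
	Path  string      `json:"path"`
	Value interface{} `json:"value"`
}

// mutate builds a response that sets RUN_DATE on the first container.
// Note: "add" on the env path replaces any env list already present.
func mutate(body []byte, today string) ([]byte, error) {
	var review admissionReview
	if err := json.Unmarshal(body, &review); err != nil {
		return nil, err
	}
	if review.Request == nil {
		return nil, fmt.Errorf("admission review has no request")
	}
	patch, err := json.Marshal([]patchOp{{
		Op:    "add",
		Path:  "/spec/template/spec/containers/0/env",
		Value: []map[string]string{{"name": "RUN_DATE", "value": today}},
	}})
	if err != nil {
		return nil, err
	}
	review.Response = &admissionResponse{
		UID: review.Request.UID, Allowed: true, PatchType: "JSONPatch", Patch: patch,
	}
	review.Request = nil
	return json.Marshal(review)
}

// decodePatch extracts the JSONPatch document from an encoded response.
func decodePatch(resp []byte) (string, error) {
	var review admissionReview
	if err := json.Unmarshal(resp, &review); err != nil {
		return "", err
	}
	if review.Response == nil {
		return "", fmt.Errorf("admission review has no response")
	}
	return string(review.Response.Patch), nil
}

// handler can be mounted with http.HandleFunc("/mutate", handler).
func handler(w http.ResponseWriter, r *http.Request) {
	body, err := io.ReadAll(r.Body)
	if err != nil {
		http.Error(w, err.Error(), http.StatusBadRequest)
		return
	}
	out, err := mutate(body, time.Now().Format("2006-01-02"))
	if err != nil {
		http.Error(w, err.Error(), http.StatusBadRequest)
		return
	}
	w.Header().Set("Content-Type", "application/json")
	w.Write(out)
}

func main() {
	in := []byte(`{"apiVersion":"admission.k8s.io/v1","kind":"AdmissionReview","request":{"uid":"123","object":{}}}`)
	out, err := mutate(in, "2024-09-10")
	if err != nil {
		panic(err)
	}
	patch, err := decodePatch(out)
	if err != nil {
		panic(err)
	}
	fmt.Println(patch)
}
```

Keeping `mutate` as a pure function of the request body and the date makes it easy to test without a cluster; the HTTP handler is only a thin wrapper around it.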

Example

apiVersion: batch.gresearch.co.uk/v1
kind: ControlledJob
metadata:
  name: controlledjob-sample
spec:

  # Timezone is any standard tz database timezone name
  # Optionally with an additional static offset (in seconds)
  timezone:
    name: "GMT"
    offset: 3600 # 1h, making the overall timezone 'GMT + 1h'

  # Any number of scheduled events. Each one is either 'start' or 'stop', and its
  # schedule can be timeOfDay & daysOfWeek, or a valid crontab entry
  events:
    - action: "start"
      schedule:
        timeOfDay: "09:00"
        daysOfWeek: "MON-FRI"        
    - action: "stop"
      cronSchedule: "0 17 * * MON-FRI"

  # Template for the job to create. Any valid JobSpec is accepted
  jobTemplate:
    metadata:
      labels:
        foo: bar
    spec:
      backoffLimit: 3
      template:
        spec:
          containers:
          - name: hello
            image: busybox
            args:
            - /bin/sh
            - -c
            - date; echo Hello from my ControlledJob
          restartPolicy: OnFailure
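
To make the schedule in this example concrete, the sketch below checks whether a given instant falls inside the window it describes: weekdays from the 09:00 start to the 17:00 stop, in GMT plus a one hour offset. It is an illustration written for this example only, not how the operator evaluates schedules; `exampleZone`, `inWindow`, and `mustParse` are hypothetical names, and the window is hard-coded rather than parsed from the resource.

```go
package main

import (
	"fmt"
	"time"
)

// exampleZone mirrors `timezone: {name: "GMT", offset: 3600}` from the example.
// GMT has no offset of its own, so a fixed +1h zone is equivalent here.
var exampleZone = time.FixedZone("GMT+1h", 3600)

// inWindow reports whether t falls between the 09:00 start event and the
// 17:00 stop event on a weekday, evaluated in exampleZone.
func inWindow(t time.Time) bool {
	local := t.In(exampleZone)
	switch local.Weekday() {
	case time.Saturday, time.Sunday:
		return false
	}
	minutes := local.Hour()*60 + local.Minute()
	return minutes >= 9*60 && minutes < 17*60
}

// mustParse turns an RFC 3339 timestamp into a time.Time, panicking on bad input.
func mustParse(ts string) time.Time {
	t, err := time.Parse(time.RFC3339, ts)
	if err != nil {
		panic(err)
	}
	return t
}

func main() {
	for _, ts := range []string{
		"2024-09-10T07:59:00Z", // Tuesday 08:59 in exampleZone
		"2024-09-10T08:00:00Z", // Tuesday 09:00 in exampleZone
		"2024-09-10T16:00:00Z", // Tuesday 17:00 in exampleZone
		"2024-09-14T10:00:00Z", // Saturday 11:00 in exampleZone
	} {
		fmt.Println(ts, inWindow(mustParse(ts)))
	}
}
```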

Developer guide

This operator is built on the standard controller-runtime library using Kubebuilder and so should be familiar to anyone used to developing K8s controllers.

The main logic lives under pkg/reconciliation which is a good place to start reading.

Contributing

We welcome bug fixes, issue reports, and documentation improvements. Feature requests and additions are generally not in scope, so please open an issue to discuss any potential feature work, and read our contributing guide for more details on how to contribute.

Community Guidelines

Please read our code of conduct before participating in or contributing to this project.

Security

Please see our security policy for details on reporting security vulnerabilities.

License

ControlledJob is licensed under the Apache Software License 2.0 (SPDX-License-Identifier: Apache-2.0).

Documentation

Overview

Copyright 2021.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

Directories

Path Synopsis
api
v1
Package v1 contains API Schema definitions for the batch v1 API group +kubebuilder:object:generate=true +groupName=batch.gresearch.co.uk
cli
pkg
job
k8s
