Understanding Kubernetes’ tools/cache package: part 2

In the previous post in this long-running series, we looked at some of the foundations underlying Kubernetes controllers, and started looking into the concepts behind the tools/cache package, specifically ListerWatcher, Store and Reflector. In this post we’ll look at the actual concept of a Controller.

So far, I’ve been careful about capitalization. I’ve written “Kubernetes controller”, not “Kubernetes Controller”. That’s been on purpose, in part because the tools/cache package has a Controller type, which logically sits in front of Reflector. You can see for yourself:

// Controller is a generic controller framework.
type controller struct {
	config         Config
	reflector      *Reflector
	reflectorMutex sync.RWMutex
	clock          clock.Clock
}

type Controller interface {
	Run(stopCh <-chan struct{})
	HasSynced() bool
	LastSyncResourceVersion() string
}

Note, interestingly, that strictly speaking all you have to do to implement the Controller type is supply three functions. Run() takes a stopCh “stop channel” (for other Java programmers, this is essentially Go’s way of allowing interruption). HasSynced() returns true if, er, synchronization has been accomplished (we’ll look into what that means later). LastSyncResourceVersion() returns the resourceVersion of the Kubernetes list of resources being watched and reflected.
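To make the stop channel idea concrete, here is a minimal sketch of a type that satisfies the interface. The Controller interface is copied locally so the example compiles without client-go; idleController and runAndStop are hypothetical names invented for illustration and do not exist in tools/cache.

```go
package main

import (
	"fmt"
	"sync"
)

// Controller mirrors the interface quoted above, declared locally so
// that this sketch is self-contained.
type Controller interface {
	Run(stopCh <-chan struct{})
	HasSynced() bool
	LastSyncResourceVersion() string
}

// idleController is a hypothetical implementation: it does no real work
// and exists only to show the shape of the contract.
type idleController struct {
	mu     sync.Mutex
	synced bool
}

// Run marks the controller as synced and then blocks until stopCh is closed.
func (c *idleController) Run(stopCh <-chan struct{}) {
	c.mu.Lock()
	c.synced = true // pretend the initial synchronization has finished
	c.mu.Unlock()
	<-stopCh // closing the channel is the "interruption"
}

func (c *idleController) HasSynced() bool {
	c.mu.Lock()
	defer c.mu.Unlock()
	return c.synced
}

// LastSyncResourceVersion returns an empty string because this sketch
// never talks to Kubernetes.
func (c *idleController) LastSyncResourceVersion() string { return "" }

// runAndStop starts c in a goroutine, closes the stop channel, waits for
// Run to return, and reports what HasSynced says afterwards.
func runAndStop(c Controller) bool {
	stopCh := make(chan struct{})
	done := make(chan struct{})
	go func() {
		defer close(done)
		c.Run(stopCh)
	}()
	close(stopCh)
	<-done
	return c.HasSynced()
}

func main() {
	fmt.Println("synced after stop:", runAndStop(&idleController{}))
}
```

For Java readers: closing stopCh plays roughly the role that Thread.interrupt() would, except that Run has to cooperate by actually reading from the channel.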

An interesting point here is that this means that whereas Reflector was a generic interface completely decoupled from Kubernetes, this interface is conceptually coupled to Kubernetes (note the inclusion of the term Resource and the concept of a resource version, both of which are concepts from the Kubernetes ontology). This observation can help us later on with refining our Java model.

Next, look at the controller struct, which is the state-bearing portion of a Controller implementation. It takes in a Config representing its configuration, and has a “slot” for a Reflector, along with unimportant-to-the-end-user bits related to testing and thread safety.

So what does a Controller do, exactly, that a Reflector doesn’t already do?

A clue to the answer lies in the Config type, which is used as an implementation detail by only one particular implementation of the Controller type:

// Config contains all the settings for a Controller.
type Config struct {
	// The queue for your objects; either a FIFO or
	// a DeltaFIFO. Your Process() function should accept
	// the output of this Queue's Pop() method.
	Queue

	// Something that can list and watch your objects.
	ListerWatcher

	// Something that can process your objects.
	Process ProcessFunc

	// The type of your objects.
	ObjectType runtime.Object

	// Reprocess everything at least this often.
	// Note that if it takes longer for you to clear the queue than this
	// period, you will end up processing items in the order determined
	// by FIFO.Replace(). Currently, this is random. If this is a
	// problem, we can change that replacement policy to append new
	// things to the end of the queue instead of replacing the entire
	// queue.
	FullResyncPeriod time.Duration

	// ShouldResync, if specified, is invoked when the controller's reflector determines the next
	// periodic sync should occur. If this returns true, it means the reflector should proceed with
	// the resync.
	ShouldResync ShouldResyncFunc

	// If true, when Process() returns an error, re-enqueue the object.
	// TODO: add interface to let you inject a delay/backoff or drop
	//       the object completely if desired. Pass the object in
	//       question to this interface as a parameter.
	RetryOnError bool
}

// ShouldResyncFunc is a type of function that indicates if a reflector should perform a
// resync or not. It can be used by a shared informer to support multiple event handlers with custom
// resync periods.
type ShouldResyncFunc func() bool

// ProcessFunc processes a single object.
type ProcessFunc func(obj interface{}) error

(As you read the excerpt above, bear in mind that it is in the controller.go file, but nevertheless contains lots of documentation forward references to types and concepts from other files we haven’t encountered yet.)

So not all Controller implementations have to use a Config. In fact, Controller is completely undocumented! But realistically, the only Controller implementation that matters, the one returned by the tersely-named New() function and backed by a controller struct, does use it, so we’d better understand it thoroughly.

The first thing to notice is that a Config struct contains a Queue. We can track down the definition for Queue in fifo.go:

// Queue is exactly like a Store, but has a Pop() method too.
type Queue interface {
	Store

	// Pop blocks until it has something to process.
	// It returns the object that was process and the result of processing.
	// The PopProcessFunc may return an ErrRequeue{...} to indicate the item
	// should be requeued before releasing the lock on the queue.
	Pop(PopProcessFunc) (interface{}, error)

	// AddIfNotPresent adds a value previously
	// returned by Pop back into the queue as long
	// as nothing else (presumably more recent)
	// has since been added.
	AddIfNotPresent(interface{}) error

	// Return true if the first batch of items has been popped
	HasSynced() bool

	// Close queue
	Close()
}

So loosely speaking any Queue implementation must also satisfy the Store contract. In Java terms, this means a hypothetical Queue interface would extend Store. Let’s file that away for later.
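In Go, the “extends” relationship is expressed by embedding one interface in another. The following sketch shows the mechanics with deliberately tiny stand-ins: miniStore, miniQueue, sliceQueue and countItems are hypothetical names, the real Store has more methods, and the real Pop blocks and takes a PopProcessFunc, which this version does not.

```go
package main

import "fmt"

// miniStore is a reduced, hypothetical stand-in for Store.
type miniStore interface {
	Add(obj interface{}) error
	List() []interface{}
}

// miniQueue embeds miniStore, the same way Queue embeds Store, so every
// miniQueue is also usable wherever a miniStore is expected.
type miniQueue interface {
	miniStore
	// Pop here is non-blocking, unlike the real Queue.Pop.
	Pop() (obj interface{}, ok bool)
}

// sliceQueue is a naive, non-thread-safe implementation for illustration.
type sliceQueue struct {
	items []interface{}
}

func (q *sliceQueue) Add(obj interface{}) error {
	q.items = append(q.items, obj)
	return nil
}

// List returns a copy so callers cannot mutate the queue's backing slice.
func (q *sliceQueue) List() []interface{} {
	return append([]interface{}(nil), q.items...)
}

func (q *sliceQueue) Pop() (interface{}, bool) {
	if len(q.items) == 0 {
		return nil, false
	}
	obj := q.items[0]
	q.items = q.items[1:]
	return obj, true
}

// countItems only knows about miniStore, yet accepts a miniQueue.
func countItems(s miniStore) int { return len(s.List()) }

func main() {
	var q miniQueue = &sliceQueue{}
	_ = q.Add("pod-a")
	_ = q.Add("pod-b")
	fmt.Println("items:", countItems(q))
	first, _ := q.Pop()
	fmt.Println("popped:", first, "remaining:", countItems(q))
}
```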

We can also see some concept leakage here: recall that Controller implementations must have a HasSynced function, but its purpose and reason for being are undocumented. When we look back at the controller–struct-backed implementation of Controller (one possible implementation of the Controller type), we can see that its HasSynced function merely delegates to that of the Queue contained by its Config. So there is a tacit assumption that a Controller implementation will most likely be backed by a Queue, since that would be the easiest way to implement the HasSynced function, though this is not strictly speaking required. This also serves as the only documentation we’re going to get about what that function is supposed to do: “Return true if the first batch of items has been popped.”
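One plausible reading of “the first batch of items has been popped”, together with the delegation described above, can be sketched as follows. This is an assumption-laden illustration, not the package’s code: batchQueue, queueBackedController and their fields are hypothetical names, and the queue is not thread-safe.

```go
package main

import "fmt"

// batchQueue is a hypothetical queue that remembers how many items
// arrived in its first batch and how many of those are still unpopped.
type batchQueue struct {
	items        []string
	populated    bool // true once the first batch has been delivered
	initialCount int  // items from the first batch not yet popped
}

// Replace swaps in a new set of items; only the first call counts as
// "the first batch".
func (q *batchQueue) Replace(items []string) {
	q.items = append([]string(nil), items...)
	if !q.populated {
		q.populated = true
		q.initialCount = len(items)
	}
}

// Pop removes the oldest item, or reports false if the queue is empty.
func (q *batchQueue) Pop() (string, bool) {
	if len(q.items) == 0 {
		return "", false
	}
	item := q.items[0]
	q.items = q.items[1:]
	if q.initialCount > 0 {
		q.initialCount--
	}
	return item, true
}

// HasSynced is true once a first batch has arrived and been fully popped.
func (q *batchQueue) HasSynced() bool {
	return q.populated && q.initialCount == 0
}

// queueBackedController has no notion of "synced" of its own; it simply
// asks its queue.
type queueBackedController struct {
	queue *batchQueue
}

func (c *queueBackedController) HasSynced() bool { return c.queue.HasSynced() }

func main() {
	q := &batchQueue{}
	c := &queueBackedController{queue: q}
	fmt.Println("before first batch:", c.HasSynced())
	q.Replace([]string{"pod-a", "pod-b"})
	q.Pop()
	fmt.Println("after one pop:", c.HasSynced())
	q.Pop()
	fmt.Println("after both pops:", c.HasSynced())
}
```

Seen this way, the controller’s HasSynced answers a question about the queue, which is why the concept feels leaked into the Controller interface.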

Back to the Config. It also contains a ListerWatcher. Hey! We’ve seen one of these before. We had realized that it is a core component of a Reflector. So why doesn’t a Config merely have a Reflector? Why is encapsulation broken here? Isn’t ListerWatcher basically an implementation detail of a Reflector? Yes, and there doesn’t seem to be a good reason. We can tell from some source code later on that when the Run function of the controller–struct-backed implementation of Controller is called (again, only one possible implementation of such a function), it creates a Reflector just-in-time using the ListerWatcher housed by the Config. Why the Reflector isn’t passed in as part of the Config is an open question. At any rate, logically speaking, part of a controller–struct-backed implementation of the Controller interface is a Reflector.
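The just-in-time construction can be sketched with simplified types. Everything here is hypothetical and much smaller than the real thing: lister stands in for ListerWatcher (listing only, no watching), a plain map stands in for the Store, and Run performs a single synchronous pass instead of running until stopped.

```go
package main

import "fmt"

// lister is a hypothetical, list-only stand-in for ListerWatcher.
type lister interface {
	List() []string
}

// staticLister returns a fixed set of names.
type staticLister []string

func (s staticLister) List() []string { return s }

// miniReflector copies whatever its source lists into a store.
type miniReflector struct {
	source lister
	store  map[string]bool
}

func (r *miniReflector) syncOnce() {
	for _, name := range r.source.List() {
		r.store[name] = true
	}
}

// miniConfig carries the lister, not a reflector, mirroring the
// arrangement discussed above.
type miniConfig struct {
	Source lister
	Store  map[string]bool
}

type miniController struct {
	config    miniConfig
	reflector *miniReflector // nil until Run is called
}

func newMiniController(cfg miniConfig) *miniController {
	return &miniController{config: cfg}
}

// Run builds the reflector from the configured lister only at this
// point, then lets it populate the store once.
func (c *miniController) Run() {
	c.reflector = &miniReflector{source: c.config.Source, store: c.config.Store}
	c.reflector.syncOnce()
}

func main() {
	store := map[string]bool{}
	c := newMiniController(miniConfig{Source: staticLister{"pod-a", "pod-b"}, Store: store})
	fmt.Println("reflector exists before Run:", c.reflector != nil)
	c.Run()
	fmt.Println("reflector exists after Run:", c.reflector != nil, "cached:", len(store))
}
```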

Next up is the first actually interesting part that we really haven’t encountered before, which gives us a better idea of what a Controller is supposed to do: the ProcessFunc. A ProcessFunc appears, from its documentation, to do something to a Kubernetes resource:

ProcessFunc processes a single object.

So even from this little bit of documentation we can see that ultimately a Controller implementation that happens to use a Config (remember, it’s not required to) will not only cause caching of Kubernetes resources into a Store (again, not required), but will presumably work on objects found in that Store as well. These are shaky assumptions, not enforced by any contract, but they turn out to be quite important, not just to the de facto standard implementation of the Controller type (the controller–struct-backed one that uses a Config), but to the implied, but not technically specified, contract of the Controller type itself.

Summed up, most Controller implementations probably should use a Reflector to populate a Queue (a particular kind of Store) with Kubernetes resources of a particular kind, and then process the contents of that Queue, presumably by Popping objects off of it.

Again, it is worth noting that this summary is based on one particular implementation of the type (the de facto standard controller–struct-backed one), and not on the type’s contract itself, but elsewhere in the package you will be able to see that this is expected, if not enforced, behavior of any Controller implementation.
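The “pop, process, maybe re-enqueue” behavior summarized above can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the package’s loop: sketchConfig, drain and flakyProcess are hypothetical names, the queue is a plain slice rather than a FIFO or DeltaFIFO, and the loop stops when the queue is empty instead of blocking for more work.

```go
package main

import (
	"errors"
	"fmt"
)

// processFunc has the same shape as the ProcessFunc quoted earlier.
type processFunc func(obj interface{}) error

// sketchConfig keeps only the fields this example needs.
type sketchConfig struct {
	Queue        []interface{}
	Process      processFunc
	RetryOnError bool
}

// drain pops items until the queue is empty and returns how many were
// processed successfully. A failed item goes back to the end of the
// queue when RetryOnError is set, so a Process function that always
// fails would make this loop forever.
func drain(cfg *sketchConfig) int {
	processed := 0
	for len(cfg.Queue) > 0 {
		obj := cfg.Queue[0]
		cfg.Queue = cfg.Queue[1:]
		if err := cfg.Process(obj); err != nil {
			if cfg.RetryOnError {
				cfg.Queue = append(cfg.Queue, obj)
			}
			continue
		}
		processed++
	}
	return processed
}

// flakyProcess returns a processFunc that fails exactly once for the
// named item and records every item it handles successfully in seen.
func flakyProcess(flaky string, seen *[]string) processFunc {
	failed := false
	return func(obj interface{}) error {
		name := fmt.Sprint(obj)
		if name == flaky && !failed {
			failed = true
			return errors.New("transient failure processing " + name)
		}
		*seen = append(*seen, name)
		return nil
	}
}

func main() {
	var seen []string
	cfg := &sketchConfig{
		Queue:        []interface{}{"pod-a", "pod-b", "pod-c"},
		Process:      flakyProcess("pod-b", &seen),
		RetryOnError: true,
	}
	fmt.Println("processed:", drain(cfg), "order:", seen)
}
```

With RetryOnError set, the item that failed is handled last, after the others; with it unset, that item is simply dropped after its first failure.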

We might visually model all this like so:

Here, a Controller uses a Reflector to populate a Queue, and has a protected process method that knows how to do something with a given object. It also has a protected shouldResync() method, a public hasSynced() method, and the ability to report what the last Kubernetes resource version was after a synchronization. We’ll greatly refine and refactor this model over time.

In the next post, we’ll look at informers and shared informers which build on top of these foundations, again with an eye towards modeling all this idiomatically in Java.

Author: Laird Nelson

Devoted husband and father; working on Helidon at the intersection of Java, Jakarta EE, architecture, Kubernetes and microservices at Oracle; open source guy; Hammond B3 player and Bainbridge Islander.
