Helm from Java, Part 5

I’ve amended my prior class diagram to reflect the fact that ChartVersion “is a” Metadata:

[Figure: amended ChartRepository class diagram]

Metadata is a generated structure—part of the Tiller API—that represents an actual Helm Chart.yaml file.

Why is this important?  Because now you can see the concepts: an IndexFile is basically a collection of all the Chart.yamls conceptually held (usually in gzip archives) by a ChartRepository.  And a ChartRepository is really not much more than an IndexFile.  This whole structure can be greatly simplified.

The way I see it, the fact that this notional collection of Chart.yamls is a file is incidental, and hence IndexFile is a very poor name indeed.  Structurally, it is just a collection of things.  We can also tell from the Get() function signature that an item in that collection is uniquely identified (or at least should be!) by a name and a version.  This means that this notional collection of Chart.yamls is a set (and not, say, a list, or some other collection type that permits duplicates).

Next, we can tell from the Add() function that really what you’re doing with this function is creating a new ChartVersion structure and adding it.  That furthers our suspicion that really what we’re dealing with here is some sort of collection type of ChartVersions, and nothing else.

If represented in Java, it really doesn’t need to be any more complicated than a Map.  In this case, it is a Map of ChartVersions (and hence Metadata instances) indexed by a key consisting of name and version.  So a ChartRepository too is really not much more than this Map located somewhere else, together with the information needed to talk to it.

Next, ChartVersion is also a poor name.  Something like ChartDescriptor is probably better: the fact that a Version attribute happens to be one of the things it logically contains doesn’t—it seems to me—elevate it to primacy here.  What this thing is is really a full description of the state of a Helm chart (the Metadata aspect of its nature) and some information about where you can find the real thing (the URLs attribute of its ChartVersion nature).

So instead of an IndexFile having many ChartVersions that it calls its Entries (?!), in Java I propose a Map (probably a SortedMap) of ChartDescriptors indexed by ChartDescriptorKey instances.  Obviously you can create one of these from an index.yaml file from a chart repository—but just as importantly you could create it from something else.
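To make the proposal concrete, here is a minimal sketch of such a map in plain Java.  ChartDescriptor and ChartDescriptorKey are the names proposed above; everything else (the description and urls fields, the example URL, and ordering versions as plain strings) is an assumption made for illustration only.

```java
import java.util.Collections;
import java.util.List;
import java.util.Objects;
import java.util.SortedMap;
import java.util.TreeMap;

// Sketch only: the fields and the ordering rule are illustrative assumptions.
final class ChartIndexSketch {

  /** Identifies a chart by name and version, the same pair Get() takes. */
  static final class ChartDescriptorKey implements Comparable<ChartDescriptorKey> {
    final String name;
    final String version;

    ChartDescriptorKey(final String name, final String version) {
      this.name = Objects.requireNonNull(name);
      this.version = Objects.requireNonNull(version);
    }

    // Name first, then version as a plain string (not Semantic Versioning order).
    @Override
    public int compareTo(final ChartDescriptorKey other) {
      final int byName = this.name.compareTo(other.name);
      return byName != 0 ? byName : this.version.compareTo(other.version);
    }

    @Override
    public boolean equals(final Object other) {
      if (!(other instanceof ChartDescriptorKey)) {
        return false;
      }
      final ChartDescriptorKey her = (ChartDescriptorKey) other;
      return this.name.equals(her.name) && this.version.equals(her.version);
    }

    @Override
    public int hashCode() {
      return Objects.hash(this.name, this.version);
    }
  }

  /** Stand-in for the Chart.yaml contents plus the URLs of the real archive. */
  static final class ChartDescriptor {
    final String description;
    final List<String> urls;

    ChartDescriptor(final String description, final List<String> urls) {
      this.description = description;
      this.urls = Collections.unmodifiableList(urls);
    }
  }

  public static void main(final String[] args) {
    final SortedMap<ChartDescriptorKey, ChartDescriptor> index = new TreeMap<>();
    final List<String> urls =
        Collections.singletonList("https://example.com/charts/redis-1.0.0.tgz");
    index.put(new ChartDescriptorKey("redis", "1.0.0"), new ChartDescriptor("first", urls));
    // Same name and version: the entry is replaced, so no duplicates accumulate.
    index.put(new ChartDescriptorKey("redis", "1.0.0"), new ChartDescriptor("second", urls));
    index.put(new ChartDescriptorKey("mysql", "0.3.1"), new ChartDescriptor("db", urls));
    System.out.println(index.size());          // 2
    System.out.println(index.firstKey().name); // mysql
  }
}
```

Because the key carries both name and version, the map gives the set semantics described above for free, and nothing in it cares whether the descriptors came from an index.yaml file or from somewhere else.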

Helm from Java, Part 4

I’ve created a UML class diagram to help me decipher the main structures in the Go code of Helm around chart repositories and their related concepts:

[Figure: ChartRepository.png, UML class diagram]

For the most part I stayed true to UML notation.  Similarly-colored boxes are from the same source files.

A few interesting things stand out here.

First, from a conceptual standpoint, a ChartRepository is basically a glorified wrapper around an IndexFile, which is, in turn, a glorified wrapper around a set of ChartVersions.  A ChartVersion, in turn, is also a Metadata, which is a gRPC/Protocol Buffers object you’ve seen before (I didn’t indicate that relationship here on this diagram, but probably should have).  ChartVersions are stored (well, should be stored) within their containing IndexFile sorted by their version.  Go maybe doesn’t have the same concept as Java’s sorted collections and Comparators, so there’s some additional sorting logic that really you don’t need in the IndexFile concept.
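As a rough illustration of what I mean by letting Java's Comparators carry the sorting logic, here is a small sketch.  It assumes purely numeric dotted versions (no pre-release or build suffixes, so it is not a full Semantic Versioning comparison), and the newest-first order is my assumption about what a consumer would want, not something taken from the Helm code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

final class VersionOrderSketch {

  // Compares dotted numeric versions such as "1.10.0" component by component.
  // Assumption: every component is a plain integer; missing components count as 0.
  static final Comparator<String> NUMERIC_DOTTED = (a, b) -> {
    final String[] x = a.split("\\.");
    final String[] y = b.split("\\.");
    final int n = Math.max(x.length, y.length);
    for (int i = 0; i < n; i++) {
      final int xi = i < x.length ? Integer.parseInt(x[i]) : 0;
      final int yi = i < y.length ? Integer.parseInt(y[i]) : 0;
      if (xi != yi) {
        return Integer.compare(xi, yi);
      }
    }
    return 0;
  };

  /** Returns a sorted copy, newest version first; the input is left untouched. */
  static List<String> newestFirst(final List<String> versions) {
    final List<String> sorted = new ArrayList<>(versions);
    sorted.sort(NUMERIC_DOTTED.reversed());
    return sorted;
  }

  public static void main(final String[] args) {
    // Prints [1.10.0, 1.9.3, 1.2.0]; a plain string sort would put 1.10.0 before 1.2.0.
    System.out.println(newestFirst(Arrays.asList("1.2.0", "1.10.0", "1.9.3")));
  }
}
```

The same Comparator could be handed to a TreeSet or TreeMap, at which point the collection stays sorted on its own and no separate sorting step is needed.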

ChartRepository instances are notionally “stored” in a RepoFile, but really what the RepoFile stores is a “pointer” to a ChartRepository—a primary key, of sorts—called, confusingly, an Entry.  More confusingly, a ChartRepository refers to its identifying Entry as its Config!  But if you think of an Entry as a primary key of a ChartRepository you should do OK.  A RepoFile is basically a pile of Entry instances, so, again, notionally, a collection of ChartRepository pointers.  (There is some question about why this data structure permits duplicate Entry instances (i.e. with the same Name but different contents); see Helm issue #2606 for details.)
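One way the "pile of Entry instances" could be modeled in Java without permitting duplicates is to key the entries by name.  This is only a sketch: the Entry class here is a stand-in with an assumed url field, and refusing (rather than replacing) a second entry with the same name is my choice, not Helm's behavior.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class RepoFileSketch {

  /** Stand-in for Helm's Entry: the "primary key" of a ChartRepository. */
  static final class Entry {
    final String name;
    final String url; // assumed field, for illustration

    Entry(final String name, final String url) {
      this.name = name;
      this.url = url;
    }
  }

  // Keyed by name, so two entries with the same name cannot coexist.
  private final Map<String, Entry> entriesByName = new LinkedHashMap<>();

  /** Adds the entry unless one with the same name exists; returns true if added. */
  boolean add(final Entry entry) {
    return this.entriesByName.putIfAbsent(entry.name, entry) == null;
  }

  Entry get(final String name) {
    return this.entriesByName.get(name);
  }

  int size() {
    return this.entriesByName.size();
  }

  public static void main(final String[] args) {
    final RepoFileSketch repoFile = new RepoFileSketch();
    System.out.println(repoFile.add(new Entry("stable", "https://example.com/stable")));
    // Same name, different contents: refused instead of silently duplicated.
    System.out.println(repoFile.add(new Entry("stable", "https://example.com/other")));
    System.out.println(repoFile.size());
  }
}
```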

I’m in the process of translating this to Java, and I think the resulting object model is going to look a lot cleaner, while delivering the same functionality, and permitting alternate implementations of chart storage and discovery.  Stay tuned.

Helm from Java, Part 3

If you use microbean-helm, you are exposed to the gRPC-colored view of the Tiller server.  (If you don’t know what I’m talking about, see part 1 and part 2.)

That view of things has a lot of gRPC cruft in it that for the most part you won’t be concerned with.  To help show you what the conceptual structure of the Tiller object model really is, I put together a UML class view of the main object model exposed by Tiller, eliminating the gRPC methods you’re not likely to use:

[Figure: UML class diagram of the Tiller object model]

So fundamentally in the Helm/Tiller ecosystem, you’re working with charts that, when installed, result in releases.  In the gRPC-generated object model, a Chart has a Metadata, several files, many values, and many Templates.

Each of these objects represents one of the items in the Helm chart file structure.  But, obviously, because with microbean-helm these are Java objects you can create them from any source you like.

Of note here is the use of Any for non-template files (like NOTES.txt).  While there are apparently lots of different ways to use this general purpose class, the Helm ecosystem appears to encode a chart file’s simple relative filename as the return value of its getTypeUrl() method, and its textual content as the return value of its getData() method. It’s not entirely clear whether Helm and Tiller got this right, but that is currently the behavior, so there you go.

It strikes me that, fuzzily, there are interesting opportunities here that involve microbean-helm, fabric8’s DefaultKubernetesClient, fabric8’s DockerClient, and so on to create Tiller-compatible charts but using plain Java.

Helm from Java, Part 2

In the previous post, I outlined how microbean-helm produces the Java bindings for Helm, the Kubernetes package manager.  Specifically, it creates and packages up the gRPC Java code that describes the API surfaced by the Tiller server-side component of Helm (the thing that does the heavy lifting in the Helm ecosystem).

In the latest revision of the project, I’ve done some road grading to make it easier to work with the (cumbersome) gRPC API.  Here is how you connect to Tiller from Java and ask it for a particular release.  That is, we’re basically doing the Java equivalent of helm history someRelease:

Under the covers, this sets up a port forwarding connection to the first Pod in the cluster that is running a healthy instance of Tiller (hopefully there is exactly one) and establishes all the gRPC plumbing for you (which is not very straightforward).  As long as you ensure that both the DefaultKubernetesClient and the Tiller object are closed, as is done in this example, you won’t leak resources.

Happy Helming from Java!

Helm from Java!

If you work with Kubernetes, you have surely encountered Helm, the more-official-than-the-alternatives package manager for the Kubernetes platform.

Helm is a command-line tool, written in Go, that interacts with a Kubernetes cluster to make managing all the various Kubernetes resources a little easier and less of a frantic exercise in watching things break and restart automatically.  It is indispensable.  You can read more about it at its GitHub repository.

Helm’s machinery consists of two parts: the command line tool, helm, and the server-side componentry that stays mostly hidden behind the scenes (Tiller).  (The nautical imagery gets very old very fast but there’s no escaping it.)

When you install helm, the first thing you do (typically) is to run helm init.  This sets up some housekeeping directories and such locally, and also, very conveniently, programmatically constructs Kubernetes Deployment, Service and (optionally) Secret resources that together cause Tiller to be deployed into Kubernetes so that the helm command line tool can talk to it.  It’s a very simple idempotent bootstrap operation and is key to helm’s simplicity.

Once you have Tiller running in your Kubernetes cluster, the helm command line tool talks to it to do the heavy lifting.  Tiller, in other words, is the real workhorse, and helm is a glorified curl for it.  (Representing it this way is a great disservice to the Helm team, of course, and I’m not actually serious, but it should help you to put the pieces together mentally.)

Now suppose you wanted for whatever crazy reasons to do the following from a Java library:

  • Install Tiller if it isn’t already there
  • Talk to it using a Java API

If you use my (early days!) microbean-helm project, you can now do this—no command line tooling required.

The Tiller API itself is defined using protocol buffers, which means you can generate it using gRPC.  That is handled by microbean-helm’s pom.xml file, which arranges for the protocol buffers files to be checked out of the official Helm GitHub repository and compiled appropriately.  (Then, because they’re now part of a regular old Maven Java project, you can make Javadocs out of them too.)

So the generated Tiller API lets you talk to Tiller, which is the real heart of the whole Helm system.  With that, you can write any number of helm-like tools in Java.

Of course, you need a Tiller server to talk to.  The installation part is something that I hand-tooled by following the logic in the Helm installer code (invoked indirectly via helm init) and making it idiomatic for a Java library.  Just as with helm init, you can install Tiller if it isn’t there, or upgrade it if it is.

The net effect is that armed with this Java library you can now install or upgrade Tiller, install Helm charts, and otherwise work with Helm artifacts without having to drop down to the command line. Happy Helming—from Java!

Components and Environments

I am playing around with the concepts of components and environments in my microbean experiments.

The fundamental idea behind microbean is that with CDI 2.0 every last piece of your application can be its own self-contained artifact (including the main class with its main() method).  Each such artifact is a CDI bean archive.  When you put them all together on the classpath and run java org.microbean.main.Main, then your application works.  This might include running servers, deploying things, and so on, but it’s all loosely coupled and tiny. It is the act of bringing these archives together—drawing a lasso around them with a CLASSPATH—that constitutes deployment.

As we’ve seen, CDI has the notion of a bean archive, but it doesn’t really have a notion of a group of bean archives.  I am going to use the wonderfully overloaded term component to describe this concept.

These components might need some other facilities provided by various portable extensions.  There is a hazy way in which a group of portable extensions providing services is different from a group of bean archives providing business logic.  I’ll call this aggregation an environment.

With these provisional terms in place, we can thus say that components run in environments.  We can also say that environments support components.

Then the act of deploying a microbean application becomes: pick one or more environments and one or more components and put them on the classpath, and then run java org.microbean.main.Main (or any other main class that simply starts up a CDI SE container and shuts it down; this just happens to be a convenient prefabricated one).

How could we represent these components?  On disk, we don’t really want to represent them at all.  A component as used here is a notional construct, after all: it’s a bunch of jars that belong together in some way.  For management purposes, however, representing them in Maven pom.xml files of a particular kind looks like a good way to go.

This also seems to be a good way to represent environments as well.

Maven’s pom.xml file and the dependency graph it implies is one of the most misunderstood artifacts in all of Java software engineering.  Most developers know that it describes a project and its dependencies, but did you know you can include a non-jar project pom.xml file (an artifact of type pom) as a dependency itself, thus effectively pulling in (transitively) its dependencies?  This can lead to version surprises, but did you know that a pom.xml’s <dependencyManagement> section can be used to lock down the version of a given artifact, whether it appears directly or transitively as a dependency?  Finally, to complete the puzzle, did you know that a pom.xml can, itself, appear in its own <dependencyManagement> section, thus offering a way to lock its own version down as well as those of its dependencies?

These features mean that we can define both components and environments as artifacts of type pom.  I’ll be exploring these concepts over the next few blog posts.

Kubernetes Events Can Be Complicated

I stumbled across this the other day and wanted to write it down for posterity.

In Kubernetes, you can—and at this point I’m speaking loosely, but will tighten things up below—subscribe to a stream of WatchEvents that describe interesting things happening to various Kubernetes resources in your Kubernetes cluster.

What is somewhat mind-altering is that one of the kinds of resources whose WatchEvent stream you can subscribe to is the Kubernetes Event kind.  These two things are different.

Whoa.

Now it’s time to get very, very specific with concepts and typography.  If I am speaking about a Kubernetes resource, you’ll see it capitalized in fixed-width type, like so:

Pod

…and I will do my best to prefix it with the word “Kubernetes”:

Kubernetes Pod

If I am just talking semantics, you’ll see the term in a normal typeface without the word “Kubernetes” in front of it.

If you’re programming in Java, and are using the fabric8 Kubernetes client (the only Java Kubernetes client that I’m aware of), you can receive all events from your Kubernetes cluster by following this recipe:

final Watch closeMe = client.events().inAnyNamespace().watch(new Watcher<Event>() {
  @Override
  public final void eventReceived(final Action action, final Event resource) {
    // Called asynchronously; action says what happened to the given resource.
  }

  @Override
  public final void onClose(final KubernetesClientException kubernetesClientException) {
    // Called when the watch is closed.
  }
});

The eventReceived() method will get called asynchronously as the cluster does interesting things, and you can root around in the contents of the received event to see what happened.  The action will be one of ADDED, MODIFIED, DELETED or ERROR.  Simple, right?

So I was messing about with the bleeding-edge service-catalog project, and installing it in minikube and uninstalling it and generally thrashing around breaking things.  I was somewhat surprised to receive an io.fabric8.kubernetes.api.model.Event in this stream together with an action equal to DELETED (!) that looked like this (I formatted the output for some degree of legibility below):

Event(
  apiVersion=v1,
  count=1,
  firstTimestamp=2017-04-25T23:41:54Z, 
  involvedObject=ObjectReference(
    apiVersion=v1, 
    fieldPath=spec.containers{controller-manager}, 
    kind=Pod,
    name=catalog-catalog-controller-manager-1242994143-ddl0l,
    namespace=catalog,
    resourceVersion=462865,
    uid=11fc24bf-2a05-11e7-a27a-080027117396,
    additionalProperties={}
  ), 
  kind=Event,
  lastTimestamp=2017-04-25T23:41:54Z,
  message=Started container with id 7b51c389f153832e7719a99738706c2ff38aa28b298b80741f439b712f166262, 
  metadata=ObjectMeta(
    annotations=null,
    clusterName=null,
    creationTimestamp=2017-04-25T23:41:54Z,
    deletionGracePeriodSeconds=null,
    deletionTimestamp=null,
    finalizers=[],
    generateName=null,
    generation=null,
    labels=null,
    name=catalog-catalog-controller-manager-1242994143-ddl0l.14b8c87cc177fb77, 
    namespace=catalog,
    ownerReferences=[],
    resourceVersion=472706,
    selfLink=/api/v1/namespaces/catalog/events/catalog-catalog-controller-manager-1242994143-ddl0l.14b8c87cc177fb77,
    uid=c3851fae-2a10-11e7-a27a-080027117396,
    additionalProperties={}
  ),
  reason=Started,
  source=EventSource(
    component=kubelet,
    host=minikube,
    additionalProperties={}
  ),
  type=Normal,
  additionalProperties={}
)

So to the naïve eye, this is some sort of event that represents the deletion of something else.  But maybe it also represents the starting of a container (see the message and reason properties above)?  And there is a kind=Pod property buried in there, but there’s also a kind=Event, and if this is a deletion, how come the deletionTimestamp property is null?  And if this is a deletion, how come the reason property is Started?

To understand this, we need to go to the source.

First, let’s look at the fabric8 watch machinery and see how it’s calling our Watcher implementation.  You’ll note there that the code is taking delivery of a JSON payload, supplied over WebSockets by the Kubernetes cluster, of the Kubernetes WatchEvent “kind”. So fundamentally the things being received by fabric8’s watch machinery are Java representations of Kubernetes WatchEvents.

OK, fine.  What’s a Kubernetes WatchEvent?  It is a (semantic) event that describes an addition, deletion or modification of a Kubernetes resource.  It has a type field saying whether it’s an addition, deletion or modification, and an object field that holds a representation of the Kubernetes resource that was added, deleted or modified.

OK, so what’s a Kubernetes resource?  A Kubernetes resource is one of its “things” (a Kubernetes Pod, a Kubernetes Deployment, a Kubernetes ReplicaSet, etc. etc.).  You can get a pretty good idea (maybe an exhaustive idea) of what sorts of things we’re talking about by looking at the reference documentation.

Easy so far.

But another kind of Kubernetes resource is a Kubernetes Event.

So it must follow that a Kubernetes WatchEvent can describe the addition, deletion or modification of a Kubernetes Event, because a Kubernetes Event is a kind of Kubernetes resource.

I don’t know about you, but that kind of blew my mind a little bit.  (I also don’t want to think about what happens if Kubernetes WatchEvents are also capable of being watched!)

So now that we know this, we know this too:

The io.fabric8.kubernetes.api.model.Event that your Watcher implementation is handed, when your Watcher implementation is constructed with the Java code listed earlier in this blog post, is really the Java representation of the JSON present in a Kubernetes WatchEvent’s object field, and the Action that your Watcher implementation is handed is really the Java representation of the JSON present in a Kubernetes WatchEvent’s type field.
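The layering can be sketched with a few stand-in classes.  None of these are the fabric8 or Kubernetes types; they are deliberately tiny, hypothetical models that only show how the envelope (the WatchEvent, with its type) relates to the thing it carries (here, a Kubernetes Event with its own reason).

```java
final class WatchEnvelopeSketch {

  enum Action { ADDED, MODIFIED, DELETED, ERROR }

  /** Stand-in for a Kubernetes WatchEvent: what happened, and to which resource. */
  static final class WatchEvent<T> {
    final Action type;
    final T object;

    WatchEvent(final Action type, final T object) {
      this.type = type;
      this.object = object;
    }
  }

  /** Stand-in for a Kubernetes Event resource, reduced to a few fields. */
  static final class Event {
    final String name;
    final String reason;
    final String involvedKind;
    final String involvedName;

    Event(final String name, final String reason,
          final String involvedKind, final String involvedName) {
      this.name = name;
      this.reason = reason;
      this.involvedKind = involvedKind;
      this.involvedName = involvedName;
    }
  }

  /** Spells out both layers: the watch-level type and the Event-level reason. */
  static String describe(final WatchEvent<Event> watchEvent) {
    final Event event = watchEvent.object;
    return "WatchEvent type=" + watchEvent.type
        + ": the Event " + event.name
        + " (reason=" + event.reason
        + ", involving " + event.involvedKind + " " + event.involvedName + ")";
  }

  public static void main(final String[] args) {
    final Event started = new Event(
        "catalog-catalog-controller-manager-1242994143-ddl0l.14b8c87cc177fb77",
        "Started", "Pod", "catalog-catalog-controller-manager-1242994143-ddl0l");
    // DELETED applies to the Event record itself; Started is what the record says.
    System.out.println(describe(new WatchEvent<>(Action.DELETED, started)));
  }
}
```

Seen this way, DELETED and Started no longer contradict each other: they belong to different layers.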

So the (semantic) event we received reads (semantically) something like this:

Hello! This is a Kubernetes WatchEvent with a type of Deleted informing you that the Kubernetes resource it is talking about, a Kubernetes Event, describing the starting of a particular Kubernetes Pod’s container, was deleted from the record of such things.

This suggests three interesting things as well (which I haven’t researched, so this may be common knowledge, but it was interesting to me!).

  1. The first thing is that Kubernetes Events are capable of being deleted.
  2. The second thing is that Kubernetes Events are capable of being stored.
  3. The third thing is that therefore Kubernetes Events serve as a persistent record of a Kubernetes resource’s state over time.

To Java programmers (like yours truly) used to thinking of (semantic) events as transient announcements of in-flight state (think Swing), this takes a little mental reorientation.

Once you are successfully mentally reoriented, however, it makes sense that when a Kubernetes resource notionally described by certain (definitionally persistent) Kubernetes Events is deleted, then so too are its describing Kubernetes Events.

And when a Kubernetes resource is created, so too are (definitionally persistent) Kubernetes Events describing its creation.  And so it follows that you can therefore get Kubernetes WatchEvents delivered to you describing not just “normal” resource additions and deletions but also, if you wish, Kubernetes Event additions and deletions.  In fact, these are exactly and the only Kubernetes WatchEvents you will get delivered to you if you type client.events().inAnyNamespace().watch(myWatcher).

This also suggests that the kubectl get events --watch-only output is doing some interesting unpacking and reassembling of things under the covers.

The command basically sets up a watch using the very same REST endpoint that the fabric8 Kubernetes client recipe detailed above ends up talking to, and receives the very same information.  But its output looks like this (depending on how you’re reading this blog post, you’ll probably have to scroll the following to see the (wide) output):

$ kubectl get events --watch-only
LASTSEEN                      FIRSTSEEN                     COUNT NAME    KIND       SUBOBJECT TYPE   REASON            SOURCE                MESSAGE
2017-04-27 10:18:36 -0700 PDT 2017-04-27 10:18:36 -0700 PDT 1     busybox Deployment           Normal ScalingReplicaSet deployment-controller Scaled up replica set busybox-2844454261 to 1

Note how this makes things look (properly!) like a semantic event occurred representing the scaling up of a particular deployment.

But under the covers, things are a little different.

The corresponding fabric8 io.fabric8.kubernetes.api.model.Event received with a corresponding Action of type ADDED looks like this (when output as a Java String via its toString() method):

Event(
  apiVersion=v1,
  count=1,
  firstTimestamp=2017-04-27T17:18:36Z,
  involvedObject=ObjectReference(
    apiVersion=extensions,
    fieldPath=null,
    kind=Deployment,
    name=busybox,
    namespace=default,
    resourceVersion=556693,
    uid=8c979eeb-2b6d-11e7-a27a-080027117396,
    additionalProperties={}),
  kind=Event,
  lastTimestamp=2017-04-27T17:18:36Z,
  message=Scaled up replica set busybox-2844454261 to 1,
  metadata=ObjectMeta(
    annotations=null,
    clusterName=null,
    creationTimestamp=2017-04-27T17:18:36Z,
    deletionGracePeriodSeconds=null,
    deletionTimestamp=null,
    finalizers=[],
    generateName=null,
    generation=null,
    labels=null,
    name=busybox.14b950bb4d3ec7c1,
    namespace=default,
    ownerReferences=[],
    resourceVersion=556695,
    selfLink=/api/v1/namespaces/default/events/busybox.14b950bb4d3ec7c1,
    uid=8c99063e-2b6d-11e7-a27a-080027117396,
    additionalProperties={}),
  reason=ScalingReplicaSet,
  source=EventSource(
    component=deployment-controller,
    host=null,
    additionalProperties={}),
  type=Normal,
  additionalProperties={}
)

Let us remember that what is being printed here is a Java representation of the Kubernetes Event that was the contents of the object field of a Kubernetes WatchEvent whose type field’s value was Added.

You can see from the output above that:

  • The thing being Added is a Kubernetes Event
  • Its name is busybox.14b950bb4d3ec7c1
  • It “involves” an object (resource) whose kind is a Kubernetes Deployment and whose name (identifier) is busybox
  • The Kubernetes Event’s message is “Scaled up replica set busybox-2844454261 to 1”
  • The reason for the Kubernetes Event’s creation is “ScalingReplicaSet”
  • The selfLink referencing the Kubernetes Event being described references a REST endpoint in the /events “space” with the Kubernetes Event’s name (busybox.14b950bb4d3ec7c1) as the identifier within that space.  Note particularly this is not the identity of the (involved) Kubernetes Deployment, i.e. it is not “busybox”.
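Putting the kubectl row and the toString() output side by side suggests which fields feed which columns.  The sketch below is my inference from comparing those two outputs, not a reading of kubectl’s source; the Event class is a hypothetical stand-in, the columns are tab-separated for simplicity, and timestamps are passed through as-is rather than converted to local time.

```java
final class EventRowSketch {

  /** Stand-in holding only the fields that appear to feed the kubectl columns. */
  static final class Event {
    final String lastTimestamp;
    final String firstTimestamp;
    final int count;
    final String involvedName;      // involvedObject.name      -> NAME
    final String involvedKind;      // involvedObject.kind      -> KIND
    final String involvedFieldPath; // involvedObject.fieldPath -> SUBOBJECT, may be null
    final String type;
    final String reason;
    final String sourceComponent;   // source.component         -> SOURCE
    final String message;

    Event(final String lastTimestamp, final String firstTimestamp, final int count,
          final String involvedName, final String involvedKind,
          final String involvedFieldPath, final String type, final String reason,
          final String sourceComponent, final String message) {
      this.lastTimestamp = lastTimestamp;
      this.firstTimestamp = firstTimestamp;
      this.count = count;
      this.involvedName = involvedName;
      this.involvedKind = involvedKind;
      this.involvedFieldPath = involvedFieldPath;
      this.type = type;
      this.reason = reason;
      this.sourceComponent = sourceComponent;
      this.message = message;
    }
  }

  /** Builds one tab-separated row in the column order kubectl printed above. */
  static String toRow(final Event e) {
    return String.join("\t",
        e.lastTimestamp, e.firstTimestamp, String.valueOf(e.count),
        e.involvedName, e.involvedKind,
        e.involvedFieldPath == null ? "" : e.involvedFieldPath,
        e.type, e.reason, e.sourceComponent, e.message);
  }

  public static void main(final String[] args) {
    System.out.println(toRow(new Event(
        "2017-04-27T17:18:36Z", "2017-04-27T17:18:36Z", 1,
        "busybox", "Deployment", null, "Normal", "ScalingReplicaSet",
        "deployment-controller", "Scaled up replica set busybox-2844454261 to 1")));
  }
}
```

Note that the NAME column is the involved object’s name (busybox), not the Kubernetes Event’s own name, which is part of why the kubectl output reads like a semantic event about the Deployment.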

I hope this helps you make sense of your Kubernetes event streams!