Because bugs are often disparate and buried deep in a nuanced stack of software, it might be useful to keep a list of them.
In Go, I had a package-level map that associated UUID strings to Helm charts.
I thought to clear out the map, or "cache", by invoking a refresh function:
```go
func refreshChartCache() {
	cacheSkills := make(map[string]*pb.ChartDescription)
}
```
There was a subtlety here which may be obvious to the regular Go programmer.
The `:=` operator did not clear out the package-level map at all; it declared a new local variable that shadowed it.
The function modified the variable it had just created, and the package-level variable remained the same.
This operator certainly merits some best practices.
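For reference, here is a minimal sketch of the distinction, with a plain string map and hypothetical names standing in for my real types:

```go
package main

import "fmt"

// chartCache stands in for my package-level map of UUID strings to chart descriptions.
var chartCache = map[string]string{"old-uuid": "old-chart"}

// refreshChartCache replaces the package-level map with a fresh one.
// "=" assigns to the existing package-level variable.
func refreshChartCache() {
	chartCache = make(map[string]string)
}

// brokenRefresh is the shadowing version: ":=" declares a new local variable
// named chartCache, and the package-level map is left untouched.
func brokenRefresh() {
	chartCache := make(map[string]string)
	chartCache["only-local"] = "never seen outside this function"
}

func main() {
	brokenRefresh()
	fmt.Println(len(chartCache)) // 1: the old entry is still there

	refreshChartCache()
	fmt.Println(len(chartCache)) // 0: the cache is actually cleared
}
```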
For work, I supported the configuration of my application (a simple binary written in Go) with a configuration file.
I later containerized this application, and then helmized it.
Helm provides its own style of configuration file in `values.yaml`.
Being clever, and having fortunately specified my own configuration file in YAML format, I changed my parsing to match the structure of `values.yaml`.
This greatly simplified plumbing configuration data from Helm to my container.
So my Helm chart would use its values file normally, but also expand it into a ConfigMap for the container.
Then, at install time, if the values file were overridden with `helm install --values ...`, the underlying container would be reconfigured as well.
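As a rough illustration of the idea, here is a sketch of the Go side, assuming the config file is mounted from the ConfigMap at a well-known path; the field names and path are hypothetical, not my real chart's:

```go
package main

import (
	"fmt"
	"os"

	"gopkg.in/yaml.v3"
)

// Config mirrors the structure of the chart's values.yaml, so the same file
// can drive both the Helm templates and the binary inside the container.
type Config struct {
	Server struct {
		Port     int    `yaml:"port"`
		LogLevel string `yaml:"logLevel"`
	} `yaml:"server"`
}

func main() {
	// Under Helm, values.yaml is expanded into a ConfigMap and mounted here;
	// outside Helm, the same file must be provided by hand.
	data, err := os.ReadFile("/etc/myapp/values.yaml")
	if err != nil {
		panic(err)
	}

	var cfg Config
	if err := yaml.Unmarshal(data, &cfg); err != nil {
		panic(err)
	}
	fmt.Printf("listening on :%d at log level %s\n", cfg.Server.Port, cfg.Server.LogLevel)
}
```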
Unfortunately, things get tricky with this wrapping. Everything works fine and dandy when running from Helm. When running from a standard container, however, there is no ConfigMap built into a Deployment, and no file for overrides to be merged into. In other words, if one runs the container outside of Helm, the values file must specify more fields: fields the container could otherwise have defaulted from values provided by the Helm chart.
This came to light after our Helm repository started having issues at work. I was running the container with raw `docker` commands to get around the repository downtime, and it cost a great deal of time to find out why some values were not being provided to my container.
While working on sky-castle, I tried to spin up a Matrix (Synapse) server.
When I ran `docker compose` for it, my Forgejo server stopped working.
The web page would only show the word "verify".
It turns out that both `compose.yaml` files had a service with the same name: `db`.
Despite being on different networks, this collided within `docker`'s internals.
When building with Earthly, I had a build target with a `RUN --push` instruction.
Immediately after that instruction, a new context was introduced (a new `FROM`).
This triggered an upstream bug in Earthly, so the push never ran.
My coworker John suspected this was related to phases in Earthly.
He wrote a bug report with an MRE (minimal reproducible example).
Working in defense, we often wonder which software would be acceptable to field. This is a tricky question given software's tendency toward spiraling dependencies.
I have looked for an official, universal, comprehensive list. Apparently there is no such GRAS (generally recognized as safe) list, though at least one office provides something along those lines. Salesforce, for instance, meets many standards of various levels from various governments.
It would be interesting to turn this into a research project for a blog post, perhaps formatted as IEEE-style Markdown. See: https://owl.purdue.edu/owl/research_and_citation/ieee_style/ieee_overview.html
I do not often work in C, but on a new project I assigned some pointers inside a struct to the memory addresses passed in as command-line arguments. This did not pose an immediate issue, but I was forewarned to do a deep copy soon afterward, since it could quickly have become an NPE-style crash.
We have a simple C binary from a vendor, wrapped in our own custom container. It had worked for months; then, near the end of a sprint, it anomalously broke. The normal process is to bind-mount data in and run the utility against the source directory. Doing this now caused a segfault, with no explanation.
After much trial and error, I eventually isolated the most recent good and bad containers. I ran them side by side, with all inputs the same. Sure enough, the bad container failed. The only explanation could be the utility from the vendor. I checksummed both utilities, and they were the same...
An idea occurred to me: this probably isn't statically linked. (Oh man.) I ran `ldd` and got the list of shared objects. I checksummed all of them, and there was a single dependency that was different. What!? How did this happen?
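The check is simple enough to script. Here is a rough Go sketch of what I did by hand, assuming `ldd` is on the PATH and skipping entries (like the vDSO) that do not resolve to a real file path:

```go
package main

import (
	"crypto/sha256"
	"fmt"
	"io"
	"os"
	"os/exec"
	"strings"
)

// sha256File returns the hex SHA-256 of a file, or a marker if it cannot be read.
func sha256File(path string) string {
	f, err := os.Open(path)
	if err != nil {
		return "<unreadable>"
	}
	defer f.Close()

	h := sha256.New()
	if _, err := io.Copy(h, f); err != nil {
		return "<unreadable>"
	}
	return fmt.Sprintf("%x", h.Sum(nil))
}

func main() {
	binary := os.Args[1] // path to the vendor utility inside the container

	// ldd prints lines like: "libfoo.so.1 => /lib64/libfoo.so.1 (0x00007f...)"
	out, err := exec.Command("ldd", binary).Output()
	if err != nil {
		panic(err)
	}

	for _, line := range strings.Split(string(out), "\n") {
		for _, field := range strings.Fields(line) {
			if strings.HasPrefix(field, "/") { // keep only resolved library paths
				fmt.Printf("%s  %s\n", sha256File(field), field)
			}
		}
	}
}
```

Running it in the good and bad containers and diffing the output points straight at the shared object that changed.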
The containers get their shared objects from a yum repository in Artifactory when they are built. But what if that repository has subfolders containing identically named and versioned shared-object files? That is exactly what ours had, due in part to our vendor and in part to the complexity of the project and a lack of configuration-management manpower. We concluded that yum must pull arbitrarily from among them whenever a container cannot be served from cache and has to be rebuilt.
Our solution was to refactor the yum repos to sit at the top level of the Artifactory repo and to isolate the files to what belonged in the subfolder (a subproject). Then we only had to update the repo file in the project that pulled the dependencies.
`docker` treats the `:latest` tag as more than simply a "latest" tag; it acts as a `:default` in a sense. This defaulting behavior occurs on `docker run` (in both the Python API and on the CLI). My teammate also said it occurs on `docker pull`, which I have no reason to doubt.
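The defaulting is visible in the name-normalization library Docker builds on. Here is a small sketch using the standalone `github.com/distribution/reference` module, with arbitrary example image names; an untagged name gets `:latest` filled in, while an explicit tag is left alone:

```go
package main

import (
	"fmt"

	"github.com/distribution/reference"
)

func main() {
	// An image name with no tag at all...
	named, err := reference.ParseNormalizedNamed("alpine")
	if err != nil {
		panic(err)
	}
	// ...gets :latest filled in; there is no "closest matching tag" fallback.
	fmt.Println(reference.TagNameOnly(named).String()) // docker.io/library/alpine:latest

	// An explicitly versioned reference keeps its tag.
	versioned, err := reference.ParseNormalizedNamed("myapp:1.4.2")
	if err != nil {
		panic(err)
	}
	fmt.Println(reference.TagNameOnly(versioned).String()) // docker.io/library/myapp:1.4.2
}
```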
In the final part of our build pipeline, we have a subsystem test which exercises the container in the local Docker container storage of the staging environment. Our installation process had pushed a container with an explicitly versioned tag. Docker could not find it by default because, unlike what the systems engineer expected, it did not use the most closely matching tag; it simply failed for want of `:latest`.
My opinion (which also happened to be the elected course of action) was that we retag the container as `latest` during install. This may make sense because the staging environment should match production.
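As a sketch of that install-time retag (the image names here are made up, and a plain `docker tag` in the install script accomplishes the same thing), the Docker Engine's Go client can add the extra tag:

```go
package main

import (
	"context"

	"github.com/docker/docker/client"
)

func main() {
	cli, err := client.NewClientWithOpts(client.FromEnv, client.WithAPIVersionNegotiation())
	if err != nil {
		panic(err)
	}
	defer cli.Close()

	// Give the explicitly versioned image an additional :latest tag so that
	// lookups without an explicit tag resolve to the installed version.
	if err := cli.ImageTag(context.Background(),
		"registry.example.com/myapp:1.4.2",
		"registry.example.com/myapp:latest"); err != nil {
		panic(err)
	}
}
```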
Lastly, we had to make sure to percolate all of our container tagging through the build pipeline for this project. It has been a growing experience in engineering pipelines.