Google Summer of Code 2023:
Conmon-rs support for podman

Running containers with podman using conmon-rs as the container monitor. This requires additional features to be implemented in conmon-rs and an extension of podman to use the new conmon-rs Go client as an alternative to the existing conmon binary executable.

We set the following milestones at the start of the project:

  1. Basic Container Functions: All basic features of the container lifecycle implemented using conmon-rs as the container monitor. (create, start, stop and delete)
  2. Attach to a Container: The attach feature allows someone to connect to the IO streams of a running container. Conmon-rs keeps these pipes open and forwards data between the container, the logs and active attach sessions.
  3. Exec in a Container: The exec feature allows someone to execute another program inside an already running container.

Progress

Step 1: Basic Container Functions

I started by adding a new implementation of the OCIRuntime interface to podman. The ConmonRSOCIRuntime is the new alternative to the existing ConmonOCIRuntime. Since this was the first alternative implementation of the OCIRuntime interface, I needed to add a way to select the implementation. As a first solution (suitable for testing and gathering further feedback), I added the option to prefix the engine.runtime option in containers.conf with a conmon-rs: prefix to pick the new implementation. Other options were discussed but postponed.
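A minimal sketch of how such prefix-based selection could look. The function name and return values are illustrative, not the actual podman code; only the conmon-rs: prefix convention comes from the report:

```go
package main

import (
	"fmt"
	"strings"
)

// splitRuntime interprets the engine.runtime value from containers.conf.
// A "conmon-rs:" prefix selects the conmon-rs based monitor implementation;
// the remainder is the OCI runtime name or path. Illustrative only.
func splitRuntime(value string) (monitor, runtime string) {
	if strings.HasPrefix(value, "conmon-rs:") {
		return "conmon-rs", strings.TrimPrefix(value, "conmon-rs:")
	}
	// No prefix: fall back to the existing conmon implementation.
	return "conmon", value
}

func main() {
	for _, v := range []string{"crun", "conmon-rs:crun"} {
		m, r := splitRuntime(v)
		fmt.Printf("%s -> monitor=%s runtime=%s\n", v, m, r)
	}
}
```

The prefix keeps containers.conf backwards compatible: existing values select the old implementation unchanged.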

Some of the functions of the OCIRuntime interface communicate directly with the underlying OCI runtime process; these were taken over from the existing implementation. Other functions invoke the container monitor and needed to be rewritten from scratch to use the shared RPC interface of conmon-rs instead of the command line interface of conmon. I started with an MVP that just barely created a container (without logging and a lot of other features) and then extended the implementation to match the feature set of the existing implementation. In the end, all options are either correctly handled by the new implementation or return an error because some options are inherently not possible with conmon-rs.

During the implementation I found some bugs in conmon-rs and added new features to support podman (features not needed by CRI-O and absent in conmon-rs):

  • Mention that the go client reuses a running server: I searched for quite a while before discovering that the existing code already checks whether a conmon-rs instance is already running. This PR adds a comment documenting that behaviour so that others don't have to spend as much time.
  • Remove redundant serialization of OOMExitPaths: After an old merge the codebase contained a line of code twice (without any negative effect).
  • Update makefile update-proto: The make rule to generate code from the Cap'n Proto files failed in my environment and removed any uncommitted changes to the Cap'n Proto file. This PR made the build rule more robust and pinned the generator to the dependency version from the go.mod file instead of the latest release.
  • Use capnp map for tracing metadata: The RPC interface contained a metadata field to transport tracing metadata to the server. This was a string field containing a JSON-serialised map. I replaced it with a "native" Cap'n Proto map in a backwards-compatible way.
  • Create container with environment variables: podman needs to pass environment variables to the container.
  • Remove unnecessary SetRequest calls: The encoding of RPC requests had some unnecessary function calls that were already implicitly done by the RPC library.
  • Add per command cgroup manager option: conmon-rs has a startup option to specify whether to use systemd cgroups or cgroupfs directly. podman wants to set this option for each container. As conmon-rs uses this option only in a container-specific context, there was no need to specify it globally at startup. This PR adds a backwards-compatible way to opt in to the new per-container option.
  • Fix forget_grandchild retain condition: The forget_grandchild function was forgetting all processes except the one it was supposed to forget; the retain condition was simply inverted.
  • Add support for file descriptor passing: podman wants to pass user supplied file descriptors to the container and keep some (networking related) file descriptors open as long as the container is running. The existing RPC socket was not easy to use for file descriptor passing, as the socket was managed by the RPC library internally. I added a new socket specifically for file descriptor passing with a custom protocol.
  • Change resize channel to receive only: A Go channel used to receive resize events was declared with the regular bidirectional channel type (send and receive), even though it was never used to send events.
  • Fix early stdio close due to multiple ownership: In the container IO implementation a file descriptor was used as a raw file descriptor (just the number) and multiple code paths thought they own that file descriptor and closed it at the end. I refactored the code to use owning types to delegate the ownership management to the rust type system.

The first milestone was the biggest milestone as I needed to make myself familiar with the details of the different APIs (podman, conmon, conmon-rs, OCI Runtime Spec), add a new implementation to podman (which was the first alternative implementation) and extend conmon-rs with complex features (e.g. file descriptor passing).

Step 2: Attach to a Container

The Go client of conmon-rs already contained an API to attach to a running container. It was relatively straightforward to use that API to implement attach in podman. It just didn't work as expected.

Depending on the container configuration (--tty and --interactive flags), streams closed after the first byte, after the input closed, or not at all. These kinds of problems are harder to diagnose than a compile time error or a runtime panic. But in the end I found all the bugs and fixed them.

Some of the PRs mentioned in "Step 1" were the result of this debugging. Others were still on their way to being merged during this time.

Step 3: Exec in a Container

The RPC interface already contains methods to execute a command in a container. But these only support synchronous execution: the RPC command waits for the executed program to exit and returns the IO streams and exit code. podman needs a way to create long-running exec sessions and attach to them. The attach RPC method already contains an execSessionId parameter to support that use case, but no handling of that parameter exists yet.

I was not able to complete this step in time. The plan for this step was to add a startExecSession command to start a new exec session. The command would have similar options to the createContainer command and share a lot of the code and infrastructure (container IO, child process management). And the attach command would need to handle the execSessionId to attach to an exec session instead of the container session.

Current State

I created pull requests for the changes required in conmon-rs as they emerged. And as they were self-contained and manageable, they got merged as we went along.

My changes to podman were made in my fork of podman. Since there is no sensible way to split these changes into smaller parts, the plan was to merge them at the end in one big pull request. But because I was not able to finish "Step 3", that hasn't happened yet. These changes are publicly available and can be picked up by me or anyone else in the future.

Future Plans

Exec support still needs to be implemented, as outlined in "Step 3".

Once that is completed, a pull request can be created to merge these changes into podman. I expect there to be some bugs that we will find once the PR is created. These would need to be fixed before the PR can be further tested and merged.

The CI system of podman would need to be gradually extended to properly test the new implementation.

We would need to document the current way to select the new implementation, or add one of the other ways that were discussed.

During the project I noticed some ways to improve the conmon-rs server. Currently the project uses the multi-threaded tokio runtime and creates a few blocking threads. After a careful review of the complete codebase to eliminate blocking code and potential deadlocks, it should be possible to switch to the single-threaded tokio runtime and remove most of the blocking threads. I plan to contribute these changes in the future.

Lessons Learned

The process of merging a PR into conmon-rs was quite fast, ranging from minutes for small fixes or cleanups to a few weeks for big PRs with requested changes. I know other projects have far more open PRs and can take a lot longer. But even the fast merge speed of conmon-rs was noticeable once PRs depended on one another. It took some work to keep a follow-up PR up to date, or to only create it once the first PR was merged while keeping track of these changes in my fork at the same time. This can take some speed out of a project, since progressing with the main goal would increase that work even more. In the future I will be able to plan for this extra work, or optimise the steps or the timeline to reduce the number of interdependent pull requests.

Work-life balance is hard, especially if you work from home. I was not able to properly separate time for GSoC from time for personal projects. This meant that whenever there was something to do and it fit, I kept on working. In my spare time I only had room for smaller personal projects, which were easy to interrupt, and I postponed larger personal projects until "after GSoC".

This led to a certain dissatisfaction, since the GSoC project was "preventing" me from doing some projects that were time-bound (e.g. needing good summer weather). In the last few weeks of GSoC I started the first big personal project (to use at least the end of the summer). But once I started, I could no longer find the motivation and time to work on the GSoC project. I couldn't even bring myself to attend our weekly meetings. The pressure I put on myself in terms of the amount of work I wanted to do was just too much in the end.

I didn't realise that "dissatisfaction" during the project until it was too late. In the future, I will hopefully be able to better recognise such a problem, communicate it, and find solutions early on. It just felt wrong not to respond to a PR comment as soon as I had time. But in the future, I have to stick to fixed working hours, even if I work from home.

I plan to continue with the project in the future. But this time the GSoC project will be the project that can be interrupted. This will not be as fast as the progress during the project. But a more sustainable and healthy speed will lead to the completion of the project.