Ability to containerize the components of Vespene


#21

Disagree

12-factor is kind of just some random person trying to impose their own particular preferences on everyone. The idea that all config must be in environment variables, in particular, is a non-starter for anything with non-trivial configuration (example: plugin config!).

We can easily have a simple pre-docker-image build step.


#22

@mpdehaan it’s worth noting that Docker configuration will often differ between dev and prod. Generally the Dockerfiles themselves would be reused in both environments (build args might be used to tweak certain things). As an example from the PR, most people would be totally fine running Postgres in a container locally, with a volume for data persistence, but in production would use a hosted DB service like AWS RDS and only run their server and worker containers with Docker.

I haven’t had a chance to look at the plugin config, but if you need to copy a config file in from the host that’s pretty easy with a COPY directive in the Dockerfile.


#23

You still can have both. Have a config as complex as you want or your code requires, and let env override what changes between environments; most modern frameworks allow it.


#24

Of course, and this is not a question of frameworks but of what I will accept.

I don’t want to maintain two sets of Docker files in the tree.

So what I need is for these PRs to do a recursive copy of all the settings.d files the user wants when creating the image.
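A minimal sketch of that recursive copy as a pre-build staging step. The helper name `stage_settings` and the `etc/vespene/settings.d` layout inside the build context are assumptions for illustration; a Dockerfile would then `COPY` the staged tree into the image.

```python
from __future__ import annotations

import shutil
from pathlib import Path


def stage_settings(src: Path, build_context: Path) -> list[Path]:
    """Recursively copy the user's chosen settings.d files into a build context.

    Hypothetical helper: the etc/vespene/settings.d layout under the build
    context is an assumed convention, not something Vespene defines.
    """
    dest = build_context / "etc" / "vespene" / "settings.d"
    # dirs_exist_ok (Python 3.8+) lets the step be re-run over an existing context
    shutil.copytree(src, dest, dirs_exist_ok=True)
    # Report which settings modules were staged, relative to settings.d
    return sorted(p.relative_to(dest) for p in dest.rglob("*.py"))
```

Staging the files inside the build context matters because Docker's `COPY` cannot reach files outside it.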

Also, it must not use an embedded database, as that doesn’t scale out, nor does it work well with upgrades of the app code.

If we get to this point, we can get something in tree that we can easily maintain collectively.

Further, to reduce dual maintenance, it should reuse as much of the setup scripts as possible, even if changes to those setup scripts are required.

So if it is, for example, an Ubuntu image (and it must be), call 1_prepare.sh so we do not have two places in the tree where the Ubuntu packages are specified.

I’m interested in supporting this and appreciative of all the work so far, but these are my requirements to make this maintainable.

I don’t want a demo configuration in tree that doesn’t support full config or scale out as that sends all of the wrong impressions for this project.

So what needs to happen is a basic Python script that uses Jinja2 templates to produce a set of Docker files (or otherwise takes the location of the sample settings files to copy in), and then we build them.
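A rough sketch of such a generator. It swaps in the standard library's `string.Template` for Jinja2 so it runs without extra dependencies; the template text, the `setup/1_prepare.sh` path, and the container paths are all assumptions. It calls the existing prepare script rather than listing packages itself, so the Ubuntu package list stays in one place.

```python
from string import Template

# Stand-in for a Jinja2 template. Paths and layout are illustrative assumptions.
DOCKERFILE_TEMPLATE = Template("""\
FROM $base_image
# Reuse the existing setup script so the Ubuntu package list lives in one place
COPY setup/ /opt/vespene/setup/
RUN bash /opt/vespene/setup/1_prepare.sh
# Site-specific settings (plugins, database, etc.) chosen by the user
COPY $settings_dir/ /etc/vespene/settings.d/
""")


def render_dockerfile(base_image: str, settings_dir: str) -> str:
    """Fill in the template; substitute() raises KeyError if a value is missing."""
    return DOCKERFILE_TEMPLATE.substitute(base_image=base_image, settings_dir=settings_dir)
```

For example, `render_dockerfile("ubuntu:18.04", "docker/sample")` yields a Dockerfile starting with `FROM ubuntu:18.04`. Calling it once per base image would also cover workers that need different base images. This assumes `1_prepare.sh` can run non-interactively inside `docker build`; it may need small changes, for example to skip anything that expects systemd.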

I also need a throwaway README detailing the inputs and outputs and how to use them, which will eventually be merged into a new docs chapter.

We are going to have a wide variety of skill levels and will need to include everything about setup, dependencies, debugging, upgrades of the app code and schema, running a non-container database, and so on.

This wasn’t an immediate priority for me, but I am trying to help steer what I would accept in a way that doesn’t frustrate users trying it, deploying it, or figuring out what would be sustainable in prod.

And we absolutely have to reduce the dual-maintenance issue, so that it is basically self-maintaining by sharing those scripts. A step that creates the Docker files could copy them over if the relative pathing (COPY cannot reach files outside the build context) and the incompleteness of the Docker build tools are a problem.

But here I do want to underscore: Docker does make everything much more obfuscated. We had strong immutability with AWS and Packer, and now we have these halfway-broken tooling layers with very incomplete build tools :)


#25

So let me suggest this:

docker/make_dockerfiles.py --embedded_db=true/false --settings-files-from=docker/sample

So we have an expandable script that outputs Docker files for all the configurations, as well as, I guess, a compose file.
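One way the proposed command line could be parsed, sketched with `argparse`. Only the two flags from the example command are modeled, and the `true/false` parsing helper is an assumption about how those values would be spelled.

```python
import argparse


def str_to_bool(value: str) -> bool:
    """Accept the true/false spelling used in the proposed command line."""
    lowered = value.lower()
    if lowered in ("true", "yes", "1"):
        return True
    if lowered in ("false", "no", "0"):
        return False
    raise argparse.ArgumentTypeError(f"expected true or false, got {value!r}")


def parse_args(argv=None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(
        prog="make_dockerfiles.py",
        description="Generate Dockerfiles (and a compose file) for a Vespene deployment.",
    )
    # Whether to also emit a database container (demo use only)
    parser.add_argument("--embedded_db", type=str_to_bool, default=False)
    # Directory of settings.d files to copy into the images
    parser.add_argument("--settings-files-from", default="docker/sample")
    return parser.parse_args(argv)
```

`argparse` turns `--settings-files-from` into the attribute `settings_files_from`, so the generator can read `args.settings_files_from` and `args.embedded_db` directly.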

We also need to be able to generate a config for multiple workers, where each worker could have a different base image. In this case it is just a matter of changing the base image, so maybe we have a different shell script to make the worker config file.

What I want is to make it foolproof: having 500 people all making their own unique Docker files and having problems doing it seems like it would be a “support” nightmare.

Lots of people will be Docker newbies, and Docker documentation is so very bad; I don’t want them getting frustrated by our assuming too high a level of knowledge about what to put in them.

The existing sample config dir should probably only include a DB config.

BTW, we also have to get those configs doing SSL, or at least suggest a way of doing SSL at the load-balancer level. (Gunicorn does take certs.)
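If TLS is terminated in Gunicorn itself, a Gunicorn config file (which is plain Python) can point at the certificate and key through its `bind`, `certfile`, and `keyfile` settings. The port, worker count, and certificate paths below are assumptions; the certificates would be mounted at runtime rather than baked into the image. Terminating TLS at a load balancer instead would leave Gunicorn on plain HTTP behind it.

```python
# gunicorn.conf.py: a sketch of serving HTTPS directly from Gunicorn.
# The port, worker count, and certificate paths are illustrative assumptions.
bind = "0.0.0.0:8443"
workers = 4

# Mount these into the container (e.g. as a volume) instead of baking them in
certfile = "/etc/vespene/ssl/server.crt"
keyfile = "/etc/vespene/ssl/server.key"
```

It would be loaded with `gunicorn -c gunicorn.conf.py` followed by the project's WSGI module.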


#26

Thinking about this a little more, maybe this does work better as an out-of-tree repo.

I want to enable people to run Vespene this way, but it seems like it would be an overload teaching people how to run a three-tier topology like this.

For easy demos on an all-in-one demo machine, I can see having a very simple Vagrantfile or something that maybe just uses the shell scripts.


#27

I think it should go another way.

  1. We need defaults in config files inside the Docker container (provided along with the Dockerfile in the repo).
  2. We need an ability to redefine any of config parameters during Docker image startup.

So, I propose keeping the Dockerfiles in the tree and adding config file templates to the repo.
This way we'll get stateless Vespene server and worker images AND the database as the central state store.

Jinja2 looks good. I can write a simple script that will take templates and use environment variables to generate configs during container startup.
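A sketch of that startup step, assuming templates named `*.py.tmpl` and using `string.Template` in place of Jinja2 to stay dependency-free. One caveat with this style: a literal `$` in a template must be written as `$$`.

```python
from __future__ import annotations

import os
from collections.abc import Mapping
from pathlib import Path
from string import Template


def render_settings(template_dir: Path, output_dir: Path,
                    environ: Mapping[str, str] = os.environ) -> list[str]:
    """Render every *.py.tmpl in template_dir from environment variables.

    Hypothetical layout: database.py.tmpl -> output_dir/database.py.
    """
    output_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for tmpl in sorted(template_dir.glob("*.py.tmpl")):
        # substitute() raises KeyError for a missing variable, so a misconfigured
        # container fails at startup instead of running with a half-filled config
        text = Template(tmpl.read_text()).substitute(environ)
        target = output_dir / tmpl.name[: -len(".tmpl")]
        target.write_text(text)
        written.append(target.name)
    return written
```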

Another thought from me: configuration is currently in Python, and it's bad because any configuration file error (wrong indent, etc.) will make the application fail or result in weird or hard-to-debug behavior. My suggestion is to use JSON or, even better, YAML for configuration.


#28

hi @efim-a-efim

Thanks for the continued interest. Let me provide some friendly corrections:

(A) We do not need defaults inside the Docker container. These are already provided by the config/ tree in the main application. Any config setting not supplied in /etc/vespene/settings.d already has a default. (I worry that some folks COULD be trying to containerize Vespene before they understand the codebase and deployment needs; maybe not, but this should be something everyone is aware of.)

(B) "We need an ability to redefine any of config parameters during Docker image startup": to do this, you can ship any file you want in /etc/vespene/settings.d that loads any parameter from the environment. So this is also already possible.

(C) "and it's bad because any configuration file error": it's not, because if you ever want to load anything from the environment, the code can do it! JSON can't :) So this argument seems to conflict with "B".

This is a long-standing Django convention, it's pretty darn useful, and while I would agree some people unfamiliar with Django might not like it, it's not going to change, and it is easy enough to work around. You may just have to tolerate it :)

I will reiterate what points I think we need here:

  • We need an easy way to ship whatever configuration the user wants in the Dockerfile. This is a pre-build step that creates some files in /etc and then produces the Dockerfile. This is mainly for PLUGIN configuration and things that won't need to change at runtime, or that can easily be replaced by changing the image.

  • It seems fine for the worker image to take the name of the worker queue to serve from an environment variable, which would avoid having lots of different worker images. In reality, though, hardly any settings.d content will be the same between sites (triggers especially), and this is the kind of stuff that doesn't make sense to shove into the environment because it's structured data.

  • It seems deeply insecure to pass the secrets.py stuff around in the environment or, worse, bake it into any public image shared on Docker Hub. As such, I think there still needs to be a pre-build step that creates PERSONAL Dockerfiles (for private registries), so people can easily customize their own Docker files without having to write a lot of Docker.
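For the worker-queue point above, a worker entry point could read just the queue name from the environment and fail loudly if it is missing. `VESPENE_WORKER_QUEUE` is a hypothetical variable name, and the `manage.py worker <queue>` invocation is an assumption to check against the Vespene docs.

```python
from __future__ import annotations

import os
import sys
from collections.abc import Mapping


def worker_queue(environ: Mapping[str, str] = os.environ) -> str:
    """Return the queue this worker container should serve.

    Only this small, unstructured value comes from the environment; plugin and
    trigger configuration stays in settings.d files built into the image.
    """
    queue = environ.get("VESPENE_WORKER_QUEUE", "").strip()  # hypothetical name
    if not queue:
        raise SystemExit("VESPENE_WORKER_QUEUE must name the worker queue to serve")
    return queue


def worker_command(queue: str) -> list[str]:
    # Assumed invocation; replace with the actual Vespene worker command.
    return [sys.executable, "manage.py", "worker", queue]
```

A container entry point would then pass `worker_command(worker_queue())` to `os.execvp`, so the worker replaces the entry-point process.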

If the idea of a pre-build step that produces Docker files and directories to load in during the "Docker build" step is too much, it may be easier to think of the Docker file work as constructing EXAMPLES, coupled with a README that says how to produce them. For instance, the user will need a common secrets file and to make some choices about plugins specific to their deployment of Vespene.

I know Docker has a lot of fans, but I think this underscores some things: it sucks for stateful configuration, and Vespene does have a lot of state (configuration). It's not insurmountable, but I want to make sure the thinking about Docker images isn't about establishing a demo environment; it's about establishing a repeatable, upgradable production environment.

So, recap:

  • the Python configs aren't changing away from Python (sorry!), but I don't think this hurts anything. JSON/YAML syntax errors are still JSON/YAML syntax errors.
  • the Python configs CAN load environment variables: write code that loads them, drop it into /etc/vespene/settings.d/*.py, and you can override anything!
  • we need an example repo that shows how to build a production-grade config, complete with custom images for both the web nodes and the worker nodes, allowing heavy user customization of the config and site-specific secrets
  • parameters like secrets.py and the database password should not be passed around in the environment, as that seems VERY unsafe. Having that kind of stuff in the images would probably reflect badly on the project.
  • this content is probably not suitable for Docker Hub, but I'm quite open to an easy build step (a Makefile target, etc.) that produces a site-specific set of images so users don't have to do all the work of writing their own configurations
  • If people want to see this in tree, I STILL want the Docker files to use as much of the setup/* scripts as possible, to avoid dual maintenance. This doesn't have to include the configuration steps at all; I mean things like installing apt packages, so the package list is in ONE place in the main repo. I am willing to bend on this a little, but I don't think any attempt has been made yet to make the config scripts responsive to environment variables, so that the Docker builds can call them and reuse partial functionality. That would be interesting, and it keeps the Docker content easier to maintain rather than a "fork", as it were, of the install instructions. I agree it's not going to be 100% common, but things like the package list and repo setup wouldn't need to be repeated. (Clearly one wouldn't run the systemd parts.)
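On the point that syntax errors are syntax errors in any format: a pre-build step could byte-compile every settings.d file without running it, so a typo or bad indent fails the image build with a file name and line number, much as a JSON or YAML parser would report. `check_settings` is a hypothetical helper, not part of Vespene.

```python
from __future__ import annotations

from pathlib import Path


def check_settings(settings_dir: Path) -> list[str]:
    """Byte-compile each settings.d/*.py file without executing it.

    This catches syntax and indentation errors only; runtime errors such as a
    failing import still surface when the application loads the file.
    """
    problems = []
    for path in sorted(settings_dir.glob("*.py")):
        try:
            compile(path.read_text(), str(path), "exec")
        except SyntaxError as err:  # IndentationError is a subclass
            problems.append(f"{path.name}:{err.lineno}: {err.msg}")
    return problems
```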

If work goes down those lines, I'll support it as best I can. I still think it would be easiest to let it evolve out of tree and then we can merge it back in, in something like a "docker" subdirectory, if we get it into a reasonable state.

I say "we" here despite clearly not doing the work, but I'm mostly trying to suggest what we need for security and configurability in production, and looking towards what it would take to adopt this in tree as something I would be comfortable with maintaining.

While volunteer efforts to help maintain something do exist, in practice any code you take on is kind of like a free puppy, so I need to make sure those bases are covered.

I'm still excited about this effort and I want to see it successful.

Mostly make sense?

Thank you all once again for the excitement about this - it is really cool to see.

Questions? Any points/concerns maybe not understood?


#29

Hello hello,

I agree with everything you state besides one point:

Frankly, having an official image will help people play with Vespene quickly, and it's not hard to do, IMHO. It feels like I haven't installed software on a machine in ages. Need a database? Run a container and mount a volume to persist the data; same for memcache, web servers, IRC, Jupyter notebooks; hell, I'll even run IPython in a container. It just helps keep the cruft off the machine.

As for the secrets.py... just have people mount that as a volume?
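If secrets.py is mounted as a volume, a container entry point could check for it before starting. The mount path and the owner-only permission rule below are assumptions, offered as a sketch; with bind mounts, the file must also be readable by the user the container runs as.

```python
import stat
from pathlib import Path

# Hypothetical mount point; adjust to wherever the volume is mounted.
SECRETS_PATH = Path("/etc/vespene/settings.d/secrets.py")


def check_mounted_secrets(path: Path = SECRETS_PATH) -> None:
    """Exit early if the secrets file is missing or readable by other users."""
    if not path.is_file():
        raise SystemExit(f"{path} not found; mount it into the container as a volume")
    mode = stat.S_IMODE(path.stat().st_mode)
    # Any group or other permission bit counts as too open for a secrets file
    if mode & (stat.S_IRWXG | stat.S_IRWXO):
        raise SystemExit(f"{path} has mode {oct(mode)}; expected owner-only access (e.g. 0o600)")
```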


#30

That sounds like it might work out pretty well with /etc/vespene being a volume. I didn't think of that. Sounds good!

We'd need instructions about how to get that right. If you want to try doing that and send it over with a foolproof README, so that somebody who knows how to run one Docker container can still get it going successfully, that would be pretty cool!

Maybe you could start something like a docker_setup.rst? If not, I can easily adapt something as I try it out personally. Something that uses, I guess, Compose locally would be nice.

I suspect they'll just need to make sure every host has the same config files (plugins and secrets), except for worker-specific configuration options, and we want to make sure we pick decent defaults and remind people they are going to want some way to do shared storage for persistent buildroots.
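Checking that every host really has the same config files could be scripted. This sketch hashes a config directory so the results can be compared across hosts; the function, and the idea of excluding worker-specific files by relative path, are illustrative assumptions rather than an existing Vespene tool.

```python
from __future__ import annotations

import hashlib
from pathlib import Path


def config_fingerprint(config_dir: Path, exclude: frozenset[str] = frozenset()) -> str:
    """Return a SHA-256 over the relative names and contents of files in config_dir.

    `exclude` holds relative paths (e.g. worker-specific files) to leave out.
    """
    digest = hashlib.sha256()
    # Sort by relative path string so every host hashes files in the same order
    files = sorted(
        (p.relative_to(config_dir).as_posix(), p)
        for p in config_dir.rglob("*") if p.is_file()
    )
    for rel, path in files:
        if rel in exclude:
            continue
        # Length-prefix each field so different name/content splits cannot collide
        for field in (rel.encode(), path.read_bytes()):
            digest.update(len(field).to_bytes(8, "big"))
            digest.update(field)
    return digest.hexdigest()
```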

We also need a separate Docker file for a basic Ubuntu worker or something, and I agree the worker name is probably just something that can be in the environment, since we won't need supervisor.

Cool!