The voyage to OpsMop push mode - development news


#1

Hi everyone!

I've been working on OpsMop push mode this week and things have been going GREAT.

As I noted previously, OpsMop no longer has bin/opsmop and bin/opsmop-push scripts; the policy files themselves are now executable.

You execute a local policy like this:

foo.py --local --apply

You execute a remote policy in push (SSH) mode like this:

foo.py --push --apply

Now, the best example for push mode is here right now:

To run this, you just do:

cd opsmop-demo/content/
vim inventory/inventory.toml
python3 push_demo.py --push --apply

I've written some docs which I haven't pushed yet that explain this a lot:

But there are a few things that aren't really even documented there yet.

There's a new UserDefaults class:

This allows loading a VERY wide variety of SSH, sudo, tuning, and logging preferences from an optional file in ~/.opsmop/defaults.toml or /etc/opsmop/defaults.toml, whichever is found first.

Here is where you would set your default login name if it wasn't going to be defined in the policy file.
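To make the "whichever is found first" lookup concrete, here is a minimal sketch in plain Python. This is not OpsMop's actual loader: the name find_defaults is made up for illustration, and only the two paths come from the description above.

```python
import os

# Search order from the post: the per-user file first, then the system-wide one.
DEFAULT_PATHS = ("~/.opsmop/defaults.toml", "/etc/opsmop/defaults.toml")


def find_defaults(paths=DEFAULT_PATHS):
    """Return the first path that exists as a file, or None if none do.

    Hypothetical helper, not part of OpsMop.
    """
    for candidate in paths:
        expanded = os.path.expanduser(candidate)  # turn "~" into the home directory
        if os.path.isfile(expanded):
            return expanded
    return None
```

Since the file is optional, returning None (rather than raising) lets the caller fall back to built-in defaults.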

In the Role, there are several new methods that are not used when executing it locally, but it's important to remember any role that is remoteable CAN still be executed locally:

  • def serial(self) - returns an integer of how many hosts in a role to complete as a group. The default right now is 80. You could set it to 5. We don't yet do anything like progressive canary deployments, but it's quite possible to add later.

  • def inventory(self) - returns a subset of an inventory. As explained in https://github.com/opsmop/opsmop/blob/master/docs/source/push.rst the only inventory we have in the tree right now is a basic TOML inventory - though I suspect the first contribution (this can happen NOW btw, nothing is holding it back) to be one for AWS using boto3. Inventory classes are super easy to write and only require returning a nested dictionary structure.

  • def ssh_as(self) - returns a tuple of (username, pass) where pass may be None - this says which user to log in as. I am a strong believer in not using passwords, ever, so I will probably change this so that returning just a string (username) works and you don't need to return (username, None).

  • def sudo_as(self) - similarly, returns a list of accounts to sudo to, and the sudo password required (if required).

  • def sudo(self) - return True if you need to sudo at all.
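As a rough illustration of the methods listed above, here is a stand-in class in plain Python. It deliberately does not import OpsMop, so the class name, the (account, password) return shape for sudo_as, and the OPSMOP_SUDO_PASSWORD environment variable are all assumptions made for the example. inventory() is left out because its return value depends on the inventory class in use.

```python
import os


class WebServers:
    """Plain-Python stand-in for a remoteable role (hypothetical, no OpsMop import)."""

    def serial(self):
        # Complete hosts in groups of 5 instead of the default of 80.
        return 5

    def ssh_as(self):
        # (username, password); None means "no password", i.e. key-based login.
        return ("deploy", None)

    def sudo(self):
        # This role needs to sudo after logging in.
        return True

    def sudo_as(self):
        # Assumed shape: (account, password). Because this is a program, the
        # password can come from any external source; an environment variable
        # with a made-up name is used here.
        return ("root", os.environ.get("OPSMOP_SUDO_PASSWORD"))
```

The point is only that each setting is an ordinary method, so it can compute its answer however it likes.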

For users familiar with Ansible, the concepts above should be pretty familiar, but they are consolidated and made a bit more programmatic. You can easily see here how something like getting a password could come from ANY external data source, because it's a program.

Now, these set the settings for the whole role.

Any value that is None - like a password or a login user - doesn't have to come from the role. A common use case is that multiple members of a team share the same config management content and each needs to connect as themselves before sudoing.

To do this, each team member should create a ~/.opsmop/defaults.toml:

[ssh]
username = "bob"

Easy enough: that machine will log in as bob.

I didn't do any automation to assume the current username as a default, but I suppose we COULD add that.
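The fallback order described here (role first, then defaults.toml) can be sketched like this. The function name and the already-parsed defaults dict are assumptions, and the last step, falling back to the current user via getpass, is the "we COULD add that" idea rather than something OpsMop does today.

```python
import getpass


def resolve_username(role_value, defaults):
    """Pick the SSH login name (hypothetical helper, not OpsMop code).

    role_value: the username from the role's ssh_as(), or None.
    defaults:   the parsed contents of defaults.toml as a nested dict.
    """
    if role_value is not None:
        return role_value  # the role's own setting wins
    from_file = defaults.get("ssh", {}).get("username")
    if from_file is not None:
        return from_file  # e.g. [ssh] username = "bob"
    # Not implemented in OpsMop per the post: assume the current user.
    return getpass.getuser()
```

With the file shown above, resolve_username(None, {"ssh": {"username": "bob"}}) picks "bob".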

There are also a lot of magic variables, which I've mostly documented in the https://github.com/opsmop/opsmop/blob/master/docs/source/push.rst file, which will appear in docs when I am ready to share this more widely.

This is a LOT to take in and a lot of new stuff, so please ask any questions you may have, or share any ideas/wants...


#2

Implementation is being done with https://readthedocs.org/projects/mitogen/ which so far has been excellent.

Connection attempts are asynchronous via "concurrent.futures", which David also recommended, and then all events are asynchronous as well.
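For anyone who hasn't used concurrent.futures, this is roughly what concurrent connection attempts look like. It is a sketch, not the OpsMop code: connect() is a placeholder for opening a real SSH connection, and whether OpsMop uses a thread pool or a process pool is not stated here, so the ThreadPoolExecutor is an assumption.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def connect(host):
    # Placeholder for opening a real SSH connection to `host`.
    return f"connected:{host}"


def connect_all(hosts, max_workers=8):
    """Try to connect to every host, at most `max_workers` at a time."""
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Submit everything up front and remember which future belongs to which host.
        futures = {pool.submit(connect, host): host for host in hosts}
        for future in as_completed(futures):
            host = futures[future]
            try:
                results[host] = future.result()
            except Exception as exc:  # a failed connection should not stop the others
                results[host] = exc
    return results
```

as_completed hands back results in whatever order the connections finish, so one slow host doesn't hold up handling of the rest.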

What happens is that we are actually transferring the code to run the roles to the remote machine, and then it emits dictionary events, which are pickled, and then translated into local callbacks.
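Here is a toy version of that event flow using only the stdlib. The event type names, dict fields, and callback method names are invented for the example, and both "sides" run in one process here, whereas in push mode the pickled bytes would travel back over the connection.

```python
import pickle


class RecordingCallbacks:
    """Local side: one method per event type (names are hypothetical)."""

    def __init__(self):
        self.lines = []

    def on_task_start(self, event):
        self.lines.append(f"{event['host']}: starting {event['task']}")

    def on_task_result(self, event):
        status = "changed" if event["changed"] else "ok"
        self.lines.append(f"{event['host']}: {event['task']} -> {status}")


def emit(event_type, **fields):
    """Remote side: describe what happened as a dict and pickle it."""
    return pickle.dumps({"type": event_type, **fields})


def dispatch(payload, callbacks):
    """Local side: unpickle the event and call the matching callback.

    Only unpickle data from hosts you trust; pickle can execute code on load.
    """
    event = pickle.loads(payload)
    handler = getattr(callbacks, "on_" + event["type"])
    handler(event)
```

Keeping events as plain dictionaries means the remote side never needs to know what the local callbacks do with them.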


#3

When you run in push mode, because the events are very asynchronous, it doesn't make a lot of sense to show as much detail as in local mode.

However, I've fixed that too, by adding logging.

Logging is configured by default, but you can specify the log location in defaults.toml:

[log]
path = "~/.opsmop/opsmop.log"

This log contains the full output, as if something was executed remotely. There's all kinds of output, including what commands were run.
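If you're curious what honoring a setting like this involves, a minimal stdlib sketch follows. It is not OpsMop's logging setup; the function name, logger name, and log format are assumptions.

```python
import logging
import os


def make_file_logger(path, name="opsmop_sketch"):
    """Return a logger that appends to `path` (hypothetical helper)."""
    full_path = os.path.expanduser(path)  # "~" in the config becomes the home dir
    directory = os.path.dirname(full_path)
    if directory:
        os.makedirs(directory, exist_ok=True)  # create ~/.opsmop/ if it is missing
    logger = logging.getLogger(name)
    logger.setLevel(logging.DEBUG)
    handler = logging.FileHandler(full_path)
    handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
    logger.addHandler(handler)
    return logger
```

Usage would be something like make_file_logger("~/.opsmop/opsmop.log").info("ran: uptime").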


#4

What's left? Well, this doesn't support file transfer yet, but David mentioned this is available, so I'm going to explore it NEXT.

I also need to improve the quality of the output in push mode, as right now it's not formatted very well.

Mitogen also supports some AWESOME nested bastion host functionality, suitable for long-distance or multi-cloud communications, or for efficient transport through top-of-rack relays and that sort of thing. This is really cool, and while I've implemented support for this via the "opsmop_via" magic variable in inventory, I need to test it out to make sure that code is correct.


#5

So, in summary:

Push mode WORKS

Pushes happen where one role is executed asynchronously on all hosts, so a host doesn't have to wait for the other hosts to finish one task before it starts the next. This should make it exceptionally fast and save a lot of time if different tasks become the "long pole" on various hosts.

With my history with Ansible, what normally happened was that if you had 200 tasks and the hosts were overtaxed, one host out of 200 might randomly take a really long time, and that slowed down everything. To fix this, I came up with the idea of the "free" strategy, though this is much more cleanly organized here, and has significantly fewer round trip calls and file transfers.

So, I'm going to work on enabling file transfer - and at this point, when that works, we can push docs and PUSH MODE IS FULLY WORKING. And then I can definitely use some help with tweaking, performance testing, and all kinds of stuff.

But feedback is definitely possible now, and if anybody is interested in internals or has questions, let me know what your questions are.


#6

New code mostly lives in "opsmop.push.*", but you can also see changes to "opsmop.core.executor".

If you want to see how all this works, start with those files.


#7

#8

Here is the new inventory package. If folks would like to have a go at something like AWS inventory (or another cloud provider), I'd be glad to help out with feedback on pull requests - again, this is stable enough to work on now.

The "filter" method in the "Inventory" base class is how we tell the roles what groups to target.

You can see this used here:

It would work exactly the same for other inventory providers, and is implemented in the base class so it "just works". It uses fnmatch patterns just like you are used to in opsmop host patterns.
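To show the idea of fnmatch-based group selection, here is a small self-contained sketch. The nested dictionary layout and the function name are guesses for illustration, not the real Inventory class or its data format.

```python
import fnmatch

# Hypothetical nested-dictionary inventory: groups -> hosts -> per-host variables.
INVENTORY = {
    "groups": {
        "web_east": {"hosts": {"web1.example.com": {}}},
        "web_west": {"hosts": {"web2.example.com": {}}},
        "databases": {"hosts": {"db1.example.com": {}}},
    }
}


def filter_groups(inventory, pattern):
    """Return only the groups whose names match an fnmatch-style pattern."""
    return {
        name: group
        for name, group in inventory["groups"].items()
        if fnmatch.fnmatchcase(name, pattern)  # case-sensitive on every platform
    }
```

Because the matching only looks at the dictionary, it behaves the same no matter which provider built the inventory, which is why it can live in a base class.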


#9

Tuning right now is through the serial method - which limits the number of hosts being configured at once for the role - and also the max_workers parameter in UserDefaults, which determines how many hosts we try to open SSH connections to at once.

If you set serial() too high, it is probably possible to overwhelm your system, but the defaults should be completely reasonable.
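The effect of serial() can be pictured as slicing the role's host list into batches. This is only a sketch of the concept, with a made-up function name, not the executor's actual code.

```python
def batches(hosts, serial):
    """Split `hosts` into consecutive groups of at most `serial` hosts."""
    if serial < 1:
        raise ValueError("serial must be at least 1")
    return [hosts[start:start + serial] for start in range(0, len(hosts), serial)]
```

For example, five hosts with a serial of 2 would be handled as groups of 2, 2, and 1.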

We're only connecting to 8 hosts by default at a time for SSH, which is more or less based on CPU capabilities - but that number can probably be increased depending on the capabilities of the machine you are running from.

I'd definitely be interested in people with good setups for performance testing trying this out, as right now I just have the basics done to prove it works and haven't really assessed tuning it much yet.

Once again, thanks to David Wilson for tons of help on mitogen Q&A and making what looks to be a REALLY cool library.

https://readthedocs.org/projects/mitogen/

Thanks also go out to "dill" - an awesome Python serialization library - and the core stdlib "concurrent.futures" module, both of which are really interesting.


https://docs.python.org/3/library/concurrent.futures.html


#10

Ok that was a ton of info! Questions? Thoughts? Ideas? Anybody have inventory questions or want to write some plugins or help with performance testing?

Fire away!


#11

CRICKET

C'mon y'all this is good stuff :)

I'm currently working on making this be able to transfer files and it will be more or less fully operational - subject to CLI output improvements and performance tests and stuff like that.


#12

Just pushed some updates where it's clear templates transfer over the wire. It's really fast. I need to work out some minor bugs about transferring other files, but this is easy to do.

After that I need to improve the output of the push mode, get the docs polished up a bit, and there should be some nice reading material and something for you to demo this weekend!


#13

Here's a proposal for implementing what extensions the fileserver can serve up in push mode.

The fileserver will by default only serve up files in subdirectories of the policy file, but can be extended by adding a method to the role to add additional paths.
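One way to picture that rule is as a path check. The sketch below assumes "subdirectories of the policy file" means anything under the directory containing the policy file; the function name and the extra_roots parameter (standing in for the role method that adds paths) are hypothetical.

```python
from pathlib import Path


def is_servable(requested, policy_file, extra_roots=()):
    """Return True if `requested` lives under an allowed root directory.

    Allowed roots: the directory containing the policy file, plus any
    extra roots the role chooses to add.
    """
    roots = [Path(policy_file).resolve().parent]
    roots.extend(Path(root).resolve() for root in extra_roots)
    target = Path(requested).resolve()  # collapses ".." so traversal can't escape
    return any(target == root or root in target.parents for root in roots)
```

Resolving the path before comparing is the important part: a request like "templates/../../secret" is judged by where it really points.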


#14

Things are operational on master now, but I really need to get docs in better shape before people try this out, and I may try to iterate on output a little bit more and clean up some of the more recent code idioms supporting push.

But it works and is really awesome, and I think you're going to really like it.

This will be done this weekend, look for a post here and a blog post and docs to come!


#15

Made some AWESOME progress today on the output for push mode.

I have some ideas for also improving local mode output (colorization options, count of changes) but since I've been focusing on push mode I want to hold off until I get the docs done and get this in everyone's hands before doing that. I opened tickets to improve local mode output to cover those ideas.

I will probably release with one slight bug - the file server moves files even if they don't need to be changed. I have a lot of ideas in mind for this, some more involved than others, but for now I'd just look at it as the output reporting a few more things changed than you might expect. I know people like to see "changed=0", so of course fixing this after Saturday is going to be one of the top things on my opsmop list - output and knowing that a host is compliant is important.

I also need to improve check mode output in push mode (dry run) and have mostly been concentrating on "apply" mode. Nonetheless you'll have access to everything and these are easy changes.

The list of possible improvements and cool ideas is starting to build up in my head quickly, in many cases implementing ideas that all systems should have, but usually putting a nice spin on it and making it simpler.

Today I also implemented support for a feature that will enable rolling updates with load balancers (some assembly required) and yesterday I also implemented support for control over the "serial" batch size - how many nodes to update and/or take out of a load balancer at one time.

Good stuff!