General package module thoughts: apt-fast, multiple packages


#1

One of my pet peeves is slow downloads for packages. This project, apt-fast, seems like it could be a good fit for opsmop. With the use of apt-fast, package downloads on Ubuntu could be parallelized. The cost is that some new dependencies would be required. Alternatively, opsmop could roll its own equivalent parallel package downloader.

Edit: This post originally started with me talking about apt-fast, but it has morphed into a more general discussion about the package module.


#2

We could perhaps write the apt module such that it would use apt-fast if it were installed.

I'd need to try apt-fast out to see what I think of it, but it has a pretty good GitHub star count.

As for parallelization, there is something historical worth mentioning: the package modules in Ansible have accepted lists for a very long time. I initially implemented this hack for yum, where if you supply a list it puts all the names together on one command line.

I think before we add apt-fast support, we first need to make apt, yum, and dnf tolerate "self.name" being a list. Not fun work, to be sure, but ... it should be done before we get too many package providers, and I predicted it would have to be done anyway.
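To make the "name can be a list" idea concrete, here is a rough sketch of the one-command-line trick. The function name build_install_command and the yum base command are my assumptions for illustration, not existing opsmop code:

```python
def build_install_command(name, base=("yum", "install", "-y")):
    """Return one argv list covering every requested package.

    `name` may be a single string or a list of strings. The function
    name and the default base command are hypothetical.
    """
    # Normalize a lone string into a one-element list.
    names = [name] if isinstance(name, str) else list(name)
    # All package names are appended to a single command line.
    return list(base) + names
```

A provider could pass the result straight to subprocess.run without a shell, so package names never go through shell quoting.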

Ansible was a little weird in that (IIRC) the version was part of the name. It would be weird in opsmop to have a parallel array of versions, and I hate parallel arrays so much that it's not going to happen. I don't know if we need versions here at all, because if you are passing a list I'd assume you are fine with getting the latest version, which is always supposed to be compatible anyway.

Once we get that done, we could implement apt-fast by writing a method "get_apt_command" I'm sure, and it could call shutil.which and return the different command if apt-fast was installed.
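Something like the following is what I have in mind. This is only a sketch: the fallback to "apt-get" and the injectable `which` parameter (there to make it testable) are my assumptions.

```python
import shutil


def get_apt_command(which=shutil.which):
    """Return "apt-fast" when it is found on PATH, else "apt-get".

    `which` defaults to shutil.which and is only a parameter so the
    lookup can be faked in tests. The "apt-get" fallback is an assumption.
    """
    # shutil.which returns the executable's path, or None if not found.
    return "apt-fast" if which("apt-fast") else "apt-get"
```

The rest of the provider would build its command line from whatever this returns, so nothing else needs to know which tool is in use.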

How does that sound?

Also, apologies for saying "first we have to do this other complicated thing ..."


#3

Opened a ticket to track the list idea so we don't forget:


#4

Simply checking if apt-fast is installed sounds like the better way to do things to me!

On the topic of handling multiple packages, what are your thoughts about having a list of tuples?

For example,

[('gcc', '7.3.0'), ('ruby', '2.5.1p57'), ...]

One thing, to me, that's crazy is how FreeBSD ports allows for customized compile options. I'm not sure if supporting that would be outside the scope of opsmop, but it is food for thought that installing a package can be more than the name of the package and version. In FreeBSD ports it would potentially include compile options.


#5

That's a good idea...

Instead of a list of tuples, it may make more sense to provide a dictionary.

dict(gcc='7.3.0', ruby='2.5.1p57', ...)

I don't think we want to deal with any compile options for packages at the generic level, though if someone wants to do that for the ports module I am ok with that later.

I would be fine with requiring that to be one package at a time for ports if they want to have parameters that aren't easy to designate together. FreeBSD users have been GREAT contributors to my stuff in the past, but I wouldn't want that to complicate the data entry for 99% of the other users out there.

So it sounds like it needs to accept three things:

a string

a dict of names to versions, where a version of None means any version is fine

a list of strings

To do this effectively, I'd probably propose making a PackageList class that the provider can use to parse this, to avoid too much duplication between the package modules.
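Here is a minimal sketch of how such a PackageList might normalize the three input shapes. The attribute and method names are hypothetical; nothing like this exists in opsmop yet:

```python
class PackageList:
    """Hypothetical helper that normalizes a package spec.

    Accepts a string, a list of strings, or a dict of name -> version
    (a version of None means "any version").
    """

    def __init__(self, spec):
        if isinstance(spec, str):
            self.packages = {spec: None}
        elif isinstance(spec, dict):
            self.packages = dict(spec)
        elif isinstance(spec, (list, tuple)):
            self.packages = {name: None for name in spec}
        else:
            raise TypeError("expected str, list, or dict, got %s" % type(spec).__name__)

    def items(self):
        """Return (name, version) pairs in the order they were given."""
        return list(self.packages.items())
```

Each provider would then loop over items() and never have to care which form the user typed.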

The code paths will need to become more complicated, adapting into loops. I am ok with it still saying "needs_install" and then storing the list of packages that actually need to be installed on the provider object so they don't have to be calculated again.

We don't need to have stuff like "needs_install_%s" % pkg_name as this would mess up the output in the app when returning too many actions.
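Roughly, the planning side could look like this sketch. The class name, the plan method, and the injected get_installed_version callable are all made up for illustration; a real provider would query apt or yum:

```python
class PackageProviderSketch:
    """Hypothetical provider: one "needs_install" action for many packages."""

    def __init__(self, wanted, get_installed_version):
        self.wanted = wanted  # dict of name -> version (None = any version)
        self.get_installed_version = get_installed_version
        self.pending = []  # filled in by plan(), reused at apply time

    def _needs_install(self, name, version):
        installed = self.get_installed_version(name)
        if installed is None:
            return True  # not installed at all
        # Installed: only act if a specific, different version was requested.
        return version is not None and installed != version

    def plan(self):
        # Compute the pending list once and remember it on the provider.
        self.pending = [
            (name, version)
            for name, version in self.wanted.items()
            if self._needs_install(name, version)
        ]
        # A single action no matter how many packages are pending.
        return ["needs_install"] if self.pending else []
```

The apply step would then read self.pending instead of querying the package manager again.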

Not sure if this is too "into the weeds" or not, so let me know.

I'm more than happy to take a first crack at this for probably apt and/or yum and let someone else adapt dnf. I think my preferences may be a little specific. Once that is done, it will be easy to add the apt-fast change.

Plus, we need to get this done anyway before moving on to pip.

Shouldn't be too bad...

thoughts?


#6

Overall it sounds good! I really wish I could give more nuanced feedback. But my guess is that storing the package list as key-value pairs is the smarter way because it's more extensible. The value could be an object with various parameters needed by the package manager. Another thing I'm thinking about is when you need a dependency from a third-party source. In other words, you need a package / dependency that's not provided by the package manager. Would handling that be part of the package module?

Also, it would be good to see an example of an implemented wrapper for one of the package manager systems. Once I see that I'd be willing to take a stab at implementing one for npm, apt, or maybe snapd.


#7

"In other words, you need a package / dependency that's not provided by the package manager. Would handling that be part of the package module?"

I expect modules to arise for things like npm and gem, after I do pip.

For things that are just tarballs on the intertubes, that from_url feature on the File module you added is a great way to download them.

Ansible had an "unarchive" module, which probably makes good sense to have too. It could support your from_url code, to both fetch a source (possibly remote, anyway) and then crack that open in a given directory.

HOWEVER, I should put in a disclaimer - having a CI/CD system hammering someone's external server is generally bad form, especially if it is a small webserver hosting a tarball.

It is better to cache that stuff or get it from a package mirror when you can.

For corporate settings, this is why tools like Artifactory exist.

So yeah, open to all kinds of package modules.


#8

That makes sense! Do you know of any equivalents to Artifactory or Archiva built in Python? I think one issue with the existing projects is that their codebases seem pretty inaccessible, and you end up with the problem of having to pay maintainers for something that honestly shouldn't be so complex. I think a hackable and more modern package manager (with potential for private packages) would be very useful.


#9

Ah, but repo management and caching and dependency trees IS complex :) And everything works differently so there is a lot to maintain.

reposync
apt-mirror
local-mirror for npm (haven't tried this)
pulp from Red Hat - looks a little complicated, heard decent things though
squid caches (for some things)

Thankfully stuff I haven't had to worry about often.

Cobbler (install server I wrote ~12 years ago) did a lot with yum reposync for maintaining yum mirrors for install trees. That was pretty easy to set up.

I'm just not familiar with mirroring for much other than that and apt. I honestly think we just had a cache when I did apt, but it's been a while and I forget.

Generally not a problem I'd worry about for any side projects, home setups, or very small companies, but something you need to think about as you grow.


#10

I'll have to look into pulp a bit more. It's all done in Django, which surprises me a lot!