Setup questions?


#1

The first thing probably everyone wants to do is get Vespene running.

The docs contain a lot about this - but I'm curious about your experiences. Let us know how it goes. If you get stuck, hopefully I can help you through any problems, and we can also improve those scripts and instructions!

Thanks and glad to have you here!


#2

Well, after fixing all the minor things I got through the entire setup, but there are no logs and it's not running on port 8000. Any suggestions on where to look, or how I can bring it up manually to see if there are any errors?

So far it's really nice, though, with how you have abstracted away all the Django stuff needed to get it going.


#3

Hmm...

A few thoughts and things to check:

Check to see if vespene.service is running (systemd status vespene.service).

Check the logs in /var/log/vespene

See if there is a gunicorn process running.

If any of those look bad, you can manually run the service with supervisord -n -c /etc/vespene/supervisord.conf and observe the runtime output.

If that seems OK, there is also the systemd log output from the service, though supervisor sends most of its output to /var/log/vespene.
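
The checks above could be collected into one small shell function. This is only a sketch: the name vespene_checks is made up for illustration and is not part of Vespene, the unit name and paths are the ones mentioned in this thread, and pgrep is assumed to be installed. Note that the status subcommand belongs to systemctl.

```shell
# Hypothetical helper; the function name is an illustration, not part of Vespene.
vespene_checks() {
    # 1. Is the systemd unit running? ("status" is a systemctl subcommand)
    if command -v systemctl >/dev/null 2>&1; then
        systemctl --no-pager status vespene.service || echo "vespene.service is not active"
    else
        echo "systemctl not found; skipping unit check"
    fi

    # 2. Are there any logs yet?
    ls -l /var/log/vespene 2>/dev/null || echo "no logs in /var/log/vespene"

    # 3. Is a gunicorn process running?
    pgrep -af gunicorn || echo "no gunicorn process found"

    # 4. Hints for the manual steps, which are better run by hand.
    echo "to debug in the foreground: supervisord -n -c /etc/vespene/supervisord.conf"
    echo "systemd's own log: journalctl -u vespene.service"
    return 0
}
```

Calling vespene_checks as root prints the result of each check in order, so the first failing one is easy to spot.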

Let me know what you can figure out and if I need to change anything I'll get on that ASAP.

Also, can you share what OS you are running on?


#4

Distributor ID: Ubuntu
Description: Ubuntu 18.04.1 LTS
Release: 18.04
Codename: bionic

I'll check through a few things and let you know my findings


#5

So can't get the systemd status command to show me anything:

systemd status vespene.service gives Excess arguments.

No logs in /var/log/vespene

Don't see anything running with ps waux | grep wsgi

The /etc/vespene folder doesn't have a supervisor file which might be the issue here:

drwxr-xr-x   3 root root 4.0K Oct 29 08:58 .
drwxr-xr-x 164 root root  12K Oct 29 08:58 ..
drwxr-xr-x   2 root root 4.0K Oct 29 09:38 settings.d

So when I run that part of the setup this is what I see:

generating supervisor config...
usage: manage.py generate_supervisor [-h] [--path PATH] [--workers WORKERS]
                                     [--executable EXECUTABLE]
                                     [--source SOURCE] [--version]
                                     [-v {0,1,2,3}] [--settings SETTINGS]
                                     [--pythonpath PYTHONPATH] [--traceback]
                                     [--no-color]
manage.py generate_supervisor: error: unrecognized arguments: --controller
creating init script...
starting the service...
Vespene is now running on port 8000 and also running workers: tutorial-pool=1

Let me know what else you want me to check out.


#6

Ah dang it, yeah the --controller argument should definitely not be passed there. Can you try without it?

What happened was I added the fileserving code and eventually decided to run the web application on all worker nodes, and in doing so I forgot to go back and update the setup script to remove --controller.

If you remove that it should be solid, and I've already committed the change and am going to re-run those tests myself again :)


#7

So the service installed and the logs are showing up, but I can't get the page to load. Let me check the logs.


#8

Here is what we are working with:

$ tail /var/log/vespene/web.log 
supervisor: couldn't chdir to None: ENOENT
supervisor: child process was not spawned
supervisor: couldn't chdir to None: ENOENT
supervisor: child process was not spawned
supervisor: couldn't chdir to None: ENOENT
supervisor: child process was not spawned
supervisor: couldn't chdir to None: ENOENT
supervisor: child process was not spawned

$ tail /var/log/vespene/supervisord.log 
2018-10-29 10:17:40,241 INFO spawned: 'worker_tutorial-pool0' with pid 4894
2018-10-29 10:17:40,242 INFO spawned: 'server' with pid 4895
2018-10-29 10:17:40,243 INFO exited: worker_tutorial-pool0 (exit status 127; not expected)
2018-10-29 10:17:40,244 INFO exited: server (exit status 127; not expected)
2018-10-29 10:17:43,248 INFO spawned: 'worker_tutorial-pool0' with pid 4898
2018-10-29 10:17:43,249 INFO spawned: 'server' with pid 4899
2018-10-29 10:17:43,250 INFO exited: worker_tutorial-pool0 (exit status 127; not expected)
2018-10-29 10:17:43,251 INFO gave up: worker_tutorial-pool0 entered FATAL state, too many start retries too quickly
2018-10-29 10:17:43,251 INFO exited: server (exit status 127; not expected)
2018-10-29 10:17:44,253 INFO gave up: server entered FATAL state, too many start retries too quickly
$ systemctl status vespene
● vespene.service - Vespene Services
   Loaded: loaded (/etc/systemd/system/vespene.service; enabled; vendor preset: enabled)
   Active: active (running) since Mon 2018-10-29 10:17:36 EDT; 7min ago
     Docs: http://vespene.io
 Main PID: 4840 (supervisord)
    Tasks: 1 (limit: 4915)
   CGroup: /system.slice/vespene.service
           └─4840 /usr/bin/python /usr/bin/supervisord -n -c /etc/vespene/supervisord.conf

Oct 29 10:17:40 btotharye-workstation supervisord[4840]: 2018-10-29 10:17:40,241 INFO spawned: 'worker_tutorial-pool0' with pid 4894
Oct 29 10:17:40 btotharye-workstation supervisord[4840]: 2018-10-29 10:17:40,242 INFO spawned: 'server' with pid 4895
Oct 29 10:17:40 btotharye-workstation supervisord[4840]: 2018-10-29 10:17:40,243 INFO exited: worker_tutorial-pool0 (exit status 127; not expected)
Oct 29 10:17:40 btotharye-workstation supervisord[4840]: 2018-10-29 10:17:40,244 INFO exited: server (exit status 127; not expected)
Oct 29 10:17:43 btotharye-workstation supervisord[4840]: 2018-10-29 10:17:43,248 INFO spawned: 'worker_tutorial-pool0' with pid 4898
Oct 29 10:17:43 btotharye-workstation supervisord[4840]: 2018-10-29 10:17:43,249 INFO spawned: 'server' with pid 4899
Oct 29 10:17:43 btotharye-workstation supervisord[4840]: 2018-10-29 10:17:43,250 INFO exited: worker_tutorial-pool0 (exit status 127; not expected)
Oct 29 10:17:43 btotharye-workstation supervisord[4840]: 2018-10-29 10:17:43,251 INFO gave up: worker_tutorial-pool0 entered FATAL state, too many start retries too quickly
Oct 29 10:17:43 btotharye-workstation supervisord[4840]: 2018-10-29 10:17:43,251 INFO exited: server (exit status 127; not expected)
Oct 29 10:17:44 btotharye-workstation supervisord[4840]: 2018-10-29 10:17:44,253 INFO gave up: server entered FATAL state, too many start retries too quickly

#9

The chdir thing looks like it is the fatal part.

Can you paste your /etc/vespene/supervisord.conf ?

If that is NOT it, there may be some additional traceback info in /var/log/vespene
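
As a rough aid for reading the supervisord log, the programs that supervisor gave up on can be pulled out with grep and sed. This is a sketch only: fatal_programs is a hypothetical helper name, and it assumes the "gave up: NAME entered FATAL state" wording seen in the log excerpt above.

```shell
# Hypothetical helper: list the programs supervisord gave up on.
fatal_programs() {
    # $1 is a supervisord log file, e.g. /var/log/vespene/supervisord.log
    grep 'entered FATAL state' "$1" \
        | sed 's/.*gave up: \([^ ]*\) entered FATAL state.*/\1/' \
        | sort -u
}
```

For example, fatal_programs /var/log/vespene/supervisord.log prints one program name per line, which makes it easier to see whether both the server and the workers are failing.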


#10

Ah, there are literal "directory=None" lines in /etc/vespene/supervisord.conf. That's bad. Looking into it :)

Should say

"directory=/opt/vespene"


#11

Ok I've fixed this upstream by adding a "--source /opt/vespene" to the 6_services.sh shell script and can confirm the install process on Bionic Beaver works for me.

This would have also affected CentOS 7 / other users

Thanks very much for pointing this out!
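
For anyone wanting to confirm that a generated config is free of this problem, a check along these lines might help. It is a sketch under assumptions: bad_directory_lines is a hypothetical name, and the pattern only looks for a directory setting whose value is the literal string None.

```shell
# Hypothetical helper: print (with line numbers) any "directory=None" settings.
bad_directory_lines() {
    # $1 is the supervisor config, e.g. /etc/vespene/supervisord.conf
    # Exit status is 0 when a bad line is found and 1 when the file is clean.
    grep -nE '^[[:space:]]*directory[[:space:]]*=[[:space:]]*None[[:space:]]*$' "$1"
}
```

Running bad_directory_lines /etc/vespene/supervisord.conf should print nothing once the config has been regenerated with --source /opt/vespene.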


#12

Tried going through it again after updating from the remote, but I'm still seeing the same issues. Is there anything special I should do to wipe all this out and start over?

● vespene.service - Vespene Services
   Loaded: loaded (/etc/systemd/system/vespene.service; enabled; vendor preset: enabled)
   Active: active (running) since Mon 2018-10-29 10:17:36 EDT; 1h 39min ago
     Docs: http://vespene.io
 Main PID: 4840 (supervisord)
    Tasks: 1 (limit: 4915)
   CGroup: /system.slice/vespene.service
           └─4840 /usr/bin/python /usr/bin/supervisord -n -c /etc/vespene/supervisord.conf

Oct 29 10:17:40 btotharye-workstation supervisord[4840]: 2018-10-29 10:17:40,241 INFO spawned: 'worker_tutorial-pool0' with pid 4894
Oct 29 10:17:40 btotharye-workstation supervisord[4840]: 2018-10-29 10:17:40,242 INFO spawned: 'server' with pid 4895
Oct 29 10:17:40 btotharye-workstation supervisord[4840]: 2018-10-29 10:17:40,243 INFO exited: worker_tutorial-pool0 (exit status 127; not expected)
Oct 29 10:17:40 btotharye-workstation supervisord[4840]: 2018-10-29 10:17:40,244 INFO exited: server (exit status 127; not expected)
Oct 29 10:17:43 btotharye-workstation supervisord[4840]: 2018-10-29 10:17:43,248 INFO spawned: 'worker_tutorial-pool0' with pid 4898
Oct 29 10:17:43 btotharye-workstation supervisord[4840]: 2018-10-29 10:17:43,249 INFO spawned: 'server' with pid 4899
Oct 29 10:17:43 btotharye-workstation supervisord[4840]: 2018-10-29 10:17:43,250 INFO exited: worker_tutorial-pool0 (exit status 127; not expected)
Oct 29 10:17:43 btotharye-workstation supervisord[4840]: 2018-10-29 10:17:43,251 INFO gave up: worker_tutorial-pool0 entered FATAL state, too many start retries too quickly
Oct 29 10:17:43 btotharye-workstation supervisord[4840]: 2018-10-29 10:17:43,251 INFO exited: server (exit status 127; not expected)
Oct 29 10:17:44 btotharye-workstation supervisord[4840]: 2018-10-29 10:17:44,253 INFO gave up: server entered FATAL state, too many start retries too quickly

#13

Ok so the weird thing is the setup process does a checkout into /opt/vespene, so if you updated where the setup scripts live in /home, you ALSO have to run a git update on /opt/vespene, or you are still running the copy that omits --source /opt/vespene in the 6_services.sh script.
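
One way to tell whether a given checkout already contains the fix is to search its 6_services.sh for the new flag. This is a sketch: checkout_has_source_flag is a hypothetical name, and since the script's location inside the repository is not stated in this thread, the helper searches the whole tree for it.

```shell
# Hypothetical helper: succeeds if 6_services.sh in the checkout passes --source /opt/vespene.
checkout_has_source_flag() {
    # $1 is the checkout to inspect, e.g. /opt/vespene
    find "$1" -name 6_services.sh -exec grep -l -- '--source /opt/vespene' {} + 2>/dev/null \
        | grep -q .
}
```

If checkout_has_source_flag /opt/vespene fails, updating that copy (for example with git -C /opt/vespene pull) and re-running the services step should pick up the fix.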

But if you want to start over (you shouldn't need to), I'd just use a clean VM.


#14

ha well I was trying this on my local machine using a venv doh! guess I'll have to clear out this service and stuff.


#15

Ah if you haven't already seen http://docs.vespene.io/development_setup.html - that's my notes on what I do in my virtualenv

For development testing, I usually don't run supervisor - I run

"ssh-agent python manage.py worker "

and

"python manage.py runserver"

Runserver isn't quite the gunicorn environment but it's basically the same, and if you want to run multiple workers you can use multiple console tabs.


#16

Would a docker compose config simplify the initial "just get it running" phase? I am massively excited by the prospect of this project but inexperienced with Python in comparison to other languages, so the comfort of a one-line command for a guaranteed working setup certainly appeals to me.


#17

Yeah we have another thread in the Ideas section right now talking about making some images for those that want to run Docker.

Right now you have the 6 bash scripts and if you run those on either CentOS 7 or Ubuntu Bionic Beaver I'm pretty sure you WILL get that running system.

I'm hoping when we do have docker images (which I expect would probably be soon - they can call the existing setup scripts in their build steps) we have a really good guide that also walks people through what the full lifecycle of using them would entail. (External DB of course...) I just want to make sure it is really easy for people to bake their choice of settings into those docker builds, rather than just have a one-size-fits-all demo deployment.


#18

Fair point thanks, I will follow that thread!


#19

Hi, I'm more or less clueless on Python and Django, and had some trouble getting the setup scripts to work on a VM. It turned out that gunicorn was binding to the loopback interface, which I fixed by updating my supervisord.conf to run gunicorn --bind 0.0.0.0:8000 vespene.wsgi

I just thought I'd add it here in case anyone else ran into the same problem. Maybe this can be added to the settings somewhere?
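
To see which address a service is actually listening on, the output of ss -ltn can be filtered by port. The helper below is a sketch: listen_addresses is a hypothetical name, and it assumes the usual ss column layout in which the fourth field is the local address and port.

```shell
# Hypothetical helper: read `ss -ltn` output on stdin and print the local
# addresses that are listening on the given port.
listen_addresses() {
    port="$1"
    awk -v suffix=":$port" '
        # Keep LISTEN rows whose local address (field 4) ends in ":<port>".
        $1 == "LISTEN" && substr($4, length($4) - length(suffix) + 1) == suffix { print $4 }
    '
}
```

Used as ss -ltn | listen_addresses 8000, a result of 127.0.0.1:8000 means the server is reachable only from the machine itself, while 0.0.0.0:8000 means it accepts connections on all interfaces.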


#20

+1

I wrote this script originally assuming that folks would eventually proxy it with Apache or NGINX, but not everybody knows how to do that immediately and it might slow down people's attempts to try it out. I think this is a good start and what we want to do soon is make that the default, but make it a configurable option in 0_common.sh for people who want to turn that off.

So I'm thinking I'll maybe add a setting in 0_common.sh like $GUNICORN_BIND_OPTS that defaults to what you have.
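
A setting like that could look roughly as follows. This is a sketch, not the actual contents of 0_common.sh: only the variable name GUNICORN_BIND_OPTS and the default bind address come from this thread, while gunicorn_command and the way the variable is consumed are assumptions for illustration.

```shell
# Hypothetical sketch of how a GUNICORN_BIND_OPTS setting might be consumed.
gunicorn_command() {
    # Use the caller's value if set; otherwise default to all interfaces.
    # (The expansion below is ${VAR:-default} with a default of "--bind 0.0.0.0:8000".)
    bind_opts="${GUNICORN_BIND_OPTS:---bind 0.0.0.0:8000}"
    echo "gunicorn $bind_opts vespene.wsgi"
}
```

Someone proxying with Apache or NGINX could then set GUNICORN_BIND_OPTS="--bind 127.0.0.1:8000" before running the setup scripts to keep gunicorn on the loopback interface.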

Thanks very much for the suggestion - that's great and I've pushed it now.

Let me know how you get along with everything else as I totally want to make this as clear as we can get!