CentOS 7 - Domain Server nodes terminate after a short period of operation (Unresolved)


#1

Hi folks,

We’re kind of at a junction at the moment with an issue we’re up against. This issue is new, and the only change to our dedicated server is the updated repo/build of the stack.

Issue: We launch the domain, all is well, we can connect, walk around for a few minutes and then it disconnects.

The control panel HTTP interface still works; however, all the nodes terminate and we have to soft-restart the domain server for it to start working again.

Despite our best efforts (we’ve terminated/relaunched the stack and done a hard restart on the server itself), we can’t get to the bottom of this.

[22:47:12] No match for assignment deployed with “9ac7e0bc-b317-4df0-aaf6-9c6707d70806”
[22:47:13] Refusing connection from node at 127.0.0.1:48465
[22:47:27] Unable to fulfill assignment request of type 7 from 127.0.0.1:52179 (ports change)
[22:47:49] Killed “Asset Server” (A) {ea1da53f-4aa7-4613-af73-0797234f5467} 94.101.38.62:44059 / 94.101.38.62:44059
[22:47:49] Reset UUID for assignment - UUID: {9cf3b02a-d76c-40c0-99a6-93dca3939832}, Type: 3 - and added to queue. Old UUID was “6c3e0862-2449-49ed-823e-a28f1f778031”
[22:47:55] Killed “Audio Mixer” (M) {7a8c135f-5c21-45c5-8312-a9f718d50ffe} 94.101.38.62:59264 / 94.101.38.62:59264
[10/20 23:40:29] [DEBUG] Reset UUID for assignment - UUID: {6a0c7542-de30-4c5c-ba0d-8545c55aaebd}, Type: 0 - and added to queue. Old UUID was “f3735120-79c3-412e-bdb0-b266b08d18e0”
[22:48:03] Killed “Agent” (I) {521da1c6-ebbf-4d9e-8829-fe0012914fe8} 212.159.21.42:51125 / 192.168.1.149:51125
[10/20 23:40:29] [DEBUG] Killed “Entity Server” (o) {4b379d94-4613-44d5-a450-4e8d2494fc32} 94.101.38.62:33641 / 94.101.38.62:33641
[10/20 23:40:29] [DEBUG] Reset UUID for assignment - UUID: {9a481f66-5ce4-4108-b6e5-a53d0a261ea9}, Type: 6 - and added to queue. Old UUID was “43355f7b-e0ba-4b54-9e05-1057ac6b2446”
[10/20 23:40:29] [DEBUG] Killed “Avatar Mixer” (W) {5d4ee398-e0c0-4981-af59-2d034b7d5562} 94.101.38.62:60826 / 94.101.38.62:60826
[10/20 23:40:29] [DEBUG] Reset UUID for assignment - UUID: {195beea5-74bb-4229-8fea-1a367b590b66}, Type: 1 - and added to queue. Old UUID was “650a7b3d-ed0d-4325-b61c-0f9426bf9b13”
[10/20 23:40:29] [DEBUG] Killed “Agent” (I) {75b89b29-58f9-4392-a205-153c074554a1} 80.6.189.35:65059 / 192.168.0.11:65059

Is there something we need to be aware of with the recent updates? Nothing else is currently running on the server, so there shouldn’t be any port conflicts.

(I should mention, the server was working absolutely fine previously when we were stress testing)

Kind regards,

Micah


#2

Windows, Linux or Mac?


#3

Environment is Linux (CentOS 7)

Full spec of our test server is here: https://www.indigofuzz.co.uk/hifi/server.php


#4

You might try running the assignment clients one at a time, specifying what they are for. Several of us (using linux) had crashes running with the -n argument.

Here are my assignment client commands:

nohup ./assignment-client -t 0 &> ac0.log&
nohup ./assignment-client -t 1 &> ac1.log&
nohup ./assignment-client -t 6 &> ac6.log&
nohup ./assignment-client -n 3 &> ac_misc.log&

The last one is just a few extras.
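
For anyone who wants to script the per-type launch above, here is a minimal sketch, not a tested recipe. It only relies on the -t flag shown in the commands above; the function names and the DRY_RUN switch are made up for illustration, and it assumes ./assignment-client sits in the current directory. It redirects with "> file 2>&1" rather than "&>" so it also runs under plain sh.

```shell
#!/bin/sh
# Sketch: start one assignment-client per assignment type, each with its own log.
# Assumptions (not from the thread): the helper names and the DRY_RUN switch.

launch_ac() {
    local type="$1"
    local log="ac${type}.log"
    if [ "${DRY_RUN:-0}" = "1" ]; then
        # Only print what would be run, so the command can be checked first.
        echo "nohup ./assignment-client -t ${type} > ${log} 2>&1 &"
    else
        nohup ./assignment-client -t "${type}" > "${log}" 2>&1 &
    fi
}

launch_all() {
    local type
    for type in "$@"; do
        launch_ac "${type}"
    done
}

# Example: the same three types as the commands above.
# launch_all 0 1 6
```

Setting DRY_RUN=1 before calling launch_all prints the commands instead of running them, which makes it easy to compare against what you would have typed by hand.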


#5

Thank you,

We’ll give it a go to see what we can find. (@Ronnie could you have a look?)

Regards,

Micah.


#6

This yielded the same result. From what I can tell, the web UI came up as a clean slate with this command, but even without connecting to it, the server seemed to die within a matter of seconds …

For sanity, I put the terminal output on Pastebin here:
http://pastebin.com/ZgR5NVSE

EDIT: Late at night I realise running as root is not the best idea. For now disregard this; it hasn’t crashed as of writing, and I’ll keep you updated if that changes.

EDIT2: Never mind. I spoke too soon.


#7

Nope :( Still seems to be crashing after a short period of time.


#8

@Ronnie @Micah, you have probably installed Qt 5.5 (which is not supported by HiFi yet, IIRC) or used the new GCC to compile it. I had the same issue as you. After trying to debug this for a long time, I decided to move to Ubuntu, since High Fidelity uses it for their own servers. So far it all works fine!


#9

Hi @thoys,

Thank you for the suggestion, we’ll have to re-evaluate our build environment to see if that is the case…
(By we I mean @Ronnie… I better put the kettle on)

We’ll let you know what we discover.

Kind regards,

Micah


#10

Is compiling the stack for Ubuntu/Debian finally possible without mass hair loss? Compiling for CentOS was the de facto route and nobody seemed to know how to compile for Ubuntu (though I did succeed once on a Raspberry Pi, of all things!).

Switching the dev server to Ubuntu would make life easier (for me anyway), but it’s a lot to consider and set back up. There is a lot of other stuff running on there, so we can’t simply swap the OS…


#11

Also, AFAIK, it’s Qt 5.4, not Qt 5.5:

[root@d10252 /]# qmake-qt5 -v
QMake version 3.0
Using Qt version 5.4.2 in /usr/lib64

#12

For compiling in an Ubuntu environment, perhaps the following would be useful:

https://alphas.highfidelity.io/t/basic-outline-only-how-to-compile-hifi-in-ubuntu-14-04-again/9010


#13

Ah OK, that’s good. @Ronnie, if you have trouble porting everything on the server and your system’s memory/disk space allows it, you could consider running a virtual machine. I use VirtualBox myself on Fedora and run the HiFi servers in an Ubuntu install on there.


#14

I’ve had the same problem since a few months back on my CentOS node.
Had just been waiting to see if a future code update would fix it as it worked fine before.


#15

I migrated away from CentOS myself, since HiFi themselves are dead set on using Ubuntu and that is the best route overall.

I did write a script for Ubuntu to auto compile, and last I checked it worked, but it’s been a little while. You can check it out if you need an auto compiler for Ubuntu 15.04 - https://alphas.highfidelity.io/t/ubuntu-installer-script-beta-info-about-centos-errors-centos-fixed/6887

But yes I did compile the info for those who still wanted to use Ubuntu 14.04. I use that for my Jenkins server which runs Ubuntu 14.04 but the actual binaries are run on 15.04.

Edit to add: My slave servers all run a cron script that checks whether the processes are running, and if they are not, it kills the entire stack and restarts it. It’s part of the “auto update” cron. I don’t see why you wouldn’t do that to make sure you are always running.
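
A rough sketch of what such a cron check could look like, for illustration only (this is not the actual script described above). The process names domain-server and assignment-client, the function names, and the restart script path in the comments are all assumptions; the restart step is passed in as a command so you can plug in whatever you use to stop and start your stack.

```shell
#!/bin/sh
# Sketch of a cron watchdog: if any expected process is missing, run a restart command.
# Assumptions: process names and the restart command are placeholders, not from the thread.

is_running() {
    # pgrep -f matches against the full command line. Plain "pgrep -x name" compares
    # against the process name, which Linux truncates to 15 characters, so it would
    # miss a 17-character name such as assignment-client.
    pgrep -f "$1" > /dev/null 2>&1
}

stack_healthy() {
    # Succeeds only if every pattern given as an argument has a running process.
    local name
    for name in "$@"; do
        is_running "$name" || return 1
    done
    return 0
}

watchdog() {
    # $1: command that restarts the whole stack; remaining args: patterns to check.
    local restart_cmd="$1"
    shift
    if stack_healthy "$@"; then
        echo "ok"
    else
        echo "restarting"
        "$restart_cmd"
    fi
}

# Example call (hypothetical restart script path):
# watchdog /opt/hifi/restart-stack.sh domain-server assignment-client
#
# Example crontab entry, checking every five minutes (hypothetical path):
# */5 * * * * /opt/hifi/watchdog.sh >> /var/log/hifi-watchdog.log 2>&1
```

Note this only papers over the crashes; it keeps the domain reachable but does nothing about whatever is killing the nodes in the first place.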


#16

Just tried upgrading GCC on that box from 4.8 to 4.9 after seeing that version mentioned in the Ubuntu compile guide.
Got my domain running again.

Guess that’s good enough till I find time to rebuild the VM using Ubuntu instead.
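
If it helps anyone check their own box before rebuilding, here is a small sketch that compares the installed GCC version against 4.9. The 4.9 threshold simply reflects what worked in this case and is not an official requirement; the function names are made up, and sort -V assumes GNU coreutils.

```shell
#!/bin/sh
# Sketch: check whether the installed gcc is at least a given version.
# Assumptions: helper names are illustrative; 4.9 is just the version that worked here.

version_ge() {
    # True if version $1 >= version $2. sort -V orders dotted version strings,
    # so the smaller of the two ends up on the first line.
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

check_gcc() {
    local min="${1:-4.9}"
    local have
    have="$(gcc -dumpversion 2>/dev/null)" || { echo "gcc not found"; return 2; }
    if version_ge "$have" "$min"; then
        echo "gcc $have: ok"
    else
        echo "gcc $have: older than $min"
        return 1
    fi
}

# Example:
# check_gcc 4.9
```

Keep in mind this reports the compiler currently on the PATH, which may not be the one the existing binaries were built with.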


#17

Is that still on CentOS?


#18

I appreciate HiFi is still Alpha, but is there actually any explanation for this if it is an issue affecting CentOS systems all round?

Would be slightly irritating if it was, and isn’t being addressed in an official capacity.

Migrating to Ubuntu is not a realistic solution purely because our server isn’t being dedicated for HiFi use.


#19

Yeah, CentOS6.5 in my case.

I’m running a slightly older release of XenServer on the hypervisor, which doesn’t support the newer systemd-based distros. There’s enough production-level stuff running on it to make just updating Xen a non-trivial matter.


#20

Did your previous build (with GCC 4.8) have the same issue though? Successful build, executes fine, but any child processes for the stack would instantly cut out?