#nebula

Róbert Kuzma

11/20/2023, 8:40 PM
Hello Nebula users, I'm trying to upgrade Nebula on k8s from 3.4.0 -> 3.6.0. The nebula-operator updates fine, and storaged and metad are also updated. The problem is graphd, which is in CrashLoopBackOff state, and unfortunately kubectl logs does not return any relevant info. Have you encountered a similar problem? Thank you

Jeremy Simpson

11/20/2023, 9:15 PM
Have you updated the CRDs as well?
Also, if you can ssh into it before it crashes, there is a logs folder that may help, though that may be difficult to do.

Róbert Kuzma

11/21/2023, 6:46 AM
Thank you for the reply. Yes, the CRDs are updated to the latest available version too. The pod is only ready for half a second, so it's very challenging to exec into it and see the logs :)

wey

11/21/2023, 9:37 AM
Right now the logs are not written to stdout/stderr, so kubectl logs can't show them. Any chance you could inspect the crashing graphd pod to find its log PVC and create a temporary pod attached to it?
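The temporary-pod approach can be sketched in Python as a small generator for the pod manifest. This is a minimal sketch under stated assumptions, not something the operator provides: the function name, pod name, `busybox` image, `/logs` mount path, and the example PVC name are all hypothetical placeholders. The real claim name should be copied from the crashing graphd pod's `spec.volumes` (for example via `kubectl get pod <pod> -n nebula -o json`).

```python
import json
from typing import Optional


def debug_pod_manifest(pvc_name: str, namespace: str,
                       pod_name: str = "graphd-log-reader",
                       node_name: Optional[str] = None) -> dict:
    """Build a throwaway Pod spec that mounts an existing PVC read-only at /logs."""
    spec = {
        "restartPolicy": "Never",
        "containers": [{
            "name": "reader",
            "image": "busybox",
            # Keep the container alive long enough to exec in and read the files.
            "command": ["sleep", "3600"],
            "volumeMounts": [{"name": "logs", "mountPath": "/logs", "readOnly": True}],
        }],
        "volumes": [{
            "name": "logs",
            "persistentVolumeClaim": {"claimName": pvc_name, "readOnly": True},
        }],
    }
    if node_name:
        # A ReadWriteOnce volume attaches to a single node, so pin the reader
        # to the node where the crashing graphd pod already has it attached.
        spec["nodeName"] = node_name
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": pod_name, "namespace": namespace},
        "spec": spec,
    }


if __name__ == "__main__":
    # Hypothetical PVC name: copy the real claimName from the crashing pod's spec.volumes.
    print(json.dumps(debug_pod_manifest("graphd-log-nebula-cluster-graphd-0", "nebula"), indent=2))
```

The printed JSON can be saved to a file and applied with kubectl apply -f, after which kubectl exec into the reader pod gives a stable shell for browsing /logs while graphd keeps restarting.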

Róbert Kuzma

11/21/2023, 10:36 AM
Thanks for the instructions, I will try it. I tried to exec into the metad pods (which are running) and found this kind of error:
75 ThriftClientManager-inl.h:70] Failed to resolve address for 'nebula-cluster-metad-2.nebula-cluster-metad-headless.nebula.svc.cluster.local': Name or service not known (error=-2): Unknown error -2
But when I ping that domain name from the pod, it works. This happens even though I have set
kubernetesClusterDomain: "cluster.local"
in nebula-operator.
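The names in these logs follow the standard Kubernetes DNS pattern for a StatefulSet pod behind a headless Service: `<statefulset>-<ordinal>.<service>.<namespace>.svc.<cluster domain>`. The sketch below is an illustrative assumption, not Nebula code: it rebuilds that name and tries to resolve it with `socket.getaddrinfo`. The "Name or service not known" text and `-2` code in the metad log look like glibc's `EAI_NONAME` result from that kind of lookup. The function names `pod_fqdn` and `resolves` are hypothetical.

```python
import socket


def pod_fqdn(statefulset: str, ordinal: int, headless_svc: str,
             namespace: str, cluster_domain: str = "cluster.local") -> str:
    """DNS name Kubernetes assigns to a StatefulSet pod behind a headless Service."""
    return f"{statefulset}-{ordinal}.{headless_svc}.{namespace}.svc.{cluster_domain}"


def resolves(host: str) -> bool:
    """Return True if the system resolver can resolve host, else report the error."""
    try:
        socket.getaddrinfo(host, None)
        return True
    except socket.gaierror as exc:
        # exc.errno holds the EAI_* code; on glibc, EAI_NONAME is -2
        # ("Name or service not known"), the code seen in the metad log.
        print(f"{host}: errno={exc.errno} {exc.strerror}")
        return False
```

Run from a pod in the same cluster, and assuming three metad replicas, a loop such as `for i in range(3): resolves(pod_fqdn("nebula-cluster-metad", i, "nebula-cluster-metad-headless", "nebula"))` would show which peers fail to resolve at that moment. Because the failure here was transient, a successful lookup later (like the working ping) does not rule out a failure during startup.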

wey

11/21/2023, 11:08 AM
Are all metad pods suffering from this, or only some? What does metad-2's log show?

Róbert Kuzma

11/21/2023, 11:13 AM
Yes, all metad pods have the same error, but each with a different pod address. Log from metad-2:
75 ThriftClientManager-inl.h:70] Failed to resolve address for 'nebula-cluster-metad-0.nebula-cluster-metad-headless.nebula.svc.cluster.local': Name or service not known (error=-2): Unknown error -2
I have also enabled logging for graphd, and received:
GraphDaemon.cpp:110] host not found:nebula-cluster-graphd-2.nebula-cluster-graphd-headless.nebula.svc.cluster.local

wey

11/21/2023, 11:27 AM
Could you please check whether those headless services have PublishNotReadyAddresses set to true or false? It's strange: the operator should set it to true, but from this situation it looks like it's false.

Róbert Kuzma

11/21/2023, 11:30 AM
All headless services have PublishNotReadyAddresses set to true.
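The flag matters because a headless Service normally publishes a pod's DNS record only once the pod is Ready; with `publishNotReadyAddresses` set to true, the records exist while pods are still starting, which is presumably why the operator sets it for daemons that resolve each other during bootstrap. As a minimal sketch (the script and function names are hypothetical), the flag can be checked for every headless Service in the namespace from `kubectl get svc -n nebula -o json` output saved to a file:

```python
import json
import sys
from typing import Dict


def headless_publish_flags(svc_list: dict) -> Dict[str, bool]:
    """Map each headless Service name to its publishNotReadyAddresses setting.

    svc_list is the parsed output of `kubectl get svc -n <ns> -o json` (a v1 List).
    """
    flags = {}
    for svc in svc_list.get("items", []):
        spec = svc.get("spec", {})
        if spec.get("clusterIP") == "None":  # headless Service
            # The field is omitted from the JSON when it is false.
            flags[svc["metadata"]["name"]] = bool(spec.get("publishNotReadyAddresses", False))
    return flags


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as fh:
        for name, publish in sorted(headless_publish_flags(json.load(fh)).items()):
            print(f"{name}: publishNotReadyAddresses={publish}")
```

Usage would be along the lines of `kubectl get svc -n nebula -o json > svc.json` followed by `python check_headless.py svc.json`. In this thread every headless service reported true, which pointed the investigation away from DNS publishing and toward the graphd StatefulSet itself.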

wey

11/21/2023, 11:58 AM
Are the metad pods crashing? If so, what do their logs show besides the service FQDNs not being resolvable?

Róbert Kuzma

11/21/2023, 12:32 PM
The metad pods are running. Their logs showed "Failed to resolve address..." once after the Nebula redeploy, but the error no longer appears.
We solved it by deleting the nebula-graphd StatefulSet; once it was recreated, the graphd pods started without errors.

Jeremy Simpson

11/21/2023, 4:59 PM
Ah yeah, I completely forgot about this. I encountered this same issue before. It's because during the upgrade from 3.4 -> 3.6 the FQDN was changed, but it was not updated in the StatefulSet (because that field is immutable, iirc). So the solution is to delete the SS; it will then be recreated properly.
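Jeremy's explanation rests on the fact that Kubernetes rejects in-place updates to most of a StatefulSet's `spec`; only fields such as `replicas`, `template`, and `updateStrategy` can be changed. Assuming the renamed FQDN lives in one of the immutable fields (for example `serviceName`, which the thread does not confirm), a sketch like the following could flag that a delete-and-recreate is needed. `IMMUTABLE_STS_FIELDS` and `immutable_drift` are hypothetical names, and the field list is deliberately partial:

```python
from typing import List

# A partial list of StatefulSet spec fields that the API server refuses to
# change in place; altering any of them requires deleting the StatefulSet.
IMMUTABLE_STS_FIELDS = ("serviceName", "selector")


def immutable_drift(current: dict, desired: dict) -> List[str]:
    """Return the immutable spec fields whose values differ between two StatefulSets."""
    cur = current.get("spec", {})
    want = desired.get("spec", {})
    return [field for field in IMMUTABLE_STS_FIELDS if cur.get(field) != want.get(field)]
```

`podManagementPolicy` and `volumeClaimTemplates` are also immutable but are left out, because the live object carries server-defaulted values that would make a naive comparison against a hand-written manifest report false drift. A non-empty result corresponds to the situation in this thread, where deleting the StatefulSet let it be recreated with the new value.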

wey

11/22/2023, 3:01 AM
Oh, sorry about this. It should be highlighted in the docs or fixed in the operator. Thanks @Róbert Kuzma @Jeremy Simpson