# nebula
r
Hello nebula users, I'm trying to update k8s nebula from 3.4.0 -> 3.6.0; nebula-operator is updating fine. Storaged and metad are also updated. The problem is with graphd, which is stuck in CrashLoopBackOff, and unfortunately `kubectl logs` does not return any relevant info. Have you encountered a similar problem? Thank you
j
Have you updated the CRDs as well?
Also, if you can exec into it before it crashes, there is a logs folder that may help. But it may be difficult to do.
r
Thank you for the reply. Yes, the CRDs are updated to the latest available version too. The pod is only ready for half a second, so it's very challenging to exec into the pod and see the logs :)
w
The logging was not handled in a stdout/stderr way, so `kubectl logs` won't show it. Any chance you could inspect the crashing graphd pod to find its log PVC and create a temp pod attached to it?
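For anyone following along, this inspection could look roughly like the sketch below. The namespace, pod name, and PVC name are assumptions based on the typical nebula-operator naming visible in this thread; check the actual claim name returned by the first command before creating the temp pod.

```shell
# Find which PVC(s) the crashing graphd pod mounts
kubectl -n nebula get pod nebula-cluster-graphd-2 \
  -o jsonpath='{.spec.volumes[*].persistentVolumeClaim.claimName}'

# Attach a throwaway pod to the log PVC to read the files
# (claimName below is an assumed example; use the value printed above)
kubectl -n nebula apply -f - <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: log-inspector
spec:
  containers:
  - name: shell
    image: busybox
    command: ["sleep", "3600"]
    volumeMounts:
    - name: logs
      mountPath: /logs
  volumes:
  - name: logs
    persistentVolumeClaim:
      claimName: graphd-log-nebula-cluster-graphd-2
EOF

# Browse the graphd log files
kubectl -n nebula exec -it log-inspector -- sh -c 'ls /logs; tail -n 100 /logs/*'
```

Note that with a ReadWriteOnce PVC the temp pod may need to land on the same node as the crashing pod (or the crashing pod must be scaled down first).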
r
Thanks for the walkthrough, I will try it. I tried to exec into the metad pods (which are running) and I found this type of error:
75 ThriftClientManager-inl.h:70] Failed to resolve address for 'nebula-cluster-metad-2.nebula-cluster-metad-headless.nebula.svc.cluster.local': Name or service not known (error=-2): Unknown error -2
, but when I ping the domain name from the pod, it works. And that's even though I have
kubernetesClusterDomain: "cluster.local"
set in nebula-operator.
w
Are all metad pods suffering from this? And how does metad-2 itself look, going from its log?
r
Yes, all metad pods have the same error, but with a different pod address. Log from metad-2:
75 ThriftClientManager-inl.h:70] Failed to resolve address for 'nebula-cluster-metad-0.nebula-cluster-metad-headless.nebula.svc.cluster.local': Name or service not known (error=-2): Unknown error -2
I have enabled logging from graphd, and I received:
GraphDaemon.cpp:110] host not found:nebula-cluster-graphd-2.nebula-cluster-graphd-headless.nebula.svc.cluster.local
w
Could you please check whether those headless services have publishNotReadyAddresses set to true or false? Strange; they should be set to true by the operator, but from this situation it looks like false.
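A quick way to check this across the headless services could be (a sketch; the namespace and service names are assumed from the pod FQDNs earlier in the thread):

```shell
# Print spec.publishNotReadyAddresses for each nebula headless service
for svc in nebula-cluster-metad-headless \
           nebula-cluster-graphd-headless \
           nebula-cluster-storaged-headless; do
  printf '%s: ' "$svc"
  kubectl -n nebula get svc "$svc" \
    -o jsonpath='{.spec.publishNotReadyAddresses}'
  echo
done
```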
r
All headless svc have publishNotReadyAddresses: true
👀 1
w
Are the metad pods crashing? If yes, what do their logs show, other than the service FQDNs being unreachable?
r
The metad pods are in Running state (the logs contain "failed to resolve address..." entries, but that appeared once after the nebula redeploy, and the error is not shown anymore).
We solved it by deleting the nebula-graphd StatefulSet; after it was recreated, the graphd pods started without errors.
❤️ 1
j
Ah yeah, I completely forgot about this. I encountered this same issue before. It's because during the upgrade from 3.4 -> 3.6 the FQDN was changed, but it was not updated in the StatefulSet (because it's immutable, IIRC). So the solution is to delete the StatefulSet; then it will be recreated properly.
❤️ 1
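For future readers, the fix described above would look roughly like this (the namespace, StatefulSet name, and label selector are assumptions based on the names in this thread; verify them in your cluster first):

```shell
# Delete the stale graphd StatefulSet; nebula-operator recreates it
# with the corrected FQDNs from the current CRD spec
kubectl -n nebula delete statefulset nebula-cluster-graphd

# Watch the operator bring the graphd pods back up
kubectl -n nebula get pods -l app.kubernetes.io/component=graphd -w
```

Since the operator owns the StatefulSet, deleting it is safe in the sense that it will be reconciled back; graphd is stateless, so no data is lost, but there is a brief graphd outage while the pods are recreated.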
w
Oh, sorry about this; it should be highlighted in the docs or fixed in the operator. Thanks @Róbert Kuzma @Jeremy Simpson
❤️ 1