# nebula
h
Hi folks, I am trying to install nebula on an EC2 server following this doc, and after the installation I am facing this error:
0/2 nodes are available: 2 pod has unbound immediate PersistentVolumeClaims. preemption: 0/2 nodes are available: 2 Preemption is not helpful for scheduling.
Can someone please help me?
More inputs: when I look at the PVCs, I see they are in a Pending state.
NAME            STATUS  VOLUME  CAPACITY  ACCESS MODES  STORAGECLASS  AGE
metad-data-nebula-metad-0  Pending                   gp2      23h
metad-log-nebula-metad-0  Pending                   gp2      23h
The PVC events show the error below.
waiting for a volume to be created, either by external provisioner "ebs.csi.aws.com" or manually created by system administrator
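(For reference, the message above comes from the PVC events; a minimal way to pull them, assuming the claim names and namespace shown later in this thread:)

kubectl describe pvc metad-data-nebula-metad-0 -n nebula-operator-system
kubectl describe pvc metad-log-nebula-metad-0 -n nebula-operator-system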
w
Hi Himanshu, welcome to the community! Could you please ensure there is a storage class named gp2? If not, please update the corresponding storageClass in the CRD.
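(A rough sketch of what updating the storageClass in the CRD could look like, assuming the NebulaCluster object is named nebula in the nebula-operator-system namespace and that the CRD exposes dataVolumeClaim/logVolumeClaim fields as described in the nebula-operator docs; note that existing PVCs keep their class, so this is normally set before the cluster is created:)

# Hypothetical example: point the metad volume claims at an existing StorageClass.
kubectl -n nebula-operator-system patch nebulacluster nebula --type merge \
  -p '{"spec":{"metad":{"dataVolumeClaim":{"storageClassName":"gp2"},"logVolumeClaim":{"storageClassName":"gp2"}}}}'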
h
Hi wey, yes there is. Here is the output of
kubectl get sc
NAME            PROVISIONER             RECLAIMPOLICY   VOLUMEBINDINGMODE      ALLOWVOLUMEEXPANSION   AGE
ebs-sc          ebs.csi.aws.com         Delete          WaitForFirstConsumer   false                  25s
gp2 (default)   kubernetes.io/aws-ebs   Delete          Immediate              false                  24h
kubectl describe sc gp2 --namespace=nebula-operator-system
Name:         gp2
IsDefaultClass:    Yes
Annotations:      storageclass.kubernetes.io/is-default-class=true
Provisioner:      kubernetes.io/aws-ebs
Parameters:      fsType=ext4,type=gp2
AllowVolumeExpansion: <unset>
MountOptions:     <none>
ReclaimPolicy:     Delete
VolumeBindingMode:   Immediate
Events:        <none>
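(Worth noting in this output: gp2 uses the legacy in-tree kubernetes.io/aws-ebs provisioner with Immediate binding, while the PVC event above is waiting on the external provisioner ebs.csi.aws.com, likely because in-tree EBS volumes are migrated to the CSI driver on recent Kubernetes versions. A minimal sketch of a CSI-backed class, assuming the aws-ebs-csi-driver is installed and working; note that only one class should carry the default annotation:)

kubectl apply -f - <<'EOF'
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: gp2-csi            # hypothetical name
provisioner: ebs.csi.aws.com
parameters:
  type: gp2
  csi.storage.k8s.io/fstype: ext4
volumeBindingMode: WaitForFirstConsumer
EOF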
w
@kevin.qiao could you please help look into this?
k
@Himanshu Gupta Hi Himanshu, please run 'kubectl get pods -l app.kubernetes.io/cluster=nebula' and describe the pending pod; I need to confirm the reason for the error 'preemption: 0/2 nodes are available'.
❤️ 1
h
ubuntu@ip:~$ kubectl get pods -l app.kubernetes.io/cluster=nebula --namespace=nebula-operator-system
NAME       READY  STATUS  RESTARTS  AGE
nebula-metad-0  0/1   Pending  0     3d19h
ubuntu@ip:~$ kubectl describe pod nebula-metad-0 --namespace=nebula-operator-system
Name:      nebula-metad-0
Namespace:   nebula-operator-system
Priority:    0
Node:      <none>
Labels:     app.kubernetes.io/cluster=nebula
            app.kubernetes.io/component=metad
            app.kubernetes.io/managed-by=nebula-operator
            app.kubernetes.io/name=nebula-graph
            controller-revision-hash=nebula-metad-7fc467d58f
            statefulset.kubernetes.io/pod-name=nebula-metad-0
Annotations:  nebula-graph.io/cm-hash: d6d215628f9203cb
Status:     Pending
IP:       
IPs:      <none>
Controlled By: StatefulSet/nebula-metad
Containers:
 metad:
  Image:    vesoft/nebula-metad:v3.2.0
  Ports:    9559/TCP, 19559/TCP, 19560/TCP
  Host Ports: 0/TCP, 0/TCP, 0/TCP
  Command:
   /bin/bash
   -ecx
   exec /usr/local/nebula/bin/nebula-metad --flagfile=/usr/local/nebula/etc/nebula-metad.conf --meta_server_addrs=nebula-metad-0.nebula-metad-headless.nebula-operator-system.svc.cluster.local:9559 --local_ip=$(hostname).nebula-metad-headless.nebula-operator-system.svc.cluster.local --ws_ip=$(hostname).nebula-metad-headless.nebula-operator-system.svc.cluster.local --daemonize=false
  Limits:
   cpu:   1
   memory: 1Gi
  Requests:
   cpu:    500m
   memory:   500Mi
  Readiness:  http-get http://:19559/status delay=10s timeout=5s period=10s #success=1 #failure=3
  Environment: <none>
  Mounts:
   /usr/local/nebula/data from metad-data (rw,path="data")
   /usr/local/nebula/etc from nebula-metad (rw)
   /usr/local/nebula/logs from metad-log (rw,path="logs")
   /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-fvrp5 (ro)
Conditions:
 Type      Status
 PodScheduled  False 
Volumes:
 metad-log:
  Type:    PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
  ClaimName: metad-log-nebula-metad-0
  ReadOnly:  false
 metad-data:
  Type:    PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
  ClaimName: metad-data-nebula-metad-0
  ReadOnly:  false
 nebula-metad:
  Type:   ConfigMap (a volume populated by a ConfigMap)
  Name:   nebula-metad
  Optional: false
 kube-api-access-fvrp5:
  Type:          Projected (a volume that contains injected data from multiple sources)
  TokenExpirationSeconds: 3607
  ConfigMapName:      kube-root-ca.crt
  ConfigMapOptional:    <nil>
  DownwardAPI:       true
QoS Class:          Burstable
Node-Selectors:       <none>
Tolerations:         node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                     node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
 Type   Reason      Age           From        Message
 ----   ------      ----           ----        -------
 Warning FailedScheduling 106s (x1103 over 3d19h) default-scheduler 0/2 nodes are available: 2 pod has unbound immediate PersistentVolumeClaims. preemption: 0/2 nodes are available: 2 Preemption is not helpful for scheduling.
ubuntu@ip:~$ kubectl get pvc --namespace=nebula-operator-system
NAME            STATUS  VOLUME  CAPACITY  ACCESS MODES  STORAGECLASS  AGE
metad-data-nebula-metad-0  Pending                   gp2      3d19h
metad-log-nebula-metad-0  Pending                   gp2      3d19h
k
@Himanshu Gupta please show me 'kubectl describe nodes' outputs
h
Hi @kevin.qiao, please see below.
ubuntu@ip-10-0-0-0:~$ kubectl describe nodes --namespace=nebula-operator-system
Name:               ip-10-0-0-0
Roles:              control-plane
Labels:             beta.kubernetes.io/arch=amd64
                    beta.kubernetes.io/os=linux
                    kubernetes.io/arch=amd64
                    kubernetes.io/hostname=ip-10-0-0-0
                    kubernetes.io/os=linux
                    node-role.kubernetes.io/control-plane=
                    node.kubernetes.io/exclude-from-external-load-balancers=
Annotations:        kubeadm.alpha.kubernetes.io/cri-socket: unix:///run/cri-dockerd.sock
                    node.alpha.kubernetes.io/ttl: 0
                    projectcalico.org/IPv4Address: 10.0.0.0/24
                    projectcalico.org/IPv4IPIPTunnelAddr: 192.168.1.0
                    volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp:  Tue, 23 Aug 2022 17:57:44 +0000
Taints:             node-role.kubernetes.io/control-plane:NoSchedule
                    node-role.kubernetes.io/master:NoSchedule
Unschedulable:      false
Lease:
  HolderIdentity:  ip-10-0-0-0
  AcquireTime:     <unset>
  RenewTime:       Mon, 29 Aug 2022 10:47:09 +0000
Conditions:
  Type                 Status  LastHeartbeatTime                 LastTransitionTime                Reason                       Message
  ----                 ------  -----------------                 ------------------                ------                       -------
  NetworkUnavailable   False   Wed, 24 Aug 2022 14:10:42 +0000   Wed, 24 Aug 2022 14:10:42 +0000   CalicoIsUp                   Calico is running on this node
  MemoryPressure       False   Mon, 29 Aug 2022 10:46:15 +0000   Tue, 23 Aug 2022 17:57:44 +0000   KubeletHasSufficientMemory   kubelet has sufficient memory available
  DiskPressure         False   Mon, 29 Aug 2022 10:46:15 +0000   Tue, 23 Aug 2022 17:57:44 +0000   KubeletHasNoDiskPressure     kubelet has no disk pressure
  PIDPressure          False   Mon, 29 Aug 2022 10:46:15 +0000   Tue, 23 Aug 2022 17:57:44 +0000   KubeletHasSufficientPID      kubelet has sufficient PID available
  Ready                True    Mon, 29 Aug 2022 10:46:15 +0000   Wed, 24 Aug 2022 14:05:44 +0000   KubeletReady                 kubelet is posting ready status. AppArmor enabled
Addresses:
  InternalIP:  10.0.0.0
  Hostname:    ip-10-0-0-0
Capacity:
  cpu:                32
  ephemeral-storage:  203070420Ki
  hugepages-1Gi:      0
  hugepages-2Mi:      0
  memory:             261050556Ki
  pods:               110
Allocatable:
  cpu:                32
  ephemeral-storage:  187149698763
  hugepages-1Gi:      0
  hugepages-2Mi:      0
  memory:             260948156Ki
  pods:               110
System Info:
  Machine ID:                 ec25c504906bf381e4fe9928fb048e97
  System UUID:                ec25c504-906b-f381-e4fe-9928fb048e97
  Boot ID:                    c87dd66c-6e90-485d-a34a-0a5959eb33df
  Kernel Version:             5.13.0-1029-aws
  OS Image:                   Ubuntu 20.04.4 LTS
  Operating System:           linux
  Architecture:               amd64
  Container Runtime Version:  <docker://20.10.17>
  Kubelet Version:            v1.24.4
  Kube-Proxy Version:         v1.24.4
PodCIDR:                      192.168.0.0/24
PodCIDRs:                     192.168.0.0/24
Non-terminated Pods:          (8 in total)
  Namespace                   Name                                        CPU Requests  CPU Limits  Memory Requests  Memory Limits  Age
  ---------                   ----                                        ------------  ----------  ---------------  -------------  ---
  kruise-system               kruise-daemon-pcsf5                         0 (0%)        50m (0%)    0 (0%)           128Mi (0%)     5d16h
  kube-system                 calico-kube-controllers-5b97f5d8cf-hfg7f    0 (0%)        0 (0%)      0 (0%)           0 (0%)         4d20h
  kube-system                 calico-node-lzv9k                           250m (0%)     0 (0%)      0 (0%)           0 (0%)         4d20h
  kube-system                 etcd-ip-10-0-0-0                        100m (0%)     0 (0%)      100Mi (0%)       0 (0%)         5d16h
  kube-system                 kube-apiserver-ip-10-0-0-0              250m (0%)     0 (0%)      0 (0%)           0 (0%)         5d16h
  kube-system                 kube-controller-manager-ip-10-0-0-0     200m (0%)     0 (0%)      0 (0%)           0 (0%)         5d16h
  kube-system                 kube-proxy-6fr2q                            0 (0%)        0 (0%)      0 (0%)           0 (0%)         5d16h
  kube-system                 kube-scheduler-ip-10-0-0-0              100m (0%)     0 (0%)      0 (0%)           0 (0%)         5d16h
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource           Requests    Limits
  --------           --------    ------
  cpu                900m (2%)   50m (0%)
  memory             100Mi (0%)  128Mi (0%)
  ephemeral-storage  0 (0%)      0 (0%)
  hugepages-1Gi      0 (0%)      0 (0%)
  hugepages-2Mi      0 (0%)      0 (0%)
Events:              <none>


Name:               ip-10-0-1-0
Roles:              <none>
Labels:             beta.kubernetes.io/arch=amd64
                    beta.kubernetes.io/os=linux
                    kubernetes.io/arch=amd64
                    kubernetes.io/hostname=ip-10-0-1-0
                    kubernetes.io/os=linux
Annotations:        kubeadm.alpha.kubernetes.io/cri-socket: unix:///run/cri-dockerd.sock
                    node.alpha.kubernetes.io/ttl: 0
                    projectcalico.org/IPv4Address: 10.0.101.219/24
                    projectcalico.org/IPv4IPIPTunnelAddr: 192.168.105.0
                    volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp:  Wed, 24 Aug 2022 13:39:59 +0000
Taints:             <none>
Unschedulable:      false
Lease:
  HolderIdentity:  ip-10-0-1-0
  AcquireTime:     <unset>
  RenewTime:       Mon, 29 Aug 2022 10:47:13 +0000
Conditions:
  Type                 Status  LastHeartbeatTime                 LastTransitionTime                Reason                       Message
  ----                 ------  -----------------                 ------------------                ------                       -------
  NetworkUnavailable   False   Wed, 24 Aug 2022 14:10:50 +0000   Wed, 24 Aug 2022 14:10:50 +0000   CalicoIsUp                   Calico is running on this node
  MemoryPressure       False   Mon, 29 Aug 2022 10:45:34 +0000   Wed, 24 Aug 2022 13:39:59 +0000   KubeletHasSufficientMemory   kubelet has sufficient memory available
  DiskPressure         False   Mon, 29 Aug 2022 10:45:34 +0000   Wed, 24 Aug 2022 13:39:59 +0000   KubeletHasNoDiskPressure     kubelet has no disk pressure
  PIDPressure          False   Mon, 29 Aug 2022 10:45:34 +0000   Wed, 24 Aug 2022 13:39:59 +0000   KubeletHasSufficientPID      kubelet has sufficient PID available
  Ready                True    Mon, 29 Aug 2022 10:45:34 +0000   Wed, 24 Aug 2022 14:05:32 +0000   KubeletReady                 kubelet is posting ready status. AppArmor enabled
Addresses:
  InternalIP:  10.0.0.9
  Hostname:    ip-10-0-1-0
Capacity:
  cpu:                32
  ephemeral-storage:  203070420Ki
  hugepages-1Gi:      0
  hugepages-2Mi:      0
  memory:             261050556Ki
  pods:               110
Allocatable:
  cpu:                32
  ephemeral-storage:  187149698763
  hugepages-1Gi:      0
  hugepages-2Mi:      0
  memory:             260948156Ki
  pods:               110
System Info:
  Machine ID:                 ec214596968bc843a7e7b569b5e3021a
  System UUID:                ec214596-968b-c843-a7e7-b569b5e3021a
  Boot ID:                    7b21ffb1-b32e-4e17-a633-4f5511b6abbb
  Kernel Version:             5.13.0-1029-aws
  OS Image:                   Ubuntu 20.04.4 LTS
  Operating System:           linux
  Architecture:               amd64
  Container Runtime Version:  <docker://20.10.17>
  Kubelet Version:            v1.25.0
  Kube-Proxy Version:         v1.25.0
PodCIDR:                      192.168.1.0/24
PodCIDRs:                     192.168.1.0/24
Non-terminated Pods:          (12 in total)
  Namespace                   Name                                                              CPU Requests  CPU Limits  Memory Requests  Memory Limits  Age
  ---------                   ----                                                              ------------  ----------  ---------------  -------------  ---
  cert-manager                cert-manager-5dd59d9d9b-r9g46                                     0 (0%)        0 (0%)      0 (0%)           0 (0%)         5d16h
  cert-manager                cert-manager-cainjector-8696fc9f89-mb4c7                          0 (0%)        0 (0%)      0 (0%)           0 (0%)         5d16h
  cert-manager                cert-manager-webhook-7d4b5b8c56-ckz2j                             0 (0%)        0 (0%)      0 (0%)           0 (0%)         5d16h
  kruise-system               kruise-controller-manager-766b57c995-5gqhh                        100m (0%)     200m (0%)   256Mi (0%)       512Mi (0%)     5d16h
  kruise-system               kruise-controller-manager-766b57c995-9sth9                        100m (0%)     200m (0%)   256Mi (0%)       512Mi (0%)     5d16h
  kruise-system               kruise-daemon-zhgc4                                               0 (0%)        50m (0%)    0 (0%)           128Mi (0%)     4d21h
  kube-system                 calico-node-ls8mf                                                 250m (0%)     0 (0%)      0 (0%)           0 (0%)         4d20h
  kube-system                 coredns-6d4b75cb6d-2tjl7                                          100m (0%)     0 (0%)      70Mi (0%)        170Mi (0%)     5d16h
  kube-system                 coredns-6d4b75cb6d-9cmm8                                          100m (0%)     0 (0%)      70Mi (0%)        170Mi (0%)     5d16h
  kube-system                 kube-proxy-dkhxs                                                  0 (0%)        0 (0%)      0 (0%)           0 (0%)         4d21h
  nebula-operator-system      nebula-operator-controller-manager-deployment-5bd9fbf6b7-28jcs    200m (0%)     300m (0%)   120Mi (0%)       230Mi (0%)     4d20h
  nebula-operator-system      nebula-operator-controller-manager-deployment-5bd9fbf6b7-xxgmf    200m (0%)     300m (0%)   120Mi (0%)       230Mi (0%)     4d20h
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource           Requests    Limits
  --------           --------    ------
  cpu                1050m (3%)  1050m (3%)
  memory             892Mi (0%)  1952Mi (0%)
  ephemeral-storage  0 (0%)      0 (0%)
  hugepages-1Gi      0 (0%)      0 (0%)
  hugepages-2Mi      0 (0%)      0 (0%)
Events:              <none>
k
@Himanshu Gupta please show me the 'kubectl get sc' output. I think the metad pending is caused by the storageClass 'gp2'; gp2 is the default AWS storage class.
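(A quick way to check which provisioner the default class points at, for example:)

kubectl get sc gp2 -o jsonpath='{.provisioner}{"\n"}'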
h
@kevin.qiao, please note that gp2 was not created automatically by AWS; I created it using a YAML file. Please let me know if you need any more details. Thanks a lot for your help!
ubuntu@ip:~$ kubectl get sc
NAME            PROVISIONER             RECLAIMPOLICY   VOLUMEBINDINGMODE      ALLOWVOLUMEEXPANSION   AGE
ebs-sc          ebs.csi.aws.com         Delete          WaitForFirstConsumer   false                  2d23h
gp2 (default)   kubernetes.io/aws-ebs   Delete          Immediate              false                  4d
ubuntu@ip-:~$ kubectl describe sc gp2
Name:         gp2
IsDefaultClass:    Yes
Annotations:      storageclass.kubernetes.io/is-default-class=true
Provisioner:      kubernetes.io/aws-ebs
Parameters:      fsType=ext4,type=gp2
AllowVolumeExpansion: <unset>
MountOptions:     <none>
ReclaimPolicy:     Delete
VolumeBindingMode:   Immediate
Events:        <none>
k
@Himanshu Gupta I see your k8s cluster was created by 'kubeadm'. I've tested with 'eksctl', which enables the IAM role automatically. I think you should check your IAM role configuration and also check the ebs-csi-controller logs; please follow the troubleshoot ebs link.
$ eksctl create cluster --name ngtest --region us-east-1 --ssh-access --ssh-public-key test --nodes 2 --vpc-cidr 10.0.0.0/16
❤️ 1
👍 1
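(Following up on kevin's pointers, a rough sketch of those checks, assuming the AWS EBS CSI driver was deployed into kube-system under its default names; adjust names and namespace to however the driver was installed:)

# List the driver pods and tail the provisioner sidecar logs (names assumed).
kubectl -n kube-system get pods -l app.kubernetes.io/name=aws-ebs-csi-driver
kubectl -n kube-system logs deployment/ebs-csi-controller -c csi-provisioner --tail=50
# On a kubeadm cluster on EC2, the node instance profile must allow the EBS volume
# APIs (CreateVolume, AttachVolume, etc.); eksctl sets this up automatically on EKS.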
h
Worked like a charm. Thanks a lot @kevin.qiao and @wey
👍 1
❤️ 1