# nebula
p
A quick question regarding using nebula-importer: I’m installing Nebula on a GCP compute instance and trying to ingest data from CSV files stored locally on that instance. The node ingestion is usually pretty smooth, but during edge ingestion I very often run into errors like
...
2022/10/25 23:06:06 [INFO] statsmgr.go:89: Tick: Time(90.00s), Finished(8826624), Failed(0), Read Failed(0), Latency AVG(74173us), Batches Req AVG(74755us), Rows AVG(98072.47/s)
2022/10/25 23:06:11 [INFO] statsmgr.go:89: Tick: Time(95.00s), Finished(8826624), Failed(0), Read Failed(0), Latency AVG(74173us), Batches Req AVG(74755us), Rows AVG(92911.68/s)
2022/10/25 23:06:13 [ERROR] handler.go:63: Client 8 fail to execute: INSERT EDGE `communication`(`pic_send`,`pic_reply`,`message_send`,`message_reply`) VALUES  "cd175378-5b57-40c0-8044-abcefdcccxxx"->"f9aad3da-8fe7-41d2-be7e-9866xxx13e9e":(0,0,23,15) , "f06b5211-bf19-4dxxxx78f-e32da48c390e"->"d038xxxx-4cbd-47ca-869c-91571b4697f0":(0,0,63,51) , ...
and when I checked the local storage service, it was offline:
+-------------+------+-----------+-----------+--------------+----------------------+------------------------+---------+
| Host        | Port | HTTP port | Status    | Leader count | Leader distribution  | Partition distribution | Version |
+-------------+------+-----------+-----------+--------------+----------------------+------------------------+---------+
| "127.0.0.1" | 9779 | 19669     | "OFFLINE" | 0            | "No valid partition" | "us_comm:1"            | "3.2.0" |
+-------------+------+-----------+-----------+--------------+----------------------+------------------------+---------+
Got 1 rows (time spent 1303/1945 us)
although I checked right before the ingestion to make sure nebula-storaged was online. From the logging above, does it mean storaged went offline roughly 95 seconds after the ingestion started? I’ve encountered this several times, but I’m not sure what the root cause is yet. If anyone can shed some light, I’d really appreciate it. Thanks.
Here is the ingestion YAML in case it helps:
version: v3
description: example graph
removeTempFiles: false
clientSettings:
  retry: 3
  concurrency: 32
  channelBufferSize: 256
  space: xxxxx (masked)
  connection:
    user: root
    password: root
    address: 127.0.0.1:9669
  postStart:
    commands: |
    afterPeriod: 8s
  preStop:
    commands: |
logPath: ./err/err_link.log
files:
  - path: <masked>
    batchSize: 256
    inOrder: true
    type: csv
    csv:
      withHeader: false
      withLabel: false
    schema:
      type: edge
      edge:
        name: communication
        withRanking: false
        srcVID:
          type: string
          index: 0
        dstVID:
          type: string
          index: 1
        props:
          - name: pic_send
            type: int
            index: 2
          - name: pic_reply
            type: int
            index: 3
          - name: message_send
            type: int
            index: 4
          - name: message_reply
            type: int
            index: 5
All the nodes were already ingested successfully in a previous run.
g
I think that inserting with a concurrency of 32 into a single-node graphd service will be too much; I would go with 4 to be on the safe side. Also, do you need inOrder: true? It has a lower impact if you set ordering to false.
❤️ 1
👍 1
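For reference, a minimal sketch of how those two suggestions land in the importer YAML above; only the keys shown change, everything else stays as posted:
clientSettings:
  retry: 3
  concurrency: 4        # down from 32; fewer parallel clients per Goran's suggestion
  channelBufferSize: 256
files:
  - path: <masked>
    batchSize: 256
    inOrder: false      # relaxed ordering, lower impact on storaged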
p
Makes sense, will try that. Thanks @Goran Cvijanovic!
Tried
inOrder: false
concurrency: 2
but I’m still getting similar errors. In addition, after storaged went offline, even when I try to start it again, it doesn’t seem to work:
peicheng.yu@peicheng-nebula-ssd-1:~$ sudo /usr/local/nebula/scripts/nebula.service start all
[WARN] The maximum files allowed to open might be too few: 1024
[ERROR] nebula-metad already running: 1920
[ERROR] nebula-graphd already running: 2016
[INFO] Starting nebula-storaged...
[INFO] Done
peicheng.yu@peicheng-nebula-ssd-1:~$ ./nebula-console -addr 127.0.0.1 -port 9669 -u root -p root

Welcome to Nebula Graph!

(root@nebula) [(none)]> SHOW HOSTS
+-------------+------+-----------+-----------+--------------+----------------------+------------------------+---------+
| Host        | Port | HTTP port | Status    | Leader count | Leader distribution  | Partition distribution | Version |
+-------------+------+-----------+-----------+--------------+----------------------+------------------------+---------+
| "127.0.0.1" | 9779 | 19669     | "OFFLINE" | 0            | "No valid partition" | "us_comm:1"            | "3.2.0" |
+-------------+------+-----------+-----------+--------------+----------------------+------------------------+---------+
Got 1 rows (time spent 879/1392 us)
Not sure why storaged refuses to come back online even though the nebula.service start command appears to succeed? Thanks.
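When storaged goes down or won’t come back, its own logs usually say why. A quick sketch of where to look, assuming the default /usr/local/nebula layout from the session above:
# Tail the latest storaged error log (glog keeps a nebula-storaged.ERROR symlink):
tail -n 50 /usr/local/nebula/logs/nebula-storaged.ERROR
# Check whether the process actually stayed up after "Starting ... Done":
ps aux | grep nebula-storaged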
g
I’m a little bit confused here. You have only one host in your configuration; there should be at least 3 of them for Raft consensus to work properly. Maybe you should set up your cluster using the recommended configuration from the documentation. Also use the latest version, 3.2.1, which has some issues resolved.
p
Ah, makes sense. Potentially the Raft consensus is what’s playing a role here. Will try it with the recommended config.
A quick and naive follow-up question: it looks like if I want to have more than 1 host, I’ll need to change this setting in storaged.conf
--local_ip=127.0.0.1
to be a list of IPs. But I tried
--local_ip=127.0.0.1,127.0.0.2
or
--local_ip=[127.0.0.1,127.0.0.2]
or
--local_ip=127.0.0.*
but none of them seems to work. What would be the correct way to configure it? Otherwise only one of them is online:
+-------------+------+-----------+-----------+--------------+----------------------+------------------------+---------+
| Host        | Port | HTTP port | Status    | Leader count | Leader distribution  | Partition distribution | Version |
+-------------+------+-----------+-----------+--------------+----------------------+------------------------+---------+
| "127.0.0.1" | 9779 | 19669     | "ONLINE"  | 0            | "No valid partition" | "No valid partition"   | "3.2.0" |
| "127.0.0.2" | 9779 | 19669     | "OFFLINE" | 0            | "No valid partition" | "No valid partition"   |         |
| "127.0.0.3" | 9779 | 19669     | "OFFLINE" | 0            | "No valid partition" | "No valid partition"   |         |
+-------------+------+-----------+-----------+--------------+----------------------+------------------------+---------+
Got 3 rows (time spent 1251/1891 us)
Thanks.
Or is it actually not possible to have multiple hosts (i.e. high availability/reliability) on a single machine?
g
You need a different port for each service instance if you are on a single machine, but it is better to use Docker or another virtualization system to get proper isolation and port mappings. Using Docker (for example) will let services keep the same port number, so it will be as if you have a normal distributed multi-node cluster.
There is one important step to get a working cluster:
Starting from NebulaGraph 3.0.0, setting Storage hosts in the configuration files only registers the hosts on the Meta side, but does not add them into the cluster. You must run the ADD HOSTS statement to add the Storage hosts.
👍 1
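For example, a sketch using the three storaged addresses from the SHOW HOSTS output above (run in nebula-console once the instances are started):
# Register the storaged instances with the Meta service:
ADD HOSTS 127.0.0.1:9779, 127.0.0.2:9779, 127.0.0.3:9779;
# Then verify they show up as ONLINE:
SHOW HOSTS;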
Another thing to mention: it is actually possible to run a single-node standalone Nebula version, but I didn’t try to run it.
Steps
Currently, you can only install standalone NebulaGraph with the source code. The steps are similar to those of the multi-process NebulaGraph. You only need to modify the step Generate Makefile with CMake by adding -DENABLE_STANDALONE_VERSION=on to the command. For example:
cmake -DCMAKE_INSTALL_PREFIX=/usr/local/NebulaGraph -DENABLE_TESTING=OFF -DENABLE_STANDALONE_VERSION=on -DCMAKE_BUILD_TYPE=Release ..
And another piece of info about having a list of nodes for the metad services; an example for a 10.17.0.* range:
--meta_server_addrs=10.17.0.26:9559,10.17.0.27:9559,10.17.0.28:9559
so it is just a comma-separated list of addresses with ports included.
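To tie this back to the --local_ip question above: --local_ip takes a single address, because each storaged process binds one IP; running several instances on one machine means one config file per process, with distinct ports and paths. A rough sketch for the first of three instances (the flag names are real nebula-storaged flags, but the port and path values here are assumptions):
# storaged1.conf (instance 1 of 3 on the same machine; values illustrative)
--meta_server_addrs=127.0.0.1:9559
--local_ip=127.0.0.1
--port=9779
--ws_http_port=19779
--data_path=/usr/local/nebula/data/storage1
--log_dir=/usr/local/nebula/logs/storage1
# storaged2.conf / storaged3.conf would bump the ports (e.g. 9780/19780,
# 9781/19781) and use their own data_path and log_dir.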
w
Yes, as Goran said, we could configure multiple instances of a service on different ports to test distributed deployment (in production, each kind of service is expected to run as a singleton per OS). Besides that, as Goran also mentioned, you could optionally leverage docker-compose for ease of deploying a distributed cluster on a single server. Ref: https://docs.nebula-graph.io/3.2.1/4.deployment-and-installation/2.compile-and-install-nebula-graph/3.deploy-nebula-graph-with-docker-compose/
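As a flavor of what that looks like, here is a heavily trimmed sketch of one metad/storaged/graphd trio under docker-compose; treat the full file from the linked docs as the reference, since the image tags and flag values below are assumptions:
version: '3.4'
services:
  metad0:
    image: vesoft/nebula-metad:v3.2.1   # image name assumed from the vesoft registry
    command:
      - --meta_server_addrs=metad0:9559
      - --local_ip=metad0
      - --port=9559
  storaged0:
    image: vesoft/nebula-storaged:v3.2.1
    command:
      - --meta_server_addrs=metad0:9559
      - --local_ip=storaged0
      - --port=9779
    depends_on:
      - metad0
  graphd:
    image: vesoft/nebula-graphd:v3.2.1
    command:
      - --meta_server_addrs=metad0:9559
      - --local_ip=graphd
      - --port=9669
    ports:
      - "9669:9669"   # expose graphd so nebula-console on the host can connect
    depends_on:
      - metad0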
Also, @Peicheng Yu, I noticed that the open file limit is too small; this will lead to file access issues for storaged too, as it warned:
peicheng.yu@peicheng-nebula-ssd-1:~$ sudo /usr/local/nebula/scripts/nebula.service start all
[WARN] The maximum files allowed to open might be too few: 1024 <------------here (wey)
[ERROR] nebula-metad already running: 1920
[ERROR] nebula-graphd already running: 2016
We need to raise this limit at the OS level (verify with ulimit -a) to enable storaged to behave normally. Ref: https://docs.nebula-graph.io/3.2.1/5.configurations-and-logs/1.configurations/6.kernel-config/
👍 1
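A quick sketch of the check and a temporary per-shell fix (the linked kernel-config page describes the permanent setting; 130000 below is just an illustrative large value):
# Inspect current limits; "open files" is the line that matters here:
ulimit -a
# Raise the open-file limit for this shell session before starting Nebula:
ulimit -n 130000
# Make it permanent via /etc/security/limits.conf as described in the docs above.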
p
Makes sense, thank you both! @Goran Cvijanovic @wey Will try your recommendations above 🙂