So in order to ingest Parquet into Nebula, we need to use the Parquet reader from Spark, and a Spark application will convert the Parquet data into the in-memory format Nebula needs for graph processing, right?
And if I need to persist updates made to that in-memory format back to disk, how can I do that in Nebula?
wey
11/21/2022, 4:23 AM
Nebula Exchange loads data from different sources (batch or stream; Parquet in HDFS is one of them, and ClickHouse, MySQL, Hive, Oracle, Kafka, Pulsar, etc. are supported too) as a DataFrame and then writes it to NebulaGraph (either via DML queries or by generating SST files).
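Here is a minimal Scala sketch of that flow, assuming the nebula-spark-connector's builder API (the exact class and method names may differ across versions, and the HDFS path, cluster addresses, space, and tag below are placeholders):

```scala
// Sketch: Parquet in HDFS -> Spark DataFrame -> NebulaGraph (DML path).
// Assumes the nebula-spark-connector builder API; names may vary by version.
import org.apache.spark.sql.SparkSession
import com.vesoft.nebula.connector.{NebulaConnectionConfig, WriteNebulaVertexConfig}
import com.vesoft.nebula.connector.connector._ // implicit DataFrame writer syntax

val spark = SparkSession.builder().appName("parquet-to-nebula").getOrCreate()

// Read the Parquet files from HDFS as a plain DataFrame (placeholder path).
val df = spark.read.parquet("hdfs://namenode:9000/data/person.parquet")

// Where to write: addresses, space, and tag are all placeholders.
val connConfig = NebulaConnectionConfig.builder()
  .withMetaAddress("127.0.0.1:9559")
  .withGraphAddress("127.0.0.1:9669")
  .build()
val vertexConfig = WriteNebulaVertexConfig.builder()
  .withSpace("test")
  .withTag("person")
  .withVidField("id") // DataFrame column used as the vertex ID
  .withBatch(512)     // rows per INSERT batch
  .build()

// Write the DataFrame into NebulaGraph as vertices via DML statements.
df.write.nebula(connConfig, vertexConfig).writeVertices()
```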
For now there is no direct support for Apache Arrow, but Exchange speaks JDBC too, so could we wire it up like Arrow-[JDBC]-Exchange? (See the sketch after this message.)
https://docs.nebula-graph.io/3.3.0/nebula-exchange/use-exchange/ex-ug-import-from-jdbc/
If that’s not doable, could you help create an issue in nebula-exchange for adding Apache Arrow support?
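Something like the following is what I have in mind: Spark (which Exchange's JDBC source is built on) reading from an Arrow Flight SQL endpoint through the Arrow Flight SQL JDBC driver. This is purely a speculative sketch, not anything Exchange ships today; the driver class, URL scheme, endpoint, and table name are my assumptions about that driver:

```scala
// Sketch: Arrow Flight SQL endpoint -[JDBC]-> Spark DataFrame, i.e. the same
// code path Exchange's JDBC source uses. Driver class, URL scheme, host,
// port, and table are assumptions about the Arrow Flight SQL JDBC driver.
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("arrow-over-jdbc").getOrCreate()

val df = spark.read
  .format("jdbc")
  .option("url", "jdbc:arrow-flight-sql://flight-host:32010") // hypothetical endpoint
  .option("driver", "org.apache.arrow.driver.jdbc.ArrowFlightJdbcDriver")
  .option("dbtable", "person") // hypothetical table exposed over Flight SQL
  .load()

// From here the DataFrame would take the same path as any other Exchange
// source: write to NebulaGraph via DML or generate SST files.
```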
Chak-Pong Chung
11/21/2022, 3:57 PM
I am not sure whether I understand.
By JDBC, do you mean that persisting incremental updates to the in-memory graph would mean writing them to a data source (like a DB) via JDBC?
My question is about how to use Nebula to export the current state of the graph, not how to import from data sources to construct the in-memory graph.
min wu
11/22/2022, 10:02 AM
Nebula has its own in-memory/on-disk format, which is not the Parquet format.