#nebula

Chak-Pong Chung

11/21/2022, 1:11 AM
Hi! What does Nebula use to import property data in Parquet format into memory?
For context: one viable approach would be to use Apache Arrow.

wey

11/21/2022, 2:03 AM
Dear @Chak-Pong Chung, welcome to the community! Nebula Exchange is a tool for ingesting data from multiple sources into NebulaGraph, including Parquet files in HDFS. Could this fulfill your requirements? https://docs.nebula-graph.io/3.3.0/nebula-exchange/about-exchange/ex-ug-what-is-exchange/ https://github.com/vesoft-inc/nebula-exchange

Chak-Pong Chung

11/21/2022, 2:23 AM
So in order to ingest Parquet into Nebula, we need to use the Parquet reader from Spark. Will a Spark application convert the Parquet data into the in-memory format Nebula needs for graph processing?
If I need to persist updates made to Nebula's in-memory format to disk, how can I do that in Nebula?

wey

11/21/2022, 4:23 AM
Nebula Exchange loads data from different sources, batch or stream (Parquet in HDFS is one of them; ClickHouse, MySQL, Hive, Oracle, Kafka, Pulsar, etc. are supported too), as a DataFrame and then writes it to NebulaGraph, either via DML queries or by generating SST files. For now there is no direct support for Apache Arrow. However, Exchange speaks JDBC too, so could we wire it up as Arrow-[JDBC]-Exchange? https://docs.nebula-graph.io/3.3.0/nebula-exchange/use-exchange/ex-ug-import-from-jdbc/ If that's not doable, could you help create an issue in nebula-exchange to add support for Apache Arrow?
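To make the "write via DML query" path a bit more concrete, here is a minimal sketch of turning tabular rows into batched nGQL `INSERT VERTEX` statements. It is not how Exchange is implemented; the `person` tag, the field names, the sample rows, and the helper names `ngql_literal` and `insert_vertex_statements` are all hypothetical, and it assumes string vertex IDs and only simple property types.

```python
from typing import Any, Iterable, List, Mapping, Sequence


def ngql_literal(value: Any) -> str:
    """Render a Python value as an nGQL literal (NULL, bool, number, or string)."""
    if value is None:
        return "NULL"
    if isinstance(value, bool):  # check bool before int: bool is a subclass of int
        return "true" if value else "false"
    if isinstance(value, (int, float)):
        return repr(value)
    # Everything else is treated as a string; escape backslashes and double quotes.
    escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return '"' + escaped + '"'


def insert_vertex_statements(
    tag: str,
    vid_field: str,
    prop_fields: Sequence[str],
    rows: Iterable[Mapping[str, Any]],
    batch_size: int = 2,
) -> List[str]:
    """Build one INSERT VERTEX statement per batch of rows (hypothetical helper)."""
    all_rows = list(rows)
    props = ", ".join(prop_fields)
    statements = []
    for start in range(0, len(all_rows), batch_size):
        entries = []
        for row in all_rows[start:start + batch_size]:
            values = ", ".join(ngql_literal(row[p]) for p in prop_fields)
            entries.append(ngql_literal(row[vid_field]) + ":(" + values + ")")
        statements.append(
            "INSERT VERTEX " + tag + "(" + props + ") VALUES " + ", ".join(entries) + ";"
        )
    return statements


if __name__ == "__main__":
    # Hypothetical rows, standing in for records read from a Parquet file.
    sample_rows = [
        {"id": "p1", "name": "Alice", "age": 30},
        {"id": "p2", "name": "Bob", "age": 41},
        {"id": "p3", "name": "Carol", "age": None},
    ]
    for stmt in insert_vertex_statements("person", "id", ["name", "age"], sample_rows):
        print(stmt)
```

The rows could come from any reader that yields dict-like records; batching several vertices per statement keeps the number of queries down.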

Chak-Pong Chung

11/21/2022, 3:57 PM
I am not sure I understand. By JDBC, do you mean that incremental updates to the in-memory graph will be persisted to a data source (like a DB) via JDBC?
My question is about how to use Nebula to export the current state of the graph, not how to import from data sources to construct the in-memory graph.

min wu

11/22/2022, 10:02 AM
Nebula has its own memory/disk format, which is not the Parquet format.