Hi! what does nebula use to import property data o...
# nebula
c
Hi! What does Nebula use to import property data in Parquet format into memory?
For context: one viable approach would be to use Apache Arrow.
w
Dear @Chak-Pong Chung, welcome to the community! Nebula Exchange is a tool for ingesting data from multiple sources into NebulaGraph, including Parquet in HDFS. Could this fulfill your requirements? https://docs.nebula-graph.io/3.3.0/nebula-exchange/about-exchange/ex-ug-what-is-exchange/ https://github.com/vesoft-inc/nebula-exchange
c
So to ingest Parquet into Nebula, we need to use the Parquet reader from Spark? A Spark application will convert the Parquet data into the in-memory format Nebula needs for graph processing?
If I need to persist updates to Nebula's in-memory format to disk, how can I do that in Nebula?
w
Nebula Exchange loads data from different sources (batch or stream; Parquet in HDFS is one of them, and ClickHouse, MySQL, Hive, Oracle, Kafka, Pulsar, etc. are supported too) as DataFrames and then writes it to NebulaGraph (either via DML queries or by generating SST files). For now there is no direct support for Apache Arrow. However, Exchange speaks JDBC too, so could we wire it like Arrow-[JDBC]-Exchange? https://docs.nebula-graph.io/3.3.0/nebula-exchange/use-exchange/ex-ug-import-from-jdbc/ If that's not doable, could you help create an issue in nebula-exchange for adding Apache Arrow support?
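To make the "via DML query" path a bit more concrete, here is a minimal sketch of turning already-loaded rows (for example, rows read from a Parquet file) into batched nGQL `INSERT VERTEX` statements. This is only an illustration, not how Exchange is implemented: the helper names, the `person` tag, and its `name`/`age` properties are hypothetical, string VIDs are assumed, and the escaping only covers backslashes and double quotes.

```python
from typing import Any, Dict, Iterable, List


def ngql_literal(value: Any) -> str:
    """Render a Python value as an nGQL literal (NULL, bool, number, or string only)."""
    if value is None:
        return "NULL"
    if isinstance(value, bool):  # check bool before int: bool is a subclass of int
        return "true" if value else "false"
    if isinstance(value, (int, float)):
        return repr(value)
    # Assumption: escaping backslashes and double quotes is enough for this sketch.
    escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def insert_vertex_statements(
    tag: str,
    vid_field: str,
    prop_fields: List[str],
    rows: Iterable[Dict[str, Any]],
    batch_size: int = 2,
) -> List[str]:
    """Group rows into INSERT VERTEX statements of at most batch_size vertices each."""
    header = f"INSERT VERTEX {tag}({', '.join(prop_fields)}) VALUES "
    statements: List[str] = []
    batch: List[str] = []
    for row in rows:
        props = ", ".join(ngql_literal(row[field]) for field in prop_fields)
        batch.append(f"{ngql_literal(row[vid_field])}:({props})")
        if len(batch) == batch_size:
            statements.append(header + ", ".join(batch) + ";")
            batch = []
    if batch:  # flush the final, possibly smaller, batch
        statements.append(header + ", ".join(batch) + ";")
    return statements


if __name__ == "__main__":
    # Hypothetical rows standing in for data loaded from a Parquet file.
    example_rows = [
        {"id": "p1", "name": "Ann", "age": 30},
        {"id": "p2", "name": "Bob", "age": None},
        {"id": "p3", "name": "Cy", "age": 7},
    ]
    for statement in insert_vertex_statements("person", "id", ["name", "age"], example_rows):
        print(statement)
```

Each resulting string would then be sent to NebulaGraph through whichever client you use; batching keeps the number of round trips down.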
c
I am not sure I understand. By JDBC, do you mean that incremental updates to the in-memory graph will be persisted by writing to a data source (like a DB) via JDBC?
My question is about how to use Nebula to export the current state of the graph, not how to import from data sources to construct the in-memory graph.
m
Nebula has its own memory/disk format, which is not the parquet format.