hello! what's the recommended migration setup for ingesting larger datasets (say SNB benchmark with bigger SFs - 1k-30k)? specifically how the migration should be handled to make it efficient, should I use exchange or importer? if exchange, is hdfs a must and would migrating to ssts first be fastest?
08/17/2022, 8:05 AM
Hi Kasper, welcome to the NebulaGraph community!
https://github.com/vesoft-inc/nebula-bench/ can be used to ingest SNB data into Nebula with ease 🙂
For Exchange with an SST sink, yes, HDFS is currently assumed so that the SST files can be downloaded by storaged (that is, an HDFS client must be present on the storaged hosts, and the storaged process invokes it via shell).
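For reference, an abbreviated sketch of what an Exchange config with an SST sink could look like (this is an assumption-laden outline, not a full config: the space name, addresses, and paths are placeholders, and the spark/tags field mappings are trimmed; see the Exchange docs for the complete schema):

```hocon
{
  nebula: {
    address: {
      graph: ["127.0.0.1:9669"]
      meta:  ["127.0.0.1:9559"]
    }
    user: root
    pswd: nebula
    space: snb_sf10k            # placeholder space name
    path: {
      local:  "/tmp/sst"        # where Spark writes SST files locally
      remote: "/sst"            # HDFS dir that storaged later downloads from
      hdfs.namenode: "hdfs://namenode:9000"
    }
  }
  tags: [
    {
      name: Person
      type: { source: csv, sink: sst }   # sink: sst instead of client
      # field/vid mappings omitted for brevity
    }
  ]
}
```

After the SST files land on HDFS, they are pulled in server-side (e.g. via the `SUBMIT JOB DOWNLOAD HDFS ...` / `SUBMIT JOB INGEST` statements in the console), which is why the storaged hosts need that HDFS client.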
08/17/2022, 8:41 AM
should be enough for, say, Scale Factor 10k? I understand it uses
08/18/2022, 4:24 AM
Yes. Although the importer is only a client-side binary tool compared to the Spark-based tooling, the bottleneck is normally not on the client side but in LSM-tree compaction, so the importer is enough here 🙂
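To make that concrete, a minimal nebula-importer (v2-style) config for one SNB CSV file might look roughly like this. The space name, address, file path, and Person schema are illustrative placeholders, not the exact SNB mapping:

```yaml
version: v2
description: SNB person import sketch (placeholders throughout)
clientSettings:
  concurrency: 10          # parallel client connections
  channelBufferSize: 128
  space: snb_sf10k         # placeholder space name
  connection:
    user: root
    password: nebula
    address: 127.0.0.1:9669
files:
  - path: ./person.csv     # placeholder path to an SNB CSV
    batchSize: 128
    type: csv
    csv:
      withHeader: false
    schema:
      type: vertex
      vertex:
        vid:
          index: 0         # first CSV column is the vertex ID
        tags:
          - name: Person
            props:
              - name: firstName
                type: string
              - name: lastName
                type: string
```

Since the bottleneck is server-side compaction rather than the client, tuning `concurrency` and `batchSize` mostly affects how fast writes reach storaged, not the total wall-clock time to a fully compacted graph.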
And we could use Exchange with either an SST sink or a server sink (the NebulaGraph client) for sure, but compaction afterward is needed, too.
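Triggering that post-import compaction is a couple of statements in the console (the space name here is a placeholder):

```ngql
USE snb_sf10k;
-- kick off a full compaction for the current space
SUBMIT JOB COMPACT;
-- check progress of the submitted job
SHOW JOBS;
```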
update: we haven't tried SF-10k with nebula-bench before, so in case you encounter issues with its datagen, you could try the vanilla SNB datagen and import with Importer or Exchange (both should be fine)