Hi Zack,
Welcome to the community!
The split-and-embed approach is, by nature, limited:
- chunking makes assumptions about information density, i.e. how the data is spread across the text
- the embedding model may retrieve chunks that are literally similar to the query but unrelated in the domain-knowledge sense
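To make the limitation concrete, here is a toy sketch of the split-and-embed pipeline. All names and data are hypothetical, and a bag-of-words vector stands in for a real embedding model so the example is self-contained:

```python
# Toy split-and-embed retrieval (hypothetical data; a real system
# would use an actual embedding model instead of bag-of-words).
import math
import re
from collections import Counter

def chunk(text: str, size: int = 6) -> list[str]:
    # Fixed-size splitting: implicitly assumes information is spread
    # evenly across the text, which is exactly what can fail.
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def embed(text: str) -> Counter:
    # Stand-in "embedding": word counts, so similarity is purely lexical.
    return Counter(re.findall(r"\w+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

doc = ("Apple released a new chip. The apple orchard harvest was strong. "
       "Graph databases store relationships as edges.")
chunks = chunk(doc)
query = embed("Apple silicon chip performance")
ranked = sorted(chunks, key=lambda c: cosine(embed(c), query), reverse=True)
# Note the orchard chunk also scores > 0 on "apple": literally similar,
# but unrelated in the domain-knowledge sense.
print(ranked[0])  # → "Apple released a new chip. The"
```

The second-ranked chunk (the orchard one) matches only on the surface form of "apple", illustrating why lexical/embedding similarity alone can miss domain relevance.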
A KG, on the other hand, offers:
- fine-grained segmentation of information
- retrieval oriented around interconnections and relevance
While the pure KG approach:
- normally relies on exact string matching to find the start node for a search
- is not good at persisting detailed information the way text chunks in a vector store are
Thus combining the two is optimal in some use cases.
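A minimal sketch of one such combination, with a dict standing in for the graph store and another for the vector store (all entity names and helpers here are hypothetical, not our SDK's API): exact-match the start node, expand its neighbors in the KG, then pull the detailed chunks for the entities found.

```python
# Toy hybrid retrieval: KG expansion for relevance, chunk store for detail.
# All data is made up; a real system would query a graph DB and a vector store.

# Toy KG: start-node -> list of (relation, end-node) triples.
kg = {
    "Acme Corp": [("founded_by", "Jane Doe"), ("produces", "Widget X")],
    "Widget X": [("is_a", "gadget")],
}

# Toy "vector store": entity -> the detailed text a KG cannot persist well.
chunk_store = {
    "Acme Corp": "Acme Corp is a fictional company used in examples ...",
    "Widget X": "Widget X is Acme's flagship fictional product ...",
}

def hybrid_retrieve(entity: str, hops: int = 2) -> dict[str, str]:
    """Exact-match the start node, expand the KG, attach chunks to hits."""
    frontier, seen = [entity], set()
    for _ in range(hops):
        nxt = []
        for node in frontier:
            if node in seen:
                continue
            seen.add(node)
            nxt += [end for _, end in kg.get(node, [])]
        frontier = nxt
    # Fine-grained relations come from the KG; detailed text from the chunks.
    return {node: chunk_store[node] for node in seen if node in chunk_store}

print(sorted(hybrid_retrieve("Acme Corp")))  # → ['Acme Corp', 'Widget X']
```

The KG supplies which entities are relevant (via typed edges), while the chunk store supplies the detailed text the KG itself does not hold, which is the division of labor the combination exploits.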
For now, we have cherry-picked some cases where the KG helps:
https://colab.research.google.com/drive/1tLjOg2ZQuIClfuWrAC2LdiZHCov8oUbs#scrollTo=_Cherry_picked_Examples_that_KG_helps
We are also creating a proprietary SDK/tool suite (early stage for now) to enable advanced RAG with the help of graph, vector, and planning-based graph exploration approaches, etc. (paper on the way).
Feel free to reach out if you'd like to discuss it further.