h4cker/ai_research/RAG/README.md
2024-08-18 20:11:07 -04:00

376 lines
19 KiB
Markdown
Raw Permalink Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

# RAG Resources
- [Files from my "Using Retrieval Augmented Generation (RAG), Langchain, and LLMs for Cybersecurity Operations" Course](https://github.com/santosomar/RAG-for-cybersecurity)
- [LangFlow](https://github.com/langflow-ai/langflow) - a low-code app builder for RAG and multi-agent AI applications. Its Python-based and agnostic to any model, API, or database.
### Disadvantages of RAG
- [Disadvantages of RAG](https://medium.com/@kelvin.lu.au/disadvantages-of-rag-5024692f2c53)
### RAG Patterns
- [Generative AI Lifecycle Patterns](https://dr-arsanjani.medium.com/the-generative-ai-lifecycle-1b0c7d9463ec)
- [Why do RAG pipelines fail? Advanced RAG Patterns — Part1
Ozgur Guler](https://cloudatlas.me/why-do-rag-pipelines-fail-advanced-rag-patterns-part1-841faad8b3c2)
- [How to improve RAG peformance — Advanced RAG Patterns — Part2](https://cloudatlas.me/how-to-improve-rag-peformance-advanced-rag-patterns-part2-0c84e2df66e6)
- [Patterns for Building LLM-based Systems & Products](https://eugeneyan.com/writing/llm-patterns/)
- [AI Engineer Summit - Building Blocks for LLM Systems & Products](https://eugeneyan.com/speaking/ai-eng-summit/)
- [Technical Considerations for Complex RAG](https://medium.com/enterprise-rag/a-first-intro-to-complex-rag-retrieval-augmented-generation-a8624d70090f)
### Dialogue Routing
- [Routing in RAG-Driven Applications](https://towardsdatascience.com/routing-in-rag-driven-applications-a685460a7220)
## Retrieval
### Vector Retrieval
- [Boosting RAG: Picking the Best Embedding & Reranker models](https://blog.llamaindex.ai/boosting-rag-picking-the-best-embedding-reranker-models-42d079022e83)
- [What We Need to Know Before Adopting a Vector Database](https://medium.com/@kelvin.lu.au/what-we-need-to-know-before-adopting-a-vector-database-85e137570fbb)
#### Chunking
- [Chunking Strategies for LLM Applications](https://www.pinecone.io/learn/chunking-strategies/)
- [Evaluating the Ideal Chunk Size for a RAG System using LlamaIndex](https://blog.llamaindex.ai/evaluating-the-ideal-chunk-size-for-a-rag-system-using-llamaindex-6207e5d3fec5)
- [How to Chunk Text Data — A Comparative Analysis](https://towardsdatascience.com/how-to-chunk-text-data-a-comparative-analysis-3858c4a0997a)
#### Embeddings
#### Vector Search
- [Awesome Search](https://github.com/frutik/awesome-search)
- [Advanced RAG Retrieval Strategies: Sentence Window Retrieval](https://generativeai.pub/advanced-rag-retrieval-strategies-sentence-window-retrieval-b6964b6e56f7)
### Not Vector Retrieval
- [Vector Search Is Not All You Need](https://towardsdatascience.com/vector-search-is-not-all-you-need-ecd0f16ad65e)
- [Build a search engine, not a vector DB](https://blog.elicit.com/search-vs-vector-db/)
- [Improving RAG (Retrieval Augmented Generation) Answer Quality with Re-ranker](https://medium.com/towards-generative-ai/improving-rag-retrieval-augmented-generation-answer-quality-with-re-ranker-55a19931325)
- [From Search to Synthesis: Enhancing RAG with BM25 and Reciprocal Rank Fusion](https://medium.com/@kachari.bikram42/from-search-to-synthesis-enhancing-rag-with-bm25-and-reciprocal-rank-fusion-872d21dc4ca7)
## Generation
### Prompts
- [Emerging RAG & Prompt Engineering Architectures for LLMs](https://cobusgreyling.medium.com/updated-emerging-rag-prompt-engineering-architectures-for-llms-17ee62e5cbd9)
- [How to Cut RAG Costs by 80% Using Prompt Compression](https://towardsdatascience.com/how-to-cut-rag-costs-by-80-using-prompt-compression-877a07c6bedb)
#### Prompting strategies
##### Multi-Modal RAG
- [Multi-Modal RAG](https://blog.llamaindex.ai/multi-modal-rag-621de7525fea)
##### Multi-index RAG
- [Having all of your data stored in one collection isn't always the best for RAG apps](https://twitter.com/ecardenas300/status/1724829560041038072)
##### Multi-Document
- [Advanced RAG — Multi-Documents Agent with LlamaIndex](https://blog.gopenai.com/advanced-rag-multi-documents-agent-with-llamaindex-43b604f84909)
##### FLARE
- [Better RAG with Active Retrieval Augmented Generation FLARE](https://blog.lancedb.com/better-rag-with-active-retrieval-augmented-generation-flare-3b66646e2a9f)
##### Chain-of-Verification
- [in-Of-Verification Reduces Hallucination in LLMs](https://cobusgreyling.medium.com/chain-of-verification-reduces-hallucination-in-llms-20af5ea67672)
##### Chain-Of-Thought
- [Chain-Of-Thought Prompting In LLMs](https://cobusgreyling.medium.com/chain-of-thought-prompting-in-llms-1077164edf97)
### Context
- [The Needle In a Haystack Test](https://towardsdatascience.com/the-needle-in-a-haystack-test-a94974c1ad38)
- [Conversational Memory for LLMs with Langchain](https://www.pinecone.io/learn/series/langchain/langchain-conversational-memory/)
#### Long context RAG
- [The next generation of RAG: Long-Context RAG](https://twitter.com/ecardenas300/status/1724129722492142048)
- [NVIDIA Research: RAG with Long Context LLMs](https://blog.llamaindex.ai/nvidia-research-rag-with-long-context-llms-7d94d40090c4)
#### Knowledge and Knowledge Graphs
- [Graph RAG: Unleashing the Power of Knowledge Graphs with LLM](https://medium.com/@nebulagraph/graph-rag-the-new-llm-stack-with-knowledge-graphs-e1e902c504ed)
- [Embeddings + Knowledge Graphs: The Ultimate Tools for RAG Systems](https://towardsdatascience.com/embeddings-knowledge-graphs-the-ultimate-tools-for-rag-systems-cbbcca29f0fd)
- [The Practical Benefits to Grounding an LLM in a Knowledge Graph
Daniel Bukowski](https://medium.com/@bukowski.daniel/the-practical-benefits-to-grounding-an-llm-in-a-knowledge-graph-919918eb493)
- [Implement RAG with Knowledge Graph and Llama-Index](https://medium.aiplanet.com/implement-rag-with-knowledge-graph-and-llama-index-6a3370e93cdd)
- [Awesome Knowledge Graphs](https://github.com/frutik/awesome-knowledge-graphs)
- [HippoRAG: Neurobiologically Inspired Long-Term Memory for Large Language Models](https://arxiv.org/abs/2405.14831)
### Automated prompt optimization
### Hallucination
- [How to Detect Hallucinations in LLMs](https://towardsdatascience.com/real-time-llm-hallucination-detection-9a68bb292698)
- [Measuring Hallucinations in RAG Systems](https://vectara.com/measuring-hallucinations-in-rag-systems/)
### Guardrails
- [Safeguarding LLMs with Guardrails](https://towardsdatascience.com/safeguarding-llms-with-guardrails-4f5d9f57cff2)
- [NeMo Guardrails: The Missing Manual](https://www.pinecone.io/learn/nemo-guardrails-intro/)
## LLM Models
### Finetuning and Pretraining
- [Fine-Tuning Llama 2.0 with Single GPU Magic](https://ai.plainenglish.io/fine-tuning-llama2-0-with-qloras-single-gpu-magic-1b6a6679d436)
- [Practitioners guide to fine-tune LLMs for domain-specific use case](https://cismography.medium.com/practitioners-guide-to-fine-tune-llms-for-domain-specific-use-case-part-1-4561714d874f)
- [Are You Pre-training your RAG Models on Your Raw Text?](https://medium.com/thirdai-blog/are-you-pre-training-your-rag-models-on-your-raw-text-40f832d87703)
- [Combine Multiple LoRA Adapters for Llama 2](https://towardsdatascience.com/combine-multiple-lora-adapters-for-llama-2-ea0bef9025cf)
- [RAG vs Finetuning — Which Is the Best Tool to Boost Your LLM Application?](https://towardsdatascience.com/rag-vs-finetuning-which-is-the-best-tool-to-boost-your-llm-application-94654b1eaba7)
## Evaluation of RAGs
- [RAG Evaluation](https://cobusgreyling.medium.com/rag-evaluation-9813a931b3d4)
- [Evaluating RAG: A journey through metrics](https://www.elastic.co/search-labs/blog/articles/evaluating-rag-metrics)
- [Exploring End-to-End Evaluation of RAG Pipelines](https://betterprogramming.pub/exploring-end-to-end-evaluation-of-rag-pipelines-e4c03221429)
- [Evaluation Driven Development, the Swiss Army Knife for RAG Pipelines](https://levelup.gitconnected.com/evaluation-driven-development-the-swiss-army-knife-for-rag-pipelines-dba24218d47e)
- [Evaluating the Ideal Chunk Size for a RAG System using LlamaIndex](https://blog.llamaindex.ai/evaluating-the-ideal-chunk-size-for-a-rag-system-using-llamaindex-6207e5d3fec5)
## Performance and cost
- [Secrets to Optimizing RAG LLM Apps for Better Performance, Accuracy and Lower Costs!](https://medium.com/madhukarkumar/secrets-to-optimizing-rag-llm-apps-for-better-accuracy-performance-and-lower-cost-da1014127c0a)
## Privacy
- [Masking PII Data in RAG Pipeline](https://betterprogramming.pub/masking-pii-data-in-rag-pipeline-326d2d330336)
## Security
- [Hijacking Chatbots: Dangerous Methods Manipulating GPTs](https://medium.com/@jankammerath/hijacking-chatbots-dangerous-methods-manipulating-gpts-52342f4f88b8)
## Applications of RAG
### Chatbots
## Tools
- [Three Open-Source RAG Tools You Need to Know About](https://medium.com/programmers-journey/three-open-source-rag-tools-you-need-to-know-about-331c3f28ab22)
- [HayStack](https://github.com/deepset-ai/haystack)
- [RAGAS](https://github.com/explodinggradients/ragas)
### DSPy
- [DSPy — Does It Live Up To The Hype?](https://medium.com/emalpha/dspy-does-it-live-up-to-the-hype-6e56c2c6e7a0)
### AutoRAG
- [AutoRAG](https://github.com/Marker-Inc-Korea/AutoRAG) - AutoML tool for RAG. Automatically optimize RAG pipeline with single YAML file.
### AutoGPT
### Langchain
- [Langchain](https://github.com/langchain-ai/langchain)
- [Langchain is NOT for production use. Here is why ..](https://medium.com/@aldendorosario/langchain-is-not-for-production-use-here-is-why-9f1eca6cce80)
### LlamaIndex
- [LlamaIndex](https://github.com/run-llama/llama_index)
- [Building Production-Ready LLM Apps with LlamaIndex: Document Metadata for Higher Accuracy Retrieval](https://betterprogramming.pub/building-production-ready-llm-apps-with-llamaindex-document-metadata-for-higher-accuracy-retrieval-a8ceca641fb5)
- [Building Production-Ready LLM Apps With LlamaIndex: Recursive Document Agents for Dynamic Retrieval](https://betterprogramming.pub/building-production-ready-llm-apps-with-llamaindex-recursive-document-agents-for-dynamic-retrieval-1f4b25287918)
## Vendor-specific examples
- [RAG Pipeline with Mistral 7B Instruct Model in Colab: A Step-by-Step Guide
Qendel AI GoPenAI](https://blog.gopenai.com/rag-pipeline-with-mistral-7b-instruct-model-a-step-by-step-guide-138df378a0c2)
### Elastcisearch + OpenAI
- [ChatGPT and Elasticsearch: OpenAI meets private data](https://www.elastic.co/search-labs/blog/chatgpt-elasticsearch-openai-meets-private-data)
### OpenAI and ChatGPT
- [Compare PDF Question Answering Systems Build with OpenAI and Google VertexAI](https://medium.com/@kelvin.lu.au/compare-pdf-question-answering-with-openai-and-google-vertexai-46638d62327b)
#### Tools and fucntions
- [Unlocking the Power of the OpenAI API: Master Function-Calling with Practical Examples](https://medium.com/@apollovro/unlocking-the-power-of-the-openai-api-master-function-calling-with-practical-examples-f8b9ab2fceec)
- [penAI/Chat-GPT Function Calling : for Enhanced AI Interactions](https://levelup.gitconnected.com/openai-chat-gpt-function-calling-for-enhanced-ai-interactions-338be974027)
### Vespa
- [Hands-On RAG guide for personal data with Vespa and LLamaIndex](https://blog.vespa.ai/scaling-personal-ai-assistants-with-streaming-mode/)
### Qdrant
## Running RAGs in production
## Vectors corner
- [Similarity Search, Part 2: Product Quantization](https://towardsdatascience.com/similarity-search-product-quantization-b2a1a6397701)
- [Binary and Scalar Embedding Quantization for Significantly Faster & Cheaper Retrieval](https://huggingface.co/blog/embedding-quantization)
- [Cohere int8 & binary Embeddings - Scale Your Vector Database to Large Datasets
Image of Nils Reimers](https://cohere.com/blog/int8-binary-embeddings)
## Research Papers
### Survey and Benchmark
**Benchmarking Large Language Models in Retrieval-Augmented Generation** \
*Jiawei Chen, Hongyu Lin, Xianpei Han, Le Sun* \
arXiv 2023. [[Paper](https://arxiv.org/abs/2309.01431)][[Github](https://github.com/chen700564/RGB)] \
4 Sep 2023
### Retrieval-enhanced LLMs
**Chain-of-Note: Enhancing Robustness in Retrieval-Augmented Language Models** \
*Wenhao Yu, Hongming Zhang, Xiaoman Pan, Kaixin Ma, Hongwei Wang, Dong Yu* \
arxiv - Nov 2023 [[Paper](https://arxiv.org/abs/2311.09210)]
**REST: Retrieval-Based Speculative Decoding** \
*Zhenyu He, Zexuan Zhong, Tianle Cai, Jason D Lee, Di He* \
arXiv - Nov 2023 [[Paper](https://arxiv.org/abs/2311.08252)][[Github](https://github.com/fasterdecoding/rest)]
**Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection**
*Anonymous*
ICLR 24 Oct 2023 [[paper](https://openreview.net/forum?id=hSyW5go0v8)]
**Self-Knowledge Guided Retrieval Augmentation for Large Language Models** \
*Yile Wang, Peng Li, Maosong Sun, Yang Liu* \
arXiv - Oct 2023 [[Ppaer](https://arxiv.org/abs/2310.05002)]
**Retrieval meets Long Context Large Language Models** \
*Peng Xu, Wei Ping, Xianchao Wu, Lawrence McAfee, Chen Zhu, Zihan Liu, Sandeep Subramanian, Evelina Bakhturina, Mohammad Shoeybi, Bryan Catanzaro* \
arxiv - Oct 2023 [[Paper](https://arxiv.org/abs/2310.03025)]
**DSPy: Compiling Declarative Language Model Calls into Self-Improving Pipelines**
*Omar Khattab, Arnav Singhvi, Paridhi Maheshwari, Zhiyuan Zhang, Keshav Santhanam, Sri Vardhamanan, Saiful Haq, Ashutosh Sharma, Thomas T. Joshi, Hanna Moazam, Heather Miller, Matei Zaharia, Christopher Potts*
arXiv Oct 2023 [[paper](https://arxiv.org/abs/2310.03714)] [[code](https://github.com/stanfordnlp/dspy)]
**Adaptive Chameleon or Stubborn Sloth: Revealing the Behavior of Large Language Models in Knowledge Conflicts**
*Jian Xie, Kai Zhang, Jiangjie Chen, Renze Lou, Yu Su*
ICLR 24 May 2023 [[paper](https://arxiv.org/abs/2305.13300)] [[code](https://github.com/OSU-NLP-Group/LLM-Knowledge-Conflict)]
**Active Retrieval Augmented Generation**
*Zhengbao Jiang, Frank F. Xu, Luyu Gao, Zhiqing Sun, Qian Liu, Jane Dwivedi-Yu, Yiming Yang, Jamie Callan, Graham Neubig*
arXiv May 2023 [[paper](https://arxiv.org/abs/2305.06983)] [[code](https://github.com/jzbjyb/FLARE)]
**REPLUG: Retrieval-Augmented Black-Box Language Models**
*Weijia Shi, Sewon Min, Michihiro Yasunaga, Minjoon Seo, Rich James, Mike Lewis, Luke Zettlemoyer, Wen-tau Yih*
arXiv Jan 2023 [[paper](https://arxiv.org/abs/2301.12652)]
**Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks**
*Patrick Lewis, Ethan Perez, Aleksandra Piktus, Fabio Petroni, Vladimir Karpukhin, Naman Goyal, Heinrich Küttler, Mike Lewis, Wen-tau Yih, Tim Rocktäschel, Sebastian Riedel, Douwe Kiela*
NeurIPS 2020 - May 2020 [[Paper](https://arxiv.org/abs/2005.11401)]
### RAG Instruction Tuning
**RA-DIT: Retrieval-Augmented Dual Instruction Tuning**
*Anonymous*
ICLR 24 Oct 23 [[paper](https://openreview.net/forum?id=22OTbutug9)]
**InstructRetro: Instruction Tuning post Retrieval-Augmented Pretraining**
*Boxin Wang, Wei Ping, Lawrence McAfee, Peng Xu, Bo Li, Mohammad Shoeybi, Bryan Catanzaro* \
arXiv - Oct 23 [[paper](https://openreview.net/forum?id=4stB7DFLp6)]
### RAG In-Context Learning
**In-Context Retrieval-Augmented Language Models**
*Ori Ram, Yoav Levine, Itay Dalmedigos, Dor Muhlgay, Amnon Shashua, Kevin Leyton-Brown, Yoav Shoham*
AI21 Labs Jan 2023 [[paper](https://uploads-ssl.webflow.com/60fd4503684b466578c0d307/63c6c20dec4479564db21819_NEW_In_Context_Retrieval_Augmented_Language_Models.pdf)] [[code](https://github.com/AI21Labs/in-context-ralm)]
### RAG Embeddings
**RegaVAE: A Retrieval-Augmented Gaussian Mixture Variational Auto-Encoder for Language Modeling** \
*Jingcheng Deng, Liang Pang, Huawei Shen, Xueqi Cheng* \
EMNLP 2023 - Oct 2023 [[Paper](https://arxiv.org/abs/2310.10567)][[Github](https://github.com/TrustedLLM/RegaVAE)]
**Text Embeddings Reveal (Almost) As Much As Text** \
*John X. Morris, Volodymyr Kuleshov, Vitaly Shmatikov, Alexander M. Rush* \
EMNLP 2023 - Oct 2023 [[Paper](https://arxiv.org/abs/2310.06816?ref=upstract.com)][[Github](https://github.com/jxmorris12/vec2text)]
**Jina Embeddings 2: 8192-Token General-Purpose Text Embeddings for Long Documents** \
*Michael Günther, Jackmin Ong, Isabelle Mohr, Alaeddine Abdessalem, Tanguy Abel, Mohammad Kalim Akram, Susana Guzman, Georgios Mastrapas, Saba Sturua, Bo Wang, Maximilian Werk, Nan Wang, Han Xiao* \
arXiv - Oct 2023. [[Paper](https://arxiv.org/abs/2310.19923)][[Model](https://huggingface.co/jinaai/jina-embeddings-v2-small-en)]
### RAG Simulators
**KAUCUS: Knowledge Augmented User Simulators for Training Language Model Assistants** \
*Kaustubh D. Dhole* \
Simulation of Conversational Intelligence in Chat, EACL 2024 [[Paper](https://arxiv.org/abs/2401.16454)]
### RAG Search
### RAG Long-text and Memory
**HippoRAG: Neurobiologically Inspired Long-Term Memory for Large Language Models** \
*Bernal Jiménez Gutiérrez, Yiheng Shu, Yu Gu, Michihiro Yasunaga, Yu Su* \
arXiv - May 2024 [[paper](https://arxiv.org/abs/2405.14831)] [[GitHub](https://github.com/OSU-NLP-Group/HippoRAG)]
**Understanding Retrieval Augmentation for Long-Form Question Answering** \
*Hung-Ting Chen, Fangyuan Xu, Shane A. Arora, Eunsol Choi* \
arXiv - Oct 2023 [[Paper](https://arxiv.org/abs/2310.12150)]
### RAG Evaluation
**ARES: An Automated Evaluation Framework for Retrieval-Augmented Generation Systems** \
*Jon Saad-Falcon, Omar Khattab, Christopher Potts, Matei Zaharia* \
arXiv - Nov 2023. [[Paper](https://arxiv.org/abs/2311.09476)] [[Github](https://github.com/stanford-futuredata/ares)]
### RAG Optimization
**Learning to Filter Context for Retrieval-Augmented Generation** \
*Zhiruo Wang, Jun Araki, Zhengbao Jiang, Md Rizwan Parvez, Graham Neubig* \
arxiv- Nov 2023 [[Paper](https://arxiv.org/abs/2311.08377)][[Github](https://github.com/zorazrw/filco)]
**Large Language Models Can Be Easily Distracted by Irrelevant Context** \
*Freda Shi, Xinyun Chen, Kanishka Misra, Nathan Scales, David Dohan, Ed Chi, Nathanael Schärli, Denny Zhou* \
ICML 2023 - Jan 2023 [[Paper](https://arxiv.org/abs/2302.00093)][[Github](https://github.com/google-research-datasets/GSM-IC)]
**Evidentiality-guided Generation for Knowledge-Intensive NLP Tasks** \
*Akari Asai, Matt Gardner, Hannaneh Hajishirzi* \
NAACL 2022 - Dec 2021 [[Paper](https://arxiv.org/abs/2112.08688)][[Github](https://github.com/akariasai/evidentiality_qa)]
**When Not to Trust Language Models: Investigating Effectiveness of Parametric and Non-Parametric Memories** \
*Alex Mallen, Akari Asai, Victor Zhong, Rajarshi Das, Daniel Khashabi, Hannaneh Hajishirzi* \
ACL 2023 - Dec 2022 [[Paper](https://arxiv.org/abs/2212.10511)][[Github](https://github.com/alextmallen/adaptive-retrieval)]
### RAG Application
**Deficiency of Large Language Models in Finance: An Empirical Examination of Hallucination** \
*Haoqiang Kang, Xiao-Yang Liu* \
arXiv - Nov 2023 [[Paper](https://arxiv.org/abs/2311.15548)]
**Clinfo.ai: An Open-Source Retrieval-Augmented Large Language Model System for Answering Medical Questions using Scientific Literature** \
*Alejandro Lozano, Scott L Fleming, Chia-Chun Chiang, Nigam Shah* \
arXiv - Oct 2023. [[Paper](https://arxiv.org/abs/2310.16146v1)]
**PEARL: Personalizing Large Language Model Writing Assistants with Generation-Calibrated Retrievers** \
*Sheshera Mysore, Zhuoran Lu, Mengting Wan, Longqi Yang, Steve Menezes, Tina Baghaee, Emmanuel Barajas Gonzalez, Jennifer Neville, Tara Safavi* \
arXiv - Nov 2023. [[Paper](https://arxiv.org/abs/2311.09180)]