AI Engineering Toolbox

A curated list of libraries for all phases of AI Engineering.

Author: Amit Chaudhary
Published: May 11, 2025
Modified: May 12, 2025

Agents

| Category | Tool | Remarks |
| --- | --- | --- |
| Libraries | autogen, crewai, langgraph, langroid, openai-agents, pydantic-ai, smolagents, marvin | |
| Web Use | browser-use | |

Data

| Category | Tool | Remarks |
| --- | --- | --- |
| Synthetic Data | curator, datadreamer, distilabel, fabricator, promptwright | |
| Pretraining Pipelines | datatrove, nemo-curator | |
| Data Exploration | lilac | |
| Deduplication | semhash | |
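
To make the Deduplication row concrete, here is a minimal standard-library sketch of near-duplicate removal. It is not semhash's API (semhash works on embeddings); it swaps in Jaccard similarity over word shingles as a simpler stand-in. The function names and the 0.8 threshold are assumptions chosen for illustration.

```python
from __future__ import annotations


def shingles(text: str, n: int = 3) -> set[tuple[str, ...]]:
    """Return the set of lowercased word n-grams in `text`."""
    words = text.lower().split()
    if len(words) < n:
        # Short texts become a single shingle made of all their words.
        return {tuple(words)}
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity: size of the intersection over size of the union."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def deduplicate(docs: list[str], threshold: float = 0.8) -> list[str]:
    """Keep a document only if it is not too similar to any already kept one.

    The 0.8 threshold is an illustrative assumption, not a recommended value.
    """
    kept: list[str] = []
    kept_shingles: list[set] = []
    for doc in docs:
        current = shingles(doc)
        if all(jaccard(current, seen) < threshold for seen in kept_shingles):
            kept.append(doc)
            kept_shingles.append(current)
    return kept
```

This compares every document against every kept document, so it only suits small corpora. Large-scale pipelines typically rely on hashing or approximate nearest-neighbour search instead.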

Evals

| Category | Tool | Remarks |
| --- | --- | --- |
| Libraries | athina-evals, deepeval, geval, openevals, promptfoo, inspect-ai | |
| | evaluate | Off-the-shelf metrics |
| Agents | agentevals | |
| Benchmarks | openai-evals, yet-another-applied-llm-benchmark, lm-evaluation-harness | |
| | llmtest-needleinahaystack | Needle in a Haystack |
| Diversity | diversity | |
| Hallucination | lettucedetect | |
| RAG | ragas, ragchecker | |
| | auto-evaluator | Generate synthetic QA pairs from docs |
| | chunking-evaluation | Evaluate chunking strategies |
| Text Generation | bleurt | |
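
The "Needle in a Haystack" remark refers to hiding a known fact inside a long context and checking whether the model can retrieve it. The sketch below shows only the prompt-construction step under simple assumptions: sentences are split on periods, and `insert_needle` is a hypothetical helper, not a function from the listed repository.

```python
def insert_needle(haystack: str, needle: str, depth: float) -> str:
    """Insert `needle` into `haystack` at a relative depth on a sentence boundary.

    depth=0.0 places the needle first, depth=1.0 places it last.
    Splitting on "." is a simplifying assumption for this sketch.
    """
    if not 0.0 <= depth <= 1.0:
        raise ValueError("depth must be between 0.0 and 1.0")
    sentences = [s.strip() for s in haystack.split(".") if s.strip()]
    position = round(depth * len(sentences))
    sentences.insert(position, needle.strip().rstrip("."))
    return ". ".join(sentences) + "."


def needle_retrieved(model_answer: str, expected: str) -> bool:
    """Crude check: does the model's answer contain the expected string?"""
    return expected.lower() in model_answer.lower()
```

A full evaluation would sweep over context lengths and depths, send each constructed context to the model with a question about the needle, and record `needle_retrieved` for every combination.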

Experiment Tracking

| Category | Tool | Remarks |
| --- | --- | --- |
| Libraries | trulens, mlflow, wandb | |

Inference

| Category | Tool | Remarks |
| --- | --- | --- |
| Inference Servers | ray, vllm, powerinfer, text-generation-inference, sglang, tensorrt-llm, ctranslate2, mlc-llm, deepspeed-mii, openllm | |
| Batch Processing | skypilot | |
| Inference Optimization | medusa | |
| Multi-LoRA Inference | s-lora, punica, lorax | |
| LLM Gateway | litellm, portkey | |
| LLM Routing | routellm | |
| Semantic Cache | gptcache | |
| Quantization | llm-compressor, bitsandbytes, optimum | |
| Embeddings | text-embeddings-inference, infinity | |
| Kernels | liger-kernel | |
| Local Servers | lmstudio, ollama, text-generation-webui, koboldcpp, llama.cpp | |
| Frontend UI | gradio, streamlit | |

Finetuning

| Category | Tool | Remarks |
| --- | --- | --- |
| LLM Finetuning | axolotl, unsloth, torchtune, peft, litgpt | |
| Model Merging | mergekit, mergoo | |
| Multi-modal VLMs | maestro, nanovlm | |
| Reinforcement Learning | art, trl, verl, openrlhf | |
| | verifiers | Verifiers |
| Distributed Training | megatron-lm, deepspeed, yafsdp, nanotron, fairscale, colossalai | |
| | hivemind | Decentralized Training |

Prompt Engineering

| Category | Tool | Remarks |
| --- | --- | --- |
| Automatic Prompt Engineering | dspy, textgrad, adalflow | |
| Function Calling | functionary | |
| Structured Output | guidance, instructor, jsonformer, lm-format-enforcer, outlines, xgrammar, lmql, fructose | |
| Memory | mem0, letta, memobase, memary, langmem, memoripy | |
| Rate Limiting | backoff, tenacity, ratelimit | |
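
The Rate Limiting libraries mostly wrap one pattern: retry a failing API call with exponentially growing, jittered delays. As a rough illustration of that pattern (a standard-library sketch, not the API of backoff or tenacity), the decorator below uses assumed parameter names and defaults. The injectable `sleep` argument is there only to make the sketch testable.

```python
import random
import time
from functools import wraps


def retry_with_backoff(
    max_attempts=5,
    base_delay=1.0,
    max_delay=30.0,
    exceptions=(Exception,),
    sleep=time.sleep,
):
    """Retry the wrapped function with exponential backoff and full jitter.

    All parameter names and defaults are illustrative assumptions.
    """
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(1, max_attempts + 1):
                try:
                    return func(*args, **kwargs)
                except exceptions:
                    if attempt == max_attempts:
                        raise  # out of attempts: surface the last error
                    # Delay cap doubles each attempt: base, 2*base, 4*base, ...
                    cap = min(max_delay, base_delay * 2 ** (attempt - 1))
                    # Full jitter: sleep a random time in [0, cap].
                    sleep(random.uniform(0, cap))
        return wrapper
    return decorator
```

In practice you would narrow `exceptions` to the rate-limit or timeout errors raised by your client, so that genuine bugs are not retried.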

RAG

| Category | Tool | Remarks |
| --- | --- | --- |
| Libraries | llama-index, verba, fastrag, haystack | |
| Chunking | wtpsplit, semchunk, chonkie, langchain-text-splitters | |
| | open-parse | Layout parsing visually |
| | sparseprimingrepresentations | SPR |
| Reranking | rerankers, flashrank | |
| Sparse Retrieval | bm25s | |
| Embeddings | fastembed, sentence-transformers | |
| | model2vec | Static Vectors |
| Vector Index | annoy, diskann, faiss, chroma, qdrant, pinecone, weaviate, milvus | |
| | pgvector, sqlite-vec | SQL Extensions |
| | simsimd | Faster dot-product on CPUs |
| Late Interaction | ragatouille | Train ColBERT models |
| Graph RAG | fast-graphrag, graphrag, nano-graphrag | |
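
For readers new to sparse retrieval, the scoring behind the Sparse Retrieval row can be sketched in a few lines of plain Python. This is not the bm25s API. It is a minimal, unoptimized Okapi BM25 using the always-positive IDF variant `log(1 + (N - df + 0.5) / (df + 0.5))`, with whitespace tokenization and the common defaults `k1=1.5`, `b=0.75` as assumptions.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each document in `docs` against `query` with Okapi BM25.

    Tokenization is a lowercase whitespace split (a simplifying assumption).
    """
    if not docs:
        return []
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(d) for d in tokenized) / n_docs
    # Document frequency: number of documents containing each term.
    df = Counter(term for d in tokenized for term in set(d))

    scores = []
    for d in tokenized:
        tf = Counter(d)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            # Length normalization penalizes documents longer than average.
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores
```

Ranking is then a matter of sorting document indices by score. Dedicated libraries add proper tokenization, stemming, and precomputed sparse indexes for speed.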

Audio

| Category | Tool | Remarks |
| --- | --- | --- |
| Voice Activity Detection | silero-vad | |
| Speech to Text | faster-whisper, whisperx | |

Production Monitoring

| Category | Tool | Remarks |
| --- | --- | --- |
| Observability | openllmetry, phoenix, logfire | |
| Guardrails | giskard, langkit, garak, deepchecks | |
| | rebuff | Prompt Injection Detection |
| Drift Detection | ft-drift | Detect drift in OpenAI messages |
| AI Detection | binoculars | |