AI Engineering Toolbox
A curated list of libraries for all phases of AI Engineering.
Agents
Category | Tool | Remarks |
---|---|---|
Libraries | autogen, crewai, langgraph, langroid, openai-agents, pydantic-ai, smolagents, marvin | |
Web Use | browser-use | |
Data
Category | Tool | Remarks |
---|---|---|
Synthetic Data | curator, datadreamer, distillabel, fabricator, promptwright | |
Pretraining pipelines | datatrove, nemo-curator | |
Data Exploration | lilac | |
Deduplication | semhash | |
Evals
Category | Tool | Remarks |
---|---|---|
Libraries | athina-evals, deepeval, geval, openevals, promptfoo, inspect-ai | |
Libraries | evaluate | Off-the-shelf metrics |
Agents | agentevals | |
Benchmarks | openai-evals, yet-another-applied-llm-benchmark, lm-evaluation-harness | |
Benchmarks | llmtest-needleinahaystack | Needle in a Haystack |
Diversity | diversity | |
Hallucination | lettucedetect | |
RAG | ragas, ragchecker | |
RAG | auto-evaluator | Generate synthetic QA pairs from docs |
RAG | chunking-evaluation | Evaluate chunking strategies |
Text Generation | bleurt | |
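
A needle-in-a-haystack test plants a known fact (the needle) at a chosen depth in a long context and checks whether the model's answer recovers it. The sketch below shows only that setup step with the standard library. It does not use the API of llmtest-needleinahaystack; the function names `insert_needle` and `needle_recalled` are hypothetical, and the sentence splitting is deliberately naive.

```python
def insert_needle(haystack: str, needle: str, depth: float) -> str:
    """Insert `needle` at a sentence boundary near `depth`.

    depth=0.0 places the needle first, depth=1.0 places it last.
    Assumption: sentences are separated by "." (naive; breaks on decimals).
    """
    if not 0.0 <= depth <= 1.0:
        raise ValueError("depth must be in [0, 1]")
    sentences = [s.strip() for s in haystack.split(".") if s.strip()]
    index = round(depth * len(sentences))
    sentences.insert(index, needle.strip().rstrip("."))
    return ". ".join(sentences) + "."


def needle_recalled(answer: str, expected: str) -> bool:
    """Case-insensitive substring check; a stand-in for a real grader."""
    return expected.lower() in answer.lower()
```

In practice the returned context would be sent to a model along with a question about the needle, and `needle_recalled` would be applied to the reply across a grid of depths and context lengths.
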
Experiment Tracking
Category | Tool | Remarks |
---|---|---|
Libraries | trulens, mlflow, wandb | |
Inference
Category | Tool | Remarks |
---|---|---|
Inference Servers | ray, vllm, powerinfer, text-generation-inference, sglang, tensorrt-llm, ctranslate2, mlc-llm, deepspeed-mii, openllm | |
Batch Processing | skypilot | |
Inference Optimization | medusa | |
Multi-LoRA inference | s-lora, punica, lorax | |
LLM Gateway | litellm, portkey | |
LLM Routing | routellm | |
Semantic Cache | gptcache | |
Quantization | llm-compressor, bitsandbytes, optimum | |
Embeddings | text-embeddings-inference, infinity | |
Kernels | liger-kernel | |
Local Servers | lmstudio, ollama, text-generation-webui, koboldcpp, llama.cpp | |
Frontend UI | gradio, streamlit | |
Finetuning
Category | Tool | Remarks |
---|---|---|
LLM Finetuning | axolotl, unsloth, torchtune, peft, litgpt | |
Model Merging | mergekit, mergoo | |
Multi-modal VLMs | maestro, nanovlm | |
Reinforcement Learning | art, trl, verl, openrlhf | |
Reinforcement Learning | verifiers | Verifiers |
Distributed Training | megatron-lm, deepspeed, yafsdp, nanotron, fairscale, colossalai | |
Distributed Training | hivemind | Decentralized Training |
Prompt Engineering
Category | Tool | Remarks |
---|---|---|
Automatic Prompt Engineering | dspy, textgrad, adalflow | |
Function Calling | functionary | |
Structured Output | guidance, instructor, jsonformer, lm-format-enforcer, outlines, xgrammar, lmql, fructose | |
Memory | mem0, letta, memobase, memary, langmem, memoripy | |
Rate Limiting | backoff, tenacity, ratelimit | |
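
The rate-limiting libraries above are typically used to retry LLM API calls that fail transiently. As a rough illustration of the underlying idea, here is a minimal standard-library sketch of retrying with exponential backoff and full jitter. It does not reproduce the API of backoff, tenacity, or ratelimit; `retry_with_backoff` and its parameters are hypothetical names chosen for this example.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    call: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 0.5,
    max_delay: float = 8.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `call()` until it succeeds or `max_attempts` is exhausted.

    The delay cap doubles after each failure (base_delay, 2x, 4x, ...),
    is clipped at `max_delay`, and the actual wait is drawn uniformly
    from [0, cap] ("full jitter") so that clients do not retry in lockstep.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return call()
        except Exception:
            # Assumption: every exception is retryable. Real code should
            # catch only rate-limit or transient network errors.
            if attempt == max_attempts - 1:
                raise
            cap = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, cap))
    raise RuntimeError("unreachable")
```

The `sleep` parameter is injectable so the function can be exercised without real waiting; in production the default `time.sleep` applies.
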
RAG
Category | Tool | Remarks |
---|---|---|
Libraries | llama-index, verba, fastrag, haystack | |
Chunking | wtsplit, semchunk, chonkie, langchain-text-splitters | |
Chunking | open-parse | Visual layout parsing |
Chunking | sparseprimingrepresentations | Sparse Priming Representations (SPR) |
Reranking | rerankers, flashrank | |
Sparse Retrieval | bm25s | |
Embeddings | fastembed, sentence-transformers | |
Embeddings | model2vec | Static Vectors |
Vector Index | annoy, diskann, faiss, chroma, qdrant, pinecone, weaviate, milvus | |
Vector Index | pgvector, sqlite-vec | SQL Extensions |
Vector Index | simsimd | Faster dot-product on CPUs |
Late Interaction | ragatouille | Train ColBERT models |
Graph RAG | fast-graphrag, graphrag, nano-graphrag | |
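
Sparse retrieval ranks documents by weighted term overlap with the query, and BM25 is the usual scoring function. The sketch below is a small standard-library implementation of Okapi BM25 (the Lucene-style variant with a non-negative IDF), meant only to show what the score computes. It is not the bm25s API; `bm25_scores`, whitespace tokenization, and the defaults `k1=1.5`, `b=0.75` are assumptions made for this example.

```python
import math
from collections import Counter
from typing import List


def bm25_scores(query: str, docs: List[str], k1: float = 1.5, b: float = 0.75) -> List[float]:
    """Return one BM25 score per document for a whitespace-tokenized query."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    if n_docs == 0:
        return []
    # Average document length; fall back to 1.0 if every document is empty.
    avg_len = sum(len(tokens) for tokens in tokenized) / n_docs or 1.0
    # Document frequency: number of documents containing each term.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))

    scores = []
    for tokens in tokenized:
        term_freq = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            tf = term_freq.get(term, 0)
            if tf == 0:
                continue
            df = doc_freq[term]
            idf = math.log((n_docs - df + 0.5) / (df + 0.5) + 1.0)
            # Length normalization: longer-than-average documents are penalized.
            norm = tf + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf * (k1 + 1) / norm
        scores.append(score)
    return scores
```

Sorting document indices by descending score gives the ranking; dedicated libraries add proper tokenization, stemming, and sparse-matrix speedups on top of this formula.
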
Audio
Category | Tool | Remarks |
---|---|---|
Voice Activity Detection | silero-vad | |
Speech to Text | faster-whisper, whisperx | |
Production Monitoring
Category | Tool | Remarks |
---|---|---|
Observability | openllmetry, phoenix, logfire | |
Guardrails | giskard, langkit, garak, deepchecks | |
Guardrails | rebuff | Prompt Injection Detection |
Drift Detection | ft-drift | Detect drift in OpenAI messages |
AI Detection | binoculars | |