AI Engineering Toolbox
A collection of libraries for all phases of AI Engineering.
Phase: Data
Data Preparation
| Category | Tool | Remarks |
|---|---|---|
| Synthetic Data | curator, datadreamer, distilabel, fabricator, promptwright | |
| | auto-label | Generate annotated datasets |
| Pretraining pipelines | datatrove, nemo-curator | |
| Data Exploration | lilac | |
| Deduplication | semhash | |
| Web Scraping | trafilatura, crawl4ai | |
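Deduplication tools such as semhash compare embeddings. To illustrate the underlying idea of near-duplicate removal with a similarity threshold, the sketch below swaps in character-trigram Jaccard similarity with a brute-force O(n²) scan. The function names and the 0.8 threshold are illustrative assumptions, not semhash's API.

```python
def shingles(text: str, n: int = 3) -> set[str]:
    """Character n-grams of a lowercased, whitespace-normalised string."""
    t = " ".join(text.lower().split())
    if len(t) < n:
        return {t}
    return {t[i:i + n] for i in range(len(t) - n + 1)}


def jaccard(a: set[str], b: set[str]) -> float:
    """Size of the intersection divided by size of the union."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def dedupe(texts: list[str], threshold: float = 0.8) -> list[str]:
    """Keep a text only if it is below `threshold` similarity to every text kept so far."""
    kept: list[str] = []
    kept_shingles: list[set[str]] = []
    for text in texts:
        s = shingles(text)
        if all(jaccard(s, k) < threshold for k in kept_shingles):
            kept.append(text)
            kept_shingles.append(s)
    return kept


docs = ["The cat sat on the mat.", "The cat sat on the mat!", "A completely different sentence."]
print(dedupe(docs))  # ['The cat sat on the mat.', 'A completely different sentence.']
```

Embedding-based tools catch paraphrases that share few characters; n-gram overlap only catches near-verbatim copies, but needs no model.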
Phase: Evaluation
Evals
| Category | Tool | Remarks |
|---|---|---|
| Libraries | athina-evals, deepeval, geval, openevals, promptfoo, inspect-ai | |
| | evaluate, torchmetrics, torcheval | Off-the-shelf metrics |
| | mir-eval | |
| Agents | agentsevals | |
| Benchmarks | openai-evals, yet-another-applied-llm-benchmark, lm-evaluation-harness | |
| | llmtest-needleinahaystack | Needle in a Haystack |
| | agentbench, textarena | Agents |
| Diversity | diversity | |
| Hallucination | lettucedetect, hallucination-leaderboard | |
| RAG | ragas, ragchecker | |
| | auto-evaluator | Generate synthetic QA pairs from docs |
| | chunking-evaluation | Evaluate chunking strategies |
| Text Generation | bleurt | |
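Metric libraries such as evaluate ship ready-made reference-based metrics. As a rough sketch of what a simple one computes, here are exact match and SQuAD-style token F1, simplified to lowercasing and whitespace tokenisation only (the official SQuAD script also strips punctuation and articles).

```python
from collections import Counter


def exact_match(prediction: str, reference: str) -> float:
    """1.0 if the lowercased, whitespace-normalised strings are identical, else 0.0."""
    return float(" ".join(prediction.lower().split()) == " ".join(reference.lower().split()))


def token_f1(prediction: str, reference: str) -> float:
    """Harmonic mean of token-level precision and recall."""
    pred = prediction.lower().split()
    ref = reference.lower().split()
    # Multiset intersection: a shared token counts min(count in pred, count in ref) times
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


print(exact_match("Paris", "paris"))                          # 1.0
print(round(token_f1("the cat sat", "the cat sat down"), 3))  # 0.857
```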
Phase: Modeling
Experiment Tracking
| Category | Tool | Remarks |
|---|---|---|
| Libraries | trulens, mlflow, wandb, mlop | |
Prompt Engineering
| Category | Tool | Remarks |
|---|---|---|
| Automatic Prompt Engineering | dspy, textgrad, adalflow | |
| | openevolve | Evolutionary code optimization |
| Function Calling | functionary | |
| Structured Output | guidance, instructor, jsonformer, lm-format-enforcer, outlines, xgrammar, lmql, fructose | |
| | llm-scraper | Webpage to JSON |
| Memory | mem0, letta, memobase, memary, langmem, memoripy | |
| Rate Limiting | backoff, tenacity, ratelimit | |
| | limits | Rate limiting for your own APIs |
| Code Interpreter | gpt-code-interpreter, open-interpreter, codeinterpreter-api | |
| Steering Vectors | dialz, repeng | |
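backoff and tenacity wrap an API call in retry logic. The stdlib sketch below shows the pattern they implement, exponential backoff with full jitter; it does not use either library's API. `retry_with_backoff` and `flaky_call` are hypothetical names, and the `sleep` parameter exists only so tests can skip real waiting.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    fn: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 0.5,
    max_delay: float = 30.0,
    retry_on: tuple[type[BaseException], ...] = (Exception,),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, retrying on `retry_on` exceptions with capped exponential backoff and full jitter."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    attempt = 0
    while True:
        try:
            return fn()
        except retry_on:
            attempt += 1
            if attempt >= max_attempts:
                raise  # out of attempts: re-raise the last error
            # Delay cap doubles each attempt (0.5, 1, 2, ...) up to max_delay;
            # "full jitter" sleeps a random amount in [0, cap] to spread out retries
            cap = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(random.uniform(0, cap))


calls = {"n": 0}


def flaky_call() -> str:
    """Hypothetical API call that fails twice before succeeding."""
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("transient failure")
    return "ok"


print(retry_with_backoff(flaky_call, sleep=lambda s: None))  # ok
```

In practice `retry_on` should be narrowed to transient errors (timeouts, HTTP 429/5xx) so that bugs fail fast instead of being retried.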
RAG
| Category | Tool | Remarks |
|---|---|---|
| Libraries | llama-index, verba, fastrag, haystack | |
| Chunking | wtpsplit, semchunk, chonkie, langchain-text-splitters | |
| | open-parse | Visual layout parsing |
| | sparseprimingrepresentations | Sparse Priming Representations (SPR) |
| Reranking | rerankers, flashrank | |
| Retrieval | pyserini | |
| | bm25s | Sparse retrieval |
| Embeddings | fastembed, sentence-transformers | |
| | model2vec | Static vectors |
| Vector Index | annoy, diskann, faiss, chroma, qdrant, pinecone, weaviate, milvus | |
| | pgvector, sqlite-vec | SQL extensions |
| | simsimd | Faster dot products on CPUs |
| Late Interaction | ragatouille, pylate | Train ColBERT models |
| Graph RAG | fast-graphrag, graphrag, nano-graphrag | |
| OCR | textract | |
| | nougat | Academic documents |
| Document Understanding | donut | |
| | table-transformer | Table extraction |
| | marker | PDF to Markdown |
| RAG on internal docs | onyx, xyne | |
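bm25s and pyserini implement BM25, the standard sparse-retrieval scoring function. A minimal pure-Python Okapi BM25 is sketched below, assuming lowercase whitespace tokenisation, the common defaults k1 = 1.5 and b = 0.75, and the Lucene-style IDF log(1 + (N − df + 0.5) / (df + 0.5)). It is illustrative only and not the bm25s API.

```python
import math
from collections import Counter


class BM25:
    """Minimal Okapi BM25 over a non-empty list of documents."""

    def __init__(self, docs: list[str], k1: float = 1.5, b: float = 0.75) -> None:
        self.k1, self.b = k1, b
        self.tokens = [d.lower().split() for d in docs]
        self.doc_len = [len(t) for t in self.tokens]
        self.avgdl = sum(self.doc_len) / len(self.tokens)
        self.tf = [Counter(t) for t in self.tokens]  # term frequencies per document
        # Document frequency: how many documents contain each term
        df = Counter(term for t in self.tokens for term in set(t))
        n = len(docs)
        # The +1 inside the log keeps IDF non-negative for very common terms
        self.idf = {term: math.log(1 + (n - f + 0.5) / (f + 0.5)) for term, f in df.items()}

    def score(self, query: str, i: int) -> float:
        """BM25 score of document i for the query."""
        tf, dl = self.tf[i], self.doc_len[i]
        s = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            f = tf[term]
            # Term-frequency saturation (k1) plus document-length normalisation (b)
            norm = f + self.k1 * (1 - self.b + self.b * dl / self.avgdl)
            s += self.idf[term] * f * (self.k1 + 1) / norm
        return s

    def search(self, query: str, top_k: int = 3) -> list[tuple[int, float]]:
        """Return (doc index, score) pairs for the top_k highest-scoring documents."""
        scores = [(i, self.score(query, i)) for i in range(len(self.tokens))]
        return sorted(scores, key=lambda x: x[1], reverse=True)[:top_k]


docs = ["the cat sat on the mat", "dogs chase cats in the park", "stock markets fell sharply today"]
bm25 = BM25(docs)
print(bm25.search("cat mat", top_k=1))  # document 0 ranks first
```

Note that "cat" does not match "cats": real systems add stemming or subword tokenisation, which is one reason dense embeddings are often combined with BM25.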
Agents
| Category | Tool | Remarks |
|---|---|---|
| Libraries | autogen, crewai, langroid, openai-agents, pydantic-ai, marvin, metagpt, semantic-kernel | |
| | langgraph | Graph-based workflows |
| | smolagents | Code-based agents |
| Web Use | browser-use | |
| | deer-flow | Deep research |
| | magnitude | Automated testing |
| Computer Use | ui-tars-desktop, cua | |
| Code Execution Sandbox | microsandbox, e2b | |
Finetuning
| Category | Tool | Remarks |
|---|---|---|
| LLM Finetuning | axolotl, unsloth, torchtune, peft, litgpt, llama-factory | |
| | onebitllms | 1.58-bit LLMs |
| Model Merging | mergekit, mergoo | |
| Multimodal VLMs | maestro, nanovlm, mlx-vlm | |
| | smol-vision | Recipes |
| | llava | Visual instruction tuning |
| RL | art, trl, verl, openrlhf, atropos, retrain | |
| | verifiers | Verifiers |
| Distributed Training | megatron-lm, deepspeed, yafsdp, nanotron, fairscale, colossalai | |
| | hivemind, psyche | Decentralized training |
| Self-instruct | airoboros | |
| Tokenization | supertokenizer | Train multi-word BPE |
| Classification | adaptive-classifier | Continuous learning |
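The simplest model-merging method, mergekit's `linear`, takes a weighted average of corresponding parameters from models that share an architecture. The sketch below assumes plain lists of floats as stand-ins for tensors and normalises the weights to sum to 1; `linear_merge` is a hypothetical helper, not mergekit code.

```python
def linear_merge(
    state_dicts: list[dict[str, list[float]]],
    weights: list[float],
) -> dict[str, list[float]]:
    """Weighted average of same-named parameters across models (weights normalised to sum to 1)."""
    if not state_dicts or len(state_dicts) != len(weights):
        raise ValueError("need at least one model and exactly one weight per model")
    keys = state_dicts[0].keys()
    if any(sd.keys() != keys for sd in state_dicts):
        raise ValueError("all models must have the same parameter names")
    total = sum(weights)
    merged: dict[str, list[float]] = {}
    for key in keys:
        params = [sd[key] for sd in state_dicts]
        # Element-wise weighted average of this parameter across all models
        merged[key] = [
            sum(w * p[j] for w, p in zip(weights, params)) / total
            for j in range(len(params[0]))
        ]
    return merged


a = {"w": [1.0, 2.0]}
b = {"w": [3.0, 4.0]}
print(linear_merge([a, b], [1, 1]))  # {'w': [2.0, 3.0]}
```

More elaborate methods (e.g. task-arithmetic or SLERP variants) change how parameters are combined, but the per-parameter, same-shape structure stays the same.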
Multimodality
Audio
| Category | Tool | Remarks |
|---|---|---|
| Voice Activity Detection | silero-vad | |
| General models | seamless-communication | |
| Speech to Text | faster-whisper, whisperx | |
| | whisper-streaming | Real-time |
| Text to Speech | chatterbox | |
Vision
| Category | Tool | Remarks |
|---|---|---|
| Facial recognition | deepface | |
| VLM | CogVLM | |
Phase: Deployment
Inference
| Category | Tool | Remarks |
|---|---|---|
| Inference Servers | ray, vllm, powerinfer, text-generation-inference, sglang, tensorrt-llm, ctranslate2, mlc-llm, deepspeed-mii, openllm, exllamav2, fastgen | |
| Batch Processing | skypilot | |
| Multi-token prediction | medusa | |
| Multi-LoRA inference | s-lora, punica, lorax | |
| LLM Gateway | litellm, portkey | |
| LLM Routing | routellm, automix, openrouter, awesome-ai-model-routing | |
| Model Cascade | frugalgpt | |
| Semantic Cache | gptcache | |
| Quantization | llm-compressor, bitsandbytes, optimum | |
| Embeddings | text-embeddings-inference, infinity | |
| Kernels | liger-kernel | |
| | attorch | Triton kernels |
| Local Servers | lmstudio, ollama, text-generation-webui, koboldcpp, llama.cpp | |
| Frontend UI | gradio, streamlit | |
| | copilot-kit | Chat UI components |
| Caching | panza | Caching for async functions |
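A semantic cache such as gptcache embeds each prompt and returns a stored answer when a new prompt is similar enough to a cached one. The sketch below assumes a toy bag-of-words "embedding" (a hypothetical stand-in for a real embedding model), cosine similarity, and a linear scan; `SemanticCache` is illustrative and not gptcache's API.

```python
import math
from collections import Counter
from typing import Callable, Optional

Vector = dict[str, float]


def bow_embed(text: str) -> Vector:
    """Toy stand-in for an embedding model: bag-of-words term counts."""
    return dict(Counter(text.lower().split()))


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity of two sparse vectors (0.0 if either is empty)."""
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


class SemanticCache:
    def __init__(self, embed: Callable[[str], Vector] = bow_embed, threshold: float = 0.9) -> None:
        self.embed, self.threshold = embed, threshold
        self.entries: list[tuple[Vector, str]] = []  # (prompt vector, cached response)

    def get(self, prompt: str) -> Optional[str]:
        """Return the response of the most similar cached prompt if it clears the threshold."""
        q = self.embed(prompt)
        best = max(self.entries, key=lambda e: cosine(q, e[0]), default=None)
        if best is not None and cosine(q, best[0]) >= self.threshold:
            return best[1]
        return None

    def put(self, prompt: str, response: str) -> None:
        self.entries.append((self.embed(prompt), response))


cache = SemanticCache()
cache.put("What is the capital of France?", "Paris")
print(cache.get("what is the capital of France?"))   # Paris
print(cache.get("What is the capital of Germany?"))  # None (similarity 5/6 is below 0.9)
```

The Germany query shows why the threshold matters: with a weak embedding, two prompts that differ in one crucial word can still look very similar, so a threshold set too low would serve a wrong cached answer.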
Monitoring
| Category | Tool | Remarks |
|---|---|---|
| Observability | openllmetry, phoenix, logfire | |
| Guardrails | giskard, langkit, garak, deepchecks, nemo-guardrails | |
| | rebuff | Prompt injection detection |
| | uqlm | Uncertainty quantification |
| Drift Detection | ft-drift | Detect drift in OpenAI messages |
| AI Detection | binoculars | |