Know-how for running LLMs on your own hardware: local inference with open-source models as an alternative to cloud APIs.
Advantages: full control over your data, no per-token API costs at high volume, and easier GDPR compliance, since data never leaves your infrastructure. Hands-on experience with various model sizes, quantization, GPU requirements, and integration into existing applications.
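The memory math behind quantization is simple enough to sketch. The helper below is illustrative only (it is not part of any tool mentioned here) and estimates weight memory alone; real deployments also need room for the KV cache and runtime overhead.

```python
# Rough sizing: model weights need (parameters × bits per weight) / 8 bytes.
# Illustrative helper, ignores KV cache and runtime overhead.

def weight_memory_gb(n_params: float, bits_per_weight: int) -> float:
    """Approximate memory for model weights alone, in GB (1 GB = 1e9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9

# A 7B model at 4-bit quantization needs roughly 3.5 GB for weights,
# so it fits on an 8 GB GPU with headroom; at 16-bit it needs ~14 GB.
print(f"{weight_memory_gb(7e9, 4):.1f} GB")
print(f"{weight_memory_gb(7e9, 16):.1f} GB")
```

This back-of-the-envelope number is usually the first input to both model selection and hardware sizing.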
Components
- Inference — Ollama, llama.cpp, vLLM — from quick local setup to high-performance production serving on macOS and Linux
- Security & access control — API gateway in front of the inference server for authentication, rate limiting, and routing
- Model selection — Choosing the right model for the use case: small & fast for classification, large & capable for generation
- Hardware consulting — Advice on GPU, RAM, and infrastructure requirements depending on model size and throughput needs
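Integration into an existing application can be as small as one HTTP call. The sketch below targets Ollama's local `/api/generate` endpoint on its default port, using only the standard library; the model name `llama3` is just an example and must be pulled beforehand.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint

def build_request(model: str, prompt: str) -> dict:
    """Payload for Ollama's /api/generate endpoint; streaming is disabled
    so the full completion arrives as a single JSON response."""
    return {"model": model, "prompt": prompt, "stream": False}

def generate(model: str, prompt: str) -> str:
    """Send a prompt to the local Ollama server and return the completion text."""
    data = json.dumps(build_request(model, prompt)).encode()
    req = urllib.request.Request(
        OLLAMA_URL, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

if __name__ == "__main__":
    # Requires a running Ollama instance with the model available,
    # e.g. after `ollama pull llama3`.
    print(generate("llama3", "Summarize: local inference keeps data on-premise."))
```

Because the call is plain HTTP on localhost, swapping in another backend later (e.g. a vLLM server) mostly means changing the URL and payload shape, not the application code around it.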
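The rate-limiting piece of the gateway component can be sketched as a token bucket, the usual algorithm for allowing short bursts while capping sustained throughput. This is a minimal illustration under assumed parameters, not the implementation of any particular gateway product.

```python
import time

class TokenBucket:
    """Minimal token-bucket rate limiter: allows a burst of `capacity`
    requests, refilled continuously at `rate` tokens per second."""

    def __init__(self, capacity: int, rate: float):
        self.capacity = capacity
        self.rate = rate
        self.tokens = float(capacity)
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

# One bucket per API key: this client may burst 3 requests, then ~1 per second.
bucket = TokenBucket(capacity=3, rate=1.0)
print([bucket.allow() for _ in range(4)])  # first three allowed, fourth denied
```

In front of an inference server this matters more than for typical REST APIs: a single uncapped client can monopolize the GPU for everyone else.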