Janis Iranee

Private AI systems — self-hosted, controlled, compliant

Use LLMs on internal knowledge without sending data to external APIs.

  • Keep sensitive data internal (on-prem / VPC / local)
  • Turn documents into usable knowledge (search + Q&A with citations)
  • Choose the right approach: RAG, LoRA, or fine-tuning — with evaluation

Let's talk about your private AI use case

What this is

A local AI system can summarize, answer questions, and help teams find knowledge. It runs on controlled infrastructure and needs governance and evaluation to be trustworthy.

When to use what

  • RAG when you need up-to-date knowledge from documents
  • LoRA/partial fine-tuning when you need style/format/domain adaptation
  • Both when you need grounded answers + tailored behavior

Who it’s for

  • SMEs with sensitive documents (finance/legal/HR)
  • Teams that can’t use external LLM APIs for compliance reasons
  • Agencies building AI features for clients with privacy requirements
  • Internal teams wanting a knowledge assistant for support/sales/ops

Typical engagements

Local AI Feasibility & Prototype (2–4 weeks)

  • Select use case, scope, success criteria
  • Minimal RAG prototype on a sample document set
  • Evaluation checklist + next steps

Production Self-Hosted RAG (4–10 weeks)

  • Ingestion pipeline, chunking, embeddings, indexing
  • Permissions / access model concept
  • Evaluation harness (regression tests)
  • Monitoring (latency, retrieval quality proxies, failure logs)
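
The chunking step above can be sketched in a few lines. This is a minimal fixed-size splitter with overlap (so context isn't lost at chunk boundaries); the function name and parameters are illustrative, not from a specific engagement, and production pipelines usually split on document structure rather than raw characters:

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 100) -> list[str]:
    """Split text into overlapping chunks for embedding and indexing."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    chunks = []
    step = chunk_size - overlap  # each chunk repeats the last `overlap` chars
    for start in range(0, len(text), step):
        chunk = text[start:start + chunk_size]
        if chunk:
            chunks.append(chunk)
        if start + chunk_size >= len(text):
            break
    return chunks
```

In practice each chunk would also carry metadata (source file, section, permissions) so retrieved passages can be cited and access-controlled.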

Model Adaptation (LoRA / partial fine-tuning) (2–8 weeks)

  • Dataset design (instructions, examples)
  • LoRA/QLoRA experiments
  • Evaluation against baseline
  • Deployment strategy and rollback
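
The core idea behind the LoRA experiments above fits in a few lines: instead of updating a full weight matrix W, train a small low-rank pair A, B and add their scaled product's output to the frozen layer. A pure-Python illustration (shapes and the alpha/r scaling follow the LoRA formulation; variable names are mine):

```python
def matvec(M, x):
    # Plain-Python matrix-vector product.
    return [sum(m_ij * x_j for m_ij, x_j in zip(row, x)) for row in M]

def lora_forward(W, A, B, x, alpha=16, r=2):
    """y = W x + (alpha / r) * B (A x).
    W (d x k) stays frozen; only A (r x k) and B (d x r) are trained,
    with rank r much smaller than d and k."""
    base = matvec(W, x)              # frozen path
    delta = matvec(B, matvec(A, x))  # trainable low-rank path
    scale = alpha / r
    return [b + scale * d for b, d in zip(base, delta)]
```

With B initialized to zero, the adapted model starts out identical to the base model, which is what makes rollback (ship or drop the small A/B weights) straightforward.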

Example tasks

  • Internal “document assistant” with citations and access control
  • Summarize long PDFs and produce structured outputs
  • Draft internal emails/FAQs based on company docs
  • Build a knowledge base that stays private
  • Tailor outputs to your company style (templates, formats)

Deliverables

  • Running system (prototype or production) + documentation
  • Evaluation harness + test set guidance
  • Deployment docs (how to run/update)
  • Handover workshop + maintenance plan options

For technical readers

  • Ingestion pipeline: parsing, chunking strategies, metadata
  • Embeddings and retrieval: vector store selection, indexing strategy
  • RAG prompt strategy with citations and traceability
  • Evaluation harness: golden set, regression tests, human review loop
  • LoRA/QLoRA / partial fine-tuning experiments
  • Quantization and performance optimization for local hardware
  • Governance: access control, audit logs, safe deployment patterns
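
A minimal version of the golden-set regression check listed above (illustrative: a real harness would call the RAG system instead of a stubbed `answer_fn`, and would use task-appropriate scoring rather than exact keyword matching):

```python
def run_regression(golden_set, answer_fn, min_pass_rate=0.9):
    """Answer each golden question, check for required keywords,
    and fail the suite if the pass rate drops below the threshold."""
    results = []
    for case in golden_set:
        answer = answer_fn(case["question"]).lower()
        passed = all(kw.lower() in answer for kw in case["must_contain"])
        results.append({"question": case["question"], "passed": passed})
    pass_rate = sum(r["passed"] for r in results) / len(results)
    return {"pass_rate": pass_rate,
            "ok": pass_rate >= min_pass_rate,
            "results": results}
```

Run against every prompt or retrieval change, this catches regressions before they reach users; the human review loop then refines the golden set itself.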

Why I’m good at this

  • Hands-on experience building local workflows with LLaMA/Mistral/BERT-style models
  • Built RAG pipelines (embeddings, indexing, retrieval) for offline/internal use
  • Practical understanding of performance and deployment constraints on local hardware
  • Strong focus on governance, evaluation, and maintainability

How I work

  • Start with a well-scoped use case and a small document set
  • Define what “good” means (accuracy, citations, safety, latency)
  • Iterate quickly: prototype → evaluation → production hardening
  • Deliver a maintainable system with docs and handover

Get in touch

Have a project in mind? I typically respond within 1–2 business days.