I’ve been wrestling with a problem that keeps me up at night: How can I use powerful AI tools without handing over my most sensitive data to tech giants? The more I used ChatGPT, Claude, and other cloud AI services, the more uncomfortable I became. Every document I uploaded was being processed on someone else’s servers.
Motivation: Why I Had to Take Control Back
This whole project started from frustration. I was working on confidential business documents and realized I couldn’t use any of the AI tools I’d grown to depend on. The vendor lock-in issue became real – each provider had its own API, pricing, and limitations. I felt trapped, and my monthly AI bills kept climbing.
I started thinking: what if there was a way to get the best of both worlds? Keep the powerful AI capabilities but run everything on my own infrastructure. That’s when I discovered LibreChat and decided to build something that would give me complete control – and create a blueprint that other companies facing similar challenges could follow.
What I Built: A Blueprint for Private AI
The Private AI project became my answer to this dilemma. I built a self-hosted LibreChat setup on AWS that gives me access to multiple AI models while keeping data under my control. LibreChat is a pretty cool open-source project, a ChatGPT-style chat platform with a mature, enterprise-grade foundation that enables full privacy and rich configurability.
What started as a personal solution quickly became something bigger – a repeatable blueprint for any organization struggling with the same privacy-versus-capability trade-offs.
Here’s what I ended up with: LibreChat running on my own server, with a web interface that rivals ChatGPT. It lets you switch seamlessly between local models, cloud models from AWS Bedrock, and public AI providers like Anthropic or OpenAI. When working with sensitive documents, everything stays local. When you need the power of Claude or GPT-4, you can use those too.
The setup includes document processing where you can upload PDFs and chat with them without files ever leaving your server. Everything runs in Docker containers with proper user management – perfect for teams that need to collaborate while maintaining data sovereignty.
The Interface: What It Actually Looks Like
LibreChat gives me a clean, familiar interface that feels just like using ChatGPT or Claude. I can select different AI models from a dropdown, create custom agents with specific personalities, and manage all my conversations in one place.
What I love most is the flexibility. In the morning I might use a local Ollama model for brainstorming (keeping everything private), then switch to Claude for complex analysis, then back to a local model for document processing. The interface makes it seamless.
The document upload feature was a game-changer for me. I can drag and drop PDFs, and the system creates embeddings locally on my server. Then I can ask questions about the content without the document ever touching external servers. This was exactly what I needed for working with confidential materials.
The Technical Setup: Simpler Than You’d Think
I built this on AWS using a pretty straightforward architecture. Everything runs in Docker containers on a single EC2 instance, which makes it much easier to manage than I initially expected.
The AWS setup is clean – one EC2 instance with proper security groups, an elastic IP so the address doesn’t change, and IAM roles that let the server talk to AWS Bedrock models. I spent way too much time initially trying to overcomplicate the networking, but the simple approach worked best.
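To give a feel for the moving parts, here’s a minimal sketch of the same wiring done by hand with the AWS CLI – the IDs are placeholders, and my actual setup drives all of this through Terraform:

```bash
# Open HTTPS to the world and SSH only from one admin IP (IDs below are placeholders)
aws ec2 authorize-security-group-ingress --group-id sg-0123456789abcdef0 \
  --protocol tcp --port 443 --cidr 0.0.0.0/0
aws ec2 authorize-security-group-ingress --group-id sg-0123456789abcdef0 \
  --protocol tcp --port 22 --cidr 203.0.113.10/32

# Allocate an Elastic IP and pin it to the instance so the address never changes
aws ec2 allocate-address --domain vpc
aws ec2 associate-address --instance-id i-0123456789abcdef0 \
  --allocation-id eipalloc-0123456789abcdef0

# Run this on the instance itself: if the attached IAM role is right,
# it lists the Bedrock models the server is allowed to call
aws bedrock list-foundation-models --region us-east-1 \
  --query 'modelSummaries[].modelId'
```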
The Docker setup runs LibreChat as the main app, MongoDB for storing conversations, and nginx as a reverse proxy with SSL. When I want to use local AI models, I add Ollama containers that can use the GPU. Everything talks to each other through Docker networks, which keeps it secure and organized.
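For orientation, here’s roughly what that looks like from the command line; the network, volume, and container names are illustrative and depend on how the compose project is set up:

```bash
# Bring up the core stack (LibreChat, MongoDB, nginx) defined in docker-compose.yml
docker compose up -d

# Optionally add a GPU-backed Ollama container on the same Docker network
# (network and volume names here are assumptions, not the exact ones from my setup)
docker run -d --gpus all --name ollama \
  --network librechat_default \
  -v ollama:/root/.ollama -p 11434:11434 ollama/ollama

# Sanity check: everything running and attached to the shared network?
docker compose ps
docker network inspect librechat_default \
  --format '{{range .Containers}}{{.Name}} {{end}}'
```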
Scaling Up: Learning From My Mistakes
One of my biggest lessons was about server sizing. I started with a tiny t3.medium instance thinking it would be enough. Wrong. Once I tried running local AI models, everything crawled to a halt.
| Instance Type | vCPUs | Memory | GPU | What I Used It For |
|---|---|---|---|---|
| t3.medium | 2 | 4 GB | None | Basic testing, cloud models only |
| g4dn.xlarge | 4 | 16 GB | NVIDIA T4 (16 GB) | Local models, document processing |
| g6.12xlarge | 48 | 192 GB | 4x NVIDIA L4 (96 GB total) | Heavy workloads, multiple users |
The t3.medium was fine for the web interface and using cloud models, but forget about running anything locally. When I upgraded to g4dn.xlarge with a GPU, suddenly I could run small language models and process documents locally. The difference was night and day.
For the final testing phase, I splurged on a g6.12xlarge instance. This beast could handle multiple large models simultaneously and felt like having a proper AI workstation in the cloud. The cost made my eyes water, but for serious work it’s worth it.
My advice: start small to test everything, then size up based on what you actually need. The beauty of this approach is that upgrading is just a few Terraform commands.
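As a rough sketch (the variable name comes from my config and may differ in yours), an upgrade looks like this – keep in mind that changing the instance type means a short stop/start of the server:

```bash
# Move from CPU-only testing to a GPU instance by changing one variable
terraform plan  -var 'instance_type=g4dn.xlarge'
terraform apply -var 'instance_type=g4dn.xlarge'
```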
The jump to the g6.12xlarge was like going from a bicycle to a Ferrari. Four NVIDIA L4 GPUs with 96 GB of video memory meant I could run multiple large models at the same time without breaking a sweat. Suddenly I could have Claude-sized models running locally while still serving other users.
Here’s what I learned about performance: the tiny t3.medium is great for testing the web interface and trying cloud models, but don’t even think about local AI. The g4dn.xlarge is the sweet spot for getting started with local models – I could run 7B parameter models pretty smoothly. But if you want to run the really big models (13B+) with decent speed, you need something like the g6.12xlarge.
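To make the local-model side concrete, here’s roughly what running a 7B model through Ollama looks like on the g4dn.xlarge – the model tag is just an example, pick whatever fits your GPU:

```bash
# Pull and run a 7B model that fits comfortably on a single T4
ollama pull mistral:7b
ollama run mistral:7b "Summarize the key risks in this architecture in three bullets."

# Keep an eye on GPU memory while models load
nvidia-smi
```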
Live-Scripting: The Documentation That Actually Works
Here’s something I’m really excited about – the way I documented this entire project. I used what I call live-scripting, a documentation-oriented work style I invented years ago and have used in many customer projects. It produces a document where every single command can be executed directly. No more following tutorials only to find out step 3 doesn’t work anymore.
I wrote everything in Emacs Org-mode, which lets me embed executable code blocks right in the documentation. When I’m working through the deployment, I can literally run each command by pressing F4. But even if someone doesn’t use Emacs, they can copy and paste every command and it will work exactly as written.
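As a small illustration (not copied from the actual guide), an executable block in the Org-mode document looks something like this – pressing the run key executes it in place, and the same command copies cleanly into any terminal:

```org
#+begin_src bash :results output
# Quick health check between deployment phases: are all containers up?
docker compose ps
#+end_src
```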
This approach saved me countless hours of debugging outdated instructions. Traditional documentation gets stale fast – APIs change, software versions update, and suddenly nothing works. With live-scripting, my documentation stays current because I test it every time I use it.
The whole deployment is broken into seven clear phases, from setting up the basic infrastructure to adding advanced AI features. Each step includes both the commands to run and explanations of what’s happening and why. It’s like having a knowledgeable colleague walking me through the process.
Conclusion: Finally, AI on My Terms
Building this private AI assistant solved my immediate problem, but it became something bigger – a proven blueprint that any company can follow. What started as personal frustration turned into a repeatable deployment strategy for organizations facing the same privacy-versus-capability dilemma.
The best part? Companies don’t have to choose anymore. Sensitive work stays on local models, complex analysis can still use cloud AI when needed – but it’s your infrastructure, your choice, your control.
The live-scripting documentation makes this blueprint truly shareable. Every command works exactly as written, which means other teams can deploy this without the usual deployment headaches.
For companies wanting powerful AI without surrendering data sovereignty – this approach works. The technology is finally at the point where running your own AI isn’t just possible for tech giants, it’s practical for any organization willing to invest in their data independence.
Resources and Links
Project Documentation
- Complete Deployment Guide (PDF) – The full live-scripted documentation with all commands and explanations needed to deploy this setup from scratch. Every command is tested and executable.
GitHub Repositories
- Private AI Project – My complete implementation including Terraform configs, Docker setup, and all the infrastructure code used in this deployment.
- LibreChat Official Repository – The innovative open-source project that makes this all possible. A powerful, ChatGPT-like interface that you can run on your own infrastructure with support for multiple AI providers.