🧬 How to Build a Complete Local Medical AI System with Proxmox and Ollama

One of the most interesting AI projects you can implement at home is a **complete local medical expert system**, capable of functioning offline, without content restrictions, and without dependence on external services. In this complete guide, you will learn how to create such a system using **Proxmox**, **Ollama**, and an AI medical model optimized for local use.


🎯 Final Goal

We will build a **medical AI assistant** that runs entirely on your local infrastructure, with no cloud connection, and that can later be integrated with an existing website (for example, via AI Engine).
The complete flow will look like this:

**Website → AI Engine → Ollama Local → Medical Response**


🧱 Chapter 1: Basic Proxmox Configuration

Minimum Hardware Requirements

  • 8 GB RAM (16 GB recommended)
  • 50 GB free space
  • Root/SSH access
  • Proxmox installed on the host server

Checking Available Resources

In the Proxmox web interface:

  • Monitor CPU and RAM
  • Check free storage space
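
The same checks can be performed from the Proxmox host shell with standard tools:

free -h       # total and available RAM
df -h         # free storage space
nproc         # number of CPU cores
pveversion    # installed Proxmox VE version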

📦 Chapter 2: Creating the LXC Container

We recommend an **LXC Container** instead of a virtual machine, as it is lighter and faster.

Recommended Configuration

  • Template: ubuntu-22.04-standard_22.04-1_amd64.tar.zst
  • RAM: 4 GB + 512 MB swap
  • CPU: 2 cores
  • Storage: 40 GB (20 GB root + 20 GB for models)
  • Static IP: 192.168.0.27
  • Firewall enabled

Network Configuration

  • Bridge: vmbr0
  • Gateway: 192.168.0.1
  • DNS: 192.168.0.1 or 8.8.8.8

After completion, click **Finish** and check the settings.
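
The same container can also be created from the Proxmox host shell with pct. This is only a sketch: it assumes the container ID 200, a storage pool named local-lvm, a template already downloaded to the local storage, and the example hostname medical-ai; adjust all of these to your environment:

pct create 200 local:vztmpl/ubuntu-22.04-standard_22.04-1_amd64.tar.zst \
  --hostname medical-ai \
  --memory 4096 --swap 512 --cores 2 \
  --rootfs local-lvm:20 \
  --mp0 local-lvm:20,mp=/var/lib/ollama \
  --net0 name=eth0,bridge=vmbr0,firewall=1,ip=192.168.0.27/24,gw=192.168.0.1 \
  --nameserver 8.8.8.8 \
  --unprivileged 1 --start 1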


🔑 Chapter 3: Configuring SSH in the Container
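
The commands in this chapter are run inside the container. Until SSH is working, you can open a shell in it directly from the Proxmox host (container ID 200 is used here as an example):

pct enter 200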

SSH Installation and Activation

apt update && apt upgrade -y
apt install openssh-server sudo curl wget -y
systemctl enable ssh
systemctl start ssh
passwd root

Allowing Root Login

Edit the file:

nano /etc/ssh/sshd_config

Find the PermitRootLogin line and set it to (uncommenting it if necessary):

PermitRootLogin yes

Then restart the service:

systemctl restart ssh

Test:

ssh root@192.168.0.27
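
Optionally, if you already have an SSH key pair on your workstation, enable key-based login so you do not have to type the root password each time:

ssh-copy-id root@192.168.0.27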

🦙 Chapter 4: Installing Ollama

Configuring Dedicated Storage for Models

Ollama reads its model directory from the OLLAMA_MODELS environment variable (OLLAMA_HOME is not used), so point it at the dedicated volume:

mkdir -p /var/lib/ollama/models
export OLLAMA_MODELS=/var/lib/ollama/models
echo 'export OLLAMA_MODELS=/var/lib/ollama/models' >> ~/.bashrc
source ~/.bashrc

Note that the systemd service installed below does not read ~/.bashrc; set the same variable in the Ollama service unit as well (see Chapter 9) and make sure the directory is writable by the ollama user created during installation.

Ollama Installation

curl -fsSL https://ollama.ai/install.sh | sh
systemctl enable ollama
systemctl start ollama
ollama --version
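
To confirm that the Ollama API is answering locally on its default port (11434), list the installed models; the list will still be empty at this stage:

curl http://localhost:11434/api/tags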

🧠 Chapter 5: AI Models

Ollama allows downloading and running open-source models locally. For this project, we will use **Llama 3.2 (3B)** — powerful enough for basic medical inference.

Model Download

ollama pull llama3.2:3b

Installation Verification

ollama list

Quick Test

ollama run llama3.2:3b "Explain diabetic polyneuropathy"

The model will respond completely offline, directly from your container.
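
The same model can also be queried over Ollama's local HTTP API, which is exactly what the web interface in Chapter 7 will call:

curl http://localhost:11434/api/generate -d '{
  "model": "llama3.2:3b",
  "prompt": "Explain diabetic polyneuropathy",
  "stream": false
}'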

🧬 Chapter 6: Recommended AI Models for Ollama (Medical and General)

Ollama supports a wide range of models, both general (for conversation, programming, or text analysis) and specialized (medical, scientific, or for summarization).
Below is a comparative table to help you choose the right model based on available resources and application purpose.

Comparative Table of Medical AI Models

| Model        | Type    | Parameters | Minimum RAM | GPU Requirements | Size    | Strengths                     | Recommended Use                       |
|--------------|---------|------------|-------------|------------------|---------|-------------------------------|---------------------------------------|
| Llama 3.2:3b | General | 3B         | 4 GB        | Optional         | ~2.8 GB | Fast and compact              | General chat, basic medical AI        |
| Llama 3.1:8b | General | 8B         | 8 GB        | Recommended      | ~7.8 GB | Coherent and clear analysis   | Complex medical explanations          |
| Mistral 7B   | General | 7B         | 8 GB        | Recommended      | ~6.9 GB | GPT-3.5 quality, speed        | General conversations                 |
| Gemma 7B     | General | 7B         | 8 GB        | Recommended      | ~6.5 GB | Multilingual, well-trained    | Translations and medical explanations |
| MedLlama2:7b | Medical | 7B         | 8 GB        | Optional         | ~6.8 GB | Specialized clinical language | Differential diagnosis                |
| BioMedLM     | Medical | 2.7B       | 4 GB        | Optional         | ~2.5 GB | Based on PubMed articles      |                                       |

🔍 Practical Recommendations

  • For a Proxmox LXC with 4 GB RAM, we recommend Llama 3.2:3b or BioMedLM.
  • For a server with a dedicated GPU (RTX 3060 / 8 GB), you can run MedLlama2:7b or Gemma 7B without issues.
  • If you want responses in mixed Romanian/English natural language, Phi-3 Mini is the most balanced lightweight model.
  • Larger models (>8B) can only be partially loaded into RAM and require fast swap or a GPU.
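
To download any model from the table, use ollama pull with its tag. The tags below are the ones commonly published in the Ollama library; verify availability before pulling (BioMedLM, for example, is generally not in the library and has to be imported from a GGUF file via a Modelfile):

ollama pull llama3.2:3b
ollama pull llama3.1:8b
ollama pull mistral:7b
ollama pull gemma:7b
ollama pull medllama2:7b
ollama pull phi3:mini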




🌐 Chapter 7: Local Web Interface (Simple Web UI)

You can create a minimal web interface in Flask for quick testing of the system.

Create the file simple_webui.py:

from flask import Flask, request, jsonify, render_template_string
import requests

app = Flask(__name__)

HTML = '''
<!DOCTYPE html>
<html>
<body>
    <h2>🦙 Ollama Medical Chat</h2>
    <input id="input" placeholder="Ask the medical model..." style="width:300px">
    <button onclick="ask()">Send</button>
    <div id="response" style="margin-top:20px; padding:10px; border:1px solid #ccc"></div>

    <script>
        async function ask() {
            const input = document.getElementById('input').value;
            document.getElementById('response').innerText = "Processing...";
            const response = await fetch('/chat', {
                method: 'POST',
                headers: {'Content-Type': 'application/json'},
                body: JSON.stringify({prompt: input})
            });
            const data = await response.json();
            document.getElementById('response').innerText = data.response;
        }
    </script>
</body>
</html>
'''

@app.route('/')
def home():
    return render_template_string(HTML)

@app.route('/chat', methods=['POST'])
def chat():
    data = request.json
    response = requests.post('http://localhost:11434/api/generate', json={
        "model": "llama3.2:3b",
        "prompt": data['prompt'],
        "stream": False
    })
    return jsonify(response.json())

if __name__ == '__main__':
    app.run(host='0.0.0.0', port=3000, debug=True)

Flask installation and startup:

apt install python3-pip -y
pip3 install flask requests
python3 simple_webui.py

Then access in the browser:

http://192.168.0.27:3000
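
You can also exercise the /chat endpoint from the command line, which is useful later when debugging the AI Engine integration:

curl -X POST http://192.168.0.27:3000/chat \
  -H "Content-Type: application/json" \
  -d '{"prompt": "What are the symptoms of vitamin B12 deficiency?"}'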

🩺 Chapter 8: Testing and Validation

Test the entire flow:

  • Website → AI Engine → Ollama Local
  • Complex medical questions (e.g., *treatment for polyneuropathy*, *botulinum toxin use*)
  • Check that the responses are not censored or incomplete
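
To verify the first hop of the flow, check from the machine hosting the website that the Ollama API is reachable over the LAN. Note that this only works after the OLLAMA_HOST=0.0.0.0 change from Chapter 9; until then Ollama listens on localhost only:

curl http://192.168.0.27:11434/api/tags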

🟢 Chapter 9: Ollama and WebUI Permanent and Externally Visible

To keep Ollama and the WebUI from stopping when the terminal session is closed, we create systemd services for both:

Ollama Service

Create /etc/systemd/system/ollama.service:

[Unit]
Description=Ollama Service
After=network-online.target
Wants=network-online.target

[Service]
Environment="OLLAMA_HOST=0.0.0.0:11434"
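# Optional, matching Chapter 4: keep models on the dedicated volume
# (make sure /var/lib/ollama is owned by the ollama user)
Environment="OLLAMA_MODELS=/var/lib/ollama/models"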
User=ollama
Group=ollama
ExecStart=/usr/local/bin/ollama serve
Restart=always
RestartSec=3
Environment="PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin"

[Install]
WantedBy=multi-user.target

Enable and start:

sudo systemctl daemon-reload
sudo systemctl enable ollama
sudo systemctl restart ollama
ss -tuln | grep 11434

✅ **Ollama is now listening on all interfaces and starts automatically on boot.**
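
If the port does not show up, inspect the service logs:

journalctl -u ollama -e --no-pager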

WebUI Service

Create /etc/systemd/system/ollama-webui.service:

[Unit]
Description=Ollama Simple WebUI
After=network.target ollama.service
Requires=ollama.service

[Service]
Type=simple
WorkingDirectory=/root
ExecStart=/usr/bin/python3 /root/simple_webui.py
Restart=always
RestartSec=5
User=root
Environment=PATH=/usr/bin:/usr/local/bin

[Install]
WantedBy=multi-user.target

Enable and start:

sudo systemctl daemon-reload
sudo systemctl enable ollama-webui
sudo systemctl start ollama-webui
sudo systemctl status ollama-webui

**Now your WebUI will start automatically after boot and connect directly to Ollama.**
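
If the interface does not load, check its logs and confirm that port 3000 is listening:

journalctl -u ollama-webui -e --no-pager
ss -tuln | grep 3000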

🩺 Chapter 10: Monitoring and Maintenance

**Resource Monitoring:**

htop
df -h
ollama ps

**Recommended Backup:**

  • Complete snapshot of the container in Proxmox
  • Periodic backup for /var/lib/ollama
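
A sketch of both backup steps, assuming the container ID is 200 and the backup storage is named local (adjust to your setup):

# On the Proxmox host: full snapshot-mode backup of the container
vzdump 200 --mode snapshot --compress zstd --storage local

# Inside the container: archive the model directory
tar czf /root/ollama-models-$(date +%F).tar.gz /var/lib/ollama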

🧭 **Conclusion**

With a correctly configured Proxmox container, Ollama running local AI models, and the WebUI managed by systemd, you now have a complete, autonomous medical expert system: fully offline and ready for integration with websites or other local applications.

**This setup guarantees:**

  • Stability and automatic startup
  • External accessibility (for example, through a Cloudflare Tunnel)
  • Full capacity for testing and AI development without restrictions