How to Host Milvus Vector Database on a Dedicated Server

Everyone is building AI applications right now. But if you’ve ever deployed a RAG (Retrieval-Augmented Generation) app using cloud vector databases like Pinecone or Weaviate Cloud, you’ve likely run into two massive walls: cost and data privacy.

As your dataset grows from thousands to millions of vectors, those cloud bills start looking like a mortgage payment. Plus, do you really want to send your sensitive company data, financial records, legal docs, or proprietary code to a public cloud API?

The solution is simple: Bring it home.

In this guide, I’m going to walk you through hosting Milvus, the world’s most advanced open-source vector database, right on a dedicated server. We are going to build a high-performance, private, and cost-effective infrastructure for your AI.

Let’s get technical.

Why Bare Metal for Vector Search?

Before we type a single command, you need to understand why we are doing this. Vector searches are computationally expensive. They require:

  • Massive RAM: Vector indexes (like HNSW) live in memory for speed.

  • Fast Storage: When RAM fills up, you need NVMe SSDs to swap data instantly.

  • Dedicated CPU Cycles: Indexing millions of vectors will choke a shared vCPU on a standard VPS.

A dedicated server gives you raw, unshared power. No "noisy neighbors" slowing down your AI's response time.
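To see why RAM dominates the spec sheet, a quick back-of-the-envelope calculation helps. An HNSW index keeps every float32 vector in memory plus graph overhead; the 1.5x multiplier below is a rough working assumption for illustration, not a Milvus-published figure (actual usage varies with M, efConstruction, and segment overhead):

```python
# Rough sketch of in-memory index sizing.
# Assumptions: float32 vectors (4 bytes each) and ~1.5x HNSW graph
# overhead on top of the raw vector data -- an estimate, not exact.
def estimate_index_ram_gb(num_vectors: int, dim: int,
                          bytes_per_float: int = 4,
                          hnsw_overhead: float = 1.5) -> float:
    raw_bytes = num_vectors * dim * bytes_per_float
    return raw_bytes * hnsw_overhead / (1024 ** 3)

# 10 million 768-dim embeddings (a common sentence-transformer size)
print(f"{estimate_index_ram_gb(10_000_000, 768):.1f} GB")  # prints "42.9 GB"
```

That single index already eats most of a 64GB box, which is why "32GB minimum" is a floor, not a target.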

The Hardware You Need

For a production-ready Milvus setup, don't skimp on RAM. Here is my recommended baseline:

  • CPU: At least 8 Cores (Intel Xeon or AMD EPYC ideally).

  • RAM: 32GB minimum (64GB+ recommended for datasets over 10M vectors).

  • Storage: Enterprise NVMe SSD (Avoid HDDs; they are too slow for vector retrieval).

  • OS: Ubuntu 24.04 LTS or Debian 12.

Pro Tip: If you are looking for a server that handles this workload without breaking the bank, check out the High-RAM instances at BytesRack. We tune our hardware specifically for high-throughput IO tasks like this.
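Before going further, it is worth confirming your server actually meets that baseline. A quick audit, run over SSH (device names in the `lsblk` output will differ per machine):

```shell
# Quick audit of the hardware baseline above (run on the server).
echo "CPU cores: $(nproc)"
echo "Total RAM: $(free -g | awk '/^Mem:/ {print $2}') GB"
# ROTA column: 0 = SSD/NVMe, 1 = rotational HDD
lsblk -d -o NAME,ROTA,SIZE
```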

Step 1: Preparing the Environment

We will use Docker Compose to deploy Milvus. It’s the cleanest way to manage the database along with its dependencies (etcd for metadata and MinIO for object storage) without polluting your host OS.

First, SSH into your server and update your package lists.

bash
sudo apt update && sudo apt upgrade -y

Now, let's install the Docker engine. If you already have Docker, skip this.

bash
# Install required certificates
sudo apt install -y ca-certificates curl gnupg

# Add Docker's official GPG key
sudo install -m 0755 -d /etc/apt/keyrings
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /etc/apt/keyrings/docker.gpg
sudo chmod a+r /etc/apt/keyrings/docker.gpg

# Add the repository
echo \
  "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.gpg] https://download.docker.com/linux/ubuntu \
  $(lsb_release -cs) stable" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null

# Install Docker and Compose
sudo apt update
sudo apt install -y docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin

Verify that Docker is running:

bash
sudo docker ps

Step 2: Configuring Milvus (Standalone Mode)

Milvus runs in two modes: Standalone (everything in one container) and Cluster (distributed across multiple nodes). For 99% of use cases—including serving RAG apps to thousands of users—Standalone mode on a powerful dedicated server is more than enough.

Create a directory for your project:

bash
mkdir milvus-docker && cd milvus-docker

Now, download the official Docker Compose configuration file for Milvus.

bash
wget https://github.com/milvus-io/milvus/releases/download/v2.4.0/milvus-standalone-docker-compose.yml -O docker-compose.yml

Note: I am using version v2.4.0 here. Always check for the latest stable release if you are reading this later.

The Secret Sauce: Optimization

Don't just run the default file. We want to ensure Milvus has persistent storage so you don't lose data if you restart the container.

Open the file:

bash
nano docker-compose.yml

Check the volumes section. In the official compose file, host paths are controlled by the DOCKER_VOLUME_DIRECTORY variable and default to a volumes/ folder next to the file. On a server with a secondary NVMe drive mounted (e.g., at /mnt/nvme), point the volume mappings there for maximum speed.

Example:

yaml
# standalone service
volumes:
  - /mnt/nvme/milvus/db:/var/lib/milvus
# etcd service
volumes:
  - /mnt/nvme/milvus/etcd:/etcd
# minio service
volumes:
  - /mnt/nvme/milvus/minio:/minio_data

Save and exit (Ctrl+O, Enter, Ctrl+X).

Step 3: Launching the Vector Database

This is the easy part. Spin it up.

bash
sudo docker compose up -d

Docker will pull the images and start three containers:

  • milvus-standalone: The core vector engine.

  • milvus-etcd: Stores metadata and coordinates processes.

  • milvus-minio: Stores the actual data logs and index files.

Check if everything is healthy:

bash
sudo docker compose ps

You should see all three with a status of Up.

Step 4: Installing "Attu" (The Management GUI)

Managing a vector DB via command line is a pain. Attu is an amazing open-source administration GUI for Milvus. Let's add it to our stack.

Run this command to start Attu on port 8000:

bash
sudo docker run -d --name attu \
  -p 8000:3000 \
  -e MILVUS_URL=YOUR_SERVER_IP:19530 \
  zilliz/attu:latest

(Replace YOUR_SERVER_IP with your actual server IP).

Now, open your browser and go to http://YOUR_SERVER_IP:8000. You will see a dashboard where you can view collections, check vector counts, and monitor query performance.

Step 5: Testing the Connection (The "Hello World" of AI)

Let's prove this works. We will use a simple Python script to connect to your new server, create a collection, and insert some random vectors.

First, install the Python SDK on your local machine (not the server):

bash
pip install pymilvus

Create a file named test_milvus.py:

python
from pymilvus import connections, FieldSchema, CollectionSchema, DataType, Collection
import random

# 1. Connect to your server
connections.connect("default", host="YOUR_SERVER_IP", port="19530")

# 2. Define a schema
fields = [
    FieldSchema(name="pk", dtype=DataType.INT64, is_primary=True, auto_id=False),
    FieldSchema(name="embeddings", dtype=DataType.FLOAT_VECTOR, dim=128)
]
schema = CollectionSchema(fields, "Hello BytesRack AI")

# 3. Create collection
hello_milvus = Collection("hello_milvus", schema)

# 4. Insert dummy data
entities = [
    [i for i in range(1000)],                                     # pk
    [[random.random() for _ in range(128)] for _ in range(1000)]  # vectors
]
insert_result = hello_milvus.insert(entities)
hello_milvus.flush()

print(f"Success! Inserted {hello_milvus.num_entities} vectors into your private server.")

Run it. If you see the success message, congratulations! You just bypassed the cloud giants and built your own AI infrastructure.
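The natural next step is a similarity search. Conceptually, with the L2 metric Milvus ranks stored vectors by squared Euclidean distance to your query; this standalone sketch mimics that ranking in plain Python (brute force, for illustration only — Milvus uses an ANN index precisely to avoid scanning every vector like this):

```python
import random

random.seed(42)

# Toy "collection": 1000 vectors of dim 128, keyed by primary key
vectors = {pk: [random.random() for _ in range(128)] for pk in range(1000)}

def l2_sq(a, b):
    # Squared Euclidean distance -- the default "L2" metric in Milvus
    return sum((x - y) ** 2 for x, y in zip(a, b))

def brute_force_search(query, top_k=3):
    # Rank every stored vector by distance to the query; an ANN index
    # approximates this ordering without a full scan
    scored = sorted(vectors.items(), key=lambda kv: l2_sq(query, kv[1]))
    return [(pk, l2_sq(query, vec)) for pk, vec in scored[:top_k]]

query = [random.random() for _ in range(128)]
for pk, dist in brute_force_search(query):
    print(f"pk={pk}  distance={dist:.2f}")
```

Swap a real embedding in for the random query and this is exactly the shape of result you get back from a Milvus search: primary keys plus distances, nearest first.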

Why This Matters for Your Business

By moving to a dedicated server, you have achieved three things:

  • Data Sovereignty: Your data never leaves a server you control.

  • Predictable Billing: Whether you run 10 queries or 10 million, your server cost stays the same.

  • Latency Reduction: Local network speeds on bare metal will always beat shared cloud API latency.

Ready to Scale?

If you are serious about AI, you need hardware that can keep up.

At BytesRack, we specialize in high-performance dedicated servers tailored for AI workloads. Whether you need massive RAM for vector storage or GPU power for inference, we have the metal you need to build the future.

👉 Explore High-Performance AI Servers at BytesRack Today

Discover BytesRack Dedicated Server Locations

BytesRack servers are available around the world, providing diverse options for hosting websites. Each region offers unique advantages, making it easier to choose a location that best suits your specific hosting needs.