部署本地LLM

本地大模型Docker部署指南：打造你的私人AI助手

前言

在AI技术日益普及的今天，拥有一个私人的AI助手不再是遥不可及的梦想。通过Docker部署本地大语言模型，我们可以在保护隐私的同时，享受AI带来的便利。本文将详细介绍如何使用Docker在本地部署大语言模型，并通过命令行进行交互，让你轻松拥有属于自己的AI对话伙伴。

为什么选择本地部署？

🔒 隐私保护

数据不出本地：所有对话内容都在本地处理
无网络依赖：离线环境下也能正常使用
完全控制：对数据处理过程有完全的掌控权

💰 成本优势

零API费用：无需支付云端API调用费用
长期经济：一次部署，长期使用
资源自控：根据需求灵活调整资源分配

⚡ 自主可控

定制化配置：可以根据需求调整模型参数
版本控制：可以选择和固定特定的模型版本
服务稳定：不受外部服务状态影响

环境准备

系统要求

最低配置

CPU: 4核心以上
内存: 16GB RAM
存储: 50GB可用空间
GPU: 可选，但推荐NVIDIA GPU

Docker环境验证

在开始之前，让我们确认Docker环境是否正常：

# 检查Docker版本
docker --version
# 预期输出: Docker version 24.0.x, build xxxxx

# 检查Docker服务状态
docker info
# 确认Docker daemon正在运行

# 测试Docker功能
docker run hello-world
# 应该看到 "Hello from Docker!" 消息

如果有NVIDIA GPU，还需要验证Docker对GPU的支持：

# 检查NVIDIA Docker支持
docker run --rm --gpus all nvidia/cuda:11.8-base-ubuntu20.04 nvidia-smi

方案一：使用Ollama进行部署

Ollama是目前最受欢迎的本地大模型管理工具，提供了简洁的Docker部署方案。

🚀 快速部署

1. 拉取Ollama Docker镜像

# 拉取最新的Ollama镜像
docker pull ollama/ollama:latest

# 查看镜像信息
docker images | grep ollama

2. 启动Ollama服务

# CPU版本部署
docker run -d \
  --name ollama-server \
  -p 11434:11434 \
  -v ollama-data:/root/.ollama \
  -v /tmp:/tmp \
  --restart unless-stopped \
  ollama/ollama

# GPU版本部署（推荐）
docker run -d \
  --name ollama-server \
  --gpus all \
  -p 11434:11434 \
  -v ollama-data:/root/.ollama \
  -v /tmp:/tmp \
  --restart unless-stopped \
  ollama/ollama

3. 验证服务状态

# 检查容器状态
docker ps | grep ollama

# 查看服务日志
docker logs ollama-server

# 测试API连接
curl http://localhost:11434/api/version

📥 下载和管理模型

进入容器环境

# 进入Ollama容器
docker exec -it ollama-server bash

下载常用模型

# 在容器内执行以下命令

# 下载Llama 3.1 8B模型（推荐入门）
ollama pull llama3.1:8b

# 下载DeepSeek Coder模型（代码生成专用）
ollama pull deepseek-coder:6.7b

# 下载Qwen2.5模型（中文友好）
ollama pull qwen2.5:7b

# 下载Gemma2模型（Google开源）
ollama pull gemma2:9b

# 查看已下载的模型
ollama list

模型信息展示

# 查看模型详细信息
$ ollama list
NAME                    ID              SIZE    MODIFIED        
llama3.1:8b            42182c40c896    4.7GB   2 minutes ago   
deepseek-coder:6.7b    8b09afb70a0e    3.8GB   5 minutes ago   
qwen2.5:7b             845dbda0c5b4    4.3GB   8 minutes ago

💬 命令行交互体验

1. 基础对话

# 启动与Llama3.1的对话
docker exec -it ollama-server ollama run llama3.1:8b

# 交互示例
>>> 你好，请介绍一下自己
我是Llama 3.1，一个由Meta开发的大型语言模型。我可以帮助您：

1. 回答各种问题
2. 协助创作和写作
3. 进行代码编程
4. 翻译和语言学习
5. 分析和总结文本

有什么我可以帮助您的吗？

>>> 用Python写一个计算斐波那契数列的函数

当然！这里是一个高效的斐波那契数列计算函数：

```python
def fibonacci(n):
    """
    计算斐波那契数列的第n项
    使用动态规划方法，时间复杂度O(n)，空间复杂度O(1)
    """
    if n <= 0:
        return 0
    elif n == 1:
        return 1
    
    a, b = 0, 1
    for i in range(2, n + 1):
        a, b = b, a + b
    
    return b

# 使用示例
for i in range(10):
    print(f"F({i}) = {fibonacci(i)}")

这个函数的优点：

时间复杂度：O(n)
空间复杂度：O(1)
避免了递归导致的重复计算

/bye

#### 2. 代码专家对话
```bash
# 使用DeepSeek Coder进行代码相关对话
docker exec -it ollama-server ollama run deepseek-coder:6.7b

# 代码生成示例
>>> 写一个FastAPI的用户认证中间件

```python
from fastapi import FastAPI, HTTPException, Depends, status
from fastapi.security import HTTPBearer, HTTPAuthorizationCredentials
from fastapi.middleware.base import BaseHTTPMiddleware
from fastapi.requests import Request
from fastapi.responses import JSONResponse
import jwt
from datetime import datetime, timedelta
import logging

# 配置
SECRET_KEY = "your-secret-key-here"
ALGORITHM = "HS256"
ACCESS_TOKEN_EXPIRE_MINUTES = 30

# 安全方案
security = HTTPBearer()

class AuthenticationMiddleware(BaseHTTPMiddleware):
    """用户认证中间件"""
    
    def __init__(self, app, exclude_paths=None):
        super().__init__(app)
        self.exclude_paths = exclude_paths or ["/docs", "/openapi.json", "/login", "/register"]
    
    async def dispatch(self, request: Request, call_next):
        # 检查是否为排除路径
        if request.url.path in self.exclude_paths:
            return await call_next(request)
        
        # 检查Authorization头
        auth_header = request.headers.get("Authorization")
        if not auth_header or not auth_header.startswith("Bearer "):
            return JSONResponse(
                status_code=status.HTTP_401_UNAUTHORIZED,
                content={"detail": "Missing or invalid authorization header"}
            )
        
        # 提取token
        token = auth_header.split(" ")[1]
        
        try:
            # 验证JWT token
            payload = jwt.decode(token, SECRET_KEY, algorithms=[ALGORITHM])
            user_id = payload.get("sub")
            
            if user_id is None:
                raise HTTPException(status_code=401, detail="Invalid token")
            
            # 将用户信息添加到请求状态
            request.state.user_id = user_id
            request.state.user_email = payload.get("email")
            
        except jwt.ExpiredSignatureError:
            return JSONResponse(
                status_code=status.HTTP_401_UNAUTHORIZED,
                content={"detail": "Token has expired"}
            )
        except jwt.JWTError:
            return JSONResponse(
                status_code=status.HTTP_401_UNAUTHORIZED,
                content={"detail": "Invalid token"}
            )
        
        return await call_next(request)

# 使用示例
app = FastAPI()

# 添加认证中间件
app.add_middleware(
    AuthenticationMiddleware,
    exclude_paths=["/docs", "/openapi.json", "/login", "/register", "/health"]
)

@app.get("/protected")
async def protected_route(request: Request):
    return {
        "message": "This is a protected route",
        "user_id": request.state.user_id,
        "user_email": request.state.user_email
    }

这个中间件有什么改进建议吗？

这个认证中间件已经很不错了，但还有一些改进空间：

缓存优化：

import redis
from functools import lru_cache

class AuthenticationMiddleware(BaseHTTPMiddleware):
    def __init__(self, app, exclude_paths=None, redis_client=None):
        super().__init__(app)
        self.exclude_paths = exclude_paths or []
        self.redis_client = redis_client
        
    @lru_cache(maxsize=1000)
    def is_excluded_path(self, path: str) -> bool:
        """缓存路径检查结果"""
        return any(path.startswith(excluded) for excluded in self.exclude_paths)

用户信息缓存：

async def get_user_from_cache(self, user_id: str):
    """从Redis缓存获取用户信息"""
    if self.redis_client:
        cached_user = await self.redis_client.get(f"user:{user_id}")
        if cached_user:
            return json.loads(cached_user)
    return None

更灵活的路径匹配：

import re

def matches_excluded_pattern(self, path: str) -> bool:
    """支持正则表达式的路径匹配"""
    patterns = [
        r"^/api/v\d+/public/.*",  # 公共API
        r"^/health/?$",           # 健康检查
        r"^/metrics/?$",          # 监控指标
    ]
    return any(re.match(pattern, path) for pattern in patterns)

这些改进可以提升性能和灵活性。

/bye

#### 3. 中文对话体验
```bash
# 使用Qwen2.5进行中文对话
docker exec -it ollama-server ollama run qwen2.5:7b

>>> 请帮我写一份技术方案：如何在微服务架构中实现分布式锁

# 微服务架构中的分布式锁技术方案

## 1. 需求分析

### 1.1 应用场景
- **库存扣减**：防止超卖问题
- **订单号生成**：确保全局唯一性
- **定时任务调度**：避免重复执行
- **缓存更新**：防止缓存击穿

### 1.2 技术要求
- **互斥性**：同一时间只能有一个客户端持有锁
- **安全性**：只有锁的持有者才能释放锁
- **活性**：避免死锁，具备锁超时机制
- **高可用**：锁服务本身要高可用
- **性能**：低延迟，高并发支持

## 2. 技术方案对比

### 2.1 基于Redis的分布式锁

**优点：**
- 性能优异，延迟低
- 实现简单，支持过期时间
- 社区生态完善

**实现示例：**
```java
@Component
public class RedisDistributedLock {
    
    @Autowired
    private StringRedisTemplate redisTemplate;
    
    public boolean tryLock(String key, String requestId, int expireTime) {
        String result = redisTemplate.execute((RedisCallback<String>) connection -> {
            Jedis jedis = (Jedis) connection.getNativeConnection();
            return jedis.set(key, requestId, "NX", "PX", expireTime);
        });
        return "OK".equals(result);
    }
    
    public boolean releaseLock(String key, String requestId) {
        String luaScript = 
            "if redis.call('get', KEYS[1]) == ARGV[1] then " +
            "return redis.call('del', KEYS[1]) " +
            "else return 0 end";
        
        Object result = redisTemplate.execute((RedisCallback<Object>) connection -> {
            Jedis jedis = (Jedis) connection.getNativeConnection();
            return jedis.eval(luaScript, Collections.singletonList(key), 
                            Collections.singletonList(requestId));
        });
        return "1".equals(result.toString());
    }
}

2.2 基于Zookeeper的分布式锁

优点：

强一致性保证
支持可重入锁
自动故障检测

实现框架：

<dependency>
    <groupId>org.apache.curator</groupId>
    <artifactId>curator-recipes</artifactId>
    <version>5.4.0</version>
</dependency>

@Component
public class ZookeeperDistributedLock {
    
    private CuratorFramework client;
    
    public void executeWithLock(String lockPath, Runnable task) {
        InterProcessMutex lock = new InterProcessMutex(client, lockPath);
        try {
            if (lock.acquire(30, TimeUnit.SECONDS)) {
                task.run();
            }
        } catch (Exception e) {
            log.error("获取锁失败", e);
        } finally {
            try {
                lock.release();
            } catch (Exception e) {
                log.error("释放锁失败", e);
            }
        }
    }
}

3. 推荐实现方案

3.1 Redis + Redisson方案（推荐）

@Configuration
public class DistributedLockConfig {
    
    @Bean
    public RedissonClient redissonClient() {
        Config config = new Config();
        config.useSingleServer()
              .setAddress("redis://localhost:6379")
              .setPassword("your-password")
              .setConnectionPoolSize(50)
              .setConnectionMinimumIdleSize(10);
        return Redisson.create(config);
    }
}

@Service
public class BusinessService {
    
    @Autowired
    private RedissonClient redissonClient;
    
    public void processOrder(String orderId) {
        String lockKey = "order:lock:" + orderId;
        RLock lock = redissonClient.getLock(lockKey);
        
        try {
            // 尝试获取锁，最多等待3秒，锁定10秒后自动释放
            if (lock.tryLock(3, 10, TimeUnit.SECONDS)) {
                // 业务逻辑处理
                doProcessOrder(orderId);
            } else {
                throw new RuntimeException("获取锁超时");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new RuntimeException("获取锁被中断", e);
        } finally {
            if (lock.isHeldByCurrentThread()) {
                lock.unlock();
            }
        }
    }
}

3.2 注解式分布式锁

@Target(ElementType.METHOD)
@Retention(RetentionPolicy.RUNTIME)
public @interface DistributedLock {
    String key();
    long waitTime() default 3000;
    long leaseTime() default 10000;
}

@Aspect
@Component
public class DistributedLockAspect {
    
    @Around("@annotation(distributedLock)")
    public Object around(ProceedingJoinPoint joinPoint, DistributedLock distributedLock) 
            throws Throwable {
        String lockKey = parseLockKey(distributedLock.key(), joinPoint);
        RLock lock = redissonClient.getLock(lockKey);
        
        try {
            if (lock.tryLock(distributedLock.waitTime(), 
                           distributedLock.leaseTime(), TimeUnit.MILLISECONDS)) {
                return joinPoint.proceed();
            } else {
                throw new RuntimeException("获取分布式锁失败");
            }
        } finally {
            if (lock.isHeldByCurrentThread()) {
                lock.unlock();
            }
        }
    }
}

// 使用示例
@DistributedLock(key = "'inventory:' + #productId", waitTime = 5000, leaseTime = 15000)
public void updateInventory(String productId, int quantity) {
    // 库存更新逻辑
}

4. 最佳实践

4.1 锁的粒度控制

细粒度锁：提高并发性能
业务隔离：不同业务使用不同锁前缀
避免热点：合理设计锁的Key

4.2 异常处理

@Component
public class SafeDistributedLock {
    
    public <T> T executeWithLock(String lockKey, Supplier<T> supplier, 
                                long waitTime, long leaseTime) {
        RLock lock = redissonClient.getLock(lockKey);
        try {
            if (lock.tryLock(waitTime, leaseTime, TimeUnit.MILLISECONDS)) {
                return supplier.get();
            } else {
                throw new BusinessException("系统繁忙，请稍后重试");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new BusinessException("操作被中断");
        } catch (Exception e) {
            log.error("分布式锁执行异常, lockKey: {}", lockKey, e);
            throw new BusinessException("系统异常");
        } finally {
            try {
                if (lock.isHeldByCurrentThread()) {
                    lock.unlock();
                }
            } catch (Exception e) {
                log.error("释放锁异常, lockKey: {}", lockKey, e);
            }
        }
    }
}

这个方案在生产环境中已经得到验证，性能和稳定性都很好。

太详细了，谢谢！还有其他问题也可以随时问我。

/bye

## 方案二：使用Text Generation WebUI

Text Generation WebUI是另一个流行的本地大模型部署方案，提供了更丰富的配置选项。

### 🛠️ 部署步骤

#### 1. 创建部署目录
```bash
# 创建工作目录
mkdir -p ~/text-generation-webui
cd ~/text-generation-webui

# 创建模型存储目录
mkdir -p models downloads

2. 创建Docker Compose配置

# docker-compose.yml
version: '3.8'

services:
  text-generation-webui:
    image: atinoda/text-generation-webui:latest-cuda
    container_name: textgen-webui
    ports:
      - "7860:7860"
      - "5000:5000"  # API端口
    volumes:
      - ./models:/app/text-generation-webui/models
      - ./downloads:/app/text-generation-webui/downloads
      - ./characters:/app/text-generation-webui/characters
      - ./presets:/app/text-generation-webui/presets
    environment:
      - CLI_ARGS=--api --listen --verbose
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
    restart: unless-stopped

3. 启动服务

# 启动服务
docker-compose up -d

# 查看启动日志
docker-compose logs -f text-generation-webui

# 检查服务状态
curl http://localhost:5000/api/v1/model

📥 模型下载和管理

下载模型文件

# 进入容器
docker exec -it textgen-webui bash

# 使用huggingface-cli下载模型
cd /app/text-generation-webui
python download-model.py microsoft/DialoGPT-medium

# 或者下载GGUF格式模型
python download-model.py TheBloke/Llama-2-7B-Chat-GGUF

自动下载脚本

#!/bin/bash
# download_models.sh

models=(
    "microsoft/DialoGPT-medium"
    "meta-llama/Llama-2-7b-chat-hf"
    "deepseek-ai/deepseek-coder-6.7b-instruct"
)

for model in "${models[@]}"; do
    echo "Downloading $model..."
    docker exec textgen-webui python download-model.py "$model"
done

echo "All models downloaded!"

🔧 API调用示例

Python客户端

#!/usr/bin/env python3
# chat_client.py

import requests
import json
import sys

class TextGenClient:
    def __init__(self, base_url="http://localhost:5000"):
        self.base_url = base_url
        self.session = requests.Session()
    
    def load_model(self, model_name):
        """加载指定模型"""
        payload = {
            "model_name": model_name
        }
        response = self.session.post(
            f"{self.base_url}/api/v1/model",
            json=payload
        )
        return response.json()
    
    def chat(self, message, history=None):
        """发送聊天消息"""
        if history is None:
            history = []
        
        payload = {
            "user_input": message,
            "history": history,
            "mode": "chat",
            "character": "Assistant"
        }
        
        response = self.session.post(
            f"{self.base_url}/api/v1/chat",
            json=payload
        )
        
        if response.status_code == 200:
            data = response.json()
            return data.get("results", [{}])[0].get("history", {})
        else:
            return {"error": f"Request failed: {response.status_code}"}
    
    def generate(self, prompt, max_tokens=512):
        """生成文本"""
        payload = {
            "prompt": prompt,
            "max_new_tokens": max_tokens,
            "temperature": 0.7,
            "top_p": 0.9,
            "do_sample": True
        }
        
        response = self.session.post(
            f"{self.base_url}/api/v1/generate",
            json=payload
        )
        
        if response.status_code == 200:
            return response.json()["results"][0]["text"]
        else:
            return f"Error: {response.status_code}"

def main():
    client = TextGenClient()
    
    print("🤖 本地AI助手已启动!")
    print("输入 'quit' 或 'exit' 退出程序")
    print("输入 'clear' 清空对话历史")
    print("-" * 50)
    
    history = []
    
    while True:
        try:
            user_input = input("\n🧑 您: ").strip()
            
            if user_input.lower() in ['quit', 'exit', 'bye']:
                print("👋 再见!")
                break
            
            if user_input.lower() == 'clear':
                history = []
                print("✅ 对话历史已清空")
                continue
            
            if not user_input:
                continue
            
            print("🤖 AI: ", end="", flush=True)
            
            # 发送消息并获取回复
            result = client.chat(user_input, history)
            
            if "error" in result:
                print(f"❌ 错误: {result['error']}")
            else:
                # 更新对话历史
                history = result.get("history", [])
                
                # 显示AI回复
                if history:
                    ai_response = history[-1][1] if len(history[-1]) > 1 else "抱歉，我没有理解您的问题。"
                    print(ai_response)
                else:
                    print("抱歉，服务暂时不可用。")
        
        except KeyboardInterrupt:
            print("\n👋 再见!")
            break
        except Exception as e:
            print(f"\n❌ 发生错误: {e}")

if __name__ == "__main__":
    main()

使用客户端

# 安装依赖
pip install requests

# 运行聊天客户端
python chat_client.py

# 示例对话
🤖 本地AI助手已启动!
输入 'quit' 或 'exit' 退出程序
输入 'clear' 清空对话历史
--------------------------------------------------

🧑 您: 你好，请介绍一下你自己

🤖 AI: 你好！我是一个本地部署的AI助手，基于大型语言模型构建。我可以帮助您：

1. 回答各种问题和提供信息
2. 协助编程和代码调试
3. 帮助写作和创作
4. 进行翻译和语言学习
5. 分析和总结文本内容

我运行在您的本地环境中，这意味着：
- 您的数据完全私密，不会上传到云端
- 响应速度取决于您的硬件配置
- 可以离线使用，无需网络连接

有什么我可以帮助您的吗？

🧑 您: 写一个Python装饰器用于记录函数执行时间

🤖 AI: 这里是一个功能完整的Python装饰器，用于记录函数执行时间：

```python
import time
import functools
import logging

# 配置日志
logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s')
logger = logging.getLogger(__name__)

def timing_decorator(func=None, *, unit='seconds', log_level='INFO'):
    """
    记录函数执行时间的装饰器
    
    Args:
        func: 被装饰的函数
        unit: 时间单位 ('seconds', 'milliseconds', 'microseconds')
        log_level: 日志级别 ('DEBUG', 'INFO', 'WARNING', 'ERROR')
    """
    def decorator(f):
        @functools.wraps(f)
        def wrapper(*args, **kwargs):
            start_time = time.perf_counter()
            
            try:
                result = f(*args, **kwargs)
                success = True
            except Exception as e:
                result = e
                success = False
            
            end_time = time.perf_counter()
            execution_time = end_time - start_time
            
            # 转换时间单位
            if unit == 'milliseconds':
                execution_time *= 1000
                time_unit = 'ms'
            elif unit == 'microseconds':
                execution_time *= 1000000
                time_unit = 'μs'
            else:
                time_unit = 's'
            
            # 记录日志
            status = "成功" if success else "失败"
            log_message = f"函数 {f.__name__} 执行{status} - 耗时: {execution_time:.4f} {time_unit}"
            
            # 根据指定的日志级别记录
            log_func = getattr(logger, log_level.lower(), logger.info)
            log_func(log_message)
            
            if not success:
                raise result
            
            return result
        
        return wrapper
    
    # 支持带参数和不带参数的使用方式
    if func is None:
        return decorator
    else:
        return decorator(func)

# 使用示例

# 1. 简单使用
@timing_decorator
def simple_function():
    time.sleep(1)
    return "Hello World"

# 2. 指定时间单位
@timing_decorator(unit='milliseconds')
def compute_heavy_task():
    return sum(i**2 for i in range(1000000))

# 3. 指定日志级别
@timing_decorator(unit='microseconds', log_level='DEBUG')
def quick_function():
    return [i for i in range(100)]

# 4. 类方法装饰
class Calculator:
    @timing_decorator(unit='milliseconds')
    def fibonacci(self, n):
        if n <= 1:
            return n
        return self.fibonacci(n-1) + self.fibonacci(n-2)

# 5. 高级版本：可配置的装饰器
def advanced_timing(
    enabled=True, 
    unit='seconds', 
    log_level='INFO',
    custom_message=None,
    store_results=False
):
    """
    高级计时装饰器
    
    Args:
        enabled: 是否启用计时
        unit: 时间单位
        log_level: 日志级别
        custom_message: 自定义日志消息
        store_results: 是否存储执行结果到全局变量
    """
    execution_results = []
    
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            if not enabled:
                return func(*args, **kwargs)
            
            start_time = time.perf_counter()
            
            try:
                result = func(*args, **kwargs)
                success = True
                error = None
            except Exception as e:
                result = None
                success = False
                error = str(e)
            
            end_time = time.perf_counter()
            execution_time = end_time - start_time
            
            # 转换时间单位
            time_conversions = {
                'milliseconds': (1000, 'ms'),
                'microseconds': (1000000, 'μs'),
                'seconds': (1, 's')
            }
            
            multiplier, time_unit = time_conversions.get(unit, (1, 's'))
            converted_time = execution_time * multiplier
            
            # 构建日志消息
            if custom_message:
                log_message = custom_message.format(
                    func_name=func.__name__,
                    time=converted_time,
                    unit=time_unit,
                    status="成功" if success else "失败"
                )
            else:
                status = "成功" if success else f"失败({error})"
                log_message = f"[{func.__name__}] 执行{status} - 耗时: {converted_time:.4f} {time_unit}"
            
            # 记录日志
            log_func = getattr(logger, log_level.lower(), logger.info)
            log_func(log_message)
            
            # 存储结果
            if store_results:
                execution_results.append({
                    'function': func.__name__,
                    'execution_time': converted_time,
                    'unit': time_unit,
                    'success': success,
                    'error': error,
                    'timestamp': time.time()
                })
            
            if not success:
                raise Exception(error)
            
            return result
        
        # 添加获取执行结果的方法
        wrapper.get_execution_results = lambda: execution_results.copy()
        wrapper.clear_execution_results = lambda: execution_results.clear()
        
        return wrapper
    
    return decorator

# 使用高级版本示例
@advanced_timing(
    unit='milliseconds',
    custom_message="⚡ {func_name} 完成 - 用时 {time:.2f}{unit} [{status}]",
    store_results=True
)
def data_processing_task():
    # 模拟数据处理
    time.sleep(0.5)
    return {"processed": 1000, "errors": 0}

if __name__ == "__main__":
    print("🧪 测试计时装饰器...")
    
    # 测试基本功能
    simple_function()
    compute_heavy_task()
    
    # 测试类方法
    calc = Calculator()
    result = calc.fibonacci(10)
    print(f"Fibonacci(10) = {result}")
    
    # 测试高级版本
    data_processing_task()
    
    # 查看执行结果
    results = data_processing_task.get_execution_results()
    print(f"📊 执行历史: {results}")

这个装饰器提供了多种配置选项，你可以根据需要选择合适的版本使用。

🧑 您: clear

✅ 对话历史已清空

🧑 您: quit

👋 再见!

## 方案三：使用LocalAI

LocalAI是一个完全兼容OpenAI API的本地推理服务，支持多种模型格式。

### 🚀 快速部署

#### 1. 创建配置文件
```bash
# 创建工作目录
mkdir -p ~/localai/{models,config}
cd ~/localai

2. 创建Docker Compose配置

# docker-compose.yml
version: '3.6'

services:
  localai:
    image: quay.io/go-skynet/local-ai:v2.17.1-cublas-cuda12
    container_name: localai
    ports:
      - "8080:8080"
    environment:
      - DEBUG=true
      - MODELS_PATH=/app/models
      - CONTEXT_SIZE=1024
      - THREADS=4
    volumes:
      - ./models:/app/models:cached
      - ./config:/app/config:cached
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
    restart: unless-stopped

3. 下载模型配置

# 下载预配置的模型
curl -O https://raw.githubusercontent.com/go-skynet/LocalAI/master/examples/configurations/llama.yaml
mv llama.yaml config/

# 下载模型文件
cd models
wget https://huggingface.co/microsoft/DialoGPT-medium/resolve/main/pytorch_model.bin

4. 启动服务

# 启动LocalAI服务
docker-compose up -d

# 检查服务状态
curl http://localhost:8080/readiness

# 查看可用模型
curl http://localhost:8080/v1/models

💬 OpenAI兼容客户端

Python客户端

#!/usr/bin/env python3
# localai_chat.py

import openai
import json
import sys
from datetime import datetime

class LocalAIChat:
    def __init__(self, base_url="http://localhost:8080/v1", api_key="not-needed"):
        self.client = openai.OpenAI(
            base_url=base_url,
            api_key=api_key
        )
        self.conversation_history = []
    
    def list_models(self):
        """列出可用模型"""
        try:
            models = self.client.models.list()
            return [model.id for model in models.data]
        except Exception as e:
            print(f"❌ 获取模型列表失败: {e}")
            return []
    
    def chat(self, message, model="llama", system_prompt=None):
        """发送聊天消息"""
        try:
            # 构建消息历史
            messages = []
            
            if system_prompt:
                messages.append({"role": "system", "content": system_prompt})
            
            # 添加历史对话
            messages.extend(self.conversation_history)
            
            # 添加当前用户消息
            messages.append({"role": "user", "content": message})
            
            # 调用API
            response = self.client.chat.completions.create(
                model=model,
                messages=messages,
                max_tokens=512,
                temperature=0.7,
                stream=False
            )
            
            assistant_message = response.choices[0].message.content
            
            # 更新对话历史
            self.conversation_history.append({"role": "user", "content": message})
            self.conversation_history.append({"role": "assistant", "content": assistant_message})
            
            # 保持历史长度在合理范围内
            if len(self.conversation_history) > 20:
                self.conversation_history = self.conversation_history[-20:]
            
            return assistant_message
            
        except Exception as e:
            return f"❌ 请求失败: {e}"
    
    def clear_history(self):
        """清空对话历史"""
        self.conversation_history = []
    
    def save_conversation(self, filename=None):
        """保存对话到文件"""
        if not filename:
            timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
            filename = f"conversation_{timestamp}.json"
        
        try:
            with open(filename, 'w', encoding='utf-8') as f:
                json.dump(self.conversation_history, f, ensure_ascii=False, indent=2)
            return f"✅ 对话已保存到 {filename}"
        except Exception as e:
            return f"❌ 保存失败: {e}"
    
    def load_conversation(self, filename):
        """从文件加载对话"""
        try:
            with open(filename, 'r', encoding='utf-8') as f:
                self.conversation_history = json.load(f)
            return f"✅ 对话已从 {filename} 加载"
        except Exception as e:
            return f"❌ 加载失败: {e}"

def print_help():
    """显示帮助信息"""
    help_text = """
🤖 LocalAI 聊天客户端 - 可用命令:

基本命令:
  help          - 显示此帮助信息
  models        - 列出可用模型
  clear         - 清空对话历史
  quit/exit     - 退出程序

文件操作:
  save [filename]    - 保存对话 (可选指定文件名)
  load <filename>    - 加载之前的对话

模型切换:
  use <model_name>   - 切换使用的模型

其他:
  /system <prompt>   - 设置系统提示词
  /status           - 显示当前状态
    """
    print(help_text)

def main():
    chat = LocalAIChat()
    current_model = "llama"
    system_prompt = None
    
    print("🚀 LocalAI 聊天客户端已启动!")
    print("输入 'help' 查看可用命令")
    print("-" * 60)
    
    # 检查可用模型
    print("🔍 检查可用模型...")
    available_models = chat.list_models()
    if available_models:
        print(f"📋 可用模型: {', '.join(available_models)}")
        current_model = available_models[0]
        print(f"🎯 当前使用模型: {current_model}")
    else:
        print("⚠️  未找到可用模型，请检查LocalAI服务状态")
    
    print("-" * 60)
    
    while True:
        try:
            user_input = input(f"\n🧑 您 [{current_model}]: ").strip()
            
            if not user_input:
                continue
            
            # 处理命令
            if user_input.lower() in ['quit', 'exit', 'bye']:
                print("👋 再见!")
                break
            
            elif user_input.lower() == 'help':
                print_help()
                continue
            
            elif user_input.lower() == 'models':
                models = chat.list_models()
                if models:
                    print(f"📋 可用模型: {', '.join(models)}")
                else:
                    print("❌ 无法获取模型列表")
                continue
            
            elif user_input.lower() == 'clear':
                chat.clear_history()
                print("✅ 对话历史已清空")
                continue
            
            elif user_input.lower().startswith('save'):
                parts = user_input.split(' ', 1)
                filename = parts[1] if len(parts) > 1 else None
                result = chat.save_conversation(filename)
                print(result)
                continue
            
            elif user_input.lower().startswith('load'):
                parts = user_input.split(' ', 1)
                if len(parts) > 1:
                    result = chat.load_conversation(parts[1])
                    print(result)
                else:
                    print("❌ 请指定要加载的文件名")
                continue
            
            elif user_input.lower().startswith('use'):
                parts = user_input.split(' ', 1)
                if len(parts) > 1:
                    new_model = parts[1]
                    if new_model in available_models:
                        current_model = new_model
                        print(f"✅ 已切换到模型: {current_model}")
                    else:
                        print(f"❌ 模型 '{new_model}' 不可用")
                else:
                    print("❌ 请指定模型名称")
                continue
            
            elif user_input.startswith('/system'):
                system_prompt = user_input[7:].strip()
                print(f"✅ 系统提示词已设置: {system_prompt}")
                continue
            
            elif user_input == '/status':
                print(f"📊 当前状态:")
                print(f"   模型: {current_model}")
                print(f"   系统提示词: {system_prompt or '未设置'}")
                print(f"   对话历史长度: {len(chat.conversation_history)}")
                continue
            
            # 发送聊天消息
            print("🤖 AI: ", end="", flush=True)
            
            response = chat.chat(user_input, current_model, system_prompt)
            print(response)
        
        except KeyboardInterrupt:
            print("\n👋 再见!")
            break
        except Exception as e:
            print(f"\n❌ 发生错误: {e}")

if __name__ == "__main__":
    main()

使用示例

# 安装依赖
pip install openai

# 运行聊天客户端
python localai_chat.py

# 示例交互
🚀 LocalAI 聊天客户端已启动!
输入 'help' 查看可用命令
------------------------------------------------------------
🔍 检查可用模型...
📋 可用模型: llama, gpt-3.5-turbo, text-davinci-003
🎯 当前使用模型: llama
------------------------------------------------------------

🧑 您 [llama]: /system 你是一个专业的Python编程助手，请用简洁明了的方式回答问题

✅ 系统提示词已设置: 你是一个专业的Python编程助手，请用简洁明了的方式回答问题

🧑 您 [llama]: 如何在Python中创建一个单例模式？

🤖 AI: 这里是几种Python单例模式的实现方法：

## 1. 使用__new__方法（推荐）
```python
class Singleton:
    _instance = None
    
    def __new__(cls):
        if cls._instance is None:
            cls._instance = super().__new__(cls)
        return cls._instance

# 使用
s1 = Singleton()
s2 = Singleton()
print(s1 is s2)  # True

2. 装饰器方式

def singleton(cls):
    instances = {}
    def get_instance(*args, **kwargs):
        if cls not in instances:
            instances[cls] = cls(*args, **kwargs)
        return instances[cls]
    return get_instance

@singleton
class MyClass:
    pass

3. 元类方式

class SingletonMeta(type):
    _instances = {}
    
    def __call__(cls, *args, **kwargs):
        if cls not in cls._instances:
            cls._instances[cls] = super().__call__(*args, **kwargs)
        return cls._instances[cls]

class Singleton(metaclass=SingletonMeta):
    pass

推荐使用第一种方法，简单易懂且线程安全。

🧑 您 [llama]: save python_singleton_chat

✅ 对话已保存到 python_singleton_chat

🧑 您 [llama]: quit

👋 再见!

## 性能优化与监控

### 📊 资源监控脚本

```bash
#!/bin/bash
# monitor_resources.sh

echo "🔍 本地AI模型资源监控"
echo "================================"

while true; do
    clear
    echo "📅 时间: $(date)"
    echo "================================"
    
    # GPU监控
    if command -v nvidia-smi &> /dev/null; then
        echo "🖥️  GPU使用情况:"
        nvidia-smi --query-gpu=index,name,utilization.gpu,memory.used,memory.total,temperature.gpu --format=csv,noheader,nounits | \
        while IFS=, read -r index name util mem_used mem_total temp; do
            printf "GPU%s: %s | 使用率: %s%% | 显存: %s/%sMB | 温度: %s°C\n" \
                   "$index" "$name" "$util" "$mem_used" "$mem_total" "$temp"
        done
        echo ""
    fi
    
    # CPU和内存监控
    echo "💻 CPU和内存使用情况:"
    top -bn1 | head -20 | grep -E "(Cpu|Mem|swap)" | \
    sed 's/%Cpu(s):/CPU:/' | sed 's/KiB Mem :/内存:/' | sed 's/KiB Swap:/交换:'
    echo ""
    
    # Docker容器监控
    echo "🐳 Docker容器状态:"
    docker stats --no-stream --format "table {{.Name}}\t{{.CPUPerc}}\t{{.MemUsage}}\t{{.NetIO}}" | \
    grep -E "(ollama|textgen|localai)"
    echo ""
    
    echo "按 Ctrl+C 退出监控"
    sleep 5
done

🎯 性能调优建议

1. 内存优化

# 增加swap空间（如果内存不足）
sudo fallocate -l 8G /swapfile
sudo chmod 600 /swapfile
sudo mkswap /swapfile
sudo swapon /swapfile

# 永久启用swap
echo '/swapfile none swap sw 0 0' | sudo tee -a /etc/fstab

2. Docker优化

# 清理未使用的Docker资源
docker system prune -a -f

# 限制容器资源使用
docker run -d \
  --name ollama-server \
  --memory="16g" \
  --cpus="4.0" \
  --gpus all \
  -p 11434:11434 \
  -v ollama-data:/root/.ollama \
  ollama/ollama

3. 模型选择策略

# 根据硬件配置选择合适的模型
hardware_check() {
    total_mem=$(free -g | awk '/^Mem:/{print $2}')
    gpu_mem=$(nvidia-smi --query-gpu=memory.total --format=csv,noheader,nounits | head -1)
    
    echo "💻 系统内存: ${total_mem}GB"
    echo "🖥️  GPU显存: ${gpu_mem}MB"
    
    if [ "$total_mem" -ge 32 ] && [ "$gpu_mem" -ge 12000 ]; then
        echo "✅ 推荐模型: llama3.1:8b, deepseek-coder:6.7b"
    elif [ "$total_mem" -ge 16 ] && [ "$gpu_mem" -ge 8000 ]; then
        echo "✅ 推荐模型: llama3.1:8b-q4, qwen2.5:7b-q4"
    else
        echo "✅ 推荐模型: llama3.1:8b-q8, phi3:mini"
    fi
}

hardware_check

故障排除

🔧 常见问题解决

1. 内存不足

# 症状：容器频繁重启或卡死
# 解决方案：
# 1. 使用量化版本模型
docker exec ollama-server ollama pull llama3.1:8b-q4_0

# 2. 增加swap空间
sudo dd if=/dev/zero of=/swapfile bs=1M count=8192
sudo mkswap /swapfile
sudo swapon /swapfile

# 3. 限制模型并发数
echo "OLLAMA_NUM_PARALLEL=1" >> ~/.bashrc

2. GPU不被识别

# 检查NVIDIA驱动
nvidia-smi

# 安装nvidia-container-toolkit
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
    sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
    sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list

sudo apt-get update
sudo apt-get install -y nvidia-container-toolkit
sudo systemctl restart docker

3. 模型下载失败

# 使用镜像站点
export HF_ENDPOINT=https://hf-mirror.com

# 手动下载模型
wget -c https://hf-mirror.com/microsoft/DialoGPT-medium/resolve/main/pytorch_model.bin

# 使用代理下载
docker run --rm -e HTTP_PROXY=http://your-proxy:port \
  ollama/ollama pull llama3.1:8b

🛠️ 诊断脚本

#!/bin/bash
# diagnose.sh

echo "🔍 本地AI部署诊断工具"
echo "=========================="

# 检查系统资源
echo "1. 系统资源检查"
echo "内存: $(free -h | grep Mem | awk '{print $3"/"$2}')"
echo "磁盘: $(df -h / | tail -1 | awk '{print $3"/"$2" ("$5" used)"}')"

# 检查Docker
echo -e "\n2. Docker环境检查"
if command -v docker &> /dev/null; then
    echo "✅ Docker已安装: $(docker --version)"
    if docker info &> /dev/null; then
        echo "✅ Docker服务正常"
    else
        echo "❌ Docker服务异常"
    fi
else
    echo "❌ Docker未安装"
fi

# 检查GPU
echo -e "\n3. GPU环境检查"
if command -v nvidia-smi &> /dev/null; then
    echo "✅ NVIDIA驱动已安装"
    nvidia-smi --query-gpu=name,memory.total --format=csv,noheader
    
    if docker run --rm --gpus all nvidia/cuda:11.8-base-ubuntu20.04 nvidia-smi &> /dev/null; then
        echo "✅ Docker GPU支持正常"
    else
        echo "❌ Docker GPU支持异常"
    fi
else
    echo "⚠️  未检测到NVIDIA GPU或驱动"
fi

# 检查容器状态
echo -e "\n4. AI容器状态检查"
containers=("ollama-server" "textgen-webui" "localai")
for container in "${containers[@]}"; do
    if docker ps | grep -q "$container"; then
        echo "✅ $container 运行中"
    elif docker ps -a | grep -q "$container"; then
        echo "⚠️  $container 已停止"
    else
        echo "❌ $container 未找到"
    fi
done

# 检查端口占用
echo -e "\n5. 端口占用检查"
ports=(11434 7860 5000 8080)
for port in "${ports[@]}"; do
    if ss -tlnp | grep -q ":$port "; then
        echo "✅ 端口 $port 被占用"
    else
        echo "❌ 端口 $port 未被占用"
    fi
done

echo -e "\n诊断完成！"

总结与展望

通过本文的详细介绍，我们已经掌握了使用Docker部署本地大语言模型的多种方案。每种方案都有其独特的优势：

🎯 方案对比总结

方案	优点	适用场景	推荐指数
Ollama	简单易用，模型管理方便	个人使用，快速体验	⭐⭐⭐⭐⭐
Text Generation WebUI	功能丰富，配置灵活	研究实验，高级用户	⭐⭐⭐⭐
LocalAI	OpenAI兼容，生态完善	企业集成，API服务	⭐⭐⭐⭐

🚀 最佳实践建议

新手入门：推荐从Ollama开始，简单直接
开发集成：选择LocalAI，兼容性最好
研究实验：使用Text Generation WebUI，配置最灵活
生产部署：结合具体需求选择，注重稳定性和性能

💡 未来发展方向

模型效率提升：更小的模型，更强的能力
硬件优化：更好的GPU利用率和内存管理
部署简化：一键部署，自动化配置
多模态支持：图像、语音等多模态能力集成

本地大语言模型的部署不仅保护了数据隐私，还为我们提供了完全可控的AI体验。随着技术的不断发展，相信本地AI助手会变得更加智能和易用，成为我们日常工作和学习的得力伙伴。

部署本地LLM

前言

为什么选择本地部署？

🔒 隐私保护

💰 成本优势

⚡ 自主可控

环境准备

系统要求

最低配置

推荐配置

Docker环境验证

方案一：使用Ollama进行部署

🚀 快速部署

1. 拉取Ollama Docker镜像

2. 启动Ollama服务

3. 验证服务状态

📥 下载和管理模型

进入容器环境

下载常用模型

模型信息展示

💬 命令行交互体验

1. 基础对话

2.2 基于Zookeeper的分布式锁

3. 推荐实现方案

3.1 Redis + Redisson方案（推荐）

3.2 注解式分布式锁

4. 最佳实践

4.1 锁的粒度控制

4.2 异常处理

2. 创建Docker Compose配置

3. 启动服务

📥 模型下载和管理

下载模型文件

自动下载脚本

🔧 API调用示例

Python客户端

使用客户端

2. 创建Docker Compose配置

3. 下载模型配置

4. 启动服务

💬 OpenAI兼容客户端

Python客户端

使用示例

2. 装饰器方式

3. 元类方式

🎯 性能调优建议

1. 内存优化

2. Docker优化

3. 模型选择策略

故障排除

🔧 常见问题解决

1. 内存不足

2. GPU不被识别

3. 模型下载失败

🛠️ 诊断脚本

总结与展望

🎯 方案对比总结

🚀 最佳实践建议

💡 未来发展方向

相关资源

📚 官方文档

🔗 有用链接

🛠️ 工具推荐