Running Large Language Models Locally with Ollama
How do you quickly get hands-on with how LLM applications are built? This post introduces Ollama, a great tool for exactly that.
Tools, platform, and hardware I used
- WSL on Windows 11
- Conda with Python 3.12
- Nvidia GTX 1080 Ti
Installing Ollama
About Ollama (by Google Gemini)
Ollama is an open-source framework designed to simplify running large language models (LLMs) locally. It does this by packaging model weights, configuration, and data into a single bundle described by a Modelfile. This makes downloading, installing, and using LLMs much more convenient, even for users without deep technical expertise.
Ollama supports a wide range of LLMs, including Llama 2, Mistral, CodeLlama, Falcon, Vicuna, and Wizard. It also offers extensive customization options, letting users tune model behavior to their specific needs.
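Since the Modelfile is central to how Ollama packages models, here is a minimal sketch of what one might look like (the FROM, PARAMETER, and SYSTEM directives come from Ollama's Modelfile format; the base model and values below are just illustrative assumptions):

```
FROM llama3
PARAMETER temperature 0.7
SYSTEM "You are a concise, helpful assistant."
```

You would then build and run it with ollama create my-assistant -f Modelfile followed by ollama run my-assistant.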
Some of Ollama's main advantages:
- Ease of use: Ollama is easy to install and use, even for users without deep technical expertise.
- Flexibility: Ollama supports a wide range of LLMs and offers extensive customization options.
- Accessibility: Ollama is open source and free for anyone to use.

Ollama fits a variety of use cases, including:
- Research: studying the behavior and performance of LLMs.
- Development: building applications that use LLMs.
- Education: teaching people about LLMs.
Installation
```sh
curl -fsSL https://ollama.com/install.sh | sh
```
Alternatively, a Docker container is also available. The supported models can be found in the library.
Here are a few models highlighted that the 1080 Ti can run:
| Model | Parameters | Size | Download |
| --- | --- | --- | --- |
| Llama 3 | 8B | 4.7GB | ollama run llama3 |
| Phi 3 Mini | 3.8B | 2.3GB | ollama run phi3 |
| Gemma | 2B | 1.4GB | ollama run gemma:2b |
| Gemma | 7B | 4.8GB | ollama run gemma:7b |
| Mistral | 7B | 4.1GB | ollama run mistral |
| Moondream 2 | 1.4B | 829MB | ollama run moondream |
| Neural Chat | 7B | 4.1GB | ollama run neural-chat |
| Starling | 7B | 4.1GB | ollama run starling-lm |
| Code Llama | 7B | 3.8GB | ollama run codellama |
| Llama 2 Uncensored | 7B | 3.8GB | ollama run llama2-uncensored |
| LLaVA | 7B | 4.5GB | ollama run llava |
CLI
Ollama's CLI is very pleasant to use; the way it is organized feels a lot like treating models as container images.
```
➜ ollama
Usage:
  ollama [flags]
  ollama [command]

Available Commands:
  serve       Start ollama
  create      Create a model from a Modelfile
  show        Show information for a model
  run         Run a model
  pull        Pull a model from a registry
  push        Push a model to a registry
  list        List models
  cp          Copy a model
  rm          Remove a model
  help        Help about any command

Flags:
  -h, --help      help for ollama
  -v, --version   Show version information

Use "ollama [command] --help" for more information about a command.
```
A few important commands worth introducing:
ollama serve
Starts the Ollama backend service, including the API server, model management, and so on.
Ollama is written in Go; I'll find time to do a code deep dive at some point.
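Once the server is up, you can call its REST API directly. Here is a minimal sketch in Python, assuming the default address http://localhost:11434 and that the phi3 model has already been pulled:

```python
import json
import urllib.request

# Ask the local Ollama server for a (non-streaming) completion.
payload = {"model": "phi3", "prompt": "Why is the sky blue?", "stream": False}
req = urllib.request.Request(
    "http://localhost:11434/api/generate",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.loads(resp.read())["response"])
```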
ollama run <model-name>
Launches an interactive chat interface with the model.
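The same chat turn can also be done programmatically through the ollama Python library (installed in the Conda section below); a minimal sketch, again using phi3 as an example model:

```python
import ollama

# One chat turn, equivalent to typing a message in `ollama run phi3`.
reply = ollama.chat(
    model="phi3",
    messages=[{"role": "user", "content": "Explain what Ollama does in one sentence."}],
)
print(reply["message"]["content"])
```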
Conda
I used the latest miniconda with Python 3.12.
```sh
conda create --name ollama   # create a new env
conda activate ollama

# install packages

conda install conda-forge::chromadb
conda install pysqlite3
conda install pip
pip3 install ollama          # have to use pip to install the ollama py library
```
Demo
Here is a simple RAG demo; the original write-up is here. The demo uses chromadb as the vector store.
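Note that the demo assumes the two models it uses have already been pulled locally (for example via ollama pull mxbai-embed-large and ollama pull phi3); otherwise the embedding and generation calls below will fail.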
Step 1: generate embeddings
Here we use a dedicated embedding model to embed each document, then build an index in chromadb.
```python
import ollama
import chromadb

documents = [
    "Llamas are members of the camelid family meaning they're pretty closely related to vicuñas and camels",
    "Llamas were first domesticated and used as pack animals 4,000 to 5,000 years ago in the Peruvian highlands",
    "Llamas can grow as much as 6 feet tall though the average llama between 5 feet 6 inches and 5 feet 9 inches tall",
    "Llamas weigh between 280 and 450 pounds and can carry 25 to 30 percent of their body weight",
    "Llamas are vegetarians and have very efficient digestive systems",
    "Llamas live to be about 20 years old, though some only live for 15 years and others live to be 30 years old",
]

client = chromadb.Client()
collection = client.create_collection(name="docs")

# store each document in a vector embedding database
for i, d in enumerate(documents):
    response = ollama.embeddings(model="mxbai-embed-large", prompt=d)
    embedding = response["embedding"]
    collection.add(
        ids=[str(i)],
        embeddings=[embedding],
        documents=[d]
    )
```
Step 2: retrieve
We use chromadb's query capability to find the top-k (k=1) documents most relevant to the prompt; the retrieved document becomes the input to the generation step.
```python
# an example prompt
prompt = "What animals are llamas related to?"

# generate an embedding for the prompt and retrieve the most relevant doc
response = ollama.embeddings(
    prompt=prompt,
    model="mxbai-embed-large"
)
results = collection.query(
    query_embeddings=[response["embedding"]],
    n_results=1
)
data = results['documents'][0][0]
```
Step 3: generation
We have the LLM read the document retrieved in the previous step and use it to answer the prompt.
```python
# generate a response combining the prompt and data we retrieved in step 2
output = ollama.generate(
    model="phi3",
    prompt=f"Using this data: {data}. Respond to this prompt: {prompt}"
)

print(output['response'])
```
Result (from Phi3)
Llamas are related to other members of the Camelidae family, including vicuñas and true camels such as dromedary (one-humped) and Bactrian (two-humped) camels. These species share common ancestors and exhibit similar physical characteristics like long necks, leathery pads on their feet, and a unique ability to thrive in arid environments due to their efficient water conservation methods.