Running Large Language Models Locally with Ollama
How do you quickly get hands-on with how LLM applications are built? This post introduces Ollama, a great tool for exactly that.
Tools, platform, and hardware I used
- WSL on Windows 11
- Conda with Python 3.12
- Nvidia GTX 1080 Ti
Installing Ollama
About Ollama (by Google Gemini)
Ollama is an open-source framework designed to simplify running large language models (LLMs) locally. It does this by packaging model weights, configuration, and data into a single bundle described by a Modelfile. This makes downloading, installing, and using LLMs much more convenient, even for users without deep technical expertise.
Ollama supports a wide range of LLMs, including Llama 2, Mistral, CodeLlama, Falcon, Vicuna, and Wizard. It also offers extensive customization options, letting users tune model behavior to their specific needs.
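Since the Modelfile is central to how Ollama packages models, here is a minimal sketch of what one might look like (the FROM, PARAMETER, and SYSTEM directives come from Ollama's Modelfile format; the base model and values below are just illustrative assumptions):

```
FROM llama3
PARAMETER temperature 0.7
SYSTEM "You are a concise, helpful assistant."
```

You would then build and run it with ollama create my-assistant -f Modelfile followed by ollama run my-assistant.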
Some of Ollama's main advantages:
- Ease of use: Ollama is easy to install and use, even for users without deep technical expertise.
- Flexibility: Ollama supports a wide range of LLMs and offers extensive customization options.
- Accessibility: Ollama is open source and free for anyone to use.

Ollama fits a variety of use cases, including:
- Research: studying the behavior and performance of LLMs.
- Development: building applications that use LLMs.
- Education: teaching people about LLMs.
Installation
```sh
curl -fsSL https://ollama.com/install.sh | sh
```
Alternatively, a Docker container is also available. The supported models can be found in the library.
Here are a few models highlighted that the 1080 Ti can run:
| Model | Parameters | Size | Download |
| --- | --- | --- | --- |
| Llama 3 | 8B | 4.7GB | ollama run llama3 |
| Phi 3 Mini | 3.8B | 2.3GB | ollama run phi3 |
| Gemma | 2B | 1.4GB | ollama run gemma:2b |
| Gemma | 7B | 4.8GB | ollama run gemma:7b |
| Mistral | 7B | 4.1GB | ollama run mistral |
| Moondream 2 | 1.4B | 829MB | ollama run moondream |
| Neural Chat | 7B | 4.1GB | ollama run neural-chat |
| Starling | 7B | 4.1GB | ollama run starling-lm |
| Code Llama | 7B | 3.8GB | ollama run codellama |
| Llama 2 Uncensored | 7B | 3.8GB | ollama run llama2-uncensored |
| LLaVA | 7B | 4.5GB | ollama run llava |
CLI
Ollama's CLI is very pleasant to use; the way it is organized feels a lot like treating models as container images.
```
➜ ollama
Usage:
  ollama [flags]
  ollama [command]

Available Commands:
  serve       Start ollama
  create      Create a model from a Modelfile
  show        Show information for a model
  run         Run a model
  pull        Pull a model from a registry
  push        Push a model to a registry
  list        List models
  cp          Copy a model
  rm          Remove a model
  help        Help about any command

Flags:
  -h, --help      help for ollama
  -v, --version   Show version information

Use "ollama [command] --help" for more information about a command.
```
A few important commands worth introducing:
ollama serve
Starts the Ollama backend service, including the API server, model management, and so on.
Ollama is written in Go; I'll find time to do a code deep dive at some point.
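Once the server is up, you can call its REST API directly. Here is a minimal sketch in Python, assuming the default address http://localhost:11434 and that the phi3 model has already been pulled:

```python
import json
import urllib.request

# Ask the local Ollama server for a (non-streaming) completion.
payload = {"model": "phi3", "prompt": "Why is the sky blue?", "stream": False}
req = urllib.request.Request(
    "http://localhost:11434/api/generate",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.loads(resp.read())["response"])
```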
ollama run <model-name>
Launches an interactive chat interface with the model.
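The same chat turn can also be done programmatically through the ollama Python library (installed in the Conda section below); a minimal sketch, again using phi3 as an example model:

```python
import ollama

# One chat turn, equivalent to typing a message in `ollama run phi3`.
reply = ollama.chat(
    model="phi3",
    messages=[{"role": "user", "content": "Explain what Ollama does in one sentence."}],
)
print(reply["message"]["content"])
```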
Conda
I used the latest miniconda with Python 3.12.
```sh
conda create --name ollama   # create a new env
conda activate ollama

# install packages

conda install conda-forge::chromadb
conda install pysqlite3
conda install pip
pip3 install ollama          # have to use pip to install the ollama py library
```
Demo
Here is a simple RAG demo; the original write-up is here. The demo uses chromadb as the vector store.
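Note that the demo assumes the two models it uses have already been pulled locally (for example via ollama pull mxbai-embed-large and ollama pull phi3); otherwise the embedding and generation calls below will fail.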
Step 1: generate embeddings
Here we use a dedicated embedding model to embed each document, then build an index in chromadb.
```python
import ollama
import chromadb

documents = [
    "Llamas are members of the camelid family meaning they're pretty closely related to vicuñas and camels",
    "Llamas were first domesticated and used as pack animals 4,000 to 5,000 years ago in the Peruvian highlands",
    "Llamas can grow as much as 6 feet tall though the average llama between 5 feet 6 inches and 5 feet 9 inches tall",
    "Llamas weigh between 280 and 450 pounds and can carry 25 to 30 percent of their body weight",
    "Llamas are vegetarians and have very efficient digestive systems",
    "Llamas live to be about 20 years old, though some only live for 15 years and others live to be 30 years old",
]

client = chromadb.Client()
collection = client.create_collection(name="docs")

# store each document in a vector embedding database
for i, d in enumerate(documents):
    response = ollama.embeddings(model="mxbai-embed-large", prompt=d)
    embedding = response["embedding"]
    collection.add(
        ids=[str(i)],
        embeddings=[embedding],
        documents=[d]
    )
```
Step 2: retrieve
We use chromadb's query capability to find the top-k (k=1) documents most relevant to the prompt; the retrieved document becomes the input to the generation step.
```python
# an example prompt
prompt = "What animals are llamas related to?"

# generate an embedding for the prompt and retrieve the most relevant doc
response = ollama.embeddings(
    prompt=prompt,
    model="mxbai-embed-large"
)
results = collection.query(
    query_embeddings=[response["embedding"]],
    n_results=1
)
data = results['documents'][0][0]
```
Step 3: generation
We have the LLM read the document retrieved in the previous step and use it to answer the prompt.
```python
# generate a response combining the prompt and data we retrieved in step 2
output = ollama.generate(
    model="phi3",
    prompt=f"Using this data: {data}. Respond to this prompt: {prompt}"
)

print(output['response'])
```
Result (from Phi3)
Llamas are related to other members of the Camelidae family, including vicuñas and true camels such as dromedary (one-humped) and Bactrian (two-humped) camels. These species share common ancestors and exhibit similar physical characteristics like long necks, leathery pads on their feet, and a unique ability to thrive in arid environments due to their efficient water conservation methods.