---
title: "Using ollamar"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Using ollamar}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---
ollamar is the easiest way to integrate R with [Ollama](https://ollama.com/), which lets you run language models locally on your own machine.
## Installation
1. Download and install the [Ollama](https://ollama.com) app.
    - [macOS](https://ollama.com/download/Ollama-darwin.zip)
    - [Windows preview](https://ollama.com/download/OllamaSetup.exe)
    - Linux: `curl -fsSL https://ollama.com/install.sh | sh`
    - [Docker image](https://hub.docker.com/r/ollama/ollama)
2. Open/launch the Ollama app to start the local server.
3. Install either the stable or latest/development version of `ollamar`.
Stable version:
```{r eval=FALSE}
install.packages("ollamar")
```
For the latest/development version with more features and bug fixes (see the latest changes [here](https://hauselin.github.io/ollama-r/news/index.html)), you can install it from GitHub using the `install_github` function from the `remotes` library. If that doesn't work or you don't have the `remotes` library, run `install.packages("remotes")` in R or RStudio before running the code below.
```{r eval=FALSE}
# install.packages("remotes") # run this line if you don't have the remotes library
remotes::install_github("hauselin/ollamar")
```
## Usage
`ollamar` uses the [`httr2` library](https://httr2.r-lib.org/index.html) to make HTTP requests to the Ollama server, so many functions in this library return an `httr2_response` object by default. If the response object says `Status: 200 OK`, the request was successful.
```{r eval=FALSE}
library(ollamar)
test_connection() # test connection to Ollama server
# if you see "Ollama local server not running or wrong server," Ollama app/server isn't running
# generate a response/text based on a prompt; returns an httr2 response by default
resp <- generate("llama3.1", "tell me a 5-word story")
resp
#' interpret httr2 response object
#'
#' Status: 200 OK # if successful, status code should be 200 OK
#' Content-Type: application/json
#' Body: In memory (414 bytes)
# get just the text from the response object
resp_process(resp, "text")
# get the text as a tibble dataframe
resp_process(resp, "df")
# alternatively, specify the output type when calling the function initially
txt <- generate("llama3.1", "tell me a 5-word story", output = "text")
# list available models (models you've pulled/downloaded)
list_models()
#              name   size parameter_size quantization_level            modified
# 1    codegemma:7b   5 GB             9B               Q4_0 2024-07-27T23:44:10
# 2 llama3.1:latest 4.7 GB           8.0B               Q4_0 2024-07-31T07:44:33
```
### Pull/download model
Download a model from the ollama library (see [API doc](https://github.com/ollama/ollama/blob/main/docs/api.md#pull-a-model)). For the list of models you can pull/download, see [Ollama library](https://ollama.com/library).
```{r eval=FALSE}
pull("llama3.1") # download a model (equivalent bash code: ollama run llama3.1)
list_models() # verify you've pulled/downloaded the model
```
### Delete model
Delete a model and its data (see [API doc](https://github.com/ollama/ollama/blob/main/docs/api.md#delete-a-model)). You can see what models you've downloaded with `list_models()`. To delete a model, specify its name.
```{r eval=FALSE}
list_models() # see the models you've pulled/downloaded
delete("all-minilm:latest") # returns a httr2 response object
```
### Generate completion
Generate a response for a given prompt (see [API doc](https://github.com/ollama/ollama/blob/main/docs/api.md#generate-a-completion)).
```{r eval=FALSE}
resp <- generate("llama3.1", "Tomorrow is a...") # return httr2 response object by default
resp
resp_process(resp, "text") # process the response to return text/vector output
generate("llama3.1", "Tomorrow is a...", output = "text") # directly return text/vector output
generate("llama3.1", "Tomorrow is a...", stream = TRUE) # return httr2 response object and stream output
generate("llama3.1", "Tomorrow is a...", output = "df", stream = TRUE)
# image prompt
# use a vision/multi-modal model
generate("benzie/llava-phi-3", "What is in the image?", images = "image.png", output = 'text')
```
### Chat
Generate the next message in a chat/conversation.
```{r eval=FALSE}
messages <- create_message("what is the capital of australia") # default role is user
resp <- chat("llama3.1", messages) # default returns httr2 response object
resp
resp_process(resp, "text") # process the response to return text/vector output
# specify output type when calling the function
chat("llama3.1", messages, output = "text") # text vector
chat("llama3.1", messages, output = "df") # data frame/tibble
chat("llama3.1", messages, output = "jsonlist") # list
chat("llama3.1", messages, output = "raw") # raw string
chat("llama3.1", messages, stream = TRUE) # stream output and return httr2 response object
# create chat history
messages <- create_messages(
create_message("end all your sentences with !!!", role = "system"),
create_message("Hello!"), # default role is user
create_message("Hi, how can I help you?!!!", role = "assistant"),
create_message("What is the capital of Australia?"),
create_message("Canberra!!!", role = "assistant"),
create_message("what is your name?")
)
cat(chat("llama3.1", messages, output = "text")) # print the formatted output
# image prompt
messages <- create_message("What is in the image?", images = "image.png")
# use a vision/multi-modal model
chat("benzie/llava-phi-3", messages, output = "text")
```
#### Stream responses
```{r eval=FALSE}
messages <- create_message("Tell me a 1-paragraph story.")
# use "llama3.1" model, provide list of messages, return text/vector output, and stream the output
chat("llama3.1", messages, output = "text", stream = TRUE)
# chat(model = "llama3.1", messages = messages, output = "text", stream = TRUE) # same as above
```
#### Format messages for chat
Internally, messages are represented as a `list` of lists, where each inner list is a single message. Each message has two elements: `role` (which can be `"user"`, `"assistant"`, or `"system"`) and `content` (the message text). The example below shows how the messages/lists are structured.
```{r eval=FALSE}
list( # main list containing all the messages
list(role = "user", content = "Hello!"), # first message as a list
list(role = "assistant", content = "Hi! How are you?") # second message as a list
)
```
To simplify the process of creating and managing messages, `ollamar` provides functions to format and prepare messages for the `chat()` function. These functions also work with other APIs or LLM providers like OpenAI and Anthropic.
- `create_messages()` creates multiple messages to build a chat history
- `create_message()` creates a chat history with a single message
- `append_message()` adds a new message to the end of the existing messages
- `prepend_message()` adds a new message to the beginning of the existing messages
- `insert_message()` inserts a new message at a specific index in the existing messages
    - by default, it inserts the message at the -1 (final) position
- `delete_message()` deletes a message at a specific index in the existing messages
    - positive and negative indices/positions are supported (see the short example after the code block below)
    - if there are 5 messages, the positions are 1 (-5), 2 (-4), 3 (-3), 4 (-2), 5 (-1)
```{r eval=FALSE}
# create a chat history with one message
messages <- create_message(content = "Hi! How are you? (1ST MESSAGE)", role = "assistant")
# or simply, messages <- create_message("Hi! How are you?", "assistant")
messages[[1]] # get 1st message
# append (add to the end) a new message to the existing messages
messages <- append_message("I'm good. How are you? (2ND MESSAGE)", "user", messages)
messages[[1]] # get 1st message
messages[[2]] # get 2nd message (newly added message)
# prepend (add to the beginning) a new message to the existing messages
messages <- prepend_message("I'm good. How are you? (0TH MESSAGE)", "user", messages)
messages[[1]] # get 0th message (newly added message)
messages[[2]] # get 1st message
messages[[3]] # get 2nd message
# insert a new message at a specific index/position (2nd position in the example below)
# by default, the message is inserted at the end of the existing messages (position -1 is the end/default)
messages <- insert_message("I'm good. How are you? (BETWEEN 0 and 1 MESSAGE)", "user", messages, 2)
messages[[1]] # get 0th message
messages[[2]] # get between 0 and 1 message (newly added message)
messages[[3]] # get 1st message
messages[[4]] # get 2nd message
# delete a message at a specific index/position (2nd position in the example below)
messages <- delete_message(messages, 2)
# create a chat history with multiple messages
messages <- create_messages(
create_message("You're a knowledgeable tour guide.", role = "system"),
create_message("What is the capital of Australia?") # default role is user
)
```
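Negative positions count from the end of the chat history, so deleting the most recent message could look like this (a short sketch continuing from the `messages` object created above):
```{r eval=FALSE}
# delete the last message using a negative index (-1 is the final position)
messages <- delete_message(messages, -1)
```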
You can convert `data.frame`, `tibble` or `data.table` objects to `list()` of messages and vice versa with functions from base R or other popular libraries.
```{r eval=FALSE}
# create a list of messages
messages <- create_messages(
create_message("You're a knowledgeable tour guide.", role = "system"),
create_message("What is the capital of Australia?")
)
# convert to dataframe
df <- dplyr::bind_rows(messages) # with dplyr library
df <- data.table::rbindlist(messages) # with data.table library
# convert dataframe to list with apply, purrr functions
apply(df, 1, as.list) # convert each row to a list with base R apply
purrr::transpose(df) # with purrr library
```
### Embeddings
Get the vector embedding of some prompt/text (see [API doc](https://github.com/ollama/ollama/blob/main/docs/api.md#generate-embeddings)). By default, the embeddings are normalized to length 1, which means the following:
- cosine similarity can be computed slightly faster, using just a dot product
- cosine similarity and Euclidean distance will produce identical rankings
```{r eval=FALSE}
embed("llama3.1", "Hello, how are you?")
# don't normalize embeddings
embed("llama3.1", "Hello, how are you?", normalize = FALSE)
```
```{r eval=FALSE}
# get embeddings for similar prompts
e1 <- embed("llama3.1", "Hello, how are you?")
e2 <- embed("llama3.1", "Hi, how are you?")
# compute cosine similarity
sum(e1 * e2) # not equal to 1
sum(e1 * e1) # 1 (identical vectors/embeddings)
# non-normalized embeddings
e3 <- embed("llama3.1", "Hello, how are you?", normalize = FALSE)
e4 <- embed("llama3.1", "Hi, how are you?", normalize = FALSE)
```
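Because the default embeddings already have length 1, the dot products above are cosine similarities. As a sanity check, the cosine similarity computed manually from the non-normalized embeddings should match the dot product of the normalized ones (a minimal sketch using the vectors created above):
```{r eval=FALSE}
# cosine similarity = dot product divided by the product of vector lengths
sum(e3 * e4) / (sqrt(sum(e3^2)) * sqrt(sum(e4^2))) # should match sum(e1 * e2) above
```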
### Parse `httr2_response` objects with `resp_process()`
`ollamar` uses the [`httr2` library](https://httr2.r-lib.org/index.html) to make HTTP requests to the Ollama server, so many functions in this library return an `httr2_response` object by default.
You can either parse the output with `resp_process()` or use the `output` parameter in the function to specify the output format. Generally, the `output` parameter can be one of `"df"`, `"jsonlist"`, `"raw"`, `"resp"`, or `"text"`.
```{r eval=FALSE}
resp <- list_models(output = "resp") # returns a httr2 response object
#
# Status: 200 OK
# Content-Type: application/json
# process the httr2 response object with the resp_process() function
resp_process(resp, "df")
# or list_models(output = "df")
resp_process(resp, "jsonlist") # list
# or list_models(output = "jsonlist")
resp_process(resp, "raw") # raw string
# or list_models(output = "raw")
resp_process(resp, "resp") # returns the input httr2 response object
# or list_models() or list_models("resp")
resp_process(resp, "text") # text vector
# or list_models("text")
```
## Advanced usage
### Parallel requests
For the `generate()` and `chat()` endpoints/functions, you can specify `output = 'req'` so the function returns an `httr2_request` object instead of an `httr2_response` object.
```{r eval=FALSE}
prompt <- "Tell me a 10-word story"
req <- generate("llama3.1", prompt, output = "req") # returns a httr2_request object
```
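A single `httr2_request` object can also be performed on its own with `req_perform()` from the `httr2` library and then parsed with `resp_process()` (a minimal sketch):
```{r eval=FALSE}
library(httr2)
resp <- req_perform(req) # perform the request; returns an httr2_response object
resp_process(resp, "text") # parse the response to text
```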
When you have multiple `httr2_request` objects in a list, you can make parallel requests with the `req_perform_parallel` function from the `httr2` library. See [`httr2` documentation](https://httr2.r-lib.org/reference/req_perform_parallel.html) for details.
```{r eval=FALSE}
library(httr2)
prompt <- "Tell me a 5-word story"
# create 5 httr2_request objects that generate a response to the same prompt
reqs <- lapply(1:5, function(r) generate("llama3.1", prompt, output = "req"))
# make parallel requests and get response
resps <- req_perform_parallel(reqs) # list of httr2_response objects
# process the responses
sapply(resps, resp_process, "text") # get responses as text
# [1] "She found him in Paris." "She found the key upstairs."
# [3] "She found her long-lost sister." "She found love on Mars."
# [5] "She found the diamond ring."
```
Example of sentiment analysis with parallel requests using the `generate()` function:
```{r eval=FALSE}
library(httr2)
library(glue)
library(dplyr)
# text to classify
texts <- c('I love this product', 'I hate this product', 'I am neutral about this product')
# create httr2_request objects for each text, using the same system prompt
reqs <- lapply(texts, function(text) {
prompt <- glue("Your only task/role is to evaluate the sentiment of product reviews, and your response should be one of the following: 'positive', 'negative', or 'other'. Product review: {text}")
generate("llama3.1", prompt, output = "req")
})
# make parallel requests and get response
resps <- req_perform_parallel(reqs) # list of httr2_response objects
# process the responses
sapply(resps, resp_process, "text") # get responses as text
# [1] "Positive" "Negative."
# [3] "'neutral' translates to... 'other'."
```
Example of sentiment analysis with parallel requests using the `chat()` function:
```{r eval=FALSE}
library(httr2)
library(dplyr)
# text to classify
texts <- c('I love this product', 'I hate this product', 'I am neutral about this product')
# create system prompt
chat_history <- create_message("Your only task/role is to evaluate the sentiment of product reviews provided by the user. Your response should simply be 'positive', 'negative', or 'other'.", "system")
# create httr2_request objects for each text, using the same system prompt
reqs <- lapply(texts, function(text) {
messages <- append_message(text, "user", chat_history)
chat("llama3.1", messages, output = "req")
})
# make parallel requests and get response
resps <- req_perform_parallel(reqs) # list of httr2_response objects
# process the responses
bind_rows(lapply(resps, resp_process, "df")) # get responses as dataframes
# # A tibble: 3 × 4
#   model    role      content  created_at
#   <chr>    <chr>     <chr>    <chr>
# 1 llama3.1 assistant Positive 2024-08-05T17:54:27.758618Z
# 2 llama3.1 assistant negative 2024-08-05T17:54:27.657525Z
# 3 llama3.1 assistant other    2024-08-05T17:54:27.657067Z
```