Running a local LLM model
Notice:
This tutorial is intended for expert users
Local LLM requires a lot of computational power
Smaller models typically have lower performance than bigger ones like OpenAI models
High-level explanation
You can use any program that creates a server with OpenAI-compatible API.
After you started your service, you can do this:
The "Chat Model" field in AI preferences is editable, so you can enter any model you have downloaded
There is a field called "API base URL" in "Expert Settings" where you need to provide the address of an OpenAI-compatible API server
Voilà! You can use a local LLM right away in JabRef.
Step-by-step guide for ollama
ollama
The following steps guide you on how to use ollama
to download and runn local LLMs.
Install
ollama
from their websiteSelect a model that you want to run. The
ollama
provides a large list of models to choose from (we recommend tryinggemma2:2b
, ormistral:7b
, ortinyllama
)When you have selected your model, type
ollama pull <MODEL>:<PARAMETERS>
in your terminal.<MODEL>
refers to the model name likegemma2
ormistral
, and<PARAMETERS>
refers to parameters count like2b
or9b
ollama
will download the model for youAfter that, you can run ollama serve to start a local web server. This server will accept requests and respond with LLM output. Note: The ollama server may already be running, so do not be alarmed by a cannot bind error.
Go to JabRef Preferences -> AI
Set the "AI provider" to "OpenAI"
Set the "Chat Model" to the model you have downloaded in the format
<MODEL>:<PARAMETERS>
Set the "API base URL" in "Expert Settings" to
http://localhost:11434/v1/
Now, you are all set and can chat "locally".
Last updated