March 5, 2025

yo-GPT: run GPT models locally

Why I built yo-GPT

Using GPT models usually means choosing between expensive monthly subscriptions or free tools that track your data.

I wanted a simple and transparent way to use powerful, open-source LLMs—without giving up control of my data or overspending.

That is why I built yo-GPT: a lightweight boilerplate for chatting with open-source LLMs through the Fireworks.ai API, from an app that runs entirely on your own machine. You only pay for what you use, with no subscriptions, and your data stays private: conversations are saved entirely on your device. Powered by high-quality open-source models, yo-GPT gives you full control without the hassle.

I built yo-GPT to hit these key points:

✅ Local-first: All your conversations are saved on your machine. Nothing is stored in the cloud.

✅ Transparent pricing: No subscriptions. Pay-as-you-go with clear cost tracking per conversation and overall usage.

✅ Easy setup: A boilerplate you can run locally without deep-diving into MLOps.

✅ Powered by Fireworks.ai: Leverage best-in-class open-source models through their API, while keeping your data private (Fireworks doesn’t store prompts or outputs).
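As an illustration of how simple per-conversation cost tracking can be, the sketch below multiplies token counts by the model's per-million-token rate, assuming the API reports token usage. The function name is illustrative, not yo-GPT's actual implementation; the rate matches the priceSimplified field used in yo-GPT's models.js.

```javascript
// Estimate a conversation's cost from token usage and the model's rate,
// where pricePerMillionTokens is in dollars per 1M tokens.
function estimateCost(promptTokens, completionTokens, pricePerMillionTokens) {
  return ((promptTokens + completionTokens) / 1_000_000) * pricePerMillionTokens;
}

// Example: a 1,200-token exchange on a $0.90/M-token model costs about $0.00108
const cost = estimateCost(800, 400, 0.9);
```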


What’s possible with yo-GPT?

The core of yo-GPT is simple: chat with the latest, most powerful open-source LLMs — all running locally on your machine, with full control over your data and costs.

Out of the box, yo-GPT supports 4 high-quality models via Fireworks.ai:

  • 🔹 Llama 3.3 70B Instruct 💰 $0.90 /M Tokens | 🧠 128k Context

    A 70-billion parameter, instruction-tuned model designed for general-purpose tasks. It offers an excellent balance between performance and affordability.

  • 🔹 Llama 3.1 405B Instruct 💰 $3.00 /M Tokens | 🧠 128k Context

    A massive 405-billion parameter model built for complex instructions and high-accuracy outputs — perfect for advanced, demanding use cases.

  • 🔹 Mixtral-8x22B Instruct 💰 $1.20 /M Tokens | 🧠 64k Context

    A Mixture of Experts (MoE) model with eight 22-billion parameter experts. Ideal for creative tasks, code generation, and efficient, high-quality outputs.

  • 🔹 Deepseek-R1 💰 $8.00 /M Tokens | 🧠 160k Context

    A premium research-grade model designed for deep reasoning and long-context understanding. Best suited for complex problem-solving and precision work.
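As a sketch of how these models are reached, Fireworks.ai exposes an OpenAI-compatible chat completions endpoint that takes the model's API route plus standard parameters like temperature and max_tokens. The helper below only builds the request (the function name and defaults are illustrative, not yo-GPT's code); sending it requires a real API key.

```javascript
// Build a Fireworks.ai chat completions request for a given model API route.
function buildChatRequest(modelApiRoute, messages, { apiKey, temperature = 0.7, maxTokens = 1024 } = {}) {
  return {
    url: "https://api.fireworks.ai/inference/v1/chat/completions",
    init: {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        Authorization: `Bearer ${apiKey}`,
      },
      body: JSON.stringify({
        model: modelApiRoute,
        messages,
        temperature,
        max_tokens: maxTokens,
      }),
    },
  };
}

// Example: target Llama 3.3 70B Instruct (send with fetch(url, init) and a real key)
const { url, init } = buildChatRequest(
  "accounts/fireworks/models/llama-v3p3-70b-instruct",
  [{ role: "user", content: "Hello!" }],
  { apiKey: "YOUR_KEY" }
);
```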

What else?

Beyond just chatting with top LLMs, yo-GPT gives you extra tools to stay organized, save costs, and customize your experience:

Automatic conversation saving: All your chats are autosaved locally, with the last 30 days of conversations kept to avoid taking up too much space. Prefer manual control? You can disable autosave and choose what to keep.
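A 30-day retention policy like this can be sketched in a few lines. The snippet assumes each saved conversation records an updatedAt timestamp in milliseconds; the field and function names are illustrative, not yo-GPT's actual schema.

```javascript
// Keep only conversations touched within the last 30 days.
const THIRTY_DAYS_MS = 30 * 24 * 60 * 60 * 1000;

function pruneOldConversations(conversations, now = Date.now()) {
  return conversations.filter((c) => now - c.updatedAt <= THIRTY_DAYS_MS);
}
```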

Smart titles (optional): By default, yo-GPT uses a small amount of your credits to automatically generate titles for your conversations. Want to save even more? You can turn this off anytime.

Full chat management: Use the built-in UI to easily rename or delete your chats whenever you like.

Customizable parameters: Adjust settings like token limits and temperature to fine-tune how the model responds.

Create your own custom GPTs: Build tailored versions of GPTs with specific behavior, personalities, or instructions, directly within yo-GPT.
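One common way to implement a custom GPT is as a preset whose instructions are prepended to the chat history as a system message. The sketch below shows the idea; the names are illustrative, not yo-GPT's internals.

```javascript
// Prepend a preset's instructions as a system message before the chat history.
function applyCustomGpt(preset, history) {
  return [{ role: "system", content: preset.instructions }, ...history];
}

// Example: a preset that changes the assistant's persona
const pirateGpt = {
  name: "Pirate",
  instructions: "Answer every question in pirate speak.",
};
const messages = applyCustomGpt(pirateGpt, [{ role: "user", content: "Hello!" }]);
```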


Install & run in minutes

  1. Clone the repo:

    git clone https://github.com/sylvainzircher/yo-GPT.git [your-name]
    
    cd [your-name]
    
  2. Install dependencies:

    npm install
    
  3. Add your Fireworks API key:

    Create a .env.local file in the root directory and add your Fireworks API key (generated from your Fireworks.ai account):

    NEXT_PUBLIC_FIREWORKS_API_KEY=your_api_key_here
    
  4. Run locally:

    npm run dev
    
  5. Open http://localhost:3000 and start chatting!

You can also add other models — here’s how!

  1. Go to the Fireworks.ai website.

  2. Visit its model library.

  3. Choose a serverless, instruction-tuned model, for example Qwen2.5-Coder-32B-Instruct.

  4. In your yo-GPT code, locate the data folder and open the models.js file.

  5. Add the model's details to the models array like this:

  {
    name: "Qwen2.5-Coder-32B-Instruct",
    size: "32B parameters",
    context: "128k",
    price: "$0.90/M Token",
    priceSimplified: 0.9,
    api: "accounts/fireworks/models/qwen2p5-coder-32b-instruct",
  },

It is important to use the exact API route and the correct price. The priceSimplified value is used to calculate costs in the app.

After adding the new model, your full array will look like this:

export const models = [
  {
    name: "deepseek-r1",
    size: "671B parameters",
    context: "160k",
    price: "$8.00/M Token",
    priceSimplified: 8.0,
    api: "accounts/fireworks/models/deepseek-r1",
  },
  {
    name: "Llama-3.1-405B",
    size: "405B parameters",
    context: "128k",
    price: "$3.00/M Token",
    priceSimplified: 3.0,
    api: "accounts/fireworks/models/llama-v3p1-405b-instruct",
  },
  {
    name: "mixtral-8x22b",
    size: "176B parameters",
    context: "64k",
    price: "$1.20/M Token",
    priceSimplified: 1.2,
    api: "accounts/fireworks/models/mixtral-8x22b-instruct",
  },
  {
    name: "Llama-3.3-70B",
    size: "70B parameters",
    context: "128k",
    price: "$0.90/M Token",
    priceSimplified: 0.9,
    api: "accounts/fireworks/models/llama-v3p3-70b-instruct",
  },
  {
    name: "Qwen2.5-Coder-32B-Instruct",
    size: "32B parameters",
    context: "128k",
    price: "$0.90/M Token",
    priceSimplified: 0.9,
    api: "accounts/fireworks/models/qwen2p5-coder-32b-instruct",
  },
]

That’s it! Your new model is now available in your app and ready to use 🚀

What is next for yo-GPT?

Here’s a sneak peek at what’s coming:

✨ UI improvements – Make the interface smoother, faster, and even easier to use.

📄 Chat with documents – Upload files and have conversations directly with their content.

🖼 Text-to-image generation – Generate images from your prompts using powerful open-source models.

🔊 Text-to-speech support – Explore adding natural-sounding voice outputs to read responses aloud.

Stay tuned — more updates coming soon! 🚀