## Prerequisites
| Requirement | Minimum | Recommended |
|---|---|---|
| Python | 3.11+ | 3.12 |
| RAM | 8 GB (small models) | 16 GB (medium models) |
| Disk | 10 GB free | 25 GB free |
| OS | macOS 13+, Ubuntu 22.04+, Windows 11 | Apple Silicon or CUDA GPU |
## Tested Combinations

These combos are regularly tested by the team. Mix and match to fit your hardware budget.

| Component | Provider | Model | RAM | Speed |
|---|---|---|---|---|
| LLM | Ollama | llama3:8b | 6 GB | Good |
| LLM | Ollama | qwen2:7b | 5 GB | Good |
| Vision | Ollama | llava:7b | 5 GB | OK |
| STT | faster-whisper | base | 1 GB | Fast |
| STT | faster-whisper | small | 2 GB | Good |
| TTS | Piper | en_US-lessac-medium | 50 MB | Fast |
RAM figures are approximate peak usage. When running LLM + STT + TTS
simultaneously, add the individual figures together: for example, `llama3:8b`
(6 GB) + `faster-whisper small` (2 GB) + Piper (~50 MB) comes to roughly 8 GB peak.
## Setup
First, install Ollama — download it from ollama.com or use your package manager. Then configure FERAL's providers via environment variables:

```shell
export FERAL_LLM_PROVIDER=ollama
export FERAL_LLM_MODEL=llama3:8b
export FERAL_STT_PROVIDER=local
export FERAL_TTS_PROVIDER=local

# Optional — vision
export FERAL_VISION_PROVIDER=ollama
export FERAL_VISION_MODEL=llava:7b

# Optional — point to a non-default Ollama host
# export OLLAMA_HOST=http://localhost:11434
```
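With Ollama installed, the models from the tested-combinations table above can be pulled ahead of time so the first run doesn't block on a download. A sketch, using the standard `ollama pull` command:

```shell
# One-time downloads (several GB each); model names from the table above
ollama pull llama3:8b
ollama pull llava:7b   # only needed if you enabled the vision provider
```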
FERAL will connect to the local Ollama instance and use on-device STT/TTS.
You can verify the active providers in the startup banner.
## Troubleshooting
### "Connection refused" from Ollama

Make sure `ollama serve` is running. By default it listens on
`http://localhost:11434`. Check with:
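One quick check (assuming the default host and port) is to hit Ollama's `/api/tags` endpoint, which lists the installed models:

```shell
# Returns a JSON model list if the server is up;
# "connection refused" here means ollama serve is not running.
curl http://localhost:11434/api/tags
```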
### Slow first response

The first inference after pulling a model is slow because Ollama loads the weights into memory; subsequent calls are fast. You can pre-warm the model with a trivial prompt, for example `ollama run llama3:8b "hi"`.

### Out of memory
If your machine runs out of RAM:

- Switch to a smaller model (`qwen2:7b` uses ~1 GB less than `llama3:8b`).
- Close other heavy applications.
- On Linux, increase swap:
  `sudo fallocate -l 8G /swapfile && sudo mkswap /swapfile && sudo swapon /swapfile`.
### STT not detecting speech

Ensure your microphone is accessible and that the correct input device is selected.

## Performance Tips
Choose models based on your hardware:

- 8 GB RAM (no GPU): `qwen2:7b` + `faster-whisper base` + Piper — expect 2–4 s response times.
- 16 GB RAM (no GPU): `llama3:8b` + `faster-whisper small` + Piper — expect 1–3 s response times.
- Apple Silicon (M1+): Ollama uses Metal acceleration automatically. `llama3:8b` runs at ~30 tokens/s on M2.
- NVIDIA GPU (8 GB+ VRAM): Ollama detects CUDA automatically. Expect 40–80 tokens/s depending on model and GPU.
- Keep Ollama running between sessions to avoid cold-start latency.
- Use `faster-whisper base` unless you need higher accuracy — `small` is 2× slower for a modest accuracy gain.
- Piper TTS is CPU-only and extremely fast; it won't bottleneck your setup.
- If you run FERAL on a headless server, disable TTS with `FERAL_TTS_PROVIDER=none` to save resources.
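Putting the low-RAM and headless tips together, a minimal 8 GB, no-GPU, headless configuration might look like this sketch (variable names as in the Setup section; values from the tested-combinations table):

```shell
# Minimal 8 GB / headless profile
export FERAL_LLM_PROVIDER=ollama
export FERAL_LLM_MODEL=qwen2:7b   # ~1 GB lighter than llama3:8b
export FERAL_STT_PROVIDER=local   # faster-whisper base
export FERAL_TTS_PROVIDER=none    # headless: no speech output
```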
