Which HA voice assistant ollama model are you using? (feddit.online)

submitted 6 days ago* (last edited 5 days ago) by smashing3606@feddit.online to c/homeassistant@lemmy.world

What is everyone using as the LLM for HA voice when self-hosting Ollama? I've tried Llama and Qwen, with varying degrees of success at understanding my commands. I'm currently on Llama, as it appears a little better. I just wanted to see if anyone has found a better model.

Edit: as pointed out, this is more of a speech-to-text issue than an LLM issue. I'm looking into alternatives to Whisper.

Rhaedas@fedia.io 2 points 5 days ago

The model isn't going to help there, then. I've been messing with some of the Whisper variants like faster-whisper, and also tried an older tool called nerd-dictation, but I haven't yet found one that doesn't let garbage creep in from time to time. And of course you have to make sure the audio the voice recognition is getting is free of noise and at a good level. It's tough to troubleshoot. The advantage is that LLMs might be able to pick through the crap and figure out what you really want, if there are enough trigger words in there. I even had an uncensored one call me out on a typo I made once, which I thought was hilarious. But getting 100% accuracy with so many places where errors can creep in is a challenge. It's why I suggested finding or making (!) a fine-tuned version that self-limits what it responds to, adding another filter to catch the problems. Ironic that the dumber things work better by just not doing anything when the process breaks.
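As a rough illustration of that "do nothing when the process breaks" idea, here is a minimal Python sketch of a trigger-word gate that could sit between the speech-to-text output and the LLM. The word lists and the `should_act` name are hypothetical, made up for this example; they are not part of Home Assistant, Ollama, or Whisper.

```python
import re

# Hypothetical vocabulary, invented for this sketch.
ACTIONS = {"turn", "switch", "set", "open", "close", "dim"}
TARGETS = {"light", "lights", "fan", "thermostat", "door", "blinds"}


def should_act(transcript: str) -> bool:
    """Return True only if the transcript contains at least one
    action word and at least one target word."""
    words = set(re.findall(r"[a-z']+", transcript.lower()))
    return bool(words & ACTIONS) and bool(words & TARGETS)


if __name__ == "__main__":
    for heard in ["uh turn on the the kitchen light", "thank you for watching"]:
        print(repr(heard), "->", "forward to LLM" if should_act(heard) else "ignore")
```

Anything that fails the gate is simply dropped, so stray transcription garbage never reaches the model.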

Having used Voice Attack on the Windows side, I found the same thing applied. It wasn't that VA or Windows was better at picking up a voice command; it was a matter of setting the match probability for a command low enough to catch a partial hit but high enough to weed out the junk. So that's probably the goal here, but that gets into the coding for the voice recognition models, and I'm not good enough to go that deep.
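The threshold trade-off can be sketched without touching the recognition model itself, by scoring the transcript against a list of known commands after the fact. This is only an approximation of the idea: it uses plain string similarity from Python's standard `difflib`, not the recognizer's own confidence, and the command list, the `best_match` name, and the 0.75 threshold are all assumptions to be tuned.

```python
from difflib import SequenceMatcher

# Hypothetical command phrases, invented for this sketch.
COMMANDS = [
    "turn on the kitchen light",
    "turn off the kitchen light",
    "set the thermostat to seventy",
]


def best_match(transcript, commands=COMMANDS, threshold=0.75):
    """Return (command, score) for the closest command, or (None, score)
    when even the closest one scores below the threshold."""
    text = " ".join(transcript.lower().split())  # normalize case and spacing
    score, command = max(
        (SequenceMatcher(None, text, c).ratio(), c) for c in commands
    )
    return (command, score) if score >= threshold else (None, score)


if __name__ == "__main__":
    for heard in ["turn on kitchen light", "thank you for watching"]:
        print(repr(heard), "->", best_match(heard))
```

Lowering `threshold` lets more partial hits through; raising it rejects more junk, which is the same balance described above.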