
Forget Alexa! This is MUCH Better! (Command EVERYTHING)
Date: 2025-07-20
Comments and reviews: 20
antor44
Great video, as always!
However, it would have been great to mention the significant differences between the various solutions. First of all, microcontrollers can run very small AI models on their own using the open-source library TensorFlow Lite for Microcontrollers (a special version that's even smaller than the standard Lite version). There are also many other alternatives; one of the best known is the online service Edge Impulse, which is a commercial platform but is free for certain use cases, like voice processing. In reality, Edge Impulse is mostly a web interface for TensorFlow, with the option to use its own, more optimized library for microcontrollers.
The issue is that the processing power of a microcontroller is very limited. For example, with an Arduino Nano 33 BLE, it's not recommended to train a model with more than two words (like "yes" and "no"), as each additional word will significantly decrease the accuracy rate. With an ESP32, you can train a few more words, but not that many. The advantage of this method is that the model can be trained with any word or language, and it all runs locally. Of course, with a Raspberry Pi 5, the possibilities expand considerably.
The offline speech recognition module is an ideal solution, but its main drawback is that it only supports English, in addition to any custom words you add.
As for the solution of sending voice data to an online AI for processing, this approach supports many languages with a high accuracy rate. However, this processing is done online. This is how Alexa and most voice assistants work: only the wake word is processed locally.
adder2523
Home Assistant has a device for it (Voice Preview Edition), but it is mainly for forwarding the voice to the HA server, which either processes it locally with a locally hosted LLM or forwards it to ChatGPT or something else, allowing it to give accurate responses or answer questions. This way you can also change the wake word if you want, which on the ESP32 is fixed.
The ESP32-S3, on the other hand, is for local processing directly on the ESP32, which it does VERY FAST. For those who don't want to put it together, there is a pre-made version called ESP32-S3-BOX-3 / ESP32-S3-BOX-3B, but of course it is pretty expensive. You just flash it with ESPHome and it works.
If you want very long and very complex commands (and also want to ask it questions), then getting the Voice PE is probably better.
But if you just want really short (and direct) commands, the ESP32-S3 does a fantastic job.
chris-tal
There are also I2S MEMS mics (in the same tiny package) on the market which can be pre-programmed with a few wake words as an extra. That feature can then be used as an interrupt source to wake a much more current-hungry MCU or a more advanced embedded system. ESP-IDF, ESP-ADF, ESP-Skainet, etc. are not that easy to use. Those frameworks need some exploration before starting development on them. They're built on each other, so one has to carefully observe their intercompatibility matrix and patch the IDF (or the FreeRTOS behind it). They also contain a few closed-source precompiled libs. On the other hand, compared to, let's say, starting from scratch using CMSIS-DSP, they're much quicker to implement on.
snopz
Nice project!
I actually used ESP-Skainet in an AI robot project I made at school. It only worked for me after I switched to Linux, as no matter how hard I tried on Windows, nothing worked.
I used ESP-IDF 5.4 along with ESP-ADF to output audio through the speaker. The project wasn't related to Home Assistant; it was a simple AI robot that could answer questions and move using voice commands or Bluetooth.
I might make a video in the future to show how I built it and share the code, but that will take some time since the project is complex and involves many libraries and frameworks all linked together.
DigitalIP
I have a number of Alexa products, including 4 Echo Shows. I have to say I've never had disconnecting issues with any of them, even with the Echo Dots in the past, but it's possible my internet/Wi-Fi is more stable than yours. As far as Alexa not always hearing properly, that can happen with background or other noise, like a TV being loud enough to interfere with what Alexa hears. But I have encountered it as well even without other sound being present, so I usually wake her up again to cancel the current command and try again, or make sure the microphone holes aren't dusty, which can affect recognition as well.
PrinsGamar22
This was an insightful look into DIY options for voice assistants. I've struggled with my own Alexa setup, so seeing how someone has transitioned to a more reliable home solution is refreshing. The technical breakdown of voice recognition was particularly helpful, and I appreciate you sharing the specific challenges with ESP-Skainet. It can be frustrating to deal with compatibility issues, but your experience shows there's definitely a potential for a better alternative. I'm curious to see how your custom voice assistant performs in everyday use compared to Alexa.
phinok. m. 628
Uhm. OK, why don't you use Home Assistant's built-in voice assistant?
Personally I don't think it's quite there yet. It's not as natural as Alexa. Although there have been some people combining it with locally running LLMs etc., which is cool. I think it's definitely the future. Obviously things like Alexa that rely on an internet connection are nonsense. Anyway, this video seems to be some side quest on how to get worse results than Home Assistant's built-in solution for more effort. So I'm not quite sure what this is about. :D
MichaelKrommen
Hi Scott, about a year ago I also experimented with the ESPHome setup and yes, it has many advantages: no cloud, faster response, more control, privacy, etc.
But as soon as the ESPHome voice assistant is running in a noisy environment, the voice recognition is poor, which makes it more or less useless. Unfortunately, Alexa is unbeatable when it comes to filtering out spoken commands from the background of television or radio. How did you solve this problem?
Johnnii360
I'm also using Home Assistant's voice assistant. But I connected a microphone directly to my HA and am using openWakeWord in combination with Speech-to-Phrase. What annoys me a lot here are the many false voice recognitions. I use "Hey Mycroft" at the moment, and when I watch anime or Star Trek: The Next Generation, openWakeWord reacts even though nobody said "Hey Mycroft". It also occurs with plain music or, like on one of the last nights, when someone snores in a particular way.
gravidar
Exactly the same project I am building right now. For the first half of the video I was saying, dude, ESPHome! Glad you got there in the end ;) My project will also add music streaming to the device, as I'm installing it into an old CD/FM device with decent enough amp/speakers as AUX input. I haven't seen anyone combine these two features yet (like a Nest Mini does), so I hope it works. Thanks for this info, it reassures me I'm on the right track.
cynic5581
Been in home/business automation for near a decade now. It’s very important to incorporate parallel systems.
If a relative can’t walk into a room and figure out how to turn the light on or another basic task like open the curtains then we consider that a failure.
I’m in the middle of a job where the previous owner used smart bulbs in every light and the new owner was using flashlights to go to the bathroom.
stevenA44
I've been wanting to do away with all my Alexa devices and smart outlets because when the internet goes out, I can't control anything! I just want to be able to use stuff locally if needed. Being able to connect to it over the internet when I'm away from home would be nice as well, but being able to control stuff when I'm home and the internet is out is what I'm mostly after.
damiensorel6300
I think it is important to note (sorry if you actually mentioned it) that for the ESPHome version to work you need to have Speech-to-Text and Text-to-Speech agents. And not everyone wants to use Home Assistant Cloud. NetworkChuck has a great video about setting up Wyoming services for STT and TTS. Though doing full STT locally requires a fairly beefy machine.
micromem
People who can create such electronic gizmos, I applaud them. I can just about do very basic repairs or other basic circuits, which school kids probably know these days, ha. Still trying to wrap my head around Home Assistant, although it can be costly, so this little project of yours is rather good and seems like something I could do. Thanks.
ChriDDel
I use an M5Stack Atom Echo with a streaming output to a Denon 150. And also an HA Voice Preview Edition, which is an esp32-s3-devkitc-1.
For STT I use Whisper cloud, and as the agent, ChatGPT. Both through the OpenAI API. The costs are very low: a few cents per day.
Whisper cloud (HACS plugin) works great. Whisper running locally on a Pi 5 makes too many errors.
greatscott
I’m disappointed GreatScott made the same mistake as all the other clickbait videos with this claim.
Alexa devices are decent $20 Bluetooth SPEAKERS with synced audio. Voice control alone does not compete with that. I HATE that Alexa is such a good value. But to beat it, we need to do it without a bait-and-switch title like this.
Robbedoes2
To make it sound better you could seal the box, add weight and sturdiness to the front, use a higher quality full-range driver, use a higher quality amplifier, and lastly add a tuned port for more bass. But maybe you can also pair it with another sound system, or even use a sound system directly with Home Assistant.
ivovass195
Good stuff. If you have Home Assistant, they have the Home Assistant Voice Preview Edition device, which makes it easier to use HA's existing voice command service. Also, if the wife doesn't want to use the HA app, why not a control panel customised for the needed tasks? As usual, the buy vs. DIY format is a favourite.
jpardoa94
As someone who has tried to get into learning ESP-IDF multiple times, I feel your initial pain. There's very little info out there other than the official docs, and some basic setups involve so many prep steps that it becomes really hard to not work with the official docs, which are also very painful to read.
N1ghtR1der666
Very nice result, certainly very lightweight, although I would like to know how customizable the available commands are. In my own system I have an LLM access a small database that I update with the commands I want to give it access to. Still not sure this is the best way, though.