The keyword generating application is free for personal use only and requires a license for commercial use, but there's no pricing available and it only provides an email address for contacting about purchasing a license.
If anyone from the team is reading this, the information above should be front and center on the landing page. I would guess that 99% of your site visitors are going to bounce on your landing page because the relevant information is buried so deeply.
This is Alireza. I am the founder of Picovoice. Thanks a lot for the information and for sharing. I 100% agree that we need to work on the website and its content. It is definitely on our TODO list, and we will prioritize it. In the meantime, I am happy to answer any commercial questions via email [email protected]
The reason the quote is not present is that there are just too many factors involved. What is the platform you are using (iOS, Android, ARM Cortex-M, DSP, etc)? What is the scale? Are you deploying to ten devices or ten million? Do you have specific runtime requirements? Some people have limited RAM (e.g. "I only have 32KB of RAM"), some limited CPU, some limited FLASH, etc. How many models do you need? Are they proper words or brand/made-up names? Is it English? How much engineering support do you need?
In reality, it is a lengthy decision tree. I do not want to put that here or on the company's website as it will just maximize the confusion. But it makes sense to put the common easy cases and then ask for contact in the rest. Which is probably the path we take as it will save us some time.
That being said, I challenge you to find a company that offers similar tech and has pricing on its website. I suspect what I mentioned is the reason. I could be wrong.
I see it may be a bit more complex than I had initially thought. Might I suggest a pricing table kind of thing showing some basic plans, usage and price points?
Something as simple as a few basic use cases, for example: Android app, no engineering support, under 10,000 units, x price.
I'm a small developer and tinkerer, so my choice to explore more or not is really price sensitive. However, I also consult with other groups, and may suggest your product as a fit if it meets the other criteria I look for in "private voice AI". You've already checked a few boxes!
However, any time I see a "contact us for price inquiries", I shut down. I know that if they can't tell me the price on the page, I can't afford it. At that point I don't bookmark the site, I don't research any further, and it reinforces the awesomeness of other projects I've put into my memory for use.
This is not just you, it's a lot of projects/sites on the web.
This is really good feedback. Agreed. What I have seen work is to list the common cases. It probably makes our life on the Picovoice side easier too, as answering the same pricing questions over and over is not a fun way to spend our time. We will address this soon. I promise.
I think it will work better for ya. Certainly you've run across sites that show some small access for cheap, and more access for more money, and "requests over X calls / transaction / users, call for enterprise quote" kind of thing..
I assume these businesses are mainly looking for those high volume / high price clients and or people with that kind of money to buy them out completely.. all the while making a cheap plan for people to tinker with, build an MVP and maybe scale up.. and perhaps get enough mid range sales to prove they are worth something to others..
Of course that's not the game plan for every service out there. Anyhow good luck to ya, glad to see people working on ways to do things more privately one way or another.
> But it makes sense to put the common easy cases and then ask for contact in the rest. Which is probably the path we take as it will save us some time.
That would indeed be great.
Also, if complex decision tree is your real objection, then consider putting a price range. "Depending on X, Y and other factors, the price ranges from A for minimal deployment low on Z, to B for large-scale solutions.". Or something like that.
Exact prices are always the best, but estimates for typical cases and a price range are second-best. It lets a potential user decide whether to even bother checking your service out.
How easily would your tech adjust to recognising sound patterns? For example, the idea I have is an intelligent baby monitor that would identify the various noises a baby makes and alert you to the baby's needs accordingly: food, nappy change, distress, pain, etc.
Would that type of stuff be viable with your technology?
We would still be a way off automated nappy changing, but an automated rattle or similar toy triggered by the baby may well make some tasks less demanding and more engaging for the baby. Identifying needs through audio recognition would be the start, though.
It's clear you're trying to discover the right price and of course it's complex. Please be upfront about it, what you've said here could be copy-pasta to your site already.
I've been watching Pico for a while, it's very interesting but I'm feeling like you keep teasing more FOSS and being unclear on costs.
I want to like your project too but currently the others are doing a bit better on presentation.
> Picovoice is a team of applied scientists and engineers who strive to build a future where our lives are enhanced with ambient voice AIs, while respecting your privacy.
Hello. Thanks for the comment. We have plans to open source two more products this year, with licensing similar to Porcupine's.
1- Speech-to-Intent: It allows you to issue complex voice commands in a specific domain and in turn returns the intent. For example, in the case of a coffee maker you can say "Please may I have a single shot espresso with no milk and two sugars". The engine returns a JSON-like object with
{"product": "espresso", "milk": "no", "sugar": "two", "# shots": "1"}. It is a tightly coupled domain-specific speech recognition and NLU. It is small (less than 3MB and 8% CPU usage on RPi3) and ideal for home automation, industrial applications, the service industry, etc.
2- Speech-to-Text: It is large vocabulary speech recognition software that runs locally. It will support all platforms currently being supported. It allows you to do large vocabulary transcription with high accuracy locally on an embedded platform.
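The Speech-to-Intent result described in item 1 could be consumed by an application roughly as follows. This is only a sketch: it assumes the engine hands back a plain dictionary shaped like the coffee maker example, and `describe_order` and `WORD_TO_NUMBER` are hypothetical names, not part of any Picovoice API.

```python
# Hypothetical consumer of an intent result; nothing here is a Picovoice API.
WORD_TO_NUMBER = {"no": 0, "one": 1, "two": 2, "three": 3}


def describe_order(intent: dict) -> str:
    """Turn an intent dictionary into a human-readable order line."""
    product = intent.get("product", "unknown")
    # Assumption: the shot count arrives as a numeric string.
    shots = int(intent.get("# shots", "1"))
    # Assumption: sugar arrives as a number word; unknown words count as zero.
    sugars = WORD_TO_NUMBER.get(intent.get("sugar", "no"), 0)
    milk = "no milk" if intent.get("milk", "no") == "no" else "with milk"
    return f"{shots}x {product}, {milk}, {sugars} sugar(s)"


example = {"product": "espresso", "milk": "no", "sugar": "two", "# shots": "1"}
print(describe_order(example))
```

The point is that the application never sees raw text, only the slots, so the handling code stays small.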
I wonder how this compares to snips[1]. I recently connected snips to my lights and it works pretty flawlessly. Picovoice does look like it's easier to integrate into an app though.
The cool thing about this engine is that it is tiny. It uses less than 8% CPU on RPi3 and altogether it is less than 2MB (code, model, etc). Technically you can run it on something much smaller and cheaper than RPi.
Alternatively, the speech-to-intent engine could be a good candidate. More information on this along with an interactive demo will be released this weekend.
I would love to see some benchmark numbers on Picovoice. The small size and low CPU is definitely interesting but I'm worried that performance is hindered because of this. Also, is 8% peak usage? Having this loaded on some small IoT chip will be awesome to see.
We do work with a couple of SoC manufacturers and will disclose some of the results when our partners are ready. In general, we can run on any MCU with a C compiler and 200KB of RAM (maybe less if there is fast FLASH available). We already have models working on ARM Cortex-M and Cadence's HiFi4.
I would love to see Picovoice on ESP32. We've looked into this as we get many requests for Picovoice on ESP32. The challenge is to find a commercial request at a reasonable scale to cover the porting effort. I suspect it should be less than a month of work on our side.
The voice control comes in two variations: standard and tiny. The tiny one consumes even fewer resources. I provided metrics for the standard one. You can check the benchmark repo or benchmark it yourself as well :)
https://github.com/Picovoice/wakeword-benchmark
So I had a quick poke around the wake word GitHub repo, but it looks like you can't generate custom keywords for the Raspberry Pi. Is that correct? So you have to use the pre-built ones in the resources directory? There seem to be a lot of files in there, but how do I know what each one does?
Yeh I think I understand now. The filename of each file in the resources folder represents the hot-word that's detected. When you specify multiple files it'll give the index corresponding to the file and hence the word that was detected. So the problem is that since you don't support generating your own hot-words for raspberry pi, you are stuck with the small random set of words in the repo. That's kind of a huge limitation. So while I'm sure this is a great project for use with x86 and mac, it's a non-starter for me. Presumably proper support for raspberry pi is coming at some point so I'll be sure to check back.
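The index-to-keyword mapping described above can be sketched like this. It is an illustration only: the file paths are made up, the `<keyword>_<platform>.ppn` naming pattern is an assumption, and so is the convention that a negative index means nothing was detected.

```python
from pathlib import Path


def keyword_names(keyword_files):
    """Derive a display name for each keyword file from its filename."""
    # Assumption: filenames look like "<keyword>_<platform>.ppn".
    return [Path(f).stem.split("_")[0] for f in keyword_files]


def detected_keyword(index, keyword_files):
    """Map a detection index back to the keyword it stands for, or None."""
    # Assumption: a negative index means "nothing detected in this frame".
    if index < 0:
        return None
    return keyword_names(keyword_files)[index]


# Hypothetical paths, in the same order they would be passed to the engine.
files = [
    "resources/keyword_files/alexa_raspberrypi.ppn",
    "resources/keyword_files/porcupine_raspberrypi.ppn",
]
print(detected_keyword(1, files))
```

The order of the list is what matters: the index returned by the detector only makes sense against the same list that was used to set it up.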
I think whichever privacy centric voice assistant will come to market first will build an unstoppable first mover advantage. There are so many people just waiting for an alternative to the Googles/Amazons of the world.
So you said in your only other comment. Hacker News is a community site and it's not just to promote something, so could you please participate community-style instead?
Just took a closer look at the examples and played around with some code. It works really well and has a surprisingly low footprint. Basically doing exactly what is promised.
However, I wonder why there is no (visible?) way to generate words for JavaScript, or at least documentation on how the format of those byte arrays is built.
Assuming this is a licensing thing, I would really suggest not limiting it in that way. My first impression was that this is usable for free for everything except commercial projects.
Came here to say this. I had a great idea for using this to help train pilots to do the RT (I'm doing my PPL now, and this would be perfect).
The lack of a JS optimizer means I can't really hack on what could have become a nice passive income site.
You can use this for free under the Apache 2 license for personal, non-commercial use. No licensing needed. You don't need to pay us a penny as long as you don't make money off it.
This is my question too. I looked at Snips but they had no timeline on Windows support and were more interested in talking about their crypto coin than answering basic questions about their software.
I had pretty good luck reaching out to the developers on Discord these past couple of weeks. Not sure when you tried, but I recommend trying again. I personally think snips is way more appropriate on something embedded like an RPi. I was up and running with their new sam package within minutes. Their new update (about two months ago?) really made it more user friendly. Unfortunately, their Windows package is still very broken.
E: You should probably try in the late night or early morning time for the States (EST) since they are located in Europe I believe.
I'm currently running snips on a RPi 2 (Model B) and it's working so far so good! Not too sure about the original Rpi though. I'm debating on getting a RPi 3 to see if the performance is better.
I was on the Discord channel. Nobody answered my question. In fact, here was my chain of asking questions on Snips:
I went to the website, and had questions, so I tried to email them. I got a reply to the email telling me to join the Discord. I did, and asked in the Discord, and the only person who bothered to reply told me to read the website. ...I wouldn't have asked the questions if they were answered on the website.
The project you mentioned is wake word for ARM only. Great project, BTW. For wake word, we provide on-demand model generation. We also do more than wake word. Finally, we can run on other CPUs/OSs as well.
How about numbers or complex/unknown words? I would love to proxy everything that isn't known AFTER the offline trigger word to Google voice recognition or something. Is that possible?
Numbers and complex words (I am assuming you mean something like "ok blah"?) are doable easily. I am not sure what you mean by unknown words. Could you elaborate?
Obviously, you can just grab the audio stream after the phrase is detected and route it to whatever you like. Google ASR or even a local one running on the device!
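A minimal sketch of that routing idea, assuming audio arrives as a sequence of fixed-size frames. The `is_wake_word` callable stands in for whatever detector is used, and `route_after_wake_word` is a hypothetical helper, not a Picovoice function. The returned frames are what you would hand to a cloud or local ASR.

```python
from typing import Callable, Iterable, List

Frame = List[int]  # one chunk of PCM samples


def route_after_wake_word(
    frames: Iterable[Frame],
    is_wake_word: Callable[[Frame], bool],
    frames_to_capture: int,
) -> List[Frame]:
    """Return up to frames_to_capture frames following the first detection."""
    captured: List[Frame] = []
    listening = False
    for frame in frames:
        if not listening:
            # Still waiting for the trigger; the triggering frame is not forwarded.
            listening = is_wake_word(frame)
            continue
        captured.append(frame)
        if len(captured) >= frames_to_capture:
            break
    return captured


# Toy stream: the frame [1] plays the role of the wake word.
stream = [[0], [1], [2], [3], [4]]
command_audio = route_after_wake_word(stream, lambda f: f == [1], 2)
print(command_audio)
```

A real application would more likely stop on silence than on a fixed frame count, but the structure stays the same: detect locally, then forward only what follows.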
This seems really neat, but my main, blocking issue is that there is absolutely no way to add pronunciation for a wake word. Therefore, if the developers have not explicitly added the word to their vocabulary, you are completely out of luck.
If you have a commercial application we can build you the model for any wake word. Since this requires some engineering work on our side we can only offer it to commercial customers at this point.
Why not make it possible for anyone to train it instead of you having to do engineering? Is that the profit model? If so, totally valid, but I'm wondering if I'm understanding this right.
Agreed. The reason for not supporting grammar in this product is that we want to keep it extremely lightweight. We do have upcoming products that support grammar as well.
Which would you vote for picovoice or snips as a voice assistant AI product? I've been meaning to do more research and maybe your comments can help gain more insights.. thanks
The demo uses WebAssembly, which is supported by Chrome. It also uses the Web Audio API, which I believe is also supported by Chrome. I just used the demo on my Android phone using Chrome. I wonder what the problem could be. Is the mic on? :) If yes, maybe tell me the version of Chrome you are using. I will look into it.
The keyword generating application: https://github.com/Picovoice/Porcupine/tree/master/tools/opt...