
AI Dictation: It's About YOU, Not The App. A Look at Wispr Flow, Superwhisper, VoiceInk & Spokenly

By A Fading Thought

Summary

Topics Covered

  • No Single Best Dictation App
  • Speed and Accuracy Now Standard
  • Simplicity Sacrifices Control
  • Context Awareness Powers Workflows
  • Mode Switching Enables Flow

Full Transcript

Hey guys, my name is Robert. A bit over a year ago, I made a video giving an overview of some of the most popular options available for AI dictation on Mac. At that time, this was a fairly new subject, just gaining popularity, but now, one year later, it has absolutely exploded, and navigating everything we now have available can get very confusing, especially because everyone claims their app is the absolute best.

I want to share my thoughts on the current state of this topic and take a look at some popular and not-so-popular options. I do not believe there is a single app that is the best of all. In my opinion, there is a certain standard a dictation app needs to reach to be considered solid, but above that standard, everything else is a matter of little details in the workflow or in the implementation of features.

One point that many seem to miss is that this is about the user most of all: how you like to work, how you like to get things done, and what you need.

So with this video, I want to help you figure out what kind of user you are. I will also share what features matter to me personally, and based on that, I will try to help you find which tool can actually make your life easier.

Just to set the stage, I want to quickly take a look at two very popular but very different options.

There's Wispr Flow, which is known for its extreme user-friendliness, and then there's Superwhisper, which leans into being more powerful and giving you a lot of control.

If you take a look at this on the screen right now, you can see that I'm dictating the same thing in both apps simultaneously.

I will switch to the larger window of Superwhisper so you can see this better.

The moment I stop dictating, I've set this up so that the result from Wispr Flow goes to my note and the result from Superwhisper stays in my recording window.

I am not doing any cuts here. I want you to see everything in real time so you get a feel for what's happening in terms of speed.

Both can be super fast. Both are also very good in terms of accuracy.

But let's try to push this in a different direction.

Let me dictate with lots of filler phrases and mistakes.

Actually, when I'm dictating, many times I do not know that, no, many times I'm not exactly sure what I will be saying, or so I make many mistakes, and sometimes I stop to think,

or I decide to change my words, but in the end, what really matters is to have a good result.

Make that bold. Happy face emoji.

I dictated this with many mistakes on purpose.

This is something more realistic.

Both apps not only transcribed, but used AI to understand my message and clean it up.

Both took slightly different approaches, but overall, I'm okay with them.

I also inserted a formatting command in the middle, which Superwhisper executed better.

And the emoji, well, it didn't work in Flow.

Let's test format detection.

Hi Lucy.

Thank you so much for sending the project.

I will check with the team, and I will let you know what everyone thinks.

How about we have a meeting?

No, how about we meet again next week to discuss the following steps and maybe drink a cup of coffee.

Put that in parenthesis.

I'm available on Monday at 3 p.m. if that works for you.

Actually, on Tuesday at 5 p.m.

Thanks.

Robert.

Change Lucy to Jessica.

Okay.

Both applications detected the pattern: the greeting, the message, and the closing.

They both corrected my mistakes, like the day and the time.

They both tried to follow my commands to change the name and add the parentheses.

Well, Flow again didn't do very well there.

But what about asking for more editing?

Let's command a change of tone on something that I dictate.

I want you to notice that I am using the exact same setup for all of these tests that I'm performing.

I am not changing anything at all.

Dictation is amazing for productivity, but as creative, I still think that typing has its place.

Make this sound more informal.

As you can see, Wispr Flow starts to fall behind when it comes to following commands mid-dictation, specifically for text generation or more complex editing.

I could use this example to tell you that Superwhisper is better, but that's not actually the case here; they are just being prompted differently.

If you are just getting started with this, here's what happens behind the scenes.

Both apps go through a transcription process first, where your voice is transformed into text, and then this transcription is passed through a layer of AI processing.

The AI processing step is what decides whether commands from the user should be detected and followed, and how they should be followed.
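To make that two-step flow concrete, here is a minimal Python sketch. Both stages are hypothetical stubs (a real app would call a speech model and an LLM), and the function names are illustrative, not any app's actual code. What it shows is that the second stage, driven by its prompt, decides what happens to your words.

```python
# Minimal sketch of the two-stage dictation pipeline: speech-to-text, then an AI pass.
# Assumption: both stages are hypothetical stubs, not any app's real internals.

FILLERS = {"um", "uh"}  # toy filler-word list used by the stubbed AI stage


def transcribe(audio: bytes) -> str:
    """Stage 1 (stub): a real app would run a speech model such as Whisper here.
    To stay runnable, this stub treats the 'audio' as UTF-8 text."""
    return audio.decode("utf-8")


def ai_process(transcript: str, system_prompt: str) -> str:
    """Stage 2 (stub): a real app sends system_prompt plus transcript to an LLM.
    This toy version only removes filler words when the prompt asks for cleanup,
    to show that the prompt, not the transcriber, decides what happens."""
    if "clean up" not in system_prompt.lower():
        return transcript
    kept = [w for w in transcript.split() if w.lower().strip(",.") not in FILLERS]
    return " ".join(kept)


def dictate(audio: bytes, system_prompt: str) -> str:
    raw = transcribe(audio)                # voice -> raw text
    return ai_process(raw, system_prompt)  # raw text -> processed text


print(dictate(b"um so I think uh we should meet", "Clean up the dictation."))
```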

Wispr Flow actually does have a command mode that allows for more text generation, and I'm sure it would have given a similar result to what I got from Superwhisper, but that's not my point here.

What I wanted to show you is that in terms of speed and accuracy, they can both perform roughly the same.

When it comes to results, they can both do an excellent job of intelligently cleaning up dictation.

Flow's dictation mode is not designed to handle some of the commands I threw at it, and that's totally fine. But the crazy thing here is that everything I have just shown you, which for many users is the test for saying one app is better than another, has very little to do with what truly differentiates these apps.

To show you this, I can do the same kind of a test with awkward dictation in another tool.

No, in another application called Spokenly.

And that's what you are seeing.

No, that's what you are now seeing here in the bottom.

One, in terms of accuracy, it's great.

Two, in terms of speed, it can also be fantastic.

Three, in terms of following commands, it can do the same.

I must clarify, there are some minor differences when it comes to speed, a matter of milliseconds actually, but all of this at zero cost.

Make zero cost bold and in caps.

Now, Wispr Flow costs $12 a month, and Superwhisper costs $8.25 a month and offers a lifetime option for $250.
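A quick worked comparison using only the prices quoted here (they may have changed since): the sketch below computes the yearly subscription cost and how many monthly payments it takes before Superwhisper's lifetime option becomes the cheaper choice. Billing details such as annual versus monthly plans are not covered and are left out.

```python
import math

# Prices as quoted in the video; check each vendor's site for current pricing.
WISPR_FLOW_MONTHLY = 12.00
SUPERWHISPER_MONTHLY = 8.25
SUPERWHISPER_LIFETIME = 250.00


def yearly_cost(monthly: float) -> float:
    """Twelve monthly payments, rounded to cents."""
    return round(monthly * 12, 2)


def months_to_break_even(lifetime: float, monthly: float) -> int:
    """Smallest whole number of monthly payments whose total reaches the lifetime price."""
    return math.ceil(lifetime / monthly)


print(yearly_cost(WISPR_FLOW_MONTHLY))    # 144.0
print(yearly_cost(SUPERWHISPER_MONTHLY))  # 99.0
print(months_to_break_even(SUPERWHISPER_LIFETIME, SUPERWHISPER_MONTHLY))  # 31
```

With these numbers, 30 payments at $8.25 come to $247.50 and 31 come to $255.75, so the lifetime option pays for itself in the 31st month.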

But how is it that Spokenly can be used for free when it performs about the same in all the dictation tasks I showed you?

And just to clarify, yes, there are also paid options in Spokenly, but in this case, I was using open-source models, which you don't have to pay for at all.

Zero.

Nada.

What I'm trying to say here is that AI dictation is now much more about what happens beyond just getting your words on the screen in a cleanly formatted way.

So that aspect everyone is trying to sell you, claiming their app is faster than typing, accurate, understands what you're trying to say, or is amazing for productivity, is all just part of the expected standard right now.

Knowing this, let's talk about what makes an app the right fit or not for you.

I want to tell you about the two main approaches these apps take, because that's the first thing that sets them apart.

I want to explain this with two of the apps I just showed you.

First, we have Wispr Flow.

This app is built for simplicity.

The whole idea is making AI dictation so easy that anyone can use it to replace typing.

You get some customization options, such as keyboard shortcuts, dictionary words, and replacements, but this app is minimal on purpose.

It just has the absolute essentials when it comes to settings.

They are aiming for the widest market possible, so everything stays pretty straightforward.

At the same time, everything is a bit of a mystery.

It truly feels magical if you jump into this for the first time without having checked out any of the alternatives, because you don't know what's going on.

I do have to say something important here.

A bit over a year ago, when Wispr Flow was pretty new, there were a lot of questions and complaints about privacy regarding this app.

What kind of data was being captured?

How was it being used?

At that time, the team was not clear about this either, and their privacy policy was a confusing mess.

Recently, however, I have noticed they have been proactively improving in this area.

And look, none of the existing apps is perfect.

What matters to me is that right now, at least, it seems like the team at Wispr Flow is listening.

They are trying to communicate more, and I can see the improvements.

The app now has settings that let you opt out of data retention or training, and you can also turn off context awareness if you prefer.

They have also obtained certifications meant to ensure privacy for health professionals and corporations.

All of this to say that, even though I still find a lot of their marketing too spammy and pushy, and I do not agree that AI dictation will or should replace typing, I think that as a product aiming for maximum user-friendliness, it's very solid.

On the other side of the spectrum, you have Superwhisper.

This app is a big contrast.

It's built for control, transparency, and power.

Instead of hiding things, it actually shows you everything that is being captured and sent.

It lets you run local models or cloud models, and the local option allows for a higher degree of privacy if you care about that.

You can toggle a lot of options on and off to personalize your interactions with the app and the results you get, which gives you an incredible amount of flexibility.

With custom modes, where you are in full control of the instructions sent to the AI, and with unlimited AI usage included for the latest OpenAI and Claude models, Superwhisper can be used for more than simple dictation.

It can truly be a powerful AI assistant that you can deeply integrate into more advanced workflows.

In short, this app puts the user in control, letting you customize things to an impressive degree.

That's the fundamental difference.

I must say that I still think simplicity is a characteristic of good app design.

I am not against this.

It's just that apps that follow the line of Wispr Flow put user-friendliness at the very core of everything and sacrifice some customization to achieve it.

Apps that follow the line of Superwhisper start from a different place.

They build for power and flexibility first, then they try to make that complexity manageable.

Both approaches are valid, it just depends on what you value more.

Having things work without any tweaking right out of the box, or having the freedom to shape the tool around your own particular workflow.

So when you look at all of the AI dictation tools I will mention, the first step is to figure out what kind of user you actually are.

This is more important than you may think, and it can save you a lot of time and frustration down the road.

So what do we have in the first group?

Wispr Flow is the most popular in this category, but there are other good alternatives.

Willow Voice keeps the same minimal approach, and what sets it apart is that it learns from your corrections.

It captures edits that you do on the results and uses this to adapt to your style.

Aqua Voice takes things a step further with an optional mode where you can see the text streaming in real time as you dictate.

I really like that Aqua Voice's default dictation handles the kind of formatting commands I was showing you at the beginning of the video.

And all of that can optionally show up right in this floating panel if you have it active.

It's actually pretty cool, put that in parenthesis.

It's just a bit slower when used like this.

It also uses Avalon, its own transcription model, which is claimed to be more accurate with technical terms. Aqua Voice also lets you enter custom instructions that apply based on which app you are using.

It feels like this app is stepping a little bit into advanced territory because of the features it has.

On the other hand, Aqua does not currently have the command mode you can find in both Wispr Flow and Willow Voice.

This is what allows you to do stuff like, hey, write me an outline for one article on AI dictation.

Now, an important thing to consider is that Wispr Flow, Aqua Voice, and Willow Voice are all subscription-based, which makes sense because they use cloud models for transcription and processing.

This is what allows them to perform with the speed and accuracy they need.

Actually, as I was saying before, a lot of what they do behind the scenes is still quite a mystery, so there's not much information out there about how processing happens, what specifically is being captured, and how.

This means that when it comes to privacy or handling of data, you basically just have to trust them.

Because of that, I was happy to find an open-source alternative that takes a similar approach but with way more transparency.

It's called E2AI.

Right now, the app is in a very early stage.

It only does basic clean-up and has a command mode, but it's getting better fast and it should catch up to the others soon.

The main thing here is that, since E2AI is open-source, you can see exactly what's happening under the hood.

Currently, it's free, and there are instructions available so you can even build it yourself.

Now, let's switch to the other group.

I have been using Superwhisper for over a year, and I should probably admit upfront that I am a bit biased here.

My whole setup already depends on it, so it's tough for me to be completely neutral.

It's not perfect though.

I actually have specific frustrations and that's what has made me look around at other options.

I'll get into those later.

But for now, I want to focus on what matters to me as someone who needs more control and customization.

For users new to this, apps like Spokenly or VoiceInk, which I will cover in this section, could be enough, but I'm looking at this from a different angle.

I am testing them against what I already know works for me, so I am using Superwhisper as the standard here.

First, let's talk about context awareness.

This for me is one of the most important advanced features.

It's about the app knowing not just what you are saying, but also what you are working on, and passing that information to the AI so you can get better results.

Superwhisper is very strong here, and it handles context capture in a very sophisticated way.

It captures three main types of context: selected text, application context, and clipboard context.

The amazing part here is that it captures each of these at a different, specific moment of the dictation.

I have a video that explains all of this in detail, but in summary, this allows you to craft pretty complex requests.

For example, let's say: Hey, I need you to grab my selected text and give it the same kind of formatting you find in my application context.

As you noticed, I started my dictation with text selected and stopped dictating in another app.

Then I got the result I expected.

Everything formatted in bullets.
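The demo above can be modeled as context captured at two different moments. This is a sketch under an assumption drawn from how the demo reads (selected text taken when dictation starts, application context taken where it stops); the class, field names, and message layout are hypothetical, not Superwhisper's actual format.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DictationContext:
    # Hypothetical fields; the timing comments reflect the assumption stated above.
    selected_text: Optional[str] = None  # taken when dictation starts
    app_context: Optional[str] = None    # taken when dictation stops
    clipboard: Optional[str] = None      # optional third context type


def build_user_message(transcript: str, ctx: DictationContext) -> str:
    """Bundle the spoken request with whatever context was captured."""
    parts = [f"Request: {transcript}"]
    if ctx.selected_text:
        parts.append(f"Selected text:\n{ctx.selected_text}")
    if ctx.app_context:
        parts.append(f"Application context:\n{ctx.app_context}")
    if ctx.clipboard:
        parts.append(f"Clipboard:\n{ctx.clipboard}")
    return "\n\n".join(parts)


ctx = DictationContext()
ctx.selected_text = "apples, pears, plums"           # selected in app A at start
# ...the user keeps talking and switches to app B...
ctx.app_context = "- first bullet\n- second bullet"  # captured in app B at stop
msg = build_user_message("Format my selected text like my application context.", ctx)
print(msg)
```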

After testing VoiceInk, I was surprised to find that there is a bit of a difference in timing, but it can actually do the same.

Hey, I needed to grab my selected text and give it the same format you find in my application context.

With VoiceInk, I have to end my dictation where my selected text is, which feels a little strange, because the result doesn't appear on screen; it's only on my clipboard.

It's okay.

I just have to paste and the result is about the same.

One thing I personally don't like is how VoiceInk deals with clipboard context.

It will always include it, whereas in Superwhisper, I am in control, because it depends on whether I copy something during dictation.

That's just a minor detail.
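The clipboard difference comes down to a policy choice. Here is a hypothetical sketch of two policies: one always attaches the clipboard, and the other attaches it only if it was copied while dictation was running, which is one plausible reading of the control described above. Policy names and timestamps are illustrative, not either app's settings.

```python
from typing import Optional


def clipboard_context(policy: str, clipboard_text: str, copied_at: float,
                      started_at: float, stopped_at: float) -> Optional[str]:
    """Decide whether clipboard text is sent to the AI (times in seconds).

    'always'        : attach whatever is on the clipboard, however old it is.
    'copied_during' : attach it only if it was copied while dictation was running.
    """
    if policy == "always":
        return clipboard_text
    if policy == "copied_during":
        return clipboard_text if started_at <= copied_at <= stopped_at else None
    raise ValueError(f"unknown policy: {policy!r}")


# Something copied an hour before this dictation started:
old = dict(clipboard_text="old password reset link", copied_at=0.0,
           started_at=3600.0, stopped_at=3612.0)
print(clipboard_context("always", **old))         # stale text ends up in the prompt
print(clipboard_context("copied_during", **old))  # None
```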

Something that is more important to me is that application context in Superwhisper is more powerful for many of my use cases, which have to do with writing and developing content.

Both VoiceInk and Spokenly only capture a screenshot of your current app window, but Superwhisper captures all the content in your active input field.

What do I mean by this?

Okay, let me show you.

Hey, tell me the name of the person in the first sentence.

See, it grabbed my name. Above is actually part of the script of the video you are watching now, and there is my name in the first sentence.

Superwhisper can grab all of this, but now let's try the same with VoiceInk.

Hey, tell me the name of the person in the first sentence.

It fails to give me a good response because it's only seeing what I am seeing.

The way Superwhisper does it is powerful because it lets you do things like summarize a long document you are working on or get a critique of a piece you are writing.

You can be dictating an article, and the AI will understand everything else you have dictated so far, so it can help you make a more cohesive piece of work.
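Why the "first sentence" question fails with screenshot-only context can be shown with a toy model: a screenshot only covers the lines in view, while reading the input field gives the whole text. This is a simplification; real apps would need OCR or a vision model to read a screenshot, and on macOS reading another app's text field typically goes through the Accessibility APIs. Neither is shown here, and the function names are hypothetical.

```python
def visible_region(field_text: str, first_line: int, height: int) -> str:
    """Stand-in for screenshot-based context: only the lines currently on screen."""
    lines = field_text.splitlines()
    return "\n".join(lines[first_line:first_line + height])


def full_field(field_text: str) -> str:
    """Stand-in for reading the whole active input field: every line, scrolled or not."""
    return field_text


# A long script whose first sentence has scrolled out of view.
script = "\n".join(["Hey guys, my name is Robert."] + [f"Script line {i}" for i in range(1, 200)])
on_screen = visible_region(script, first_line=150, height=40)

print("Robert" in on_screen)           # False: the first sentence is not visible
print("Robert" in full_field(script))  # True
```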

When I'm coming up with ideas for videos like the one you are watching now, I often dictate a long stream of consciousness, and after I get that into my Notes app, I ask: "Hey, make me a simple outline from what you see in my app context. I need to structure my thoughts better."

By the way, guess what?

At the time of making this video, the only other AI dictation app I have tested that allows me to do something like this is Wispr Flow in Command Mode.

And you may say: "But Robert, you are not using this for dictation anymore."

And that's a very good point.

I'm the kind of person who looks at a tool and tries to think beyond what it was designed to do.

I look at it from a "what can I make this do" perspective.

I think Superwhisper is the top option for me because of all the possibilities it presents when approached like this.

In this case, the app is great not only for dictation, but also for AI assistance.

This is also why I was a bit heartbroken when I found that Spokenly, the other app I briefly introduced earlier, is distributed through the Mac App Store.

That is because this app, which I also want to tell you a bit more about, has a philosophy I really connect with.

It has so many advanced settings and customization options, perhaps even more than VoiceInk and Superwhisper.

Sadly, being in the App Store means it is sandboxed, so it doesn't have access to the accessibility APIs that allow for deeper integration with the system.

I know that for some users, downloading an application like this from the App Store gives them a lot of peace of mind, but sandboxing is also the reason its context capture is very limited. It can, however, do one thing very well: it gives the user absolute control over the system prompt.

This is another aspect that really matters to me.

The system prompt is basically the set of instructions that tells the AI how to behave: what to do with your words, how to format them, and what rules to follow.

It's the behind-the-scenes guide that shapes every result you get.

This is probably the biggest reason I lean towards these three apps over the simpler ones.

With Wispr Flow and similar tools, the main system prompt is locked down.

You cannot see it or change it.

Yes, some of those apps do allow for a few custom instructions, but you are still very limited in what you can do, even with the ones that offer a command or assistant mode.

Spokenly, VoiceInk, and Superwhisper all give you this flexibility, each in a slightly different way.

Spokenly could potentially give you the most power in this area because, while editing the system prompt, you can insert placeholders for the different context types available.

This is pretty awesome.

Unfortunately, the context limitations caused by App Store sandboxing, which I mentioned earlier, hold it back.
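Prompt placeholders work like template variables that get filled with captured context before the prompt is sent to the model. The sketch below uses Python's `string.Template`; the placeholder names `$selected_text` and `$clipboard` are hypothetical and are not Spokenly's real placeholder syntax.

```python
from string import Template

# Hypothetical placeholder names; Spokenly's real placeholder syntax may differ.
SYSTEM_PROMPT = Template(
    "You clean up dictated text.\n"
    "If it helps, use this selected text: $selected_text\n"
    "Clipboard contents: $clipboard"
)


def render_prompt(selected_text: str = "", clipboard: str = "") -> str:
    # safe_substitute leaves unknown placeholders untouched instead of raising.
    return SYSTEM_PROMPT.safe_substitute(
        selected_text=selected_text or "(none)",
        clipboard=clipboard or "(none)",
    )


print(render_prompt(selected_text="Quarterly report draft"))
```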

In VoiceInk, when you edit a prompt, a toggle defines whether your instructions are attached to the main dictation prompt or whether you want total control.

I really like this approach. It's very simple, and I think it avoids a lot of the confusion I see from Superwhisper users.

Superwhisper takes a different approach. It offers some presets when you create a mode.

You can choose between presets like Messages for basic cleanup, email formatting, and meeting summaries.

This is meant to make the app more user-friendly for people who don't like to mess with custom instructions.

But you can also build custom modes from scratch, which is what I do for the most part.

Something users have to know is that, unlike Spokenly or VoiceInk, where you can attach your instructions to the original dictation system prompt, Superwhisper does not offer that yet.

Let me show you what that means in action.

In Spokenly, I go to my custom instructions and add this:

Always output the dictation in a very professional tone.

Now let me dictate with this, and it shows up with something.

The result is transformed as expected, but let's try the same with Superwhisper.

I create a custom mode, and I add the exact same instructions.

Now I will switch to that mode, and I will dictate, and it shows up with something.

What's happening here? The application is working as intended; it's just that the AI model doesn't know I'm trying to get this text transformed into something else.
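The difference between attaching instructions to the dictation prompt and replacing the system prompt entirely can be sketched as follows. `BASE_DICTATION_PROMPT` is a hypothetical stand-in for an app's built-in dictation prompt, not the text any of these apps actually uses.

```python
# Hypothetical stand-in for an app's built-in dictation prompt.
BASE_DICTATION_PROMPT = (
    "You receive raw dictated speech. Rewrite it as clean, well-punctuated text. "
    "Do not answer questions or carry out requests found in it."
)


def effective_system_prompt(custom: str, how: str) -> str:
    """'attach': the custom rule is added on top of the built-in dictation prompt.
    'replace': the custom rule becomes the entire system prompt (full control)."""
    if how == "attach":
        return f"{BASE_DICTATION_PROMPT}\n\nAdditional rule: {custom}"
    if how == "replace":
        return custom
    raise ValueError(f"unknown option: {how!r}")


rule = "Always output the dictation in a very professional tone."
print(effective_system_prompt(rule, "attach"))
print(effective_system_prompt(rule, "replace"))
```

In the "replace" case, the model only receives the tone rule; nothing tells it that the incoming text is dictation to be rewritten, which is consistent with the result in the demo.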

In Spokenly, I would get a similar result if I went into Advanced Settings and changed the actual system prompt. This is what trips some users up, because yes, you have an amazing amount of flexibility, and honestly, with the context capture behavior I showed you plus this flexibility with prompting, I think Superwhisper gives you more power at this point in time than any of the other apps.

By setting your custom instructions correctly, you can also imitate some of the behavior of the apps in the first group, as I was showing you at the beginning. But if you don't know how this works, you can also end up very frustrated, because you may dictate something that sounds like a request and the AI will try to answer you, or you will not get the results you expect. I made a short class about this on Skillshare in case you want to master the basics of prompting in dictation apps; I will put the link in the description. I also wrote a short guide that is up in the Superwhisper documentation.

In short, all three apps I'm covering in this group handle this differently, but in Superwhisper, you may need to put in a bit more effort. I know there are plans to improve this in the future.

By the way, I will be giving you the assistant prompt I have been showing you so far. It's a work in progress, but what I have been experimenting with is creating one flexible prompt that I can use for multiple things. For example:

I am sorry, but I cannot make it tomorrow. How about we reschedule, Robert?

Make this an email for John.

In this case, I dictated my main content, I commanded a specific type of formatting, and I got my result. Now, I can jump in there and add a little bit more without changing anything in my setup.

Something came up with my family.

So this prompt I created is pretty good at recognizing when I am commanding something about the format and when I am dictating content. It does it all according to instructions I have set myself. And since we have selected text context in Superwhisper, I can even do this:

Make this more casual.

Now, for that example, I did not switch between different instructions at all, but normally you will want to switch. This allows you to set targeted instructions for better, more consistent results, to assign specific AI models to specific tasks, or even to customize more options within each mode. That is why I think it's very important to talk a little bit about the user interaction when switching modes. This is where I found the most friction trying to adapt my workflow to VoiceInk or Spokenly.

All three apps let you set automatic rules that switch modes based on which application you are in. With Spokenly, you need to have more than one AI prompt for this option to appear. In VoiceInk, you set this when you create a new Power Mode, and in Superwhisper, you find it under each mode's settings. Let's say you want a specific tone or formatting rule when you are dictating in Slack, or you have a different way of communicating in WhatsApp. Maybe you always want dictations formatted as emails in your mail app. You can set up stuff like that. I normally do not use this feature, because in the same app I use for note-taking, I write emails, I write outlines, and I summarize stuff, so I prefer switching manually when I have to. I like to be in absolute control.

In Spokenly, you can set keyboard shortcuts for each separate AI prompt, but here's the problem I have with that. Dictation should empower your flow state. By that, I mean it should be as non-disruptive as possible. If I'm about to start dictating, I don't want to think, "What's the shortcut for writing emails? What's the shortcut for outlining an article?" So I do have a personal issue with the way Spokenly does it.

VoiceInk improves on this with the ability to set one keyboard shortcut that always starts or stops dictation. Once you start, you switch modes with modifiers plus numbers. I really like that it's the same single action to start or stop dictating.

I also like that if you don't set a default Power Mode, your choice stays active. This means that whatever mode you used last time will stay selected for the next time. But switching with modifiers and numbers feels very awkward to me. There is also a very confusing separation between AI enhancements and Power Modes. On the other hand, this app has something cool: trigger words. This means you can start dictating with a specific word, and your text will be automatically processed by the AI prompt you have associated with that word. But you run into the same issue I described with Spokenly's keyboard shortcuts: you have to decide which word to say at the exact moment you are about to start dictating. This breaks a lot of my concentration and flow. So we have many different options to set this up, but the overall design could be much better.
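The three ways of picking a mode discussed here (per-app rules, trigger words, and keeping the last-used mode) can be combined in one small resolver. The precedence order, app names, and trigger words are assumptions for illustration; this is not the documented logic of any of these apps.

```python
from typing import Dict, Tuple

# Hypothetical configuration, loosely modeled on the examples above.
APP_RULES: Dict[str, str] = {"Mail": "email", "Slack": "casual"}
TRIGGER_WORDS: Dict[str, str] = {"outline": "outline", "email": "email"}


def resolve_mode(transcript: str, frontmost_app: str, last_mode: str) -> Tuple[str, str]:
    """Return (mode, text_to_process) for one dictation.
    Assumed precedence: trigger word, then per-app rule, then the last-used mode."""
    first, _, rest = transcript.partition(" ")
    word = first.lower().strip(",.:")
    if word in TRIGGER_WORDS:
        # The trigger word selects the mode and is removed from the text.
        return TRIGGER_WORDS[word], rest
    if frontmost_app in APP_RULES:
        return APP_RULES[frontmost_app], transcript
    return last_mode, transcript  # the previous choice simply carries over


print(resolve_mode("Outline, ideas for the next video", "Notes", "clean"))
print(resolve_mode("hi team, quick update", "Slack", "clean"))
print(resolve_mode("buy milk", "Notes", "clean"))
```

The trigger word has to be the very first thing you say, which is exactly the decide-at-the-last-second problem described above. A sentence that merely happens to start with a trigger word would also be rerouted.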

I think Superwhisper made a genius decision by making mode switching separate from dictation. They are two separate processes. So if you want, you can start and stop dictating with your custom keyboard shortcut, similar to VoiceInk. You can also switch between modes after you start dictating, and your last chosen mode carries over to the next interaction, the same as in VoiceInk. What makes all the difference for me is that you can switch modes before you start dictating. You can do it directly within the Superwhisper interface, or via deep links, which allow for a lot of automation possibilities that I will not get into in this video.

This means you can switch modes with Alfred, Raycast, or tools like BetterTouchTool, Keyboard Maestro, and more. The reason this is so important for me, as someone who cares about focus and productivity, is that the moment my brain decides to start working on something, my fingers have already made the change. Not when I'm about to start talking into the microphone, but at the moment of making that internal switch, my creative side connects with my hands. It's super intuitive, and I haven't found that anywhere else.

This also gives me more advanced flexibility in my workflow, because it means I can dictate once and test that recording through different modes.

This is me and I am dictating without any AI processing and you can see, I mean, you can notice how the raw dictation looks like. When I'm using AI directly in something like ChatGPT or Altar, I use this without any extra processing because it's the fastest. It is messy, but AI has no trouble understanding me.

But with Superwhisper, once I've dictated it, I can switch modes with one key press. If I start a new dictation right now, it will be cleaned up, but I can also reprocess what I just dictated. Okay, we've got a cleaned-up version, but say I want to run it through another prompt to translate it into Spanish. I just do the switch, and I run the same audio through it.

None of the other applications allow you to do this, and it all comes down to mode switching and dictating being separate actions.
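Decoupling mode switching from recording can be modeled as a small piece of state: the current mode can change at any time, and the last dictation is kept so it can be run again through another mode. This is a hypothetical sketch; it keeps the transcript text for simplicity, whereas the video describes Superwhisper rerunning the same audio, and the toy modes below stand in for real prompts such as translating into Spanish.

```python
from typing import Callable, Dict, Optional

Processor = Callable[[str], str]  # a mode, reduced to "text in, text out"


class DictationSession:
    """Hypothetical model: mode switching and dictation are independent actions."""

    def __init__(self, modes: Dict[str, Processor], mode: str) -> None:
        self.modes = modes
        self.mode = mode
        self.last_transcript: Optional[str] = None

    def switch_mode(self, mode: str) -> None:
        # In this model, nothing has to be recording for a switch to happen.
        if mode not in self.modes:
            raise KeyError(f"unknown mode: {mode}")
        self.mode = mode

    def dictate(self, transcript: str) -> str:
        self.last_transcript = transcript  # kept for later reprocessing
        return self.modes[self.mode](transcript)

    def reprocess(self) -> str:
        # Run the previous dictation through whatever mode is active now.
        if self.last_transcript is None:
            raise RuntimeError("nothing dictated yet")
        return self.modes[self.mode](self.last_transcript)


session = DictationSession(
    {
        "raw": lambda t: t,
        "clean": lambda t: t.replace("um ", ""),
        "caps": str.upper,  # stand-in for a real prompt such as "translate to Spanish"
    },
    mode="raw",
)
print(session.dictate("um hello there"))  # um hello there
session.switch_mode("clean")
print(session.reprocess())                # hello there
session.switch_mode("caps")
print(session.reprocess())                # UM HELLO THERE
```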

It's super handy, because you can run the same dictation against multiple prompts until you get what you need, and it also lets you quickly transcribe the same thing with different models to check their speed or accuracy. With other apps, you have to open the application, go to the history panel, and choose how you want to reprocess a specific dictation. Having this at your fingertips is an incredible time saver. So when it comes to the main interaction with the app, which is mode switching, Superwhisper wins for me.

Now, I want to briefly touch on one more aspect that matters to power users: AI models. There are two sides to this: transcription models and LLM processing.

When it comes to transcription models, both Spokenly and VoiceInk offer way more options than Superwhisper. Let me tell you, technology in this field is moving incredibly fast, and I expect we will eventually reach a point where all models are fast, smart, and accurate across multiple languages. We are getting close, but we are not there yet. Right now, different models are good at different things. Some transcribe better in certain languages, and some prioritize speed over accuracy.

Throughout this video, I have been using Superwhisper's S1 voice model for transcription. It's a cloud model based on Whisper that's incredibly fast, almost instant for quick dictation. If you are on VoiceInk or Spokenly, you've got a local model called Parakeet, which is also crazy fast, but its accuracy is worse. I would say that if you are okay with Whisper models and their language support, or you are fine with Nova models, which, by the way, include real-time transcription and speaker separation, then you will be fine with Superwhisper. But the truth is, there's constant competition, with new models coming out all the time, and this app does not incorporate them as quickly as VoiceInk or Spokenly.

This area of AI models also plays a role in the pricing of each app. VoiceInk is a one-time payment of $25. That price doesn't include cloud model usage; you pay for that with your own API keys. After purchase, you can also use it with local models at no cost, of course. Spokenly can be used totally for free in the same way, with local models and your own API keys. The difference here is that you don't have to pay anything if you go this route. This app only requires a subscription payment if you want it to manage access to all the AI being used, so you don't need to deal with API keys and stuff like that.

Finally, Superwhisper offers both a subscription and a one-time payment option of $250, which sounds like a crazy difference compared to the other two, but here's the thing: Superwhisper is the only AI dictation option that provides truly unlimited access to the latest models from OpenAI and Anthropic for processing your dictations or transcribed audio files. That can be a massive advantage. So I'll say that right now, Superwhisper is my personal top choice because of the combination of features I have talked about. There's a lot more I haven't covered, like all the automation possibilities you get with MacroWhisper, which I built for this. I have other videos on that topic.

Something to consider is that even if VoiceInk eventually matches Superwhisper's features for app context and makes the interaction more intuitive, or if Spokenly leaves the App Store and becomes more powerful, I really doubt they will ever offer unlimited AI processing for a one-time payment the way Superwhisper does. For many users who know what unlimited Sonnet or unlimited ChatGPT means, this starts to sound too good to be true.

And it is. But here's another important thing.

Most people don't actually need these premium models in a dictation app. Throughout this video, for LLM processing I have been using a model called Kimi K2 through Groq, which is totally free because it's open source. For quick dictation tasks, I don't generally go for OpenAI or Anthropic models at all because, yes, they are smarter, but they are slower. The unlimited AI processing becomes most useful when you combine it with Superwhisper's other features and start to use it as an assistant or for more complex tasks. It's also great if you are processing longer dictations and don't mind sacrificing some speed for better results. Finding this app has been quite life-changing for my learning of AI, and it has revolutionized how I use my computer, but if you are just planning to do quick dictations and format emails or bullet lists, maybe, just maybe, you'll be fine with something else.

As I start to wrap up this video, I want to give you a quick summary, or rather a guide, to help you figure out what's right for you. If you are still watching this far but feel like all you need is simplicity and user-friendliness above anything else, then go with the apps in that first group. Go with Aqua Voice if you need some customization with instructions. Willow Voice may be a great alternative if you want some of that personalization in terms of writing style, but you want it to happen automatically. I would say Wispr Flow is also a very solid choice, especially now that they have fixed a lot of the communication and privacy issues that I saw in the past. I also really like the context awareness that's available for command mode in Wispr Flow. You can do powerful stuff with that.

Any of these apps will work well if you want something super straightforward and don't mind the subscription price tag. There's also the open source Ito AI that I mentioned. It might be enough for you.

At this point, it's very basic, but it's free and it will give you fast dictation with a general cleanup.

The main limitation is that it doesn't have as many extra tricks as the other apps yet, but I think they will get there soon.

Another option for users who are okay with something simple is Alter.

This app is primarily a very powerful AI assistant and you can get access to its voice features for $29 a year.

I would say that the dictation itself is more basic than anything I have mentioned so far, but if you are interested in the assistant capabilities or need transcription with speaker separation, then the dictation becomes a nice little extra.

I have covered more about Alter in a few other videos if you want to learn more about it.

If you want to put in a little bit more effort and level up from the basic options, I think Spokenly would be a good entry-level app for this category.

If you want to use your own API keys and local models, you can use it for free.

If you pay for it, you don't have to deal with any of that.

This app actually does have some more advanced features that I didn't mention in this video, like the triggering of scripts, which allows for some automation and you can set more parameters for the LLMs. It's just that for me, it always hits those sandboxing limits.

If this app eventually implements app context correctly or releases a version outside the App Store, it could truly expand its capabilities and jump to the next level.

But I'm not sure if we will ever see that happen.

For now, though, I think it's the best middle ground.

For power users who love to customize, VoiceInk is solid, but it's not my personal top pick right now.

The workflow choices with the interface feel awkward, the separation between enhancements and power modes is confusing, and design-wise, it just doesn't feel well thought out.

There are too many small friction points that add up, things that make you pause or second-guess yourself when you should be focused on your work.

At this point, I would find it very difficult to switch.

If you haven't used Superwhisper, maybe VoiceInk is good enough for you, especially if you don't need the unlimited AI models that Superwhisper offers.

Superwhisper continues to be my recommendation for users who care about personalizing their workflow.

The unlimited AI side definitely sets it apart, but it's all the other details in the implementation of features that actually allow you to take advantage of that power, and it can expand into so many different areas.

If you care about productivity and voice-driven automation across your system, if you are into deep work, or if you are interested in exploring the possibilities of AI across anything that has to do with voice or text, I still think nothing else has reached this level yet.

Let me be real with you about something.

Part of the reason I have been testing all these other apps is that Superwhisper went through some changes that really frustrated me.

The settings panel got redesigned a few months back and it made everything harder to navigate.

Some things that I had always considered little secret features turned out to be unintentional and have been removed.

Plus, now there are these notifications that show up every time you switch modes.

I hate them so much and find them really distracting when I'm trying to focus.

See, I don't mind changes, but I think that they should not affect the overall workflow that people are already used to.

If changes are implemented, there should be an option to preserve the previous behavior and things like that.

Sometimes I get the impression that this app is caught between being a power tool and trying to compete with simpler options like Wispr Flow.

I don't think that should happen.

They are very different markets.

What's unfortunate is that, because of this identity crisis, Superwhisper sometimes seems to struggle with giving users more settings and control.

That's a shame because power and flexibility are its biggest strengths.

Superwhisper is also built by a small team that is working on Windows and iOS versions at the same time.

So the focus and communication with users could be much better.

That said, I have seen several improvements lately.

There have been fixes to the interface.

The app feels more stable.

I have talked with Neil, the developer, a few times, and I feel like he is listening more.

The last few months have definitely been bumpy, but I still believe in where this is headed.

After spending time looking at other alternatives and seeing how they do things, I can definitely see how Superwhisper is not the best option for everyone, and I hope that everything I have shared here helps you figure this out for yourself.

As for me, I'm happy with this app and hope to continue making more content about how I use it in the future.

I just really need those stupid notifications gone.

Please.

Thank you so much for watching, guys.

If this video was useful to you and you decide to try Superwhisper, I hope you'll consider using my affiliate link.

It would support me in making more content like this.

Remember, you can always sign up to my newsletter for more updates on what I'm up to.

I'll see you in the next one.
