Audio to Text

Turn any audio into an accurate text transcript

Voice memos, lecture recordings, seminars, podcasts, interviews — upload the file and Scholarly converts your audio to text: an accurate, timestamped transcript you can read, search, and study from.

Explore audio recordings

Free to start · No credit card · 70+ languages

Updated June 2026

Used by 150,000+ students worldwide

Quick answer

How do I convert audio to text?

Upload your audio to Scholarly — MP3, M4A (iPhone voice memos), WAV, WebM, or OGG, plus video files like MP4. Scholarly's speech-to-text turns the recording into an accurate text transcript, with timestamps and separate speaker turns so you can see who said what. The transcript stays linked to the original audio, so you can click any line to hear it back — and from there one click turns the same recording into study notes, flashcards, or a quiz.

  1. Upload the audio file (or a video; Scholarly converts its audio track to text).
  2. Scholarly transcribes the speech into accurate text with timestamps and speaker turns.
  3. Read, search, or correct the transcript, then turn it into notes, flashcards, or a quiz.

What audio works

One tool to convert any recording you collect as a student to text

If it's spoken and you can save it as a file, Scholarly can convert the audio to text.

Voice memos

The voice memo you recorded walking home — converted to readable text you can skim in seconds instead of scrubbing through audio.

Lecture recordings

Full class sessions, including 2–3 hour lectures, turned into a complete, timestamped transcript you can search for any term.

Seminars & tutorials

Discussion-heavy sessions where the exact wording of an argument or counterexample matters — captured verbatim, with speaker turns.

Podcasts

Course-relevant episodes become searchable text, so you can quote the exact passage instead of re-listening to find it.

Interviews & fieldwork

Research interviews and oral histories transcribed with speaker labels — the readable text you actually cite in a write-up.

Office hours & study groups

The places where the real explanations happen — converted to text so a half-remembered answer becomes a line you can read back.

Example output

What does the transcript look like?

Here's the shape of the text Scholarly produces from a typical recording — a 38-minute lecture clip on enzyme kinetics, transcribed with timestamps and speaker turns.

enzyme-kinetics-lecture.m4a

38:12 · lecture recording · converted to text by Scholarly

[00:00] Professor 02:10

  • "As substrate concentration rises, reaction velocity climbs and then levels off at Vmax — the active sites are saturated."
  • "Km is the substrate concentration at half of Vmax. A low Km means the enzyme binds its substrate tightly."
  • "We're assuming a steady state — the enzyme–substrate complex forms and breaks down at the same rate."

[14:45] Student

  • "So with a competitive inhibitor, if I just add more substrate, does Vmax come back?"
  • Professor: "Right — Vmax is unchanged, but the apparent Km goes up because you have to outcompete the inhibitor."

[27:30] Professor

  • "The Lineweaver–Burk plot is just the double-reciprocal — it straightens the curve so you can read Vmax and Km off the intercepts."
  • "Competitive inhibition shares the y-intercept; non-competitive shares the x-intercept. That's how you tell them apart on the plot."

Every line keeps its timestamp and speaker label, so you can click to hear the exact moment back. From this transcript, one click generates structured notes, flashcards, or a practice quiz from the same recording.
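If you copy a transcript out as plain text, its regular shape makes it easy to search with a few lines of code. The sketch below is only an illustration: it assumes turn headers that look like "[14:45] Student" followed by bulleted, quoted lines, and the names `parse_transcript` and `search` are hypothetical helpers, not part of Scholarly.

```python
import re
from dataclasses import dataclass

# Matches a turn header such as "[14:45] Student" (minutes:seconds, then a speaker label).
HEADER = re.compile(r"^\[(\d{1,3}):(\d{2})\]\s+(.+?)\s*$")

# Assumed plain-text shape, using lines quoted from the example above.
SAMPLE = """
[00:00] Professor
  • "Km is the substrate concentration at half of Vmax. A low Km means the enzyme binds its substrate tightly."
[14:45] Student
  • "So with a competitive inhibitor, if I just add more substrate, does Vmax come back?"
[27:30] Professor
  • "Competitive inhibition shares the y-intercept; non-competitive shares the x-intercept."
"""


@dataclass
class Line:
    seconds: int  # start of the speaker turn, in seconds
    speaker: str
    text: str


def parse_transcript(raw: str) -> list[Line]:
    """Attach each spoken line to the most recent '[MM:SS] Speaker' header."""
    lines: list[Line] = []
    seconds, speaker = 0, "Unknown"
    for row in raw.splitlines():
        row = row.strip()
        if not row:
            continue
        match = HEADER.match(row)
        if match:
            seconds = int(match.group(1)) * 60 + int(match.group(2))
            speaker = match.group(3)
        else:
            # Drop the leading bullet and the surrounding straight quotes.
            lines.append(Line(seconds, speaker, row.lstrip("• ").strip('"')))
    return lines


def search(lines: list[Line], term: str) -> list[Line]:
    """Case-insensitive substring search over the spoken text."""
    needle = term.lower()
    return [line for line in lines if needle in line.text.lower()]


if __name__ == "__main__":
    for hit in search(parse_transcript(SAMPLE), "vmax"):
        print(f"{hit.seconds // 60:02d}:{hit.seconds % 60:02d} {hit.speaker}: {hit.text}")
```

Storing the turn's start time in seconds keeps each hit tied to a moment you can jump back to in the audio.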

Deep dive

What audio formats can I convert to text?

MP3, M4A (the format iPhone voice memos use), WAV, WebM, and OGG upload directly. Video files like MP4 and MOV work too — Scholarly converts the audio track to text and ignores the picture. Long recordings are fine, including multi-hour lectures, and you don't need to split them first.
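If you have a folder of recordings and want to check them before uploading, a small script can sort files by extension. This is a minimal sketch: the extension sets come from the formats named above, while `classify_recording` is a hypothetical helper, and checking the extension alone is an assumption (it does not inspect the file's actual contents).

```python
from pathlib import Path

# Extensions taken from the formats listed above; grouping WebM with audio
# follows the page's wording, although WebM files can also carry video.
AUDIO_EXTENSIONS = {".mp3", ".m4a", ".wav", ".webm", ".ogg"}
VIDEO_EXTENSIONS = {".mp4", ".mov"}


def classify_recording(filename: str) -> str:
    """Return 'audio', 'video', or 'unsupported' based on the file extension."""
    suffix = Path(filename).suffix.lower()
    if suffix in AUDIO_EXTENSIONS:
        return "audio"
    if suffix in VIDEO_EXTENSIONS:
        return "video"
    return "unsupported"


if __name__ == "__main__":
    for name in ["enzyme-kinetics-lecture.m4a", "seminar.MOV", "notes.flac"]:
        print(name, "->", classify_recording(name))
```

Anything reported as "unsupported" here simply isn't in the list above; it would need converting to one of the listed formats first.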

Should I keep the text as a transcript or turn it into notes?

Converting audio to text gives you the verbatim record — every sentence, in order, with timestamps and speaker turns. That's what you want when the exact wording matters: a quote for an essay, the precise phrasing of a definition, or who said what in a seminar. If you'd rather have the recording condensed and organized by topic instead, use Audio to Notes — it runs the same transcription underneath, then summarizes it into structured study notes. Either way the full transcript stays attached, so you never lose the source text.

How accurate is the text on messy, real-world audio?

On clear audio, accuracy is high — clean lecture and interview recordings come back close to word-for-word. The transcript is timestamped, so when a single word looks off you can click straight to that second and confirm it against the audio in a couple of taps, rather than re-reading the whole thing.

What genuinely hurts accuracy is the recording, not the model: a microphone far from the speaker, heavy crosstalk, thick accents on top of background noise, or lots of specialized jargon all degrade the text. For anything high-stakes — a number, a name, a technical term you'll quote — verify it against the audio at its timestamp. It takes seconds, and you're checking against what was actually said.

What can I do once my audio is text?

The recording becomes a source in your Scholarly workspace, so everything builds on the transcript without re-uploading: search the full text for any term, turn it into clean study notes, generate spaced-repetition flashcards or a practice quiz, or ask the AI chat questions and get answers that cite the exact passage of the audio. Text is more useful than audio precisely because you can act on it — and reading a transcript while you study reinforces understanding far better than passively replaying a recording.

Need the verbatim text first? Our lecture transcription tool is built for exactly that — a searchable, timestamped transcript of a full class, with speaker turns kept intact.

FAQ

Audio to text FAQ

What audio files can I convert to text?

MP3, M4A, WAV, WebM, and OGG audio files, plus video files like MP4 — Scholarly converts the audio track to text. That covers voice memos, lecture recordings, seminar audio, podcasts saved as files, and interview recordings.

How do I convert audio to text?

Upload your audio file to Scholarly and it does the rest: the recording is transcribed into accurate, timestamped text with separate speaker turns. There's nothing to install — upload from your computer or your phone's browser, and the text is ready to read, search, and study from.

Do iPhone voice memos convert to text?

Yes. iPhone voice memos are M4A files, which upload directly and convert to text with no extra steps. Share the memo to your computer or upload it from the phone's browser — no format conversion needed.

How long can the audio be?

Multi-hour recordings are fine, including 2–3 hour lectures. Longer files take a bit more time to transcribe, but you don't need to split them — the whole recording becomes one continuous, timestamped transcript.

Does the audio-to-text transcript label different speakers?

Yes. The transcript separates speaker turns, so a seminar discussion, interview, or meeting reads as a clear back-and-forth instead of one undifferentiated block of text — which makes it easy to follow who said what.

How accurate is the transcript?

On clear audio it's high — close to word-for-word. Distance from the microphone, crosstalk, heavy accents, and background noise reduce accuracy, so for high-stakes details click the timestamp to verify a word against the original audio in seconds.

Can I edit or correct the text?

Yes. The transcript is fully editable, so you can fix a misheard name or term, and every line stays linked to its moment in the audio for quick checking.

What languages are supported?

Scholarly converts audio to text in 70+ languages, and you can generate study notes from the transcript in a different language than the recording — useful when you're studying in your second language.

Is converting audio to text free?

Scholarly is free to start with no credit card required. Paid plans (from $12/month, billed yearly at $144) raise limits for longer files and more uploads per day.
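To see which plan a recording fits before you upload it, you can compare its size with the per-file caps in the plan details on this page (8MB on Free, up to 300MB on Ultimate). The sketch below is illustrative only: `plans_that_fit` and `plans_for_file` are hypothetical helpers, and treating 1 MB as 1024 × 1024 bytes is an assumption, since the page does not define the unit.

```python
import os

# Per-file upload caps as listed in the plan details on this page.
PLAN_UPLOAD_LIMIT_MB = {"Free": 8, "Ultimate": 300}

# Assumption: 1 MB = 1024 * 1024 bytes (the page does not say which definition it uses).
BYTES_PER_MB = 1024 * 1024


def plans_that_fit(size_bytes: int) -> list[str]:
    """Return the plans whose per-file upload cap is at least size_bytes."""
    return [
        plan
        for plan, limit_mb in PLAN_UPLOAD_LIMIT_MB.items()
        if size_bytes <= limit_mb * BYTES_PER_MB
    ]


def plans_for_file(path: str) -> list[str]:
    """Look up the size of a file on disk and return the plans it fits."""
    return plans_that_fit(os.path.getsize(path))


if __name__ == "__main__":
    for megabytes in (5, 50, 400):
        print(f"{megabytes} MB fits:", plans_that_fit(megabytes * BYTES_PER_MB))
```

An empty result means the file is larger than either cap listed here, so it would need to be compressed or trimmed before uploading.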

Pricing

Convert your audio to text

Free to start. Upload any audio file and get an accurate, timestamped transcript — searchable, editable, and one click from notes, flashcards, or a quiz.

Save 60% with annual

Free

$0/month
  • 3 AI Chat messages per day
  • 3 AI creations per day
  • 1 file upload per day (8MB)
  • 5 quiz questions per day
  • 1 exam attempt per day
  • 15 voice minutes per day
  • 32-page PDF to flashcards
  • 500 autocomplete words per day

Use it to generate flashcards, improve a deck, make a podcast, create a video lecture or infographic, build slides, make a mind map or study guide, or process a recording.

Most Popular

Ultimate

$12/month

$144 billed yearly

Everything in Free, plus:

  • Unlimited normal chat & autocomplete
  • Unlimited premium model messages
  • Unlimited AI creations
  • Unlimited file uploads (up to 300MB)
  • Unlimited study sessions
  • Unlimited exams & quizzes
  • 1000-page PDF to flashcards
  • Export to Anki
  • Priority support

Pricing in USD. Local currency available in app.

Compare plans

Feature              Free              Ultimate
Normal chat          3/day             Unlimited
Premium chat         -                 Unlimited
AI creations         3/day total       Unlimited
Video lectures       Uses AI creations Unlimited
File uploads         1/day (8MB)       Unlimited (300MB)
PDF to flashcards    32 pages          1000 pages
Practice questions   5/day             Unlimited
Practice exams       1/day             Unlimited
Voice mode           15 min/day        1 hr/day
Autocomplete         500 words/day     Unlimited
Export to Anki       -                 Included
Support              Standard          Priority

What students say

Scholarly has been a valuable tool for my studies. The AI-generated flashcards and intuitive features make organizing and retaining information much easier.

Briana

Student

This app is great for studying for a big test. Drop your PDFs in the system and it'll do the trick. You can organize it specifically for your needs.

Kelvin

Student

I am currently preparing for a test that covers a substantial amount of material, and I've found that not having to physically write out my flashcards has been incredibly beneficia...

Isabelle

Student

Scholarly is great for students. I am enrolled in online university and my classes are all PDF based. All I do is upload the PDF and it creates flashcard decks for me. The greate...

Alexandra

Student

Your questions, answered

Is Scholarly free to use?

Yes! The free plan includes core study tools with daily limits: AI Chat messages, 3 AI creations per day, research reports, file uploads, quizzes, practice exams, and manual flashcard creation. Upgrade to Ultimate when you want unlimited AI creations and higher limits.

What uses my daily AI creation?

Generating flashcards, improving a flashcard deck, making a podcast, creating a video lecture or infographic, building slides, making a mind map or study guide, or processing a recording each use the same daily free AI creation allowance. AI Chat messages, uploads, quizzes, and exams have their own separate daily limits.

Can I cancel anytime?

Absolutely. There are no contracts or commitments. You can cancel your subscription at any time from your account settings, and you'll keep access until the end of your billing period.

What payment methods do you accept?

We accept all major credit and debit cards through Stripe. Pricing is displayed in USD by default, but local currency is available in the app.

Do you offer discounts for educators?

Yes, we offer special pricing for educators and educational institutions. Contact us at hello@scholarly.so for details.

What happens when I hit a free plan limit?

You'll see a prompt to upgrade. Your existing work is never lost — limits only apply to new daily actions like AI Chat messages, uploads, quiz questions, and new AI creations. Limits reset every day.

For Educators or Schools

Contact us for special pricing at hello@scholarly.so.