Questions about apps using voice recognition APIs

Asked 2 years ago, Updated 2 years ago, 136 views

I'm a beginner in programming.

I'm currently trying to create an app using the Google Cloud Speech API.

I want to use this API to display the words the user has pronounced.
If there are multiple candidates (for example, when a user says "save" but the pronunciation is unclear, so the recognizer cannot tell whether it was "save" or a similar-sounding word), I'd like to display all of the candidates as the result.

For example, Google Translate displays only one of the candidates.
However, an app called "Pronunciation Check" (which uses a Google API, probably the Speech API) shows both candidates when the pronunciation is bad.

I would like to display results the same way Pronunciation Check does.

How can I get multiple candidates returned as the result, as Pronunciation Check does?

java api google-cloud

2022-09-30 20:16

1 Answer

I'm not sure whether this is the same Google Cloud Speech API you asked about, but the following page provides an overview, so please refer to it.

Basic Cloud Speech-to-Text

Synchronous Speech Recognition Requests

maxAlternatives - (Optional, default 1) Indicates the number of alternative transcriptions to provide in the response. By default, the Speech-to-Text API provides only the single most likely transcription. To get other candidates, set maxAlternatives to a value greater than 1. Speech-to-Text returns only alternatives that it considers to be of sufficient quality. In general, alternatives are best suited to real-time requests that need user feedback (such as voice commands), so they are typically used with streaming recognition requests.
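As a rough illustration of the setting described above, here is a minimal sketch that builds the JSON body for a synchronous REST request (the v1 `speech:recognize` method) using only the Java standard library. Note that the field is spelled `maxAlternatives` in the API. The helper name `buildRequestBody`, the `LINEAR16` encoding, the 16000 Hz sample rate, and the `en-US` language code are assumptions made for this example, so adjust them to your own audio. If you use the official Java client library instead, the same setting should be available as `setMaxAlternatives` on the `RecognitionConfig` builder.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

/** Sketch: builds the JSON body for a synchronous recognize request. */
final class RecognizeRequestSketch {

    // Hypothetical helper. Encoding, sample rate, and language code are
    // illustrative assumptions; match them to the audio you actually send.
    static String buildRequestBody(byte[] audioBytes, int maxAlternatives) {
        // The REST API expects the audio bytes as a base64 string.
        String audioContent = Base64.getEncoder().encodeToString(audioBytes);
        return "{"
            + "\"config\":{"
            + "\"encoding\":\"LINEAR16\","
            + "\"sampleRateHertz\":16000,"
            + "\"languageCode\":\"en-US\","
            + "\"maxAlternatives\":" + maxAlternatives
            + "},"
            + "\"audio\":{\"content\":\"" + audioContent + "\"}"
            + "}";
    }

    public static void main(String[] args) {
        // Placeholder bytes, not real audio; this only shows the request shape.
        byte[] placeholder = "not real audio".getBytes(StandardCharsets.UTF_8);
        System.out.println(buildRequestBody(placeholder, 5));
    }
}
```

Sending this body still requires authentication and an HTTP client, which are left out here to keep the focus on where `maxAlternatives` goes.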

Speech-to-Text API Response

alternatives contains a list of candidate transcriptions of type SpeechRecognitionAlternative. Whether more than one candidate appears depends on whether you requested more than one (by setting maxAlternatives to a value greater than 1) and on whether Speech-to-Text produced candidates of sufficiently high quality. Each candidate consists of the following fields:

  • transcript contains the transcribed text. See the section on handling transcriptions on the linked page.
  • confidence contains a value from 0 to 1 that indicates how reliable Speech-to-Text considers that particular transcription. See the section on interpreting confidence values on the linked page.
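To make the two fields concrete, here is a small sketch of how one candidate might be turned into display text. The `Alternative` class is a hypothetical stand-in for the library's response type, not the real class, and `describe` is a name invented for this example. It also assumes that a confidence of 0.0 means "not set" (as far as I know, the API only fills in confidence for some candidates), so in that case only the transcript is shown.

```java
import java.util.Locale;

/** Sketch: formats one recognition candidate for display. */
final class AlternativeDisplaySketch {

    /** Hypothetical stand-in holding the two fields described above. */
    static final class Alternative {
        final String transcript;
        final double confidence;

        Alternative(String transcript, double confidence) {
            this.transcript = transcript;
            this.confidence = confidence;
        }
    }

    // Assumption: a confidence of 0.0 means the value was not set,
    // so it is hidden rather than shown as "0.00".
    static String describe(Alternative alt) {
        if (alt.confidence > 0.0) {
            return String.format(Locale.ROOT, "%s (confidence %.2f)",
                    alt.transcript, alt.confidence);
        }
        return alt.transcript;
    }

    public static void main(String[] args) {
        // Made-up sample data, not real API output.
        System.out.println(describe(new Alternative("save", 0.87)));
        System.out.println(describe(new Alternative("safe", 0.0)));
    }
}
```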

Select Alternatives

Each result in a successful synchronous recognition response may contain one or more alternatives (if the maxAlternatives value of the request is greater than 1). If Speech-to-Text determines that a candidate has a sufficiently high confidence value, that candidate is included in the response. The first alternative in a response is usually the best (most likely) candidate.

Setting maxAlternatives to a value greater than 1 does not guarantee that multiple candidates are returned. In general, multiple candidates are best suited to offering real-time options to users who get their results through streaming recognition requests.
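Because more than one candidate is not guaranteed, the display code should cope with zero, one, or several transcripts. The following is a minimal sketch under that assumption; `displayText`, the " / " separator, and the placeholder message are choices made for this example, and the input is assumed to be the transcripts already extracted from the response in the order the API returned them.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Sketch: turns a list of candidate transcripts into one line of display text. */
final class CandidateListSketch {

    // Keeps the order given by the caller (most likely candidate first).
    static String displayText(List<String> transcripts) {
        if (transcripts.isEmpty()) {
            return "(no speech recognized)";
        }
        // With a single candidate, join simply returns that candidate.
        return String.join(" / ", transcripts);
    }

    public static void main(String[] args) {
        // Made-up sample data, not real API output.
        System.out.println(displayText(Arrays.asList("save", "safe")));
        System.out.println(displayText(Arrays.asList("save")));
        System.out.println(displayText(Collections.<String>emptyList()));
    }
}
```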

Asynchronous Requests and Responses

Asynchronous Speech-to-Text API requests to the LongRunningRecognize method have the same format as synchronous Speech-to-Text API requests.

The following is the complete response after the request has finished:

(omitted)

The result is of the same type as the one returned by a synchronous Speech-to-Text API recognition request.

Streaming Speech-to-Text API Recognition Requests

config - (Required) Contains configuration information for the audio, of type RecognitionConfig. This is the same as what you specify in synchronous or asynchronous requests.

Streaming Response

alternatives contains a list of candidate transcriptions.


2022-09-30 20:16



© 2024 OneMinuteCode. All rights reserved.