I'm a beginner in programming.
I'm currently trying to build an app with the Google Cloud Speech API.
The app uses this API to display the words the user has pronounced.
If there are multiple candidates (for example, when a user says save but pronounces it poorly, so it is unclear whether it was save or save), I'd like to display both save and save as results.
For example, Google Translate displays only one of them, either save or save.
However, an app called "Pronunciation Check" (which uses a Google API, probably the Speech API) shows both save and save when the pronunciation is poor.
I would like to display results the same way Pronunciation Check does.
How can I get multiple candidates returned as results, the way Pronunciation Check does?
java api google-cloud
I'm not sure whether this is the same Google Cloud Speech API you are asking about, but the following page gives an overview, so please refer to it.
Synchronous Speech Recognition Requests
maxAlternatives
- (Optional, default 1) The number of alternative transcriptions to return in the response. By default, the Speech-to-Text API returns one alternative, the most likely transcription. To receive other alternatives, set maxAlternatives to a value greater than 1. Speech-to-Text returns only alternatives that it judges to be of sufficient quality. Alternatives are generally most appropriate for real-time requests that need user feedback (such as voice commands), so they are best suited to streaming recognition requests.
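The maxAlternatives rule can be summarized in code. This is a minimal sketch, not part of any library: MaxAlternatives and upperBound are hypothetical names, and nothing here calls the API. It encodes what the API reference states for the max_alternatives field of RecognitionConfig: valid values are 0 to 30, both 0 (the value when the field is omitted) and 1 mean at most one alternative, and the server may return fewer than requested. In the Java client library (google-cloud-speech) the value itself is set on the request with RecognitionConfig.newBuilder().setMaxAlternatives(...).

```java
// Hypothetical helper that only illustrates the documented meaning of
// maxAlternatives; it does not build a request or call the API.
final class MaxAlternatives {
    private MaxAlternatives() {}

    /**
     * Returns the largest number of alternatives a single result may contain
     * for the given maxAlternatives value. The server may still return fewer.
     */
    static int upperBound(int maxAlternatives) {
        // The API reference documents 0-30 as the valid range.
        if (maxAlternatives < 0 || maxAlternatives > 30) {
            throw new IllegalArgumentException(
                "maxAlternatives must be between 0 and 30: " + maxAlternatives);
        }
        // 0 (field omitted) and 1 both mean "at most one alternative".
        return Math.max(1, maxAlternatives);
    }
}
```

So to get two candidates such as save and save, request at least 2; whether the second one actually comes back is still up to the service.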
Speech-to-Text API Response
alternatives
contains a list of transcription alternatives of type SpeechRecognitionAlternative. Whether a response contains more than one alternative depends on whether you requested multiple alternatives (by setting maxAlternatives to a value greater than 1) and whether Speech-to-Text generated alternatives of sufficiently high quality. Each alternative consists of the following fields:
transcript
contains the transcribed text. See the section on transcript processing below.
confidence
contains a value between 0 and 1 indicating how reliable Speech-to-Text considers this particular transcription. See the section on interpreting confidence below.
Selecting Alternatives
Each result in a successful synchronous recognition response may contain one or more alternatives (if the request's maxAlternatives value is greater than 1). An alternative is included in the response only if Speech-to-Text judges its confidence to be sufficiently high. The first alternative in a response is usually the best (most likely) one.
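Reading every alternative, not just the first, is what makes it possible to show several candidates to the user. The sketch below uses simplified stand-in records (CandidateList.Alternative and CandidateList.Result, hypothetical names) that mirror the transcript and confidence fields described above, so it runs without the client library. With the real Java client, the same data comes from SpeechRecognitionResult.getAlternativesList(), SpeechRecognitionAlternative.getTranscript() and getConfidence().

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for the response types. In the real Java client these
// correspond to SpeechRecognitionResult and SpeechRecognitionAlternative.
final class CandidateList {
    private CandidateList() {}

    /** Mirrors SpeechRecognitionAlternative: transcript text plus a 0-1 confidence. */
    record Alternative(String transcript, float confidence) {}

    /** Mirrors SpeechRecognitionResult: the alternatives for one stretch of audio. */
    record Result(List<Alternative> alternatives) {}

    /** Returns the first (usually most likely) transcript, or "" if there is none. */
    static String best(Result result) {
        List<Alternative> alternatives = result.alternatives();
        return alternatives.isEmpty() ? "" : alternatives.get(0).transcript();
    }

    /** Returns every transcript of the result, in the order the API returned them. */
    static List<String> all(Result result) {
        List<String> transcripts = new ArrayList<>();
        for (Alternative alternative : result.alternatives()) {
            transcripts.add(alternative.transcript());
        }
        return transcripts;
    }
}
```

Calling all(...) on each result gives the candidates to display side by side, while best(...) gives only the single most likely transcription.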
Setting maxAlternatives to a value greater than 1 does not guarantee that multiple alternatives will be returned. In general, multiple alternatives are most useful for offering real-time options to users who receive results through streaming recognition requests.
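Because the number of returned alternatives is not guaranteed, display code should work whether it receives one alternative or several. This is a minimal sketch with a hypothetical display rule (DisplayCandidates and forDisplay are illustrative names): keep the alternatives in the order the API returned them, drop blank ones and ones that differ only in case or surrounding whitespace, and cap the list at a limit. Order is used rather than confidence because, per the API reference, the confidence field is set only for the top alternative (of a non-streaming result, or of a final streaming result), and 0.0 means it was not set, so it cannot rank the remaining alternatives.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

final class DisplayCandidates {
    private DisplayCandidates() {}

    /** Hypothetical stand-in for SpeechRecognitionAlternative. */
    record Alternative(String transcript, float confidence) {}

    /**
     * Hypothetical display rule: returns at most `limit` transcripts in API order,
     * skipping blank ones and ones that repeat an earlier transcript when compared
     * case-insensitively after trimming whitespace.
     */
    static List<String> forDisplay(List<Alternative> alternatives, int limit) {
        Set<String> seen = new HashSet<>();
        List<String> shown = new ArrayList<>();
        for (Alternative alternative : alternatives) {
            if (shown.size() >= limit) {
                break; // enough candidates collected
            }
            String text = alternative.transcript().trim();
            String key = text.toLowerCase(Locale.ROOT);
            if (text.isEmpty() || !seen.add(key)) {
                continue; // blank or duplicate of an earlier candidate
            }
            shown.add(text);
        }
        return shown;
    }
}
```

With a single alternative this simply shows one word; with two distinct alternatives it shows both, which is the behavior described in the question.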
Asynchronous Requests and Responses
An asynchronous Speech-to-Text API request to the LongRunningRecognize method has the same form as a synchronous Speech-to-Text API request.
Once the request has completed, the response is as follows: ~ Abbreviated ~ This type is the same as the one returned by a synchronous Speech-to-Text API recognition request.
Streaming Speech-to-Text API Recognition Requests
config
- (Required) Contains configuration information for the audio, of type RecognitionConfig. This is the same as what you specify in synchronous and asynchronous requests.
Streaming Responses
alternatives
contains a list of alternative transcriptions.
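In a streaming response each result also carries an is_final flag (getIsFinal() in the Java client). When interim results are enabled, results with is_final set to false may still change, so a simple approach is to show candidates only from final results. The record StreamingResult and the method finalCandidates are hypothetical stand-ins for StreamingRecognitionResult and the transcripts of its alternatives, used to keep the sketch self-contained.

```java
import java.util.ArrayList;
import java.util.List;

final class StreamingCandidates {
    private StreamingCandidates() {}

    /**
     * Hypothetical stand-in for StreamingRecognitionResult: isFinal mirrors
     * getIsFinal(), transcripts holds getTranscript() of each alternative.
     */
    record StreamingResult(boolean isFinal, List<String> transcripts) {}

    /**
     * Collects the candidate lists of final results only; interim results
     * (isFinal == false) may still change, so they are skipped.
     */
    static List<List<String>> finalCandidates(List<StreamingResult> results) {
        List<List<String>> candidates = new ArrayList<>();
        for (StreamingResult result : results) {
            if (result.isFinal()) {
                candidates.add(List.copyOf(result.transcripts()));
            }
        }
        return candidates;
    }
}
```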