I'd like to extract features from microphone input in real time.
I am streaming audio with pyaudio, but libraries such as HTK and torchaudio only seem to extract features from a wav file that has already been loaded.
Is there a way to extract features without going through a wav file?
import pyaudio
P = pyaudio.PyAudio()
RATE = 44100
CHUNK = 1024
stream = P.open(format=pyaudio.paInt16, channels=1, rate=RATE, frames_per_buffer=CHUNK, input=True, output=False)
while stream.is_active():
    # read one buffer of raw 16-bit PCM bytes (renamed from "input" to avoid shadowing the builtin)
    data = stream.read(CHUNK)
    # handling of filterbanks
torchaudio.io.StreamReader, added in torchaudio v0.12, lets you read microphone input directly into a torch.Tensor. You will need a compatible FFmpeg library (if you are using conda, you can install it with conda install 'ffmpeg<4.4').
Reference:
https://pytorch.org/audio/stable/tutorials/device_asr.html
The following is an example for macOS:
import torchaudio
# Initialize the StreamReader
streamer = torchaudio.io.StreamReader(
    src=":default",  # use the default audio input device
    format="avfoundation",  # device driver
)
# Configure the audio input
streamer.add_basic_audio_stream(
    frames_per_chunk=8000,  # return 8000 frames at a time
    sample_rate=8000,  # resample to 8 kHz
)
# Stream
#
# timeout is how long to wait for the audio device to produce enough data.
# -1 waits until data is ready. Unit: seconds
#
# backoff is the interval between retries within the allowed wait time. Unit: seconds
for (audio_chunk,) in streamer.stream(timeout=-1, backoff=1.0):
    # audio_chunk is a torch.Tensor holding 8000 frames
    pass
The device driver passed to the format argument depends on the OS and on the type of FFmpeg library, but "avfoundation" is standard on macOS and "dshow" on Windows.
The devices each driver can handle can be listed with the ffmpeg command.
$ ffmpeg -f avfoundation -list_devices true -i dummy
...
[AVFoundation indev @ 0x126e049d0] AVFoundation video devices:
[AVFoundation indev @ 0x126e049d0] [0] FaceTime HD Camera
[AVFoundation indev @ 0x126e049d0] [1] Capture screen 0
[AVFoundation indev @ 0x126e049d0] AVFoundation audio devices:
[AVFoundation indev @ 0x126e049d0] [0] ZoomAudioDevice
[AVFoundation indev @ 0x126e049d0] [1] MacBook Pro Microphone
Timing matters when capturing data from a microphone, so you should start a subprocess and keep the streaming for loop running there without interruption.
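One way to arrange this is with the standard multiprocessing module. This is a rough sketch, not the only possible layout. The function names capture, consume and main are hypothetical, the None sentinel is an assumed convention for signalling the end of the stream, and the StreamReader settings simply repeat the macOS example.

```python
import multiprocessing as mp


def capture(queue, src=":default", fmt="avfoundation"):
    """Child process: keep reading the microphone and forward each chunk."""
    try:
        import torchaudio  # imported lazily, inside the capture process

        streamer = torchaudio.io.StreamReader(src=src, format=fmt)
        streamer.add_basic_audio_stream(frames_per_chunk=8000, sample_rate=8000)
        for (audio_chunk,) in streamer.stream(timeout=-1, backoff=1.0):
            queue.put(audio_chunk.numpy())  # send plain NumPy arrays
    finally:
        queue.put(None)  # assumed sentinel: the stream ended or failed


def consume(queue, handle):
    """Parent process: pass each chunk to `handle` until the sentinel arrives."""
    while True:
        chunk = queue.get()
        if chunk is None:
            break
        handle(chunk)


def main():
    ctx = mp.get_context("spawn")
    queue = ctx.Queue()
    proc = ctx.Process(target=capture, args=(queue,), daemon=True)
    proc.start()
    try:
        # Replace the lambda with the actual feature extraction.
        consume(queue, lambda chunk: print(chunk.shape))
    finally:
        proc.terminate()
        proc.join()
```

Call main() from under an if __name__ == "__main__": guard, which the "spawn" start method requires. With this layout, slow feature extraction in the parent only makes the queue grow and does not stall the read loop in the child.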