I want to extract features from microphone input in real time.

I would like to extract features from microphone input in real time.

I am capturing the stream with PyAudio, but libraries such as HTK and torchaudio only seem to let me extract features from a wav file that has already been loaded.

Is there a way to extract features without going through a wav file?

import pyaudio

RATE = 44100  # sampling rate in Hz
CHUNK = 1024  # frames read per call

p = pyaudio.PyAudio()
stream = p.open(format=pyaudio.paInt16, channels=1, rate=RATE,
                frames_per_buffer=CHUNK, input=True, output=False)
while stream.is_active():
    data = stream.read(CHUNK)  # raw bytes holding CHUNK 16-bit mono samples
    # handling of filterbanks

python

2022-09-30 19:56

1 Answer

torchaudio.io.StreamReader, added in torchaudio v0.12, lets you read input directly from the microphone into a torch.Tensor. You will need a compatible FFmpeg library (if you are using conda, you can install it with conda install 'ffmpeg<4.4').

Reference:
https://pytorch.org/audio/stable/tutorials/device_asr.html

The following is an example for macOS:

import torchaudio

# Initialize the StreamReader
streamer = torchaudio.io.StreamReader(
    src=":default",         # use the default audio input device
    format="avfoundation",  # device driver
)

# Configure the audio input
streamer.add_basic_audio_stream(
    frames_per_chunk=8000,  # return 8000 frames at a time
    sample_rate=8000,       # resample to 8 kHz
)

# Stream
#
# timeout is how long to wait for the audio device to produce enough data.
# A negative value such as -1 keeps waiting until data is ready. Unit: milliseconds.
#
# backoff is the interval between retries within the allowed wait time. Unit: milliseconds.
for (audio_chunk,) in streamer.stream(timeout=-1, backoff=1.0):
    # audio_chunk is a torch.Tensor holding 8000 frames
    pass

The device driver passed to the format argument depends on the OS and on how the FFmpeg library was built, but "avfoundation" is standard on macOS and "dshow" on Windows.

You can list the devices each driver can handle with the ffmpeg command.
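
If the same script has to run on more than one OS, the driver name can be chosen at runtime. A minimal sketch: guess_input_format is a hypothetical helper, "avfoundation" and "dshow" are the values mentioned above, and "alsa" for Linux is my assumption that you should check against your FFmpeg build. The src string is driver specific as well, so this only covers the format argument.

```python
import sys

# "alsa" for Linux is an assumption; the other two come from the answer above.
_FORMAT_BY_PLATFORM = {
    "darwin": "avfoundation",  # macOS
    "win32": "dshow",          # Windows
    "linux": "alsa",           # Linux (assumed)
}


def guess_input_format(platform: str = sys.platform) -> str:
    """Return a likely FFmpeg device driver name for the given platform."""
    for prefix, driver in _FORMAT_BY_PLATFORM.items():
        if platform.startswith(prefix):
            return driver
    raise ValueError(f"no known audio input driver for platform {platform!r}")


# Usage: pass the result as format=... when creating the StreamReader.
# guess_input_format("darwin") returns "avfoundation"
```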

$ ffmpeg -f avfoundation -list_devices true -i dummy
...
[AVFoundation indev @ 0x126e049d0] AVFoundation video devices:
[AVFoundation indev @ 0x126e049d0] [0] FaceTime HD Camera
[AVFoundation indev @ 0x126e049d0] [1] Capture screen 0
[AVFoundation indev @ 0x126e049d0] AVFoundation audio devices:
[AVFoundation indev @ 0x126e049d0] [0] ZoomAudioDevice
[AVFoundation indev @ 0x126e049d0] [1] MacBook Pro Microphone
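
If you want to pick a device from a script instead of reading the listing by eye, the output can be parsed. This is a rough sketch for the avfoundation listing format shown above only; parse_audio_devices is a hypothetical helper and the sample text is copied from that listing. With avfoundation the index can then be used in src, for example src=":1" for audio device 1.

```python
import re

# Matches the "[index] name" part at the end of a device line.
_DEVICE_LINE = re.compile(r"\[(\d+)\]\s+(.+?)\s*$")


def parse_audio_devices(listing: str) -> list:
    """Return (index, name) pairs from the audio section of the listing."""
    devices = []
    in_audio_section = False
    for line in listing.splitlines():
        if "AVFoundation audio devices:" in line:
            in_audio_section = True
            continue
        if "AVFoundation video devices:" in line:
            in_audio_section = False
            continue
        if in_audio_section:
            match = _DEVICE_LINE.search(line)
            if match:
                devices.append((int(match.group(1)), match.group(2)))
    return devices


SAMPLE = """\
[AVFoundation indev @ 0x126e049d0] AVFoundation video devices:
[AVFoundation indev @ 0x126e049d0] [0] FaceTime HD Camera
[AVFoundation indev @ 0x126e049d0] [1] Capture screen 0
[AVFoundation indev @ 0x126e049d0] AVFoundation audio devices:
[AVFoundation indev @ 0x126e049d0] [0] ZoomAudioDevice
[AVFoundation indev @ 0x126e049d0] [1] MacBook Pro Microphone
"""

print(parse_audio_devices(SAMPLE))
# prints [(0, 'ZoomAudioDevice'), (1, 'MacBook Pro Microphone')]
```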

Timing matters when capturing data from a microphone, so it is best to start a subprocess and keep the streaming for loop running there without interruption.
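
A rough sketch of that arrangement using Python's standard multiprocessing module. fake_chunks is a hypothetical stand-in for the streamer.stream(...) loop so the example runs without a microphone; in real use you would create the StreamReader inside capture and put each audio_chunk on the queue. The function names and the None sentinel are my own choices.

```python
import multiprocessing as mp


def fake_chunks(n_chunks, frames=8000):
    """Hypothetical stand-in for streamer.stream(...): yields silent chunks."""
    for _ in range(n_chunks):
        yield [0.0] * frames


def capture(queue, n_chunks):
    """Runs in the subprocess: keep reading and hand each chunk off quickly."""
    for chunk in fake_chunks(n_chunks):
        queue.put(chunk)
    queue.put(None)  # sentinel: no more data


def consume(queue):
    """Runs in the main process: slower work such as feature extraction."""
    lengths = []
    while True:
        chunk = queue.get()
        if chunk is None:
            break
        lengths.append(len(chunk))  # replace with real feature extraction
    return lengths


if __name__ == "__main__":
    queue = mp.Queue()
    worker = mp.Process(target=capture, args=(queue, 3))
    worker.start()
    print(consume(queue))  # prints [8000, 8000, 8000]
    worker.join()
```

The point of the split is that the capture loop never waits on feature extraction, so the audio device buffer is drained on time even when processing a chunk is slow.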

2022-09-30 19:56
