I would like to automatically generate IDs that uniquely identify music files.
The files are MP3s with ID3v2.3 or ID3v2.4 tags.
I want such an ID so that I can create a database keyed on it and build my own music management system that doesn't depend on ID3 tags at all.
If the question is missing any information, please let me know and I will add it.
*Note: the questioner does not know how music ripped from a CD is matched against a database.
mp3
Here is how to calculate the MD5 of only the MP3 audio frames, for example in a bash shell script. I am using the head and tail commands (with the -c option) included in GNU coreutils.
mp3="hoge.mp3"
len=0
for offset in {1..4}
do
    # Calculate Syncsafe Integer (bytes 7-10 of the file, 7 bits each)
    len=$((len + (
        $(cat "$mp3" | head -c $((6 + offset)) | tail -c 1 | od -An -td1)
        << (7 * (4 - offset))
    )))
done
# 10 bytes: size of ID3v2 header, 1 byte: offset of "tail -c +n"
tail -c +$((len + 11)) "$mp3" | md5sum
0ea78f4e6687ac5fdbb979cc06c5c34a -
# Strip all ID3v2 tags
mid3v2 -D "$mp3"
# Calculate MD5 again
md5sum "$mp3"
0ea78f4e6687ac5fdbb979cc06c5c34a hoge.mp3
The ID3v2 extended header and an ID3v1 tag that might be at the end of the file are not considered.
I think it would be better to implement it in a language (such as Python) that actually has a package that handles MP3 and ID3 tags.
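As a rough illustration of that idea, here is a minimal Python sketch that uses only the standard library. The function name `audio_md5` is hypothetical. It assumes at most one ID3v2 tag at the start of the file and at most one ID3v1 tag at the end, and it reads the whole file into memory. Unlike the shell version, it also skips an ID3v2.4 footer and a trailing ID3v1 tag. It does not handle other tag formats such as APE.

```python
import hashlib


def audio_md5(path):
    """MD5 of an MP3 file's bytes with leading ID3v2 and trailing ID3v1 tags skipped."""
    with open(path, "rb") as f:
        data = f.read()

    start = 0
    # ID3v2 header: "ID3", 2 version bytes, 1 flags byte, 4 syncsafe size bytes
    if len(data) >= 10 and data[:3] == b"ID3":
        size = 0
        for b in data[6:10]:
            size = (size << 7) | (b & 0x7F)  # 7 useful bits per byte
        start = 10 + size
        # ID3v2.4 may append a 10-byte footer, signalled by flag bit 0x10
        if data[3] == 4 and data[5] & 0x10:
            start += 10

    end = len(data)
    # ID3v1 is a fixed 128-byte block at the end of the file starting with "TAG"
    if end - start >= 128 and data[end - 128:end - 125] == b"TAG":
        end -= 128

    return hashlib.md5(data[start:end]).hexdigest()
```

Calling `audio_md5("hoge.mp3")` before and after editing the tags should give the same digest, as long as the tagging tool does not touch the audio frames themselves.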
If you just want to extract the waveform data from a WAV file and hash it, you can do that in Python.
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import wave, hashlib
wr = wave.open('tmp.wav', 'r')
# read every frame (raw PCM bytes, without the WAV header)
data = wr.readframes(wr.getnframes())
wr.close()
hash_data = hashlib.sha256(data).hexdigest()
That gives you the hash. (I don't feel like this alone is going to work well, though.) Using ffmpeg, you can first convert the original mp3 file...
$ ffmpeg -i in.mp3 -ac 1 -ar 8000 -acodec pcm_s16le tmp.wav
If you run the script above after roughly degrading the audio like this (mono, frame rate 8000, sample width 2, i.e. 16-bit little-endian), you may be able to create a slightly better ID.
If you want to go further,
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import wave, struct, hashlib
wr = wave.open('tmp.wav', 'r')
# debug..
# print("channels:", wr.getnchannels())
# print("sampwidth:", wr.getsampwidth())
# print("framerate:", wr.getframerate())
# print("frame num:", wr.getnframes())
# print("params:", wr.getparams())
# print("sec:", float(wr.getnframes()) / wr.getframerate())
tmp = []
length = wr.getnframes()
for i in range(0, length):
    data = wr.readframes(1)
    # samples are signed 16-bit little-endian (pcm_s16le), so use "<h"
    nn = struct.unpack("<h", data[0:2])[0]
    # coarsen the amplitude drastically
    nn = abs(nn) // 32
    # ignore very small waveforms
    if nn > 8:
        tmp.append(nn)
wr.close()
hash_data = hashlib.sha256(str(tmp).encode("utf-8")).hexdigest()
Something like this might work. (It's very rough...)
You may get somewhat better results by experimenting with the coarsening step, the values printed in the debug lines, and so on.
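To experiment with the coarsening without a WAV file on disk, the same idea can be sketched as a function over raw sample bytes. This is only an illustration: the name `coarse_hash` and the parameters `step` and `floor` are hypothetical, and it assumes mono, signed 16-bit little-endian samples (what `pcm_s16le` produces). Taking the absolute value before quantizing is my assumption about the intent of "ignore small waveforms".

```python
import hashlib
import struct


def coarse_hash(pcm, step=32, floor=8):
    """SHA-256 over coarsely quantized 16-bit little-endian mono samples."""
    usable = len(pcm) - len(pcm) % 2  # drop a trailing odd byte, if any
    levels = []
    for (sample,) in struct.iter_unpack("<h", pcm[:usable]):
        level = abs(sample) // step  # coarsen the amplitude
        if level > floor:            # skip near-silent samples
            levels.append(level)
    return hashlib.sha256(str(levels).encode("ascii")).hexdigest()


# Two slightly different sample sequences that fall into the same buckets
a = struct.pack("<4h", 1000, -2000, 5, 3000)
b = struct.pack("<4h", 1001, -2003, -7, 3005)
print(coarse_hash(a) == coarse_hash(b))
```

Note that samples close to a bucket boundary can still land in different buckets after re-encoding, so this makes the ID more tolerant of small differences but does not make it robust.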
"I want to build my own music management system that doesn't depend on ID3 tags at all"
Do you mean building something like Gracenote? If so, Gracenote itself offers an API and SDK, so you can get a unique ID (GN_ID) for a song from its ID3 tags and audio file.
However, only major-label releases (that is, not indie) are registered. Also, it is only available for non-commercial use.
If you read the size of the ID3 tag from its header and compute the MD5 of the rest of the file, the result will not be affected by the ID3 tag.
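For reference, the tag size is stored in bytes 7 to 10 of the ID3v2 header as a "syncsafe" integer, where each byte contributes only its lower 7 bits. A small worked example, with a hypothetical helper name `syncsafe_to_int` and a made-up header:

```python
def syncsafe_to_int(four_bytes):
    """Decode a 4-byte ID3v2 syncsafe integer (7 useful bits per byte)."""
    size = 0
    for b in four_bytes:
        size = (size << 7) | (b & 0x7F)
    return size


# Made-up ID3v2.3 header: "ID3", version 3.0, no flags, size bytes 00 00 02 01
header = b"ID3\x03\x00\x00\x00\x00\x02\x01"
tag_size = syncsafe_to_int(header[6:10])  # 2 * 128 + 1 = 257
audio_offset = 10 + tag_size              # 10-byte header + tag body
print(tag_size, audio_offset)             # 257 267
```

Everything from `audio_offset` onward is what you would feed to MD5.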
© 2024 OneMinuteCode. All rights reserved.