What should I use to generate IDs for music files?

I would like to automatically generate IDs that uniquely identify music files.

The files are MP3s with ID3v2.3 or ID3v2.4 tags.

I want an ID because I plan to build a database keyed on the generated ID, and on top of it my own music management system that doesn't depend on ID3 tags at all.

I only need the ID to identify a file; I want to manage metadata such as song titles myself.
I want the ID to be calculated automatically from the music file.
I don't want the ID to be affected by ID3 tag updates (so I can't simply use the MD5 of the entire file).
If possible, I would be happy if the same ID were calculated even after the original audio has been lossily compressed (although I don't know whether that is technically possible).

If the question is missing information, please let me know and I will add it.

*The questioner does not know how music ripped from a CD is matched against a database.

mp3

2022-09-30 20:30

4 Answers

Here is one way to calculate the MD5 of only the MP3 audio data frames, as a bash shell script. It uses the head and tail commands (with the -c option) from GNU coreutils.

mp3="hoge.mp3"
len=0
for offset in {1..4}
do
  # Decode the syncsafe integer: 4 bytes, 7 significant bits each
  len=$((len +
         ($(head -c $((6 + offset)) "$mp3" | tail -c 1 | od -An -td1)
           << (7 * (4 - offset)))
       ))
done

# 10 bytes: size of the ID3v2 header, 1 byte: offset for "tail -c +n"
tail -c +$((len + 11)) "$mp3" | md5sum
0ea78f4e6687ac5fdbb979cc06c5c34a -

# Strip all ID3v2 tags
mid3v2 -D "$mp3"

# Calculate MD5 again
md5sum "$mp3"
0ea78f4e6687ac5fdbb979cc06c5c34a hoge.mp3

This does not take into account the ID3v2 extended header or an ID3v1 tag that might be at the end of the file.
I think it would be better to implement this in a language (such as Python) that has a package for handling MP3 files and ID3 tags.
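As a rough illustration of the same idea in Python, here is a minimal sketch that uses only the standard library rather than a dedicated ID3 package. The function name audio_md5 is made up for this example. It assumes the file has at most one ID3v2 tag at the start and at most one ID3v1 tag at the end, and it does not handle other trailing tags such as APEv2. As I understand the ID3v2 specification, the size field already covers the extended header, so only the optional v2.4 footer needs extra handling.

```python
import hashlib

def audio_md5(data: bytes) -> str:
    """MD5 of an MP3's bytes, skipping a leading ID3v2 tag and a trailing ID3v1 tag."""
    start, end = 0, len(data)

    # ID3v2 header: "ID3", 2 version bytes, 1 flags byte, 4 syncsafe size bytes
    if len(data) >= 10 and data[:3] == b"ID3":
        size = 0
        for b in data[6:10]:
            # each size byte contributes its low 7 bits
            size = (size << 7) | (b & 0x7F)
        start = 10 + size
        # ID3v2.4 may append a 10-byte footer, signalled by flag bit 0x10
        if data[5] & 0x10:
            start += 10

    # ID3v1 tag: the last 128 bytes of the file, beginning with "TAG"
    if end - start >= 128 and data[end - 128:end - 125] == b"TAG":
        end -= 128

    return hashlib.md5(data[start:end]).hexdigest()

# Usage (hoge.mp3 is the example file name from the script above):
#   with open("hoge.mp3", "rb") as f:
#       print(audio_md5(f.read()))
```

This reads the whole file into memory, which is fine for a typical MP3; for a large library it may be worth hashing in chunks instead.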

2022-09-30 20:30

If you just want to extract the waveform data from a WAV file and hash it, you can do that in Python.

#!/usr/bin/env python3
import wave, hashlib

# Read every PCM frame from the WAV file
wr = wave.open('tmp.wav', 'rb')
data = wr.readframes(wr.getnframes())
wr.close()

# Hash only the raw waveform bytes (the WAV header is not included)
hash_data = hashlib.sha256(data).hexdigest()

The hash ends up in hash_data.

(I doubt it will work well, but) you could use ffmpeg to convert the original mp3 file like this:

$ ffmpeg -i in.mp3 -ac 1 -ar 8000 -acodec pcm_s16le tmp.wav

If you run the script above after coarsely degrading the audio this way (mono, frame rate 8000, sample width 2, that is 16-bit little-endian), you may get a slightly better ID.

If you want to go further,

#!/usr/bin/env python3
import wave, struct, hashlib

wr = wave.open('tmp.wav', 'rb')

# debug..
# print("channels:", wr.getnchannels())
# print("sampwidth:", wr.getsampwidth())
# print("framerate:", wr.getframerate())
# print("frame num:", wr.getnframes())
# print("params:", wr.getparams())
# print("sec:", wr.getnframes() / wr.getframerate())

tmp = []
length = wr.getnframes()
for i in range(length):
    data = wr.readframes(1)
    # pcm_s16le samples are signed 16-bit little-endian
    nn = struct.unpack("<h", data[0:2])[0]
    # quantize coarsely
    nn = nn // 32
    # ignore very small amplitudes
    if abs(nn) > 8:
        tmp.append(nn)
wr.close()

# sha256 needs bytes, so encode the list's string form
hash_data = hashlib.sha256(str(tmp).encode()).hexdigest()

Something like this might work. (It is very rough, though...)

You may get somewhat better results by tuning the coarsening step and by also taking into account the values printed in the debug section, among other things.

2022-09-30 20:30

I want to build my own music management system that doesn't depend on ID3 tags at all

Do you mean you want to build something like Gracenote? If so, Gracenote itself has an API and an SDK, so you can get a unique ID (GN_ID) for a song from its ID3 tags and audio file.

However, only major-label releases (that is, not indie ones) are registered. Also, it is available for non-commercial use only.

2022-09-30 20:30

If you read the data size of the ID3 tag and calculate the MD5 of the rest of the file, the result will not be affected by the ID3 tag.

2022-09-30 20:30
