There is something I don't understand about the Base64 decoding code in C on Wikibooks (code below):
Algorithm Implementation/Miscellaneous/Base64 - Wikibooks, open books for an open world
What does the data in d[] represent?
Why are there so many 66 (INVALID) entries?
Also, when the 6-bit values are gathered into 24 bits with buf = buf << 6 | c;, what is the significance of the bitwise OR with c?
#include <stddef.h>  /* size_t */
#include <stdint.h>  /* uint32_t */

#define WHITESPACE 64
#define EQUALS     65
#define INVALID    66
static const unsigned char d[] = {
66,66,66,66,66,66,66,66,66,66,64,66,66,66,66,66,66,66,66,66,66,66,66,66,66,
66,66,66,66,66,66,66,66,66,66,66,66,66,66,66,66,66,66,62,66,66,66,63,52,53,
54,55,56,57,58,59,60,61,66,66,66,65,66,66,66, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9,
10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,66,66,66,66,66,66,26,27,28,
29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,66,66,
66,66,66,66,66,66,66,66,66,66,66,66,66,66,66,66,66,66,66,66,66,66,66,66,66,
66,66,66,66,66,66,66,66,66,66,66,66,66,66,66,66,66,66,66,66,66,66,66,66,66,
66,66,66,66,66,66,66,66,66,66,66,66,66,66,66,66,66,66,66,66,66,66,66,66,66,
66,66,66,66,66,66,66,66,66,66,66,66,66,66,66,66,66,66,66,66,66,66,66,66,66,
66,66,66,66,66,66,66,66,66,66,66,66,66,66,66,66,66,66,66,66,66,66,66,66,66,
66,66,66,66,66,66
};
int base64decode(char *in, size_t inLen, unsigned char *out, size_t *outLen) {
    char *end = in + inLen;
    char iter = 0;
    uint32_t buf = 0;
    size_t len = 0;

    while (in < end) {
        unsigned char c = d[*in++];

        switch (c) {
        case WHITESPACE: continue;   /* skip whitespace */
        case INVALID:    return 1;   /* invalid input, return error */
        case EQUALS:                 /* pad character, end of data */
            in = end;
            continue;
        default:
            buf = buf << 6 | c;
            iter++; // count the characters accumulated in buf
            /* If the buffer is full, split it into bytes */
            if (iter == 4) {
                if ((len += 3) > *outLen) return 1; /* buffer overflow */
                *(out++) = (buf >> 16) & 255;
                *(out++) = (buf >> 8) & 255;
                *(out++) = buf & 255;
                buf = 0; iter = 0;
            }
        }
    }

    if (iter == 3) {
        if ((len += 2) > *outLen) return 1; /* buffer overflow */
        *(out++) = (buf >> 10) & 255;
        *(out++) = (buf >> 2) & 255;
    }
    else if (iter == 2) {
        if (++len > *outLen) return 1; /* buffer overflow */
        *(out++) = (buf >> 4) & 255;
    }

    *outLen = len; /* modify to reflect the actual output size */
    return 0;
}
Tags: c, decode, base64
Basic knowledge of Base64 and ASCII codes is required.
Base64 represents the 64 possible 6-bit values, 000000 (= 0) through 111111 (= 63), using 64 ASCII characters according to the following rules:
A (0x41 = 65)  -> 000000 (= 0)
B (0x42 = 66)  -> 000001 (= 1)
:
Z (0x5A = 90)  -> 011001 (= 25)
a (0x61 = 97)  -> 011010 (= 26)
b (0x62 = 98)  -> 011011 (= 27)
:
z (0x7A = 122) -> 110011 (= 51)
0 (0x30 = 48)  -> 110100 (= 52)
1 (0x31 = 49)  -> 110101 (= 53)
:
9 (0x39 = 57)  -> 111101 (= 61)
+ (0x2B = 43)  -> 111110 (= 62)
/ (0x2F = 47)  -> 111111 (= 63)
For example, the ASCII character B represents 000001 (= 1) in Base64, so if you access d[66] using the character code of B (0x42 = 66), the value is 1. If you add comments showing which character each value 0...63 belongs to, d[] can be written like this:
static const unsigned char d[] = {
//                                \n
    66,66,66,66,66,66,66,66,66,66,64,66,66,66,66,66,
    66,66,66,66,66,66,66,66,66,66,66,66,66,66,66,66,
//                                    +           /
    66,66,66,66,66,66,66,66,66,66,66,62,66,66,66,63,
//   0  1  2  3  4  5  6  7  8  9           =
    52,53,54,55,56,57,58,59,60,61,66,66,66,65,66,66,
//      A  B  C  D  E  F  G  H  I  J  K  L  M  N  O
    66, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9,10,11,12,13,14,
//   P  Q  R  S  T  U  V  W  X  Y  Z
    15,16,17,18,19,20,21,22,23,24,25,66,66,66,66,66,
//      a  b  c  d  e  f  g  h  i  j  k  l  m  n  o
    66,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,
//   p  q  r  s  t  u  v  w  x  y  z
    41,42,43,44,45,46,47,48,49,50,51,66,66,66,66,66,
    66,66,66,66,66,66,66,66,66,66,66,66,66,66,66,66,
    66,66,66,66,66,66,66,66,66,66,66,66,66,66,66,66,
    66,66,66,66,66,66,66,66,66,66,66,66,66,66,66,66,
    66,66,66,66,66,66,66,66,66,66,66,66,66,66,66,66,
    66,66,66,66,66,66,66,66,66,66,66,66,66,66,66,66,
    66,66,66,66,66,66,66,66,66,66,66,66,66,66,66,66,
    66,66,66,66,66,66,66,66,66,66,66,66,66,66,66,66,
    66,66,66,66,66,66,66,66,66,66,66,66,66,66,66,66
};
What does the data in d[] represent?
In other words, if you use a character code as the index into d[], you get the value that the character represents in Base64.
Why are there so many 66 (INVALID) entries?
Only 66 of the 256 possible one-byte character codes mean anything in this table (the 64 Base64 characters, the padding character =, and the newline, which is skipped as whitespace), so the entries for all the other character codes are necessarily INVALID.
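To make both points concrete, here is a minimal sketch that fills an equivalent table at run time instead of writing it out by hand. The helper names build_table and count_meaningful are hypothetical (they are not part of the Wikibooks code); the three constants are the ones used above.

```c
#include <stddef.h>
#include <string.h>

#define WHITESPACE 64
#define EQUALS     65
#define INVALID    66

/* Hypothetical helper: fill a 256-entry table equivalent to d[]. */
void build_table(unsigned char table[256])
{
    static const char alphabet[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
        "abcdefghijklmnopqrstuvwxyz"
        "0123456789+/";
    size_t i;

    memset(table, INVALID, 256);              /* every code starts as INVALID */
    for (i = 0; i < 64; i++)                  /* character code -> 6-bit value */
        table[(unsigned char)alphabet[i]] = (unsigned char)i;
    table[(unsigned char)'='] = EQUALS;       /* padding character */
    table[(unsigned char)'\n'] = WHITESPACE;  /* the only whitespace entry in d[] */
}

/* Hypothetical helper: count the entries that are not INVALID. */
int count_meaningful(const unsigned char table[256])
{
    int n = 0;
    int i;

    for (i = 0; i < 256; i++)
        if (table[i] != INVALID)
            n++;
    return n;
}
```

After build_table(t), t['B'] is 1 and t['/'] is 63, matching the rules listed earlier, and count_meaningful(t) returns 66, which leaves 190 of the 256 entries as INVALID.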
What does the bitwise OR with c mean?
In Base64, one character represents 6 bits, so four characters represent 24 bits, as shown above. Suppose the four characters ABCD are being converted to 24 bits of data. After A, B, and C have been read,
buf       = 00000000000000 000000 000001 000010
                           <--A-> <--B-> <--C->
so
buf << 6  = 00000000 000000 000001 000010 000000
                     <--A-> <--B-> <--C->
c         = 00000000 000000 000000 000000 000011   <- c is the value converted from D
                                          <--D->
ORing these two values gives the new value of buf:
buf (new) = 00000000 000000 000001 000010 000011
                     <--A-> <--B-> <--C-> <--D->
As shown, A, B, C, and D are joined into a single 24-bit value. Because the shift leaves the low 6 bits of buf zero and c is always less than 64 here, the OR simply drops c into those empty bits.
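The same steps can be sketched in a few lines of C. The helper names pack4 and split3 are hypothetical, chosen only for this illustration; the shift, OR, and mask expressions are the ones from the decoder above.

```c
#include <stdint.h>

/* Hypothetical helper: join four 6-bit values into one 24-bit group,
   using the same buf = buf << 6 | c step as the decoder. */
uint32_t pack4(unsigned char a, unsigned char b,
               unsigned char c, unsigned char d)
{
    uint32_t buf = 0;

    buf = buf << 6 | a;   /* buf holds A               */
    buf = buf << 6 | b;   /* buf holds A, B            */
    buf = buf << 6 | c;   /* buf holds A, B, C         */
    buf = buf << 6 | d;   /* buf holds A, B, C, D      */
    return buf;
}

/* Hypothetical helper: split the 24-bit group into three output bytes,
   as the decoder does when iter reaches 4. */
void split3(uint32_t buf, unsigned char out[3])
{
    out[0] = (buf >> 16) & 255;
    out[1] = (buf >> 8) & 255;
    out[2] = buf & 255;
}
```

For ABCD the four values are 0, 1, 2, and 3, so pack4(0, 1, 2, 3) returns 4227 (0x001083) and split3 yields the bytes 0x00, 0x10, 0x83.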
d represents the correspondence between ASCII codes and Base64 digits.
Therefore, the array has 256 one-byte elements, and the indexes for A to Z, a to z, 0 to 9, +, and / hold the corresponding digit values.