About base64 decoding in C

Asked 2 years ago, Updated 2 years ago, 96 views

There are a few things I don't understand about the base64 decoding code in C (below) from Wikibooks:

Algorithm Implementation/Miscellaneous/Base64 - Wikibooks, open books for an open world

What does the data in d[] represent?
Why are most of the entries 66 (INVALID)?

Also, when the 6-bit values are combined into 24 bits with buf = buf << 6 | c;, what is the point of the bitwise OR with c?


#include <stddef.h>  /* size_t */
#include <stdint.h>  /* uint32_t */

#define WHITESPACE 64
#define EQUALS     65
#define INVALID    66

static const unsigned char d[] = {
    66,66,66,66,66,66,66,66,66,66,64,66,66,66,66,66,66,66,66,66,66,66,66,66,66,
    66,66,66,66,66,66,66,66,66,66,66,66,66,66,66,66,66,66,62,66,66,66,63,52,53,
    54,55,56,57,58,59,60,61,66,66,66,65,66,66,66, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9,
    10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,66,66,66,66,66,66,26,27,28,
    29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,66,66,
    66,66,66,66,66,66,66,66,66,66,66,66,66,66,66,66,66,66,66,66,66,66,66,66,66,
    66,66,66,66,66,66,66,66,66,66,66,66,66,66,66,66,66,66,66,66,66,66,66,66,66,
    66,66,66,66,66,66,66,66,66,66,66,66,66,66,66,66,66,66,66,66,66,66,66,66,66,
    66,66,66,66,66,66,66,66,66,66,66,66,66,66,66,66,66,66,66,66,66,66,66,66,66,
    66,66,66,66,66,66,66,66,66,66,66,66,66,66,66,66,66,66,66,66,66,66,66,66,66,
    66,66,66,66,66,66
};

int base64decode(char *in, size_t inLen, unsigned char *out, size_t *outLen) {
    char *end = in + inLen;
    char iter = 0;
    uint32_t buf = 0;
    size_t len = 0;

    while(in<end){
        /* cast so that bytes >= 0x80 cannot produce a negative index */
        unsigned char c = d[(unsigned char)*in++];

        switch (c) {
        case WHITESPACE: continue; /* skip whitespace */
        case INVALID: return 1;    /* invalid input, return error */
        case EQUALS:               /* pad character, end of data */
            in = end;
            continue;
        default:
            buf = buf << 6 | c;
            iter++; // increment the number of characters held in buf
            /* If the buffer is full, split it into bytes */
            if (iter == 4) {
                if ((len += 3) > *outLen) return 1; /* buffer overflow */
                *(out++) = (buf >> 16) & 255;
                *(out++) = (buf >> 8) & 255;
                *(out++) = buf & 255;
                buf = 0; iter = 0;

            }   
        }
    }

    if (iter == 3) {
        if ((len += 2) > *outLen) return 1; /* buffer overflow */
        *(out++) = (buf >> 10) & 255;
        *(out++) = (buf >> 2) & 255;
    }
    else if (iter == 2) {
        if (++len > *outLen) return 1; /* buffer overflow */
        *(out++) = (buf >> 4) & 255;
    }

    *outLen = len; /* modify to reflect the actual output size */
    return 0;
}


2022-09-30 19:22

2 Answers

You need some basic knowledge of Base64 and ASCII codes to follow this.

Base64 represents the 64 possible 6-bit values 000000 (=0) ... 111111 (=63) using 64 ASCII characters, according to the following rules:

A (0x41=65) -> 000000 (=0)
B (0x42=66) -> 000001 (=1)
:
Z (0x5A=90) -> 011001 (=25)
a (0x61=97) -> 011010 (=26)
b (0x62=98) -> 011011 (=27)
:
z (0x7A=122) -> 110011 (=51)
0 (0x30=48) -> 110100 (=52)
1 (0x31=49) -> 110101 (=53)
:
9 (0x39=57) -> 111101 (=61)
+ (0x2B=43) -> 111110 (=62)
/ (0x2F=47) -> 111111 (=63)

For example, the character B stands for 000001 (=1) in Base64, so if you index d[] with the character code of B (0x42=66), i.e. d[66], the value is 1. If you annotate d[] with comments showing which character each value 0...63 belongs to, it looks like this:

static const unsigned char d[] = {
//                                \n
    66,66,66,66,66,66,66,66,66,66,64,66,66,66,66,66,
//
    66,66,66,66,66,66,66,66,66,66,66,66,66,66,66,66,
//                                    +           /
    66,66,66,66,66,66,66,66,66,66,66,62,66,66,66,63,
//   0  1  2  3  4  5  6  7  8  9           =
    52,53,54,55,56,57,58,59,60,61,66,66,66,65,66,66,
//      A  B  C  D  E  F  G  H  I  J  K  L  M  N  O
    66, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9,10,11,12,13,14,
//   P  Q  R  S  T  U  V  W  X  Y  Z
    15,16,17,18,19,20,21,22,23,24,25,66,66,66,66,66,
//      a  b  c  d  e  f  g  h  i  j  k  l  m  n  o
    66,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,
//   p  q  r  s  t  u  v  w  x  y  z
    41,42,43,44,45,46,47,48,49,50,51,66,66,66,66,66,
    66,66,66,66,66,66,66,66,66,66,66,66,66,66,66,66,
    66,66,66,66,66,66,66,66,66,66,66,66,66,66,66,66,
    66,66,66,66,66,66,66,66,66,66,66,66,66,66,66,66,
    66,66,66,66,66,66,66,66,66,66,66,66,66,66,66,66,
    66,66,66,66,66,66,66,66,66,66,66,66,66,66,66,66,
    66,66,66,66,66,66,66,66,66,66,66,66,66,66,66,66,
    66,66,66,66,66,66,66,66,66,66,66,66,66,66,66,66,
    66,66,66,66,66,66,66,66,66,66,66,66,66,66,66,66
};

What does the data in d[] represent?

In other words, d[] is a lookup table: if you use a character code as the index, you get the value that the character represents in Base64 (or one of the marker values 64 (WHITESPACE) for \n and 65 (EQUALS) for =).

Why are so many entries 66 (INVALID)?

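Only a few of the 256 one-byte character codes mean anything in Base64: the 64 alphabet characters, the padding character =, and (in this table) \n as whitespace. Every other position therefore has to be marked as invalid (INVALID), which is why 190 of the 256 entries are 66.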

What is the point of the bitwise OR with c?

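In Base64, each character represents 6 bits, so four characters make 24 bits. Suppose you are converting the four characters ABCD into 24 bits of data (A=0, B=1, C=2, D=3). After A, B, and C have been read, buf (a 32-bit uint32_t) holds:

buf       = 00000000000000000000000001000010
                          <--A-><--B-><--C->

When D is read, buf is shifted left by 6 bits, and c holds the value converted from D:

buf << 6  = 00000000000000000001000010000000
                    <--A-><--B-><--C->
c         = 00000000000000000000000000000011
                                      <--D->

Taking the bitwise OR of these two values gives the new buf:

buf (new) = 00000000000000000001000010000011
                    <--A-><--B-><--C-><--D->

As shown above, the 6-bit values of A, B, C, and D end up concatenated into one 24-bit value. The shift by 6 leaves six zero bits at the bottom of buf, and the OR drops the 6 bits of c into that gap. Because those low bits are zero and c is at most 63, buf << 6 | c gives the same result as buf * 64 + c.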


2022-09-30 19:22

d represents the mapping from ASCII codes to Base64 digit values (0 to 63).
Therefore, the array has 256 entries, one for every possible byte value, and the positions indexed by A to Z, a to z, 0 to 9, +, and / hold the meaningful values.


2022-09-30 19:22



© 2024 OneMinuteCode. All rights reserved.