Understanding the Open Address Method Using Double Hash

Asked 2 years ago, Updated 2 years ago, 270 views

I am currently running AIZU ONLINE JUDEGE ALDS1_4_C.
Among them, I am learning how to open addressing using double hashing.

Here's the code for open addressing:

#include<stdio.h>
# include <string.h>

using namespace std;
using ll = long long;

#define M1000003
#define L14

char H[M][L]; /* Hash Table*/

int GetChar(charge){
    if(ch=='A')return1;
    else if(ch=='C') return2;
    else if(ch=='G') return3;
    else if(ch=='T') return 4;
    return 0;
}

ll GetKey(charstr[]){
    ll sum = 0, p = 1, i;
    for(i=0;i<strlen(str);i++){
        sum+=p*(GetChar(str[i])));
        p* = 5;
    }
    return sum;
}

int Hash1(ll key) {return key %M;}
int Hash2(ll key) {return(1+(key%(M-1));}

int find(char str[]){
    ll key = GetKey(str);
    intk = 0;
    while(true){
        int hash_index=(Hash1(key)+k*Hash2(key))%M;
        if(strcmp(H[hash_index], str)==0)return1;
        if(strlen(H[hash_index])==0){
            return 0;
        }
        k++;
    }
    return 0;
}

insert(charstr[]){
    ll key = GetKey(str);
    intk = 0;
    while(true){
        ll hash_index=(Hash1(key)+k*Hash2(key))%M;
        if(strcmp(H[hash_index], str)==0)return1;
        if(strlen(H[hash_index])==0){
            strcpy(H[hash_index], str);
            return1;
        }
        k++;
    }
    return 0;
}

int main() {
    inti,n;
    char str [L], com [9];
    for(i=0;i<M;i++)H[i][0]='\0';

    scanf("%d", & n);

    for(i=0;i<n;i++){
        scanf("%s%s",com,str);

        if(com[0]=='i'){
            insert(str);
        } else {
            if(find(str)){
                printf("yes\n");
            } else {
                printf("no\n");
            }
        }
    }

    return 0;
}

I don't know what I'm doing in the GetKey function that changes characters to numbers.
Specifically,

sum+=p*(GetChar(str[i])));
p* = 5;

I don't know what this p means.
If the character is similar, the index value that you changed to hash function will be close
To avoid this, do you multiply p by the number of strings or multiply p by 5 for each loop that ends?
On the Internet, I saw that is multiplying 5 by a number from 1 to 4, so I don't understand what it means.

I look forward to your kind cooperation.

algorithm data-structure

2022-10-22 09:15

1 Answers

Simply put, the ACGT string is [no characters]It is considered as a key by considering that = 0, A = 1, C = 2, G = 3, and T = 4.The string shown in the sample input is expressed as a decimal number (0pXXX):

AAA=0p111=31
AAC=0p211=56#Note that the first character equals the first digit.
CCC = 0p222 = 62
AGA = 0p131 = 41
AGG = 0p331 = 91
TTT = 0p444 = 124
T = 0p4 = 4

As shown in the example above, the first character is the first digit, so multiply by 1, but the second character is 5, the third character is 25(5*5) and the n-th character is multiplied by the n-th power of 5, where p*=5; is the n-th power p.

In this method, strings and keys are one-to-one.In the code, Key is used only to find hash values, but it should be faster to use Key directly in the table.Also, since up to 12 characters TTTTTTTTTT=0p44444444444444=244140624, long should be enough instead of long, so you can optimize that much.


2022-10-22 09:15

If you have any answers or tips


© 2024 OneMinuteCode. All rights reserved.