I want to find common elements among ascending arrays without duplication of their own elements.

Asked 2 years ago, Updated 2 years ago, 74 views

I have a question about C language.
There are two arrays A, B, each storing an integer, the number in the array is sorted in ascending order, and there are no overlapping numbers in the array A, which is the same for the array B.
However, arrays A and B are not necessarily the same size.

Based on that, please tell me which program outputs duplicate numbers in arrays A and B, including duplicate numbers๐Ÿ™

c algorithm array

2022-09-30 21:34

5 Answers

Waste Method

First of all, if you want to solve it in a straightforward way, you can check one by one whether the elements of array a are included in array b.Consider, for example, the following:

inta[]={1,5,9};
intb[] = {2,3,5,7,11};

In this example, first check to see if a[0] is included in b, is included in b.Not included, then check for 5, which is a[1].Yes, check to see if 9, which is a[2], is included at the end.Not included. Counting the number of times included gives you the number of overlapping elements a and b.

But there is a waste of time in this method.This method explores the b element from scratch for each element of a, but the assumption that the two arrays are sorted makes it more efficient.

More efficient way

The two arrays are sorted together, so you can examine the two arrays simultaneously from the first element.Specifically, the algorithms are as follows:

1.count=0 initializes
2. Initialize as i = 0, j = 0
3. Repeat the following until either a[i] or b[j] passes the end of the array:
    // If you are looking at the same place, increase the count by 1 and move on.
    2-1 If a[i]==b[j]:
        2-1-1. count++
        2-1-2.i++, j++// The elements are not duplicated in each array, so this is fine.
    // If either is really small, move on to the smaller one
    2-2 If not a[i]<b[j]:
        2-2-1.i++
    2-3 If not (i.e., a[i]>b[j]):
        2-3-1.j++
4. Output count

This algorithm compares the smaller elements of an array in order, where i and j are "where you are looking" at each array and gradually move the place you are looking back.Because the two arrays are sorted, the key point is that the elements that have already passed are never equal to the elements where you are looking.

Let's see how this algorithm works in the same example as before.

inta[]={1,5,9};
intb[] = {2,3,5,7,11};
 i | j | a[i] | b[j] |
----------------------------------------------
 0 | 0 | 1 | 2 | a[i] is smaller, so i++
 1 | 0 | 5 | 2 | b[j] is smaller, so j++
 1 | 1 | 5 | 3 | b[j] is smaller, so j++
 1 | 2 | 5 | 5 | a[i] == b[j]!
 2 | 3 | 9 | 7 | b[j] is smaller, so j++
 2 | 4 | 9 | 11 | a[i] is smaller and terminates i++โ†’a

Leave the actual implementation of this algorithm in Wandbox.


2022-09-30 21:34

Since the elements of both arrays are not duplicated and the elements are sorted, prepare the index variables for array A and the index variables for array B (initial value is 0), and

Compare the elements of interest in array A with those of interest in array B and
ใ€€If there is a match,
ใ€€ใ€€ใ€€As the number is duplicated, the number of duplicated numbers is +1.An index (position of the element to be noticed) of both arrays is set to +1.
ใ€€If the element in array A is larger,
ใ€€ใ€€ใ€€+1 index of array B (location of element of interest) (try to compare by the next largest number after array B)
ใ€€If the element in array B is larger,
ใ€€ใ€€ใ€€+1 index of array A (location of element of interest) (try to compare by the next largest number after array A)

I think it would be better to turn it around in a loop.
(End the loop when the index of either array exceeds the maximum.)


2022-09-30 21:34

To do it simply, you can write a simple double loop, but if you control the inner loop well by taking advantage of the ascending order of both arrays, you'll get an O(N) calculation.An order of 1000000 with a value of N is quite different from a simple double loop.

#include<stdio.h>
# include <stdlib.h>

#define N10

// Input data
intarr1[N];
intarr2[N];
// Result
result [N];
int resultCount = 0;

int main(intargc, const char*argv[]){

    // Create input data in random numbers
    int num = 0;
    for(inti=0;i<N;++i){
        num+=land()%3+1;
        arr1[i] = num;
    }
    num = 0;
    for(inti=0;i<N;++i){
        num+=land()%3+1;
        arr2[i] = num;
    }

    // Partial body for duplicate parts
    int j = 0;
    // `arr1' side reads from the beginning in order
    for(inti=0;i<N;++i){
        int n1 = arr1[i];
        // Proceed with `arr2` reading position until `arr2` side values line up or pass `n1`
        while(j<N&n1>arr2[j]){
            ++j;
        }
        // If they are aligned (overlapping), add them to the results
        if(j<N&n1==arr2[j]){
            result[resultCount++] = n1;
            j+=1;
        }
    }

    // Input data and result output
    for(inti=0;i<N;++i){
        if(i>0){
            printf(", ");
        }
        printf("%d",arr1[i]);
    }
    printf("\n");

    for(inti=0;i<N;++i){
        if(i>0){
            printf(", ");
        }
        printf("%d",arr2[i]);
    }
    printf("\n");

    for(inti=0;i<resultCount;++i){
        if(i>0){
            printf(", ");
        }
        printf("%d", result[i]);
    }
    printf("\n");
    printf("%d\n", resultCount);

    return 0;
}

The input data has been created and the output part of the result has become longer, but it is simply controlling i to lick arr1 and j to lick arr2 separately (but not to miss duplicate elements).I could have used the for statement on the inner loop, but I set it to while because it looks like I'm not used to it.

If you don't understand how it works, follow the changes in i, j in an environment where you can perform steps with a smaller N value.

If you're looking for a simple double loop to compare, be sure to complete it yourself and compare the speed differences between the two.

(Additional)
It is essentially the same as Fumu7 and Nekketsuuuu's "more efficient method."However, Nekketsuuuu's code seems easier to understand.


2022-09-30 21:34

If the range of elements that appear is determined to some extent and the memory can be used to some extent, there is also a way to use a bucket.
Another advantage is that if you know the maximum and minimum values, you can see the duplication without sorting both arrays.

#include<stdio.h>
# include <stdlib.h>

#define MIN(a,b)(a)<(b)?(a):(b))
#define MAX(a,b)(a)>(b)?(a):(b))

int main()
{
    inti = 0;
    int cnt = 0;

    inta[] = {1,5,9,11,12,15,16,17,18};
    int b[ ] = {3,5,7,11,15,18};

    inta_len=sizeof(a)/sizeof(int); // Number of elements in array a
    int b_len=sizeof(b)/sizeof(int); // Number of elements in array b

    int min = MIN(a[0], b[0]); // Minimum value
    int max = MAX (a[a_len-1], b[b_len-1]); // Maximum value

    int*bucket=calloc(max-min+1,sizeof(int));//bucket ready

    for (i=0; i<a_len;i++)
    {
        bucket[a[i]-min]++;
    }

    for (i=0; i<b_len;i++)
    {
        bucket[b[i]-min]++;
    }

    printf("Duplicate is \n";
    for (i=min; i<=max;i++)
    {
        if(bucket[i-min]==2)
        {
            printf("%d\n", i);
            cnt++;
        }
    }
    %d of printf("".\n", cnt);


    free (bucket); // Memory release

    return 0;
}

It has been verified to work on the following sites.
http://www.tutorialspoint.com/compile_c_online.php


2022-09-30 21:34

I think the use of binary search may be considered depending on the trend of the elements in the array, so I'll give you an example.

Basically, it is the same answer using the loop that already exists (1, 2, 3.Some of these answers may have +1 index A or B, but we are trying to speed it up by using the function search_greater_or_eq at once.

The use of binary search speeds up when, roughly speaking, the distribution of elements in arrays A and B is very different, and the range of elements does not overlap very much, or the number of elements on one side is extremely small and sparse compared to the other.A similar distribution (for example, if you include an even number of numbers in a range as A and an odd number as B) is conversely slow.

#include<stdio.h>
# include <stdlib.h>

// Returns the index of the smallest element among n or more elements in a binary search
// Note! If not found **Returns out-of-range index (length)**
// checking at the caller
size_t search_greater_or_eq(intn, intarr[], size_t start, size_t length){
    size_tlow=start;
    size_high = length-1;
    size_t greater = length;
    while(low<=high){
        size_tmid=(low+high)/2;
        if(arr[mid]>n){
            high = mid-1;
            greater=mid;
        } else if(arr[mid]<n){
            low = mid+1;
        } else {//arr[mid]==n
            return mid;
        }
    }
    return greater;
}


int count_duplicate(inta[], size_ta_length, intb[], size_tb_length){
    int count = 0;
    size_ta_pos = 0;
    size_tb_pos = 0;

    while(a_pos<a_length&b_pos<b_length){
        if(a[a_pos] == b[b_pos]){
            // printf("%d\n", a[a_pos]);
            count++;
            a_pos++;
            b_pos++;
        } else if(a[a_pos]>b[b_pos]){
            b_pos = search_greater_or_eq(a[a_pos], b, b_pos, b_length);
        } else {
            a_pos = search_greater_or_eq(b[b_pos], a, a_pos, a_length);
        }
    }

    return count;
}


int main() {
    size_t A_length = 100000000;
    int*A=malloc(sizeof(int)*A_length);
    for(inti=0;i<A_length;i++){
        A[i] = i+1;
    }
    size_t B_length = 6;
    int B[ ] = {0,1,A_length/4,A_length/2,A_length,A_length+1};

    printf("A:%d to %d\n", A[0], A[A_length-1]);
    printf("B:%d to %d\n", B[0], B[B_length-1]);
    printf("Total duplicates: %d\n", count_duplicate(A,A_length,B,B_length));

    free(A);
}

Output:

 Aโ€”1 to 10000000000
Bโ€”0 to 100000001
Total Duplicates: 4

If search_greater_or_eq is to be viewed sequentially instead of a binary search, it should be almost the same as the algorithm described in the other answers.Again, if the distribution of the arrays is very different, it will increase the percentage of the search_greater_or_eq going around the compact loop, so it will be a little faster, but in the case of C language, it will have little impact.

//Linear Search Version
size_t search_greater_or_eq(intn, intarr[], size_t start, size_t length){
    size_ti;
    for(i=start;i<length;i++){
        if(arr[i]>=n){
            returni;
        }
    }
    return length;
}


2022-09-30 21:34

If you have any answers or tips


ยฉ 2024 OneMinuteCode. All rights reserved.