The result of external coupling is different than expected, but the reason is not clear.

Asked 2 years ago, Updated 2 years ago, 107 views

I'm having a hard time studying SQL Server.

What we tried to do was to combine 'table_a' in two columns of no, timestamp with 'table_b' in three columns of no, datetime, value (both tables have no, datetime as the primary key).
The bonding conditions are as follows:

  • no:'='comparison
  • timestamp: the largest timestamp of 'table_b' below 'table_a'timestamp

This is not a very good explanation, so please see the code below as an example.

IF object_id('table_a')IS NOT NULL
  DROP TABLE table_a;

US>CREATE TABLE table_a(
  no INTEGER not null
 , timestamp DATE not null
 ,CONSTRAINT pk_table_a PRIMARY KEY CLUSTERED (no, timestamp)
);

IF object_id('table_b') IS NOT NULL
  DROP TABLE table_b;


US>CREATE TABLE table_b(
  no INTEGER not null
 , timestamp DATE not null
 , value INTEGER not null
 ,CONSTRAINT pk_table_b PRIMARY KEY CLUSTERED (no, timestamp)
);

BEGIN TRANSACTION;

INSERT INTO table_a(no, timestamp) 
VALUES (1,'2018-01-01'))
           ,(1, '2018-01-04')
           ,(1, '2018-01-05')
           ,(2, '2018-01-01')
           ,(2, '2018-01-03')
           ,(2, '2018-01-09')
           ,(3, '2018-01-10')
           ,(3, '2018-01-11')
           ,(3, '2018-01-12')
;

INSERT INTO table_b (no, timestamp, value) 
VALUES (1, '2018-01-01', 101)
           ,(1, '2018-01-02', 102)
           ,(2, '2018-01-02', 202)
           ,(3, '2017-01-01', 301)
           ,(3, '2018-01-12', 312)
;
COMMIT;

SELECT*
  FROM table_a
;

SELECT*
  FROM table_b
;


-- Query A: I can get what I want.
SELECT A.no
      , A. timestamp
      , B.value
  FROM(
    SELECT A.no
          , A. timestamp
          , (SELECT MAX (B.timestamp) 
              FROM table_bas B 
             WHERE B.no = A.no
               AND B.timestamp<=A.timestamp) as a_b_timestamp
      FROM table_a ASA
      as A
        LEFT OUTER JOIN TABLE_b ASB
              ON A.no = B.no
              ANDA.a_b_timestamp = B.timestamp
 WHERE A.no in (1, 2)
;

 -- Query B: This query's result set is the same with A.
 SELECT A.no
      , A. timestamp
      , B.value
  FROM table_a ASA
         LEFT OUTER JOIN TABLE_b ASB 
           ON A.no = B.no
          AND B.timestamp= (SELECT MAX (B.timestamp)    
                               FROM table_bas B 
                              WHERE B.no = A.no
                                AND B.timestamp<=A.timestamp)
 WHERE A.no in (1, 2)
;

-- Query C: The result set is far from what I expected, but I don't know reason.
SELECT A.no
      , A. timestamp
      , B.value
  FROM table_a ASA
         LEFT OUTER JOIN TABLE_b ASB 
           ON A.no = B.no
          ANDA.timestamp= (SELECT MAX (B.timestamp)    
                               FROM table_bas B 
                              WHERE B.no = A.no
                                AND B.timestamp<=A.timestamp)
 WHERE A.no in (1, 2)
 ;

The following data will be prepared in the table:

table_a:
no timestamp
1 2018-01-01
1 2018-01-04
1 2018-01-05
2 2018-01-01
2 2018-01-03
2 2018-01-09
3 2018-01-10
3 2018-01-11
3 2018-01-12

table_b:
no timestamp value
1 2018-01-01101
1 2018-01-02102
2 2018-01-02 202
3 2017-01-01301
3 2018-01-12312

There is a query in the code that commented Query A, B and C, but the result set is as follows:What I wanted was A, B, but only C returned a different set of results.

Query A, B result set (I wanted this!):
no timestamp value
1 2018-01-01101
1 2018-01-04102
1 2018-01-05102
2 2018-01-01 NULL
2 2018-01-03 202
2 2018-01-09 202

Query C result set (wanted different):
no timestamp value
1 2018-01-01101
1 2018-01-01102
1 2018-01-04 NULL
1 2018-01-05 NULL
2 2018-01-01 NULL
2 2018-01-03 NULL
2 2018-01-09 NULL

The problem is that I can't explain well why the result of C is as above.
Could someone please explain the reason clearly?

sql sql-server

2022-09-29 22:57

2 Answers

You should be aware that if you do not have enough conditions for table_b to bind table_b externally, you will not get the expected results.

Query C subqueries are table_a-dependent correlation subqueries, so each row of table_a has a different value.Finally, A.no in(1,2) will narrow it down, so if you omit anything else, it will look like this.

MAX(B.timestamp)
(1) 1 2018-01-01  2018-01-01
(2) 1 2018-01-04  2018-01-02
(3) 1 2018-01-05  2018-01-02
(4) 2 2018-01-01 NULL
(5) 2 2018-01-03  2018-01-02
(6) 2 2018-01-09  2018-01-02

Therefore, the external coupling condition for record (1) is

A.no=B.no ANDA.timestamp='2018-01-01'

That's what it means.From table_b, combine all records that match this condition with this record in table_a.A.timestamp='2018-01-01' has been established, so you will combine all records that match A.no=B.no from table_b externally.

That's this part of the Query C result.

1 2018-01-01101
1 2018-01-01 102

On the other hand, for records (2) through (6), the A.timestamp=(...MAX(B.timestamp)...) part is never established, so the join condition is always false and there is no record of B to be externally joined.As a result, the value corresponding to column B is NULL.

If you could comment on which part was difficult to understand, I might be able to write in more detail.


2022-09-29 22:57

The ON clause in Query B and C is

  • no:'='comparison
  • timestamp: the largest timestamp of 'table_b' below 'table_a'timestamp

The last timestamp is not the timestamp of table_a, but the timestamp of table_b.

In other words, the conditions are for table_a like Query C

ANDA.timestamp=(...)

but for table_b like Query B

AND B.timestamp=(...)

should be .

Also, I thought that using the alias name B was confusing (it looks like only two characters).
If you write Query B in a more Japanese style, it will look like the following.

SELECT*
FROM table_a A
LEFT JOIN TABLE_bB1
ON B1.no = A.no
AND
B1.timestamp=(
    SELECT MAX (B2.timestamp)
    FROM table_b B2
    WHERE B2.no = B1.no
    AND
    B2.timestamp<=A.timestamp
    GROUP BY B2.no
)
;


2022-09-29 22:57

If you have any answers or tips


© 2024 OneMinuteCode. All rights reserved.