BINARY_CHECKUM - different result depending on number of rows












9














I wonder why the BINARY_CHECKSUM function returns different result for the same:



SELECT *, BINARY_CHECKSUM(a,b) AS bc
FROM (VALUES(1, NULL, 100),
(2, NULL, NULL),
(3, 1, 2)) s(id,a,b);

SELECT *, BINARY_CHECKSUM(a,b) AS bc
FROM (VALUES(1, NULL, 100),
(2, NULL, NULL)) s(id,a,b);


Ouput:



+-----+----+------+-------------+
| id | a | b | bc |
+-----+----+------+-------------+
| 1 | | 100 | -109 |
| 2 | | | -2147483640 |
| 3 | 1 | 2 | 18 |
+-----+----+------+-------------+

-- -109 vs 100
+-----+----+------+------------+
| id | a | b | bc |
+-----+----+------+------------+
| 1 | | 100 | 100 |
| 2 | | | 2147483647 |
+-----+----+------+------------+




And for second sample I get what I would anticipate:



SELECT *, BINARY_CHECKSUM(a,b) AS bc
FROM (VALUES(1, 1, 100),
(2, 3, 4),
(3,1,1)) s(id,a,b);

SELECT *, BINARY_CHECKSUM(a,b) AS bc
FROM (VALUES(1, 1, 100),
(2, 3, 4)) s(id,a,b);


Ouptut for both first two rows:



+-----+----+------+-----+
| id | a | b | bc |
+-----+----+------+-----+
| 1 | 1 | 100 | 116 |
| 2 | 3 | 4 | 52 |
+-----+----+------+-----+


db<>fiddle demo





It has strange consequences when I want to compare two tables/queries:



WITH t AS (
SELECT 1 AS id, NULL AS a, 100 b
UNION ALL SELECT 2, NULL, NULL
UNION ALL SELECT 3, 1, 2 -- comment this out
), s AS (
SELECT 1 AS id ,100 AS a, NULL as b
UNION ALL SELECT 2, NULL, NULL
UNION ALL SELECT 3, 2, 1 -- comment this out
)
SELECT t.*,s.*
,BINARY_CHECKSUM(t.a, t.b) AS bc_t, BINARY_CHECKSUM(s.a, s.b) AS bc_s
FROM t
JOIN s
ON s.id = t.id
WHERE BINARY_CHECKSUM(t.a, t.b) = BINARY_CHECKSUM(s.a, s.b);


db<>fiddle demo2



For 3 rows I get single result:



+-----+----+----+-----+----+----+--------------+-------------+
| id | a | b | id | a | b | bc_t | bc_s |
+-----+----+----+-----+----+----+--------------+-------------+
| 2 | | | 2 | | | -2147483640 | -2147483640 |
+-----+----+----+-----+----+----+--------------+-------------+


but for 2 rows I get also id = 1:



+-----+----+------+-----+------+----+-------------+------------+
| id | a | b | id | a | b | bc_t | bc_s |
+-----+----+------+-----+------+----+-------------+------------+
| 1 | | 100 | 1 | 100 | | 100 | 100 |
| 2 | | | 2 | | | 2147483647 | 2147483647 |
+-----+----+------+-----+------+----+-------------+------------+


Remarks:




  • I am not searching for alternatives like(HASH_BYTES/MD5/CHECKSUM)


  • I am aware that BINARY_CHECKSUM could lead to collisions(two different calls produce the same output) here scenario is a bit different




For this definition, we say that null values, of a specified type,
compare as equal values. If at least one of the values in the
expression list changes, the expression checksum can also change.
However, this is not guaranteed. Therefore, to detect whether values
have changed, we recommend use of BINARY_CHECKSUM only if your
application can tolerate an occasional missed change.




It is strange for me that hash function returns different result for the same input arguments.
Is this behaviour by design or it is some kind of glitch?



EDIT:



As @scsimon
points out it works for materialized tables but not for cte.
db<>fiddle actual table



Metadata for cte:



SELECT name, system_type_name
FROM sys.dm_exec_describe_first_result_set('
SELECT *
FROM (VALUES(1, NULL, 100),
(2, NULL, NULL),
(3, 1, 2)) s(id,a,b)', NULL,0);

SELECT name, system_type_name
FROM sys.dm_exec_describe_first_result_set('
SELECT *
FROM (VALUES(1, NULL, 100),
(2, NULL, NULL)) s(id,a,b)', NULL,0)

-- working workaround
SELECT name, system_type_name
FROM sys.dm_exec_describe_first_result_set('
SELECT *
FROM (VALUES(1, cast(NULL as int), 100),
(2, NULL, NULL)) s(id,a,b)', NULL,0)


For all cases all columns are INT but with explicit CAST it behaves as it should.



db<>fidde metadata










share|improve this question





























    9














    I wonder why the BINARY_CHECKSUM function returns different result for the same:



    SELECT *, BINARY_CHECKSUM(a,b) AS bc
    FROM (VALUES(1, NULL, 100),
    (2, NULL, NULL),
    (3, 1, 2)) s(id,a,b);

    SELECT *, BINARY_CHECKSUM(a,b) AS bc
    FROM (VALUES(1, NULL, 100),
    (2, NULL, NULL)) s(id,a,b);


    Ouput:



    +-----+----+------+-------------+
    | id | a | b | bc |
    +-----+----+------+-------------+
    | 1 | | 100 | -109 |
    | 2 | | | -2147483640 |
    | 3 | 1 | 2 | 18 |
    +-----+----+------+-------------+

    -- -109 vs 100
    +-----+----+------+------------+
    | id | a | b | bc |
    +-----+----+------+------------+
    | 1 | | 100 | 100 |
    | 2 | | | 2147483647 |
    +-----+----+------+------------+




    And for second sample I get what I would anticipate:



    SELECT *, BINARY_CHECKSUM(a,b) AS bc
    FROM (VALUES(1, 1, 100),
    (2, 3, 4),
    (3,1,1)) s(id,a,b);

    SELECT *, BINARY_CHECKSUM(a,b) AS bc
    FROM (VALUES(1, 1, 100),
    (2, 3, 4)) s(id,a,b);


    Ouptut for both first two rows:



    +-----+----+------+-----+
    | id | a | b | bc |
    +-----+----+------+-----+
    | 1 | 1 | 100 | 116 |
    | 2 | 3 | 4 | 52 |
    +-----+----+------+-----+


    db<>fiddle demo





    It has strange consequences when I want to compare two tables/queries:



    WITH t AS (
    SELECT 1 AS id, NULL AS a, 100 b
    UNION ALL SELECT 2, NULL, NULL
    UNION ALL SELECT 3, 1, 2 -- comment this out
    ), s AS (
    SELECT 1 AS id ,100 AS a, NULL as b
    UNION ALL SELECT 2, NULL, NULL
    UNION ALL SELECT 3, 2, 1 -- comment this out
    )
    SELECT t.*,s.*
    ,BINARY_CHECKSUM(t.a, t.b) AS bc_t, BINARY_CHECKSUM(s.a, s.b) AS bc_s
    FROM t
    JOIN s
    ON s.id = t.id
    WHERE BINARY_CHECKSUM(t.a, t.b) = BINARY_CHECKSUM(s.a, s.b);


    db<>fiddle demo2



    For 3 rows I get single result:



    +-----+----+----+-----+----+----+--------------+-------------+
    | id | a | b | id | a | b | bc_t | bc_s |
    +-----+----+----+-----+----+----+--------------+-------------+
    | 2 | | | 2 | | | -2147483640 | -2147483640 |
    +-----+----+----+-----+----+----+--------------+-------------+


    but for 2 rows I get also id = 1:



    +-----+----+------+-----+------+----+-------------+------------+
    | id | a | b | id | a | b | bc_t | bc_s |
    +-----+----+------+-----+------+----+-------------+------------+
    | 1 | | 100 | 1 | 100 | | 100 | 100 |
    | 2 | | | 2 | | | 2147483647 | 2147483647 |
    +-----+----+------+-----+------+----+-------------+------------+


    Remarks:




    • I am not searching for alternatives like(HASH_BYTES/MD5/CHECKSUM)


    • I am aware that BINARY_CHECKSUM could lead to collisions(two different calls produce the same output) here scenario is a bit different




    For this definition, we say that null values, of a specified type,
    compare as equal values. If at least one of the values in the
    expression list changes, the expression checksum can also change.
    However, this is not guaranteed. Therefore, to detect whether values
    have changed, we recommend use of BINARY_CHECKSUM only if your
    application can tolerate an occasional missed change.




    It is strange for me that hash function returns different result for the same input arguments.
    Is this behaviour by design or it is some kind of glitch?



    EDIT:



    As @scsimon
    points out it works for materialized tables but not for cte.
    db<>fiddle actual table



    Metadata for cte:



    SELECT name, system_type_name
    FROM sys.dm_exec_describe_first_result_set('
    SELECT *
    FROM (VALUES(1, NULL, 100),
    (2, NULL, NULL),
    (3, 1, 2)) s(id,a,b)', NULL,0);

    SELECT name, system_type_name
    FROM sys.dm_exec_describe_first_result_set('
    SELECT *
    FROM (VALUES(1, NULL, 100),
    (2, NULL, NULL)) s(id,a,b)', NULL,0)

    -- working workaround
    SELECT name, system_type_name
    FROM sys.dm_exec_describe_first_result_set('
    SELECT *
    FROM (VALUES(1, cast(NULL as int), 100),
    (2, NULL, NULL)) s(id,a,b)', NULL,0)


    For all cases all columns are INT but with explicit CAST it behaves as it should.



    db<>fidde metadata










    share|improve this question



























      9












      9








      9


      1





      I wonder why the BINARY_CHECKSUM function returns different result for the same:



      SELECT *, BINARY_CHECKSUM(a,b) AS bc
      FROM (VALUES(1, NULL, 100),
      (2, NULL, NULL),
      (3, 1, 2)) s(id,a,b);

      SELECT *, BINARY_CHECKSUM(a,b) AS bc
      FROM (VALUES(1, NULL, 100),
      (2, NULL, NULL)) s(id,a,b);


      Ouput:



      +-----+----+------+-------------+
      | id | a | b | bc |
      +-----+----+------+-------------+
      | 1 | | 100 | -109 |
      | 2 | | | -2147483640 |
      | 3 | 1 | 2 | 18 |
      +-----+----+------+-------------+

      -- -109 vs 100
      +-----+----+------+------------+
      | id | a | b | bc |
      +-----+----+------+------------+
      | 1 | | 100 | 100 |
      | 2 | | | 2147483647 |
      +-----+----+------+------------+




      And for second sample I get what I would anticipate:



      SELECT *, BINARY_CHECKSUM(a,b) AS bc
      FROM (VALUES(1, 1, 100),
      (2, 3, 4),
      (3,1,1)) s(id,a,b);

      SELECT *, BINARY_CHECKSUM(a,b) AS bc
      FROM (VALUES(1, 1, 100),
      (2, 3, 4)) s(id,a,b);


      Ouptut for both first two rows:



      +-----+----+------+-----+
      | id | a | b | bc |
      +-----+----+------+-----+
      | 1 | 1 | 100 | 116 |
      | 2 | 3 | 4 | 52 |
      +-----+----+------+-----+


      db<>fiddle demo





      It has strange consequences when I want to compare two tables/queries:



      WITH t AS (
      SELECT 1 AS id, NULL AS a, 100 b
      UNION ALL SELECT 2, NULL, NULL
      UNION ALL SELECT 3, 1, 2 -- comment this out
      ), s AS (
      SELECT 1 AS id ,100 AS a, NULL as b
      UNION ALL SELECT 2, NULL, NULL
      UNION ALL SELECT 3, 2, 1 -- comment this out
      )
      SELECT t.*,s.*
      ,BINARY_CHECKSUM(t.a, t.b) AS bc_t, BINARY_CHECKSUM(s.a, s.b) AS bc_s
      FROM t
      JOIN s
      ON s.id = t.id
      WHERE BINARY_CHECKSUM(t.a, t.b) = BINARY_CHECKSUM(s.a, s.b);


      db<>fiddle demo2



      For 3 rows I get single result:



      +-----+----+----+-----+----+----+--------------+-------------+
      | id | a | b | id | a | b | bc_t | bc_s |
      +-----+----+----+-----+----+----+--------------+-------------+
      | 2 | | | 2 | | | -2147483640 | -2147483640 |
      +-----+----+----+-----+----+----+--------------+-------------+


      but for 2 rows I get also id = 1:



      +-----+----+------+-----+------+----+-------------+------------+
      | id | a | b | id | a | b | bc_t | bc_s |
      +-----+----+------+-----+------+----+-------------+------------+
      | 1 | | 100 | 1 | 100 | | 100 | 100 |
      | 2 | | | 2 | | | 2147483647 | 2147483647 |
      +-----+----+------+-----+------+----+-------------+------------+


      Remarks:




      • I am not searching for alternatives like(HASH_BYTES/MD5/CHECKSUM)


      • I am aware that BINARY_CHECKSUM could lead to collisions(two different calls produce the same output) here scenario is a bit different




      For this definition, we say that null values, of a specified type,
      compare as equal values. If at least one of the values in the
      expression list changes, the expression checksum can also change.
      However, this is not guaranteed. Therefore, to detect whether values
      have changed, we recommend use of BINARY_CHECKSUM only if your
      application can tolerate an occasional missed change.




      It is strange for me that hash function returns different result for the same input arguments.
      Is this behaviour by design or it is some kind of glitch?



      EDIT:



      As @scsimon
      points out it works for materialized tables but not for cte.
      db<>fiddle actual table



      Metadata for cte:



      SELECT name, system_type_name
      FROM sys.dm_exec_describe_first_result_set('
      SELECT *
      FROM (VALUES(1, NULL, 100),
      (2, NULL, NULL),
      (3, 1, 2)) s(id,a,b)', NULL,0);

      SELECT name, system_type_name
      FROM sys.dm_exec_describe_first_result_set('
      SELECT *
      FROM (VALUES(1, NULL, 100),
      (2, NULL, NULL)) s(id,a,b)', NULL,0)

      -- working workaround
      SELECT name, system_type_name
      FROM sys.dm_exec_describe_first_result_set('
      SELECT *
      FROM (VALUES(1, cast(NULL as int), 100),
      (2, NULL, NULL)) s(id,a,b)', NULL,0)


      For all cases all columns are INT but with explicit CAST it behaves as it should.



      db<>fidde metadata










      share|improve this question















      I wonder why the BINARY_CHECKSUM function returns different result for the same:



      SELECT *, BINARY_CHECKSUM(a,b) AS bc
      FROM (VALUES(1, NULL, 100),
      (2, NULL, NULL),
      (3, 1, 2)) s(id,a,b);

      SELECT *, BINARY_CHECKSUM(a,b) AS bc
      FROM (VALUES(1, NULL, 100),
      (2, NULL, NULL)) s(id,a,b);


      Ouput:



      +-----+----+------+-------------+
      | id | a | b | bc |
      +-----+----+------+-------------+
      | 1 | | 100 | -109 |
      | 2 | | | -2147483640 |
      | 3 | 1 | 2 | 18 |
      +-----+----+------+-------------+

      -- -109 vs 100
      +-----+----+------+------------+
      | id | a | b | bc |
      +-----+----+------+------------+
      | 1 | | 100 | 100 |
      | 2 | | | 2147483647 |
      +-----+----+------+------------+




      And for second sample I get what I would anticipate:



      SELECT *, BINARY_CHECKSUM(a,b) AS bc
      FROM (VALUES(1, 1, 100),
      (2, 3, 4),
      (3,1,1)) s(id,a,b);

      SELECT *, BINARY_CHECKSUM(a,b) AS bc
      FROM (VALUES(1, 1, 100),
      (2, 3, 4)) s(id,a,b);


      Ouptut for both first two rows:



      +-----+----+------+-----+
      | id | a | b | bc |
      +-----+----+------+-----+
      | 1 | 1 | 100 | 116 |
      | 2 | 3 | 4 | 52 |
      +-----+----+------+-----+


      db<>fiddle demo





      It has strange consequences when I want to compare two tables/queries:



      WITH t AS (
      SELECT 1 AS id, NULL AS a, 100 b
      UNION ALL SELECT 2, NULL, NULL
      UNION ALL SELECT 3, 1, 2 -- comment this out
      ), s AS (
      SELECT 1 AS id ,100 AS a, NULL as b
      UNION ALL SELECT 2, NULL, NULL
      UNION ALL SELECT 3, 2, 1 -- comment this out
      )
      SELECT t.*,s.*
      ,BINARY_CHECKSUM(t.a, t.b) AS bc_t, BINARY_CHECKSUM(s.a, s.b) AS bc_s
      FROM t
      JOIN s
      ON s.id = t.id
      WHERE BINARY_CHECKSUM(t.a, t.b) = BINARY_CHECKSUM(s.a, s.b);


      db<>fiddle demo2



      For 3 rows I get single result:



      +-----+----+----+-----+----+----+--------------+-------------+
      | id | a | b | id | a | b | bc_t | bc_s |
      +-----+----+----+-----+----+----+--------------+-------------+
      | 2 | | | 2 | | | -2147483640 | -2147483640 |
      +-----+----+----+-----+----+----+--------------+-------------+


      but for 2 rows I get also id = 1:



      +-----+----+------+-----+------+----+-------------+------------+
      | id | a | b | id | a | b | bc_t | bc_s |
      +-----+----+------+-----+------+----+-------------+------------+
      | 1 | | 100 | 1 | 100 | | 100 | 100 |
      | 2 | | | 2 | | | 2147483647 | 2147483647 |
      +-----+----+------+-----+------+----+-------------+------------+


      Remarks:




      • I am not searching for alternatives like(HASH_BYTES/MD5/CHECKSUM)


      • I am aware that BINARY_CHECKSUM could lead to collisions(two different calls produce the same output) here scenario is a bit different




      For this definition, we say that null values, of a specified type,
      compare as equal values. If at least one of the values in the
      expression list changes, the expression checksum can also change.
      However, this is not guaranteed. Therefore, to detect whether values
      have changed, we recommend use of BINARY_CHECKSUM only if your
      application can tolerate an occasional missed change.




      It is strange for me that hash function returns different result for the same input arguments.
      Is this behaviour by design or it is some kind of glitch?



      EDIT:



      As @scsimon
      points out it works for materialized tables but not for cte.
      db<>fiddle actual table



      Metadata for cte:



      SELECT name, system_type_name
      FROM sys.dm_exec_describe_first_result_set('
      SELECT *
      FROM (VALUES(1, NULL, 100),
      (2, NULL, NULL),
      (3, 1, 2)) s(id,a,b)', NULL,0);

      SELECT name, system_type_name
      FROM sys.dm_exec_describe_first_result_set('
      SELECT *
      FROM (VALUES(1, NULL, 100),
      (2, NULL, NULL)) s(id,a,b)', NULL,0)

      -- working workaround
      SELECT name, system_type_name
      FROM sys.dm_exec_describe_first_result_set('
      SELECT *
      FROM (VALUES(1, cast(NULL as int), 100),
      (2, NULL, NULL)) s(id,a,b)', NULL,0)


      For all cases all columns are INT but with explicit CAST it behaves as it should.



      db<>fidde metadata







      sql sql-server tsql






      share|improve this question















      share|improve this question













      share|improve this question




      share|improve this question








      edited 51 mins ago

























      asked 1 hour ago









      Lukasz Szozda

      78.7k1061104




      78.7k1061104
























          2 Answers
          2






          active

          oldest

          votes


















          4














          For the literal NULL without the CAST (and without any typed values in the column) it entirely ignores it and just gives you the same result as BINARY_CHECKSUM(b).



          This seems to happen very early on. The initial tree representation output from



          SELECT *, BINARY_CHECKSUM(a,b) AS bc
          FROM (VALUES(1, NULL, 100),
          (2, NULL, NULL)) s(id,a,b)
          OPTION (RECOMPILE, QUERYTRACEON 8605, QUERYTRACEON 3604);


          Already shows that it has decided to just use one column as input to the function



          ScaOp_Intrinsic binary_checksum

          ScaOp_Identifier COL: Union1008


          This compares with the following output for your first query



          ScaOp_Intrinsic binary_checksum

          ScaOp_Identifier COL: Union1011

          ScaOp_Identifier COL: Union1010


          If you try and get the BINARY_CHECKSUM with



          SELECT *, BINARY_CHECKSUM(a) AS bc
          FROM (VALUES(1, NULL, 100)) s(id,a,b)


          It gives the error




          Msg 8184, Level 16, State 1, Line 8 Error in binarychecksum. There are
          no comparable columns in the binarychecksum input.




          This is not the only place where an untyped NULL constant is treated differently from an explicitly typed one.



          Another case is



          SELECT COALESCE(CAST(NULL AS INT),CAST(NULL AS INT))


          vs



          SELECT COALESCE(NULL,NULL)


          I'd err on the side of "glitch" in this case rather than "by design" though as the columns from the derived table are supposed to be int before they get to the checksum function.



          SELECT COALESCE(a,b)
          FROM (VALUES(NULL, NULL)) s(a,b)


          Does work as expected without this glitch.






          share|improve this answer



















          • 1




            Good catch with COALESCE
            – Lukasz Szozda
            34 mins ago



















          4














          This has nothing to do with the number of rows. It is because the values in one of the columns of the 2-row version are always NULL. The default type of NULL is int and the default type of a numeric constant (of this length) is int, so these should be comparable. But from a values() derived table, these are (apparently) not exactly the same type.



          In particular, a column with only typeless NULLs from a derived table is not comparable, so it is excluded from the binary checksum calculation. This does not occur in a real table, because all columns have types.



          The rest of the answer illustrates what is happening.



          The code behaves as expected with type conversions:



          SELECT *, BINARY_CHECKSUM(a, b) AS bc
          FROM (VALUES(1, cast(NULL as int), 100),
          (2, NULL, NULL)
          ) s(id,a,b);


          Here is a db<>fiddle.



          Actually creating tables with the values suggests that columns with only NULL values have exactly the same type as columns with explicit numbers. That suggests that the original code should work. But an explicit cast also fixes the problem. Very strange.



          This is really, really strange. Consider the following:



          select v.*, checksum(a, b), checksum(c,b)
          FROM (VALUES(1, NULL, 100, NULL),
          (2, 1, 2, 1.0)
          ) v(id, a, b, c);


          The change in type for "d" affects the binary_checksum() for the second row, but not for the first.



          This is my conclusion. When all the values in a column are binary, then binary_checksum() is aware of this and the column is in the category of "noncomparable data type". The checksum is then based on the remaining columns.



          You can validate this by seeing the error when you run:



          select v.*, binary_checksum(a)
          FROM (VALUES(1, NULL, 100, NULL),
          (2, NULL, 2, 1.0)
          ) v( id,a, b, c);


          It complains:




          Argument data type NULL is invalid for argument 1 of checksum function.




          Ironically, this is not a problem if you save the results into a table and use binary_checksum(). The issue appears to be some interaction with values() and data types -- but something that is not immediately obvious in the information_schema.columns table.



          The happyish news is that the code should work on tables, even if it does not work on values() generated derived tables -- as this SQL Fiddle demonstrates.



          I also learned that a column filled with NULLs really is typeless. The assignment of the int data type in a select into seems to happen when the table is being defined. The "typeless" type is converted to an int.






          share|improve this answer























          • In SQL Server default type for NULL is INT. It is a good observation
            – Lukasz Szozda
            1 hour ago








          • 1




            This doesn't happen when you use an actual table oddly with int datatypes so +1
            – scsimon
            1 hour ago








          • 3




            Datatypes for all cases demo As I thouht all ints - so it looks like a glitch for me
            – Lukasz Szozda
            1 hour ago






          • 1




            @scsimon . . . I am perplexed. According to the information schema tables, the types are exactly the same for all NULL values and when there is a number constant. However, the cast() works in this example.
            – Gordon Linoff
            59 mins ago










          • @GordonLinoff Thank you for your insight Gordon. I will definitely return to this in 2019.
            – Lukasz Szozda
            35 mins ago











          Your Answer






          StackExchange.ifUsing("editor", function () {
          StackExchange.using("externalEditor", function () {
          StackExchange.using("snippets", function () {
          StackExchange.snippets.init();
          });
          });
          }, "code-snippets");

          StackExchange.ready(function() {
          var channelOptions = {
          tags: "".split(" "),
          id: "1"
          };
          initTagRenderer("".split(" "), "".split(" "), channelOptions);

          StackExchange.using("externalEditor", function() {
          // Have to fire editor after snippets, if snippets enabled
          if (StackExchange.settings.snippets.snippetsEnabled) {
          StackExchange.using("snippets", function() {
          createEditor();
          });
          }
          else {
          createEditor();
          }
          });

          function createEditor() {
          StackExchange.prepareEditor({
          heartbeatType: 'answer',
          autoActivateHeartbeat: false,
          convertImagesToLinks: true,
          noModals: true,
          showLowRepImageUploadWarning: true,
          reputationToPostImages: 10,
          bindNavPrevention: true,
          postfix: "",
          imageUploader: {
          brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
          contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
          allowUrls: true
          },
          onDemand: true,
          discardSelector: ".discard-answer"
          ,immediatelyShowMarkdownHelp:true
          });


          }
          });














          draft saved

          draft discarded


















          StackExchange.ready(
          function () {
          StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53988972%2fbinary-checkum-different-result-depending-on-number-of-rows%23new-answer', 'question_page');
          }
          );

          Post as a guest















          Required, but never shown

























          2 Answers
          2






          active

          oldest

          votes








          2 Answers
          2






          active

          oldest

          votes









          active

          oldest

          votes






          active

          oldest

          votes









          4














          For the literal NULL without the CAST (and without any typed values in the column) it entirely ignores it and just gives you the same result as BINARY_CHECKSUM(b).



          This seems to happen very early on. The initial tree representation output from



          SELECT *, BINARY_CHECKSUM(a,b) AS bc
          FROM (VALUES(1, NULL, 100),
          (2, NULL, NULL)) s(id,a,b)
          OPTION (RECOMPILE, QUERYTRACEON 8605, QUERYTRACEON 3604);


          Already shows that it has decided to just use one column as input to the function



          ScaOp_Intrinsic binary_checksum

          ScaOp_Identifier COL: Union1008


          This compares with the following output for your first query



          ScaOp_Intrinsic binary_checksum

          ScaOp_Identifier COL: Union1011

          ScaOp_Identifier COL: Union1010


          If you try and get the BINARY_CHECKSUM with



          SELECT *, BINARY_CHECKSUM(a) AS bc
          FROM (VALUES(1, NULL, 100)) s(id,a,b)


          It gives the error




          Msg 8184, Level 16, State 1, Line 8 Error in binarychecksum. There are
          no comparable columns in the binarychecksum input.




          This is not the only place where an untyped NULL constant is treated differently from an explicitly typed one.



          Another case is



          SELECT COALESCE(CAST(NULL AS INT),CAST(NULL AS INT))


          vs



          SELECT COALESCE(NULL,NULL)


          I'd err on the side of "glitch" in this case rather than "by design" though as the columns from the derived table are supposed to be int before they get to the checksum function.



          SELECT COALESCE(a,b)
          FROM (VALUES(NULL, NULL)) s(a,b)


          Does work as expected without this glitch.






          share|improve this answer



















          • 1




            Good catch with COALESCE
            – Lukasz Szozda
            34 mins ago
















          4














          For the literal NULL without the CAST (and without any typed values in the column) it entirely ignores it and just gives you the same result as BINARY_CHECKSUM(b).



          This seems to happen very early on. The initial tree representation output from



          SELECT *, BINARY_CHECKSUM(a,b) AS bc
          FROM (VALUES(1, NULL, 100),
          (2, NULL, NULL)) s(id,a,b)
          OPTION (RECOMPILE, QUERYTRACEON 8605, QUERYTRACEON 3604);


          Already shows that it has decided to just use one column as input to the function



          ScaOp_Intrinsic binary_checksum

          ScaOp_Identifier COL: Union1008


          This compares with the following output for your first query



          ScaOp_Intrinsic binary_checksum

          ScaOp_Identifier COL: Union1011

          ScaOp_Identifier COL: Union1010


          If you try and get the BINARY_CHECKSUM with



          SELECT *, BINARY_CHECKSUM(a) AS bc
          FROM (VALUES(1, NULL, 100)) s(id,a,b)


          It gives the error




          Msg 8184, Level 16, State 1, Line 8 Error in binarychecksum. There are
          no comparable columns in the binarychecksum input.




          This is not the only place where an untyped NULL constant is treated differently from an explicitly typed one.



          Another case is



          SELECT COALESCE(CAST(NULL AS INT),CAST(NULL AS INT))


          vs



          SELECT COALESCE(NULL,NULL)


          I'd err on the side of "glitch" in this case rather than "by design" though as the columns from the derived table are supposed to be int before they get to the checksum function.



          SELECT COALESCE(a,b)
          FROM (VALUES(NULL, NULL)) s(a,b)


          Does work as expected without this glitch.






          share|improve this answer



















          • 1




            Good catch with COALESCE
            – Lukasz Szozda
            34 mins ago














          4












          4








          4






          For the literal NULL without the CAST (and without any typed values in the column) it entirely ignores it and just gives you the same result as BINARY_CHECKSUM(b).



          This seems to happen very early on. The initial tree representation output from



          SELECT *, BINARY_CHECKSUM(a,b) AS bc
          FROM (VALUES(1, NULL, 100),
          (2, NULL, NULL)) s(id,a,b)
          OPTION (RECOMPILE, QUERYTRACEON 8605, QUERYTRACEON 3604);


          Already shows that it has decided to just use one column as input to the function



          ScaOp_Intrinsic binary_checksum

          ScaOp_Identifier COL: Union1008


          This compares with the following output for your first query



          ScaOp_Intrinsic binary_checksum

          ScaOp_Identifier COL: Union1011

          ScaOp_Identifier COL: Union1010


          If you try and get the BINARY_CHECKSUM with



          SELECT *, BINARY_CHECKSUM(a) AS bc
          FROM (VALUES(1, NULL, 100)) s(id,a,b)


          It gives the error




          Msg 8184, Level 16, State 1, Line 8 Error in binarychecksum. There are
          no comparable columns in the binarychecksum input.




          This is not the only place where an untyped NULL constant is treated differently from an explicitly typed one.



          Another case is



          SELECT COALESCE(CAST(NULL AS INT),CAST(NULL AS INT))


          vs



          SELECT COALESCE(NULL,NULL)


          I'd err on the side of "glitch" in this case rather than "by design" though as the columns from the derived table are supposed to be int before they get to the checksum function.



          SELECT COALESCE(a,b)
          FROM (VALUES(NULL, NULL)) s(a,b)


          Does work as expected without this glitch.






          share|improve this answer














          For the literal NULL without the CAST (and without any typed values in the column) it entirely ignores it and just gives you the same result as BINARY_CHECKSUM(b).



          This seems to happen very early on. The initial tree representation output from



          SELECT *, BINARY_CHECKSUM(a,b) AS bc
          FROM (VALUES(1, NULL, 100),
          (2, NULL, NULL)) s(id,a,b)
          OPTION (RECOMPILE, QUERYTRACEON 8605, QUERYTRACEON 3604);


          Already shows that it has decided to just use one column as input to the function



          ScaOp_Intrinsic binary_checksum

          ScaOp_Identifier COL: Union1008


          This compares with the following output for your first query



          ScaOp_Intrinsic binary_checksum

          ScaOp_Identifier COL: Union1011

          ScaOp_Identifier COL: Union1010


          If you try and get the BINARY_CHECKSUM with



          SELECT *, BINARY_CHECKSUM(a) AS bc
          FROM (VALUES(1, NULL, 100)) s(id,a,b)


          It gives the error




          Msg 8184, Level 16, State 1, Line 8 Error in binarychecksum. There are
          no comparable columns in the binarychecksum input.




          This is not the only place where an untyped NULL constant is treated differently from an explicitly typed one.



          Another case is



          SELECT COALESCE(CAST(NULL AS INT),CAST(NULL AS INT))


          vs



          SELECT COALESCE(NULL,NULL)


          I'd err on the side of "glitch" in this case rather than "by design" though as the columns from the derived table are supposed to be int before they get to the checksum function.



          SELECT COALESCE(a,b)
          FROM (VALUES(NULL, NULL)) s(a,b)


          Does work as expected without this glitch.







          share|improve this answer














          share|improve this answer



          share|improve this answer








          edited 21 mins ago

























          answered 47 mins ago









          Martin Smith

          341k58572681




          341k58572681








          • 1




            Good catch with COALESCE
            – Lukasz Szozda
            34 mins ago














          • 1




            Good catch with COALESCE
            – Lukasz Szozda
            34 mins ago








          1




          1




          Good catch with COALESCE
          – Lukasz Szozda
          34 mins ago




          Good catch with COALESCE
          – Lukasz Szozda
          34 mins ago













          4














          This has nothing to do with the number of rows. It is because the values in one of the columns of the 2-row version are always NULL. The default type of NULL is int and the default type of a numeric constant (of this length) is int, so these should be comparable. But from a values() derived table, these are (apparently) not exactly the same type.



          In particular, a column with only typeless NULLs from a derived table is not comparable, so it is excluded from the binary checksum calculation. This does not occur in a real table, because all columns have types.



          The rest of the answer illustrates what is happening.



          The code behaves as expected with type conversions:



          SELECT *, BINARY_CHECKSUM(a, b) AS bc
          FROM (VALUES(1, cast(NULL as int), 100),
          (2, NULL, NULL)
          ) s(id,a,b);


          Here is a db<>fiddle.



          Actually creating tables with the values suggests that columns with only NULL values have exactly the same type as columns with explicit numbers. That suggests that the original code should work. But an explicit cast also fixes the problem. Very strange.



          This is really, really strange. Consider the following:



          select v.*, checksum(a, b), checksum(c,b)
          FROM (VALUES(1, NULL, 100, NULL),
          (2, 1, 2, 1.0)
          ) v(id, a, b, c);


          The change in type for "d" affects the binary_checksum() for the second row, but not for the first.



          This is my conclusion. When all the values in a column are binary, then binary_checksum() is aware of this and the column is in the category of "noncomparable data type". The checksum is then based on the remaining columns.



          You can validate this by seeing the error when you run:



          select v.*, binary_checksum(a)
          FROM (VALUES(1, NULL, 100, NULL),
          (2, NULL, 2, 1.0)
          ) v( id,a, b, c);


          It complains:




          Argument data type NULL is invalid for argument 1 of checksum function.




          Ironically, this is not a problem if you save the results into a table and use binary_checksum(). The issue appears to be some interaction with values() and data types -- but something that is not immediately obvious in the information_schema.columns table.



          The happyish news is that the code should work on tables, even if it does not work on values() generated derived tables -- as this SQL Fiddle demonstrates.



          I also learned that a column filled with NULLs really is typeless. The assignment of the int data type in a select into seems to happen when the table is being defined. The "typeless" type is converted to an int.






          share|improve this answer























          • In SQL Server default type for NULL is INT. It is a good observation
            – Lukasz Szozda
            1 hour ago








          • 1




            This doesn't happen when you use an actual table oddly with int datatypes so +1
            – scsimon
            1 hour ago








          • 3




            Datatypes for all cases demo As I thouht all ints - so it looks like a glitch for me
            – Lukasz Szozda
            1 hour ago






          • 1




            @scsimon . . . I am perplexed. According to the information schema tables, the types are exactly the same for all NULL values and when there is a number constant. However, the cast() works in this example.
            – Gordon Linoff
            59 mins ago










          • @GordonLinoff Thank you for your insight Gordon. I will definitely return to this in 2019.
            – Lukasz Szozda
            35 mins ago
















          4














          This has nothing to do with the number of rows. It is because the values in one of the columns of the 2-row version are always NULL. The default type of NULL is int and the default type of a numeric constant (of this length) is int, so these should be comparable. But from a values() derived table, these are (apparently) not exactly the same type.



          In particular, a column with only typeless NULLs from a derived table is not comparable, so it is excluded from the binary checksum calculation. This does not occur in a real table, because all columns have types.



          The rest of the answer illustrates what is happening.



          The code behaves as expected with type conversions:



          SELECT *, BINARY_CHECKSUM(a, b) AS bc
          FROM (VALUES(1, cast(NULL as int), 100),
          (2, NULL, NULL)
          ) s(id,a,b);


          Here is a db<>fiddle.



          Actually creating tables with the values suggests that columns with only NULL values have exactly the same type as columns with explicit numbers. That suggests that the original code should work. But an explicit cast also fixes the problem. Very strange.



          This is really, really strange. Consider the following:



          select v.*, checksum(a, b), checksum(c,b)
          FROM (VALUES(1, NULL, 100, NULL),
          (2, 1, 2, 1.0)
          ) v(id, a, b, c);


          The change in type for "d" affects the binary_checksum() for the second row, but not for the first.



          This is my conclusion. When all the values in a column are binary, then binary_checksum() is aware of this and the column is in the category of "noncomparable data type". The checksum is then based on the remaining columns.



          You can validate this by seeing the error when you run:



          select v.*, binary_checksum(a)
          FROM (VALUES(1, NULL, 100, NULL),
          (2, NULL, 2, 1.0)
          ) v( id,a, b, c);


          It complains:




          Argument data type NULL is invalid for argument 1 of checksum function.




          Ironically, this is not a problem if you save the results into a table and use binary_checksum(). The issue appears to be some interaction with values() and data types -- but something that is not immediately obvious in the information_schema.columns table.



          The happyish news is that the code should work on tables, even if it does not work on values() generated derived tables -- as this SQL Fiddle demonstrates.



          I also learned that a column filled with NULLs really is typeless. The assignment of the int data type in a select into seems to happen when the table is being defined. The "typeless" type is converted to an int.






          share|improve this answer























          • In SQL Server default type for NULL is INT. It is a good observation
            – Lukasz Szozda
            1 hour ago








          • 1




            This doesn't happen when you use an actual table oddly with int datatypes so +1
            – scsimon
            1 hour ago








          • 3




            Datatypes for all cases demo As I thouht all ints - so it looks like a glitch for me
            – Lukasz Szozda
            1 hour ago






          • 1




            @scsimon . . . I am perplexed. According to the information schema tables, the types are exactly the same for all NULL values and when there is a number constant. However, the cast() works in this example.
            – Gordon Linoff
            59 mins ago










          • @GordonLinoff Thank you for your insight Gordon. I will definitely return to this in 2019.
            – Lukasz Szozda
            35 mins ago














          4












          4








          4






          This has nothing to do with the number of rows. It is because the values in one of the columns of the 2-row version are always NULL. The default type of NULL is int and the default type of a numeric constant (of this length) is int, so these should be comparable. But from a values() derived table, these are (apparently) not exactly the same type.



          In particular, a column with only typeless NULLs from a derived table is not comparable, so it is excluded from the binary checksum calculation. This does not occur in a real table, because all columns have types.



          The rest of the answer illustrates what is happening.



          The code behaves as expected with type conversions:



          SELECT *, BINARY_CHECKSUM(a, b) AS bc
          FROM (VALUES(1, cast(NULL as int), 100),
          (2, NULL, NULL)
          ) s(id,a,b);


          Here is a db<>fiddle.



          Actually creating tables with the values suggests that columns with only NULL values have exactly the same type as columns with explicit numbers. That suggests that the original code should work. But an explicit cast also fixes the problem. Very strange.



          This is really, really strange. Consider the following:



          select v.*, checksum(a, b), checksum(c,b)
          FROM (VALUES(1, NULL, 100, NULL),
          (2, 1, 2, 1.0)
          ) v(id, a, b, c);


          The change in type for "d" affects the binary_checksum() for the second row, but not for the first.



          This is my conclusion. When all the values in a column are binary, then binary_checksum() is aware of this and the column is in the category of "noncomparable data type". The checksum is then based on the remaining columns.



          You can validate this by seeing the error when you run:



          select v.*, binary_checksum(a)
          FROM (VALUES(1, NULL, 100, NULL),
          (2, NULL, 2, 1.0)
          ) v( id,a, b, c);


          It complains:




          Argument data type NULL is invalid for argument 1 of checksum function.




          Ironically, this is not a problem if you save the results into a table and use binary_checksum(). The issue appears to be some interaction with values() and data types -- but something that is not immediately obvious in the information_schema.columns table.



          The happyish news is that the code should work on tables, even if it does not work on values() generated derived tables -- as this SQL Fiddle demonstrates.



          I also learned that a column filled with NULLs really is typeless. The assignment of the int data type in a select into seems to happen when the table is being defined. The "typeless" type is converted to an int.






          share|improve this answer














          This has nothing to do with the number of rows. It is because the values in one of the columns of the 2-row version are always NULL. The default type of NULL is int and the default type of a numeric constant (of this length) is int, so these should be comparable. But from a values() derived table, these are (apparently) not exactly the same type.



          In particular, a column with only typeless NULLs from a derived table is not comparable, so it is excluded from the binary checksum calculation. This does not occur in a real table, because all columns have types.



          The rest of the answer illustrates what is happening.



          The code behaves as expected with type conversions:



          SELECT *, BINARY_CHECKSUM(a, b) AS bc
          FROM (VALUES(1, cast(NULL as int), 100),
          (2, NULL, NULL)
          ) s(id,a,b);


          Here is a db<>fiddle.



          Actually creating tables with the values suggests that columns with only NULL values have exactly the same type as columns with explicit numbers. That suggests that the original code should work. But an explicit cast also fixes the problem. Very strange.



          This is really, really strange. Consider the following:



          select v.*, checksum(a, b), checksum(c,b)
          FROM (VALUES(1, NULL, 100, NULL),
          (2, 1, 2, 1.0)
          ) v(id, a, b, c);


          The change in type for "d" affects the binary_checksum() for the second row, but not for the first.



          This is my conclusion. When all the values in a column are binary, then binary_checksum() is aware of this and the column is in the category of "noncomparable data type". The checksum is then based on the remaining columns.



          You can validate this by seeing the error when you run:



          select v.*, binary_checksum(a)
          FROM (VALUES(1, NULL, 100, NULL),
          (2, NULL, 2, 1.0)
          ) v( id,a, b, c);


          It complains:




          Argument data type NULL is invalid for argument 1 of checksum function.




          Ironically, this is not a problem if you save the results into a table and use binary_checksum(). The issue appears to be some interaction with values() and data types -- but something that is not immediately obvious in the information_schema.columns table.



          The happyish news is that the code should work on tables, even if it does not work on values() generated derived tables -- as this SQL Fiddle demonstrates.



          I also learned that a column filled with NULLs really is typeless. The assignment of the int data type in a select into seems to happen when the table is being defined. The "typeless" type is converted to an int.







          share|improve this answer














          share|improve this answer



          share|improve this answer








          edited 20 mins ago

























          answered 1 hour ago









          Gordon Linoff

          758k35291399




          758k35291399












          • In SQL Server default type for NULL is INT. It is a good observation
            – Lukasz Szozda
            1 hour ago








          • 1




            This doesn't happen when you use an actual table oddly with int datatypes so +1
            – scsimon
            1 hour ago








          • 3




            Datatypes for all cases demo As I thouht all ints - so it looks like a glitch for me
            – Lukasz Szozda
            1 hour ago






          • 1




            @scsimon . . . I am perplexed. According to the information schema tables, the types are exactly the same for all NULL values and when there is a number constant. However, the cast() works in this example.
            – Gordon Linoff
            59 mins ago










          • @GordonLinoff Thank you for your insight Gordon. I will definitely return to this in 2019.
            – Lukasz Szozda
            35 mins ago


















          • In SQL Server default type for NULL is INT. It is a good observation
            – Lukasz Szozda
            1 hour ago








          • 1




            This doesn't happen when you use an actual table oddly with int datatypes so +1
            – scsimon
            1 hour ago








          • 3




            Datatypes for all cases demo As I thouht all ints - so it looks like a glitch for me
            – Lukasz Szozda
            1 hour ago






          • 1




            @scsimon . . . I am perplexed. According to the information schema tables, the types are exactly the same for all NULL values and when there is a number constant. However, the cast() works in this example.
            – Gordon Linoff
            59 mins ago










          • @GordonLinoff Thank you for your insight Gordon. I will definitely return to this in 2019.
            – Lukasz Szozda
            35 mins ago
















          In SQL Server default type for NULL is INT. It is a good observation
          – Lukasz Szozda
          1 hour ago






          In SQL Server default type for NULL is INT. It is a good observation
          – Lukasz Szozda
          1 hour ago






          1




          1




          This doesn't happen when you use an actual table oddly with int datatypes so +1
          – scsimon
          1 hour ago






          This doesn't happen when you use an actual table oddly with int datatypes so +1
          – scsimon
          1 hour ago






          3




          3




          Datatypes for all cases demo As I thouht all ints - so it looks like a glitch for me
          – Lukasz Szozda
          1 hour ago




          Datatypes for all cases demo As I thouht all ints - so it looks like a glitch for me
          – Lukasz Szozda
          1 hour ago




          1




          1




          @scsimon . . . I am perplexed. According to the information schema tables, the types are exactly the same for all NULL values and when there is a number constant. However, the cast() works in this example.
          – Gordon Linoff
          59 mins ago




          @scsimon . . . I am perplexed. According to the information schema tables, the types are exactly the same for all NULL values and when there is a number constant. However, the cast() works in this example.
          – Gordon Linoff
          59 mins ago












          @GordonLinoff Thank you for your insight Gordon. I will definitely return to this in 2019.
          – Lukasz Szozda
          35 mins ago




          @GordonLinoff Thank you for your insight Gordon. I will definitely return to this in 2019.
          – Lukasz Szozda
          35 mins ago


















          draft saved

          draft discarded




















































          Thanks for contributing an answer to Stack Overflow!


          • Please be sure to answer the question. Provide details and share your research!

          But avoid



          • Asking for help, clarification, or responding to other answers.

          • Making statements based on opinion; back them up with references or personal experience.


          To learn more, see our tips on writing great answers.





          Some of your past answers have not been well-received, and you're in danger of being blocked from answering.


          Please pay close attention to the following guidance:


          • Please be sure to answer the question. Provide details and share your research!

          But avoid



          • Asking for help, clarification, or responding to other answers.

          • Making statements based on opinion; back them up with references or personal experience.


          To learn more, see our tips on writing great answers.




          draft saved


          draft discarded














          StackExchange.ready(
          function () {
          StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53988972%2fbinary-checkum-different-result-depending-on-number-of-rows%23new-answer', 'question_page');
          }
          );

          Post as a guest















          Required, but never shown





















































          Required, but never shown














          Required, but never shown












          Required, but never shown







          Required, but never shown

































          Required, but never shown














          Required, but never shown












          Required, but never shown







          Required, but never shown







          Popular posts from this blog

          Plaza Victoria

          In PowerPoint, is there a keyboard shortcut for bulleted / numbered list?

          How to put 3 figures in Latex with 2 figures side by side and 1 below these side by side images but in...