Adding two IEEE754 floating-point representations and interpreting the result.


























This isn't for any class or homework. As part of my personal study, I'm trying to better understand the IEEE754 representation of decimal floating-point numbers in binary. I'd like to add two numbers: $1.111$ and $2.222$, then compare the result by converting the IEEE754 representation of the sum back to decimal.



Per this online tool:




  • $1.111 = 00111111100011100011010100111111$

  • $2.222 = 01000000000011100011010100111111$


Summing these two together using signed binary addition, I get:



$0111 1111 1001 1100 0110 1010 0111 1110$



In hexadecimal, this is:



$7F9C6A7E$



And according to this other version of the tool, that corresponds to $NaN$.



What's going on here?
































  • You can't expect doing integer addition on floating-point representations to give meaningful results.
    – Henning Makholm
    Nov 25 at 1:01










  • How would I go about trying to do what I want to do here?
    – AleksandrH
    Nov 25 at 1:06










  • I have no idea what it is you want to do. Use floating-point addition rather than integer?
    – Henning Makholm
    Nov 25 at 1:07










  • Yes, I was under the impression that once I have the two floating-point numbers represented as binary strings, I could simply add them together bit by bit and then translate the resulting 32-bit string to decimal floating point. The IEEE754 standard defines conversions in both directions (binary to decimal and decimal to binary).
    – AleksandrH
    Nov 25 at 1:12










  • You have to adjust them so they have the same exponent before you add them. You ought to read about how the IEEE754 representation is actually constructed.
    – saulspatz
    Nov 25 at 1:12
















binary floating-point
















asked Nov 25 at 0:53









AleksandrH























1 Answer














You cannot expect to use integer binary addition on two floating-point representations and get a meaningful result.
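To see concretely what the integer addition does, here is a minimal Python sketch using only the standard `struct` and `math` modules. The helper names `float32_bits` and `bits_to_float32` are made up for this illustration. It reproduces the pattern from the question and shows that the carry out of the fraction ends up in the exponent field, which becomes all ones.

```python
import math
import struct

def float32_bits(x: float) -> int:
    """Return the IEEE 754 single-precision bit pattern of x as an integer."""
    return struct.unpack(">I", struct.pack(">f", x))[0]

def bits_to_float32(bits: int) -> float:
    """Reinterpret a 32-bit integer as an IEEE 754 single-precision value."""
    return struct.unpack(">f", struct.pack(">I", bits))[0]

a = float32_bits(1.111)  # pattern quoted in the question for 1.111
b = float32_bits(2.222)  # pattern quoted in the question for 2.222

# Integer addition of the raw patterns, as attempted in the question.
naive = (a + b) & 0xFFFFFFFF

# Split the result into its fields: 1 sign bit, 8 exponent bits, 23 fraction bits.
exponent_field = (naive >> 23) & 0xFF
fraction_field = naive & 0x7FFFFF

# An all-ones exponent with a nonzero fraction is the encoding of NaN.
naive_is_nan = math.isnan(bits_to_float32(naive))

print(f"{naive:08X}", exponent_field, naive_is_nan)
```

The two exponent fields (127 and 128) were added as if they were ordinary integer digits, together with a carry from the fraction bits, which is why the result has nothing to do with $3.333$.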



First, $1.111$ cannot be represented exactly in binary floating point. Your 00111111100011100011010100111111 is actually the IEEE-754 single precision representation of the number
$$ 1.11099994182586669921875 $$
which is the closest representable number to $1.111$. This breaks up as



     0        01111111          00011100011010100111111
    sign   biased exponent   fractional part of mantissa


and stands for the number
$$ 1.00011100011010100111111_2 \times 2^{127-127} $$
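The same breakdown can be done mechanically. The following is a sketch under the assumption that the input is a normal number (biased exponent between 1 and 254); `decode_float32` and `normal_value` are hypothetical names chosen for this example, and `Fraction` is used so the value is exact rather than rounded again.

```python
from fractions import Fraction

def decode_float32(bits: int):
    """Split a 32-bit pattern into (sign, biased exponent, fraction) fields."""
    sign = bits >> 31
    biased_exponent = (bits >> 23) & 0xFF
    fraction = bits & 0x7FFFFF
    return sign, biased_exponent, fraction

def normal_value(bits: int) -> Fraction:
    """Exact value of a normal float32: (-1)^sign * 1.fraction * 2^(e - 127)."""
    sign, e, f = decode_float32(bits)
    significand = Fraction((1 << 23) | f, 1 << 23)  # restore the implicit leading 1
    return (-1) ** sign * significand * Fraction(2) ** (e - 127)

sign, e, f = decode_float32(0x3F8E353F)
print(sign, format(e, "08b"), format(f, "023b"))
print(float(normal_value(0x3F8E353F)))
```

Decoding the pattern for $2.222$ the same way gives the same fraction field with a biased exponent of 128, so its value is exactly twice the first one.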



The representation of $2.222$ is twice that, with the same mantissa but the exponent one higher. When we add them we must position the mantissas correctly with respect to each other:



       1.00011100011010100111111
    + 10.0011100011010100111111
    ----------------------------
    = 11.01010101001111110111101
      11.0101010100111111011110  <-- rounded to 1+23 bits mantissa using round-to-even

     0     10000000     10101010100111111011110
    sign  biased exp    fractional mantissa
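The alignment, addition and rounding steps above can be sketched in Python with plain integers. This is an illustration only, not how hardware does it: the hypothetical helper `add_float32_bits` assumes both inputs are positive normal numbers and ignores signs, subnormals, infinities, NaNs and overflow.

```python
def add_float32_bits(a_bits: int, b_bits: int) -> int:
    """Add two positive, normal float32 bit patterns by hand (sketch only)."""
    ea, fa = (a_bits >> 23) & 0xFF, a_bits & 0x7FFFFF
    eb, fb = (b_bits >> 23) & 0xFF, b_bits & 0x7FFFFF
    ma, mb = (1 << 23) | fa, (1 << 23) | fb  # restore the implicit leading 1

    # Align: express both significands at the smaller exponent.
    # Python integers are unbounded, so this step loses nothing.
    e = min(ea, eb)
    total = (ma << (ea - e)) + (mb << (eb - e))

    # Normalize back to 24 significant bits, rounding the dropped bits
    # to nearest with ties going to the even significand.
    shift = max(total.bit_length() - 24, 0)
    kept = total
    if shift:
        kept = total >> shift
        dropped = total & ((1 << shift) - 1)
        half = 1 << (shift - 1)
        if dropped > half or (dropped == half and kept & 1):
            kept += 1
        if kept >> 24:  # rounding carried into a 25th bit
            kept >>= 1
            shift += 1

    # Reassemble: sign 0, new biased exponent, fraction without the leading 1.
    return ((e + shift) << 23) | (kept & 0x7FFFFF)

result = add_float32_bits(0x3F8E353F, 0x400E353F)
print(format(result, "032b"))
```

For the two inputs here, the exact sum needs 25 bits, the single dropped bit is exactly a tie, and the kept significand is already even, so it is rounded down.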


And the representation 01000000010101010100111111011110 corresponds to the number
$$ 3.332999706268310546875 $$
Note that this is not the closest representable number to $3.333$, which would be the next one,
$$ 3.3329999446868896484375 $$
but the round-to-even rule led to rounding down the full result of the addition, which compounded the error inherent in the two inputs each being slightly smaller than $1.111$ and $2.222$.
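This can be checked without doing the bit arithmetic by hand. The sketch below assumes a platform where `struct.pack(">f", ...)` rounds to nearest with ties to even (the usual IEEE 754 default); `to_float32` and `float32_bits` are hypothetical helper names. Python floats are double precision, so the sum of the two single-precision inputs is exact before the final rounding back to single precision.

```python
import struct

def to_float32(x: float) -> float:
    """Round a Python float (double) to the nearest single-precision value."""
    return struct.unpack(">f", struct.pack(">f", x))[0]

def float32_bits(x: float) -> int:
    """Return the single-precision bit pattern of x as an integer."""
    return struct.unpack(">I", struct.pack(">f", x))[0]

a = to_float32(1.111)
b = to_float32(2.222)

computed = to_float32(a + b)  # single-precision sum of the two rounded inputs
nearest = to_float32(3.333)   # single-precision value closest to 3.333

print(f"{float32_bits(computed):08X}", f"{float32_bits(nearest):08X}")
print(nearest - computed)     # the two results are one unit in the last place apart
```

The two patterns differ only in the last bit, which matches the observation that the computed sum is the representable number just below the one closest to $3.333$.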





























  • I followed this well until we got to the $10.00...$ part. Why did the decimal point move one place to the right?
    – AleksandrH
    Nov 25 at 1:43










  • @AleksandrH: Because the second addend has a biased exponent of 10000000, so it represents the number $1.\langle\mathit{mantissa}\rangle_2 \times 2^{128-127}$ -- in other words, the binary point is shifted one position to the right.
    – Henning Makholm
    Nov 25 at 1:46










  • Yeah, I don't understand. Sorry for wasting your time.
    – AleksandrH
    Nov 25 at 14:06










  • @AleksandrH: The job of the exponent is to encode where the binary point is. That's what makes the representation "floating point" -- you can move the point! In the $2.222$ representation the exponent is $1$ (after we subtract the fixed bias), meaning that the point is after one of the explicitly represented mantissa bits.
    – Henning Makholm
    Nov 25 at 14:15











edited Nov 25 at 1:39

























answered Nov 25 at 1:23









Henning Makholm
