Compute ratio of a rectangle seen from an unknown perspective












4












$begingroup$


TL;DR: Given 4 points on a two-dimensional plane, representing a rectangle seen from an unknown perspective, can we deduce the width/height ratio of the rectangle?



Details:



From a picture, and some OpenCV work (Canny, Hough lines, bucketing to tell apart "lines" and "columns", choosing interesting lines, math to deduce line intersections), I get this:



[Image: lines]



From this step, it's easy to warp it to a "from the top" view, using OpenCV's getPerspectiveTransform and warpPerspective to "remove" the perspective, as if looking at the rectangle from directly above.



My goal now is to keep its aspect ratio, which I currently lose while warping because I don't know the ratio it should have.



For this I have to give getPerspectiveTransform the 4 destination points where I want my 4 found red points to be after warping, not just 4 arbitrary points like (0, 0), (0, 100), (100, 100), (100, 0), which lead to a deformation if my 4 red points are not the corners of a square.



So is there a known way to compute the width/height ratio, or even better the size, of this "rectangle seen through a perspective"?



For the record and the curious, work-in-progress is here: https://github.com/JulienPalard/grid-finder





















$endgroup$








  • 1




    $begingroup$
    In general no. Consider an orthographic projection: you won't be able to distinguish between an ordinary square facing you and some angled rectangle. Now, even for a non-orthographic projection, the distances/coefficients can be big enough that the slight differences vanish because of pixels/rasterization. In other words, the image will effectively look like an orthographic projection and you won't be able to guess the ratio.
    $endgroup$
    – dtldarek
    Jun 26 '15 at 9:53










  • $begingroup$
    @dtldarek If I understand correctly, the slightest error in the coordinates of my red points will always lead to "impossible perspectives", preventing the deduction of the ratio?
    $endgroup$
    – Julien Palard
    Jun 26 '15 at 10:03










  • $begingroup$
    That was not my point. Consider the square $(-1,-1,a), (1,-1,a), (1,1,a), (-1,1,a)$ and the rectangle $(-1,-1,a), (1,-1,a), (1,1,a+10), (-1,1,a+10)$ as seen from $(0,0,0)$ in direction $(0,0,1)$, where $a$ is some positive number. These two are indistinguishable under an orthographic projection. Now, for any other projection that renders the square as a shape of size 100 px x 100 px with 32 bits of color (or any other constant numbers), you can make $a$ big enough that they still won't be distinguishable.
    $endgroup$
    – dtldarek
    Jun 26 '15 at 10:38
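
A small numeric sketch of the example in the comment above, assuming a pinhole camera at the origin looking along $z$ with focal length 1. The helper names (`orthographic`, `perspective`, `shapes`, `max_pixel_gap`) are made up for illustration. It checks that both shapes have the same orthographic image, and measures how far apart their perspective images are once the square is scaled to be 100 px wide.

```python
import math

def orthographic(point):
    """Project along the viewing direction (0, 0, 1): simply drop z."""
    x, y, _ = point
    return (x, y)

def perspective(point, focal=1.0):
    """Pinhole projection from the origin (assumed model, focal length 1)."""
    x, y, z = point
    return (focal * x / z, focal * y / z)

def shapes(a):
    """The square and the tilted rectangle from the comment, at distance a."""
    square = [(-1, -1, a), (1, -1, a), (1, 1, a), (-1, 1, a)]
    rectangle = [(-1, -1, a), (1, -1, a), (1, 1, a + 10), (-1, 1, a + 10)]
    return square, rectangle

def max_pixel_gap(a, square_px=100.0):
    """Largest distance, in pixels, between matching corners of both images
    when the square is rendered square_px pixels wide."""
    square, rectangle = shapes(a)
    width = perspective(square[1])[0] - perspective(square[0])[0]
    scale = square_px / width
    return max(
        scale * math.dist(perspective(s), perspective(r))
        for s, r in zip(square, rectangle)
    )

square, rectangle = shapes(5.0)
same_orthographic = (
    [orthographic(p) for p in square] == [orthographic(p) for p in rectangle]
)
print("identical orthographic images:", same_orthographic)
for a in (10.0, 1e3, 1e5):
    print(a, max_pixel_gap(a))
```

Under these assumptions the gap works out to $500\sqrt{2}/(a+10)$ pixels, so it falls below one pixel once $a$ exceeds roughly 700.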










  • $begingroup$
    No. That would be predicting "depth" from just two-dimensional information, if I understand your question right.
    $endgroup$
    – tpb261
    Jun 26 '15 at 10:56






  • 1




    $begingroup$
    I'm OK with anything giving me the two different possible realities, as they clearly exist :)
    $endgroup$
    – Julien Palard
    Jun 26 '15 at 20:23
















geometry 3d rectangles






edited Dec 24 '18 at 9:24 by Glorfindel

asked Jun 26 '15 at 9:41 by Julien Palard








2 Answers


















3












$begingroup$

Dropbox has an extensive article on their tech blog where they describe how they solved the problem for their scanner app.



https://blogs.dropbox.com/tech/2016/08/fast-document-rectification-and-enhancement/




Rectifying a Document



We assume that the input document is rectangular in the physical world, but if it is not exactly facing the camera, the resulting corners in the image will be a general convex quadrilateral. So to satisfy our first goal, we must undo the geometric transform applied by the capture process. This transformation depends on the viewpoint of the camera relative to the document (these are the so-called extrinsic parameters), in addition to things like the focal length of the camera (the intrinsic parameters). Here’s a diagram of the capture scenario:



In order to undo the geometric transform, we must first determine the said parameters. If we assume a nicely symmetric camera (no astigmatism, no skew, et cetera), the unknowns in this model are:




  • the 3D location of the camera relative to the document (3 degrees of freedom),

  • the 3D orientation of the camera relative to the document (3 degrees of freedom),

  • the dimensions of the document (2 degrees of freedom), and

  • the focal length of the camera (1 degree of freedom).


On the flip side, the x- and y-coordinates of the four detected document corners gives us effectively eight constraints. While there are seemingly more unknowns (9) than constraints (8), the unknowns are not entirely free variables—one could imagine scaling the document physically and placing it further from the camera, to obtain an identical photo. This relation places an additional constraint, so we have a fully constrained system to be solved. (The actual system of equations we solve involves a few other considerations; the relevant Wikipedia article gives a good summary: https://en.wikipedia.org/wiki/Camera_resectioning)
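
The parameter counting above suggests that four corners are enough once the camera is assumed to be "nicely symmetric". The following is a minimal sketch of one standard way to exploit that; it is not taken from the Dropbox article. It assumes square pixels, no skew, and a principal point at a known position (typically the image centre). It estimates the focal length from the fact that the two side directions of the rectangle are orthogonal, then reads the width/height ratio off the homography that maps the unit square onto the detected corners. The function names `square_to_quad` and `aspect_ratio` are hypothetical.

```python
import math

def square_to_quad(quad):
    """Homography mapping the unit square (0,0),(1,0),(1,1),(0,1) onto quad.

    Returns the matrix rows ((a, b, c), (d, e, f), (g, h, 1)).
    """
    (x0, y0), (x1, y1), (x2, y2), (x3, y3) = quad
    dx1, dy1 = x1 - x2, y1 - y2
    dx2, dy2 = x3 - x2, y3 - y2
    sx, sy = x0 - x1 + x2 - x3, y0 - y1 + y2 - y3
    det = dx1 * dy2 - dx2 * dy1
    if det == 0:
        raise ValueError("degenerate quadrilateral")
    g = (sx * dy2 - dx2 * sy) / det
    h = (dx1 * sy - sx * dy1) / det
    return (
        (x1 - x0 + g * x1, x3 - x0 + h * x3, x0),
        (y1 - y0 + g * y1, y3 - y0 + h * y3, y0),
        (g, h, 1.0),
    )

def aspect_ratio(corners, principal_point=(0.0, 0.0), eps=1e-12):
    """Return (width / height, focal length in pixels).

    corners are given in order around the rectangle; "width" is the side
    from corners[0] to corners[1], "height" the side from corners[0] to
    corners[3]. Assumes square pixels, no skew, known principal point.
    """
    cx, cy = principal_point
    quad = [(x - cx, y - cy) for x, y in corners]
    (a, b, _), (d, e, _), (g, h, _) = square_to_quad(quad)
    if abs(g) < eps or abs(h) < eps:
        # A pair of sides is parallel in the image: the focal length,
        # and with it the ratio, cannot be determined this way.
        raise ValueError("not enough perspective to recover the ratio")
    # Orthogonality of the two side directions: (a*b + d*e) / f^2 + g*h = 0
    f2 = -(a * b + d * e) / (g * h)
    if f2 <= 0:
        raise ValueError("corners are inconsistent with the camera model")
    width = math.sqrt((a * a + d * d) / f2 + g * g)
    height = math.sqrt((b * b + e * e) / f2 + h * h)
    return width / height, math.sqrt(f2)
```

The error raised when a pair of sides stays parallel in the image corresponds to the near-orthographic situation described in the comments on the question. With noisy corners and weak perspective the focal estimate becomes unstable, so the resulting ratio should be treated with caution.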



Once the parameters have been recovered, we can undo the geometric transform applied by the capture process to obtain a nice rectangular image. However, this is potentially a time-consuming process: one would look up, for each output pixel, the value of the corresponding input pixel in the source image. Of course, GPUs are specifically designed for tasks like this: rendering a texture in a virtual space. There exists a view transform—which happens to be the inverse of the camera transform we just solved for!—with which one can render the full input image and obtain the rectified document. (An easy way to see this is to note that once you have the full input image on the screen of your phone, you can tilt and translate the phone such that the projection of the document region on the screen appears rectilinear to you.)



Lastly, recall that there was an ambiguity with respect to scale: we can’t tell whether the document was a letter-sized paper (8.5” x 11”) or a poster board (17” x 22”), for instance. What should the dimensions of the output image be? To resolve this ambiguity, we count the number of pixels within the quadrilateral in the input image, and set the output resolution as to match this pixel count. The idea is that we don’t want to upsample or downsample the image too much.
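
The pixel-count idea in the last paragraph can be sketched as follows, assuming the width/height ratio is already known from some other step. The names `quad_area` and `destination_points` are hypothetical, and the corner order of the result follows the (0, 0), (0, 100), (100, 100), (100, 0) example from the question, so the source corners would have to be listed in the matching order.

```python
import math

def quad_area(corners):
    """Area of a simple polygon via the shoelace formula."""
    total = 0.0
    n = len(corners)
    for i in range(n):
        x0, y0 = corners[i]
        x1, y1 = corners[(i + 1) % n]
        total += x0 * y1 - x1 * y0
    return abs(total) / 2.0

def destination_points(corners, ratio):
    """Corners of an upright rectangle with width / height close to ratio
    and roughly as many pixels as the source quadrilateral."""
    area = quad_area(corners)
    width = max(1, round(math.sqrt(area * ratio)))
    height = max(1, round(math.sqrt(area / ratio)))
    return [(0, 0), (0, height), (width, height), (width, 0)]

# Example: a 4 x 3 axis-aligned "quadrilateral" with a known 4:3 ratio.
print(destination_points([(0, 0), (4, 0), (4, 3), (0, 3)], 4 / 3))
```

Converted to float32 arrays, the detected corners and these destination points are the kind of input that OpenCV's getPerspectiveTransform expects.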
















$endgroup$













  • $begingroup$
    Wah, nice link, thanks!
    $endgroup$
    – Julien Palard
    Jan 13 '17 at 10:12



















0












$begingroup$

Yes, here's a pencil-and-paper method:
Find the points $P,Q$ where the "parallel" sides intersect. The line through $P,Q$ is the "horizon" of the plane containing the rectangle. Find $R$ such that $\angle QRP=90^\circ$ and $RP=RQ$. Then the parallel to $PQ$ through $R$ intersects your pairs of "parallels" $AB,CD$ and $BC,AD$, respectively, in points whose distances are proportional to the rectangle's side lengths.



Determining measurements in perspective
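
Only the first step of the construction (finding $P$, $Q$ and the horizon) is sketched here, using homogeneous coordinates; it does not implement or validate the rest of the method. The assignment $P = AB \cap CD$ and $Q = BC \cap AD$ and the function names are assumptions made for illustration.

```python
def cross(u, v):
    """Cross product of two 3-vectors."""
    return (
        u[1] * v[2] - u[2] * v[1],
        u[2] * v[0] - u[0] * v[2],
        u[0] * v[1] - u[1] * v[0],
    )

def line_through(p, q):
    """Homogeneous line (A, B, C), with A*x + B*y + C = 0, through p and q."""
    return cross((p[0], p[1], 1.0), (q[0], q[1], 1.0))

def intersection(l1, l2, eps=1e-12):
    """Intersection of two homogeneous lines, or None if they are parallel."""
    x, y, w = cross(l1, l2)
    if abs(w) < eps:
        return None
    return (x / w, y / w)

def vanishing_points(a, b, c, d):
    """P is where AB meets CD, Q is where BC meets AD (None if parallel)."""
    p = intersection(line_through(a, b), line_through(d, c))
    q = intersection(line_through(b, c), line_through(a, d))
    return p, q

def horizon(p, q):
    """The line through both vanishing points."""
    return line_through(p, q)

P, Q = vanishing_points((0, 0), (4, 0), (3, 2), (0, 1))
print(P, Q, horizon(P, Q))
```

When one pair of sides is parallel in the image, the corresponding vanishing point is at infinity and the sketch returns `None` for it, so the horizon cannot be drawn this way.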

















$endgroup$













  • $begingroup$
    Somehow this still seems wrong as you should need to know where the center of the image is (i.e., the point the camera looks at). Hold on a bit ...
    $endgroup$
    – Hagen von Eitzen
    Jun 26 '15 at 12:54










  • $begingroup$
    I'll try to implement it, not in the next few hours, but I'll mark it as accepted if it works. However, do you have any idea why it works? oO
    $endgroup$
    – Julien Palard
    Jun 26 '15 at 13:30










  • $begingroup$
    Wouldn't any line parallel to $PQ$ give the same ratio between the two segments cut off by $\angle APD$ and $\angle AQB$?
    $endgroup$
    – David K
    Jun 26 '15 at 20:58










  • $begingroup$
    If all you have is the the projected image of a rectangle, only affine properties of the rectangle’s plane can be recovered. To recover metric properties, we’d need to identify the image of the conic at infinity (or, equivalently in this case, the images of the circular points). At the very minimum, we’d need the images of another pair of orthogonal lines that aren’t parallel to the rectangle’s sides to do this.
    $endgroup$
    – amd
    Aug 19 '17 at 6:55










  • $begingroup$
    I do not understand the last step "BC,AD in points with distance proportional to the rectangle side lengths", why is that so?
    $endgroup$
    – shelper
    Aug 29 '17 at 18:05





































2 Answers
2






active

oldest

votes








2 Answers
2






active

oldest

votes









active

oldest

votes






active

oldest

votes









3












$begingroup$

Dropbox has an extensive article on their tech blog where they describe how they solved the problem for their scanner app.



https://blogs.dropbox.com/tech/2016/08/fast-document-rectification-and-enhancement/




Rectifying a Document



We assume that the input document is rectangular in the physical world, but if it is not exactly facing the camera, the resulting corners in the image will be a general convex quadrilateral. So to satisfy our first goal, we must undo the geometric transform applied by the capture process. This transformation depends on the viewpoint of the camera relative to the document (these are the so-called extrinsic parameters), in addition to things like the focal length of the camera (the intrinsic parameters). Here’s a diagram of the capture scenario:



In order to undo the geometric transform, we must first determine the said parameters. If we assume a nicely symmetric camera (no astigmatism, no skew, et cetera), the unknowns in this model are:




  • the 3D location of the camera relative to the document (3 degrees of freedom),

  • the 3D orientation of the camera relative to the document (3 degrees of freedom),

  • the dimensions of the document (2 degrees of freedom), and

  • the focal length of the camera (1 degree of freedom).


On the flip side, the x- and y-coordinates of the four detected document corners gives us effectively eight constraints. While there are seemingly more unknowns (9) than constraints (8), the unknowns are not entirely free variables—one could imagine scaling the document physically and placing it further from the camera, to obtain an identical photo. This relation places an additional constraint, so we have a fully constrained system to be solved. (The actual system of equations we solve involves a few other considerations; the relevant Wikipedia article gives a good summary: https://en.wikipedia.org/wiki/Camera_resectioning)



Once the parameters have been recovered, we can undo the geometric transform applied by the capture process to obtain a nice rectangular image. However, this is potentially a time-consuming process: one would look up, for each output pixel, the value of the corresponding input pixel in the source image. Of course, GPUs are specifically designed for tasks like this: rendering a texture in a virtual space. There exists a view transform—which happens to be the inverse of the camera transform we just solved for!—with which one can render the full input image and obtain the rectified document. (An easy way to see this is to note that once you have the full input image on the screen of your phone, you can tilt and translate the phone such that the projection of the document region on the screen appears rectilinear to you.)



Lastly, recall that there was an ambiguity with respect to scale: we can’t tell whether the document was a letter-sized paper (8.5” x 11”) or a poster board (17” x 22”), for instance. What should the dimensions of the output image be? To resolve this ambiguity, we count the number of pixels within the quadrilateral in the input image, and set the output resolution as to match this pixel count. The idea is that we don’t want to upsample or downsample the image too much.







share|cite|improve this answer









$endgroup$













  • $begingroup$
    Wah, nice link, thanks!
    $endgroup$
    – Julien Palard
    Jan 13 '17 at 10:12
















3












$begingroup$

Dropbox has an extensive article on their tech blog where they describe how they solved the problem for their scanner app.



https://blogs.dropbox.com/tech/2016/08/fast-document-rectification-and-enhancement/




Rectifying a Document



We assume that the input document is rectangular in the physical world, but if it is not exactly facing the camera, the resulting corners in the image will be a general convex quadrilateral. So to satisfy our first goal, we must undo the geometric transform applied by the capture process. This transformation depends on the viewpoint of the camera relative to the document (these are the so-called extrinsic parameters), in addition to things like the focal length of the camera (the intrinsic parameters). Here’s a diagram of the capture scenario:



In order to undo the geometric transform, we must first determine the said parameters. If we assume a nicely symmetric camera (no astigmatism, no skew, et cetera), the unknowns in this model are:




  • the 3D location of the camera relative to the document (3 degrees of freedom),

  • the 3D orientation of the camera relative to the document (3 degrees of freedom),

  • the dimensions of the document (2 degrees of freedom), and

  • the focal length of the camera (1 degree of freedom).


On the flip side, the x- and y-coordinates of the four detected document corners gives us effectively eight constraints. While there are seemingly more unknowns (9) than constraints (8), the unknowns are not entirely free variables—one could imagine scaling the document physically and placing it further from the camera, to obtain an identical photo. This relation places an additional constraint, so we have a fully constrained system to be solved. (The actual system of equations we solve involves a few other considerations; the relevant Wikipedia article gives a good summary: https://en.wikipedia.org/wiki/Camera_resectioning)



Once the parameters have been recovered, we can undo the geometric transform applied by the capture process to obtain a nice rectangular image. However, this is potentially a time-consuming process: one would look up, for each output pixel, the value of the corresponding input pixel in the source image. Of course, GPUs are specifically designed for tasks like this: rendering a texture in a virtual space. There exists a view transform—which happens to be the inverse of the camera transform we just solved for!—with which one can render the full input image and obtain the rectified document. (An easy way to see this is to note that once you have the full input image on the screen of your phone, you can tilt and translate the phone such that the projection of the document region on the screen appears rectilinear to you.)



Lastly, recall that there was an ambiguity with respect to scale: we can’t tell whether the document was a letter-sized paper (8.5” x 11”) or a poster board (17” x 22”), for instance. What should the dimensions of the output image be? To resolve this ambiguity, we count the number of pixels within the quadrilateral in the input image, and set the output resolution as to match this pixel count. The idea is that we don’t want to upsample or downsample the image too much.







share|cite|improve this answer









$endgroup$













  • $begingroup$
    Wah, nice link, thanks!
    $endgroup$
    – Julien Palard
    Jan 13 '17 at 10:12














3












3








3





$begingroup$

Dropbox has an extensive article on their tech blog where they describe how they solved the problem for their scanner app.



https://blogs.dropbox.com/tech/2016/08/fast-document-rectification-and-enhancement/




Rectifying a Document



We assume that the input document is rectangular in the physical world, but if it is not exactly facing the camera, the resulting corners in the image will be a general convex quadrilateral. So to satisfy our first goal, we must undo the geometric transform applied by the capture process. This transformation depends on the viewpoint of the camera relative to the document (these are the so-called extrinsic parameters), in addition to things like the focal length of the camera (the intrinsic parameters). Here’s a diagram of the capture scenario:



In order to undo the geometric transform, we must first determine the said parameters. If we assume a nicely symmetric camera (no astigmatism, no skew, et cetera), the unknowns in this model are:




  • the 3D location of the camera relative to the document (3 degrees of freedom),

  • the 3D orientation of the camera relative to the document (3 degrees of freedom),

  • the dimensions of the document (2 degrees of freedom), and

  • the focal length of the camera (1 degree of freedom).


On the flip side, the x- and y-coordinates of the four detected document corners gives us effectively eight constraints. While there are seemingly more unknowns (9) than constraints (8), the unknowns are not entirely free variables—one could imagine scaling the document physically and placing it further from the camera, to obtain an identical photo. This relation places an additional constraint, so we have a fully constrained system to be solved. (The actual system of equations we solve involves a few other considerations; the relevant Wikipedia article gives a good summary: https://en.wikipedia.org/wiki/Camera_resectioning)



Once the parameters have been recovered, we can undo the geometric transform applied by the capture process to obtain a nice rectangular image. However, this is potentially a time-consuming process: one would look up, for each output pixel, the value of the corresponding input pixel in the source image. Of course, GPUs are specifically designed for tasks like this: rendering a texture in a virtual space. There exists a view transform—which happens to be the inverse of the camera transform we just solved for!—with which one can render the full input image and obtain the rectified document. (An easy way to see this is to note that once you have the full input image on the screen of your phone, you can tilt and translate the phone such that the projection of the document region on the screen appears rectilinear to you.)



Lastly, recall that there was an ambiguity with respect to scale: we can’t tell whether the document was a letter-sized paper (8.5” x 11”) or a poster board (17” x 22”), for instance. What should the dimensions of the output image be? To resolve this ambiguity, we count the number of pixels within the quadrilateral in the input image, and set the output resolution as to match this pixel count. The idea is that we don’t want to upsample or downsample the image too much.
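The pixel-count rule in the last quoted paragraph is easy to sketch once a width/height ratio is available. The snippet below is an illustration under that assumption, not code from the Dropbox article; `quad_area` and `output_size` are hypothetical helper names.

```python
import math


def quad_area(corners):
    """Area of a simple polygon given as (x, y) vertices, via the shoelace formula."""
    area = 0.0
    n = len(corners)
    for i in range(n):
        x0, y0 = corners[i]
        x1, y1 = corners[(i + 1) % n]
        area += x0 * y1 - x1 * y0
    return abs(area) / 2.0


def output_size(corners, ratio):
    """Pick an output (width, height) in pixels whose product roughly matches
    the quadrilateral's pixel count and whose quotient is `ratio` (width / height)."""
    area = quad_area(corners)
    width = math.sqrt(area * ratio)   # width * height = area, width / height = ratio
    height = math.sqrt(area / ratio)
    return max(1, round(width)), max(1, round(height))
```

With a size `(w, h)` in hand, the destination points passed to OpenCV's `getPerspectiveTransform` could be `(0, 0)`, `(w - 1, 0)`, `(w - 1, h - 1)`, `(0, h - 1)`, in the same order as the detected corners.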
















$endgroup$













answered Jan 12 '17 at 20:34









adius












  • $begingroup$
    Wah, nice link, thanks!
    $endgroup$
    – Julien Palard
    Jan 13 '17 at 10:12





























0












$begingroup$

Yes, here's a pen-and-paper method:
Find the points $P,Q$ where the "parallel" sides intersect. The line through $P,Q$ is the "horizon" of the plane containing the rectangle. Find $R$ such that $\angle QRP=90^\circ$ and $RP=RQ$. Then the parallel to $PQ$ through $R$ intersects your pairs of "parallels" $AB,CD$ and $BC,AD$, respectively, in points whose distances are proportional to the rectangle's side lengths.



Determining measurements in perspective
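The first two steps of this construction (the intersection points $P$, $Q$ and the horizon through them) are straightforward to compute with homogeneous coordinates. The sketch below covers only those two steps and assumes the corners are labelled $A,B,C,D$ in order around the quadrilateral; the function names are hypothetical. It does not implement the remaining steps, which the comments below call into question.

```python
import numpy as np


def line_through(p, q):
    """Homogeneous line (a, b, c), with a*x + b*y + c = 0, through points p and q."""
    return np.cross([p[0], p[1], 1.0], [q[0], q[1], 1.0])


def intersection(l1, l2, eps=1e-12):
    """Intersection point of two homogeneous lines, or None if they are parallel."""
    x, y, w = np.cross(l1, l2)
    if abs(w) < eps:
        return None
    return np.array([x / w, y / w])


def vanishing_points_and_horizon(A, B, C, D):
    """Return P = AB ∩ CD, Q = BC ∩ AD and the horizon line through P and Q.

    P or Q is None when the corresponding sides are parallel in the image,
    in which case the horizon is None as well.
    """
    P = intersection(line_through(A, B), line_through(D, C))
    Q = intersection(line_through(B, C), line_through(A, D))
    horizon = None if P is None or Q is None else line_through(P, Q)
    return P, Q, horizon
```

When one pair of sides is exactly parallel in the image, the corresponding point lies at infinity and the function returns `None` for it rather than a finite coordinate.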

















$endgroup$













  • $begingroup$
    Somehow this still seems wrong as you should need to know where the center of the image is (i.e., the point the camera looks at). Hold on a bit ...
    $endgroup$
    – Hagen von Eitzen
    Jun 26 '15 at 12:54










  • $begingroup$
    I'll try to implement it, though not in the next few hours, and I'll mark it as accepted if it works. Do you have any idea why it works, though?
    $endgroup$
    – Julien Palard
    Jun 26 '15 at 13:30










  • $begingroup$
    Wouldn't any line parallel to $PQ$ give the same ratio between the two segments cut off by $\angle APD$ and $\angle AQB$?
    $endgroup$
    – David K
    Jun 26 '15 at 20:58










  • $begingroup$
    If all you have is the projected image of a rectangle, only affine properties of the rectangle’s plane can be recovered. To recover metric properties, we’d need to identify the image of the conic at infinity (or, equivalently in this case, the images of the circular points). At the very minimum, we’d need the images of another pair of orthogonal lines that aren’t parallel to the rectangle’s sides to do this.
    $endgroup$
    – amd
    Aug 19 '17 at 6:55










  • $begingroup$
    I do not understand the last step, "$BC,AD$ in points with distance proportional to the rectangle side lengths". Why is that so?
    $endgroup$
    – shelper
    Aug 29 '17 at 18:05
























edited Jun 26 '15 at 12:49

























answered Jun 26 '15 at 12:26









Hagen von Eitzen





























