Compute ratio of a rectangle seen from an unknown perspective
TL;DR: Given 4 points on a two-dimensional plane, representing a rectangle seen from an unknown perspective, can we deduce the width/height ratio of the rectangle?
Details:
From a picture, and some OpenCV work (Canny, Hough lines, bucketing to tell apart "lines" and "columns", choosing interesting lines, math to deduce line intersections), I get this:

From this step, it's easy to warp it to a "from the top" view, using OpenCV's getPerspectiveTransform and warpPerspective to "remove" the perspective, as if looking at the rectangle from directly above.
My goal now is to keep its aspect ratio, which I lose in my current warping because I don't know the ratio it should have.
For this I have to give getPerspectiveTransform the 4 destination points where I want my 4 found red points to be after warping, not just 4 arbitrary points like (0, 0), (0, 100), (100, 100), (100, 0), which lead to a deformation if my 4 red points do not depict a square.
So is there a known way to compute the width/height ratio, or even better the size, of this rectangle seen through a perspective?
For the record and the curious, work-in-progress is here: https://github.com/JulienPalard/grid-finder
Tags: geometry, 3d, rectangles
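A minimal sketch of the destination-point step described in the question: once a width/height ratio is known (or guessed), the 4 destination points are simply the corners of an upright rectangle with that ratio. The helper name `destination_corners`, the corner order, and the output height are illustrative assumptions, not part of OpenCV or of the linked project; the OpenCV calls appear only in comments.

```python
def destination_corners(ratio, height=100.0):
    """Return the 4 target corners of an upright rectangle.

    `ratio` is width / height; `height` is an arbitrary output size in pixels.
    Corners are listed as top-left, top-right, bottom-right, bottom-left.
    """
    if ratio <= 0 or height <= 0:
        raise ValueError("ratio and height must be positive")
    width = ratio * height
    return [(0.0, 0.0), (width, 0.0), (width, height), (0.0, height)]


# Hypothetical example: a rectangle assumed to be 1.5 times wider than tall.
dst = destination_corners(1.5, height=200.0)
print(dst)

# With OpenCV and NumPy installed, `src` (the 4 red points, listed in the
# same corner order as `dst`) would then be used roughly like this:
#   M = cv2.getPerspectiveTransform(np.float32(src), np.float32(dst))
#   top_view = cv2.warpPerspective(image, M, (300, 200))
```

The source points must be listed in the same order as the destination points, otherwise the warp flips or twists the image.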
In general no. Consider an orthographic projection: you won't be able to distinguish between an ordinary square facing you and some angled rectangle. Even for a non-orthographic projection, the distances/coefficients can be big enough that the slight differences vanish because of pixels/rasterization. In other words, this will effectively look like an orthographic projection and you won't be able to guess the ratio.
– dtldarek, Jun 26 '15 at 9:53
@dtldarek If I understand correctly, the slightest error in the coordinates of my red points will always yield "impossible perspectives", preventing the ratio from being deduced?
– Julien Palard, Jun 26 '15 at 10:03
That was not my point. Consider the square $(-1,-1,a), (1,-1,a), (1,1,a), (-1,1,a)$ and the rectangle $(-1,-1,a), (1,-1,a), (1,1,a+10), (-1,1,a+10)$ as seen from $(0,0,0)$ in direction $(0,0,1)$, where $a$ is just some positive number. These two are indistinguishable under an orthographic projection. Now, for any other projection that would render the square to a shape of size 100 px x 100 px with 32 bits of color (or any other constant number), you can make $a$ big enough that they still won't be distinguishable.
– dtldarek, Jun 26 '15 at 10:38
No. That would be predicting "depth" from just two-dimensional information, if I understand your question right.
– tpb261, Jun 26 '15 at 10:56
I'm OK with anything giving me the two different possible realities, as they clearly exist :)
– Julien Palard, Jun 26 '15 at 20:23
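To put numbers on dtldarek's example, here is a small sketch that projects the square and the tilted rectangle with a simple pinhole model and reports the largest corner displacement in pixels. The pinhole model, the function names, and the choice of focal length (picked so that the square always spans 100 px) are assumptions made for illustration. Under an orthographic projection the depth coordinate is simply dropped, so the two shapes coincide exactly.

```python
def perspective(point, focal):
    """Pinhole projection of a 3D point: (x, y, z) -> (focal*x/z, focal*y/z)."""
    x, y, z = point
    return (focal * x / z, focal * y / z)


def max_pixel_gap(a, side_px=100.0):
    """Largest corner displacement between the square and the tilted rectangle."""
    # Focal length chosen so that the square (side 2, depth a) spans side_px.
    focal = side_px * a / 2.0
    square = [(-1, -1, a), (1, -1, a), (1, 1, a), (-1, 1, a)]
    rect = [(-1, -1, a), (1, -1, a), (1, 1, a + 10), (-1, 1, a + 10)]
    gap = 0.0
    for p, q in zip(square, rect):
        px, py = perspective(p, focal)
        qx, qy = perspective(q, focal)
        gap = max(gap, abs(px - qx), abs(py - qy))
    return gap


for a in (10.0, 1000.0, 100000.0):
    print(a, max_pixel_gap(a))
```

With this setup the gap works out to 500 / (a + 10) pixels, so it drops below one pixel once $a$ exceeds 490 and the two shapes rasterize identically.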
edited Dec 24 '18 at 9:24 by Glorfindel
asked Jun 26 '15 at 9:41 by Julien Palard
2 Answers
Yes, here's a pen and pencil method:
Find the points $P,Q$ where the "parallel" sides intersect. The line through $P,Q$ is the "horizon" of the plane containing the rectangle. Find $R$ such that $\angle QRP=90^\circ$ and $RP=RQ$. Then the parallel to $PQ$ through $R$ intersects your pairs of "parallels" $AB,CD$ and $BC,AD$, respectively, in points whose distances are proportional to the rectangle's side lengths.

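The first step of this construction (finding $P$, $Q$ and the horizon) can be sketched with homogeneous coordinates, where both the line through two points and the intersection of two lines are cross products. This covers only that first step; the function names and the sample quadrilateral are made up for illustration, and nothing here settles whether the remaining steps with $R$ are sufficient.

```python
def cross(u, v):
    """Cross product of two 3-vectors."""
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


def line_through(p, q):
    """Homogeneous line through two image points given as (x, y)."""
    return cross((p[0], p[1], 1.0), (q[0], q[1], 1.0))


def intersection(l1, l2):
    """Intersection of two homogeneous lines, or None if they are parallel."""
    x, y, w = cross(l1, l2)
    if abs(w) < 1e-12:
        return None  # parallel in the image: the vanishing point is at infinity
    return (x / w, y / w)


def vanishing_points(a, b, c, d):
    """P = AB meets CD, Q = BC meets AD, for corners A, B, C, D in order."""
    p = intersection(line_through(a, b), line_through(c, d))
    q = intersection(line_through(b, c), line_through(a, d))
    return p, q


# Made-up quadrilateral, corners listed in order around the shape.
A, B, C, D = (0.0, 0.0), (4.0, 1.0), (3.0, 3.0), (0.0, 2.0)
P, Q = vanishing_points(A, B, C, D)
horizon = line_through(P, Q)  # coefficients (a, b, c) of a*x + b*y + c = 0
print(P, Q, horizon)
```

If one pair of sides is exactly parallel in the image, the corresponding vanishing point is at infinity and `intersection` returns `None`, so a caller has to handle that case separately.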
Somehow this still seems wrong, as you should need to know where the center of the image is (i.e., the point the camera looks at). Hold on a bit ...
– Hagen von Eitzen, Jun 26 '15 at 12:54
I'll try to implement it, not in the next few hours, but I'll mark it as accepted if it works. However, do you have any idea why it works? oO
– Julien Palard, Jun 26 '15 at 13:30
Wouldn't any line parallel to $PQ$ give the same ratio between the two segments cut off by $\angle APD$ and $\angle AQB$?
– David K, Jun 26 '15 at 20:58
If all you have is the projected image of a rectangle, only affine properties of the rectangle’s plane can be recovered. To recover metric properties, we’d need to identify the image of the conic at infinity (or, equivalently in this case, the images of the circular points). At the very minimum, we’d need the images of another pair of orthogonal lines that aren’t parallel to the rectangle’s sides to do this.
– amd, Aug 19 '17 at 6:55
I do not understand the last step, "$BC,AD$ in points with distance proportional to the rectangle side lengths". Why is that so?
– shelper, Aug 29 '17 at 18:05
Dropbox has an extensive article on their tech blog where they describe how they solved the problem for their scanner app.
https://blogs.dropbox.com/tech/2016/08/fast-document-rectification-and-enhancement/
Rectifying a Document
We assume that the input document is rectangular in the physical world, but if it is not exactly facing the camera, the resulting corners in the image will be a general convex quadrilateral. So to satisfy our first goal, we must undo the geometric transform applied by the capture process. This transformation depends on the viewpoint of the camera relative to the document (these are the so-called extrinsic parameters), in addition to things like the focal length of the camera (the intrinsic parameters). Here’s a diagram of the capture scenario:
In order to undo the geometric transform, we must first determine the said parameters. If we assume a nicely symmetric camera (no astigmatism, no skew, et cetera), the unknowns in this model are:
- the 3D location of the camera relative to the document (3 degrees of freedom),
- the 3D orientation of the camera relative to the document (3 degrees of freedom),
- the dimensions of the document (2 degrees of freedom), and
- the focal length of the camera (1 degree of freedom).
On the flip side, the x- and y-coordinates of the four detected document corners gives us effectively eight constraints. While there are seemingly more unknowns (9) than constraints (8), the unknowns are not entirely free variables—one could imagine scaling the document physically and placing it further from the camera, to obtain an identical photo. This relation places an additional constraint, so we have a fully constrained system to be solved. (The actual system of equations we solve involves a few other considerations; the relevant Wikipedia article gives a good summary: https://en.wikipedia.org/wiki/Camera_resectioning)
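The counting argument above can be turned into a closed form under extra assumptions. This is a sketch, not Dropbox's implementation: it assumes a pinhole camera with square pixels, no skew, and a known principal point (for example the image center). Writing the corners as homogeneous vectors relative to that point, the vectors `n2` and `n3` below are proportional to the images of the rectangle's two edge directions; requiring those directions to be orthogonal gives the focal length, and comparing their lengths gives the ratio. All function names and numbers are illustrative.

```python
import math


def cross(u, v):
    """Cross product of two 3-vectors."""
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


def dot(u, v):
    return u[0] * v[0] + u[1] * v[1] + u[2] * v[2]


def aspect_ratio(m1, m2, m3, m4, center):
    """Estimate (width/height, focal length) from 4 imaged corners.

    Corner layout on the real rectangle: m1 at (0, 0), m2 at (w, 0),
    m3 at (0, h), m4 at (w, h). `center` is the assumed principal point.
    """
    cx, cy = center
    m1, m2, m3, m4 = [(x - cx, y - cy, 1.0) for x, y in (m1, m2, m3, m4)]
    # Relative projective depths of m2 and m3 with respect to m1.
    k2 = dot(cross(m1, m4), m3) / dot(cross(m2, m4), m3)
    k3 = dot(cross(m1, m4), m2) / dot(cross(m3, m4), m2)
    n2 = tuple(k2 * b - a for a, b in zip(m1, m2))  # image of the width edge
    n3 = tuple(k3 * b - a for a, b in zip(m1, m3))  # image of the height edge
    if abs(n2[2]) < 1e-9 or abs(n3[2]) < 1e-9:
        # A pair of sides is parallel in the image: focal length is not
        # determined by this formula, so the sketch gives up.
        raise ValueError("degenerate view: focal length cannot be recovered")
    f2 = -(n2[0] * n3[0] + n2[1] * n3[1]) / (n2[2] * n3[2])
    if f2 <= 0:
        raise ValueError("corners are inconsistent with the assumed camera")
    w2 = (n2[0] ** 2 + n2[1] ** 2) / f2 + n2[2] ** 2
    h2 = (n3[0] ** 2 + n3[1] ** 2) / f2 + n3[2] ** 2
    return math.sqrt(w2 / h2), math.sqrt(f2)


def project(u, v, r1, r2, t, focal, center):
    """Image of the rectangle point u*r1 + v*r2 + t under a pinhole camera."""
    x, y, z = (u * a + v * b + c for a, b, c in zip(r1, r2, t))
    return (focal * x / z + center[0], focal * y / z + center[1])


# Synthetic check with made-up values: a 3 x 2 rectangle, focal length 800 px.
ax, ay = 0.5, 0.3  # tilt angles in radians
r1 = (math.cos(ay), math.sin(ax) * math.sin(ay), -math.cos(ax) * math.sin(ay))
r2 = (0.0, math.cos(ax), math.sin(ax))
t, center = (-1.0, -0.5, 10.0), (320.0, 240.0)
corners = [project(u, v, r1, r2, t, 800.0, center)
           for u, v in ((0, 0), (3, 0), (0, 2), (3, 2))]
ratio, focal = aspect_ratio(*corners, center)
print(ratio, focal)  # close to 1.5 and 800 for this synthetic view
```

The estimate degrades when the view is nearly frontal or the rectangle is far away, since the perspective cues that fix the focal length then shrink toward the noise level of the detected corners.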
Once the parameters have been recovered, we can undo the geometric transform applied by the capture process to obtain a nice rectangular image. However, this is potentially a time-consuming process: one would look up, for each output pixel, the value of the corresponding input pixel in the source image. Of course, GPUs are specifically designed for tasks like this: rendering a texture in a virtual space. There exists a view transform—which happens to be the inverse of the camera transform we just solved for!—with which one can render the full input image and obtain the rectified document. (An easy way to see this is to note that once you have the full input image on the screen of your phone, you can tilt and translate the phone such that the projection of the document region on the screen appears rectilinear to you.)
Lastly, recall that there was an ambiguity with respect to scale: we can’t tell whether the document was a letter-sized paper (8.5” x 11”) or a poster board (17” x 22”), for instance. What should the dimensions of the output image be? To resolve this ambiguity, we count the number of pixels within the quadrilateral in the input image, and set the output resolution as to match this pixel count. The idea is that we don’t want to upsample or downsample the image too much.
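The pixel-count rule above can be sketched as follows, assuming the aspect ratio has already been estimated by some other means. The function names are hypothetical, and the quadrilateral's area (shoelace formula) stands in for an explicit pixel count. Corners are expected in perimeter order.

```python
import math


def quad_area(quad):
    """Shoelace area of a polygon given as (x, y) vertices in perimeter order."""
    n = len(quad)
    twice_area = sum(
        quad[i][0] * quad[(i + 1) % n][1] - quad[(i + 1) % n][0] * quad[i][1]
        for i in range(n)
    )
    return abs(twice_area) / 2.0


def output_size(quad, aspect_ratio):
    """Pick (width, height) with width/height == aspect_ratio and
    width * height approximately equal to the quadrilateral's area."""
    area = quad_area(quad)
    width = math.sqrt(area * aspect_ratio)
    height = math.sqrt(area / aspect_ratio)
    return max(1, round(width)), max(1, round(height))
```

The resulting size can then be used to build the four destination corners, `(0, 0)`, `(width, 0)`, `(width, height)`, `(0, height)`, passed to OpenCV's `getPerspectiveTransform`.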
answered Jan 12 '17 at 20:34
adius
$begingroup$
Wah, nice link, thanks!
$endgroup$
– Julien Palard
Jan 13 '17 at 10:12
$begingroup$
Yes, here's a pen and pencil method:
Find the points $P,Q$ where the "parallel" sides intersect. The line through $P,Q$ is the "horizon" of the plane containing the rectangle. Find $R$ such that $\angle QRP=90^\circ$ and $RP=RQ$. Then the parallel to $PQ$ through $R$ intersects your pairs of "parallels" $AB,CD$ and $BC,AD$, respectively, in points whose distances are proportional to the rectangle's side lengths.
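The first steps of this construction can be sketched in code. This is only an illustration of how $P$, $Q$ and the two candidate positions of $R$ might be computed with homogeneous coordinates. It does not validate the final proportionality claim, and all function names are hypothetical.

```python
def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def line_through(a, b):
    """Homogeneous line through the 2D points a and b."""
    return cross((a[0], a[1], 1.0), (b[0], b[1], 1.0))


def intersection(l1, l2, eps=1e-12):
    """Intersection of two homogeneous lines, or None if they are parallel."""
    x, y, w = cross(l1, l2)
    if abs(w) < eps:
        return None
    return (x / w, y / w)


def horizon_construction(A, B, C, D):
    """Return P = AB ∩ CD, Q = BC ∩ AD and the two candidates for R."""
    P = intersection(line_through(A, B), line_through(C, D))
    Q = intersection(line_through(B, C), line_through(A, D))
    if P is None or Q is None:
        # A vanishing point at infinity: the construction does not apply.
        return P, Q, []
    mx, my = (P[0] + Q[0]) / 2.0, (P[1] + Q[1]) / 2.0
    hx, hy = (Q[0] - P[0]) / 2.0, (Q[1] - P[1]) / 2.0
    # R sits on the perpendicular bisector of PQ at distance |PQ|/2 from
    # the midpoint, which gives RP = RQ and a right angle at R.
    return P, Q, [(mx - hy, my + hx), (mx + hy, my - hx)]
```

For the quadrilateral $A=(0,0)$, $B=(4,0)$, $C=(3,2)$, $D=(1,1)$ this gives $P=(-1,0)$ and $Q=(8/3,8/3)$.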

$endgroup$
$begingroup$
Somehow this still seems wrong as you should need to know where the center of the image is (i.e., the point the camera looks at). Hold on a bit ...
$endgroup$
– Hagen von Eitzen
Jun 26 '15 at 12:54
$begingroup$
I'll try to implement it, though not in the next few hours, and I'll mark it as accepted if it works. However, do you have any idea why it works? oO
$endgroup$
– Julien Palard
Jun 26 '15 at 13:30
$begingroup$
Wouldn't any line parallel to $PQ$ give the same ratio between the two segments cut off by $\angle APD$ and $\angle AQB$?
$endgroup$
– David K
Jun 26 '15 at 20:58
$begingroup$
If all you have is the projected image of a rectangle, only affine properties of the rectangle’s plane can be recovered. To recover metric properties, we’d need to identify the image of the conic at infinity (or, equivalently in this case, the images of the circular points). At the very minimum, we’d need the images of another pair of orthogonal lines that aren’t parallel to the rectangle’s sides to do this.
$endgroup$
– amd
Aug 19 '17 at 6:55
$begingroup$
I do not understand the last step "BC,AD in points with distance proportional to the rectangle side lengths", why is that so?
$endgroup$
– shelper
Aug 29 '17 at 18:05
edited Jun 26 '15 at 12:49
answered Jun 26 '15 at 12:26
Hagen von Eitzen
$begingroup$
In general, no. Consider an orthographic projection: you won't be able to distinguish between an ordinary square facing you and some angled rectangle. Now, even for a non-orthographic projection, you can have distances/coefficients big enough that the slight differences vanish because of pixels/rasterization. In other words, this will effectively look like an orthographic projection and you won't be able to guess the ratio.
$endgroup$
– dtldarek
Jun 26 '15 at 9:53
$begingroup$
@dtldarek If I understand correctly, the slightest error in the coordinates of my red points will always lead to "impossible perspectives", preventing the deduction of the ratio?
$endgroup$
– Julien Palard
Jun 26 '15 at 10:03
$begingroup$
That was not my point. Consider the square $(-1,-1,a), (1,-1,a), (1,1,a), (-1,1,a)$ and the rectangle $(-1,-1,a), (1,-1,a), (1,1,a+10), (-1,1,a+10)$ as seen from $(0,0,0)$ in direction $(0,0,1)$, where $a$ is just some positive number. These two are indistinguishable under an orthographic projection. Now, for any other projection that would render the square to a shape of size 100 px x 100 px with 32 bits of color (or any other constant number), you can make $a$ big enough that they still won't be distinguishable.
$endgroup$
– dtldarek
Jun 26 '15 at 10:38
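The example in the comment above can be checked numerically. The following sketch uses a simple pinhole model with the principal point at the origin. The focal length is chosen so that the square renders 100 px wide, and the helper names are hypothetical.

```python
def orthographic(p):
    """Orthographic projection along the z axis: drop the depth."""
    x, y, _z = p
    return (x, y)


def perspective(p, f):
    """Pinhole projection with focal length f (in pixels)."""
    x, y, z = p
    return (f * x / z, f * y / z)


def square_and_rectangle(a):
    square = [(-1, -1, a), (1, -1, a), (1, 1, a), (-1, 1, a)]
    rectangle = [(-1, -1, a), (1, -1, a), (1, 1, a + 10), (-1, 1, a + 10)]
    return square, rectangle


def max_pixel_gap(a, side_px=100.0):
    """Largest coordinate difference, in pixels, between the perspective
    images of the square and the rectangle when the square is side_px wide."""
    f = side_px * a / 2.0  # the square is 2 units wide at depth a
    square, rectangle = square_and_rectangle(a)
    return max(
        abs(u - v)
        for s, r in zip(square, rectangle)
        for u, v in zip(perspective(s, f), perspective(r, f))
    )
```

Under these assumptions the gap is $500/(a+10)$ pixels: 25 px at $a=10$, but about 0.05 px at $a=10^4$, well below what rasterization can resolve.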
$begingroup$
No. That would be predicting "depth" from just two-dimensional information, if I understand your question right.
$endgroup$
– tpb261
Jun 26 '15 at 10:56
$begingroup$
I'm OK with anything that gives me the two different possible realities, as they clearly exist :)
$endgroup$
– Julien Palard
Jun 26 '15 at 20:23