Better performance for geohash aggregation with PostGIS
I'm using the GeoNames (https://www.geonames.org/) dataset and want to aggregate the points in geohash of a specific precision. Beforehand I'm filtering with an bbox. So this is the query I came up with:
With bbox AS(
SELECT name, the_geom FROM geonames
WHERE ST_Contains((ST_MakeEnvelope(-29.79, 16.38, 64.05, 90.26, 4326)), the_geom)
)
SELECT COUNT(name), ST_GeoHash((the_geom),2)
FROM bbox
GROUP BY ST_GeoHash((the_geom),2)
output looks like this:
+-------+------------+
| count | st_geohash |
+-------+------------+
| 34200 | tm |
+-------+------------+
| 3 | up |
+-------+------------+
| ... | ... |
+-------+------------+
and this is the query plan:
"HashAggregate (cost=24426.50..24429.00 rows=200 width=40) (actual time=5805.214..5805.229 rows=121 loops=1)"
" Group Key: st_geohash(bbox.the_geom, 2)"
" CTE bbox"
" -> Bitmap Heap Scan on geonames (cost=376.34..24317.79 rows=3953 width=46) (actual time=454.394..2950.692 rows=3349419 loops=1)"
" Recheck Cond: ('0103000020E610000001000000050000000AD7A3703DCA3DC0E17A14AE476130400AD7A3703DCA3DC0713D0AD7A39056403333333333035040713D0AD7A39056403333333333035040E17A14AE476130400AD7A3703DCA3DC0E17A14AE47613040'::geometry ~ the_geom)"
" Filter: _st_contains('0103000020E610000001000000050000000AD7A3703DCA3DC0E17A14AE476130400AD7A3703DCA3DC0713D0AD7A39056403333333333035040713D0AD7A39056403333333333035040E17A14AE476130400AD7A3703DCA3DC0E17A14AE47613040'::geometry, the_geom)"
" Rows Removed by Filter: 18"
" Heap Blocks: exact=48141"
" -> Bitmap Index Scan on idx_geonames_geom (cost=0.00..375.35 rows=11858 width=0) (actual time=444.950..444.950 rows=3349437 loops=1)"
" Index Cond: ('0103000020E610000001000000050000000AD7A3703DCA3DC0E17A14AE476130400AD7A3703DCA3DC0713D0AD7A39056403333333333035040713D0AD7A39056403333333333035040E17A14AE476130400AD7A3703DCA3DC0E17A14AE47613040'::geometry ~ the_geom)"
" -> CTE Scan on bbox (cost=0.00..88.94 rows=3953 width=64) (actual time=454.401..5030.976 rows=3349419 loops=1)"
"Planning time: 0.492 ms"
"Execution time: 5832.977 ms"
Is there a way to to increase the performence of this query ?
I'm also testing the same thing with Elasticsearch 6.6 and there the query with the same output is a lot faster.
{
"aggregations" : {
"zoomed-in" : {
"filter" : {
"geo_bounding_box" : {
"location" : {
"top_left" : "64.05, -29.79",
"bottom_right" : "16.38, 90.26"
}
}
},
"aggregations":{
"zoom1":{
"geohash_grid" : {
"field": "location",
"precision": 2,
"size": 100000
}
}
}
}
}
}
postgis postgresql json geohash elasticsearch
New contributor
gemo1011 is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.
add a comment |
I'm using the GeoNames (https://www.geonames.org/) dataset and want to aggregate the points in geohash of a specific precision. Beforehand I'm filtering with an bbox. So this is the query I came up with:
With bbox AS(
SELECT name, the_geom FROM geonames
WHERE ST_Contains((ST_MakeEnvelope(-29.79, 16.38, 64.05, 90.26, 4326)), the_geom)
)
SELECT COUNT(name), ST_GeoHash((the_geom),2)
FROM bbox
GROUP BY ST_GeoHash((the_geom),2)
output looks like this:
+-------+------------+
| count | st_geohash |
+-------+------------+
| 34200 | tm |
+-------+------------+
| 3 | up |
+-------+------------+
| ... | ... |
+-------+------------+
and this is the query plan:
"HashAggregate (cost=24426.50..24429.00 rows=200 width=40) (actual time=5805.214..5805.229 rows=121 loops=1)"
" Group Key: st_geohash(bbox.the_geom, 2)"
" CTE bbox"
" -> Bitmap Heap Scan on geonames (cost=376.34..24317.79 rows=3953 width=46) (actual time=454.394..2950.692 rows=3349419 loops=1)"
" Recheck Cond: ('0103000020E610000001000000050000000AD7A3703DCA3DC0E17A14AE476130400AD7A3703DCA3DC0713D0AD7A39056403333333333035040713D0AD7A39056403333333333035040E17A14AE476130400AD7A3703DCA3DC0E17A14AE47613040'::geometry ~ the_geom)"
" Filter: _st_contains('0103000020E610000001000000050000000AD7A3703DCA3DC0E17A14AE476130400AD7A3703DCA3DC0713D0AD7A39056403333333333035040713D0AD7A39056403333333333035040E17A14AE476130400AD7A3703DCA3DC0E17A14AE47613040'::geometry, the_geom)"
" Rows Removed by Filter: 18"
" Heap Blocks: exact=48141"
" -> Bitmap Index Scan on idx_geonames_geom (cost=0.00..375.35 rows=11858 width=0) (actual time=444.950..444.950 rows=3349437 loops=1)"
" Index Cond: ('0103000020E610000001000000050000000AD7A3703DCA3DC0E17A14AE476130400AD7A3703DCA3DC0713D0AD7A39056403333333333035040713D0AD7A39056403333333333035040E17A14AE476130400AD7A3703DCA3DC0E17A14AE47613040'::geometry ~ the_geom)"
" -> CTE Scan on bbox (cost=0.00..88.94 rows=3953 width=64) (actual time=454.401..5030.976 rows=3349419 loops=1)"
"Planning time: 0.492 ms"
"Execution time: 5832.977 ms"
Is there a way to to increase the performence of this query ?
I'm also testing the same thing with Elasticsearch 6.6 and there the query with the same output is a lot faster.
{
"aggregations" : {
"zoomed-in" : {
"filter" : {
"geo_bounding_box" : {
"location" : {
"top_left" : "64.05, -29.79",
"bottom_right" : "16.38, 90.26"
}
}
},
"aggregations":{
"zoom1":{
"geohash_grid" : {
"field": "location",
"precision": 2,
"size": 100000
}
}
}
}
}
}
postgis postgresql json geohash elasticsearch
New contributor
gemo1011 is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.
add a comment |
I'm using the GeoNames (https://www.geonames.org/) dataset and want to aggregate the points in geohash of a specific precision. Beforehand I'm filtering with an bbox. So this is the query I came up with:
With bbox AS(
SELECT name, the_geom FROM geonames
WHERE ST_Contains((ST_MakeEnvelope(-29.79, 16.38, 64.05, 90.26, 4326)), the_geom)
)
SELECT COUNT(name), ST_GeoHash((the_geom),2)
FROM bbox
GROUP BY ST_GeoHash((the_geom),2)
output looks like this:
+-------+------------+
| count | st_geohash |
+-------+------------+
| 34200 | tm |
+-------+------------+
| 3 | up |
+-------+------------+
| ... | ... |
+-------+------------+
and this is the query plan:
"HashAggregate (cost=24426.50..24429.00 rows=200 width=40) (actual time=5805.214..5805.229 rows=121 loops=1)"
" Group Key: st_geohash(bbox.the_geom, 2)"
" CTE bbox"
" -> Bitmap Heap Scan on geonames (cost=376.34..24317.79 rows=3953 width=46) (actual time=454.394..2950.692 rows=3349419 loops=1)"
" Recheck Cond: ('0103000020E610000001000000050000000AD7A3703DCA3DC0E17A14AE476130400AD7A3703DCA3DC0713D0AD7A39056403333333333035040713D0AD7A39056403333333333035040E17A14AE476130400AD7A3703DCA3DC0E17A14AE47613040'::geometry ~ the_geom)"
" Filter: _st_contains('0103000020E610000001000000050000000AD7A3703DCA3DC0E17A14AE476130400AD7A3703DCA3DC0713D0AD7A39056403333333333035040713D0AD7A39056403333333333035040E17A14AE476130400AD7A3703DCA3DC0E17A14AE47613040'::geometry, the_geom)"
" Rows Removed by Filter: 18"
" Heap Blocks: exact=48141"
" -> Bitmap Index Scan on idx_geonames_geom (cost=0.00..375.35 rows=11858 width=0) (actual time=444.950..444.950 rows=3349437 loops=1)"
" Index Cond: ('0103000020E610000001000000050000000AD7A3703DCA3DC0E17A14AE476130400AD7A3703DCA3DC0713D0AD7A39056403333333333035040713D0AD7A39056403333333333035040E17A14AE476130400AD7A3703DCA3DC0E17A14AE47613040'::geometry ~ the_geom)"
" -> CTE Scan on bbox (cost=0.00..88.94 rows=3953 width=64) (actual time=454.401..5030.976 rows=3349419 loops=1)"
"Planning time: 0.492 ms"
"Execution time: 5832.977 ms"
Is there a way to to increase the performence of this query ?
I'm also testing the same thing with Elasticsearch 6.6 and there the query with the same output is a lot faster.
{
"aggregations" : {
"zoomed-in" : {
"filter" : {
"geo_bounding_box" : {
"location" : {
"top_left" : "64.05, -29.79",
"bottom_right" : "16.38, 90.26"
}
}
},
"aggregations":{
"zoom1":{
"geohash_grid" : {
"field": "location",
"precision": 2,
"size": 100000
}
}
}
}
}
}
postgis postgresql json geohash elasticsearch
New contributor
gemo1011 is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.
I'm using the GeoNames (https://www.geonames.org/) dataset and want to aggregate the points in geohash of a specific precision. Beforehand I'm filtering with an bbox. So this is the query I came up with:
With bbox AS(
SELECT name, the_geom FROM geonames
WHERE ST_Contains((ST_MakeEnvelope(-29.79, 16.38, 64.05, 90.26, 4326)), the_geom)
)
SELECT COUNT(name), ST_GeoHash((the_geom),2)
FROM bbox
GROUP BY ST_GeoHash((the_geom),2)
output looks like this:
+-------+------------+
| count | st_geohash |
+-------+------------+
| 34200 | tm |
+-------+------------+
| 3 | up |
+-------+------------+
| ... | ... |
+-------+------------+
and this is the query plan:
"HashAggregate (cost=24426.50..24429.00 rows=200 width=40) (actual time=5805.214..5805.229 rows=121 loops=1)"
" Group Key: st_geohash(bbox.the_geom, 2)"
" CTE bbox"
" -> Bitmap Heap Scan on geonames (cost=376.34..24317.79 rows=3953 width=46) (actual time=454.394..2950.692 rows=3349419 loops=1)"
" Recheck Cond: ('0103000020E610000001000000050000000AD7A3703DCA3DC0E17A14AE476130400AD7A3703DCA3DC0713D0AD7A39056403333333333035040713D0AD7A39056403333333333035040E17A14AE476130400AD7A3703DCA3DC0E17A14AE47613040'::geometry ~ the_geom)"
" Filter: _st_contains('0103000020E610000001000000050000000AD7A3703DCA3DC0E17A14AE476130400AD7A3703DCA3DC0713D0AD7A39056403333333333035040713D0AD7A39056403333333333035040E17A14AE476130400AD7A3703DCA3DC0E17A14AE47613040'::geometry, the_geom)"
" Rows Removed by Filter: 18"
" Heap Blocks: exact=48141"
" -> Bitmap Index Scan on idx_geonames_geom (cost=0.00..375.35 rows=11858 width=0) (actual time=444.950..444.950 rows=3349437 loops=1)"
" Index Cond: ('0103000020E610000001000000050000000AD7A3703DCA3DC0E17A14AE476130400AD7A3703DCA3DC0713D0AD7A39056403333333333035040713D0AD7A39056403333333333035040E17A14AE476130400AD7A3703DCA3DC0E17A14AE47613040'::geometry ~ the_geom)"
" -> CTE Scan on bbox (cost=0.00..88.94 rows=3953 width=64) (actual time=454.401..5030.976 rows=3349419 loops=1)"
"Planning time: 0.492 ms"
"Execution time: 5832.977 ms"
Is there a way to to increase the performence of this query ?
I'm also testing the same thing with Elasticsearch 6.6 and there the query with the same output is a lot faster.
{
"aggregations" : {
"zoomed-in" : {
"filter" : {
"geo_bounding_box" : {
"location" : {
"top_left" : "64.05, -29.79",
"bottom_right" : "16.38, 90.26"
}
}
},
"aggregations":{
"zoom1":{
"geohash_grid" : {
"field": "location",
"precision": 2,
"size": 100000
}
}
}
}
}
}
postgis postgresql json geohash elasticsearch
postgis postgresql json geohash elasticsearch
New contributor
gemo1011 is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.
New contributor
gemo1011 is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.
edited yesterday
gemo1011
New contributor
gemo1011 is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.
asked yesterday
gemo1011gemo1011
153
153
New contributor
gemo1011 is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.
New contributor
gemo1011 is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.
gemo1011 is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.
add a comment |
add a comment |
1 Answer
1
active
oldest
votes
In PostgreSQL, common table expressions are always materialized. (This will change in version 12.)
To allow more optimizations, move bbox into a view, or inline it as a subquery:
SELECT COUNT(name), ST_GeoHash((the_geom),2)
FROM (
SELECT name, the_geom FROM geonames
WHERE ST_Contains((ST_MakeEnvelope(-29.79, 16.38, 64.05, 90.26, 4326)), the_geom)
) AS bbox
GROUP BY ST_GeoHash((the_geom),2)
thank you for your answer. this speeds up the query for about 30%
– gemo1011
yesterday
add a comment |
Your Answer
StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "79"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});
function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: false,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: null,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});
}
});
gemo1011 is a new contributor. Be nice, and check out our Code of Conduct.
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fgis.stackexchange.com%2fquestions%2f315634%2fbetter-performance-for-geohash-aggregation-with-postgis%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
1 Answer
1
active
oldest
votes
1 Answer
1
active
oldest
votes
active
oldest
votes
active
oldest
votes
In PostgreSQL, common table expressions are always materialized. (This will change in version 12.)
To allow more optimizations, move bbox into a view, or inline it as a subquery:
SELECT COUNT(name), ST_GeoHash((the_geom),2)
FROM (
SELECT name, the_geom FROM geonames
WHERE ST_Contains((ST_MakeEnvelope(-29.79, 16.38, 64.05, 90.26, 4326)), the_geom)
) AS bbox
GROUP BY ST_GeoHash((the_geom),2)
thank you for your answer. this speeds up the query for about 30%
– gemo1011
yesterday
add a comment |
In PostgreSQL, common table expressions are always materialized. (This will change in version 12.)
To allow more optimizations, move bbox into a view, or inline it as a subquery:
SELECT COUNT(name), ST_GeoHash((the_geom),2)
FROM (
SELECT name, the_geom FROM geonames
WHERE ST_Contains((ST_MakeEnvelope(-29.79, 16.38, 64.05, 90.26, 4326)), the_geom)
) AS bbox
GROUP BY ST_GeoHash((the_geom),2)
thank you for your answer. this speeds up the query for about 30%
– gemo1011
yesterday
add a comment |
In PostgreSQL, common table expressions are always materialized. (This will change in version 12.)
To allow more optimizations, move bbox into a view, or inline it as a subquery:
SELECT COUNT(name), ST_GeoHash((the_geom),2)
FROM (
SELECT name, the_geom FROM geonames
WHERE ST_Contains((ST_MakeEnvelope(-29.79, 16.38, 64.05, 90.26, 4326)), the_geom)
) AS bbox
GROUP BY ST_GeoHash((the_geom),2)
In PostgreSQL, common table expressions are always materialized. (This will change in version 12.)
To allow more optimizations, move bbox into a view, or inline it as a subquery:
SELECT COUNT(name), ST_GeoHash((the_geom),2)
FROM (
SELECT name, the_geom FROM geonames
WHERE ST_Contains((ST_MakeEnvelope(-29.79, 16.38, 64.05, 90.26, 4326)), the_geom)
) AS bbox
GROUP BY ST_GeoHash((the_geom),2)
answered yesterday
CL.CL.
512412
512412
thank you for your answer. this speeds up the query for about 30%
– gemo1011
yesterday
add a comment |
thank you for your answer. this speeds up the query for about 30%
– gemo1011
yesterday
thank you for your answer. this speeds up the query for about 30%
– gemo1011
yesterday
thank you for your answer. this speeds up the query for about 30%
– gemo1011
yesterday
add a comment |
gemo1011 is a new contributor. Be nice, and check out our Code of Conduct.
gemo1011 is a new contributor. Be nice, and check out our Code of Conduct.
gemo1011 is a new contributor. Be nice, and check out our Code of Conduct.
gemo1011 is a new contributor. Be nice, and check out our Code of Conduct.
Thanks for contributing an answer to Geographic Information Systems Stack Exchange!
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fgis.stackexchange.com%2fquestions%2f315634%2fbetter-performance-for-geohash-aggregation-with-postgis%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown