Better performance for geohash aggregation with PostGIS

I'm using the GeoNames (https://www.geonames.org/) dataset and want to aggregate the points in geohash of a specific precision. Beforehand I'm filtering with an bbox. So this is the query I came up with:

With bbox AS(

SELECT name, the_geom FROM geonames 

WHERE ST_Contains((ST_MakeEnvelope(-29.79, 16.38, 64.05, 90.26, 4326)), the_geom)

)

SELECT COUNT(name), ST_GeoHash((the_geom),2) 

FROM bbox

   GROUP BY ST_GeoHash((the_geom),2)

output looks like this:

+-------+------------+

| count | st_geohash |

+-------+------------+

| 34200 | tm         |

+-------+------------+

| 3     | up         |

+-------+------------+

| ...   | ...        |

+-------+------------+

and this is the query plan:

    "HashAggregate  (cost=24426.50..24429.00 rows=200 width=40) (actual time=5805.214..5805.229 rows=121 loops=1)"

"  Group Key: st_geohash(bbox.the_geom, 2)"

"  CTE bbox"

"    ->  Bitmap Heap Scan on geonames  (cost=376.34..24317.79 rows=3953 width=46) (actual time=454.394..2950.692 rows=3349419 loops=1)"

"          Recheck Cond: ('0103000020E610000001000000050000000AD7A3703DCA3DC0E17A14AE476130400AD7A3703DCA3DC0713D0AD7A39056403333333333035040713D0AD7A39056403333333333035040E17A14AE476130400AD7A3703DCA3DC0E17A14AE47613040'::geometry ~ the_geom)"

"          Filter: _st_contains('0103000020E610000001000000050000000AD7A3703DCA3DC0E17A14AE476130400AD7A3703DCA3DC0713D0AD7A39056403333333333035040713D0AD7A39056403333333333035040E17A14AE476130400AD7A3703DCA3DC0E17A14AE47613040'::geometry, the_geom)"

"          Rows Removed by Filter: 18"

"          Heap Blocks: exact=48141"

"          ->  Bitmap Index Scan on idx_geonames_geom  (cost=0.00..375.35 rows=11858 width=0) (actual time=444.950..444.950 rows=3349437 loops=1)"

"                Index Cond: ('0103000020E610000001000000050000000AD7A3703DCA3DC0E17A14AE476130400AD7A3703DCA3DC0713D0AD7A39056403333333333035040713D0AD7A39056403333333333035040E17A14AE476130400AD7A3703DCA3DC0E17A14AE47613040'::geometry ~ the_geom)"

"  ->  CTE Scan on bbox  (cost=0.00..88.94 rows=3953 width=64) (actual time=454.401..5030.976 rows=3349419 loops=1)"

"Planning time: 0.492 ms"

"Execution time: 5832.977 ms"

Is there a way to to increase the performence of this query ?
I'm also testing the same thing with Elasticsearch 6.6 and there the query with the same output is a lot faster.

{

    "aggregations" : {

        "zoomed-in" : {

            "filter" : {

                "geo_bounding_box" : {

                    "location" : {

                        "top_left" : "64.05, -29.79",

                        "bottom_right" : "16.38, 90.26"

                    }

                }

            },

            "aggregations":{

                "zoom1":{

                    "geohash_grid" : {

                        "field": "location",

                        "precision": 2,

                        "size": 100000

                    }

                }

            }

        }

    }

}

edited yesterday

asked yesterday

gemo1011

153

New contributor

add a comment |

With bbox AS(

SELECT name, the_geom FROM geonames 

WHERE ST_Contains((ST_MakeEnvelope(-29.79, 16.38, 64.05, 90.26, 4326)), the_geom)

)

SELECT COUNT(name), ST_GeoHash((the_geom),2) 

FROM bbox

   GROUP BY ST_GeoHash((the_geom),2)

output looks like this:

+-------+------------+

| count | st_geohash |

+-------+------------+

| 34200 | tm         |

+-------+------------+

| 3     | up         |

+-------+------------+

| ...   | ...        |

+-------+------------+

and this is the query plan:

    "HashAggregate  (cost=24426.50..24429.00 rows=200 width=40) (actual time=5805.214..5805.229 rows=121 loops=1)"

"  Group Key: st_geohash(bbox.the_geom, 2)"

"  CTE bbox"

"    ->  Bitmap Heap Scan on geonames  (cost=376.34..24317.79 rows=3953 width=46) (actual time=454.394..2950.692 rows=3349419 loops=1)"

"          Recheck Cond: ('0103000020E610000001000000050000000AD7A3703DCA3DC0E17A14AE476130400AD7A3703DCA3DC0713D0AD7A39056403333333333035040713D0AD7A39056403333333333035040E17A14AE476130400AD7A3703DCA3DC0E17A14AE47613040'::geometry ~ the_geom)"

"          Filter: _st_contains('0103000020E610000001000000050000000AD7A3703DCA3DC0E17A14AE476130400AD7A3703DCA3DC0713D0AD7A39056403333333333035040713D0AD7A39056403333333333035040E17A14AE476130400AD7A3703DCA3DC0E17A14AE47613040'::geometry, the_geom)"

"          Rows Removed by Filter: 18"

"          Heap Blocks: exact=48141"

"          ->  Bitmap Index Scan on idx_geonames_geom  (cost=0.00..375.35 rows=11858 width=0) (actual time=444.950..444.950 rows=3349437 loops=1)"

"                Index Cond: ('0103000020E610000001000000050000000AD7A3703DCA3DC0E17A14AE476130400AD7A3703DCA3DC0713D0AD7A39056403333333333035040713D0AD7A39056403333333333035040E17A14AE476130400AD7A3703DCA3DC0E17A14AE47613040'::geometry ~ the_geom)"

"  ->  CTE Scan on bbox  (cost=0.00..88.94 rows=3953 width=64) (actual time=454.401..5030.976 rows=3349419 loops=1)"

"Planning time: 0.492 ms"

"Execution time: 5832.977 ms"

Is there a way to to increase the performence of this query ?
I'm also testing the same thing with Elasticsearch 6.6 and there the query with the same output is a lot faster.

{

    "aggregations" : {

        "zoomed-in" : {

            "filter" : {

                "geo_bounding_box" : {

                    "location" : {

                        "top_left" : "64.05, -29.79",

                        "bottom_right" : "16.38, 90.26"

                    }

                }

            },

            "aggregations":{

                "zoom1":{

                    "geohash_grid" : {

                        "field": "location",

                        "precision": 2,

                        "size": 100000

                    }

                }

            }

        }

    }

}

edited yesterday

asked yesterday

gemo1011

153

New contributor

add a comment |

With bbox AS(

SELECT name, the_geom FROM geonames 

WHERE ST_Contains((ST_MakeEnvelope(-29.79, 16.38, 64.05, 90.26, 4326)), the_geom)

)

SELECT COUNT(name), ST_GeoHash((the_geom),2) 

FROM bbox

   GROUP BY ST_GeoHash((the_geom),2)

output looks like this:

+-------+------------+

| count | st_geohash |

+-------+------------+

| 34200 | tm         |

+-------+------------+

| 3     | up         |

+-------+------------+

| ...   | ...        |

+-------+------------+

and this is the query plan:

    "HashAggregate  (cost=24426.50..24429.00 rows=200 width=40) (actual time=5805.214..5805.229 rows=121 loops=1)"

"  Group Key: st_geohash(bbox.the_geom, 2)"

"  CTE bbox"

"    ->  Bitmap Heap Scan on geonames  (cost=376.34..24317.79 rows=3953 width=46) (actual time=454.394..2950.692 rows=3349419 loops=1)"

"          Recheck Cond: ('0103000020E610000001000000050000000AD7A3703DCA3DC0E17A14AE476130400AD7A3703DCA3DC0713D0AD7A39056403333333333035040713D0AD7A39056403333333333035040E17A14AE476130400AD7A3703DCA3DC0E17A14AE47613040'::geometry ~ the_geom)"

"          Filter: _st_contains('0103000020E610000001000000050000000AD7A3703DCA3DC0E17A14AE476130400AD7A3703DCA3DC0713D0AD7A39056403333333333035040713D0AD7A39056403333333333035040E17A14AE476130400AD7A3703DCA3DC0E17A14AE47613040'::geometry, the_geom)"

"          Rows Removed by Filter: 18"

"          Heap Blocks: exact=48141"

"          ->  Bitmap Index Scan on idx_geonames_geom  (cost=0.00..375.35 rows=11858 width=0) (actual time=444.950..444.950 rows=3349437 loops=1)"

"                Index Cond: ('0103000020E610000001000000050000000AD7A3703DCA3DC0E17A14AE476130400AD7A3703DCA3DC0713D0AD7A39056403333333333035040713D0AD7A39056403333333333035040E17A14AE476130400AD7A3703DCA3DC0E17A14AE47613040'::geometry ~ the_geom)"

"  ->  CTE Scan on bbox  (cost=0.00..88.94 rows=3953 width=64) (actual time=454.401..5030.976 rows=3349419 loops=1)"

"Planning time: 0.492 ms"

"Execution time: 5832.977 ms"

Is there a way to to increase the performence of this query ?
I'm also testing the same thing with Elasticsearch 6.6 and there the query with the same output is a lot faster.

{

    "aggregations" : {

        "zoomed-in" : {

            "filter" : {

                "geo_bounding_box" : {

                    "location" : {

                        "top_left" : "64.05, -29.79",

                        "bottom_right" : "16.38, 90.26"

                    }

                }

            },

            "aggregations":{

                "zoom1":{

                    "geohash_grid" : {

                        "field": "location",

                        "precision": 2,

                        "size": 100000

                    }

                }

            }

        }

    }

}

edited yesterday

asked yesterday

gemo1011

153

New contributor

With bbox AS(

SELECT name, the_geom FROM geonames 

WHERE ST_Contains((ST_MakeEnvelope(-29.79, 16.38, 64.05, 90.26, 4326)), the_geom)

)

SELECT COUNT(name), ST_GeoHash((the_geom),2) 

FROM bbox

   GROUP BY ST_GeoHash((the_geom),2)

output looks like this:

+-------+------------+

| count | st_geohash |

+-------+------------+

| 34200 | tm         |

+-------+------------+

| 3     | up         |

+-------+------------+

| ...   | ...        |

+-------+------------+

and this is the query plan:

    "HashAggregate  (cost=24426.50..24429.00 rows=200 width=40) (actual time=5805.214..5805.229 rows=121 loops=1)"

"  Group Key: st_geohash(bbox.the_geom, 2)"

"  CTE bbox"

"    ->  Bitmap Heap Scan on geonames  (cost=376.34..24317.79 rows=3953 width=46) (actual time=454.394..2950.692 rows=3349419 loops=1)"

"          Recheck Cond: ('0103000020E610000001000000050000000AD7A3703DCA3DC0E17A14AE476130400AD7A3703DCA3DC0713D0AD7A39056403333333333035040713D0AD7A39056403333333333035040E17A14AE476130400AD7A3703DCA3DC0E17A14AE47613040'::geometry ~ the_geom)"

"          Filter: _st_contains('0103000020E610000001000000050000000AD7A3703DCA3DC0E17A14AE476130400AD7A3703DCA3DC0713D0AD7A39056403333333333035040713D0AD7A39056403333333333035040E17A14AE476130400AD7A3703DCA3DC0E17A14AE47613040'::geometry, the_geom)"

"          Rows Removed by Filter: 18"

"          Heap Blocks: exact=48141"

"          ->  Bitmap Index Scan on idx_geonames_geom  (cost=0.00..375.35 rows=11858 width=0) (actual time=444.950..444.950 rows=3349437 loops=1)"

"                Index Cond: ('0103000020E610000001000000050000000AD7A3703DCA3DC0E17A14AE476130400AD7A3703DCA3DC0713D0AD7A39056403333333333035040713D0AD7A39056403333333333035040E17A14AE476130400AD7A3703DCA3DC0E17A14AE47613040'::geometry ~ the_geom)"

"  ->  CTE Scan on bbox  (cost=0.00..88.94 rows=3953 width=64) (actual time=454.401..5030.976 rows=3349419 loops=1)"

"Planning time: 0.492 ms"

"Execution time: 5832.977 ms"

Is there a way to to increase the performence of this query ?
I'm also testing the same thing with Elasticsearch 6.6 and there the query with the same output is a lot faster.

{

    "aggregations" : {

        "zoomed-in" : {

            "filter" : {

                "geo_bounding_box" : {

                    "location" : {

                        "top_left" : "64.05, -29.79",

                        "bottom_right" : "16.38, 90.26"

                    }

                }

            },

            "aggregations":{

                "zoom1":{

                    "geohash_grid" : {

                        "field": "location",

                        "precision": 2,

                        "size": 100000

                    }

                }

            }

        }

    }

}

postgis postgresql json geohash elasticsearch

edited yesterday

asked yesterday

gemo1011

153

New contributor

edited yesterday

asked yesterday

gemo1011

153

New contributor

edited yesterday

asked yesterday

gemo1011

153

New contributor

asked yesterday

gemo1011

153

asked yesterday

gemo1011

153

New contributor

gemo1011 is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.

add a comment |

1 Answer
1

active

oldest

votes

In PostgreSQL, common table expressions are always materialized. (This will change in version 12.)

To allow more optimizations, move bbox into a view, or inline it as a subquery:

SELECT COUNT(name), ST_GeoHash((the_geom),2) 

FROM (

  SELECT name, the_geom FROM geonames 

  WHERE ST_Contains((ST_MakeEnvelope(-29.79, 16.38, 64.05, 90.26, 4326)), the_geom)

  ) AS bbox

GROUP BY ST_GeoHash((the_geom),2)

answered yesterday

CL.

512412

thank you for your answer. this speeds up the query for about 30%

– gemo1011
yesterday

add a comment |

Your Answer

StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "79"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});

function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: false,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: null,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});

}
});

gemo1011 is a new contributor. Be nice, and check out our Code of Conduct.

draft saved

draft discarded

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fgis.stackexchange.com%2fquestions%2f315634%2fbetter-performance-for-geohash-aggregation-with-postgis%23new-answer', 'question_page');
}
);

Post as a guest

Name

Required, but never shown

1 Answer
1

active

oldest

votes

1 Answer
1

active

oldest

votes

In PostgreSQL, common table expressions are always materialized. (This will change in version 12.)

To allow more optimizations, move bbox into a view, or inline it as a subquery:

SELECT COUNT(name), ST_GeoHash((the_geom),2) 

FROM (

  SELECT name, the_geom FROM geonames 

  WHERE ST_Contains((ST_MakeEnvelope(-29.79, 16.38, 64.05, 90.26, 4326)), the_geom)

  ) AS bbox

GROUP BY ST_GeoHash((the_geom),2)

answered yesterday

CL.

512412

thank you for your answer. this speeds up the query for about 30%

– gemo1011
yesterday

add a comment |

In PostgreSQL, common table expressions are always materialized. (This will change in version 12.)

To allow more optimizations, move bbox into a view, or inline it as a subquery:

SELECT COUNT(name), ST_GeoHash((the_geom),2) 

FROM (

  SELECT name, the_geom FROM geonames 

  WHERE ST_Contains((ST_MakeEnvelope(-29.79, 16.38, 64.05, 90.26, 4326)), the_geom)

  ) AS bbox

GROUP BY ST_GeoHash((the_geom),2)

answered yesterday

CL.

512412

thank you for your answer. this speeds up the query for about 30%

– gemo1011
yesterday

add a comment |

In PostgreSQL, common table expressions are always materialized. (This will change in version 12.)

To allow more optimizations, move bbox into a view, or inline it as a subquery:

SELECT COUNT(name), ST_GeoHash((the_geom),2) 

FROM (

  SELECT name, the_geom FROM geonames 

  WHERE ST_Contains((ST_MakeEnvelope(-29.79, 16.38, 64.05, 90.26, 4326)), the_geom)

  ) AS bbox

GROUP BY ST_GeoHash((the_geom),2)

answered yesterday

CL.

512412

In PostgreSQL, common table expressions are always materialized. (This will change in version 12.)

To allow more optimizations, move bbox into a view, or inline it as a subquery:

SELECT COUNT(name), ST_GeoHash((the_geom),2) 

FROM (

  SELECT name, the_geom FROM geonames 

  WHERE ST_Contains((ST_MakeEnvelope(-29.79, 16.38, 64.05, 90.26, 4326)), the_geom)

  ) AS bbox

GROUP BY ST_GeoHash((the_geom),2)

answered yesterday

CL.

512412

answered yesterday

CL.

512412

answered yesterday

CL.

512412

answered yesterday

CL.

512412

thank you for your answer. this speeds up the query for about 30%

– gemo1011
yesterday

add a comment |

thank you for your answer. this speeds up the query for about 30%

– gemo1011
yesterday

thank you for your answer. this speeds up the query for about 30%

– gemo1011
yesterday

add a comment |

gemo1011 is a new contributor. Be nice, and check out our Code of Conduct.

draft saved

draft discarded

gemo1011 is a new contributor. Be nice, and check out our Code of Conduct.

Thanks for contributing an answer to Geographic Information Systems Stack Exchange!

Please be sure to answer the question. Provide details and share your research!

But avoid …

Asking for help, clarification, or responding to other answers.

Making statements based on opinion; back them up with references or personal experience.

To learn more, see our tips on writing great answers.

draft saved

draft discarded

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Name

Required, but never shown

Name

Required, but never shown

This page is only for reference, If you need detailed information, please check here

搜尋此網誌

Csdrhrt