Better performance for geohash aggregation with PostGIS












2















I'm using the GeoNames (https://www.geonames.org/) dataset and want to aggregate the points in geohash of a specific precision. Beforehand I'm filtering with an bbox. So this is the query I came up with:



With bbox AS(
SELECT name, the_geom FROM geonames
WHERE ST_Contains((ST_MakeEnvelope(-29.79, 16.38, 64.05, 90.26, 4326)), the_geom)
)
SELECT COUNT(name), ST_GeoHash((the_geom),2)
FROM bbox
GROUP BY ST_GeoHash((the_geom),2)


output looks like this:



+-------+------------+
| count | st_geohash |
+-------+------------+
| 34200 | tm |
+-------+------------+
| 3 | up |
+-------+------------+
| ... | ... |
+-------+------------+


and this is the query plan:



    "HashAggregate  (cost=24426.50..24429.00 rows=200 width=40) (actual time=5805.214..5805.229 rows=121 loops=1)"
" Group Key: st_geohash(bbox.the_geom, 2)"
" CTE bbox"
" -> Bitmap Heap Scan on geonames (cost=376.34..24317.79 rows=3953 width=46) (actual time=454.394..2950.692 rows=3349419 loops=1)"
" Recheck Cond: ('0103000020E610000001000000050000000AD7A3703DCA3DC0E17A14AE476130400AD7A3703DCA3DC0713D0AD7A39056403333333333035040713D0AD7A39056403333333333035040E17A14AE476130400AD7A3703DCA3DC0E17A14AE47613040'::geometry ~ the_geom)"
" Filter: _st_contains('0103000020E610000001000000050000000AD7A3703DCA3DC0E17A14AE476130400AD7A3703DCA3DC0713D0AD7A39056403333333333035040713D0AD7A39056403333333333035040E17A14AE476130400AD7A3703DCA3DC0E17A14AE47613040'::geometry, the_geom)"
" Rows Removed by Filter: 18"
" Heap Blocks: exact=48141"
" -> Bitmap Index Scan on idx_geonames_geom (cost=0.00..375.35 rows=11858 width=0) (actual time=444.950..444.950 rows=3349437 loops=1)"
" Index Cond: ('0103000020E610000001000000050000000AD7A3703DCA3DC0E17A14AE476130400AD7A3703DCA3DC0713D0AD7A39056403333333333035040713D0AD7A39056403333333333035040E17A14AE476130400AD7A3703DCA3DC0E17A14AE47613040'::geometry ~ the_geom)"
" -> CTE Scan on bbox (cost=0.00..88.94 rows=3953 width=64) (actual time=454.401..5030.976 rows=3349419 loops=1)"
"Planning time: 0.492 ms"
"Execution time: 5832.977 ms"


Is there a way to to increase the performence of this query ?
I'm also testing the same thing with Elasticsearch 6.6 and there the query with the same output is a lot faster.



{
"aggregations" : {
"zoomed-in" : {
"filter" : {
"geo_bounding_box" : {
"location" : {
"top_left" : "64.05, -29.79",
"bottom_right" : "16.38, 90.26"
}
}
},
"aggregations":{
"zoom1":{
"geohash_grid" : {
"field": "location",
"precision": 2,
"size": 100000
}
}
}
}
}
}









share|improve this question









New contributor




gemo1011 is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.

























    2















    I'm using the GeoNames (https://www.geonames.org/) dataset and want to aggregate the points in geohash of a specific precision. Beforehand I'm filtering with an bbox. So this is the query I came up with:



    With bbox AS(
    SELECT name, the_geom FROM geonames
    WHERE ST_Contains((ST_MakeEnvelope(-29.79, 16.38, 64.05, 90.26, 4326)), the_geom)
    )
    SELECT COUNT(name), ST_GeoHash((the_geom),2)
    FROM bbox
    GROUP BY ST_GeoHash((the_geom),2)


    output looks like this:



    +-------+------------+
    | count | st_geohash |
    +-------+------------+
    | 34200 | tm |
    +-------+------------+
    | 3 | up |
    +-------+------------+
    | ... | ... |
    +-------+------------+


    and this is the query plan:



        "HashAggregate  (cost=24426.50..24429.00 rows=200 width=40) (actual time=5805.214..5805.229 rows=121 loops=1)"
    " Group Key: st_geohash(bbox.the_geom, 2)"
    " CTE bbox"
    " -> Bitmap Heap Scan on geonames (cost=376.34..24317.79 rows=3953 width=46) (actual time=454.394..2950.692 rows=3349419 loops=1)"
    " Recheck Cond: ('0103000020E610000001000000050000000AD7A3703DCA3DC0E17A14AE476130400AD7A3703DCA3DC0713D0AD7A39056403333333333035040713D0AD7A39056403333333333035040E17A14AE476130400AD7A3703DCA3DC0E17A14AE47613040'::geometry ~ the_geom)"
    " Filter: _st_contains('0103000020E610000001000000050000000AD7A3703DCA3DC0E17A14AE476130400AD7A3703DCA3DC0713D0AD7A39056403333333333035040713D0AD7A39056403333333333035040E17A14AE476130400AD7A3703DCA3DC0E17A14AE47613040'::geometry, the_geom)"
    " Rows Removed by Filter: 18"
    " Heap Blocks: exact=48141"
    " -> Bitmap Index Scan on idx_geonames_geom (cost=0.00..375.35 rows=11858 width=0) (actual time=444.950..444.950 rows=3349437 loops=1)"
    " Index Cond: ('0103000020E610000001000000050000000AD7A3703DCA3DC0E17A14AE476130400AD7A3703DCA3DC0713D0AD7A39056403333333333035040713D0AD7A39056403333333333035040E17A14AE476130400AD7A3703DCA3DC0E17A14AE47613040'::geometry ~ the_geom)"
    " -> CTE Scan on bbox (cost=0.00..88.94 rows=3953 width=64) (actual time=454.401..5030.976 rows=3349419 loops=1)"
    "Planning time: 0.492 ms"
    "Execution time: 5832.977 ms"


    Is there a way to to increase the performence of this query ?
    I'm also testing the same thing with Elasticsearch 6.6 and there the query with the same output is a lot faster.



    {
    "aggregations" : {
    "zoomed-in" : {
    "filter" : {
    "geo_bounding_box" : {
    "location" : {
    "top_left" : "64.05, -29.79",
    "bottom_right" : "16.38, 90.26"
    }
    }
    },
    "aggregations":{
    "zoom1":{
    "geohash_grid" : {
    "field": "location",
    "precision": 2,
    "size": 100000
    }
    }
    }
    }
    }
    }









    share|improve this question









    New contributor




    gemo1011 is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
    Check out our Code of Conduct.























      2












      2








      2








      I'm using the GeoNames (https://www.geonames.org/) dataset and want to aggregate the points in geohash of a specific precision. Beforehand I'm filtering with an bbox. So this is the query I came up with:



      With bbox AS(
      SELECT name, the_geom FROM geonames
      WHERE ST_Contains((ST_MakeEnvelope(-29.79, 16.38, 64.05, 90.26, 4326)), the_geom)
      )
      SELECT COUNT(name), ST_GeoHash((the_geom),2)
      FROM bbox
      GROUP BY ST_GeoHash((the_geom),2)


      output looks like this:



      +-------+------------+
      | count | st_geohash |
      +-------+------------+
      | 34200 | tm |
      +-------+------------+
      | 3 | up |
      +-------+------------+
      | ... | ... |
      +-------+------------+


      and this is the query plan:



          "HashAggregate  (cost=24426.50..24429.00 rows=200 width=40) (actual time=5805.214..5805.229 rows=121 loops=1)"
      " Group Key: st_geohash(bbox.the_geom, 2)"
      " CTE bbox"
      " -> Bitmap Heap Scan on geonames (cost=376.34..24317.79 rows=3953 width=46) (actual time=454.394..2950.692 rows=3349419 loops=1)"
      " Recheck Cond: ('0103000020E610000001000000050000000AD7A3703DCA3DC0E17A14AE476130400AD7A3703DCA3DC0713D0AD7A39056403333333333035040713D0AD7A39056403333333333035040E17A14AE476130400AD7A3703DCA3DC0E17A14AE47613040'::geometry ~ the_geom)"
      " Filter: _st_contains('0103000020E610000001000000050000000AD7A3703DCA3DC0E17A14AE476130400AD7A3703DCA3DC0713D0AD7A39056403333333333035040713D0AD7A39056403333333333035040E17A14AE476130400AD7A3703DCA3DC0E17A14AE47613040'::geometry, the_geom)"
      " Rows Removed by Filter: 18"
      " Heap Blocks: exact=48141"
      " -> Bitmap Index Scan on idx_geonames_geom (cost=0.00..375.35 rows=11858 width=0) (actual time=444.950..444.950 rows=3349437 loops=1)"
      " Index Cond: ('0103000020E610000001000000050000000AD7A3703DCA3DC0E17A14AE476130400AD7A3703DCA3DC0713D0AD7A39056403333333333035040713D0AD7A39056403333333333035040E17A14AE476130400AD7A3703DCA3DC0E17A14AE47613040'::geometry ~ the_geom)"
      " -> CTE Scan on bbox (cost=0.00..88.94 rows=3953 width=64) (actual time=454.401..5030.976 rows=3349419 loops=1)"
      "Planning time: 0.492 ms"
      "Execution time: 5832.977 ms"


      Is there a way to to increase the performence of this query ?
      I'm also testing the same thing with Elasticsearch 6.6 and there the query with the same output is a lot faster.



      {
      "aggregations" : {
      "zoomed-in" : {
      "filter" : {
      "geo_bounding_box" : {
      "location" : {
      "top_left" : "64.05, -29.79",
      "bottom_right" : "16.38, 90.26"
      }
      }
      },
      "aggregations":{
      "zoom1":{
      "geohash_grid" : {
      "field": "location",
      "precision": 2,
      "size": 100000
      }
      }
      }
      }
      }
      }









      share|improve this question









      New contributor




      gemo1011 is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
      Check out our Code of Conduct.












      I'm using the GeoNames (https://www.geonames.org/) dataset and want to aggregate the points in geohash of a specific precision. Beforehand I'm filtering with an bbox. So this is the query I came up with:



      With bbox AS(
      SELECT name, the_geom FROM geonames
      WHERE ST_Contains((ST_MakeEnvelope(-29.79, 16.38, 64.05, 90.26, 4326)), the_geom)
      )
      SELECT COUNT(name), ST_GeoHash((the_geom),2)
      FROM bbox
      GROUP BY ST_GeoHash((the_geom),2)


      output looks like this:



      +-------+------------+
      | count | st_geohash |
      +-------+------------+
      | 34200 | tm |
      +-------+------------+
      | 3 | up |
      +-------+------------+
      | ... | ... |
      +-------+------------+


      and this is the query plan:



          "HashAggregate  (cost=24426.50..24429.00 rows=200 width=40) (actual time=5805.214..5805.229 rows=121 loops=1)"
      " Group Key: st_geohash(bbox.the_geom, 2)"
      " CTE bbox"
      " -> Bitmap Heap Scan on geonames (cost=376.34..24317.79 rows=3953 width=46) (actual time=454.394..2950.692 rows=3349419 loops=1)"
      " Recheck Cond: ('0103000020E610000001000000050000000AD7A3703DCA3DC0E17A14AE476130400AD7A3703DCA3DC0713D0AD7A39056403333333333035040713D0AD7A39056403333333333035040E17A14AE476130400AD7A3703DCA3DC0E17A14AE47613040'::geometry ~ the_geom)"
      " Filter: _st_contains('0103000020E610000001000000050000000AD7A3703DCA3DC0E17A14AE476130400AD7A3703DCA3DC0713D0AD7A39056403333333333035040713D0AD7A39056403333333333035040E17A14AE476130400AD7A3703DCA3DC0E17A14AE47613040'::geometry, the_geom)"
      " Rows Removed by Filter: 18"
      " Heap Blocks: exact=48141"
      " -> Bitmap Index Scan on idx_geonames_geom (cost=0.00..375.35 rows=11858 width=0) (actual time=444.950..444.950 rows=3349437 loops=1)"
      " Index Cond: ('0103000020E610000001000000050000000AD7A3703DCA3DC0E17A14AE476130400AD7A3703DCA3DC0713D0AD7A39056403333333333035040713D0AD7A39056403333333333035040E17A14AE476130400AD7A3703DCA3DC0E17A14AE47613040'::geometry ~ the_geom)"
      " -> CTE Scan on bbox (cost=0.00..88.94 rows=3953 width=64) (actual time=454.401..5030.976 rows=3349419 loops=1)"
      "Planning time: 0.492 ms"
      "Execution time: 5832.977 ms"


      Is there a way to to increase the performence of this query ?
      I'm also testing the same thing with Elasticsearch 6.6 and there the query with the same output is a lot faster.



      {
      "aggregations" : {
      "zoomed-in" : {
      "filter" : {
      "geo_bounding_box" : {
      "location" : {
      "top_left" : "64.05, -29.79",
      "bottom_right" : "16.38, 90.26"
      }
      }
      },
      "aggregations":{
      "zoom1":{
      "geohash_grid" : {
      "field": "location",
      "precision": 2,
      "size": 100000
      }
      }
      }
      }
      }
      }






      postgis postgresql json geohash elasticsearch






      share|improve this question









      New contributor




      gemo1011 is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
      Check out our Code of Conduct.











      share|improve this question









      New contributor




      gemo1011 is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
      Check out our Code of Conduct.









      share|improve this question




      share|improve this question








      edited yesterday







      gemo1011













      New contributor




      gemo1011 is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
      Check out our Code of Conduct.









      asked yesterday









      gemo1011gemo1011

      153




      153




      New contributor




      gemo1011 is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
      Check out our Code of Conduct.





      New contributor





      gemo1011 is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
      Check out our Code of Conduct.






      gemo1011 is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
      Check out our Code of Conduct.






















          1 Answer
          1






          active

          oldest

          votes


















          2














          In PostgreSQL, common table expressions are always materialized. (This will change in version 12.)



          To allow more optimizations, move bbox into a view, or inline it as a subquery:



          SELECT COUNT(name), ST_GeoHash((the_geom),2) 
          FROM (
          SELECT name, the_geom FROM geonames
          WHERE ST_Contains((ST_MakeEnvelope(-29.79, 16.38, 64.05, 90.26, 4326)), the_geom)
          ) AS bbox
          GROUP BY ST_GeoHash((the_geom),2)





          share|improve this answer
























          • thank you for your answer. this speeds up the query for about 30%

            – gemo1011
            yesterday











          Your Answer








          StackExchange.ready(function() {
          var channelOptions = {
          tags: "".split(" "),
          id: "79"
          };
          initTagRenderer("".split(" "), "".split(" "), channelOptions);

          StackExchange.using("externalEditor", function() {
          // Have to fire editor after snippets, if snippets enabled
          if (StackExchange.settings.snippets.snippetsEnabled) {
          StackExchange.using("snippets", function() {
          createEditor();
          });
          }
          else {
          createEditor();
          }
          });

          function createEditor() {
          StackExchange.prepareEditor({
          heartbeatType: 'answer',
          autoActivateHeartbeat: false,
          convertImagesToLinks: false,
          noModals: true,
          showLowRepImageUploadWarning: true,
          reputationToPostImages: null,
          bindNavPrevention: true,
          postfix: "",
          imageUploader: {
          brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
          contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
          allowUrls: true
          },
          onDemand: true,
          discardSelector: ".discard-answer"
          ,immediatelyShowMarkdownHelp:true
          });


          }
          });






          gemo1011 is a new contributor. Be nice, and check out our Code of Conduct.










          draft saved

          draft discarded


















          StackExchange.ready(
          function () {
          StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fgis.stackexchange.com%2fquestions%2f315634%2fbetter-performance-for-geohash-aggregation-with-postgis%23new-answer', 'question_page');
          }
          );

          Post as a guest















          Required, but never shown

























          1 Answer
          1






          active

          oldest

          votes








          1 Answer
          1






          active

          oldest

          votes









          active

          oldest

          votes






          active

          oldest

          votes









          2














          In PostgreSQL, common table expressions are always materialized. (This will change in version 12.)



          To allow more optimizations, move bbox into a view, or inline it as a subquery:



          SELECT COUNT(name), ST_GeoHash((the_geom),2) 
          FROM (
          SELECT name, the_geom FROM geonames
          WHERE ST_Contains((ST_MakeEnvelope(-29.79, 16.38, 64.05, 90.26, 4326)), the_geom)
          ) AS bbox
          GROUP BY ST_GeoHash((the_geom),2)





          share|improve this answer
























          • thank you for your answer. this speeds up the query for about 30%

            – gemo1011
            yesterday
















          2














          In PostgreSQL, common table expressions are always materialized. (This will change in version 12.)



          To allow more optimizations, move bbox into a view, or inline it as a subquery:



          SELECT COUNT(name), ST_GeoHash((the_geom),2) 
          FROM (
          SELECT name, the_geom FROM geonames
          WHERE ST_Contains((ST_MakeEnvelope(-29.79, 16.38, 64.05, 90.26, 4326)), the_geom)
          ) AS bbox
          GROUP BY ST_GeoHash((the_geom),2)





          share|improve this answer
























          • thank you for your answer. this speeds up the query for about 30%

            – gemo1011
            yesterday














          2












          2








          2







          In PostgreSQL, common table expressions are always materialized. (This will change in version 12.)



          To allow more optimizations, move bbox into a view, or inline it as a subquery:



          SELECT COUNT(name), ST_GeoHash((the_geom),2) 
          FROM (
          SELECT name, the_geom FROM geonames
          WHERE ST_Contains((ST_MakeEnvelope(-29.79, 16.38, 64.05, 90.26, 4326)), the_geom)
          ) AS bbox
          GROUP BY ST_GeoHash((the_geom),2)





          share|improve this answer













          In PostgreSQL, common table expressions are always materialized. (This will change in version 12.)



          To allow more optimizations, move bbox into a view, or inline it as a subquery:



          SELECT COUNT(name), ST_GeoHash((the_geom),2) 
          FROM (
          SELECT name, the_geom FROM geonames
          WHERE ST_Contains((ST_MakeEnvelope(-29.79, 16.38, 64.05, 90.26, 4326)), the_geom)
          ) AS bbox
          GROUP BY ST_GeoHash((the_geom),2)






          share|improve this answer












          share|improve this answer



          share|improve this answer










          answered yesterday









          CL.CL.

          512412




          512412













          • thank you for your answer. this speeds up the query for about 30%

            – gemo1011
            yesterday



















          • thank you for your answer. this speeds up the query for about 30%

            – gemo1011
            yesterday

















          thank you for your answer. this speeds up the query for about 30%

          – gemo1011
          yesterday





          thank you for your answer. this speeds up the query for about 30%

          – gemo1011
          yesterday










          gemo1011 is a new contributor. Be nice, and check out our Code of Conduct.










          draft saved

          draft discarded


















          gemo1011 is a new contributor. Be nice, and check out our Code of Conduct.













          gemo1011 is a new contributor. Be nice, and check out our Code of Conduct.












          gemo1011 is a new contributor. Be nice, and check out our Code of Conduct.
















          Thanks for contributing an answer to Geographic Information Systems Stack Exchange!


          • Please be sure to answer the question. Provide details and share your research!

          But avoid



          • Asking for help, clarification, or responding to other answers.

          • Making statements based on opinion; back them up with references or personal experience.


          To learn more, see our tips on writing great answers.




          draft saved


          draft discarded














          StackExchange.ready(
          function () {
          StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fgis.stackexchange.com%2fquestions%2f315634%2fbetter-performance-for-geohash-aggregation-with-postgis%23new-answer', 'question_page');
          }
          );

          Post as a guest















          Required, but never shown





















































          Required, but never shown














          Required, but never shown












          Required, but never shown







          Required, but never shown

































          Required, but never shown














          Required, but never shown












          Required, but never shown







          Required, but never shown







          Popular posts from this blog

          Plaza Victoria

          Brian Clough

          Cáceres