How can I restructure a table of points with repeating lengthy text ids?











up vote
3
down vote

favorite












I have a dataset that looks like this:



4cd1c79bf6a87692@@@b6aa5eeb-8a83-433f-a0ea-6c72708abc1d,82,49.226215,9.23175,244,1531840519
4cd1c79bf6a87692@@@b6aa5eeb-8a83-433f-a0ea-6c72708abc1d,86,49.22584,9.229898,260,1531840525
4cd1c79bf6a87692@@@b6aa5eeb-8a83-433f-a0ea-6c72708abc1d,82,49.225726,9.22834,263,1531840530
ff9d6b955f214ff0@@@599e1d04-7fc4-4ce4-8c92-22e9d7ac6596,84,49.456019,11.026769,123,1531840479
ff9d6b955f214ff0@@@599e1d04-7fc4-4ce4-8c92-22e9d7ac6596,82,49.455454,11.02813,122,1531840484
ff9d6b955f214ff0@@@599e1d04-7fc4-4ce4-8c92-22e9d7ac6596,75,49.454935,11.029419,121,1531840489


Multiple tracks of cars of a car-sharing service. The first entry is a random ID that marks one rental. (I guess something like hash(user, car, RNG) ). Every row consists of





  • ID, speed, lat, lng, orientation (deg), epoch


...but the ID column takes up at least half of the storage. (Don't ask me why this format was chosen).



I want to keep the IDs, but I'm not sure how to do it. (In PostgreSQL with Django)



My idea is to use one table that only contains the ID strings and the second table that contains all the other information and a foreign key on the first table.



I just wonder if there is already a feature for this problem directly within PostgreSQL as I can imagine that multiple repetitions of a string in a column is a common problem.



Update:




  • I worked my way through the full data set, and found only 439252 unique ID Strings.


  • BigInt + @@@ + uuid: Thanks for pointing that out, however, I will not be able to exploit it as the string-format is not fixed throughout the whole data dump. (I had a manual look at the first thousand or so entries and they use it, but when I tried to implement the approach, I found other formats also.)











share|improve this question




















  • 2




    How much storage are we talking about that it requires action?
    – mustaccio
    Nov 28 at 17:53






  • 2




    What's the point? You'll need more space and more work to maintain both tables.
    – McNets
    Nov 28 at 18:02






  • 1




    One table for Id's+Surrogate Key + Index, Current table + New PK. Is there a chance to change current Id's by some serial field?
    – McNets
    Nov 28 at 18:21








  • 2




    @McNets but wouldn't storing that ID only once, and the ID being something like an integer, reduce storage? Then you are only storing that long ID 1 time vs 3 times, like in the example above.
    – Anthony Genovese
    Nov 28 at 18:24






  • 1




    But you still have these ID's in one table >>> same disk space, plus additional space for the new key. And keep in mind you'll need 2 reads or 2 writes to select/insert that table. You would save space if you had more than one table
    – McNets
    Nov 28 at 18:31















up vote
3
down vote

favorite












I have a dataset that looks like this:



4cd1c79bf6a87692@@@b6aa5eeb-8a83-433f-a0ea-6c72708abc1d,82,49.226215,9.23175,244,1531840519
4cd1c79bf6a87692@@@b6aa5eeb-8a83-433f-a0ea-6c72708abc1d,86,49.22584,9.229898,260,1531840525
4cd1c79bf6a87692@@@b6aa5eeb-8a83-433f-a0ea-6c72708abc1d,82,49.225726,9.22834,263,1531840530
ff9d6b955f214ff0@@@599e1d04-7fc4-4ce4-8c92-22e9d7ac6596,84,49.456019,11.026769,123,1531840479
ff9d6b955f214ff0@@@599e1d04-7fc4-4ce4-8c92-22e9d7ac6596,82,49.455454,11.02813,122,1531840484
ff9d6b955f214ff0@@@599e1d04-7fc4-4ce4-8c92-22e9d7ac6596,75,49.454935,11.029419,121,1531840489


Multiple tracks of cars of a car-sharing service. The first entry is a random ID that marks one rental. (I guess something like hash(user, car, RNG) ). Every row consists of





  • ID, speed, lat, lng, orientation (deg), epoch


...but the ID column takes up at least half of the storage. (Don't ask me why this format was chosen).



I want to keep the IDs, but I'm not sure how to do it. (In PostgreSQL with Django)



My idea is to use one table that only contains the ID strings and the second table that contains all the other information and a foreign key on the first table.



I just wonder if there is already a feature for this problem directly within PostgreSQL as I can imagine that multiple repetitions of a string in a column is a common problem.



Update:




  • I worked my way through the full data set, and found only 439252 unique ID Strings.


  • BigInt + @@@ + uuid: Thanks for pointing that out, however, I will not be able to exploit it as the string-format is not fixed throughout the whole data dump. (I had a manual look at the first thousand or so entries and they use it, but when I tried to implement the approach, I found other formats also.)











share|improve this question




















  • 2




    How much storage are we talking about that it requires action?
    – mustaccio
    Nov 28 at 17:53






  • 2




    What's the point? You'll need more space and more work to maintain both tables.
    – McNets
    Nov 28 at 18:02






  • 1




    One table for Id's+Surrogate Key + Index, Current table + New PK. Is there a chance to change current Id's by some serial field?
    – McNets
    Nov 28 at 18:21








  • 2




    @McNets but wouldn't storing that ID only once, and the ID being something like an integer, reduce storage? Then you are only storing that long ID 1 time vs 3 times, like in the example above.
    – Anthony Genovese
    Nov 28 at 18:24






  • 1




    But you still have these ID's in one table >>> same disk space, plus additional space for the new key. And keep in mind you'll need 2 reads or 2 writes to select/insert that table. You would save space if you had more than one table
    – McNets
    Nov 28 at 18:31













up vote
3
down vote

favorite









up vote
3
down vote

favorite











I have a dataset that looks like this:



4cd1c79bf6a87692@@@b6aa5eeb-8a83-433f-a0ea-6c72708abc1d,82,49.226215,9.23175,244,1531840519
4cd1c79bf6a87692@@@b6aa5eeb-8a83-433f-a0ea-6c72708abc1d,86,49.22584,9.229898,260,1531840525
4cd1c79bf6a87692@@@b6aa5eeb-8a83-433f-a0ea-6c72708abc1d,82,49.225726,9.22834,263,1531840530
ff9d6b955f214ff0@@@599e1d04-7fc4-4ce4-8c92-22e9d7ac6596,84,49.456019,11.026769,123,1531840479
ff9d6b955f214ff0@@@599e1d04-7fc4-4ce4-8c92-22e9d7ac6596,82,49.455454,11.02813,122,1531840484
ff9d6b955f214ff0@@@599e1d04-7fc4-4ce4-8c92-22e9d7ac6596,75,49.454935,11.029419,121,1531840489


Multiple tracks of cars of a car-sharing service. The first entry is a random ID that marks one rental. (I guess something like hash(user, car, RNG) ). Every row consists of





  • ID, speed, lat, lng, orientation (deg), epoch


...but the ID column takes up at least half of the storage. (Don't ask me why this format was chosen).



I want to keep the IDs, but I'm not sure how to do it. (In PostgreSQL with Django)



My idea is to use one table that only contains the ID strings and the second table that contains all the other information and a foreign key on the first table.



I just wonder if there is already a feature for this problem directly within PostgreSQL as I can imagine that multiple repetitions of a string in a column is a common problem.



Update:




  • I worked my way through the full data set, and found only 439252 unique ID Strings.


  • BigInt + @@@ + uuid: Thanks for pointing that out, however, I will not be able to exploit it as the string-format is not fixed throughout the whole data dump. (I had a manual look at the first thousand or so entries and they use it, but when I tried to implement the approach, I found other formats also.)











share|improve this question















I have a dataset that looks like this:



4cd1c79bf6a87692@@@b6aa5eeb-8a83-433f-a0ea-6c72708abc1d,82,49.226215,9.23175,244,1531840519
4cd1c79bf6a87692@@@b6aa5eeb-8a83-433f-a0ea-6c72708abc1d,86,49.22584,9.229898,260,1531840525
4cd1c79bf6a87692@@@b6aa5eeb-8a83-433f-a0ea-6c72708abc1d,82,49.225726,9.22834,263,1531840530
ff9d6b955f214ff0@@@599e1d04-7fc4-4ce4-8c92-22e9d7ac6596,84,49.456019,11.026769,123,1531840479
ff9d6b955f214ff0@@@599e1d04-7fc4-4ce4-8c92-22e9d7ac6596,82,49.455454,11.02813,122,1531840484
ff9d6b955f214ff0@@@599e1d04-7fc4-4ce4-8c92-22e9d7ac6596,75,49.454935,11.029419,121,1531840489


Multiple tracks of cars of a car-sharing service. The first entry is a random ID that marks one rental. (I guess something like hash(user, car, RNG) ). Every row consists of





  • ID, speed, lat, lng, orientation (deg), epoch


...but the ID column takes up at least half of the storage. (Don't ask me why this format was chosen).



I want to keep the IDs, but I'm not sure how to do it. (In PostgreSQL with Django)



My idea is to use one table that only contains the ID strings and the second table that contains all the other information and a foreign key on the first table.



I just wonder if there is already a feature for this problem directly within PostgreSQL as I can imagine that multiple repetitions of a string in a column is a common problem.



Update:




  • I worked my way through the full data set, and found only 439252 unique ID Strings.


  • BigInt + @@@ + uuid: Thanks for pointing that out, however, I will not be able to exploit it as the string-format is not fixed throughout the whole data dump. (I had a manual look at the first thousand or so entries and they use it, but when I tried to implement the approach, I found other formats also.)








postgresql database-design






share|improve this question















share|improve this question













share|improve this question




share|improve this question








edited Nov 29 at 10:39

























asked Nov 28 at 17:31









FooTheBar

1163




1163








  • 2




    How much storage are we talking about that it requires action?
    – mustaccio
    Nov 28 at 17:53






  • 2




    What's the point? You'll need more space and more work to maintain both tables.
    – McNets
    Nov 28 at 18:02






  • 1




    One table for Id's+Surrogate Key + Index, Current table + New PK. Is there a chance to change current Id's by some serial field?
    – McNets
    Nov 28 at 18:21








  • 2




    @McNets but wouldn't storing that ID only once, and the ID being something like an integer, reduce storage? Then you are only storing that long ID 1 time vs 3 times, like in the example above.
    – Anthony Genovese
    Nov 28 at 18:24






  • 1




    But you still have these ID's in one table >>> same disk space, plus additional space for the new key. And keep in mind you'll need 2 reads or 2 writes to select/insert that table. You would save space if you had more than one table
    – McNets
    Nov 28 at 18:31














  • 2




    How much storage are we talking about that it requires action?
    – mustaccio
    Nov 28 at 17:53






  • 2




    What's the point? You'll need more space and more work to maintain both tables.
    – McNets
    Nov 28 at 18:02






  • 1




    One table for Id's+Surrogate Key + Index, Current table + New PK. Is there a chance to change current Id's by some serial field?
    – McNets
    Nov 28 at 18:21








  • 2




    @McNets but wouldn't storing that ID only once, and the ID being something like an integer, reduce storage? Then you are only storing that long ID 1 time vs 3 times, like in the example above.
    – Anthony Genovese
    Nov 28 at 18:24






  • 1




    But you still have these ID's in one table >>> same disk space, plus additional space for the new key. And keep in mind you'll need 2 reads or 2 writes to select/insert that table. You would save space if you had more than one table
    – McNets
    Nov 28 at 18:31








2




2




How much storage are we talking about that it requires action?
– mustaccio
Nov 28 at 17:53




How much storage are we talking about that it requires action?
– mustaccio
Nov 28 at 17:53




2




2




What's the point? You'll need more space and more work to maintain both tables.
– McNets
Nov 28 at 18:02




What's the point? You'll need more space and more work to maintain both tables.
– McNets
Nov 28 at 18:02




1




1




One table for Id's+Surrogate Key + Index, Current table + New PK. Is there a chance to change current Id's by some serial field?
– McNets
Nov 28 at 18:21






One table for Id's+Surrogate Key + Index, Current table + New PK. Is there a chance to change current Id's by some serial field?
– McNets
Nov 28 at 18:21






2




2




@McNets but wouldn't storing that ID only once, and the ID being something like an integer, reduce storage? Then you are only storing that long ID 1 time vs 3 times, like in the example above.
– Anthony Genovese
Nov 28 at 18:24




@McNets but wouldn't storing that ID only once, and the ID being something like an integer, reduce storage? Then you are only storing that long ID 1 time vs 3 times, like in the example above.
– Anthony Genovese
Nov 28 at 18:24




1




1




But you still have these ID's in one table >>> same disk space, plus additional space for the new key. And keep in mind you'll need 2 reads or 2 writes to select/insert that table. You would save space if you had more than one table
– McNets
Nov 28 at 18:31




But you still have these ID's in one table >>> same disk space, plus additional space for the new key. And keep in mind you'll need 2 reads or 2 writes to select/insert that table. You would save space if you had more than one table
– McNets
Nov 28 at 18:31










2 Answers
2






active

oldest

votes

















up vote
4
down vote













First, you shouldn't use those IDs in the database. To start off they're clearly a compound ID, with




  • bigint (hex encoded)

  • uuid


The @@@ is a text-token. That means right off the bat you can make your table substantially smaller very easy.



ALTER TABLE foo
ADD COLUMN id_uuid uuid,
ADD COLUMN id_num bigint;


Now you need to populate it. (essentially we make sure you have 16 characters, prepend an x and then cast it to a bigint by proxy of ::bit(64))



UPDATE foo
SET id_num =
(
'x'
|| lpad(
(regexp_match(id, '(.*?)@'))[0]
, 16
, '0'
)
)::bit(64)::bigint,
id_uuid = (regexp_match(id, '.*@(.*)$'))[1]::uuid;


After this you could drop the original text id column. That's what you should do when you commit to a direction.



Now you have two id columns, but one of them is 16 bytes, the other is 8 bytes. compared to one text column that is 55 bytes (plus varlena overhead which is like 6 byts for you).



What you really want to do though is rewrite the table with these two columns at the start because that'll impact physical layout and make a big difference. Google for PostgreSQL Tetris for the reasons there..



At the point that you have two ids, and you can start to understand the data, the questions emerge.




  1. Are both ids necessary, can I drop one and will it make it make difference

  2. If they're both neccessary, can I store both of them as external vendor ids in a table and assign my own simplier int4 ids?


Depending on the answers to those questions you may want to structure some triggers so on insertion something (like a function you create) takes a text argument, decodes it into a composite type of (uuid,int) and assigns them to the new columns in one pass.



Moreover, you have another massive question here. This is a spatial problem. You have GPS coordinates. If you're talking about billion of rows, you may consider using a table like this until the routes are complete, and then migrating to a GIS solution with LINESTRINGS and PostGIS. Then you can map the routes in the database, and find what roads were traveled, how many miles the routes were, and place them on a map (or do other fancy goodness). The GIS solution makes the points columnar and gives you the ability to index them too.



You may want a consultant.






share|improve this answer






























    up vote
    1
    down vote














    I have around around 1.4Billion (1449000000) datapoints. So in uncompressed csv it's around 120GB.




    At this scale, and given that an average track has 50 or more points, it's probably worth a shot. While storage itself is cheap these days, you're also improving physical I/O and buffer pool efficiency, squeezing more records into every page.






    share|improve this answer





















      Your Answer








      StackExchange.ready(function() {
      var channelOptions = {
      tags: "".split(" "),
      id: "182"
      };
      initTagRenderer("".split(" "), "".split(" "), channelOptions);

      StackExchange.using("externalEditor", function() {
      // Have to fire editor after snippets, if snippets enabled
      if (StackExchange.settings.snippets.snippetsEnabled) {
      StackExchange.using("snippets", function() {
      createEditor();
      });
      }
      else {
      createEditor();
      }
      });

      function createEditor() {
      StackExchange.prepareEditor({
      heartbeatType: 'answer',
      convertImagesToLinks: false,
      noModals: true,
      showLowRepImageUploadWarning: true,
      reputationToPostImages: null,
      bindNavPrevention: true,
      postfix: "",
      imageUploader: {
      brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
      contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
      allowUrls: true
      },
      onDemand: true,
      discardSelector: ".discard-answer"
      ,immediatelyShowMarkdownHelp:true
      });


      }
      });














      draft saved

      draft discarded


















      StackExchange.ready(
      function () {
      StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fdba.stackexchange.com%2fquestions%2f223681%2fhow-can-i-restructure-a-table-of-points-with-repeating-lengthy-text-ids%23new-answer', 'question_page');
      }
      );

      Post as a guest















      Required, but never shown

























      2 Answers
      2






      active

      oldest

      votes








      2 Answers
      2






      active

      oldest

      votes









      active

      oldest

      votes






      active

      oldest

      votes








      up vote
      4
      down vote













      First, you shouldn't use those IDs in the database. To start off they're clearly a compound ID, with




      • bigint (hex encoded)

      • uuid


      The @@@ is a text-token. That means right off the bat you can make your table substantially smaller very easy.



      ALTER TABLE foo
      ADD COLUMN id_uuid uuid,
      ADD COLUMN id_num bigint;


      Now you need to populate it. (essentially we make sure you have 16 characters, prepend an x and then cast it to a bigint by proxy of ::bit(64))



      UPDATE foo
      SET id_num =
      (
      'x'
      || lpad(
      (regexp_match(id, '(.*?)@'))[0]
      , 16
      , '0'
      )
      )::bit(64)::bigint,
      id_uuid = (regexp_match(id, '.*@(.*)$'))[1]::uuid;


      After this you could drop the original text id column. That's what you should do when you commit to a direction.



      Now you have two id columns, but one of them is 16 bytes, the other is 8 bytes. compared to one text column that is 55 bytes (plus varlena overhead which is like 6 byts for you).



      What you really want to do though is rewrite the table with these two columns at the start because that'll impact physical layout and make a big difference. Google for PostgreSQL Tetris for the reasons there..



      At the point that you have two ids, and you can start to understand the data, the questions emerge.




      1. Are both ids necessary, can I drop one and will it make it make difference

      2. If they're both neccessary, can I store both of them as external vendor ids in a table and assign my own simplier int4 ids?


      Depending on the answers to those questions you may want to structure some triggers so on insertion something (like a function you create) takes a text argument, decodes it into a composite type of (uuid,int) and assigns them to the new columns in one pass.



      Moreover, you have another massive question here. This is a spatial problem. You have GPS coordinates. If you're talking about billion of rows, you may consider using a table like this until the routes are complete, and then migrating to a GIS solution with LINESTRINGS and PostGIS. Then you can map the routes in the database, and find what roads were traveled, how many miles the routes were, and place them on a map (or do other fancy goodness). The GIS solution makes the points columnar and gives you the ability to index them too.



      You may want a consultant.






      share|improve this answer



























        up vote
        4
        down vote













        First, you shouldn't use those IDs in the database. To start off they're clearly a compound ID, with




        • bigint (hex encoded)

        • uuid


        The @@@ is a text-token. That means right off the bat you can make your table substantially smaller very easy.



        ALTER TABLE foo
        ADD COLUMN id_uuid uuid,
        ADD COLUMN id_num bigint;


        Now you need to populate it. (essentially we make sure you have 16 characters, prepend an x and then cast it to a bigint by proxy of ::bit(64))



        UPDATE foo
        SET id_num =
        (
        'x'
        || lpad(
        (regexp_match(id, '(.*?)@'))[0]
        , 16
        , '0'
        )
        )::bit(64)::bigint,
        id_uuid = (regexp_match(id, '.*@(.*)$'))[1]::uuid;


        After this you could drop the original text id column. That's what you should do when you commit to a direction.



        Now you have two id columns, but one of them is 16 bytes, the other is 8 bytes. compared to one text column that is 55 bytes (plus varlena overhead which is like 6 byts for you).



        What you really want to do though is rewrite the table with these two columns at the start because that'll impact physical layout and make a big difference. Google for PostgreSQL Tetris for the reasons there..



        At the point that you have two ids, and you can start to understand the data, the questions emerge.




        1. Are both ids necessary, can I drop one and will it make it make difference

        2. If they're both neccessary, can I store both of them as external vendor ids in a table and assign my own simplier int4 ids?


        Depending on the answers to those questions you may want to structure some triggers so on insertion something (like a function you create) takes a text argument, decodes it into a composite type of (uuid,int) and assigns them to the new columns in one pass.



        Moreover, you have another massive question here. This is a spatial problem. You have GPS coordinates. If you're talking about billion of rows, you may consider using a table like this until the routes are complete, and then migrating to a GIS solution with LINESTRINGS and PostGIS. Then you can map the routes in the database, and find what roads were traveled, how many miles the routes were, and place them on a map (or do other fancy goodness). The GIS solution makes the points columnar and gives you the ability to index them too.



        You may want a consultant.






        share|improve this answer

























          up vote
          4
          down vote










          up vote
          4
          down vote









          First, you shouldn't use those IDs in the database. To start off they're clearly a compound ID, with




          • bigint (hex encoded)

          • uuid


          The @@@ is a text-token. That means right off the bat you can make your table substantially smaller very easy.



          ALTER TABLE foo
          ADD COLUMN id_uuid uuid,
          ADD COLUMN id_num bigint;


          Now you need to populate it. (essentially we make sure you have 16 characters, prepend an x and then cast it to a bigint by proxy of ::bit(64))



          UPDATE foo
          SET id_num =
          (
          'x'
          || lpad(
          (regexp_match(id, '(.*?)@'))[0]
          , 16
          , '0'
          )
          )::bit(64)::bigint,
          id_uuid = (regexp_match(id, '.*@(.*)$'))[1]::uuid;


          After this you could drop the original text id column. That's what you should do when you commit to a direction.



          Now you have two id columns, but one of them is 16 bytes, the other is 8 bytes. compared to one text column that is 55 bytes (plus varlena overhead which is like 6 byts for you).



          What you really want to do though is rewrite the table with these two columns at the start because that'll impact physical layout and make a big difference. Google for PostgreSQL Tetris for the reasons there..



          At the point that you have two ids, and you can start to understand the data, the questions emerge.




          1. Are both ids necessary, can I drop one and will it make it make difference

          2. If they're both neccessary, can I store both of them as external vendor ids in a table and assign my own simplier int4 ids?


          Depending on the answers to those questions you may want to structure some triggers so on insertion something (like a function you create) takes a text argument, decodes it into a composite type of (uuid,int) and assigns them to the new columns in one pass.



          Moreover, you have another massive question here. This is a spatial problem. You have GPS coordinates. If you're talking about billion of rows, you may consider using a table like this until the routes are complete, and then migrating to a GIS solution with LINESTRINGS and PostGIS. Then you can map the routes in the database, and find what roads were traveled, how many miles the routes were, and place them on a map (or do other fancy goodness). The GIS solution makes the points columnar and gives you the ability to index them too.



          You may want a consultant.






          share|improve this answer














          First, you shouldn't use those IDs in the database. To start off they're clearly a compound ID, with




          • bigint (hex encoded)

          • uuid


          The @@@ is a text-token. That means right off the bat you can make your table substantially smaller very easy.



          ALTER TABLE foo
          ADD COLUMN id_uuid uuid,
          ADD COLUMN id_num bigint;


          Now you need to populate it. (essentially we make sure you have 16 characters, prepend an x and then cast it to a bigint by proxy of ::bit(64))



          UPDATE foo
          SET id_num =
          (
          'x'
          || lpad(
          (regexp_match(id, '(.*?)@'))[0]
          , 16
          , '0'
          )
          )::bit(64)::bigint,
          id_uuid = (regexp_match(id, '.*@(.*)$'))[1]::uuid;


          After this you could drop the original text id column. That's what you should do when you commit to a direction.



          Now you have two id columns, but one of them is 16 bytes, the other is 8 bytes. compared to one text column that is 55 bytes (plus varlena overhead which is like 6 byts for you).



          What you really want to do though is rewrite the table with these two columns at the start because that'll impact physical layout and make a big difference. Google for PostgreSQL Tetris for the reasons there..



          At the point that you have two ids, and you can start to understand the data, the questions emerge.




          1. Are both ids necessary, can I drop one and will it make it make difference

          2. If they're both neccessary, can I store both of them as external vendor ids in a table and assign my own simplier int4 ids?


          Depending on the answers to those questions you may want to structure some triggers so on insertion something (like a function you create) takes a text argument, decodes it into a composite type of (uuid,int) and assigns them to the new columns in one pass.



          Moreover, you have another massive question here. This is a spatial problem. You have GPS coordinates. If you're talking about billion of rows, you may consider using a table like this until the routes are complete, and then migrating to a GIS solution with LINESTRINGS and PostGIS. Then you can map the routes in the database, and find what roads were traveled, how many miles the routes were, and place them on a map (or do other fancy goodness). The GIS solution makes the points columnar and gives you the ability to index them too.



          You may want a consultant.







          share|improve this answer














          share|improve this answer



          share|improve this answer








          edited Nov 28 at 21:43

























          answered Nov 28 at 21:30









          Evan Carroll

          30.6k864200




          30.6k864200
























              up vote
              1
              down vote














              I have around around 1.4Billion (1449000000) datapoints. So in uncompressed csv it's around 120GB.




              At this scale, and given that an average track has 50 or more points, it's probably worth a shot. While storage itself is cheap these days, you're also improving physical I/O and buffer pool efficiency, squeezing more records into every page.






              share|improve this answer

























                up vote
                1
                down vote














                I have around around 1.4Billion (1449000000) datapoints. So in uncompressed csv it's around 120GB.




                At this scale, and given that an average track has 50 or more points, it's probably worth a shot. While storage itself is cheap these days, you're also improving physical I/O and buffer pool efficiency, squeezing more records into every page.






                share|improve this answer























                  up vote
                  1
                  down vote










                  up vote
                  1
                  down vote










                  I have around around 1.4Billion (1449000000) datapoints. So in uncompressed csv it's around 120GB.




                  At this scale, and given that an average track has 50 or more points, it's probably worth a shot. While storage itself is cheap these days, you're also improving physical I/O and buffer pool efficiency, squeezing more records into every page.






                  share|improve this answer













                  I have around around 1.4Billion (1449000000) datapoints. So in uncompressed csv it's around 120GB.




                  At this scale, and given that an average track has 50 or more points, it's probably worth a shot. While storage itself is cheap these days, you're also improving physical I/O and buffer pool efficiency, squeezing more records into every page.







                  share|improve this answer












                  share|improve this answer



                  share|improve this answer










                  answered Nov 28 at 19:27









                  mustaccio

                  8,95262136




                  8,95262136






























                      draft saved

                      draft discarded




















































                      Thanks for contributing an answer to Database Administrators Stack Exchange!


                      • Please be sure to answer the question. Provide details and share your research!

                      But avoid



                      • Asking for help, clarification, or responding to other answers.

                      • Making statements based on opinion; back them up with references or personal experience.


                      To learn more, see our tips on writing great answers.





                      Some of your past answers have not been well-received, and you're in danger of being blocked from answering.


                      Please pay close attention to the following guidance:


                      • Please be sure to answer the question. Provide details and share your research!

                      But avoid



                      • Asking for help, clarification, or responding to other answers.

                      • Making statements based on opinion; back them up with references or personal experience.


                      To learn more, see our tips on writing great answers.




                      draft saved


                      draft discarded














                      StackExchange.ready(
                      function () {
                      StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fdba.stackexchange.com%2fquestions%2f223681%2fhow-can-i-restructure-a-table-of-points-with-repeating-lengthy-text-ids%23new-answer', 'question_page');
                      }
                      );

                      Post as a guest















                      Required, but never shown





















































                      Required, but never shown














                      Required, but never shown












                      Required, but never shown







                      Required, but never shown

































                      Required, but never shown














                      Required, but never shown












                      Required, but never shown







                      Required, but never shown







                      Popular posts from this blog

                      Plaza Victoria

                      Puebla de Zaragoza

                      Musa