Reserved de-dupe rules












I want to refine the de-duping rules, but first I'd like to find out exactly what the predefined rules do before I create my own. They are reserved, so you can't edit them, which is fine, but the UI only tells you which fields they use and not what the weights and thresholds are, so their full behaviour is not clear.










      asked Apr 4 at 17:09 by Mick Kahn






















          2 Answers
          Demerit's and Mick's answers are incorrect for the (built-in) reserved rules - though it's definitely confusing!



          If a RuleGroup has a value in the name field, and that name corresponds to a filename in CRM/Dedupe/BAO/QueryBuilder, then the handwritten SQL in that file is used. The existing entries in civicrm_dedupe_rule for those RuleGroups are holdovers from before that system existed, and editing them has no effect.
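
The lookup rule described above can be sketched as follows. This is only an illustration in Python, not CiviCRM's own PHP code: the helper name `uses_handwritten_query` is hypothetical, and the assumption that the file is called `<name>.php` is inferred from the IndividualUnsupervised.php example.

```python
from pathlib import Path
from typing import Optional


def uses_handwritten_query(rule_group_name: Optional[str], querybuilder_dir: Path) -> bool:
    """Return True if a rule group would be served by a handwritten query file.

    Hypothetical helper: it mirrors the rule described in the answer (a
    non-empty name that matches a file in CRM/Dedupe/BAO/QueryBuilder),
    assuming the file is named "<name>.php".
    """
    if not rule_group_name:
        # Rule groups without a name use the weighted rules stored in the database.
        return False
    return (querybuilder_dir / f"{rule_group_name}.php").is_file()


if __name__ == "__main__":
    # Point this at a real CiviCRM checkout to try it; the path is an assumption.
    directory = Path("CRM/Dedupe/BAO/QueryBuilder")
    print(uses_handwritten_query("IndividualUnsupervised", directory))
```

If this returns True for a rule group, the weights stored for it in the database are the holdover values that the answer says have no effect.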



          "Standard" dedupe rules with multiple criteria are very inefficient compared to handwritten SQL, which is why this is a valuable technique. You can supply your own handwritten queries with hook_civicrm_dupeQuery, and the Veda dedupe extension has a number of excellent examples. Note that this extension doesn't work on modern Civi because of some of its other functions, but the dedupe rules can be lifted out into another extension.



          Finally - I learned just yesterday that the built-in handwritten dedupe rules seem to execute different SQL when comparing in Unsupervised/Supervised mode (a single contact) vs. General mode (find all dupes). While I haven't proved it, I suspect that if you're in the rare scenario of needing to optimize your unsupervised/supervised dedupes, creating a new class to extend CRM_Dedupe_BAO_QueryBuilder is the way to go. I just posted org.agbu.optimizeddedupe to provide an example of this.
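
To make the single-contact case concrete, here is a minimal sketch using Python's sqlite3 module and a toy schema. It is not the SQL that CiviCRM executes: the function names `make_toy_db` and `find_dupes_of`, the cut-down tables, and the sample rows are all assumptions made for illustration.

```python
import sqlite3
from typing import List


def make_toy_db() -> sqlite3.Connection:
    """Build an in-memory database with a cut-down version of the two tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE civicrm_contact (
            id INTEGER PRIMARY KEY, contact_type TEXT,
            first_name TEXT, last_name TEXT);
        CREATE TABLE civicrm_email (
            id INTEGER PRIMARY KEY, contact_id INTEGER, email TEXT);
        INSERT INTO civicrm_contact VALUES
            (1, 'Individual', 'Ann', 'Lee'),
            (2, 'Individual', 'Ann', 'Lee'),
            (3, 'Individual', 'Bob', 'Ray');
        INSERT INTO civicrm_email VALUES
            (1, 1, 'ann@example.org'),
            (2, 2, 'ann.lee@example.org'),
            (3, 3, 'ann@example.org');
        """
    )
    return conn


def find_dupes_of(conn: sqlite3.Connection, first_name: str,
                  last_name: str, email: str) -> List[int]:
    """Single-contact mode: look up one submitted contact by its own values.

    The submitted values are bound as parameters, so the database filters on
    them directly instead of joining every contact against every other one.
    """
    rows = conn.execute(
        """
        SELECT c.id
        FROM civicrm_contact AS c
        JOIN civicrm_email AS e ON e.contact_id = c.id
        WHERE c.contact_type = 'Individual'
          AND c.first_name = ?
          AND c.last_name = ?
          AND e.email = ?
        ORDER BY c.id
        """,
        (first_name, last_name, email),
    ).fetchall()
    return [row[0] for row in rows]


if __name__ == "__main__":
    conn = make_toy_db()
    print(find_dupes_of(conn, "Ann", "Lee", "ann@example.org"))  # [1]
```

Because the submitted values are known up front, a lookup of this shape has much less work to do than a query that compares all contacts with each other, which is one plausible reason the two modes would use different SQL.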



          UPDATE: More clarification.



          To understand how the queries work, it's best to look at an example, e.g. IndividualUnsupervised.php.



          The internal() function is used if you go to Contacts » Find and Merge Duplicate Contacts and click Use Rule. Its SQL query is:





          SELECT contact1.id as id1, contact2.id as id2, {$rg->threshold} as weight
          FROM civicrm_contact as contact1
            JOIN civicrm_email as email1 ON email1.contact_id=contact1.id
            JOIN civicrm_contact as contact2 ON
              contact1.first_name = contact2.first_name AND
              contact1.last_name = contact2.last_name
            JOIN civicrm_email as email2 ON
              email2.contact_id=contact2.id AND
              email1.email=email2.email
          WHERE contact1.contact_type = 'Individual'


          First, note that the weight is set to $rg->threshold - that is, the threshold in civicrm_dedupe_rule_group. In other words, if this SQL matches, these records automatically meet the threshold for that rule. Hopefully that answers your main question! If you replace {$rg->threshold} with a literal number (or drop that column), you can run this SQL as-is in a SQL client and get a complete list of the duplicates it would return.
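
As a rough way to try this without touching a live database, the sketch below runs the same join against a toy in-memory SQLite database from Python. Several things here are assumptions: the cut-down table definitions, the sample rows, the helper names, and the extra `contact1.id < contact2.id` condition, which is added so that a contact does not match itself and each pair appears only once. SQLite also compares strings case-sensitively by default, which may differ from your MySQL collation.

```python
import sqlite3
from typing import List, Tuple

# The query from the answer, with the PHP placeholder {$rg->threshold} turned
# into a bound parameter. The "contact1.id < contact2.id" condition is an
# addition for this sketch (see the note above).
INTERNAL_QUERY = """
SELECT contact1.id AS id1, contact2.id AS id2, ? AS weight
FROM civicrm_contact AS contact1
JOIN civicrm_email AS email1 ON email1.contact_id = contact1.id
JOIN civicrm_contact AS contact2 ON
    contact1.first_name = contact2.first_name AND
    contact1.last_name = contact2.last_name
JOIN civicrm_email AS email2 ON
    email2.contact_id = contact2.id AND
    email1.email = email2.email
WHERE contact1.contact_type = 'Individual'
  AND contact1.id < contact2.id
ORDER BY id1, id2
"""


def build_demo_db() -> sqlite3.Connection:
    """Create a tiny stand-in for civicrm_contact and civicrm_email."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE civicrm_contact (
            id INTEGER PRIMARY KEY, contact_type TEXT,
            first_name TEXT, last_name TEXT);
        CREATE TABLE civicrm_email (
            id INTEGER PRIMARY KEY, contact_id INTEGER, email TEXT);
        INSERT INTO civicrm_contact VALUES
            (1, 'Individual', 'Ann', 'Lee'),
            (2, 'Individual', 'Ann', 'Lee'),
            (3, 'Individual', 'Ann', 'Lee'),
            (4, 'Individual', 'Bob', 'Ray');
        INSERT INTO civicrm_email VALUES
            (1, 1, 'ann@example.org'),
            (2, 2, 'ann@example.org'),
            (3, 3, 'other@example.org'),
            (4, 4, 'ann@example.org');
        """
    )
    return conn


def find_duplicate_pairs(conn: sqlite3.Connection, threshold: int) -> List[Tuple[int, int, int]]:
    """Return (id1, id2, weight) rows; weight is simply the threshold passed in."""
    return conn.execute(INTERNAL_QUERY, (threshold,)).fetchall()


if __name__ == "__main__":
    print(find_duplicate_pairs(build_demo_db(), 10))  # [(1, 2, 10)]
```

Only contacts 1 and 2 are reported: contact 3 shares the name but not the email address, and contact 4 shares the email address but not the name. Whatever threshold is passed in comes straight back as the weight.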



          To further clarify: unlike "regular" rules, whose result combines several queries, each with its own weight, this runs a SINGLE query and sets the weight equal to the rule's threshold. So it's a straight yes/no answer whether a record is a duplicate, based on whether the SQL finds it.
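
The contrast between the two kinds of rule can be sketched in a few lines of Python. The function names and the example weights are made up for illustration and are not CiviCRM defaults; the sketch also assumes that a regular rule matches when the summed weights reach the threshold.

```python
from typing import Dict, Set


def weighted_rule_matches(weights: Dict[str, int], matching_fields: Set[str],
                          threshold: int) -> bool:
    """Regular rule: add up the weights of the criteria that matched."""
    score = sum(weight for field, weight in weights.items() if field in matching_fields)
    return score >= threshold


def handwritten_rule_weight(sql_found_pair: bool, threshold: int) -> int:
    """Reserved rule: the weight equals the threshold if the query returned the pair."""
    return threshold if sql_found_pair else 0


if __name__ == "__main__":
    # Illustrative numbers only.
    weights = {"first_name": 5, "last_name": 7, "email": 10}
    print(weighted_rule_matches(weights, {"last_name", "email"}, 15))       # True
    print(weighted_rule_matches(weights, {"first_name", "last_name"}, 15))  # False
    print(handwritten_rule_weight(True, 15))                                # 15
```

In the second function there is nothing to tune: a pair either comes back from the SQL, in which case it meets the threshold by construction, or it does not appear at all.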



          That's not to say that you can't simulate length/weight, but it's tricky. My org.agbu.optimizeddedupe rule has a SQL statement you can look at which gives the same results as this rule:



          [Image: SQL statement from the org.agbu.optimizeddedupe rule]



          However, with the existing rule it took about 5 seconds to compare even a single submitted contact against the existing 165,000 contacts in this database. Now it's almost instantaneous.






          • Thanks. Good lord...

            – Demerit
            Apr 5 at 4:16











          • Jon G - that's all very interesting, but a bit beyond me. I don't think I need to optimize my own dedupe rules, and I know I can make my own rules supervised or unsupervised and leave the preconfigured rules unused as general. What I was trying to do was understand the weights/thresholds for the preconfigured rules. I don't see any values in the code etc. that you have pointed to, so do they still come from the queries that Demerit has provided? They are certainly compatible with the results that I see.

            – Mick Kahn
            Apr 5 at 8:45











          • I'm still a bit confused as I'm not that good at reading the code, though I can work out enough to see that these rules are treated as a special case. I can't see any values for length, weight or threshold there, so I still don't know the actual criteria for considering contacts to be duplicates under these rules. Or is some different algorithm used? Given that I have a relatively small number of contacts, I can just ditch the preconfigured rules and use my own (less efficient) ones. But it would be nice to know the criteria used in order to understand why some duplicates have been created.

            – Mick Kahn
            Apr 5 at 14:00













          • @MickKahn I just updated my answer, hopefully it makes things clearer!

            – Jon G - Megaphone Tech
            Apr 5 at 16:20











          • Thanks Jon - I don't understand it all, but I see that my answer is not helpful so will delete it. I'm seeing some other unexpected results, but can achieve what I need with my own rules, so will move on to other things for now.

            – Mick Kahn
            Apr 5 at 22:26



















          EDIT: This answer is wrong. See Jon's answer. The reserved rules don't use the values in the database; they use custom queries.





          If you have access to the database, run:



          SELECT * from civicrm_dedupe_rule r inner join civicrm_dedupe_rule_group rg on rg.id = r.dedupe_rule_group_id;



          which will give you a table which isn't pretty but is mostly understandable.
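
For rules that are not reserved, a narrower column list makes the output easier to read. The sketch below shows the idea against a toy SQLite database in Python. Apart from `id` and `dedupe_rule_group_id`, which appear in the query above, the column names (`name`, `threshold`, `rule_table`, `rule_field`, `rule_weight`) are assumptions about the schema, and the rule group and its weights are invented for illustration.

```python
import sqlite3
from typing import List, Tuple


def build_rules_db() -> sqlite3.Connection:
    """Toy stand-in for the two dedupe tables (column names are assumptions)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE civicrm_dedupe_rule_group (
            id INTEGER PRIMARY KEY, name TEXT, threshold INTEGER);
        CREATE TABLE civicrm_dedupe_rule (
            id INTEGER PRIMARY KEY, dedupe_rule_group_id INTEGER,
            rule_table TEXT, rule_field TEXT, rule_weight INTEGER);
        INSERT INTO civicrm_dedupe_rule_group VALUES (1, 'MyCustomRule', 15);
        INSERT INTO civicrm_dedupe_rule VALUES
            (1, 1, 'civicrm_contact', 'last_name', 7),
            (2, 1, 'civicrm_email', 'email', 10);
        """
    )
    return conn


def list_rules(conn: sqlite3.Connection) -> List[Tuple]:
    """Same join as above, but selecting only the columns of interest."""
    return conn.execute(
        """
        SELECT rg.id, rg.name, rg.threshold,
               r.rule_table, r.rule_field, r.rule_weight
        FROM civicrm_dedupe_rule r
        INNER JOIN civicrm_dedupe_rule_group rg
            ON rg.id = r.dedupe_rule_group_id
        ORDER BY rg.id, r.id
        """
    ).fetchall()


if __name__ == "__main__":
    for row in list_rules(build_rules_db()):
        print(row)
```

Each output row pairs one criterion and its weight with the threshold of the rule group it belongs to.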






          • Ah you're right. I'll update answer.

            – Demerit
            Apr 4 at 18:19











          • Thanks, that tells me what I need, so I have deleted my earlier comment on your previous version of the answer.

            – Mick Kahn
            Apr 4 at 20:22











          • I'm taking off the green tick because Demerit has marked the answer as wrong, but (see below) I'm not offering the tick to Jon's answer yet as it doesn't fully answer my question.

            – Mick Kahn
            Apr 5 at 13:38












          Answer by Jon G - Megaphone Tech: answered Apr 5 at 2:58, edited Apr 5 at 16:20













          • Thanks. Good lord...

            – Demerit
            Apr 5 at 4:16











          • Jon G - that's all very interesting, but a bit beyond me. I don't think I need to optimize my own dedupe rules and I know I can make my own rules supervised or unsupervised and leave the prefigured rules unused as general. What I was trying to do was understand the weights/thresholds for the preconfigured rules. I don't see any values in the code etc that you have pointed to, so do they still come from the queries that Deremit has provided. They are certain;y compatible with the results that I see.

            – Mick Kahn
            Apr 5 at 8:45











          • I'm still a bit confused as I'm not that good at reading the code, though I can work out enough see that these rules are treated a special case. I can't see any values for length, weight or threshold there so still don't know the actual criteria for considering contacts to be duplicate under these rules. Or is some different algorithm used. Given that I have a relatively small number of contacts, I can just ditch the pre-configured rules and use my own (less efficient) ones. But it would be nice to know the criteria used in order to understand why some duplicates have been created.

            – Mick Kahn
            Apr 5 at 14:00













          • @MickKahn I just updated my answer, hopefully it makes things clearer!

            – Jon G - Megaphone Tech
            Apr 5 at 16:20











          • Thanks Jon - I don't understand it all, but see that my answer is not helpful so will delete that. I'm seeing some other unexpected results, but can acheive what I need with my own rules, so will move on to other things for now.

            – Mick Kahn
            Apr 5 at 22:26



















          • Thanks. Good lord...

            – Demerit
            Apr 5 at 4:16











          • Jon G - that's all very interesting, but a bit beyond me. I don't think I need to optimize my own dedupe rules and I know I can make my own rules supervised or unsupervised and leave the prefigured rules unused as general. What I was trying to do was understand the weights/thresholds for the preconfigured rules. I don't see any values in the code etc that you have pointed to, so do they still come from the queries that Deremit has provided. They are certain;y compatible with the results that I see.

            – Mick Kahn
            Apr 5 at 8:45











          • I'm still a bit confused as I'm not that good at reading the code, though I can work out enough see that these rules are treated a special case. I can't see any values for length, weight or threshold there so still don't know the actual criteria for considering contacts to be duplicate under these rules. Or is some different algorithm used. Given that I have a relatively small number of contacts, I can just ditch the pre-configured rules and use my own (less efficient) ones. But it would be nice to know the criteria used in order to understand why some duplicates have been created.

            – Mick Kahn
            Apr 5 at 14:00













          • @MickKahn I just updated my answer, hopefully it makes things clearer!

            – Jon G - Megaphone Tech
            Apr 5 at 16:20











          • Thanks Jon - I don't understand it all, but see that my answer is not helpful so will delete that. I'm seeing some other unexpected results, but can acheive what I need with my own rules, so will move on to other things for now.

            – Mick Kahn
            Apr 5 at 22:26




























          2














          EDIT: This answer is wrong. See Jon's answer. The reserved rules don't use the values in the database; they use custom queries.





          If you have access to the database, run:



          SELECT * FROM civicrm_dedupe_rule r INNER JOIN civicrm_dedupe_rule_group rg ON rg.id = r.dedupe_rule_group_id;



          which will give you a table that isn't pretty but is mostly understandable.
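For reading that table, here is a rough Python sketch of how weights and a threshold are usually combined in an ordinary, configurable rule: each matching field contributes its weight, and two contacts count as duplicates once the total reaches the group's threshold. This is only an illustration of the general idea, not CiviCRM's actual code, and per the EDIT above it should not be taken as a description of the reserved rules. The `Rule` class, both function names, the sample contacts and every number below are made up; the comments only point at the columns (`rule_field`, `rule_weight`, `rule_length`, `threshold`) that I would expect the query to return.

```python
from dataclasses import dataclass
from typing import Mapping, Optional, Sequence


@dataclass(frozen=True)
class Rule:
    field: str                    # column being compared (cf. rule_field)
    weight: int                   # points awarded on a match (cf. rule_weight)
    length: Optional[int] = None  # compare only the first N characters (cf. rule_length)


def match_score(a: Mapping[str, str], b: Mapping[str, str], rules: Sequence[Rule]) -> int:
    """Sum the weights of every rule whose field matches in both contacts."""
    total = 0
    for rule in rules:
        left, right = a.get(rule.field), b.get(rule.field)
        if not left or not right:
            continue  # a missing or empty value never counts as a match
        if rule.length is not None:
            left, right = left[: rule.length], right[: rule.length]
        # Assumption: case-insensitive comparison, as with a typical MySQL collation.
        if left.casefold() == right.casefold():
            total += rule.weight
    return total


def is_duplicate(a: Mapping[str, str], b: Mapping[str, str],
                 rules: Sequence[Rule], threshold: int) -> bool:
    """Two contacts are treated as duplicates once the score reaches the threshold."""
    return match_score(a, b, rules) >= threshold


# Hypothetical rule group: the fields, weights and threshold are invented for illustration.
rules = [Rule("email", 10), Rule("last_name", 7), Rule("first_name", 5, length=3)]
threshold = 15

ann = {"first_name": "Ann", "last_name": "Lee", "email": "ann@example.org"}
anne = {"first_name": "Anne", "last_name": "Lee", "email": "ANN@example.org"}

print(match_score(ann, anne, rules), is_duplicate(ann, anne, rules, threshold))
```

With a model like this in mind, a low threshold relative to the individual weights means a single matching field can be enough to flag a duplicate, which is one thing to look for when a rule behaves unexpectedly.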
































          • Ah, you're right. I'll update the answer.

            – Demerit
            Apr 4 at 18:19











          • Thanks, that tells me what I need, so I have deleted my earlier comment on your previous version of the answer.

            – Mick Kahn
            Apr 4 at 20:22











          • I'm taking off the green tick because Demerit has marked the answer as wrong, but (see below) I'm not offering the tick to Jon's answer yet, as it doesn't fully answer my question.

            – Mick Kahn
            Apr 5 at 13:38
























          edited Apr 5 at 12:29

























          answered Apr 4 at 17:35









          Demerit






























