Reserved de-dupe rules
I want to refine the de-duping rules, but first I'd like to find out exactly what the predefined rules do before I create my own. They are reserved, so you can't edit them, which is fine, but the interface only shows which fields they use, not the weights and thresholds, so the full behaviour is not clear.
duplicate-contacts
asked Apr 4 at 17:09
Mick Kahn
2 Answers
Demerit's and Mick's answers are incorrect for the (built-in) reserved rules - though it's definitely confusing!
If a RuleGroup has a value in the name field, and that name corresponds to a filename in CRM/Dedupe/BAO/QueryBuilder, then the customized SQL in that file will be used. The existing entries in civicrm_dedupe_rule for those RuleGroups are holdovers from before that system existed, and editing them has no effect.
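To make the lookup above concrete, here is a minimal Python sketch of the decision as this answer describes it. It is not CiviCRM's code: the real check is written in PHP and may apply further conditions. The function name `uses_handwritten_sql`, the set `AVAILABLE_FILES`, and the rule name `MyCustomRule` are all hypothetical; only `IndividualUnsupervised.php` comes from this answer.

```python
# Illustrative only: CiviCRM's real check lives in its PHP code and may apply
# further conditions. All names below are hypothetical.

# Assumed listing of CRM/Dedupe/BAO/QueryBuilder; only the file named in this
# answer is included.
AVAILABLE_FILES = {"IndividualUnsupervised.php"}


def uses_handwritten_sql(rule_group_name, available_files=AVAILABLE_FILES):
    """Return True when a rule group's name maps onto a query-builder file."""
    if not rule_group_name:
        # No name set, so there is no file to look up.
        return False
    return f"{rule_group_name}.php" in available_files


for name in ("IndividualUnsupervised", "MyCustomRule", None):
    if uses_handwritten_sql(name):
        source = "handwritten SQL"
    else:
        source = "rows in civicrm_dedupe_rule"
    print(f"{name!r}: {source}")
```

Under these assumptions, only the rule group whose name matches a file is routed to the handwritten SQL; the other two fall back to the stored rules.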
"Standard" dedupe rules with multiple criteria are very inefficient compared to handwritten SQL, which is why this is a valuable technique. You can create your own handwritten queries with hook_civicrm_dedupe, and the Veda dedupe extension has a number of excellent examples. Note that this extension doesn't work on modern Civi because of some of its other functions, but the dedupe rules can be ripped out into something else.
Finally - I learned just yesterday that the built-in handwritten dedupe rules seem to execute different SQL when comparing in Unsupervised/Supervised mode (a single contact) vs. General mode (find all dupes). While I haven't proved it, I suspect that if you're in the rare scenario of needing to optimize your unsupervised/supervised dedupes, creating a new class to extend CRM_Dedupe_BAO_QueryBuilder is the way to go. I just posted org.agbu.optimizeddedupe to provide an example of this.
UPDATE: More clarification.
To understand how the queries work, it's best to look at an example, e.g. IndividualUnsupervised.php.
The internal() function is used if you go to Contacts » Find and Merge Duplicate Contacts and click Use Rule. The SQL query is:
SELECT contact1.id as id1, contact2.id as id2, {$rg->threshold} as weight
FROM civicrm_contact as contact1
JOIN civicrm_email as email1 ON email1.contact_id=contact1.id
JOIN civicrm_contact as contact2 ON
contact1.first_name = contact2.first_name AND
contact1.last_name = contact2.last_name
JOIN civicrm_email as email2 ON
email2.contact_id=contact2.id AND
email1.email=email2.email
WHERE contact1.contact_type = 'Individual'
First, note that the weight is set to $rg->threshold - that is, the threshold in civicrm_dedupe_rule_group. In other words, if this SQL matches, these records automatically meet the threshold for that rule. Hopefully that answers your main question! If you remove that field (or replace it with a literal number), you can run the rest of this SQL unchanged in a SQL client and get a complete list of the duplicates it would return.
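If you don't have a SQL client handy, the query can be tried against throwaway data. The sketch below uses Python's built-in `sqlite3` with two mock tables cut down to the columns the query touches; the sample contacts and `THRESHOLD = 10` are made up, and the real threshold comes from the rule group. One condition is my addition: `contact1.id < contact2.id`. Without it, the SQL exactly as quoted above also pairs every contact with itself and lists each pair in both directions.

```python
import sqlite3

# Arbitrary stand-in for {$rg->threshold}; the real value comes from the rule group.
THRESHOLD = 10

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE civicrm_contact (
  id INTEGER PRIMARY KEY, contact_type TEXT, first_name TEXT, last_name TEXT);
CREATE TABLE civicrm_email (
  id INTEGER PRIMARY KEY, contact_id INTEGER, email TEXT);
INSERT INTO civicrm_contact VALUES
  (1, 'Individual', 'Ann', 'Lee'),
  (2, 'Individual', 'Ann', 'Lee'),
  (3, 'Individual', 'Ann', 'Lee'),
  (4, 'Individual', 'Bob', 'Ray');
INSERT INTO civicrm_email VALUES
  (1, 1, 'ann@example.org'),
  (2, 2, 'ann@example.org'),
  (3, 3, 'other@example.org'),
  (4, 4, 'ann@example.org');
""")

QUERY = f"""
SELECT contact1.id AS id1, contact2.id AS id2, {THRESHOLD} AS weight
FROM civicrm_contact AS contact1
JOIN civicrm_email AS email1 ON email1.contact_id = contact1.id
JOIN civicrm_contact AS contact2 ON
  contact1.first_name = contact2.first_name AND
  contact1.last_name = contact2.last_name
JOIN civicrm_email AS email2 ON
  email2.contact_id = contact2.id AND
  email1.email = email2.email
WHERE contact1.contact_type = 'Individual'
  AND contact1.id < contact2.id  -- added for this sketch: drop self and mirrored pairs
ORDER BY id1, id2
"""

pairs = conn.execute(QUERY).fetchall()
print(pairs)  # [(1, 2, 10)]
```

Only contacts 1 and 2 share first name, last name and email, so they are the single pair returned, and their weight is simply the threshold. Contact 3 has the same name but a different email, and contact 4 has the same email but a different name, so neither matches.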
To further clarify - unlike "regular" rules, which are the result of several queries, each with its own weight - this runs a SINGLE query, and sets the weight equal to the rule's threshold. So it's a straight yes/no answer whether a record is a duplicate, based on whether the SQL finds them.
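The difference between the two approaches can be sketched in a few lines of Python. This is a simplified model, not CiviCRM's implementation: the field weights, the length value and the threshold below are invented for illustration, and both function names are hypothetical. A "regular" rule adds up the weight of every matching field and compares the total with the threshold; the handwritten rule either returns the threshold or nothing.

```python
def regular_rule_score(a, b, rules):
    """Sum the weights of all rules whose field matches between two contacts."""
    total = 0
    for field, length, weight in rules:
        left, right = a.get(field), b.get(field)
        if not left or not right:
            continue  # empty values never count as a match
        if length is not None:
            # A length limits the comparison to the first N characters.
            left, right = left[:length], right[:length]
        if left == right:
            total += weight
    return total


def reserved_rule_score(a, b, threshold):
    """All-or-nothing: the threshold if every field matches exactly, else 0."""
    fields = ("first_name", "last_name", "email")
    same = all(a.get(f) and a.get(f) == b.get(f) for f in fields)
    return threshold if same else 0


# Made-up values for illustration only.
THRESHOLD = 10
RULES = [("first_name", None, 3), ("last_name", 2, 4), ("email", None, 8)]

ann = {"first_name": "Ann", "last_name": "Lee", "email": "ann@example.org"}
ann_typo = {"first_name": "Ann", "last_name": "Lea", "email": "ann@example.org"}

print(regular_rule_score(ann, ann_typo, RULES) >= THRESHOLD)       # True
print(reserved_rule_score(ann, ann_typo, THRESHOLD) >= THRESHOLD)  # False
```

With these invented weights, the regular rule scores 3 + 4 + 8 = 15 because only the first two letters of the last name are compared, so the pair counts as a duplicate. The all-or-nothing rule rejects the same pair because the last names differ.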
That's not to say that you can't simulate length/weight, but it's tricky. My org.agbu.optimizeddedupe rule has a SQL statement you can look at which gives the same results as this rule.
However, it took about 5 seconds to compare even a single submitted contact against the existing 165,000 contacts in this database with the existing rule. Now it's almost instantaneous.
Thanks. Good lord...
– Demerit
Apr 5 at 4:16
Jon G - that's all very interesting, but a bit beyond me. I don't think I need to optimize my own dedupe rules, and I know I can make my own rules supervised or unsupervised and leave the preconfigured rules unused as general. What I was trying to do was understand the weights/thresholds for the preconfigured rules. I don't see any values in the code etc. that you have pointed to, so do they still come from the queries that Demerit has provided? They are certainly compatible with the results that I see.
– Mick Kahn
Apr 5 at 8:45
I'm still a bit confused as I'm not that good at reading the code, though I can work out enough to see that these rules are treated as a special case. I can't see any values for length, weight or threshold there, so I still don't know the actual criteria for considering contacts to be duplicates under these rules. Or is some different algorithm used? Given that I have a relatively small number of contacts, I can just ditch the pre-configured rules and use my own (less efficient) ones. But it would be nice to know the criteria used in order to understand why some duplicates have been created.
– Mick Kahn
Apr 5 at 14:00
@MickKahn I just updated my answer, hopefully it makes things clearer!
– Jon G - Megaphone Tech
Apr 5 at 16:20
Thanks Jon - I don't understand it all, but I see that my answer is not helpful so will delete that. I'm seeing some other unexpected results, but can achieve what I need with my own rules, so will move on to other things for now.
– Mick Kahn
Apr 5 at 22:26
EDIT: This answer is wrong. See Jon's answer. The reserved rules don't use the values in the database; they use custom queries.
If you have access to the database, run
SELECT * from civicrm_dedupe_rule r inner join civicrm_dedupe_rule_group rg on rg.id = r.dedupe_rule_group_id;
which will give you a table which isn't pretty but is mostly understandable.
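For rules you define yourself, the same join is easier to read if you select only the columns that matter. The sketch below tries this with Python's `sqlite3` against mock tables; the schema is a simplified stand-in for the two CiviCRM tables, and the rule group `MyCustomRule` with its weights is invented. As the edit note at the top of this answer says, for reserved rules the stored values are not what is actually applied.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Simplified stand-ins for the two tables; sample rows are made up.
conn.executescript("""
CREATE TABLE civicrm_dedupe_rule_group (
  id INTEGER PRIMARY KEY, contact_type TEXT, threshold INTEGER,
  name TEXT, is_reserved INTEGER);
CREATE TABLE civicrm_dedupe_rule (
  id INTEGER PRIMARY KEY, dedupe_rule_group_id INTEGER, rule_table TEXT,
  rule_field TEXT, rule_length INTEGER, rule_weight INTEGER);
INSERT INTO civicrm_dedupe_rule_group VALUES
  (1, 'Individual', 10, 'MyCustomRule', 0);
INSERT INTO civicrm_dedupe_rule VALUES
  (1, 1, 'civicrm_contact', 'last_name', 2, 4),
  (2, 1, 'civicrm_email', 'email', NULL, 8);
""")

# Same join as above, restricted to the columns that describe each rule.
rows = conn.execute("""
SELECT rg.name, rg.threshold, r.rule_table, r.rule_field,
       r.rule_length, r.rule_weight
FROM civicrm_dedupe_rule r
INNER JOIN civicrm_dedupe_rule_group rg ON rg.id = r.dedupe_rule_group_id
ORDER BY rg.id, r.id
""").fetchall()

for row in rows:
    print(row)
```

Each output row shows one rule next to the threshold of the group it belongs to, which is the information the rule editing screen does not show in one place.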
Ah you're right. I'll update answer.
– Demerit
Apr 4 at 18:19
Thanks, that tells me what I need, so I have deleted my earlier comment on your previous version of the answer.
– Mick Kahn
Apr 4 at 20:22
I'm taking off the green tick because Demerit has marked the answer as wrong, but (see below) I'm not offering the tick to Jon's answer yet as it doesn't fully answer my question.
– Mick Kahn
Apr 5 at 13:38
Jon G - Megaphone Tech answered Apr 5 at 2:58 (edited Apr 5 at 16:20)
edited Apr 5 at 12:29
answered Apr 4 at 17:35
Demerit