Probability of finding a $k$-subset in a $d$-dimensional random binary array of length $n$ with at least...
$begingroup$
For a somwehat simple version of the problem, imagine a group of ten people ($n=10$), for which we observe two binary attributes ($d= 2$). One of the attributes tells us if any given person wears glasses ($d_1$) the other if any given person wears a hat ($d_2$). We know that in total $5$ people wear hats and $6$ people wear glasses, but we don't know who wears what (random binary array). What is the probability of finding a subset of size $4$ ($k= 4$) such that at least three people in the subset wear a hat ($s_1 = 3$) AND at least four people wear glasses ($s_2 = 4$)? It's somewhat clear that there are realisations where wearing a hat and wearing glasses are mutually exclusive, such that those who have a hat don't have glasses, and realisations where having glasses and having a hat go together. Thus, for any realisation of the random binary array there might or might not be a subset of size $k$ with the above properties. I can't figure, though, how to calculate the probabilities.
The more general definition of the problem is this:
What is the probability of finding a $k$-subset in a $d$-dimensional random binary array of length $n$, such that this subset has at least $s_1$ ones in dimension $1$, $s_2$ ones in dimension $2$, ..., $s_d$ ones in dimension $d$. The total number of ones per dimension ($ts_1$ to $ts_d$) are fixed.
Is there a name to this problem? I tried to google, but I really did not know what to look for, since the problem definition is quite unwieldy to begin with.
Is the problem tractable analytically? If yes does the solution scale to larger $d$, $k$ and $n$ when solved algorithmically?
Your help is highly appreciated! Thanks a lot in advance!
combinatorics random-variables
$endgroup$
add a comment |
$begingroup$
For a somwehat simple version of the problem, imagine a group of ten people ($n=10$), for which we observe two binary attributes ($d= 2$). One of the attributes tells us if any given person wears glasses ($d_1$) the other if any given person wears a hat ($d_2$). We know that in total $5$ people wear hats and $6$ people wear glasses, but we don't know who wears what (random binary array). What is the probability of finding a subset of size $4$ ($k= 4$) such that at least three people in the subset wear a hat ($s_1 = 3$) AND at least four people wear glasses ($s_2 = 4$)? It's somewhat clear that there are realisations where wearing a hat and wearing glasses are mutually exclusive, such that those who have a hat don't have glasses, and realisations where having glasses and having a hat go together. Thus, for any realisation of the random binary array there might or might not be a subset of size $k$ with the above properties. I can't figure, though, how to calculate the probabilities.
The more general definition of the problem is this:
What is the probability of finding a $k$-subset in a $d$-dimensional random binary array of length $n$, such that this subset has at least $s_1$ ones in dimension $1$, $s_2$ ones in dimension $2$, ..., $s_d$ ones in dimension $d$. The total number of ones per dimension ($ts_1$ to $ts_d$) are fixed.
Is there a name to this problem? I tried to google, but I really did not know what to look for, since the problem definition is quite unwieldy to begin with.
Is the problem tractable analytically? If yes does the solution scale to larger $d$, $k$ and $n$ when solved algorithmically?
Your help is highly appreciated! Thanks a lot in advance!
combinatorics random-variables
$endgroup$
add a comment |
$begingroup$
For a somwehat simple version of the problem, imagine a group of ten people ($n=10$), for which we observe two binary attributes ($d= 2$). One of the attributes tells us if any given person wears glasses ($d_1$) the other if any given person wears a hat ($d_2$). We know that in total $5$ people wear hats and $6$ people wear glasses, but we don't know who wears what (random binary array). What is the probability of finding a subset of size $4$ ($k= 4$) such that at least three people in the subset wear a hat ($s_1 = 3$) AND at least four people wear glasses ($s_2 = 4$)? It's somewhat clear that there are realisations where wearing a hat and wearing glasses are mutually exclusive, such that those who have a hat don't have glasses, and realisations where having glasses and having a hat go together. Thus, for any realisation of the random binary array there might or might not be a subset of size $k$ with the above properties. I can't figure, though, how to calculate the probabilities.
The more general definition of the problem is this:
What is the probability of finding a $k$-subset in a $d$-dimensional random binary array of length $n$, such that this subset has at least $s_1$ ones in dimension $1$, $s_2$ ones in dimension $2$, ..., $s_d$ ones in dimension $d$. The total number of ones per dimension ($ts_1$ to $ts_d$) are fixed.
Is there a name to this problem? I tried to google, but I really did not know what to look for, since the problem definition is quite unwieldy to begin with.
Is the problem tractable analytically? If yes does the solution scale to larger $d$, $k$ and $n$ when solved algorithmically?
Your help is highly appreciated! Thanks a lot in advance!
combinatorics random-variables
$endgroup$
For a somwehat simple version of the problem, imagine a group of ten people ($n=10$), for which we observe two binary attributes ($d= 2$). One of the attributes tells us if any given person wears glasses ($d_1$) the other if any given person wears a hat ($d_2$). We know that in total $5$ people wear hats and $6$ people wear glasses, but we don't know who wears what (random binary array). What is the probability of finding a subset of size $4$ ($k= 4$) such that at least three people in the subset wear a hat ($s_1 = 3$) AND at least four people wear glasses ($s_2 = 4$)? It's somewhat clear that there are realisations where wearing a hat and wearing glasses are mutually exclusive, such that those who have a hat don't have glasses, and realisations where having glasses and having a hat go together. Thus, for any realisation of the random binary array there might or might not be a subset of size $k$ with the above properties. I can't figure, though, how to calculate the probabilities.
The more general definition of the problem is this:
What is the probability of finding a $k$-subset in a $d$-dimensional random binary array of length $n$, such that this subset has at least $s_1$ ones in dimension $1$, $s_2$ ones in dimension $2$, ..., $s_d$ ones in dimension $d$. The total number of ones per dimension ($ts_1$ to $ts_d$) are fixed.
Is there a name to this problem? I tried to google, but I really did not know what to look for, since the problem definition is quite unwieldy to begin with.
Is the problem tractable analytically? If yes does the solution scale to larger $d$, $k$ and $n$ when solved algorithmically?
Your help is highly appreciated! Thanks a lot in advance!
combinatorics random-variables
combinatorics random-variables
edited Dec 8 '18 at 7:18
Brahadeesh
6,35442363
6,35442363
asked Dec 7 '18 at 23:56
derpetermannderpetermann
11
11
add a comment |
add a comment |
1 Answer
1
active
oldest
votes
$begingroup$
$newcommand{prob}[1]{text{Pr}mathopen{}left[#1right]}$
$newcommand{ksubsets}{binom{[N]}{k}}$
I am a colleague of @derpetermann and wanted to share a attempt to solving this, in the hope that it clarifies things and from there on it may be easier for others to solve it. So this is the way I approached the problem...
Among a population of $N$ individuals, every individual $n in [N]$ (I use the short notation [N] for ${1,2,..,N}$) is described by absence or presence of $D$ features:
$$ X_{n,d} :=
begin{cases}
1 & text{Feature $d$ present in individual $n$} \
0 & text{o/w}
end{cases}
$$
We know the total number of occurrences of each feature, given by
$$C_d := sum_{n in [N]} X_{n,d}$$
Otherwise, for a fixed $d$, the $X_{n,d}$ are equally distributed. Now we want to know: What is the probability of finding a subset $S$ of $k$ individuals, such that every feature $d$ is present in at least $c_d$ individuals in $S$ (for given $c_d in [k]$). I.e. how can we compute (or) estimate the following probability?
$$prob{bigcup_{S in binom{[N]}{k}} bigcap_{d in [D]} Big( sum_{n in S} X_{n,d} geq c_d Big) }$$
Note: While the $X_{cdot,d}$ for different features are independent, the union is not over independent (nor disjoint) events. And that is where we got stuck. We can rewrite the inner part (the probability that a fixed $k$-subset $S$ satisfies our condition):
$$Y_S := bigcap_{d in [D]} Big( sum_{n in S} X_{n,d} geq c_d Big)$$
$$prob{Y_S} = prod_{d in [D]} frac{sum_{c=c_d}^{min(k, C_d)} binom{k}{c} binom{N-k}{C_d - c}}{binom{N}{C_d}}$$
but have no idea of how to compute the union $prob{bigcup_{S in binom{[N]}{k}} Y_S}$. Even a useful bound upper bound on this probability would be highly appreciated.
$endgroup$
add a comment |
Your Answer
StackExchange.ifUsing("editor", function () {
return StackExchange.using("mathjaxEditing", function () {
StackExchange.MarkdownEditor.creationCallbacks.add(function (editor, postfix) {
StackExchange.mathjaxEditing.prepareWmdForMathJax(editor, postfix, [["$", "$"], ["\\(","\\)"]]);
});
});
}, "mathjax-editing");
StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "69"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});
function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
noCode: true, onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});
}
});
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fmath.stackexchange.com%2fquestions%2f3030523%2fprobability-of-finding-a-k-subset-in-a-d-dimensional-random-binary-array-of%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
1 Answer
1
active
oldest
votes
1 Answer
1
active
oldest
votes
active
oldest
votes
active
oldest
votes
$begingroup$
$newcommand{prob}[1]{text{Pr}mathopen{}left[#1right]}$
$newcommand{ksubsets}{binom{[N]}{k}}$
I am a colleague of @derpetermann and wanted to share a attempt to solving this, in the hope that it clarifies things and from there on it may be easier for others to solve it. So this is the way I approached the problem...
Among a population of $N$ individuals, every individual $n in [N]$ (I use the short notation [N] for ${1,2,..,N}$) is described by absence or presence of $D$ features:
$$ X_{n,d} :=
begin{cases}
1 & text{Feature $d$ present in individual $n$} \
0 & text{o/w}
end{cases}
$$
We know the total number of occurrences of each feature, given by
$$C_d := sum_{n in [N]} X_{n,d}$$
Otherwise, for a fixed $d$, the $X_{n,d}$ are equally distributed. Now we want to know: What is the probability of finding a subset $S$ of $k$ individuals, such that every feature $d$ is present in at least $c_d$ individuals in $S$ (for given $c_d in [k]$). I.e. how can we compute (or) estimate the following probability?
$$prob{bigcup_{S in binom{[N]}{k}} bigcap_{d in [D]} Big( sum_{n in S} X_{n,d} geq c_d Big) }$$
Note: While the $X_{cdot,d}$ for different features are independent, the union is not over independent (nor disjoint) events. And that is where we got stuck. We can rewrite the inner part (the probability that a fixed $k$-subset $S$ satisfies our condition):
$$Y_S := bigcap_{d in [D]} Big( sum_{n in S} X_{n,d} geq c_d Big)$$
$$prob{Y_S} = prod_{d in [D]} frac{sum_{c=c_d}^{min(k, C_d)} binom{k}{c} binom{N-k}{C_d - c}}{binom{N}{C_d}}$$
but have no idea of how to compute the union $prob{bigcup_{S in binom{[N]}{k}} Y_S}$. Even a useful bound upper bound on this probability would be highly appreciated.
$endgroup$
add a comment |
$begingroup$
$newcommand{prob}[1]{text{Pr}mathopen{}left[#1right]}$
$newcommand{ksubsets}{binom{[N]}{k}}$
I am a colleague of @derpetermann and wanted to share a attempt to solving this, in the hope that it clarifies things and from there on it may be easier for others to solve it. So this is the way I approached the problem...
Among a population of $N$ individuals, every individual $n in [N]$ (I use the short notation [N] for ${1,2,..,N}$) is described by absence or presence of $D$ features:
$$ X_{n,d} :=
begin{cases}
1 & text{Feature $d$ present in individual $n$} \
0 & text{o/w}
end{cases}
$$
We know the total number of occurrences of each feature, given by
$$C_d := sum_{n in [N]} X_{n,d}$$
Otherwise, for a fixed $d$, the $X_{n,d}$ are equally distributed. Now we want to know: What is the probability of finding a subset $S$ of $k$ individuals, such that every feature $d$ is present in at least $c_d$ individuals in $S$ (for given $c_d in [k]$). I.e. how can we compute (or) estimate the following probability?
$$prob{bigcup_{S in binom{[N]}{k}} bigcap_{d in [D]} Big( sum_{n in S} X_{n,d} geq c_d Big) }$$
Note: While the $X_{cdot,d}$ for different features are independent, the union is not over independent (nor disjoint) events. And that is where we got stuck. We can rewrite the inner part (the probability that a fixed $k$-subset $S$ satisfies our condition):
$$Y_S := bigcap_{d in [D]} Big( sum_{n in S} X_{n,d} geq c_d Big)$$
$$prob{Y_S} = prod_{d in [D]} frac{sum_{c=c_d}^{min(k, C_d)} binom{k}{c} binom{N-k}{C_d - c}}{binom{N}{C_d}}$$
but have no idea of how to compute the union $prob{bigcup_{S in binom{[N]}{k}} Y_S}$. Even a useful bound upper bound on this probability would be highly appreciated.
$endgroup$
add a comment |
$begingroup$
$newcommand{prob}[1]{text{Pr}mathopen{}left[#1right]}$
$newcommand{ksubsets}{binom{[N]}{k}}$
I am a colleague of @derpetermann and wanted to share a attempt to solving this, in the hope that it clarifies things and from there on it may be easier for others to solve it. So this is the way I approached the problem...
Among a population of $N$ individuals, every individual $n in [N]$ (I use the short notation [N] for ${1,2,..,N}$) is described by absence or presence of $D$ features:
$$ X_{n,d} :=
begin{cases}
1 & text{Feature $d$ present in individual $n$} \
0 & text{o/w}
end{cases}
$$
We know the total number of occurrences of each feature, given by
$$C_d := sum_{n in [N]} X_{n,d}$$
Otherwise, for a fixed $d$, the $X_{n,d}$ are equally distributed. Now we want to know: What is the probability of finding a subset $S$ of $k$ individuals, such that every feature $d$ is present in at least $c_d$ individuals in $S$ (for given $c_d in [k]$). I.e. how can we compute (or) estimate the following probability?
$$prob{bigcup_{S in binom{[N]}{k}} bigcap_{d in [D]} Big( sum_{n in S} X_{n,d} geq c_d Big) }$$
Note: While the $X_{cdot,d}$ for different features are independent, the union is not over independent (nor disjoint) events. And that is where we got stuck. We can rewrite the inner part (the probability that a fixed $k$-subset $S$ satisfies our condition):
$$Y_S := bigcap_{d in [D]} Big( sum_{n in S} X_{n,d} geq c_d Big)$$
$$prob{Y_S} = prod_{d in [D]} frac{sum_{c=c_d}^{min(k, C_d)} binom{k}{c} binom{N-k}{C_d - c}}{binom{N}{C_d}}$$
but have no idea of how to compute the union $prob{bigcup_{S in binom{[N]}{k}} Y_S}$. Even a useful bound upper bound on this probability would be highly appreciated.
$endgroup$
$newcommand{prob}[1]{text{Pr}mathopen{}left[#1right]}$
$newcommand{ksubsets}{binom{[N]}{k}}$
I am a colleague of @derpetermann and wanted to share a attempt to solving this, in the hope that it clarifies things and from there on it may be easier for others to solve it. So this is the way I approached the problem...
Among a population of $N$ individuals, every individual $n in [N]$ (I use the short notation [N] for ${1,2,..,N}$) is described by absence or presence of $D$ features:
$$ X_{n,d} :=
begin{cases}
1 & text{Feature $d$ present in individual $n$} \
0 & text{o/w}
end{cases}
$$
We know the total number of occurrences of each feature, given by
$$C_d := sum_{n in [N]} X_{n,d}$$
Otherwise, for a fixed $d$, the $X_{n,d}$ are equally distributed. Now we want to know: What is the probability of finding a subset $S$ of $k$ individuals, such that every feature $d$ is present in at least $c_d$ individuals in $S$ (for given $c_d in [k]$). I.e. how can we compute (or) estimate the following probability?
$$prob{bigcup_{S in binom{[N]}{k}} bigcap_{d in [D]} Big( sum_{n in S} X_{n,d} geq c_d Big) }$$
Note: While the $X_{cdot,d}$ for different features are independent, the union is not over independent (nor disjoint) events. And that is where we got stuck. We can rewrite the inner part (the probability that a fixed $k$-subset $S$ satisfies our condition):
$$Y_S := bigcap_{d in [D]} Big( sum_{n in S} X_{n,d} geq c_d Big)$$
$$prob{Y_S} = prod_{d in [D]} frac{sum_{c=c_d}^{min(k, C_d)} binom{k}{c} binom{N-k}{C_d - c}}{binom{N}{C_d}}$$
but have no idea of how to compute the union $prob{bigcup_{S in binom{[N]}{k}} Y_S}$. Even a useful bound upper bound on this probability would be highly appreciated.
answered Dec 20 '18 at 16:40
NicoNico
1
1
add a comment |
add a comment |
Thanks for contributing an answer to Mathematics Stack Exchange!
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
Use MathJax to format equations. MathJax reference.
To learn more, see our tips on writing great answers.
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fmath.stackexchange.com%2fquestions%2f3030523%2fprobability-of-finding-a-k-subset-in-a-d-dimensional-random-binary-array-of%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown