create cols/var named as a function of specific responses in previous cols/var
The long title would've been "create cols/var 1) named as a function of specific responses in previous cols/var and 2) that contain new values as function of previous responses positions in an array (best-worst scaling application)"
I'm performing some best-worst scaling (maxdiff) analyses and I need to convert the data from the raw responses data file into a data file suitable for BWS analyses.
My responses data file is dfex
and the end result should be dfbw
.
Basically, the example starting datafile is self-explanatory, i.e., there are two questions q01 and q02, and the respondents indicate which of the available alternatives to these questions are "best" or "worst" ("most useful" and "least useful", here).
The final data file, suited for BWS analysis, should indicate for each question, which alternative, out of four possible per each question, was chosen. For instance, in case 1, question 1, the respondent indicated 1st alternative as the "most useful" and 3rd alternative as the "least useful".
I currently doing this via some seriously stupid manual conversions of scores but I'm certain that much more elegant solutions are available (I'm just not that advanced in R and I'm under a lot of pressure).
Could you, please, point me in the right direction (i.e., how to automate this process and make it more smoothly) and, if possible, please indicate your reasons for choosing using a certain package or particular approach versus another (e.g., a custom function versus a combination of functions in a package/s)?
# This is my starting data frame
dfex <- data.frame(stringsAsFactors=FALSE,
q01_01 = c("most useful", "least useful", "least useful", NA),
q01_02 = c(NA, "most useful", "most useful", "most useful"),
q01_03 = c("least useful", NA, NA, "least useful"),
q01_04 = c(NA, NA, NA, NA),
q02_01 = c("least useful", "least useful", NA, "least useful"),
q02_02 = c(NA, NA, NA, NA),
q02_03 = c(NA, "most useful", "most useful", "most useful"),
q02_04 = c("most useful", NA, "least useful", NA))
# which now I'm processing manually
dfex <- dfex %>%
mutate(b01_01 = case_when(q01_01 == "most useful" ~ 1)) %>%
mutate(b01_02 = case_when(q01_02 == "most useful" ~ 2)) %>%
mutate(b01_03 = case_when(q01_03 == "most useful" ~ 3)) %>%
mutate(b01_04 = case_when(q01_04 == "most useful" ~ 4)) %>%
mutate(b02_01 = case_when(q02_01 == "most useful" ~ 1)) %>%
mutate(b02_02 = case_when(q02_02 == "most useful" ~ 2)) %>%
mutate(b02_03 = case_when(q02_03 == "most useful" ~ 3)) %>%
mutate(b02_04 = case_when(q02_04 == "most useful" ~ 4)) %>%
mutate(w01_01 = case_when(q01_01 == "least useful" ~ 1)) %>%
mutate(w01_02 = case_when(q01_02 == "least useful" ~ 2)) %>%
mutate(w01_03 = case_when(q01_03 == "least useful" ~ 3)) %>%
mutate(w01_04 = case_when(q01_04 == "least useful" ~ 4)) %>%
mutate(w02_01 = case_when(q02_01 == "least useful" ~ 1)) %>%
mutate(w02_02 = case_when(q02_02 == "least useful" ~ 2)) %>%
mutate(w02_03 = case_when(q02_03 == "least useful" ~ 3)) %>%
mutate(w02_04 = case_when(q02_04 == "least useful" ~ 4))
# create (manually) cols
dfex %>%
select(c(b01_01:b01_04)) %>%
rowSums(., na.rm = T) -> dfex$B1
dfex %>%
select(c(b02_01:b02_04)) %>%
rowSums(., na.rm = T) -> dfex$B2
dfex %>%
select(c(w01_01:w01_04)) %>%
rowSums(., na.rm = T) -> dfex$W1
dfex %>%
select(c(w02_01:w02_04)) %>%
rowSums(., na.rm = T) -> dfex$W2
# getting the desired BW datafile after selecting only cols of interest
dfbw <- dfex %>% select(B1:W2)
PS: Because of all the attention given to tidyverse
I thought this package collection might be suitable for my problem (?).
r
migrated from superuser.com Dec 2 at 16:57
This question came from our site for computer enthusiasts and power users.
add a comment |
The long title would've been "create cols/var 1) named as a function of specific responses in previous cols/var and 2) that contain new values as function of previous responses positions in an array (best-worst scaling application)"
I'm performing some best-worst scaling (maxdiff) analyses and I need to convert the data from the raw responses data file into a data file suitable for BWS analyses.
My responses data file is dfex
and the end result should be dfbw
.
Basically, the example starting datafile is self-explanatory, i.e., there are two questions q01 and q02, and the respondents indicate which of the available alternatives to these questions are "best" or "worst" ("most useful" and "least useful", here).
The final data file, suited for BWS analysis, should indicate for each question, which alternative, out of four possible per each question, was chosen. For instance, in case 1, question 1, the respondent indicated 1st alternative as the "most useful" and 3rd alternative as the "least useful".
I currently doing this via some seriously stupid manual conversions of scores but I'm certain that much more elegant solutions are available (I'm just not that advanced in R and I'm under a lot of pressure).
Could you, please, point me in the right direction (i.e., how to automate this process and make it more smoothly) and, if possible, please indicate your reasons for choosing using a certain package or particular approach versus another (e.g., a custom function versus a combination of functions in a package/s)?
# This is my starting data frame
dfex <- data.frame(stringsAsFactors=FALSE,
q01_01 = c("most useful", "least useful", "least useful", NA),
q01_02 = c(NA, "most useful", "most useful", "most useful"),
q01_03 = c("least useful", NA, NA, "least useful"),
q01_04 = c(NA, NA, NA, NA),
q02_01 = c("least useful", "least useful", NA, "least useful"),
q02_02 = c(NA, NA, NA, NA),
q02_03 = c(NA, "most useful", "most useful", "most useful"),
q02_04 = c("most useful", NA, "least useful", NA))
# which now I'm processing manually
dfex <- dfex %>%
mutate(b01_01 = case_when(q01_01 == "most useful" ~ 1)) %>%
mutate(b01_02 = case_when(q01_02 == "most useful" ~ 2)) %>%
mutate(b01_03 = case_when(q01_03 == "most useful" ~ 3)) %>%
mutate(b01_04 = case_when(q01_04 == "most useful" ~ 4)) %>%
mutate(b02_01 = case_when(q02_01 == "most useful" ~ 1)) %>%
mutate(b02_02 = case_when(q02_02 == "most useful" ~ 2)) %>%
mutate(b02_03 = case_when(q02_03 == "most useful" ~ 3)) %>%
mutate(b02_04 = case_when(q02_04 == "most useful" ~ 4)) %>%
mutate(w01_01 = case_when(q01_01 == "least useful" ~ 1)) %>%
mutate(w01_02 = case_when(q01_02 == "least useful" ~ 2)) %>%
mutate(w01_03 = case_when(q01_03 == "least useful" ~ 3)) %>%
mutate(w01_04 = case_when(q01_04 == "least useful" ~ 4)) %>%
mutate(w02_01 = case_when(q02_01 == "least useful" ~ 1)) %>%
mutate(w02_02 = case_when(q02_02 == "least useful" ~ 2)) %>%
mutate(w02_03 = case_when(q02_03 == "least useful" ~ 3)) %>%
mutate(w02_04 = case_when(q02_04 == "least useful" ~ 4))
# create (manually) cols
dfex %>%
select(c(b01_01:b01_04)) %>%
rowSums(., na.rm = T) -> dfex$B1
dfex %>%
select(c(b02_01:b02_04)) %>%
rowSums(., na.rm = T) -> dfex$B2
dfex %>%
select(c(w01_01:w01_04)) %>%
rowSums(., na.rm = T) -> dfex$W1
dfex %>%
select(c(w02_01:w02_04)) %>%
rowSums(., na.rm = T) -> dfex$W2
# getting the desired BW datafile after selecting only cols of interest
dfbw <- dfex %>% select(B1:W2)
PS: Because of all the attention given to tidyverse
I thought this package collection might be suitable for my problem (?).
r
migrated from superuser.com Dec 2 at 16:57
This question came from our site for computer enthusiasts and power users.
I suspect that the solution may imply melting and then recasting back the dataframe, but I couldn't figure it out, yet.
– streamline
Dec 2 at 9:55
This question found a solution here: community.rstudio.com/t/…, usingtidyverse
. I'm not sure if I should leave it here, for other possible solutions or not. Anyway, for that particular solution, I believe it should be acknowledged.
– streamline
Dec 2 at 15:09
Why ask here and not on Stack Overflow? Voted to migrate. You may answer your own question though.
– slhck
Dec 2 at 15:31
@slhck. I believe you might be right. I apologize for the misplacement.
– streamline
Dec 2 at 15:58
No worries. You can flag your question for attention to have it migrated. In the meantime I provided an answer that hopefully should make it easier to handle your data.
– slhck
Dec 2 at 15:59
add a comment |
The long title would've been "create cols/var 1) named as a function of specific responses in previous cols/var and 2) that contain new values as function of previous responses positions in an array (best-worst scaling application)"
I'm performing some best-worst scaling (maxdiff) analyses and I need to convert the data from the raw responses data file into a data file suitable for BWS analyses.
My responses data file is dfex
and the end result should be dfbw
.
Basically, the example starting datafile is self-explanatory, i.e., there are two questions q01 and q02, and the respondents indicate which of the available alternatives to these questions are "best" or "worst" ("most useful" and "least useful", here).
The final data file, suited for BWS analysis, should indicate for each question, which alternative, out of four possible per each question, was chosen. For instance, in case 1, question 1, the respondent indicated 1st alternative as the "most useful" and 3rd alternative as the "least useful".
I currently doing this via some seriously stupid manual conversions of scores but I'm certain that much more elegant solutions are available (I'm just not that advanced in R and I'm under a lot of pressure).
Could you, please, point me in the right direction (i.e., how to automate this process and make it more smoothly) and, if possible, please indicate your reasons for choosing using a certain package or particular approach versus another (e.g., a custom function versus a combination of functions in a package/s)?
# This is my starting data frame
dfex <- data.frame(stringsAsFactors=FALSE,
q01_01 = c("most useful", "least useful", "least useful", NA),
q01_02 = c(NA, "most useful", "most useful", "most useful"),
q01_03 = c("least useful", NA, NA, "least useful"),
q01_04 = c(NA, NA, NA, NA),
q02_01 = c("least useful", "least useful", NA, "least useful"),
q02_02 = c(NA, NA, NA, NA),
q02_03 = c(NA, "most useful", "most useful", "most useful"),
q02_04 = c("most useful", NA, "least useful", NA))
# which now I'm processing manually
dfex <- dfex %>%
mutate(b01_01 = case_when(q01_01 == "most useful" ~ 1)) %>%
mutate(b01_02 = case_when(q01_02 == "most useful" ~ 2)) %>%
mutate(b01_03 = case_when(q01_03 == "most useful" ~ 3)) %>%
mutate(b01_04 = case_when(q01_04 == "most useful" ~ 4)) %>%
mutate(b02_01 = case_when(q02_01 == "most useful" ~ 1)) %>%
mutate(b02_02 = case_when(q02_02 == "most useful" ~ 2)) %>%
mutate(b02_03 = case_when(q02_03 == "most useful" ~ 3)) %>%
mutate(b02_04 = case_when(q02_04 == "most useful" ~ 4)) %>%
mutate(w01_01 = case_when(q01_01 == "least useful" ~ 1)) %>%
mutate(w01_02 = case_when(q01_02 == "least useful" ~ 2)) %>%
mutate(w01_03 = case_when(q01_03 == "least useful" ~ 3)) %>%
mutate(w01_04 = case_when(q01_04 == "least useful" ~ 4)) %>%
mutate(w02_01 = case_when(q02_01 == "least useful" ~ 1)) %>%
mutate(w02_02 = case_when(q02_02 == "least useful" ~ 2)) %>%
mutate(w02_03 = case_when(q02_03 == "least useful" ~ 3)) %>%
mutate(w02_04 = case_when(q02_04 == "least useful" ~ 4))
# create (manually) cols
dfex %>%
select(c(b01_01:b01_04)) %>%
rowSums(., na.rm = T) -> dfex$B1
dfex %>%
select(c(b02_01:b02_04)) %>%
rowSums(., na.rm = T) -> dfex$B2
dfex %>%
select(c(w01_01:w01_04)) %>%
rowSums(., na.rm = T) -> dfex$W1
dfex %>%
select(c(w02_01:w02_04)) %>%
rowSums(., na.rm = T) -> dfex$W2
# getting the desired BW datafile after selecting only cols of interest
dfbw <- dfex %>% select(B1:W2)
PS: Because of all the attention given to tidyverse
I thought this package collection might be suitable for my problem (?).
r
The long title would've been "create cols/var 1) named as a function of specific responses in previous cols/var and 2) that contain new values as function of previous responses positions in an array (best-worst scaling application)"
I'm performing some best-worst scaling (maxdiff) analyses and I need to convert the data from the raw responses data file into a data file suitable for BWS analyses.
My responses data file is dfex
and the end result should be dfbw
.
Basically, the example starting datafile is self-explanatory, i.e., there are two questions q01 and q02, and the respondents indicate which of the available alternatives to these questions are "best" or "worst" ("most useful" and "least useful", here).
The final data file, suited for BWS analysis, should indicate for each question, which alternative, out of four possible per each question, was chosen. For instance, in case 1, question 1, the respondent indicated 1st alternative as the "most useful" and 3rd alternative as the "least useful".
I currently doing this via some seriously stupid manual conversions of scores but I'm certain that much more elegant solutions are available (I'm just not that advanced in R and I'm under a lot of pressure).
Could you, please, point me in the right direction (i.e., how to automate this process and make it more smoothly) and, if possible, please indicate your reasons for choosing using a certain package or particular approach versus another (e.g., a custom function versus a combination of functions in a package/s)?
# This is my starting data frame
dfex <- data.frame(stringsAsFactors=FALSE,
q01_01 = c("most useful", "least useful", "least useful", NA),
q01_02 = c(NA, "most useful", "most useful", "most useful"),
q01_03 = c("least useful", NA, NA, "least useful"),
q01_04 = c(NA, NA, NA, NA),
q02_01 = c("least useful", "least useful", NA, "least useful"),
q02_02 = c(NA, NA, NA, NA),
q02_03 = c(NA, "most useful", "most useful", "most useful"),
q02_04 = c("most useful", NA, "least useful", NA))
# which now I'm processing manually
dfex <- dfex %>%
mutate(b01_01 = case_when(q01_01 == "most useful" ~ 1)) %>%
mutate(b01_02 = case_when(q01_02 == "most useful" ~ 2)) %>%
mutate(b01_03 = case_when(q01_03 == "most useful" ~ 3)) %>%
mutate(b01_04 = case_when(q01_04 == "most useful" ~ 4)) %>%
mutate(b02_01 = case_when(q02_01 == "most useful" ~ 1)) %>%
mutate(b02_02 = case_when(q02_02 == "most useful" ~ 2)) %>%
mutate(b02_03 = case_when(q02_03 == "most useful" ~ 3)) %>%
mutate(b02_04 = case_when(q02_04 == "most useful" ~ 4)) %>%
mutate(w01_01 = case_when(q01_01 == "least useful" ~ 1)) %>%
mutate(w01_02 = case_when(q01_02 == "least useful" ~ 2)) %>%
mutate(w01_03 = case_when(q01_03 == "least useful" ~ 3)) %>%
mutate(w01_04 = case_when(q01_04 == "least useful" ~ 4)) %>%
mutate(w02_01 = case_when(q02_01 == "least useful" ~ 1)) %>%
mutate(w02_02 = case_when(q02_02 == "least useful" ~ 2)) %>%
mutate(w02_03 = case_when(q02_03 == "least useful" ~ 3)) %>%
mutate(w02_04 = case_when(q02_04 == "least useful" ~ 4))
# create (manually) cols
dfex %>%
select(c(b01_01:b01_04)) %>%
rowSums(., na.rm = T) -> dfex$B1
dfex %>%
select(c(b02_01:b02_04)) %>%
rowSums(., na.rm = T) -> dfex$B2
dfex %>%
select(c(w01_01:w01_04)) %>%
rowSums(., na.rm = T) -> dfex$W1
dfex %>%
select(c(w02_01:w02_04)) %>%
rowSums(., na.rm = T) -> dfex$W2
# getting the desired BW datafile after selecting only cols of interest
dfbw <- dfex %>% select(B1:W2)
PS: Because of all the attention given to tidyverse
I thought this package collection might be suitable for my problem (?).
r
r
asked Dec 2 at 9:45
streamline
32
32
migrated from superuser.com Dec 2 at 16:57
This question came from our site for computer enthusiasts and power users.
migrated from superuser.com Dec 2 at 16:57
This question came from our site for computer enthusiasts and power users.
I suspect that the solution may imply melting and then recasting back the dataframe, but I couldn't figure it out, yet.
– streamline
Dec 2 at 9:55
This question found a solution here: community.rstudio.com/t/…, usingtidyverse
. I'm not sure if I should leave it here, for other possible solutions or not. Anyway, for that particular solution, I believe it should be acknowledged.
– streamline
Dec 2 at 15:09
Why ask here and not on Stack Overflow? Voted to migrate. You may answer your own question though.
– slhck
Dec 2 at 15:31
@slhck. I believe you might be right. I apologize for the misplacement.
– streamline
Dec 2 at 15:58
No worries. You can flag your question for attention to have it migrated. In the meantime I provided an answer that hopefully should make it easier to handle your data.
– slhck
Dec 2 at 15:59
add a comment |
I suspect that the solution may imply melting and then recasting back the dataframe, but I couldn't figure it out, yet.
– streamline
Dec 2 at 9:55
This question found a solution here: community.rstudio.com/t/…, usingtidyverse
. I'm not sure if I should leave it here, for other possible solutions or not. Anyway, for that particular solution, I believe it should be acknowledged.
– streamline
Dec 2 at 15:09
Why ask here and not on Stack Overflow? Voted to migrate. You may answer your own question though.
– slhck
Dec 2 at 15:31
@slhck. I believe you might be right. I apologize for the misplacement.
– streamline
Dec 2 at 15:58
No worries. You can flag your question for attention to have it migrated. In the meantime I provided an answer that hopefully should make it easier to handle your data.
– slhck
Dec 2 at 15:59
I suspect that the solution may imply melting and then recasting back the dataframe, but I couldn't figure it out, yet.
– streamline
Dec 2 at 9:55
I suspect that the solution may imply melting and then recasting back the dataframe, but I couldn't figure it out, yet.
– streamline
Dec 2 at 9:55
This question found a solution here: community.rstudio.com/t/…, using
tidyverse
. I'm not sure if I should leave it here, for other possible solutions or not. Anyway, for that particular solution, I believe it should be acknowledged.– streamline
Dec 2 at 15:09
This question found a solution here: community.rstudio.com/t/…, using
tidyverse
. I'm not sure if I should leave it here, for other possible solutions or not. Anyway, for that particular solution, I believe it should be acknowledged.– streamline
Dec 2 at 15:09
Why ask here and not on Stack Overflow? Voted to migrate. You may answer your own question though.
– slhck
Dec 2 at 15:31
Why ask here and not on Stack Overflow? Voted to migrate. You may answer your own question though.
– slhck
Dec 2 at 15:31
@slhck. I believe you might be right. I apologize for the misplacement.
– streamline
Dec 2 at 15:58
@slhck. I believe you might be right. I apologize for the misplacement.
– streamline
Dec 2 at 15:58
No worries. You can flag your question for attention to have it migrated. In the meantime I provided an answer that hopefully should make it easier to handle your data.
– slhck
Dec 2 at 15:59
No worries. You can flag your question for attention to have it migrated. In the meantime I provided an answer that hopefully should make it easier to handle your data.
– slhck
Dec 2 at 15:59
add a comment |
1 Answer
1
active
oldest
votes
The tidyverse
collection of packages is great, once you've learned how to format your data, and that it's usually never necessary to a) write loops of any kind or b) do more work coding once you get more data.
Start by making your data frame "tidy". Each observation should be one row, so you want a column for the question ID, a column for the alternative, and one for the answer. This is what gather
does: it makes your "wide" data "long". We further remove the "q" from the question ID, as it's not needed.
Also, add a column that gives you the subject ID – it's basically the row number in each group of question/alternatives. This is done with a combination of group_by
and mutate
.
library(tidyverse)
dfex.tidy = dfex %>%
gather(question, answer, starts_with("q")) %>%
separate(question, into=c("question", "alternative")) %>%
mutate(question = str_replace_all(question, "q", "")) %>%
group_by(question, alternative) %>%
mutate(subject = row_number())
This gives you:
# A tibble: 32 x 4
# Groups: question, alternative [8]
question alternative answer subject
<chr> <chr> <chr> <int>
1 01 01 most useful 1
2 01 01 least useful 2
3 01 01 least useful 3
4 01 01 NA 4
5 01 02 NA 1
6 01 02 most useful 2
7 01 02 most useful 3
8 01 02 most useful 4
9 01 03 least useful 1
10 01 03 NA 2
… and so on. As you can see, each answer is a row. (In fact, non-answers are also part of this data frame but it's not required to remove those rows.)
Now, we filter out all missing answers, and re-code them from "most/least useful" to 1 and 2 so we know their intrinsic order.
We then group by subject and question. The chosen "most useful" alternative will be the first alternative when ordering the results by the answer value (1 or 2). Vice-versa, the "least useful" alternative will be the first alternative when ordering the results by the descending answer value:
dfex.results = dfex.tidy %>%
filter(!is.na(answer)) %>%
# recode answer to 1 or 2
mutate(answer = ifelse(answer == "most useful", 1, 2)) %>%
group_by(subject, question) %>%
summarize(
best = first(alternative, order_by = answer),
worst = first(alternative, order_by = desc(answer))
)
This gives you:
subject question best worst
<int> <chr> <chr> <chr>
1 1 01 01 03
2 1 02 04 01
3 2 01 02 01
4 2 02 03 01
5 3 01 02 01
6 3 02 03 04
7 4 01 02 03
8 4 02 03 01
This data should be easy to work with and scales for any number of questions, alternatives, and answers, as long as your input data stays in the same format. That said, you should really try to collect your input data in a tidy manner already, if possible.
You may further tidy the data in order to get the best or worst alternative for a given subject and question:
dfex.results %>%
gather(type, alternative, c(best, worst))
# A tibble: 16 x 4
# Groups: subject [4]
subject question type alternative
<int> <chr> <chr> <chr>
1 1 01 best 01
2 1 02 best 04
3 2 01 best 02
4 2 02 best 03
add a comment |
Your Answer
StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");
StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "1"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});
function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});
}
});
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53582527%2fcreate-cols-var-named-as-a-function-of-specific-responses-in-previous-cols-var%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
1 Answer
1
active
oldest
votes
1 Answer
1
active
oldest
votes
active
oldest
votes
active
oldest
votes
The tidyverse
collection of packages is great, once you've learned how to format your data, and that it's usually never necessary to a) write loops of any kind or b) do more work coding once you get more data.
Start by making your data frame "tidy". Each observation should be one row, so you want a column for the question ID, a column for the alternative, and one for the answer. This is what gather
does: it makes your "wide" data "long". We further remove the "q" from the question ID, as it's not needed.
Also, add a column that gives you the subject ID – it's basically the row number in each group of question/alternatives. This is done with a combination of group_by
and mutate
.
library(tidyverse)
dfex.tidy = dfex %>%
gather(question, answer, starts_with("q")) %>%
separate(question, into=c("question", "alternative")) %>%
mutate(question = str_replace_all(question, "q", "")) %>%
group_by(question, alternative) %>%
mutate(subject = row_number())
This gives you:
# A tibble: 32 x 4
# Groups: question, alternative [8]
question alternative answer subject
<chr> <chr> <chr> <int>
1 01 01 most useful 1
2 01 01 least useful 2
3 01 01 least useful 3
4 01 01 NA 4
5 01 02 NA 1
6 01 02 most useful 2
7 01 02 most useful 3
8 01 02 most useful 4
9 01 03 least useful 1
10 01 03 NA 2
… and so on. As you can see, each answer is a row. (In fact, non-answers are also part of this data frame but it's not required to remove those rows.)
Now, we filter out all missing answers, and re-code them from "most/least useful" to 1 and 2 so we know their intrinsic order.
We then group by subject and question. The chosen "most useful" alternative will be the first alternative when ordering the results by the answer value (1 or 2). Vice-versa, the "least useful" alternative will be the first alternative when ordering the results by the descending answer value:
dfex.results = dfex.tidy %>%
filter(!is.na(answer)) %>%
# recode answer to 1 or 2
mutate(answer = ifelse(answer == "most useful", 1, 2)) %>%
group_by(subject, question) %>%
summarize(
best = first(alternative, order_by = answer),
worst = first(alternative, order_by = desc(answer))
)
This gives you:
subject question best worst
<int> <chr> <chr> <chr>
1 1 01 01 03
2 1 02 04 01
3 2 01 02 01
4 2 02 03 01
5 3 01 02 01
6 3 02 03 04
7 4 01 02 03
8 4 02 03 01
This data should be easy to work with and scales for any number of questions, alternatives, and answers, as long as your input data stays in the same format. That said, you should really try to collect your input data in a tidy manner already, if possible.
You may further tidy the data in order to get the best or worst alternative for a given subject and question:
dfex.results %>%
gather(type, alternative, c(best, worst))
# A tibble: 16 x 4
# Groups: subject [4]
subject question type alternative
<int> <chr> <chr> <chr>
1 1 01 best 01
2 1 02 best 04
3 2 01 best 02
4 2 02 best 03
add a comment |
The tidyverse
collection of packages is great, once you've learned how to format your data, and that it's usually never necessary to a) write loops of any kind or b) do more work coding once you get more data.
Start by making your data frame "tidy". Each observation should be one row, so you want a column for the question ID, a column for the alternative, and one for the answer. This is what gather
does: it makes your "wide" data "long". We further remove the "q" from the question ID, as it's not needed.
Also, add a column that gives you the subject ID – it's basically the row number in each group of question/alternatives. This is done with a combination of group_by
and mutate
.
library(tidyverse)
dfex.tidy = dfex %>%
gather(question, answer, starts_with("q")) %>%
separate(question, into=c("question", "alternative")) %>%
mutate(question = str_replace_all(question, "q", "")) %>%
group_by(question, alternative) %>%
mutate(subject = row_number())
This gives you:
# A tibble: 32 x 4
# Groups: question, alternative [8]
question alternative answer subject
<chr> <chr> <chr> <int>
1 01 01 most useful 1
2 01 01 least useful 2
3 01 01 least useful 3
4 01 01 NA 4
5 01 02 NA 1
6 01 02 most useful 2
7 01 02 most useful 3
8 01 02 most useful 4
9 01 03 least useful 1
10 01 03 NA 2
… and so on. As you can see, each answer is a row. (In fact, non-answers are also part of this data frame but it's not required to remove those rows.)
Now, we filter out all missing answers, and re-code them from "most/least useful" to 1 and 2 so we know their intrinsic order.
We then group by subject and question. The chosen "most useful" alternative will be the first alternative when ordering the results by the answer value (1 or 2). Vice-versa, the "least useful" alternative will be the first alternative when ordering the results by the descending answer value:
dfex.results = dfex.tidy %>%
filter(!is.na(answer)) %>%
# recode answer to 1 or 2
mutate(answer = ifelse(answer == "most useful", 1, 2)) %>%
group_by(subject, question) %>%
summarize(
best = first(alternative, order_by = answer),
worst = first(alternative, order_by = desc(answer))
)
This gives you:
subject question best worst
<int> <chr> <chr> <chr>
1 1 01 01 03
2 1 02 04 01
3 2 01 02 01
4 2 02 03 01
5 3 01 02 01
6 3 02 03 04
7 4 01 02 03
8 4 02 03 01
This data should be easy to work with and scales for any number of questions, alternatives, and answers, as long as your input data stays in the same format. That said, you should really try to collect your input data in a tidy manner already, if possible.
You may further tidy the data in order to get the best or worst alternative for a given subject and question:
dfex.results %>%
gather(type, alternative, c(best, worst))
# A tibble: 16 x 4
# Groups: subject [4]
subject question type alternative
<int> <chr> <chr> <chr>
1 1 01 best 01
2 1 02 best 04
3 2 01 best 02
4 2 02 best 03
add a comment |
The tidyverse
collection of packages is great, once you've learned how to format your data, and that it's usually never necessary to a) write loops of any kind or b) do more work coding once you get more data.
Start by making your data frame "tidy". Each observation should be one row, so you want a column for the question ID, a column for the alternative, and one for the answer. This is what gather
does: it makes your "wide" data "long". We further remove the "q" from the question ID, as it's not needed.
Also, add a column that gives you the subject ID – it's basically the row number in each group of question/alternatives. This is done with a combination of group_by
and mutate
.
library(tidyverse)
dfex.tidy = dfex %>%
gather(question, answer, starts_with("q")) %>%
separate(question, into=c("question", "alternative")) %>%
mutate(question = str_replace_all(question, "q", "")) %>%
group_by(question, alternative) %>%
mutate(subject = row_number())
This gives you:
# A tibble: 32 x 4
# Groups: question, alternative [8]
question alternative answer subject
<chr> <chr> <chr> <int>
1 01 01 most useful 1
2 01 01 least useful 2
3 01 01 least useful 3
4 01 01 NA 4
5 01 02 NA 1
6 01 02 most useful 2
7 01 02 most useful 3
8 01 02 most useful 4
9 01 03 least useful 1
10 01 03 NA 2
… and so on. As you can see, each answer is a row. (In fact, non-answers are also part of this data frame but it's not required to remove those rows.)
Now, we filter out all missing answers, and re-code them from "most/least useful" to 1 and 2 so we know their intrinsic order.
We then group by subject and question. The chosen "most useful" alternative will be the first alternative when ordering the results by the answer value (1 or 2). Vice-versa, the "least useful" alternative will be the first alternative when ordering the results by the descending answer value:
dfex.results = dfex.tidy %>%
filter(!is.na(answer)) %>%
# recode answer to 1 or 2
mutate(answer = ifelse(answer == "most useful", 1, 2)) %>%
group_by(subject, question) %>%
summarize(
best = first(alternative, order_by = answer),
worst = first(alternative, order_by = desc(answer))
)
This gives you:
subject question best worst
<int> <chr> <chr> <chr>
1 1 01 01 03
2 1 02 04 01
3 2 01 02 01
4 2 02 03 01
5 3 01 02 01
6 3 02 03 04
7 4 01 02 03
8 4 02 03 01
This data should be easy to work with and scales for any number of questions, alternatives, and answers, as long as your input data stays in the same format. That said, you should really try to collect your input data in a tidy manner already, if possible.
You may further tidy the data in order to get the best or worst alternative for a given subject and question:
dfex.results %>%
gather(type, alternative, c(best, worst))
# A tibble: 16 x 4
# Groups: subject [4]
subject question type alternative
<int> <chr> <chr> <chr>
1 1 01 best 01
2 1 02 best 04
3 2 01 best 02
4 2 02 best 03
The tidyverse
collection of packages is great, once you've learned how to format your data, and that it's usually never necessary to a) write loops of any kind or b) do more work coding once you get more data.
Start by making your data frame "tidy". Each observation should be one row, so you want a column for the question ID, a column for the alternative, and one for the answer. This is what gather
does: it makes your "wide" data "long". We further remove the "q" from the question ID, as it's not needed.
Also, add a column that gives you the subject ID – it's basically the row number in each group of question/alternatives. This is done with a combination of group_by
and mutate
.
library(tidyverse)
dfex.tidy = dfex %>%
gather(question, answer, starts_with("q")) %>%
separate(question, into=c("question", "alternative")) %>%
mutate(question = str_replace_all(question, "q", "")) %>%
group_by(question, alternative) %>%
mutate(subject = row_number())
This gives you:
# A tibble: 32 x 4
# Groups: question, alternative [8]
question alternative answer subject
<chr> <chr> <chr> <int>
1 01 01 most useful 1
2 01 01 least useful 2
3 01 01 least useful 3
4 01 01 NA 4
5 01 02 NA 1
6 01 02 most useful 2
7 01 02 most useful 3
8 01 02 most useful 4
9 01 03 least useful 1
10 01 03 NA 2
… and so on. As you can see, each answer is a row. (In fact, non-answers are also part of this data frame but it's not required to remove those rows.)
Now, we filter out all missing answers, and re-code them from "most/least useful" to 1 and 2 so we know their intrinsic order.
We then group by subject and question. The chosen "most useful" alternative will be the first alternative when ordering the results by the answer value (1 or 2). Vice-versa, the "least useful" alternative will be the first alternative when ordering the results by the descending answer value:
dfex.results = dfex.tidy %>%
filter(!is.na(answer)) %>%
# recode answer to 1 or 2
mutate(answer = ifelse(answer == "most useful", 1, 2)) %>%
group_by(subject, question) %>%
summarize(
best = first(alternative, order_by = answer),
worst = first(alternative, order_by = desc(answer))
)
This gives you:
subject question best worst
<int> <chr> <chr> <chr>
1 1 01 01 03
2 1 02 04 01
3 2 01 02 01
4 2 02 03 01
5 3 01 02 01
6 3 02 03 04
7 4 01 02 03
8 4 02 03 01
This data should be easy to work with and scales for any number of questions, alternatives, and answers, as long as your input data stays in the same format. That said, you should really try to collect your input data in a tidy manner already, if possible.
You may further tidy the data in order to get the best or worst alternative for a given subject and question:
dfex.results %>%
gather(type, alternative, c(best, worst))
# A tibble: 16 x 4
# Groups: subject [4]
subject question type alternative
<int> <chr> <chr> <chr>
1 1 01 best 01
2 1 02 best 04
3 2 01 best 02
4 2 02 best 03
answered Dec 2 at 15:56
slhck
19.4k2298145
19.4k2298145
add a comment |
add a comment |
Thanks for contributing an answer to Stack Overflow!
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.
Some of your past answers have not been well-received, and you're in danger of being blocked from answering.
Please pay close attention to the following guidance:
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53582527%2fcreate-cols-var-named-as-a-function-of-specific-responses-in-previous-cols-var%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
I suspect that the solution may imply melting and then recasting back the dataframe, but I couldn't figure it out, yet.
– streamline
Dec 2 at 9:55
This question found a solution here: community.rstudio.com/t/…, using
tidyverse
. I'm not sure if I should leave it here, for other possible solutions or not. Anyway, for that particular solution, I believe it should be acknowledged.– streamline
Dec 2 at 15:09
Why ask here and not on Stack Overflow? Voted to migrate. You may answer your own question though.
– slhck
Dec 2 at 15:31
@slhck. I believe you might be right. I apologize for the misplacement.
– streamline
Dec 2 at 15:58
No worries. You can flag your question for attention to have it migrated. In the meantime I provided an answer that hopefully should make it easier to handle your data.
– slhck
Dec 2 at 15:59