create cols/var named as a function of specific responses in previous cols/var












0














The long title would've been "create cols/var 1) named as a function of specific responses in previous cols/var and 2) that contain new values as function of previous responses positions in an array (best-worst scaling application)"



I'm performing some best-worst scaling (maxdiff) analyses and I need to convert the data from the raw responses data file into a data file suitable for BWS analyses.



My responses data file is dfex and the end result should be dfbw.
Basically, the example starting datafile is self-explanatory, i.e., there are two questions q01 and q02, and the respondents indicate which of the available alternatives to these questions are "best" or "worst" ("most useful" and "least useful", here).



The final data file, suited for BWS analysis, should indicate for each question, which alternative, out of four possible per each question, was chosen. For instance, in case 1, question 1, the respondent indicated 1st alternative as the "most useful" and 3rd alternative as the "least useful".



I currently doing this via some seriously stupid manual conversions of scores but I'm certain that much more elegant solutions are available (I'm just not that advanced in R and I'm under a lot of pressure).



Could you, please, point me in the right direction (i.e., how to automate this process and make it more smoothly) and, if possible, please indicate your reasons for choosing using a certain package or particular approach versus another (e.g., a custom function versus a combination of functions in a package/s)?



    # This is my starting data frame
dfex <- data.frame(stringsAsFactors=FALSE,
q01_01 = c("most useful", "least useful", "least useful", NA),
q01_02 = c(NA, "most useful", "most useful", "most useful"),
q01_03 = c("least useful", NA, NA, "least useful"),
q01_04 = c(NA, NA, NA, NA),
q02_01 = c("least useful", "least useful", NA, "least useful"),
q02_02 = c(NA, NA, NA, NA),
q02_03 = c(NA, "most useful", "most useful", "most useful"),
q02_04 = c("most useful", NA, "least useful", NA))

# which now I'm processing manually
dfex <- dfex %>%
mutate(b01_01 = case_when(q01_01 == "most useful" ~ 1)) %>%
mutate(b01_02 = case_when(q01_02 == "most useful" ~ 2)) %>%
mutate(b01_03 = case_when(q01_03 == "most useful" ~ 3)) %>%
mutate(b01_04 = case_when(q01_04 == "most useful" ~ 4)) %>%
mutate(b02_01 = case_when(q02_01 == "most useful" ~ 1)) %>%
mutate(b02_02 = case_when(q02_02 == "most useful" ~ 2)) %>%
mutate(b02_03 = case_when(q02_03 == "most useful" ~ 3)) %>%
mutate(b02_04 = case_when(q02_04 == "most useful" ~ 4)) %>%
mutate(w01_01 = case_when(q01_01 == "least useful" ~ 1)) %>%
mutate(w01_02 = case_when(q01_02 == "least useful" ~ 2)) %>%
mutate(w01_03 = case_when(q01_03 == "least useful" ~ 3)) %>%
mutate(w01_04 = case_when(q01_04 == "least useful" ~ 4)) %>%
mutate(w02_01 = case_when(q02_01 == "least useful" ~ 1)) %>%
mutate(w02_02 = case_when(q02_02 == "least useful" ~ 2)) %>%
mutate(w02_03 = case_when(q02_03 == "least useful" ~ 3)) %>%
mutate(w02_04 = case_when(q02_04 == "least useful" ~ 4))

# create (manually) cols
dfex %>%
select(c(b01_01:b01_04)) %>%
rowSums(., na.rm = T) -> dfex$B1
dfex %>%
select(c(b02_01:b02_04)) %>%
rowSums(., na.rm = T) -> dfex$B2
dfex %>%
select(c(w01_01:w01_04)) %>%
rowSums(., na.rm = T) -> dfex$W1
dfex %>%
select(c(w02_01:w02_04)) %>%
rowSums(., na.rm = T) -> dfex$W2

# getting the desired BW datafile after selecting only cols of interest
dfbw <- dfex %>% select(B1:W2)


PS: Because of all the attention given to tidyverse I thought this package collection might be suitable for my problem (?).










share|improve this question













migrated from superuser.com Dec 2 at 16:57


This question came from our site for computer enthusiasts and power users.















  • I suspect that the solution may imply melting and then recasting back the dataframe, but I couldn't figure it out, yet.
    – streamline
    Dec 2 at 9:55










  • This question found a solution here: community.rstudio.com/t/…, using tidyverse. I'm not sure if I should leave it here, for other possible solutions or not. Anyway, for that particular solution, I believe it should be acknowledged.
    – streamline
    Dec 2 at 15:09










  • Why ask here and not on Stack Overflow? Voted to migrate. You may answer your own question though.
    – slhck
    Dec 2 at 15:31










  • @slhck. I believe you might be right. I apologize for the misplacement.
    – streamline
    Dec 2 at 15:58










  • No worries. You can flag your question for attention to have it migrated. In the meantime I provided an answer that hopefully should make it easier to handle your data.
    – slhck
    Dec 2 at 15:59
















0














The long title would've been "create cols/var 1) named as a function of specific responses in previous cols/var and 2) that contain new values as function of previous responses positions in an array (best-worst scaling application)"



I'm performing some best-worst scaling (maxdiff) analyses and I need to convert the data from the raw responses data file into a data file suitable for BWS analyses.



My responses data file is dfex and the end result should be dfbw.
Basically, the example starting datafile is self-explanatory, i.e., there are two questions q01 and q02, and the respondents indicate which of the available alternatives to these questions are "best" or "worst" ("most useful" and "least useful", here).



The final data file, suited for BWS analysis, should indicate for each question, which alternative, out of four possible per each question, was chosen. For instance, in case 1, question 1, the respondent indicated 1st alternative as the "most useful" and 3rd alternative as the "least useful".



I currently doing this via some seriously stupid manual conversions of scores but I'm certain that much more elegant solutions are available (I'm just not that advanced in R and I'm under a lot of pressure).



Could you, please, point me in the right direction (i.e., how to automate this process and make it more smoothly) and, if possible, please indicate your reasons for choosing using a certain package or particular approach versus another (e.g., a custom function versus a combination of functions in a package/s)?



    # This is my starting data frame
dfex <- data.frame(stringsAsFactors=FALSE,
q01_01 = c("most useful", "least useful", "least useful", NA),
q01_02 = c(NA, "most useful", "most useful", "most useful"),
q01_03 = c("least useful", NA, NA, "least useful"),
q01_04 = c(NA, NA, NA, NA),
q02_01 = c("least useful", "least useful", NA, "least useful"),
q02_02 = c(NA, NA, NA, NA),
q02_03 = c(NA, "most useful", "most useful", "most useful"),
q02_04 = c("most useful", NA, "least useful", NA))

# which now I'm processing manually
dfex <- dfex %>%
mutate(b01_01 = case_when(q01_01 == "most useful" ~ 1)) %>%
mutate(b01_02 = case_when(q01_02 == "most useful" ~ 2)) %>%
mutate(b01_03 = case_when(q01_03 == "most useful" ~ 3)) %>%
mutate(b01_04 = case_when(q01_04 == "most useful" ~ 4)) %>%
mutate(b02_01 = case_when(q02_01 == "most useful" ~ 1)) %>%
mutate(b02_02 = case_when(q02_02 == "most useful" ~ 2)) %>%
mutate(b02_03 = case_when(q02_03 == "most useful" ~ 3)) %>%
mutate(b02_04 = case_when(q02_04 == "most useful" ~ 4)) %>%
mutate(w01_01 = case_when(q01_01 == "least useful" ~ 1)) %>%
mutate(w01_02 = case_when(q01_02 == "least useful" ~ 2)) %>%
mutate(w01_03 = case_when(q01_03 == "least useful" ~ 3)) %>%
mutate(w01_04 = case_when(q01_04 == "least useful" ~ 4)) %>%
mutate(w02_01 = case_when(q02_01 == "least useful" ~ 1)) %>%
mutate(w02_02 = case_when(q02_02 == "least useful" ~ 2)) %>%
mutate(w02_03 = case_when(q02_03 == "least useful" ~ 3)) %>%
mutate(w02_04 = case_when(q02_04 == "least useful" ~ 4))

# create (manually) cols
dfex %>%
select(c(b01_01:b01_04)) %>%
rowSums(., na.rm = T) -> dfex$B1
dfex %>%
select(c(b02_01:b02_04)) %>%
rowSums(., na.rm = T) -> dfex$B2
dfex %>%
select(c(w01_01:w01_04)) %>%
rowSums(., na.rm = T) -> dfex$W1
dfex %>%
select(c(w02_01:w02_04)) %>%
rowSums(., na.rm = T) -> dfex$W2

# getting the desired BW datafile after selecting only cols of interest
dfbw <- dfex %>% select(B1:W2)


PS: Because of all the attention given to tidyverse I thought this package collection might be suitable for my problem (?).










share|improve this question













migrated from superuser.com Dec 2 at 16:57


This question came from our site for computer enthusiasts and power users.















  • I suspect that the solution may imply melting and then recasting back the dataframe, but I couldn't figure it out, yet.
    – streamline
    Dec 2 at 9:55










  • This question found a solution here: community.rstudio.com/t/…, using tidyverse. I'm not sure if I should leave it here, for other possible solutions or not. Anyway, for that particular solution, I believe it should be acknowledged.
    – streamline
    Dec 2 at 15:09










  • Why ask here and not on Stack Overflow? Voted to migrate. You may answer your own question though.
    – slhck
    Dec 2 at 15:31










  • @slhck. I believe you might be right. I apologize for the misplacement.
    – streamline
    Dec 2 at 15:58










  • No worries. You can flag your question for attention to have it migrated. In the meantime I provided an answer that hopefully should make it easier to handle your data.
    – slhck
    Dec 2 at 15:59














0












0








0







The long title would've been "create cols/var 1) named as a function of specific responses in previous cols/var and 2) that contain new values as function of previous responses positions in an array (best-worst scaling application)"



I'm performing some best-worst scaling (maxdiff) analyses and I need to convert the data from the raw responses data file into a data file suitable for BWS analyses.



My responses data file is dfex and the end result should be dfbw.
Basically, the example starting datafile is self-explanatory, i.e., there are two questions q01 and q02, and the respondents indicate which of the available alternatives to these questions are "best" or "worst" ("most useful" and "least useful", here).



The final data file, suited for BWS analysis, should indicate for each question, which alternative, out of four possible per each question, was chosen. For instance, in case 1, question 1, the respondent indicated 1st alternative as the "most useful" and 3rd alternative as the "least useful".



I currently doing this via some seriously stupid manual conversions of scores but I'm certain that much more elegant solutions are available (I'm just not that advanced in R and I'm under a lot of pressure).



Could you, please, point me in the right direction (i.e., how to automate this process and make it more smoothly) and, if possible, please indicate your reasons for choosing using a certain package or particular approach versus another (e.g., a custom function versus a combination of functions in a package/s)?



    # This is my starting data frame
dfex <- data.frame(stringsAsFactors=FALSE,
q01_01 = c("most useful", "least useful", "least useful", NA),
q01_02 = c(NA, "most useful", "most useful", "most useful"),
q01_03 = c("least useful", NA, NA, "least useful"),
q01_04 = c(NA, NA, NA, NA),
q02_01 = c("least useful", "least useful", NA, "least useful"),
q02_02 = c(NA, NA, NA, NA),
q02_03 = c(NA, "most useful", "most useful", "most useful"),
q02_04 = c("most useful", NA, "least useful", NA))

# which now I'm processing manually
dfex <- dfex %>%
mutate(b01_01 = case_when(q01_01 == "most useful" ~ 1)) %>%
mutate(b01_02 = case_when(q01_02 == "most useful" ~ 2)) %>%
mutate(b01_03 = case_when(q01_03 == "most useful" ~ 3)) %>%
mutate(b01_04 = case_when(q01_04 == "most useful" ~ 4)) %>%
mutate(b02_01 = case_when(q02_01 == "most useful" ~ 1)) %>%
mutate(b02_02 = case_when(q02_02 == "most useful" ~ 2)) %>%
mutate(b02_03 = case_when(q02_03 == "most useful" ~ 3)) %>%
mutate(b02_04 = case_when(q02_04 == "most useful" ~ 4)) %>%
mutate(w01_01 = case_when(q01_01 == "least useful" ~ 1)) %>%
mutate(w01_02 = case_when(q01_02 == "least useful" ~ 2)) %>%
mutate(w01_03 = case_when(q01_03 == "least useful" ~ 3)) %>%
mutate(w01_04 = case_when(q01_04 == "least useful" ~ 4)) %>%
mutate(w02_01 = case_when(q02_01 == "least useful" ~ 1)) %>%
mutate(w02_02 = case_when(q02_02 == "least useful" ~ 2)) %>%
mutate(w02_03 = case_when(q02_03 == "least useful" ~ 3)) %>%
mutate(w02_04 = case_when(q02_04 == "least useful" ~ 4))

# create (manually) cols
dfex %>%
select(c(b01_01:b01_04)) %>%
rowSums(., na.rm = T) -> dfex$B1
dfex %>%
select(c(b02_01:b02_04)) %>%
rowSums(., na.rm = T) -> dfex$B2
dfex %>%
select(c(w01_01:w01_04)) %>%
rowSums(., na.rm = T) -> dfex$W1
dfex %>%
select(c(w02_01:w02_04)) %>%
rowSums(., na.rm = T) -> dfex$W2

# getting the desired BW datafile after selecting only cols of interest
dfbw <- dfex %>% select(B1:W2)


PS: Because of all the attention given to tidyverse I thought this package collection might be suitable for my problem (?).










share|improve this question













The long title would've been "create cols/var 1) named as a function of specific responses in previous cols/var and 2) that contain new values as function of previous responses positions in an array (best-worst scaling application)"



I'm performing some best-worst scaling (maxdiff) analyses and I need to convert the data from the raw responses data file into a data file suitable for BWS analyses.



My responses data file is dfex and the end result should be dfbw.
Basically, the example starting datafile is self-explanatory, i.e., there are two questions q01 and q02, and the respondents indicate which of the available alternatives to these questions are "best" or "worst" ("most useful" and "least useful", here).



The final data file, suited for BWS analysis, should indicate for each question, which alternative, out of four possible per each question, was chosen. For instance, in case 1, question 1, the respondent indicated 1st alternative as the "most useful" and 3rd alternative as the "least useful".



I currently doing this via some seriously stupid manual conversions of scores but I'm certain that much more elegant solutions are available (I'm just not that advanced in R and I'm under a lot of pressure).



Could you, please, point me in the right direction (i.e., how to automate this process and make it more smoothly) and, if possible, please indicate your reasons for choosing using a certain package or particular approach versus another (e.g., a custom function versus a combination of functions in a package/s)?



    # This is my starting data frame
dfex <- data.frame(stringsAsFactors=FALSE,
q01_01 = c("most useful", "least useful", "least useful", NA),
q01_02 = c(NA, "most useful", "most useful", "most useful"),
q01_03 = c("least useful", NA, NA, "least useful"),
q01_04 = c(NA, NA, NA, NA),
q02_01 = c("least useful", "least useful", NA, "least useful"),
q02_02 = c(NA, NA, NA, NA),
q02_03 = c(NA, "most useful", "most useful", "most useful"),
q02_04 = c("most useful", NA, "least useful", NA))

# which now I'm processing manually
dfex <- dfex %>%
mutate(b01_01 = case_when(q01_01 == "most useful" ~ 1)) %>%
mutate(b01_02 = case_when(q01_02 == "most useful" ~ 2)) %>%
mutate(b01_03 = case_when(q01_03 == "most useful" ~ 3)) %>%
mutate(b01_04 = case_when(q01_04 == "most useful" ~ 4)) %>%
mutate(b02_01 = case_when(q02_01 == "most useful" ~ 1)) %>%
mutate(b02_02 = case_when(q02_02 == "most useful" ~ 2)) %>%
mutate(b02_03 = case_when(q02_03 == "most useful" ~ 3)) %>%
mutate(b02_04 = case_when(q02_04 == "most useful" ~ 4)) %>%
mutate(w01_01 = case_when(q01_01 == "least useful" ~ 1)) %>%
mutate(w01_02 = case_when(q01_02 == "least useful" ~ 2)) %>%
mutate(w01_03 = case_when(q01_03 == "least useful" ~ 3)) %>%
mutate(w01_04 = case_when(q01_04 == "least useful" ~ 4)) %>%
mutate(w02_01 = case_when(q02_01 == "least useful" ~ 1)) %>%
mutate(w02_02 = case_when(q02_02 == "least useful" ~ 2)) %>%
mutate(w02_03 = case_when(q02_03 == "least useful" ~ 3)) %>%
mutate(w02_04 = case_when(q02_04 == "least useful" ~ 4))

# create (manually) cols
dfex %>%
select(c(b01_01:b01_04)) %>%
rowSums(., na.rm = T) -> dfex$B1
dfex %>%
select(c(b02_01:b02_04)) %>%
rowSums(., na.rm = T) -> dfex$B2
dfex %>%
select(c(w01_01:w01_04)) %>%
rowSums(., na.rm = T) -> dfex$W1
dfex %>%
select(c(w02_01:w02_04)) %>%
rowSums(., na.rm = T) -> dfex$W2

# getting the desired BW datafile after selecting only cols of interest
dfbw <- dfex %>% select(B1:W2)


PS: Because of all the attention given to tidyverse I thought this package collection might be suitable for my problem (?).







r






share|improve this question













share|improve this question











share|improve this question




share|improve this question










asked Dec 2 at 9:45









streamline

32




32




migrated from superuser.com Dec 2 at 16:57


This question came from our site for computer enthusiasts and power users.






migrated from superuser.com Dec 2 at 16:57


This question came from our site for computer enthusiasts and power users.














  • I suspect that the solution may imply melting and then recasting back the dataframe, but I couldn't figure it out, yet.
    – streamline
    Dec 2 at 9:55










  • This question found a solution here: community.rstudio.com/t/…, using tidyverse. I'm not sure if I should leave it here, for other possible solutions or not. Anyway, for that particular solution, I believe it should be acknowledged.
    – streamline
    Dec 2 at 15:09










  • Why ask here and not on Stack Overflow? Voted to migrate. You may answer your own question though.
    – slhck
    Dec 2 at 15:31










  • @slhck. I believe you might be right. I apologize for the misplacement.
    – streamline
    Dec 2 at 15:58










  • No worries. You can flag your question for attention to have it migrated. In the meantime I provided an answer that hopefully should make it easier to handle your data.
    – slhck
    Dec 2 at 15:59


















  • I suspect that the solution may imply melting and then recasting back the dataframe, but I couldn't figure it out, yet.
    – streamline
    Dec 2 at 9:55










  • This question found a solution here: community.rstudio.com/t/…, using tidyverse. I'm not sure if I should leave it here, for other possible solutions or not. Anyway, for that particular solution, I believe it should be acknowledged.
    – streamline
    Dec 2 at 15:09










  • Why ask here and not on Stack Overflow? Voted to migrate. You may answer your own question though.
    – slhck
    Dec 2 at 15:31










  • @slhck. I believe you might be right. I apologize for the misplacement.
    – streamline
    Dec 2 at 15:58










  • No worries. You can flag your question for attention to have it migrated. In the meantime I provided an answer that hopefully should make it easier to handle your data.
    – slhck
    Dec 2 at 15:59
















I suspect that the solution may imply melting and then recasting back the dataframe, but I couldn't figure it out, yet.
– streamline
Dec 2 at 9:55




I suspect that the solution may imply melting and then recasting back the dataframe, but I couldn't figure it out, yet.
– streamline
Dec 2 at 9:55












This question found a solution here: community.rstudio.com/t/…, using tidyverse. I'm not sure if I should leave it here, for other possible solutions or not. Anyway, for that particular solution, I believe it should be acknowledged.
– streamline
Dec 2 at 15:09




This question found a solution here: community.rstudio.com/t/…, using tidyverse. I'm not sure if I should leave it here, for other possible solutions or not. Anyway, for that particular solution, I believe it should be acknowledged.
– streamline
Dec 2 at 15:09












Why ask here and not on Stack Overflow? Voted to migrate. You may answer your own question though.
– slhck
Dec 2 at 15:31




Why ask here and not on Stack Overflow? Voted to migrate. You may answer your own question though.
– slhck
Dec 2 at 15:31












@slhck. I believe you might be right. I apologize for the misplacement.
– streamline
Dec 2 at 15:58




@slhck. I believe you might be right. I apologize for the misplacement.
– streamline
Dec 2 at 15:58












No worries. You can flag your question for attention to have it migrated. In the meantime I provided an answer that hopefully should make it easier to handle your data.
– slhck
Dec 2 at 15:59




No worries. You can flag your question for attention to have it migrated. In the meantime I provided an answer that hopefully should make it easier to handle your data.
– slhck
Dec 2 at 15:59












1 Answer
1






active

oldest

votes


















0














The tidyverse collection of packages is great, once you've learned how to format your data, and that it's usually never necessary to a) write loops of any kind or b) do more work coding once you get more data.



Start by making your data frame "tidy". Each observation should be one row, so you want a column for the question ID, a column for the alternative, and one for the answer. This is what gather does: it makes your "wide" data "long". We further remove the "q" from the question ID, as it's not needed.



Also, add a column that gives you the subject ID – it's basically the row number in each group of question/alternatives. This is done with a combination of group_by and mutate.



library(tidyverse)

dfex.tidy = dfex %>%
gather(question, answer, starts_with("q")) %>%
separate(question, into=c("question", "alternative")) %>%
mutate(question = str_replace_all(question, "q", "")) %>%
group_by(question, alternative) %>%
mutate(subject = row_number())


This gives you:



# A tibble: 32 x 4
# Groups: question, alternative [8]
question alternative answer subject
<chr> <chr> <chr> <int>
1 01 01 most useful 1
2 01 01 least useful 2
3 01 01 least useful 3
4 01 01 NA 4
5 01 02 NA 1
6 01 02 most useful 2
7 01 02 most useful 3
8 01 02 most useful 4
9 01 03 least useful 1
10 01 03 NA 2


… and so on. As you can see, each answer is a row. (In fact, non-answers are also part of this data frame but it's not required to remove those rows.)



Now, we filter out all missing answers, and re-code them from "most/least useful" to 1 and 2 so we know their intrinsic order.



We then group by subject and question. The chosen "most useful" alternative will be the first alternative when ordering the results by the answer value (1 or 2). Vice-versa, the "least useful" alternative will be the first alternative when ordering the results by the descending answer value:



dfex.results = dfex.tidy %>% 
filter(!is.na(answer)) %>%
# recode answer to 1 or 2
mutate(answer = ifelse(answer == "most useful", 1, 2)) %>%
group_by(subject, question) %>%
summarize(
best = first(alternative, order_by = answer),
worst = first(alternative, order_by = desc(answer))
)


This gives you:



  subject question best  worst
<int> <chr> <chr> <chr>
1 1 01 01 03
2 1 02 04 01
3 2 01 02 01
4 2 02 03 01
5 3 01 02 01
6 3 02 03 04
7 4 01 02 03
8 4 02 03 01


This data should be easy to work with and scales for any number of questions, alternatives, and answers, as long as your input data stays in the same format. That said, you should really try to collect your input data in a tidy manner already, if possible.



You may further tidy the data in order to get the best or worst alternative for a given subject and question:



dfex.results %>% 
gather(type, alternative, c(best, worst))

# A tibble: 16 x 4
# Groups: subject [4]
subject question type alternative
<int> <chr> <chr> <chr>
1 1 01 best 01
2 1 02 best 04
3 2 01 best 02
4 2 02 best 03





share|improve this answer





















    Your Answer






    StackExchange.ifUsing("editor", function () {
    StackExchange.using("externalEditor", function () {
    StackExchange.using("snippets", function () {
    StackExchange.snippets.init();
    });
    });
    }, "code-snippets");

    StackExchange.ready(function() {
    var channelOptions = {
    tags: "".split(" "),
    id: "1"
    };
    initTagRenderer("".split(" "), "".split(" "), channelOptions);

    StackExchange.using("externalEditor", function() {
    // Have to fire editor after snippets, if snippets enabled
    if (StackExchange.settings.snippets.snippetsEnabled) {
    StackExchange.using("snippets", function() {
    createEditor();
    });
    }
    else {
    createEditor();
    }
    });

    function createEditor() {
    StackExchange.prepareEditor({
    heartbeatType: 'answer',
    autoActivateHeartbeat: false,
    convertImagesToLinks: true,
    noModals: true,
    showLowRepImageUploadWarning: true,
    reputationToPostImages: 10,
    bindNavPrevention: true,
    postfix: "",
    imageUploader: {
    brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
    contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
    allowUrls: true
    },
    onDemand: true,
    discardSelector: ".discard-answer"
    ,immediatelyShowMarkdownHelp:true
    });


    }
    });














    draft saved

    draft discarded


















    StackExchange.ready(
    function () {
    StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53582527%2fcreate-cols-var-named-as-a-function-of-specific-responses-in-previous-cols-var%23new-answer', 'question_page');
    }
    );

    Post as a guest















    Required, but never shown

























    1 Answer
    1






    active

    oldest

    votes








    1 Answer
    1






    active

    oldest

    votes









    active

    oldest

    votes






    active

    oldest

    votes









    0














    The tidyverse collection of packages is great, once you've learned how to format your data, and that it's usually never necessary to a) write loops of any kind or b) do more work coding once you get more data.



    Start by making your data frame "tidy". Each observation should be one row, so you want a column for the question ID, a column for the alternative, and one for the answer. This is what gather does: it makes your "wide" data "long". We further remove the "q" from the question ID, as it's not needed.



    Also, add a column that gives you the subject ID – it's basically the row number in each group of question/alternatives. This is done with a combination of group_by and mutate.



    library(tidyverse)

    dfex.tidy = dfex %>%
    gather(question, answer, starts_with("q")) %>%
    separate(question, into=c("question", "alternative")) %>%
    mutate(question = str_replace_all(question, "q", "")) %>%
    group_by(question, alternative) %>%
    mutate(subject = row_number())


    This gives you:



    # A tibble: 32 x 4
    # Groups: question, alternative [8]
    question alternative answer subject
    <chr> <chr> <chr> <int>
    1 01 01 most useful 1
    2 01 01 least useful 2
    3 01 01 least useful 3
    4 01 01 NA 4
    5 01 02 NA 1
    6 01 02 most useful 2
    7 01 02 most useful 3
    8 01 02 most useful 4
    9 01 03 least useful 1
    10 01 03 NA 2


    … and so on. As you can see, each answer is a row. (In fact, non-answers are also part of this data frame but it's not required to remove those rows.)



    Now, we filter out all missing answers, and re-code them from "most/least useful" to 1 and 2 so we know their intrinsic order.



    We then group by subject and question. The chosen "most useful" alternative will be the first alternative when ordering the results by the answer value (1 or 2). Vice-versa, the "least useful" alternative will be the first alternative when ordering the results by the descending answer value:



    dfex.results = dfex.tidy %>% 
    filter(!is.na(answer)) %>%
    # recode answer to 1 or 2
    mutate(answer = ifelse(answer == "most useful", 1, 2)) %>%
    group_by(subject, question) %>%
    summarize(
    best = first(alternative, order_by = answer),
    worst = first(alternative, order_by = desc(answer))
    )


    This gives you:



      subject question best  worst
    <int> <chr> <chr> <chr>
    1 1 01 01 03
    2 1 02 04 01
    3 2 01 02 01
    4 2 02 03 01
    5 3 01 02 01
    6 3 02 03 04
    7 4 01 02 03
    8 4 02 03 01


    This data should be easy to work with and scales for any number of questions, alternatives, and answers, as long as your input data stays in the same format. That said, you should really try to collect your input data in a tidy manner already, if possible.



    You may further tidy the data in order to get the best or worst alternative for a given subject and question:



    dfex.results %>% 
    gather(type, alternative, c(best, worst))

    # A tibble: 16 x 4
    # Groups: subject [4]
    subject question type alternative
    <int> <chr> <chr> <chr>
    1 1 01 best 01
    2 1 02 best 04
    3 2 01 best 02
    4 2 02 best 03





    share|improve this answer


























      0














      The tidyverse collection of packages is great, once you've learned how to format your data, and that it's usually never necessary to a) write loops of any kind or b) do more work coding once you get more data.



      Start by making your data frame "tidy". Each observation should be one row, so you want a column for the question ID, a column for the alternative, and one for the answer. This is what gather does: it makes your "wide" data "long". We further remove the "q" from the question ID, as it's not needed.



      Also, add a column that gives you the subject ID – it's basically the row number in each group of question/alternatives. This is done with a combination of group_by and mutate.



      library(tidyverse)

      dfex.tidy = dfex %>%
      gather(question, answer, starts_with("q")) %>%
      separate(question, into=c("question", "alternative")) %>%
      mutate(question = str_replace_all(question, "q", "")) %>%
      group_by(question, alternative) %>%
      mutate(subject = row_number())


      This gives you:



      # A tibble: 32 x 4
      # Groups: question, alternative [8]
      question alternative answer subject
      <chr> <chr> <chr> <int>
      1 01 01 most useful 1
      2 01 01 least useful 2
      3 01 01 least useful 3
      4 01 01 NA 4
      5 01 02 NA 1
      6 01 02 most useful 2
      7 01 02 most useful 3
      8 01 02 most useful 4
      9 01 03 least useful 1
      10 01 03 NA 2


      … and so on. As you can see, each answer is a row. (In fact, non-answers are also part of this data frame but it's not required to remove those rows.)



      Now, we filter out all missing answers, and re-code them from "most/least useful" to 1 and 2 so we know their intrinsic order.



      We then group by subject and question. The chosen "most useful" alternative will be the first alternative when ordering the results by the answer value (1 or 2). Vice-versa, the "least useful" alternative will be the first alternative when ordering the results by the descending answer value:



      dfex.results = dfex.tidy %>% 
      filter(!is.na(answer)) %>%
      # recode answer to 1 or 2
      mutate(answer = ifelse(answer == "most useful", 1, 2)) %>%
      group_by(subject, question) %>%
      summarize(
      best = first(alternative, order_by = answer),
      worst = first(alternative, order_by = desc(answer))
      )


      This gives you:



        subject question best  worst
      <int> <chr> <chr> <chr>
      1 1 01 01 03
      2 1 02 04 01
      3 2 01 02 01
      4 2 02 03 01
      5 3 01 02 01
      6 3 02 03 04
      7 4 01 02 03
      8 4 02 03 01


      This data should be easy to work with and scales for any number of questions, alternatives, and answers, as long as your input data stays in the same format. That said, you should really try to collect your input data in a tidy manner already, if possible.



      You may further tidy the data in order to get the best or worst alternative for a given subject and question:



      dfex.results %>% 
      gather(type, alternative, c(best, worst))

      # A tibble: 16 x 4
      # Groups: subject [4]
      subject question type alternative
      <int> <chr> <chr> <chr>
      1 1 01 best 01
      2 1 02 best 04
      3 2 01 best 02
      4 2 02 best 03





      share|improve this answer
























        0












        0








        0






        The tidyverse collection of packages is great, once you've learned how to format your data, and that it's usually never necessary to a) write loops of any kind or b) do more work coding once you get more data.



        Start by making your data frame "tidy". Each observation should be one row, so you want a column for the question ID, a column for the alternative, and one for the answer. This is what gather does: it makes your "wide" data "long". We further remove the "q" from the question ID, as it's not needed.



        Also, add a column that gives you the subject ID – it's basically the row number in each group of question/alternatives. This is done with a combination of group_by and mutate.



        library(tidyverse)

        dfex.tidy = dfex %>%
        gather(question, answer, starts_with("q")) %>%
        separate(question, into=c("question", "alternative")) %>%
        mutate(question = str_replace_all(question, "q", "")) %>%
        group_by(question, alternative) %>%
        mutate(subject = row_number())


        This gives you:



        # A tibble: 32 x 4
        # Groups: question, alternative [8]
        question alternative answer subject
        <chr> <chr> <chr> <int>
        1 01 01 most useful 1
        2 01 01 least useful 2
        3 01 01 least useful 3
        4 01 01 NA 4
        5 01 02 NA 1
        6 01 02 most useful 2
        7 01 02 most useful 3
        8 01 02 most useful 4
        9 01 03 least useful 1
        10 01 03 NA 2


        … and so on. As you can see, each answer is a row. (In fact, non-answers are also part of this data frame but it's not required to remove those rows.)



        Now, we filter out all missing answers, and re-code them from "most/least useful" to 1 and 2 so we know their intrinsic order.



        We then group by subject and question. The chosen "most useful" alternative will be the first alternative when ordering the results by the answer value (1 or 2). Vice-versa, the "least useful" alternative will be the first alternative when ordering the results by the descending answer value:



        dfex.results = dfex.tidy %>% 
        filter(!is.na(answer)) %>%
        # recode answer to 1 or 2
        mutate(answer = ifelse(answer == "most useful", 1, 2)) %>%
        group_by(subject, question) %>%
        summarize(
        best = first(alternative, order_by = answer),
        worst = first(alternative, order_by = desc(answer))
        )


        This gives you:



          subject question best  worst
        <int> <chr> <chr> <chr>
        1 1 01 01 03
        2 1 02 04 01
        3 2 01 02 01
        4 2 02 03 01
        5 3 01 02 01
        6 3 02 03 04
        7 4 01 02 03
        8 4 02 03 01


        This data should be easy to work with and scales for any number of questions, alternatives, and answers, as long as your input data stays in the same format. That said, you should really try to collect your input data in a tidy manner already, if possible.



        You may further tidy the data in order to get the best or worst alternative for a given subject and question:



        dfex.results %>% 
        gather(type, alternative, c(best, worst))

        # A tibble: 16 x 4
        # Groups: subject [4]
        subject question type alternative
        <int> <chr> <chr> <chr>
        1 1 01 best 01
        2 1 02 best 04
        3 2 01 best 02
        4 2 02 best 03





        share|improve this answer












        The tidyverse collection of packages is great, once you've learned how to format your data, and that it's usually never necessary to a) write loops of any kind or b) do more work coding once you get more data.



        Start by making your data frame "tidy". Each observation should be one row, so you want a column for the question ID, a column for the alternative, and one for the answer. This is what gather does: it makes your "wide" data "long". We further remove the "q" from the question ID, as it's not needed.



        Also, add a column that gives you the subject ID – it's basically the row number in each group of question/alternatives. This is done with a combination of group_by and mutate.



        library(tidyverse)

        dfex.tidy = dfex %>%
        gather(question, answer, starts_with("q")) %>%
        separate(question, into=c("question", "alternative")) %>%
        mutate(question = str_replace_all(question, "q", "")) %>%
        group_by(question, alternative) %>%
        mutate(subject = row_number())


        This gives you:



        # A tibble: 32 x 4
        # Groups: question, alternative [8]
        question alternative answer subject
        <chr> <chr> <chr> <int>
        1 01 01 most useful 1
        2 01 01 least useful 2
        3 01 01 least useful 3
        4 01 01 NA 4
        5 01 02 NA 1
        6 01 02 most useful 2
        7 01 02 most useful 3
        8 01 02 most useful 4
        9 01 03 least useful 1
        10 01 03 NA 2


        … and so on. As you can see, each answer is a row. (In fact, non-answers are also part of this data frame but it's not required to remove those rows.)



        Now, we filter out all missing answers, and re-code them from "most/least useful" to 1 and 2 so we know their intrinsic order.



        We then group by subject and question. The chosen "most useful" alternative will be the first alternative when ordering the results by the answer value (1 or 2). Vice-versa, the "least useful" alternative will be the first alternative when ordering the results by the descending answer value:



        dfex.results = dfex.tidy %>% 
        filter(!is.na(answer)) %>%
        # recode answer to 1 or 2
        mutate(answer = ifelse(answer == "most useful", 1, 2)) %>%
        group_by(subject, question) %>%
        summarize(
        best = first(alternative, order_by = answer),
        worst = first(alternative, order_by = desc(answer))
        )


        This gives you:



          subject question best  worst
        <int> <chr> <chr> <chr>
        1 1 01 01 03
        2 1 02 04 01
        3 2 01 02 01
        4 2 02 03 01
        5 3 01 02 01
        6 3 02 03 04
        7 4 01 02 03
        8 4 02 03 01


        This data should be easy to work with and scales for any number of questions, alternatives, and answers, as long as your input data stays in the same format. That said, you should really try to collect your input data in a tidy manner already, if possible.



        You may further tidy the data in order to get the best or worst alternative for a given subject and question:



        dfex.results %>% 
        gather(type, alternative, c(best, worst))

        # A tibble: 16 x 4
        # Groups: subject [4]
        subject question type alternative
        <int> <chr> <chr> <chr>
        1 1 01 best 01
        2 1 02 best 04
        3 2 01 best 02
        4 2 02 best 03






        share|improve this answer












        share|improve this answer



        share|improve this answer










        answered Dec 2 at 15:56









        slhck

        19.4k2298145




        19.4k2298145






























            draft saved

            draft discarded




















































            Thanks for contributing an answer to Stack Overflow!


            • Please be sure to answer the question. Provide details and share your research!

            But avoid



            • Asking for help, clarification, or responding to other answers.

            • Making statements based on opinion; back them up with references or personal experience.


            To learn more, see our tips on writing great answers.





            Some of your past answers have not been well-received, and you're in danger of being blocked from answering.


            Please pay close attention to the following guidance:


            • Please be sure to answer the question. Provide details and share your research!

            But avoid



            • Asking for help, clarification, or responding to other answers.

            • Making statements based on opinion; back them up with references or personal experience.


            To learn more, see our tips on writing great answers.




            draft saved


            draft discarded














            StackExchange.ready(
            function () {
            StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53582527%2fcreate-cols-var-named-as-a-function-of-specific-responses-in-previous-cols-var%23new-answer', 'question_page');
            }
            );

            Post as a guest















            Required, but never shown





















































            Required, but never shown














            Required, but never shown












            Required, but never shown







            Required, but never shown

































            Required, but never shown














            Required, but never shown












            Required, but never shown







            Required, but never shown







            Popular posts from this blog

            Plaza Victoria

            In PowerPoint, is there a keyboard shortcut for bulleted / numbered list?

            How to put 3 figures in Latex with 2 figures side by side and 1 below these side by side images but in...