create cols/var named as a function of specific responses in previous cols/var

The long title would've been "create cols/var 1) named as a function of specific responses in previous cols/var and 2) that contain new values as function of previous responses positions in an array (best-worst scaling application)"

I'm performing some best-worst scaling (maxdiff) analyses and I need to convert the data from the raw responses data file into a data file suitable for BWS analyses.

My responses data file is dfex and the end result should be dfbw.
Basically, the example starting datafile is self-explanatory, i.e., there are two questions q01 and q02, and the respondents indicate which of the available alternatives to these questions are "best" or "worst" ("most useful" and "least useful", here).

The final data file, suited for BWS analysis, should indicate for each question, which alternative, out of four possible per each question, was chosen. For instance, in case 1, question 1, the respondent indicated 1st alternative as the "most useful" and 3rd alternative as the "least useful".

I currently doing this via some seriously stupid manual conversions of scores but I'm certain that much more elegant solutions are available (I'm just not that advanced in R and I'm under a lot of pressure).

Could you, please, point me in the right direction (i.e., how to automate this process and make it more smoothly) and, if possible, please indicate your reasons for choosing using a certain package or particular approach versus another (e.g., a custom function versus a combination of functions in a package/s)?

    # This is my starting data frame

    dfex <- data.frame(stringsAsFactors=FALSE,

          q01_01 = c("most useful", "least useful", "least useful", NA),

          q01_02 = c(NA, "most useful", "most useful", "most useful"),

          q01_03 = c("least useful", NA, NA, "least useful"),

          q01_04 = c(NA, NA, NA, NA),

          q02_01 = c("least useful", "least useful", NA, "least useful"),

          q02_02 = c(NA, NA, NA, NA),

          q02_03 = c(NA, "most useful", "most useful", "most useful"),

          q02_04 = c("most useful", NA, "least useful", NA))



     # which now I'm processing manually

    dfex <- dfex %>% 

      mutate(b01_01 = case_when(q01_01 == "most useful" ~ 1)) %>% 

      mutate(b01_02 = case_when(q01_02 == "most useful" ~ 2)) %>% 

      mutate(b01_03 = case_when(q01_03 == "most useful" ~ 3)) %>% 

      mutate(b01_04 = case_when(q01_04 == "most useful" ~ 4)) %>% 

      mutate(b02_01 = case_when(q02_01 == "most useful" ~ 1)) %>% 

      mutate(b02_02 = case_when(q02_02 == "most useful" ~ 2)) %>% 

      mutate(b02_03 = case_when(q02_03 == "most useful" ~ 3)) %>% 

      mutate(b02_04 = case_when(q02_04 == "most useful" ~ 4)) %>%

      mutate(w01_01 = case_when(q01_01 == "least useful" ~ 1)) %>% 

      mutate(w01_02 = case_when(q01_02 == "least useful" ~ 2)) %>% 

      mutate(w01_03 = case_when(q01_03 == "least useful" ~ 3)) %>% 

      mutate(w01_04 = case_when(q01_04 == "least useful" ~ 4)) %>% 

      mutate(w02_01 = case_when(q02_01 == "least useful" ~ 1)) %>% 

      mutate(w02_02 = case_when(q02_02 == "least useful" ~ 2)) %>% 

      mutate(w02_03 = case_when(q02_03 == "least useful" ~ 3)) %>% 

      mutate(w02_04 = case_when(q02_04 == "least useful" ~ 4))



    # create (manually) cols 

    dfex %>% 

      select(c(b01_01:b01_04)) %>% 

      rowSums(., na.rm = T) -> dfex$B1

    dfex %>% 

      select(c(b02_01:b02_04)) %>% 

      rowSums(., na.rm = T) -> dfex$B2

    dfex %>% 

      select(c(w01_01:w01_04)) %>% 

      rowSums(., na.rm = T) -> dfex$W1

    dfex %>% 

      select(c(w02_01:w02_04)) %>% 

      rowSums(., na.rm = T) -> dfex$W2



    # getting the desired BW datafile after selecting only cols of interest

    dfbw <- dfex %>% select(B1:W2)

PS: Because of all the attention given to tidyverse I thought this package collection might be suitable for my problem (?).

asked Dec 2 at 9:45

streamline

migrated from superuser.com Dec 2 at 16:57

This question came from our site for computer enthusiasts and power users.

I suspect that the solution may imply melting and then recasting back the dataframe, but I couldn't figure it out, yet.
– streamline
Dec 2 at 9:55

This question found a solution here: community.rstudio.com/t/…, using tidyverse. I'm not sure if I should leave it here, for other possible solutions or not. Anyway, for that particular solution, I believe it should be acknowledged.
– streamline
Dec 2 at 15:09

Why ask here and not on Stack Overflow? Voted to migrate. You may answer your own question though.
– slhck
Dec 2 at 15:31

@slhck. I believe you might be right. I apologize for the misplacement.
– streamline
Dec 2 at 15:58

No worries. You can flag your question for attention to have it migrated. In the meantime I provided an answer that hopefully should make it easier to handle your data.
– slhck
Dec 2 at 15:59

add a comment |

I'm performing some best-worst scaling (maxdiff) analyses and I need to convert the data from the raw responses data file into a data file suitable for BWS analyses.

    # This is my starting data frame

    dfex <- data.frame(stringsAsFactors=FALSE,

          q01_01 = c("most useful", "least useful", "least useful", NA),

          q01_02 = c(NA, "most useful", "most useful", "most useful"),

          q01_03 = c("least useful", NA, NA, "least useful"),

          q01_04 = c(NA, NA, NA, NA),

          q02_01 = c("least useful", "least useful", NA, "least useful"),

          q02_02 = c(NA, NA, NA, NA),

          q02_03 = c(NA, "most useful", "most useful", "most useful"),

          q02_04 = c("most useful", NA, "least useful", NA))



     # which now I'm processing manually

    dfex <- dfex %>% 

      mutate(b01_01 = case_when(q01_01 == "most useful" ~ 1)) %>% 

      mutate(b01_02 = case_when(q01_02 == "most useful" ~ 2)) %>% 

      mutate(b01_03 = case_when(q01_03 == "most useful" ~ 3)) %>% 

      mutate(b01_04 = case_when(q01_04 == "most useful" ~ 4)) %>% 

      mutate(b02_01 = case_when(q02_01 == "most useful" ~ 1)) %>% 

      mutate(b02_02 = case_when(q02_02 == "most useful" ~ 2)) %>% 

      mutate(b02_03 = case_when(q02_03 == "most useful" ~ 3)) %>% 

      mutate(b02_04 = case_when(q02_04 == "most useful" ~ 4)) %>%

      mutate(w01_01 = case_when(q01_01 == "least useful" ~ 1)) %>% 

      mutate(w01_02 = case_when(q01_02 == "least useful" ~ 2)) %>% 

      mutate(w01_03 = case_when(q01_03 == "least useful" ~ 3)) %>% 

      mutate(w01_04 = case_when(q01_04 == "least useful" ~ 4)) %>% 

      mutate(w02_01 = case_when(q02_01 == "least useful" ~ 1)) %>% 

      mutate(w02_02 = case_when(q02_02 == "least useful" ~ 2)) %>% 

      mutate(w02_03 = case_when(q02_03 == "least useful" ~ 3)) %>% 

      mutate(w02_04 = case_when(q02_04 == "least useful" ~ 4))



    # create (manually) cols 

    dfex %>% 

      select(c(b01_01:b01_04)) %>% 

      rowSums(., na.rm = T) -> dfex$B1

    dfex %>% 

      select(c(b02_01:b02_04)) %>% 

      rowSums(., na.rm = T) -> dfex$B2

    dfex %>% 

      select(c(w01_01:w01_04)) %>% 

      rowSums(., na.rm = T) -> dfex$W1

    dfex %>% 

      select(c(w02_01:w02_04)) %>% 

      rowSums(., na.rm = T) -> dfex$W2



    # getting the desired BW datafile after selecting only cols of interest

    dfbw <- dfex %>% select(B1:W2)

PS: Because of all the attention given to tidyverse I thought this package collection might be suitable for my problem (?).

asked Dec 2 at 9:45

streamline

migrated from superuser.com Dec 2 at 16:57

This question came from our site for computer enthusiasts and power users.

I suspect that the solution may imply melting and then recasting back the dataframe, but I couldn't figure it out, yet.
– streamline
Dec 2 at 9:55

This question found a solution here: community.rstudio.com/t/…, using tidyverse. I'm not sure if I should leave it here, for other possible solutions or not. Anyway, for that particular solution, I believe it should be acknowledged.
– streamline
Dec 2 at 15:09

Why ask here and not on Stack Overflow? Voted to migrate. You may answer your own question though.
– slhck
Dec 2 at 15:31

@slhck. I believe you might be right. I apologize for the misplacement.
– streamline
Dec 2 at 15:58

No worries. You can flag your question for attention to have it migrated. In the meantime I provided an answer that hopefully should make it easier to handle your data.
– slhck
Dec 2 at 15:59

add a comment |

I'm performing some best-worst scaling (maxdiff) analyses and I need to convert the data from the raw responses data file into a data file suitable for BWS analyses.

    # This is my starting data frame

    dfex <- data.frame(stringsAsFactors=FALSE,

          q01_01 = c("most useful", "least useful", "least useful", NA),

          q01_02 = c(NA, "most useful", "most useful", "most useful"),

          q01_03 = c("least useful", NA, NA, "least useful"),

          q01_04 = c(NA, NA, NA, NA),

          q02_01 = c("least useful", "least useful", NA, "least useful"),

          q02_02 = c(NA, NA, NA, NA),

          q02_03 = c(NA, "most useful", "most useful", "most useful"),

          q02_04 = c("most useful", NA, "least useful", NA))



     # which now I'm processing manually

    dfex <- dfex %>% 

      mutate(b01_01 = case_when(q01_01 == "most useful" ~ 1)) %>% 

      mutate(b01_02 = case_when(q01_02 == "most useful" ~ 2)) %>% 

      mutate(b01_03 = case_when(q01_03 == "most useful" ~ 3)) %>% 

      mutate(b01_04 = case_when(q01_04 == "most useful" ~ 4)) %>% 

      mutate(b02_01 = case_when(q02_01 == "most useful" ~ 1)) %>% 

      mutate(b02_02 = case_when(q02_02 == "most useful" ~ 2)) %>% 

      mutate(b02_03 = case_when(q02_03 == "most useful" ~ 3)) %>% 

      mutate(b02_04 = case_when(q02_04 == "most useful" ~ 4)) %>%

      mutate(w01_01 = case_when(q01_01 == "least useful" ~ 1)) %>% 

      mutate(w01_02 = case_when(q01_02 == "least useful" ~ 2)) %>% 

      mutate(w01_03 = case_when(q01_03 == "least useful" ~ 3)) %>% 

      mutate(w01_04 = case_when(q01_04 == "least useful" ~ 4)) %>% 

      mutate(w02_01 = case_when(q02_01 == "least useful" ~ 1)) %>% 

      mutate(w02_02 = case_when(q02_02 == "least useful" ~ 2)) %>% 

      mutate(w02_03 = case_when(q02_03 == "least useful" ~ 3)) %>% 

      mutate(w02_04 = case_when(q02_04 == "least useful" ~ 4))



    # create (manually) cols 

    dfex %>% 

      select(c(b01_01:b01_04)) %>% 

      rowSums(., na.rm = T) -> dfex$B1

    dfex %>% 

      select(c(b02_01:b02_04)) %>% 

      rowSums(., na.rm = T) -> dfex$B2

    dfex %>% 

      select(c(w01_01:w01_04)) %>% 

      rowSums(., na.rm = T) -> dfex$W1

    dfex %>% 

      select(c(w02_01:w02_04)) %>% 

      rowSums(., na.rm = T) -> dfex$W2



    # getting the desired BW datafile after selecting only cols of interest

    dfbw <- dfex %>% select(B1:W2)

PS: Because of all the attention given to tidyverse I thought this package collection might be suitable for my problem (?).

asked Dec 2 at 9:45

streamline

I'm performing some best-worst scaling (maxdiff) analyses and I need to convert the data from the raw responses data file into a data file suitable for BWS analyses.

    # This is my starting data frame

    dfex <- data.frame(stringsAsFactors=FALSE,

          q01_01 = c("most useful", "least useful", "least useful", NA),

          q01_02 = c(NA, "most useful", "most useful", "most useful"),

          q01_03 = c("least useful", NA, NA, "least useful"),

          q01_04 = c(NA, NA, NA, NA),

          q02_01 = c("least useful", "least useful", NA, "least useful"),

          q02_02 = c(NA, NA, NA, NA),

          q02_03 = c(NA, "most useful", "most useful", "most useful"),

          q02_04 = c("most useful", NA, "least useful", NA))



     # which now I'm processing manually

    dfex <- dfex %>% 

      mutate(b01_01 = case_when(q01_01 == "most useful" ~ 1)) %>% 

      mutate(b01_02 = case_when(q01_02 == "most useful" ~ 2)) %>% 

      mutate(b01_03 = case_when(q01_03 == "most useful" ~ 3)) %>% 

      mutate(b01_04 = case_when(q01_04 == "most useful" ~ 4)) %>% 

      mutate(b02_01 = case_when(q02_01 == "most useful" ~ 1)) %>% 

      mutate(b02_02 = case_when(q02_02 == "most useful" ~ 2)) %>% 

      mutate(b02_03 = case_when(q02_03 == "most useful" ~ 3)) %>% 

      mutate(b02_04 = case_when(q02_04 == "most useful" ~ 4)) %>%

      mutate(w01_01 = case_when(q01_01 == "least useful" ~ 1)) %>% 

      mutate(w01_02 = case_when(q01_02 == "least useful" ~ 2)) %>% 

      mutate(w01_03 = case_when(q01_03 == "least useful" ~ 3)) %>% 

      mutate(w01_04 = case_when(q01_04 == "least useful" ~ 4)) %>% 

      mutate(w02_01 = case_when(q02_01 == "least useful" ~ 1)) %>% 

      mutate(w02_02 = case_when(q02_02 == "least useful" ~ 2)) %>% 

      mutate(w02_03 = case_when(q02_03 == "least useful" ~ 3)) %>% 

      mutate(w02_04 = case_when(q02_04 == "least useful" ~ 4))



    # create (manually) cols 

    dfex %>% 

      select(c(b01_01:b01_04)) %>% 

      rowSums(., na.rm = T) -> dfex$B1

    dfex %>% 

      select(c(b02_01:b02_04)) %>% 

      rowSums(., na.rm = T) -> dfex$B2

    dfex %>% 

      select(c(w01_01:w01_04)) %>% 

      rowSums(., na.rm = T) -> dfex$W1

    dfex %>% 

      select(c(w02_01:w02_04)) %>% 

      rowSums(., na.rm = T) -> dfex$W2



    # getting the desired BW datafile after selecting only cols of interest

    dfbw <- dfex %>% select(B1:W2)

PS: Because of all the attention given to tidyverse I thought this package collection might be suitable for my problem (?).

asked Dec 2 at 9:45

streamline

asked Dec 2 at 9:45

streamline

asked Dec 2 at 9:45

streamline

asked Dec 2 at 9:45

streamline

asked Dec 2 at 9:45

streamline

migrated from superuser.com Dec 2 at 16:57

This question came from our site for computer enthusiasts and power users.

migrated from superuser.com Dec 2 at 16:57

This question came from our site for computer enthusiasts and power users.

I suspect that the solution may imply melting and then recasting back the dataframe, but I couldn't figure it out, yet.
– streamline
Dec 2 at 9:55

This question found a solution here: community.rstudio.com/t/…, using tidyverse. I'm not sure if I should leave it here, for other possible solutions or not. Anyway, for that particular solution, I believe it should be acknowledged.
– streamline
Dec 2 at 15:09

Why ask here and not on Stack Overflow? Voted to migrate. You may answer your own question though.
– slhck
Dec 2 at 15:31

@slhck. I believe you might be right. I apologize for the misplacement.
– streamline
Dec 2 at 15:58

No worries. You can flag your question for attention to have it migrated. In the meantime I provided an answer that hopefully should make it easier to handle your data.
– slhck
Dec 2 at 15:59

add a comment |

I suspect that the solution may imply melting and then recasting back the dataframe, but I couldn't figure it out, yet.
– streamline
Dec 2 at 9:55

This question found a solution here: community.rstudio.com/t/…, using tidyverse. I'm not sure if I should leave it here, for other possible solutions or not. Anyway, for that particular solution, I believe it should be acknowledged.
– streamline
Dec 2 at 15:09

Why ask here and not on Stack Overflow? Voted to migrate. You may answer your own question though.
– slhck
Dec 2 at 15:31

@slhck. I believe you might be right. I apologize for the misplacement.
– streamline
Dec 2 at 15:58

No worries. You can flag your question for attention to have it migrated. In the meantime I provided an answer that hopefully should make it easier to handle your data.
– slhck
Dec 2 at 15:59

I suspect that the solution may imply melting and then recasting back the dataframe, but I couldn't figure it out, yet.
– streamline
Dec 2 at 9:55

This question found a solution here: community.rstudio.com/t/…, using tidyverse. I'm not sure if I should leave it here, for other possible solutions or not. Anyway, for that particular solution, I believe it should be acknowledged.
– streamline
Dec 2 at 15:09

Why ask here and not on Stack Overflow? Voted to migrate. You may answer your own question though.
– slhck
Dec 2 at 15:31

@slhck. I believe you might be right. I apologize for the misplacement.
– streamline
Dec 2 at 15:58

No worries. You can flag your question for attention to have it migrated. In the meantime I provided an answer that hopefully should make it easier to handle your data.
– slhck
Dec 2 at 15:59

add a comment |

1 Answer
1

active

oldest

votes

The tidyverse collection of packages is great, once you've learned how to format your data, and that it's usually never necessary to a) write loops of any kind or b) do more work coding once you get more data.

Start by making your data frame "tidy". Each observation should be one row, so you want a column for the question ID, a column for the alternative, and one for the answer. This is what gather does: it makes your "wide" data "long". We further remove the "q" from the question ID, as it's not needed.

Also, add a column that gives you the subject ID – it's basically the row number in each group of question/alternatives. This is done with a combination of group_by and mutate.

library(tidyverse)



dfex.tidy = dfex %>% 

  gather(question, answer, starts_with("q")) %>% 

  separate(question, into=c("question", "alternative")) %>% 

  mutate(question = str_replace_all(question, "q", "")) %>% 

  group_by(question, alternative) %>% 

  mutate(subject = row_number())

This gives you:

# A tibble: 32 x 4

# Groups:   question, alternative [8]

   question alternative answer       subject

   <chr>    <chr>       <chr>          <int>

 1 01       01          most useful        1

 2 01       01          least useful       2

 3 01       01          least useful       3

 4 01       01          NA                 4

 5 01       02          NA                 1

 6 01       02          most useful        2

 7 01       02          most useful        3

 8 01       02          most useful        4

 9 01       03          least useful       1

10 01       03          NA                 2

… and so on. As you can see, each answer is a row. (In fact, non-answers are also part of this data frame but it's not required to remove those rows.)

Now, we filter out all missing answers, and re-code them from "most/least useful" to 1 and 2 so we know their intrinsic order.

We then group by subject and question. The chosen "most useful" alternative will be the first alternative when ordering the results by the answer value (1 or 2). Vice-versa, the "least useful" alternative will be the first alternative when ordering the results by the descending answer value:

dfex.results = dfex.tidy %>% 

  filter(!is.na(answer)) %>% 

  # recode answer to 1 or 2

  mutate(answer = ifelse(answer == "most useful", 1, 2)) %>% 

  group_by(subject, question) %>% 

  summarize(

    best = first(alternative, order_by = answer),

    worst = first(alternative, order_by = desc(answer))

  )

This gives you:

  subject question best  worst

    <int> <chr>    <chr> <chr>

1       1 01       01    03   

2       1 02       04    01   

3       2 01       02    01   

4       2 02       03    01   

5       3 01       02    01   

6       3 02       03    04   

7       4 01       02    03   

8       4 02       03    01

This data should be easy to work with and scales for any number of questions, alternatives, and answers, as long as your input data stays in the same format. That said, you should really try to collect your input data in a tidy manner already, if possible.

You may further tidy the data in order to get the best or worst alternative for a given subject and question:

dfex.results %>% 

  gather(type, alternative, c(best, worst))



# A tibble: 16 x 4

# Groups:   subject [4]

   subject question type  alternative

     <int> <chr>    <chr> <chr>      

 1       1 01       best  01         

 2       1 02       best  04         

 3       2 01       best  02         

 4       2 02       best  03

answered Dec 2 at 15:56

slhck

19.4k2298145

add a comment |

Your Answer

StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");

StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "1"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});

function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});

}
});

draft saved

draft discarded

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53582527%2fcreate-cols-var-named-as-a-function-of-specific-responses-in-previous-cols-var%23new-answer', 'question_page');
}
);

Post as a guest

Name

Required, but never shown

1 Answer
1

active

oldest

votes

1 Answer
1

active

oldest

votes

Also, add a column that gives you the subject ID – it's basically the row number in each group of question/alternatives. This is done with a combination of group_by and mutate.

library(tidyverse)



dfex.tidy = dfex %>% 

  gather(question, answer, starts_with("q")) %>% 

  separate(question, into=c("question", "alternative")) %>% 

  mutate(question = str_replace_all(question, "q", "")) %>% 

  group_by(question, alternative) %>% 

  mutate(subject = row_number())

This gives you:

# A tibble: 32 x 4

# Groups:   question, alternative [8]

   question alternative answer       subject

   <chr>    <chr>       <chr>          <int>

 1 01       01          most useful        1

 2 01       01          least useful       2

 3 01       01          least useful       3

 4 01       01          NA                 4

 5 01       02          NA                 1

 6 01       02          most useful        2

 7 01       02          most useful        3

 8 01       02          most useful        4

 9 01       03          least useful       1

10 01       03          NA                 2

… and so on. As you can see, each answer is a row. (In fact, non-answers are also part of this data frame but it's not required to remove those rows.)

Now, we filter out all missing answers, and re-code them from "most/least useful" to 1 and 2 so we know their intrinsic order.

dfex.results = dfex.tidy %>% 

  filter(!is.na(answer)) %>% 

  # recode answer to 1 or 2

  mutate(answer = ifelse(answer == "most useful", 1, 2)) %>% 

  group_by(subject, question) %>% 

  summarize(

    best = first(alternative, order_by = answer),

    worst = first(alternative, order_by = desc(answer))

  )

This gives you:

  subject question best  worst

    <int> <chr>    <chr> <chr>

1       1 01       01    03   

2       1 02       04    01   

3       2 01       02    01   

4       2 02       03    01   

5       3 01       02    01   

6       3 02       03    04   

7       4 01       02    03   

8       4 02       03    01

You may further tidy the data in order to get the best or worst alternative for a given subject and question:

dfex.results %>% 

  gather(type, alternative, c(best, worst))



# A tibble: 16 x 4

# Groups:   subject [4]

   subject question type  alternative

     <int> <chr>    <chr> <chr>      

 1       1 01       best  01         

 2       1 02       best  04         

 3       2 01       best  02         

 4       2 02       best  03

answered Dec 2 at 15:56

slhck

19.4k2298145

add a comment |

Also, add a column that gives you the subject ID – it's basically the row number in each group of question/alternatives. This is done with a combination of group_by and mutate.

library(tidyverse)



dfex.tidy = dfex %>% 

  gather(question, answer, starts_with("q")) %>% 

  separate(question, into=c("question", "alternative")) %>% 

  mutate(question = str_replace_all(question, "q", "")) %>% 

  group_by(question, alternative) %>% 

  mutate(subject = row_number())

This gives you:

# A tibble: 32 x 4

# Groups:   question, alternative [8]

   question alternative answer       subject

   <chr>    <chr>       <chr>          <int>

 1 01       01          most useful        1

 2 01       01          least useful       2

 3 01       01          least useful       3

 4 01       01          NA                 4

 5 01       02          NA                 1

 6 01       02          most useful        2

 7 01       02          most useful        3

 8 01       02          most useful        4

 9 01       03          least useful       1

10 01       03          NA                 2

… and so on. As you can see, each answer is a row. (In fact, non-answers are also part of this data frame but it's not required to remove those rows.)

Now, we filter out all missing answers, and re-code them from "most/least useful" to 1 and 2 so we know their intrinsic order.

dfex.results = dfex.tidy %>% 

  filter(!is.na(answer)) %>% 

  # recode answer to 1 or 2

  mutate(answer = ifelse(answer == "most useful", 1, 2)) %>% 

  group_by(subject, question) %>% 

  summarize(

    best = first(alternative, order_by = answer),

    worst = first(alternative, order_by = desc(answer))

  )

This gives you:

  subject question best  worst

    <int> <chr>    <chr> <chr>

1       1 01       01    03   

2       1 02       04    01   

3       2 01       02    01   

4       2 02       03    01   

5       3 01       02    01   

6       3 02       03    04   

7       4 01       02    03   

8       4 02       03    01

You may further tidy the data in order to get the best or worst alternative for a given subject and question:

dfex.results %>% 

  gather(type, alternative, c(best, worst))



# A tibble: 16 x 4

# Groups:   subject [4]

   subject question type  alternative

     <int> <chr>    <chr> <chr>      

 1       1 01       best  01         

 2       1 02       best  04         

 3       2 01       best  02         

 4       2 02       best  03

answered Dec 2 at 15:56

slhck

19.4k2298145

add a comment |

Also, add a column that gives you the subject ID – it's basically the row number in each group of question/alternatives. This is done with a combination of group_by and mutate.

library(tidyverse)



dfex.tidy = dfex %>% 

  gather(question, answer, starts_with("q")) %>% 

  separate(question, into=c("question", "alternative")) %>% 

  mutate(question = str_replace_all(question, "q", "")) %>% 

  group_by(question, alternative) %>% 

  mutate(subject = row_number())

This gives you:

# A tibble: 32 x 4

# Groups:   question, alternative [8]

   question alternative answer       subject

   <chr>    <chr>       <chr>          <int>

 1 01       01          most useful        1

 2 01       01          least useful       2

 3 01       01          least useful       3

 4 01       01          NA                 4

 5 01       02          NA                 1

 6 01       02          most useful        2

 7 01       02          most useful        3

 8 01       02          most useful        4

 9 01       03          least useful       1

10 01       03          NA                 2

… and so on. As you can see, each answer is a row. (In fact, non-answers are also part of this data frame but it's not required to remove those rows.)

Now, we filter out all missing answers, and re-code them from "most/least useful" to 1 and 2 so we know their intrinsic order.

dfex.results = dfex.tidy %>% 

  filter(!is.na(answer)) %>% 

  # recode answer to 1 or 2

  mutate(answer = ifelse(answer == "most useful", 1, 2)) %>% 

  group_by(subject, question) %>% 

  summarize(

    best = first(alternative, order_by = answer),

    worst = first(alternative, order_by = desc(answer))

  )

This gives you:

  subject question best  worst

    <int> <chr>    <chr> <chr>

1       1 01       01    03   

2       1 02       04    01   

3       2 01       02    01   

4       2 02       03    01   

5       3 01       02    01   

6       3 02       03    04   

7       4 01       02    03   

8       4 02       03    01

You may further tidy the data in order to get the best or worst alternative for a given subject and question:

dfex.results %>% 

  gather(type, alternative, c(best, worst))



# A tibble: 16 x 4

# Groups:   subject [4]

   subject question type  alternative

     <int> <chr>    <chr> <chr>      

 1       1 01       best  01         

 2       1 02       best  04         

 3       2 01       best  02         

 4       2 02       best  03

answered Dec 2 at 15:56

slhck

19.4k2298145

Also, add a column that gives you the subject ID – it's basically the row number in each group of question/alternatives. This is done with a combination of group_by and mutate.

library(tidyverse)



dfex.tidy = dfex %>% 

  gather(question, answer, starts_with("q")) %>% 

  separate(question, into=c("question", "alternative")) %>% 

  mutate(question = str_replace_all(question, "q", "")) %>% 

  group_by(question, alternative) %>% 

  mutate(subject = row_number())

This gives you:

# A tibble: 32 x 4

# Groups:   question, alternative [8]

   question alternative answer       subject

   <chr>    <chr>       <chr>          <int>

 1 01       01          most useful        1

 2 01       01          least useful       2

 3 01       01          least useful       3

 4 01       01          NA                 4

 5 01       02          NA                 1

 6 01       02          most useful        2

 7 01       02          most useful        3

 8 01       02          most useful        4

 9 01       03          least useful       1

10 01       03          NA                 2

… and so on. As you can see, each answer is a row. (In fact, non-answers are also part of this data frame but it's not required to remove those rows.)

Now, we filter out all missing answers, and re-code them from "most/least useful" to 1 and 2 so we know their intrinsic order.

dfex.results = dfex.tidy %>% 

  filter(!is.na(answer)) %>% 

  # recode answer to 1 or 2

  mutate(answer = ifelse(answer == "most useful", 1, 2)) %>% 

  group_by(subject, question) %>% 

  summarize(

    best = first(alternative, order_by = answer),

    worst = first(alternative, order_by = desc(answer))

  )

This gives you:

  subject question best  worst

    <int> <chr>    <chr> <chr>

1       1 01       01    03   

2       1 02       04    01   

3       2 01       02    01   

4       2 02       03    01   

5       3 01       02    01   

6       3 02       03    04   

7       4 01       02    03   

8       4 02       03    01

You may further tidy the data in order to get the best or worst alternative for a given subject and question:

dfex.results %>% 

  gather(type, alternative, c(best, worst))



# A tibble: 16 x 4

# Groups:   subject [4]

   subject question type  alternative

     <int> <chr>    <chr> <chr>      

 1       1 01       best  01         

 2       1 02       best  04         

 3       2 01       best  02         

 4       2 02       best  03

answered Dec 2 at 15:56

slhck

19.4k2298145

answered Dec 2 at 15:56

slhck

19.4k2298145

answered Dec 2 at 15:56

slhck

19.4k2298145

answered Dec 2 at 15:56

slhck

19.4k2298145

add a comment |

draft saved

draft discarded

Thanks for contributing an answer to Stack Overflow!

Please be sure to answer the question. Provide details and share your research!

But avoid …

Asking for help, clarification, or responding to other answers.

Making statements based on opinion; back them up with references or personal experience.

To learn more, see our tips on writing great answers.

Some of your past answers have not been well-received, and you're in danger of being blocked from answering.

Please pay close attention to the following guidance:

Please be sure to answer the question. Provide details and share your research!

But avoid …

Asking for help, clarification, or responding to other answers.

Making statements based on opinion; back them up with references or personal experience.

To learn more, see our tips on writing great answers.

draft saved

draft discarded

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Name

Required, but never shown

Name

Required, but never shown

This page is only for reference, If you need detailed information, please check here

9,cTQwIdoAVhgIewBQTTfR2sD06gMDH5xhTi3,X

搜尋此網誌

Csdrhrt