prelim_analysis_fh
```{r importing packages, message=FALSE, warning=FALSE}
library(knitr)
library(tidyverse)
library(gridExtra)
library(ggridges)
# Set the working directory to the analysis folder; keep only the path
# that matches your machine.
# setwd('/home/fhopp/github/mf_amp/analysis/')
setwd('/Users/Wasp/GitHub/mf_amp/analysis/')
opts_chunk$set(echo = TRUE, message = FALSE, warning = FALSE, results = 'show',
               tidy.opts = list(width.cutoff = 60), tidy = TRUE)
```

```{r importing data and defining functions}
surveydata <- read.csv("amp_sym/surveydata.csv")

# Read all participant files (named 10xx.csv) and bind them into one data
# frame, skipping the PsychoPy timing columns we don't need.
files <- list.files(path = "amp_sym", pattern = "^10.*\\.csv$", full.names = TRUE)
expdata_full <- do.call(bind_rows, lapply(files, function(x) read_csv(x,
  col_types = cols('img_logo_AMP_trials.stopped' = col_skip(),
                   'LDT_cond1.thisN' = col_skip(),
                   'LDT_cond1.thisIndex' = col_skip(),
                   'mask_AMP_2.started' = col_skip(),
                   'img_logo_prac.stopped' = col_skip(),
                   'resp_AMP_trials.started' = col_skip(),
                   'amp_prime_mask.stopped' = col_skip(),
                   'img_logo_AMP.stopped' = col_skip(),
                   'img_logo_AMP.started' = col_skip(),
                   'mask_AMP.started' = col_skip(),
                   'LDT_cond1.thisTrialN' = col_skip(),
                   'LDT_cond1.thisRepN' = col_skip(),
                   'sym_resp_2.started' = col_skip(),
                   'resp_AMP.started' = col_skip(),
                   'AMP_key_reminder.started' = col_skip(),
                   'instr_sym11.started' = col_skip(),
                   'sym_layout_reminder.stopped' = col_skip(),
                   'AMP_key_reminder_2.started' = col_skip(),
                   'mask_logo_prac.started' = col_skip(),
                   'img_logo_AMP_trials.started' = col_skip(),
                   'img_logo_trial.stopped' = col_skip(),
                   'mask_logo_trial.started' = col_skip()))))
# participant is participant ID in new data
expdata <- expdata_full %>%
select(`participant ID`, # participant ID
words, # words that are being shown
corr_ans, # correct response for AMP and MEM, corrected for LDT below!
category, # category of the word (e.g., fairness.vice)
task, # e.g., AMP, LDT, memory, ...
prime_dur, # Prime duration (the condition for the AMP)
resp_AMP_trials.keys, # Which key was pressed during AMP?
resp_AMP_trials.rt, # RT for keypress during AMP?
resp_LDT_trial.keys, # Which key pressed during LDT
resp_LDT_trial.rt, # RT for LDT
resp_memtask.keys, # which key was pressed during memory task
resp_memtask.rt, # response time for memory task
resp_memtask.corr, # was the response for memtask correct?
AMP_word_rating.response, # Self-report: degree to which AMP judgments were based on the word primes
AMP_nonword_rating.response, # Self-report: degree to which AMP judgments were based on the symbol targets
AMP_random_rating.response, # Self-report: degree to which participants responded randomly in the AMP
LDT_rating.response)
# Drop practice-round words
prac_words <- c('keyboard', 'bisebell', 'banana', 'car', 'scarf')
expdata <- expdata[!expdata$words %in% prac_words, ]
# Rename correct answers for LDT
# Nonword = left
# Word = right
expdata <- within(expdata, corr_ans[corr_ans == 'left' & category != 'nonword' & task == 'LDT_prac'] <- 'right')
expdata <- within(expdata, task[task == 'AMP_prac'] <- 'AMP')
expdata <- within(expdata, task[task == 'LDT_prac'] <- 'LDT')
# Rename 'participant ID' to participant
names(expdata)[names(expdata) == 'participant ID'] <- 'participant'
# Drop NAs
#expdata <- expdata[!is.na(expdata$category), ]
# Determine Correct and False Responses
expdata["AMP_task.corr"] <- 0
expdata["LDT_task.corr"] <- 0
expdata <- expdata %>%
mutate(AMP_task.corr = ifelse(corr_ans == resp_AMP_trials.keys, 1, 0)) %>%
mutate(LDT_task.corr = ifelse(corr_ans == resp_LDT_trial.keys, 1, 0))
# This function checks whether an observation is an outlier
# based on the median absolute deviation (MAD)
out_mad <- function(x, thres = 3, na.rm = TRUE) {
  abs(x - median(x, na.rm = na.rm)) >= thres * mad(x, na.rm = na.rm)
}

# Replace outlying cells with NA in place (dataframe must be a data.table);
# `rows` is a logical vector marking the outlying rows
out_replace <- function(dataframe, cols, rows, newValue = NA) {
  if (any(rows, na.rm = TRUE)) {
    data.table::set(dataframe, which(rows), cols, newValue)
  }
}
```

Recoding and Data Checks
Let's do some recoding and variable manipulation.
We'll add a new variable called "wordcat" that tells us whether the word is moral, nonmoral, or a nonword.
We'll split up the category variable into "foundation" and "valence" in case we are interested in the two things separately at any point.
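Here's a minimal sketch of that recoding, assuming `category` strings look like `fairness.vice` and that nonwords are tagged `nonword` (the foundation list is the standard MFT five and may need adjusting):

```{r recoding sketch}
# Sketch: split "fairness.vice"-style categories into foundation and
# valence, and derive a coarse word category. The foundation names and
# the "nonword" tag are assumptions about the stimulus file.
expdata_w <- expdata %>%
  separate(category, into = c("foundation", "valence"),
           sep = "\\.", remove = FALSE, fill = "right") %>%
  mutate(wordcat = case_when(
    category == "nonword" ~ "nonword",
    foundation %in% c("harm", "fairness", "loyalty",
                      "authority", "purity") ~ "moralword",
    TRUE ~ "nonmoralword"
  ))
```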
First, let's make sure that we have all of the data and that we don't have duplicate subject numbers. For this we need to get the LDT trials and the AMP trials specifically, and then check to see how many trials we have per subject.
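A quick sketch of that completeness check:

```{r trial count check}
# Count AMP and LDT trials per participant; participants with far fewer
# trials than their peers are likely missing a task.
expdata_w %>%
  filter(task %in% c("AMP", "LDT")) %>%
  count(participant, task) %>%
  spread(task, n)
```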
Looks like a bunch of subjects are missing the LDT.
Now we need to do a bit of cleaning to get rid of outlying RTs as well as participants who didn't actually try in the task(s). I'm going to look at a few things: a) whether participants self-reported responding randomly, b) chance-level accuracy in the LDT, c) distributions of response times, and d) participants the RAs flagged as not paying attention.
First, let's look at the self-reported ratings.
```{r rating responses}
df <- expdata_w %>%
  select(participant, prime_dur, AMP_word_rating.response,
         AMP_random_rating.response, AMP_nonword_rating.response) %>%
  group_by(participant) %>%
  summarize(AMP_word = mean(AMP_word_rating.response, na.rm = TRUE),
            AMP_symbol = mean(AMP_nonword_rating.response, na.rm = TRUE),
            AMP_random = mean(AMP_random_rating.response, na.rm = TRUE),
            AMP_cond = mean(prime_dur, na.rm = TRUE))
plot_data <- df %>% gather("task", "response", -c(AMP_cond, participant))
# Legend labels follow the alphabetical order of the gathered task levels
# (AMP_random, AMP_symbol, AMP_word)
ggplot(plot_data, aes(x = response, y = ..count.., fill = task)) +
  geom_bar(position = "dodge") +
  scale_x_continuous(breaks = 1:7,
                     labels = c("not at all", "", "", "somewhat", "", "", "completely")) +
  scale_fill_discrete(labels = c("AMP responded randomly",
                                 "AMP judgment based on targets",
                                 "AMP judgment based on primes")) +
  labs(title = "Post-task Responses") +
  facet_wrap(. ~ AMP_cond * task)
# Who fully admitted to responding randomly?
df %>% select(participant, AMP_random) %>% filter(AMP_random == 7)
ggplot(plot_data, aes(x = response)) +
  geom_density(aes(fill = as.factor(AMP_cond), color = as.factor(AMP_cond)), alpha = 0.7) +
  facet_wrap(. ~ task)
```
Looks like people were more willing to admit they responded randomly in the LDT (1001, 1013, 1023, 1029, 1032, 1063) than in the AMP (1032, 1038, 1080). Let's look at button presses in the AMP.
```{r press proportions}
df1 <- expdata_w %>%
  filter(task == "AMP") %>%
  group_by(participant, task, resp_AMP_trials.keys) %>%
  summarize(n = n())
# Ratio of left to right presses per participant
df1 <- df1 %>% spread(resp_AMP_trials.keys, n) %>% summarize(ratio = left / right)
# A crude upper cutoff for flagging extreme responders
median(df1$ratio) + 3 * sd(df1$ratio)
```
Pretty heavily right-skewed. Let's remove outliers within subjects and conditions, as sketched below, and then check how things look.
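A sketch of that within-subject, within-condition filtering, using the `out_mad()` helper defined earlier (note that trials with missing RTs are dropped as well):

```{r within-subject outlier removal}
# Drop AMP RTs more than 3 MADs from each participant-by-condition median.
AMP_filtered <- expdata_w %>%
  filter(task == "AMP") %>%
  group_by(participant, prime_dur) %>%
  filter(!out_mad(resp_AMP_trials.rt)) %>%
  ungroup()
```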
Looks a little better. Let's see if we have any participant outliers. Here I'll go back to the unfiltered dataset and run the same procedure on the participant means.
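Same idea at the participant level, this time on the mean RTs:

```{r participant-level outliers}
# Flag participants whose mean AMP RT is a MAD-based outlier.
expdata_w %>%
  filter(task == "AMP") %>%
  group_by(participant) %>%
  summarize(mean_rt = mean(resp_AMP_trials.rt, na.rm = TRUE)) %>%
  filter(out_mad(mean_rt))
```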
Looks like we have a few outliers, but I'm hesitant to toss them since they're on the long end, suggesting that these participants were thinking about their answers rather than thumbing through mindlessly. We can do some more thinking on this.
So the only cleaning we've done so far concerns reaction times; we can do more thoughtful cleaning in the future. Now that we have filtered, we can move on to some preliminary analysis. Let's do some EDA plotting.
LDT EDA
AMP EDA
Before we do anything with the AMP, we need to reshape the data from wide to long so it can be analyzed.
```{r AMP gathering}
# Build a unique subject-by-trial identifier
AMP_trials <- AMP_trials %>%
  mutate(subnum = participant) %>%
  unite("sub_trial", "subnum", "trialnum", sep = "")

# One row per prime/target word, with valence responses coded -1 (left) / 1 (right)
AMP_trials_long <- AMP_trials %>%
  rename("prime_cat" = "wordcat", "prime_foundation" = "foundation", "prime_val" = "valence") %>%
  gather("prime_target", "words", "words", "nonwords") %>%
  mutate(prime_target = ifelse(prime_target == "words", "prime",
                        ifelse(prime_target == "nonwords", "target", NA)),
         val_response = ifelse(keypress == "left", -1, 1),
         prime_val = ifelse(is.na(prime_val), "nonword", prime_val)) %>%
  arrange(sub_trial)
```
Interesting! Looks like the procedure works, but we need to test this statistically to be sure (a rough sketch below). Vice words elicit negative valence, virtue words elicit positive valence, and nonwords fall somewhere in the middle. Nothing really interesting for the RTs. Let's look at the foundations and the weights.
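As a rough first pass at that test (this ignores the nesting of trials within participants, so a mixed model would be more appropriate eventually):

```{r valence by prime category}
# Pairwise comparisons of valence responses across prime categories.
amp_primes <- subset(AMP_trials_long, prime_target == "prime")
pairwise.t.test(amp_primes$val_response, amp_primes$prime_val,
                p.adjust.method = "bonferroni")
```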
Very neat. Looks like our valences line up quite nicely with the words in the moral categories. Note that the controls aren't plotted because there is only one observation per category (the word "pleasant" and the word "unpleasant"). The mean for "pleasant" is $.201$ and the mean for "unpleasant" is $-.34$, so they align roughly as expected. Some of the nonword primes have low numbers of observations ($n \approx 4$), but as we recruit more participants this should even out.
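For reference, a summary plot along those lines could be produced like this (names as created in the gathering chunk above):

```{r valence by foundation}
# Mean valence response per foundation, split by prime valence.
AMP_trials_long %>%
  filter(prime_target == "prime") %>%
  group_by(prime_foundation, prime_val) %>%
  summarize(meanval = mean(val_response, na.rm = TRUE)) %>%
  ggplot(aes(x = prime_foundation, y = meanval, fill = prime_val)) +
  geom_col(position = "dodge") +
  labs(x = "Foundation", y = "Mean Valence Response") +
  theme_minimal()
```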
```{r AMP weights}
AMP_weights <- left_join(AMP_trials_long, weights, by = "words")
# Per-word mean valence, weighted by each participant's overall speed (phi),
# then averaged across participants
plot_data <- AMP_weights %>%
  filter(prime_cat == "moralword" & prime_target == "prime" & !is.na(weight)) %>%
  group_by(participant) %>%
  mutate(meanrt = mean(RT)) %>%
  ungroup() %>%
  group_by(participant, words, prime_val) %>%
  summarize(meanval = mean(val_response),
            weights = mean(weight, na.rm = TRUE),
            rt = mean(RT),
            meanrt = mean(meanrt)) %>%
  mutate(phi = (1 / (rt / meanrt)) * meanval) %>%
  ungroup() %>%
  group_by(words, prime_val) %>%
  summarize(phi = mean(phi), meanval = mean(meanval), weights = mean(weights))

ggplot(plot_data, aes(x = weights, y = meanval, color = prime_val)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x) +
  labs(title = "Correlation Between E-MFD Weighting and MF-AMP Responses",
       x = "E-MFD Weights", y = "Mean Reported Valence") +
  theme_minimal() +
  theme(legend.title = element_blank())

# Correlations separately for negative and positive primes
df <- plot_data %>% filter(prime_val == "negative")
Hmisc::rcorr(df$meanval, df$weights)
df <- plot_data %>% filter(prime_val == "positive")
Hmisc::rcorr(df$meanval, df$weights)
```
Looks like we have a few participants who did not actually fill out the MFQ (1003, 1020, 1021, 1035, 1040, 1044, 1045, 1060). Let's drop them from further analysis.
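A one-liner to drop them, using the IDs above:

```{r drop missing MFQ}
no_mfq <- c(1003, 1020, 1021, 1035, 1040, 1044, 1045, 1060)
expdata_w <- expdata_w %>% filter(!participant %in% no_mfq)
```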
Survey Data EDA
Let's do some EDA on the survey data.
Descriptives of MFQ domains
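A sketch of how these descriptives might be computed; the MFQ domain column names (`MFQ_harm` through `MFQ_purity`) are hypothetical and should be adjusted to whatever `surveydata` actually contains:

```{r MFQ descriptives, eval=FALSE}
# Hypothetical column names; adjust to the actual MFQ columns.
surveydata %>%
  select(MFQ_harm, MFQ_fairness, MFQ_loyalty, MFQ_authority, MFQ_purity) %>%
  gather("domain", "score") %>%
  group_by(domain) %>%
  summarize(mean = mean(score, na.rm = TRUE), sd = sd(score, na.rm = TRUE))
```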
Sex differences in MFQ
Native speaker differences in MFQ domains
Looks like no real differences across domains. Let's look at sex differences.
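A sketch of that comparison, again with a hypothetical `sex` column and the hypothetical domain names from above:

```{r MFQ sex differences, eval=FALSE}
# Hypothetical: assumes contiguous MFQ domain columns and a sex column.
surveydata %>%
  gather("domain", "score", MFQ_harm:MFQ_purity) %>%
  group_by(domain, sex) %>%
  summarize(mean = mean(score, na.rm = TRUE), sd = sd(score, na.rm = TRUE))
```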
Don't see anything interesting right off the bat. Let's combine the survey data with the experimental data to do some more analyses.
Combining Survey with Exp Data
Now we can do some more analyses; a minimal join sketch follows the list below.
Influence of moral salience on response times to moral words in the LDT
RT differences between native and non-native speakers
Influence of moral salience on pos/neg ratings in the AMP
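The join itself is straightforward, assuming `surveydata` also carries a `participant` column:

```{r combining survey and exp data}
# Attach each participant's survey responses to their trial-level data.
combined <- expdata_w %>% left_join(surveydata, by = "participant")
```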
LDT Survey EDA
Looks like people might respond slightly faster to moral words if they have high moral salience, but the effect, if there is one, is small. Let's look at within-domain salience.
Let's look at whether native speakers respond faster than non-native speakers.
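A rough sketch, assuming a binary `native_speaker` column in the survey data:

```{r native speaker RTs, eval=FALSE}
# Hypothetical: native_speaker is assumed to be a two-level survey column.
t.test(resp_LDT_trial.rt ~ native_speaker,
       data = filter(combined, task == "LDT"))
```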
Seems clear that native speakers outperform non-native speakers here. Might be worth controlling for in a future analysis. Let's move on to the effects of within-domain salience.
Seems fairly interesting. Some more thinking to do, though. Let's turn to the AMP for now.
AMP Survey EDA
Let's look at the influence of overall moral salience on responses to positive and negative moral words.
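A sketch of that correlation, where `MFQ_salience` stands in for some overall salience score (a hypothetical name):

```{r salience and valence, eval=FALSE}
# Hypothetical: MFQ_salience is an overall mean of the MFQ domain scores.
df <- AMP_trials_long %>%
  left_join(surveydata, by = "participant") %>%
  filter(prime_cat == "moralword", prime_target == "prime") %>%
  group_by(participant, prime_val) %>%
  summarize(meanval = mean(val_response),
            salience = mean(MFQ_salience, na.rm = TRUE))
Hmisc::rcorr(df$meanval, df$salience)
```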
Interesting. We get a slightly positive correlation between MFQ salience and valence, but it's not significant. Not exactly what I expected. Let's look at individual domains.
Pretty uninterpretable, in my opinion. There's a main effect of valence, but not really anything with MFQ salience as far as I can tell. Let's look at a couple of the other self-report measures, namely religiosity and political affiliation.
Let's follow up on that political orientation plot and just do some bar plots.
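Something as simple as this would do, assuming a `political_orientation` column in the survey data:

```{r political orientation bars, eval=FALSE}
# Hypothetical: political_orientation is assumed to be an ordinal item.
surveydata %>%
  count(political_orientation) %>%
  ggplot(aes(x = political_orientation, y = n)) +
  geom_col() +
  labs(x = "Political Orientation", y = "Count") +
  theme_minimal()
```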