prelim_analysis
```{r importing packages, message=FALSE, warning=FALSE}
library(knitr)
library(tidyverse)
library(gridExtra)
library(ggridges)
opts_chunk$set(echo = TRUE, message = FALSE, warning = FALSE, results = 'show',
               tidy.opts = list(width.cutoff = 60), tidy = TRUE)
```
```{r importing data and defining functions}
surveydata = read.csv("data/surveydata.csv")
files = list.files(path ="data", pattern="^10.*.csv$", full.names = TRUE)
expdata_full = do.call(bind_rows,
lapply(files, function(x) read.csv(x, stringsAsFactors = FALSE)))
expdata <- expdata_full %>%
  select(participant,
         words,
         mask,
         corr_ans,
         category,
         nonwords,
         tweets,
         topic,
         task,
         keypress,
         RT,
         corr,
         AMP_nonword_rating.response,
         AMP_word_rating.response,
         AMP_random_rating.response,
         LDT_rating.response,
         AMP_loop.thisN,
         AMP_trials.thisN)
# This function will check to see whether an observation is
# an outlier based on median absolute deviation (MAD)
out_mad <- function(x, thres = 3, na.rm = TRUE) {
abs(x - median(x, na.rm = na.rm)) >= thres * mad(x, na.rm = na.rm)
}
# Replace flagged observations with NA in place.
# Note: set() with this signature is data.table::set(), which expects a
# data.table; data.table isn't attached above, so this helper assumes it is
# loaded wherever out_replace is actually called.
out_replace = function(dataframe, cols, rows, newValue = NA) {
  if (any(rows)) {
    set(dataframe, rows, cols, newValue)
  }
}
```

Recoding and Data Checks
Let's do some recoding and variable manipulation.
For some reason PsychoPy records correct as 0 and incorrect as 1, so let's switch those.
We'll add a new variable called "wordcat" that tells us whether the word is moral, nonmoral, or a nonword.
We'll split up the category variable into "foundation" and "valence" in case we are interested in the two things separately at any point.
Looks like I accidentally gave the AMP loop different names in condition 1 and condition 2. To remedy this, we'll coalesce the `AMP_loop.thisN` and `AMP_trials.thisN` variables into one column.
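The recoding code itself isn't included in this export, so here is a minimal sketch of what it might look like. The output name `expdata_w` is borrowed from the later chunks, and the `category` level names and the "foundation.valence" format are assumptions.

```{r recoding sketch, eval=FALSE}
# A possible version of the recoding described above (sketch only)
expdata_w <- expdata %>%
  mutate(
    # PsychoPy codes correct as 0 and incorrect as 1, so flip it
    corr = 1 - corr,
    # combine the two differently named AMP trial counters into one column
    trialnum = coalesce(AMP_loop.thisN, AMP_trials.thisN),
    # classify each item; the level names on the right are illustrative
    wordcat = case_when(
      category == "nonword"  ~ "nonword",
      category == "nonmoral" ~ "nonmoralword",
      TRUE                   ~ "moralword"
    )
  ) %>%
  # split e.g. "care.virtue" into separate foundation and valence columns
  separate(category, into = c("foundation", "valence"),
           sep = "\\.", remove = FALSE, fill = "right")
```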
First, let's make sure that we have all of the data and that we don't have duplicate subject numbers. For this we need to get the LDT trials and the AMP trials specifically, and then check to see how many trials we have per subject.
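The counting code isn't shown here; a quick check along these lines would do it.

```{r trial count check, eval=FALSE}
# Count LDT and AMP trials per participant; duplicate subject numbers or a
# missing session would show up as an unexpected n here
expdata_w %>%
  filter(task %in% c("LDT", "AMP")) %>%
  count(participant, task) %>%
  arrange(n)
```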
Looks like it's all there.
Now we need to do a little bit of cleaning to try our best to get rid of outlying RTs as well as participants who didn't actually try in the task(s). I'm going to look at a few things: a) whether they self-reported responding randomly, b) chance accuracy in the LDT, c) distributions of response times, and d) the participants that the RAs called out as not paying attention.
First, let's look at the self-reported ratings. These are going to be fairly interesting.
```{r rating responses}
df <- expdata_w %>%
  select(participant, AMP_nonword_rating.response, AMP_word_rating.response,
         AMP_random_rating.response, LDT_rating.response) %>%
  group_by(participant) %>%
  summarize(AMP_word = mean(AMP_word_rating.response, na.rm = TRUE),
            AMP_nonword = mean(AMP_nonword_rating.response, na.rm = TRUE),
            AMP_random = mean(AMP_random_rating.response, na.rm = TRUE),
            LDT_random = mean(LDT_rating.response, na.rm = TRUE))

plot_data <- df %>% gather("task", "response", -participant)

ggplot(plot_data, aes(x = response, y = ..count..)) +
  geom_bar(position = "dodge") +
  scale_x_continuous(breaks = 1:7,
                     labels = c("not at all", "", "", "somewhat", "", "", "completely")) +
  scale_fill_discrete(labels = c("AMP judgment based on targets",
                                 "AMP judgment based on primes",
                                 "AMP responded randomly",
                                 "LDT responded randomly")) +
  labs(title = "Post-task Responses") +
  facet_wrap(. ~ task)

df %>%
  select(participant, LDT_random, AMP_random) %>%
  filter(AMP_random == 7 | LDT_random == 7)
```
There are a few clear outliers here: 1031 and 1074. 1074 was one of the participants called out by the RAs as not trying at all. It would probably be worth filtering them out if they pop up again.
```{r LDT accuracy}
LDT_trials <- expdata_w %>%
  group_by(participant) %>%
  mutate(LDT_random = mean(LDT_rating.response, na.rm = TRUE)) %>%
  ungroup() %>%
  filter(task == "LDT") %>%
  group_by(participant) %>%
  summarize(corr = mean(corr, na.rm = TRUE))
```

Let's see if there are any participants who are correctness outliers.

```{r LDT accuracy outliers}
df_indouts <- LDT_trials %>%
  select(participant, corr) %>%
  mutate(mad = out_mad(corr))
table(df_indouts$mad)
```
Only two words are outliers: the nonword "ammessment" and the nonword "tove." Only 28% of people got "ammessment" correct and only 27% got "tove" correct.
Moral words were responded to as words more accurately than nonwords were. It looks like a few people hover around chance in their responses, but they aren't technically outliers.
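The per-word accuracy check referenced above isn't shown; roughly, it would look something like this (a sketch that reuses `out_mad` on item-level means).

```{r word accuracy sketch, eval=FALSE}
# Mean LDT accuracy per item, flagging items whose accuracy is a MAD outlier
word_acc <- expdata_w %>%
  filter(task == "LDT") %>%
  group_by(words) %>%
  summarize(corr = mean(corr, na.rm = TRUE)) %>%
  mutate(out = out_mad(corr))
word_acc %>% filter(out)
```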
Now let's look at reaction times in the AMP and the LDT. I'm not sure that we'll be able to pick out people who are just clicking through based on reaction time alone, but I'll do some cross-referencing with the people the RAs said were clicking through without looking at the screen and see if I can get an idea.
```{r reaction times}
df <- expdata_w %>% filter(task == "AMP" | task == "LDT")
ggplot(df, aes(x = RT)) + geom_histogram()
```
Looks a little better. Let's see if we have any participant outliers. Here I'll go back to the unfiltered dataset and run the same procedure on the participant means.
Looks like we have a few outliers, but I'm hesitant to toss them since they're on the long end, suggesting that they are thinking about their answers rather than just thumbing through mindlessly. We can do some thinking on this.
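The participant-level code isn't included above; a sketch of it would apply the same MAD rule to participant mean RTs.

```{r participant RT outliers sketch, eval=FALSE}
# Participant mean RTs across the AMP and LDT, flagged with the same MAD rule
rt_outs <- expdata_w %>%
  filter(task %in% c("AMP", "LDT")) %>%
  group_by(participant) %>%
  summarize(meanRT = mean(RT, na.rm = TRUE)) %>%
  mutate(out = out_mad(meanRT))
rt_outs %>% filter(out)
```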
So the only cleaning that we've done so far concerns reaction times. We can perhaps do some more thoughtful cleaning in the future. Now that we have filtered, we can go on to some preliminary analysis. Let's do some EDA plotting.
LDT EDA
AMP EDA
Before we do anything with the AMP, we need to reshape the data into a form that can be analyzed (turning it from wide to long).
```{r AMP gathering}
# Build a combined subject-by-trial id so trials can be ordered below
AMP_trials <- AMP_trials %>%
  mutate(subnum = participant) %>%
  unite("sub_trial", "subnum", "trialnum", sep = "")

AMP_trials_long <- AMP_trials %>%
  rename("prime_cat" = "wordcat",
         "prime_foundation" = "foundation",
         "prime_val" = "valence") %>%
  gather("prime_target", "words", "words", "nonwords") %>%
  mutate(prime_target = ifelse(prime_target == "words", "prime",
                        ifelse(prime_target == "nonwords", "target", NA)),
         val_response = ifelse(keypress == "left", -1, 1),
         prime_val = ifelse(is.na(prime_val), "nonword", prime_val)) %>%
  arrange(sub_trial)
```
Interesting! It looks like the procedure works, but we need to test statistically to be sure. Vice words elicit negative valence, virtue words elicit positive valence, and nonwords are somewhere in the middle. Nothing really interesting for RTs. Let's look at the foundations and the weights.
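The plot being described isn't reproduced in this export; here is a sketch of how the category-level means could be computed from the long data built above.

```{r AMP category means sketch, eval=FALSE}
# Mean reported valence and RT by prime valence category
# (negative / positive / nonword, per the coding above)
AMP_trials_long %>%
  filter(prime_target == "prime") %>%
  group_by(prime_val) %>%
  summarize(meanval = mean(val_response, na.rm = TRUE),
            meanRT = mean(RT, na.rm = TRUE)) %>%
  ggplot(aes(x = prime_val, y = meanval)) +
  geom_col() +
  labs(x = "Prime valence category", y = "Mean reported valence")
```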
Very neat. It looks like our valences line up quite nicely with the words in the moral categories. Note that the controls aren't plotted because there is only one observation per category (the word "pleasant" and the word "unpleasant"). The mean for "pleasant" is $.201$ and the mean for "unpleasant" is $-.34$, so they align roughly as expected. Some of the nonword primes have low numbers of observations (n ~= 4), but as we recruit more participants this should even out.
```{r AMP weights}
# `weights` maps each word to its E-MFD weight (built in a chunk not shown here)
AMP_weights <- left_join(AMP_trials_long, weights, by = "words")

plot_data <- AMP_weights %>%
  filter(prime_cat == "moralword" & prime_target == "prime" & !is.na(weight)) %>%
  group_by(participant) %>%
  mutate(meanrt = mean(RT)) %>%
  ungroup() %>%
  group_by(participant, words, prime_val) %>%
  summarize(meanval = mean(val_response),
            weights = mean(weight, na.rm = TRUE),
            rt = mean(RT),
            meanrt = mean(meanrt)) %>%
  mutate(phi = (1 / (rt / meanrt)) * meanval) %>%
  ungroup() %>%
  group_by(words, prime_val) %>%
  summarize(phi = mean(phi), meanval = mean(meanval), weights = mean(weights))

ggplot(plot_data, aes(x = weights, y = meanval, color = prime_val)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x) +
  labs(title = "Correlation Between E-MFD Weighting and MF-AMP Responses",
       x = "E-MFD Weights", y = "Mean Reported Valence") +
  theme_minimal() +
  theme(legend.title = element_blank())

df <- plot_data %>% filter(prime_val == "negative")
Hmisc::rcorr(df$meanval, df$weights)

df <- plot_data %>% filter(prime_val == "positive")
Hmisc::rcorr(df$meanval, df$weights)
```
Looks like we have a few participants that did not actually fill out the MFQ (1003, 1020, 1021, 1035, 1040, 1044, 1045, 1060). Let's drop them from further analysis.
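A minimal sketch of that exclusion (the IDs come from the list above, and it assumes `surveydata` also has a `participant` column).

```{r drop missing MFQ sketch, eval=FALSE}
# Participants with no MFQ responses; drop them before the survey analyses
no_mfq <- c(1003, 1020, 1021, 1035, 1040, 1044, 1045, 1060)
surveydata <- surveydata %>% filter(!(participant %in% no_mfq))
expdata_w  <- expdata_w %>% filter(!(participant %in% no_mfq))
```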
Survey Data EDA
Let's do some EDA on the survey data.
Descriptives of MFQ domains
Sex differences in MFQ
Native speaker differences in MFQ domains
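The code for these summaries isn't included in this export; below is a rough sketch of the domain descriptives, with the MFQ column names as placeholders for whatever the survey columns are actually called.

```{r MFQ descriptives sketch, eval=FALSE}
# Domain-level descriptives; the column names below are placeholders
mfq_domains <- c("MFQ_care", "MFQ_fairness", "MFQ_loyalty",
                 "MFQ_authority", "MFQ_sanctity")
surveydata %>%
  select(one_of(mfq_domains)) %>%
  gather("domain", "score") %>%
  group_by(domain) %>%
  summarize(mean = mean(score, na.rm = TRUE),
            sd = sd(score, na.rm = TRUE),
            n = sum(!is.na(score)))
```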
Looks like there are no real differences across domains. Let's look at sex differences.
I don't see anything interesting right off the bat. Let's combine the survey data with the experimental data to do some more analyses.
Combining Survey with Exp Data
Now we can do some more analyses.
Influence of moral salience on response times to moral words in the LDT
RT differences between native and non-native speakers
Influence of moral salience on pos/neg ratings in the AMP
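Before any of these, the survey and experimental data need to be merged; here is a sketch, assuming `participant` is the ID column in both data frames.

```{r combining sketch, eval=FALSE}
# Join participant-level survey responses onto the trial-level data
combined <- expdata_w %>%
  left_join(surveydata, by = "participant")
```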
LDT Survey EDA
It looks like people might respond slightly faster to moral words if they have high moral salience, but the effect is small if it's anything. Let's look at within-domain salience.
Let's look at whether native speakers respond faster than non-native speakers.
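That comparison isn't shown here; a rough version might look like the following, where `native_speaker` is a placeholder for the actual survey column.

```{r native speaker RT sketch, eval=FALSE}
# Compare LDT reaction times for native vs. non-native English speakers
# (`native_speaker` is a placeholder column name)
combined %>%
  filter(task == "LDT") %>%
  group_by(native_speaker) %>%
  summarize(meanRT = mean(RT, na.rm = TRUE),
            sdRT = sd(RT, na.rm = TRUE))
```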
It seems clear that native speakers outperform non-native speakers in the AMP. That might be worth controlling for in a future analysis. Let's move on to the effects of within-domain salience.
Seems fairly interesting. Some more thinking to do though. Let's turn to the AMP for now.
AMP Survey EDA
Let's look at the influence of overall moral salience on responses to positive and negative moral words.
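The analysis itself isn't included in this export; roughly, with `MFQ_total` standing in for whatever the overall salience score is called, it might look like this.

```{r salience valence sketch, eval=FALSE}
# Correlate overall MFQ salience with each participant's mean valence response
# to moral-word primes (`MFQ_total` is a placeholder column name)
amp_by_sub <- AMP_trials_long %>%
  filter(prime_cat == "moralword" & prime_target == "prime") %>%
  group_by(participant, prime_val) %>%
  summarize(meanval = mean(val_response, na.rm = TRUE)) %>%
  left_join(surveydata, by = "participant")
Hmisc::rcorr(amp_by_sub$MFQ_total, amp_by_sub$meanval)
```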
Interesting. We get a slightly positive correlation between MFQ salience and valence, but it's not significant. Not exactly what I expected. Let's look at individual domains.
Pretty uninterpretable, IMHO. There's a main effect of valence, but not really anything with MFQ salience as far as I can tell. Let's look at a couple of the other self-report measures, namely religiosity and political affiliation.
Let's follow up on that political orientation plot and just do some bar plots.
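A minimal version of those bar plots, with `political_orientation` as a placeholder for the actual survey item.

```{r political orientation sketch, eval=FALSE}
# Distribution of responses on the political orientation item
# (`political_orientation` is a placeholder column name)
ggplot(surveydata, aes(x = factor(political_orientation))) +
  geom_bar() +
  labs(x = "Political orientation", y = "Count")
```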