Exercises

4.1 Simple statistical tests

Start by loading and inspecting your data as you have done in previous tasks. I recommend doing these tasks on a data set that you have already made tidy and for which you know the data import works as it should.

Q4.1.1 Select a continuous variable and a factor with two levels, e.g. sex. Run a t-test (command t.test) on these variables.

Q4.1.2 From the output, find the t-statistic, degrees of freedom and p-value. Can you also find the confidence interval and the means of both groups?
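As a rough illustration of what to look for, here is a minimal sketch on R's built-in ToothGrowth data rather than on your own data set. The choice of data set and the variable name res are assumptions made for this example only; with your data, replace len and supp with your continuous variable and your two-level factor.

```r
# Illustration only: built-in ToothGrowth data, not your own data set.
# len is a continuous variable; supp is a factor with two levels (OJ, VC).
data(ToothGrowth)

# Formula interface: continuous variable ~ two-level factor
res <- t.test(len ~ supp, data = ToothGrowth)

# Printing the result shows everything in one block of text
print(res)

# The individual pieces are also stored in the returned object
res$statistic   # t-statistic
res$parameter   # degrees of freedom
res$p.value     # p-value
res$conf.int    # confidence interval for the difference in means
res$estimate    # mean in each group
```

Storing the result in a variable is optional, but it makes it easier to pull out single numbers later.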

Q4.1.3 Can you find out how to change the test from two-sided to one-sided? Run the same test with a one-sided alternative hypothesis (pick whichever direction makes more sense for your data).

Tip: this is something you can answer with the built-in help!
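If you are unsure how to get at the built-in help, the sketch below shows a few ways to look up the arguments of t.test. The variable names default_method and arg_names are made up for this example; reading the descriptions of the listed arguments on the help page is still up to you.

```r
# Open the help page for t.test (both lines do the same thing)
?t.test
help("t.test")

# t.test is a generic function, so its own argument list is short.
# The default method carries the arguments that control the test.
default_method <- getS3method("t.test", "default")
arg_names <- names(formals(default_method))
print(arg_names)
```

The Arguments section of the help page explains what each of these names does and which values it accepts.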

Q4.1.4 By default, R assumes unequal variances and runs Welch's t-test. Can you find out how to tell the t.test function that you assume equal variances in the two groups?

4.2 Testing assumptions

Demo 4: Testing assumptions (talk)

Demo 4: Testing assumptions (demo)

Q4.2.1 Are we justified in assuming equal variances in our t-test? Find out by testing for homogeneity of variance. If your data are normally distributed, you can use var.test for an F-test (two groups) or bartlett.test (which also handles more than two groups). If your data are not normally distributed, you can use leveneTest from the car package (which can also handle more than two groups) or fligner.test.

Tip: The Levene test, at least, requires that your explanatory variable (the one after the ~) is a factor. If you did not convert your categorical variables into factors when importing the data, do that now!
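The following is a minimal sketch of the factor conversion and the four tests, using the built-in mtcars data instead of your own. The data frame name cars and the result names are assumptions for this example, and the leveneTest call only runs if the car package happens to be installed. Interpreting the p-values is left to you.

```r
# Illustration only: built-in mtcars data, copied so the original is untouched
cars <- mtcars

# am is stored as 0/1 numbers; convert it into a proper factor
cars$am <- factor(cars$am, levels = c(0, 1),
                  labels = c("automatic", "manual"))
str(cars$am)

# Tests that assume normally distributed data
f_res <- var.test(mpg ~ am, data = cars)       # F-test, two groups only
b_res <- bartlett.test(mpg ~ am, data = cars)  # two or more groups

# Tests that are less sensitive to departures from normality
fl_res <- fligner.test(mpg ~ am, data = cars)
if (requireNamespace("car", quietly = TRUE)) {
  print(car::leveneTest(mpg ~ am, data = cars))
}

print(f_res)
print(b_res)
print(fl_res)
```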

Q4.2.2 What does your result tell you? Is a low p-value a sign of having equal variances or not having equal variances?

Tip: You can try to formulate the null hypothesis and alternative hypothesis for these tests, and reason based on that.

Q4.2.3 Choose one continuous variable from your data and check whether it is normally distributed. First, inspect it visually (hist, qqnorm and qqline).

Q4.2.4 Then, also run the Shapiro-Wilk test of normality (shapiro.test).
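One possible way to combine the visual checks with the formal test is sketched below on the built-in ToothGrowth data. The variable names x and sw are assumptions for this example; swap in a continuous variable from your own data.

```r
# Illustration only: tooth length from the built-in ToothGrowth data
x <- ToothGrowth$len

# Visual inspection: histogram and normal Q-Q plot
hist(x, main = "Histogram of tooth length", xlab = "len")
qqnorm(x)   # sample quantiles against theoretical normal quantiles
qqline(x)   # reference line added to the Q-Q plot

# Formal test of normality
sw <- shapiro.test(x)
print(sw)
```

Note that qqline adds to an existing plot, so it has to be called right after qqnorm.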

4.3 Regression

Demo 4: (Linear) regression models (talk)

Demo 4: Regression (demo)

For the regression tasks, find a continuous dependent variable and a number of independent variables.

Q4.3.1 Create a simple linear regression with lm(y ~ x, data=nameofdataframe) (where y is your dependent variable, x is one of your independent variables and nameofdataframe is the name of your dataframe).

Q4.3.2 Assign the result of the linear model to a variable and run summary on that variable. What can you tell about the result?

Q4.3.3 Fit another linear model: add another independent variable by adding + z to the formula in your lm() call, and assign the result to another variable. Again, inspect the result with summary.

Q4.3.4 Compare these two models with anova. Does the more complex model fit your data better than the simpler one?
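The whole sequence of regression tasks can be sketched as follows on the built-in mtcars data. The choice of mpg, wt and hp and the names m1, m2 and cmp are assumptions for this example; reading and judging the output is still the actual exercise.

```r
# Illustration only: built-in mtcars data
# Simple model: one independent variable
m1 <- lm(mpg ~ wt, data = mtcars)
summary(m1)

# More complex model: a second independent variable added with +
m2 <- lm(mpg ~ wt + hp, data = mtcars)
summary(m2)

# Compare the two nested models
cmp <- anova(m1, m2)
print(cmp)
```

Comparing models with anova in this way assumes that the simpler model is nested in the more complex one and that both were fitted to the same rows of data.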

Bonus: logistic regression

Create a new binary variable (or use one you have already) to run logistic regression.
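A minimal sketch of this bonus task, again on the built-in mtcars data, might look like the following. The binary variable high_mpg, the median split used to create it, and the names cars and fit are all assumptions made for illustration; a median split is only a convenient way to get a 0/1 variable here, not a recommendation.

```r
# Illustration only: built-in mtcars data, copied so the original is untouched
cars <- mtcars

# Create a binary (0/1) variable: is fuel economy above the median?
cars$high_mpg <- as.integer(cars$mpg > median(cars$mpg))
table(cars$high_mpg)

# Logistic regression: glm with the binomial family
fit <- glm(high_mpg ~ wt, data = cars, family = binomial)
summary(fit)

# Coefficients are on the log-odds scale; exponentiate for odds ratios
exp(coef(fit))
```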

Demo 4: Recap

Course feedback

Now that you have made it this far, would you please take a few minutes to fill in the course feedback form, which you can find here. It would mean a lot to me if you did, and it shouldn't take too long. Thanks in advance!

Resources

You can find the demo videos introducing new concepts and giving you tools to complete these exercises here
You can find the slides shown in the video here
You can see the code from the videos here
You can see the solutions for the exercises below here (but do try to solve the tasks with help from the demo code and google before looking at solutions)
An approachable introduction to linear mixed effects modeling with implementation in R (a pre-print)
Specifying multilevel models in R
Twitter thread about ANOVA in R and why it's not always as simple as one might hope
A tutorial for logistic regression

Week 4

Learning goals