Biological Data Science

Produced by the Numerical Ecology of Aquatic Environments unit (service d'Écologie numérique des Milieux aquatiques), University of Mons (Belgium)

Preamble

Throughout this course, you have worked with learnr tutorials designed to check your grasp of the various concepts of biological data science. Your exam follows the same format.

This questionnaire contains 21 questions, 20 of which are multiple-choice.

Make sure your email address and your GitHub username are entered correctly.

Do not forget to submit your answer after each exercise.

In accordance with the GDPR (General Data Protection Regulation), we must inform you that your results will be collected in order to track your progress. The data will be recorded under the name of the user shown at the top of this page. Correct it if necessary! By using this tutorial, you expressly agree that these data may be collected by your teachers and used to help and assess you. Once anonymized, these data may also be used for aggregate studies, for scientific and/or educational purposes only.

Scatterplot

The growth dataset is available to you. It contains the variables weight and height.

Make a scatterplot with weight on the x axis and height on the y axis.

Snippets are provided at the end of the question.

set.seed(1000)
growth <- tibble::tibble(
  weight = 1:100 + rnorm(n = 100, mean = 0, sd = 0.5) + 10,
  height = 0.3 * weight + rnorm(n = 100, mean = 0, sd = 3) + 30)

#chart(growth, height ~ weight) +
#  geom_point()

Answer the question below based on the chart above:

Snippets

## Charts ###############################################################################################
# ...charts
#..c

## Charts: Add ##########################################################################################
#..charts: add layers or annotations
#.ca

#.caplotly: convert last ggplot2 into interactive chart
plotly::ggplotly()

#.caylab: add or change Y label
${1:CHART} +
  ylab("${2:YOUR Y LABEL HERE}")

#.caxlab: add or change X label
${1:CHART} +
  xlab("${2:YOUR X LABEL HERE}")

#.catitle: add a plot title
${1:CHART} +
  ggtitle("${2:YOUR TITLE HERE}")

## Charts: Multivariate #################################################################################
# ..charts: multivariate
# .cm

#.cmcorr: correlation chart
corrplot::corrplot(cor(${1:DF}[, ${2:1:3}],
                       use = "pairwise.complete.obs"), method = "ellipse")

#.cmxy: multivariate X-Y scatterplot
GGally::ggscatmat(as.data.frame(${1:DF}), ${2:1:3})

## Charts: Bivariate ####################################################################################
# ..charts: bivariate
#.cb

#.cbhistfact: histogram by factor (facets)
chart(data = ${1:DF}, ~${2:XNUM} %fill=% ${3:XFACTOR} | ${3:XFACTOR}) +
  geom_histogram(data = select(${1:DF}, -${3:XFACTOR}), fill = "grey", bins = ${4:30}) +
  geom_histogram(bins = ${4:30}, show.legend = FALSE)

#.cberrbar2: error bars by two factors
chart(data = ${1:DF}, ${2:YNUM} ~ ${3:XFACTOR1} %col=% ${4:XFACTOR2}) +
  geom_jitter(alpha = 0.4, position = position_dodge(0.4)) +
  stat_summary(geom = "point", fun = "mean", position = position_dodge(0.4)) +
  stat_summary(geom = "errorbar", width = 0.1, position = position_dodge(0.4),
               fun.data = "mean_cl_normal", fun.args = list(conf.int = 0.95))

#.cberrbar: error bars by factor
chart(data = ${1:DF}, ${2:YNUM} ~ ${3:XFACTOR}) +
  geom_jitter(alpha = 0.4, width = 0.2) +
  stat_summary(geom = "point", fun = "mean") +
  stat_summary(geom = "errorbar", width = 0.1,
               fun.data = "mean_cl_normal", fun.args = list(conf.int = 0.95))

#.cbviolin: violinplot by factor
chart(data = ${1:DF}, ${2:YNUM} ~ ${3:XFACTOR}) +
  geom_violin()

#.cbbox: boxplot by factor
chart(data = ${1:DF}, ${2:YNUM} ~ ${3:XFACTOR}) +
  geom_boxplot()

#.cbxy: X-Y scatterplot
chart(data = ${1:DF}, ${2:YNUM} ~ ${3:XNUM}) +
  geom_point()

## Charts: Univariate ###################################################################################
# ..charts: univariate
#.cu

#.cuqqchisq: QQ plot - chi-square
car::qqPlot(${1:DF}[["${2:XNUM}"]], distribution = "chisq", df = ${3:DEGREES_OF_FREEDOM},
            envelope = 0.95, col = "Black", ylab = "${2:XNUM}")

#.cuqqf: QQ plot - F
car::qqPlot(${1:DF}[["${2:XNUM}"]], distribution = "f", df1 = ${3:NUMERATOR_DF}, df2 = ${4:DENOMINATOR_DF},
            envelope = 0.95, col = "Black", ylab = "${2:XNUM}")

#.cuqqt: QQ plot - Student t
car::qqPlot(${1:DF}[["${2:XNUM}"]], distribution = "t", df = ${3:DEGREES_OF_FREEDOM},
            envelope = 0.95, col = "Black", ylab = "${2:XNUM}")

#.cuqqnorm: QQ plot - normal
car::qqPlot(${1:DF}[["${2:XNUM}"]], distribution = "norm",
            envelope = 0.95, col = "Black", ylab = "${2:XNUM}")

#.cuhbar: horizontal bars
chart(data = ${1:DF}, ~factor(${2:VAR})) +
  geom_bar() + coord_flip()

#.cuvbar: vertical bars
chart(data = ${1:DF}, ~factor(${2:VAR})) +
  geom_bar()

#.cuhist: histogram
chart(data = ${1:DF}, ~${2:VARNUM}) +
  geom_histogram(binwidth = ${3:30})
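If the chart package is not available, the same scatterplot can be sketched with base R alone. This is a minimal sketch that assumes only the stats and graphics packages shipped with R. `growth` is rebuilt with `data.frame()` instead of `tibble::tibble()` and draws the same random numbers in the same order. The correlation coefficient is an extra check on the direction of the relationship; it is not part of the exercise.

```r
# Rebuild growth with base R only (same seed and same draw order as above)
set.seed(1000)
weight <- 1:100 + rnorm(n = 100, mean = 0, sd = 0.5) + 10
height <- 0.3 * weight + rnorm(n = 100, mean = 0, sd = 3) + 30
growth <- data.frame(weight = weight, height = height)

# Scatterplot: weight on the x axis, height on the y axis
plot(height ~ weight, data = growth,
     xlab = "weight", ylab = "height", pch = 19)

# Extra (not asked): Pearson correlation, a numeric summary of the linear trend
r <- cor(growth$weight, growth$height)
r
```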

Fishing survey of Thymallus thymallus L. 1758

The fisheries department of the Public Service of Wallonia (Service public de Wallonie) ran a fishing campaign targeting the grayling across its territory. The aim was to study the size of the specimens found in its rivers.

The thymallus dataset is available to you. It contains the variable size. Make a histogram to determine the mode and the symmetry of the distribution.

Snippets are provided at the end of the question.

set.seed(1000)
thymallus <- tibble::tibble(size = rnorm(5000, 30, 8)) %>.%
  filter(., size > 0)

#chart(thymallus, ~ size) +
#  geom_histogram() +
#  labs(x = "Unimodal & symmetric", y = "Counts")

Answer the question below based on the chart above:

Snippets

## Charts ###############################################################################################
# ...charts
#..c

## Charts: Add ##########################################################################################
#..charts: add layers or annotations
#.ca

#.caplotly: convert last ggplot2 into interactive chart
plotly::ggplotly()

#.caylab: add or change Y label
${1:CHART} +
  ylab("${2:YOUR Y LABEL HERE}")

#.caxlab: add or change X label
${1:CHART} +
  xlab("${2:YOUR X LABEL HERE}")

#.catitle: add a plot title
${1:CHART} +
  ggtitle("${2:YOUR TITLE HERE}")

## Charts: Multivariate #################################################################################
# ..charts: multivariate
# .cm

#.cmcorr: correlation chart
corrplot::corrplot(cor(${1:DF}[, ${2:1:3}],
                       use = "pairwise.complete.obs"), method = "ellipse")

#.cmxy: multivariate X-Y scatterplot
GGally::ggscatmat(as.data.frame(${1:DF}), ${2:1:3})

## Charts: Bivariate ####################################################################################
# ..charts: bivariate
#.cb

#.cbhistfact: histogram by factor (facets)
chart(data = ${1:DF}, ~${2:XNUM} %fill=% ${3:XFACTOR} | ${3:XFACTOR}) +
  geom_histogram(data = select(${1:DF}, -${3:XFACTOR}), fill = "grey", bins = ${4:30}) +
  geom_histogram(bins = ${4:30}, show.legend = FALSE)

#.cberrbar2: error bars by two factors
chart(data = ${1:DF}, ${2:YNUM} ~ ${3:XFACTOR1} %col=% ${4:XFACTOR2}) +
  geom_jitter(alpha = 0.4, position = position_dodge(0.4)) +
  stat_summary(geom = "point", fun = "mean", position = position_dodge(0.4)) +
  stat_summary(geom = "errorbar", width = 0.1, position = position_dodge(0.4),
               fun.data = "mean_cl_normal", fun.args = list(conf.int = 0.95))

#.cberrbar: error bars by factor
chart(data = ${1:DF}, ${2:YNUM} ~ ${3:XFACTOR}) +
  geom_jitter(alpha = 0.4, width = 0.2) +
  stat_summary(geom = "point", fun = "mean") +
  stat_summary(geom = "errorbar", width = 0.1,
               fun.data = "mean_cl_normal", fun.args = list(conf.int = 0.95))

#.cbviolin: violinplot by factor
chart(data = ${1:DF}, ${2:YNUM} ~ ${3:XFACTOR}) +
  geom_violin()

#.cbbox: boxplot by factor
chart(data = ${1:DF}, ${2:YNUM} ~ ${3:XFACTOR}) +
  geom_boxplot()

#.cbxy: X-Y scatterplot
chart(data = ${1:DF}, ${2:YNUM} ~ ${3:XNUM}) +
  geom_point()

## Charts: Univariate ###################################################################################
# ..charts: univariate
#.cu

#.cuqqchisq: QQ plot - chi-square
car::qqPlot(${1:DF}[["${2:XNUM}"]], distribution = "chisq", df = ${3:DEGREES_OF_FREEDOM},
            envelope = 0.95, col = "Black", ylab = "${2:XNUM}")

#.cuqqf: QQ plot - F
car::qqPlot(${1:DF}[["${2:XNUM}"]], distribution = "f", df1 = ${3:NUMERATOR_DF}, df2 = ${4:DENOMINATOR_DF},
            envelope = 0.95, col = "Black", ylab = "${2:XNUM}")

#.cuqqt: QQ plot - Student t
car::qqPlot(${1:DF}[["${2:XNUM}"]], distribution = "t", df = ${3:DEGREES_OF_FREEDOM},
            envelope = 0.95, col = "Black", ylab = "${2:XNUM}")

#.cuqqnorm: QQ plot - normal
car::qqPlot(${1:DF}[["${2:XNUM}"]], distribution = "norm",
            envelope = 0.95, col = "Black", ylab = "${2:XNUM}")

#.cuhbar: horizontal bars
chart(data = ${1:DF}, ~factor(${2:VAR})) +
  geom_bar() + coord_flip()

#.cuvbar: vertical bars
chart(data = ${1:DF}, ~factor(${2:VAR})) +
  geom_bar()

#.cuhist: histogram
chart(data = ${1:DF}, ~${2:VARNUM}) +
  geom_histogram(binwidth = ${3:30})
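Here is a base-R sketch of the same histogram, with two simple numeric complements. The first is the centre of the modal class. The second is a moment-based skewness coefficient, which is our choice of symmetry indicator; other definitions exist. The data are rebuilt with `data.frame()` using the same seed and draws. The labels are ours.

```r
set.seed(1000)
thymallus <- data.frame(size = rnorm(5000, 30, 8))
# Keep only strictly positive sizes, as in the original code
thymallus <- thymallus[thymallus$size > 0, , drop = FALSE]

# Histogram (breaks = 30 is only a suggestion to hist())
h <- hist(thymallus$size, breaks = 30,
          main = "Thymallus thymallus", xlab = "size", ylab = "Counts")

# Modal class: centre of the bin with the highest count
modal_class <- h$mids[which.max(h$counts)]

# Symmetry check: mean close to median, moment skewness close to 0
m <- mean(thymallus$size)
med <- median(thymallus$size)
skew <- mean((thymallus$size - m)^3) / sd(thymallus$size)^3
c(modal_class = modal_class, mean = m, median = med, skewness = skew)
```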

Multiple comparison test

# dataset name: df
names(df)
[1] "y"     "group"
summary(df)
       y         group 
 Min.   :10.15   A:40  
 1st Qu.:18.12   B:40  
 Median :21.59   C:40  
 Mean   :21.37         
 3rd Qu.:24.69         
 Max.   :35.08         
anova(anova. <- lm(data = df, y ~ group))

Based on the results above, perform a Tukey multiple comparison test.

Snippets are provided at the end of the question.

set.seed(43)
y <- c(rnorm(n = 40, mean = 20, sd = 5),
       rnorm(n = 40, mean = 21, sd = 5),
       rnorm(n = 40, mean = 22, sd = 5))
group <- rep(c("A", "B", "C"), each = 40)

df <- tibble::tibble(y = y, group = as.factor(group))
anova(anova. <- lm(data = df, y ~ group))
# ..hypothesis tests: means
#.hm

# .hmanovamult: anova - multiple comparisons [multcomp]
summary(anovaComp. <- confint(multcomp::glht(anova.,
  linfct = multcomp::mcp(${1:XFACTOR} = "Tukey")))) # Add a second factor if you want
.oma <- par(oma = c(0, 5.1, 0, 0)); plot(anovaComp.); par(.oma); rm(.oma)

# .hmanovaresid: anova - residuals
residuals(anova.)

# .hmanovaqqplot: anova - residuals QQ-plot
# plot(anova., which = 2)
anova. %>.%
  chart(broom::augment(.), aes(sample = .std.resid)) +
  geom_qq() +
  #geom_qq_line(colour = "darkgray") +
  labs(x = "Theoretical quantiles", y = "Standardized residuals") +
  ggtitle("Normal Q-Q")

# .hmanova2nested: two-way ANOVA (nested model)
anova(anova. <- lm(data = ${1:DF}, ${2:YNUM} ~ ${3:XFACTOR} + ${4:BLOCK} %in% ${3:XFACTOR}))

# .hmanova2noint: two-way ANOVA (without interactions)
anova(anova. <- lm(data = ${1:DF}, ${2:YNUM} ~ ${3:XFACTOR1} + ${4:XFACTOR2}))

# .hmanova2: two-way ANOVA (complete model)
anova(anova. <- lm(data = ${1:DF}, ${2:YNUM} ~ ${3:XFACTOR1} * ${4:XFACTOR2}))

# .hmanova2desc: two-way ANOVA (description)
${1:DF} %>.%
  group_by(., ${2:XFACTOR1}, ${3:XFACTOR2}) %>.%
  summarise(., mean = mean(${4:YNUM}), sd = sd(${4:YNUM}), count = sum(!is.na(${4:YNUM})))

# .hmanova1: one-way ANOVA
anova(anova. <- lm(data = ${1:DF}, ${2:YNUM} ~ ${3:XFACTOR}))

# .hmanova1desc: one-way ANOVA (description)
${1:DF} %>.%
  group_by(., ${2:XFACTOR}) %>.%
  summarise(., mean = mean(${3:YNUM}), sd = sd(${3:YNUM}), count = sum(!is.na(${3:YNUM})))

# .hmttestindep: independent Student's t-test
t.test(data = ${1:DF}, ${2:YNUM} ~ ${3:XFACTOR},
       alternative = "two.sided", conf.level = 0.95, var.equal = TRUE)

# .hmttestpaired: paired Student's t-test
t.test(${1:DF}\$${2:XNUM}, ${1:DF}\$${3:YNUM},
       alternative = "two.sided", conf.level = 0.95, paired = TRUE)

# .hmttestuni: univariate Student's t-test
t.test(${1:DF}\$${2:XNUM},
       alternative = "two.sided", mu = 0, conf.level = 0.95)
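The .hmanovamult snippet relies on the multcomp package. As a cross-check that needs only base R, the same Tukey comparisons can be sketched with `TukeyHSD()` on an `aov()` fit of the same one-way model. This is an alternative function, not the snippet's method. The data are rebuilt with `data.frame()` from the code above, and the object names `anova_aov` and `tukey` are ours.

```r
# Rebuild the data exactly as in the code above (same seed, same draws)
set.seed(43)
y <- c(rnorm(n = 40, mean = 20, sd = 5),
       rnorm(n = 40, mean = 21, sd = 5),
       rnorm(n = 40, mean = 22, sd = 5))
df <- data.frame(y = y, group = factor(rep(c("A", "B", "C"), each = 40)))

# One-way ANOVA fitted with aov() (same model as lm(y ~ group))
anova_aov <- aov(y ~ group, data = df)
summary(anova_aov)

# Tukey's Honest Significant Differences, 95 % family-wise confidence level
tukey <- TukeyHSD(anova_aov, which = "group", conf.level = 0.95)
tukey
# Optional: plot the simultaneous confidence intervals
plot(tukey)
```

Two groups differ at the 5 % family-wise level when their interval (lwr, upr) excludes 0. The `p adj` column gives the adjusted p-values.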

Biometry of Leptograpsus variegatus (Fabricius, 1793)

Based on the correlation matrix below, computed from data collected on 200 crabs:

      FL      RW      CL      CW      BD
FL    1       0.9051  0.9785  0.9654  0.9866
RW    0.9051  1       0.8916  0.8992  0.8895
CL    0.9785  0.8916  1       0.9951  0.9823
CW    0.9654  0.8992  0.9951  1       0.9675
BD    0.9866  0.8895  0.9823  0.9675  1
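One way to read the matrix above systematically is to list each pair of variables once and sort the pairs by correlation. The small sketch below does this; it is our own helper and not part of the exam. The values are typed in by hand from the table.

```r
vars <- c("FL", "RW", "CL", "CW", "BD")
crab_cor <- matrix(c(
  1,      0.9051, 0.9785, 0.9654, 0.9866,
  0.9051, 1,      0.8916, 0.8992, 0.8895,
  0.9785, 0.8916, 1,      0.9951, 0.9823,
  0.9654, 0.8992, 0.9951, 1,      0.9675,
  0.9866, 0.8895, 0.9823, 0.9675, 1),
  nrow = 5, byrow = TRUE, dimnames = list(vars, vars))
# A correlation matrix must be symmetric
stopifnot(isSymmetric(crab_cor))

# Upper triangle only: each pair appears once, diagonal excluded
idx <- which(upper.tri(crab_cor), arr.ind = TRUE)
pairs <- data.frame(var1 = vars[idx[, "row"]],
                    var2 = vars[idx[, "col"]],
                    r = crab_cor[idx])
# Strongest correlations first
pairs <- pairs[order(pairs$r, decreasing = TRUE), ]
pairs
```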

Analysis of a statistical test

Answer the questions below based on the table above:

Quiz

Reproducing a chart

Reproduce the following chart


using the dataset df:

# dataset name: df
names(df)
[1] "x"    "y"    "zone" "area"
summary(df)
       x                 y          zone    area   
 Min.   :  4.127   Min.   : 13.68   A:150   1:100  
 1st Qu.: 85.470   1st Qu.: 94.31   B:150   2:100  
 Median :160.200   Median :171.84           3:100  
 Mean   :160.770   Mean   :170.66                  
 3rd Qu.:234.662   3rd Qu.:244.18                  
 Max.   :315.348   Max.   :331.31                  

Snippets are provided at the end of the question.

set.seed(43)
df <- tibble::tibble(x = 1:300 + rnorm(n = 300, mean = 10, sd = 5),
                     y = x + rnorm(n = 300, mean = 10, sd = 5),
                     zone = as.factor(rep(c("A", "B"), times = 150)),
                     area = as.factor(rep(1:3, each = 100)))
#TODO

The following snippets are available:

## Charts ###############################################################################################
# ...charts
# ..c

## Charts: Add ##########################################################################################
# ..charts: add layers or annotations
# .ca

# .caplotly: convert last ggplot2 into interactive chart
plotly::ggplotly()

# .caylab: add or change Y label
${1:CHART} +
  ylab("${2:YOUR Y LABEL HERE}")

# .caxlab: add or change X label
${1:CHART} +
  xlab("${2:YOUR X LABEL HERE}")

# .catitle: add a plot title
${1:CHART} +
  ggtitle("${2:YOUR TITLE HERE}")

## Charts: Multivariate #################################################################################
# ..charts: multivariate
# .cm

# .cmcorr: correlation chart
corrplot::corrplot(cor(${1:DF}[, ${2:1:3}],
                       use = "pairwise.complete.obs"), method = "ellipse")

# .cmxy: multivariate X-Y scatterplot
GGally::ggscatmat(as.data.frame(${1:DF}), ${2:1:3})

## Charts: Bivariate ####################################################################################
# ..charts: bivariate
# .cb

# .cbhistfact: histogram by factor (facets)
chart(data = ${1:DF}, ~${2:XNUM} %fill=% ${3:XFACTOR} | ${3:XFACTOR}) +
  geom_histogram(data = select(${1:DF}, -${3:XFACTOR}), fill = "grey", bins = ${4:30}) +
  geom_histogram(bins = ${4:30}, show.legend = FALSE)

# .cberrbar2: error bars by two factors
chart(data = ${1:DF}, ${2:YNUM} ~ ${3:XFACTOR1} %col=% ${4:XFACTOR2}) +
  geom_jitter(alpha = 0.4, position = position_dodge(0.4)) +
  stat_summary(geom = "point", fun = "mean", position = position_dodge(0.4)) +
  stat_summary(geom = "errorbar", width = 0.1, position = position_dodge(0.4),
               fun.data = "mean_cl_normal", fun.args = list(conf.int = 0.95))

# .cberrbar: error bars by factor
chart(data = ${1:DF}, ${2:YNUM} ~ ${3:XFACTOR}) +
  geom_jitter(alpha = 0.4, width = 0.2) +
  stat_summary(geom = "point", fun = "mean") +
  stat_summary(geom = "errorbar", width = 0.1,
               fun.data = "mean_cl_normal", fun.args = list(conf.int = 0.95))

# .cbviolin: violinplot by factor
chart(data = ${1:DF}, ${2:YNUM} ~ ${3:XFACTOR}) +
  geom_violin()

# .cbbox: boxplot by factor
chart(data = ${1:DF}, ${2:YNUM} ~ ${3:XFACTOR}) +
  geom_boxplot()

# .cbxy: X-Y scatterplot
chart(data = ${1:DF}, ${2:YNUM} ~ ${3:XNUM}) +
  geom_point()

Descriptors used in a boxplot

Normality of a variable

# dataset name: df
names(df)
[1] "x"
summary(df)
       x          
 Min.   :-94.368  
 1st Qu.:-24.333  
 Median :  7.475  
 Mean   :  9.854  
 3rd Qu.: 36.684  
 Max.   :113.215  

Based on the dataset above, answer the question below.

Snippets are provided at the end of the question.

set.seed(43)
df <- tibble::tibble(x = rnorm(n = 60, mean = 10, sd = 50))
## Charts ###############################################################################################
# ...charts
# ..c

## Charts: Add ##########################################################################################
# ..charts: add layers or annotations
# .ca

# .caplotly: convert last ggplot2 into interactive chart
plotly::ggplotly()

# .caylab: add or change Y label
${1:CHART} +
  ylab("${2:YOUR Y LABEL HERE}")

# .caxlab: add or change X label
${1:CHART} +
  xlab("${2:YOUR X LABEL HERE}")

# .catitle: add a plot title
${1:CHART} +
  ggtitle("${2:YOUR TITLE HERE}")

## Charts: Multivariate #################################################################################
# ..charts: multivariate
# .cm

# .cmcorr: correlation chart
corrplot::corrplot(cor(${1:DF}[, ${2:1:3}],
                       use = "pairwise.complete.obs"), method = "ellipse")

# .cmxy: multivariate X-Y scatterplot
GGally::ggscatmat(as.data.frame(${1:DF}), ${2:1:3})

## Charts: Bivariate ####################################################################################
# ..charts: bivariate
# .cb

# .cbhistfact: histogram by factor (facets)
chart(data = ${1:DF}, ~${2:XNUM} %fill=% ${3:XFACTOR} | ${3:XFACTOR}) +
  geom_histogram(data = select(${1:DF}, -${3:XFACTOR}), fill = "grey", bins = ${4:30}) +
  geom_histogram(bins = ${4:30}, show.legend = FALSE)

# .cberrbar2: error bars by two factors
chart(data = ${1:DF}, ${2:YNUM} ~ ${3:XFACTOR1} %col=% ${4:XFACTOR2}) +
  geom_jitter(alpha = 0.4, position = position_dodge(0.4)) +
  stat_summary(geom = "point", fun = "mean", position = position_dodge(0.4)) +
  stat_summary(geom = "errorbar", width = 0.1, position = position_dodge(0.4),
               fun.data = "mean_cl_normal", fun.args = list(conf.int = 0.95))

# .cberrbar: error bars by factor
chart(data = ${1:DF}, ${2:YNUM} ~ ${3:XFACTOR}) +
  geom_jitter(alpha = 0.4, width = 0.2) +
  stat_summary(geom = "point", fun = "mean") +
  stat_summary(geom = "errorbar", width = 0.1,
               fun.data = "mean_cl_normal", fun.args = list(conf.int = 0.95))

# .cbviolin: violinplot by factor
chart(data = ${1:DF}, ${2:YNUM} ~ ${3:XFACTOR}) +
  geom_violin()

# .cbbox: boxplot by factor
chart(data = ${1:DF}, ${2:YNUM} ~ ${3:XFACTOR}) +
  geom_boxplot()

# .cbxy: X-Y scatterplot
chart(data = ${1:DF}, ${2:YNUM} ~ ${3:XNUM}) +
  geom_point()
    
## Charts: Univariate ###################################################################################
# ..charts: univariate
#.cu

#.cuqqchisq: QQ plot - chi-square
car::qqPlot(${1:DF}[["${2:XNUM}"]], distribution = "chisq", df = ${3:DEGREES_OF_FREEDOM},
            envelope = 0.95, col = "Black", ylab = "${2:XNUM}")

#.cuqqf: QQ plot - F
car::qqPlot(${1:DF}[["${2:XNUM}"]], distribution = "f", df1 = ${3:NUMERATOR_DF}, df2 = ${4:DENOMINATOR_DF},
            envelope = 0.95, col = "Black", ylab = "${2:XNUM}")

#.cuqqt: QQ plot - Student t
car::qqPlot(${1:DF}[["${2:XNUM}"]], distribution = "t", df = ${3:DEGREES_OF_FREEDOM},
            envelope = 0.95, col = "Black", ylab = "${2:XNUM}")

#.cuqqnorm: QQ plot - normal
car::qqPlot(${1:DF}[["${2:XNUM}"]], distribution = "norm",
            envelope = 0.95, col = "Black", ylab = "${2:XNUM}")

#.cuhbar: horizontal bars
chart(data = ${1:DF}, ~factor(${2:VAR})) +
  geom_bar() + coord_flip()

#.cuvbar: vertical bars
chart(data = ${1:DF}, ~factor(${2:VAR})) +
  geom_bar()

#.cuhist: histogram
chart(data = ${1:DF}, ~${2:VARNUM}) +
  geom_histogram(binwidth = ${3:30})
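The QQ plot snippets above give a graphical check of normality. The minimal base-R sketch below pairs `qqnorm()`/`qqline()` with a Shapiro-Wilk test (`shapiro.test()`, the function the .hdnorm snippet also calls), run on the same simulated data. Its null hypothesis is that x is normally distributed. Which check the question expects is left to the reader.

```r
set.seed(43)
df <- data.frame(x = rnorm(n = 60, mean = 10, sd = 50))

# Graphical check: normal QQ plot with a reference line
qqnorm(df$x, main = "Normal Q-Q plot of x")
qqline(df$x, col = "darkgray")

# Numerical check: Shapiro-Wilk test (H0: x follows a normal distribution)
sw <- shapiro.test(df$x)
sw
```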

Fish diversity in Walloon rivers

Scientists carry out fishing surveys at 100 stations of interest to study fish diversity in the rivers of Wallonia. They are particularly interested in the species Barbus barbus L. 1758. From their counts, they want to know how many stations in areas A, C and D have a relative density of barbel greater than 12.5 % of all fish caught.

They provide you with a dataset named density. It has two variables: the area (area) and the relative density (densi) of common barbel with respect to all fish caught.

# dataset name: density
names(density)
[1] "area"  "densi"
summary(density)
 area       densi       
 A:20   Min.   : 8.094  
 B:20   1st Qu.:10.676  
 C:20   Median :11.876  
 D:20   Mean   :11.862  
 E:20   3rd Qu.:12.882  
        Max.   :16.617  

You must remove areas B and E and keep only density values strictly greater than 12.5.

Snippets are provided at the end of the question.

set.seed(43)
rep(c("A", "B", "C", "D", "E"), each = 20) -> t
tt <- c(rnorm(n = 20, mean = 10, sd = 1),
        rnorm(n = 20, mean = 12, sd = 1),
        rnorm(n = 20, mean = 11, sd = 1),
        rnorm(n = 20, mean = 12, sd = 1),
        rnorm(n = 20, mean = 14, sd = 1))
density <- tibble::tibble(area = as.factor(t), densi = tt)

density %>.%
  dplyr::filter(., area %in% c("A", "C", "D") & densi > 12.5) %>.%
  nrow(.) -> t111

Snippets

#.dosel: select cases (rows) or variables (columns)
DF %>.%
  filter(., CONDITIONS) -> DF2
DF %>.%
  select(., VAR1, VAR2) -> DF2
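The dplyr pipeline above can be mirrored in base R without SciViews or dplyr, as in the sketch below. The names `areas`, `selected` and `n_stations` are ours; the original uses `t`, `tt` and `t111`.

```r
set.seed(43)
areas <- rep(c("A", "B", "C", "D", "E"), each = 20)
densi <- c(rnorm(n = 20, mean = 10, sd = 1),
           rnorm(n = 20, mean = 12, sd = 1),
           rnorm(n = 20, mean = 11, sd = 1),
           rnorm(n = 20, mean = 12, sd = 1),
           rnorm(n = 20, mean = 14, sd = 1))
density <- data.frame(area = factor(areas), densi = densi)

# Keep areas A, C and D only, and densities strictly above 12.5
selected <- subset(density, area %in% c("A", "C", "D") & densi > 12.5)
n_stations <- nrow(selected)
n_stations

# Breakdown per area (B and E show 0 because they remain factor levels)
table(selected$area)
```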

Homoscedasticity

Data types

ID  Work       Age  Gender
1   intensive  18   M
2   low        24   M
3   moderate   20   F
4   moderate   19   M

Based on the data above, answer the following questions.

Choosing a statistical test

Body mass index

Using a questionnaire on body mass index (BMI) and physical activity, researchers classified individuals by level of physical activity and by BMI class. They obtained the following table:
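One possible way to encode this table in R is sketched below. The choice of types is our assumption of the intended reading: an integer identifier, an ordered factor for work intensity (low < moderate < intensive), a numeric age and an unordered factor for gender. The English labels translate the original values (faible, moyen, intensif; H/F).

```r
people <- data.frame(
  id = c(1L, 2L, 3L, 4L),                       # identifier (assumed integer)
  work = factor(c("intensive", "low", "moderate", "moderate"),
                levels = c("low", "moderate", "intensive"),
                ordered = TRUE),                # ordinal qualitative variable
  age = c(18, 24, 20, 19),                      # quantitative variable
  gender = factor(c("M", "M", "F", "M"))        # nominal qualitative variable
)
str(people)
```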

Table
                              Underweight  Normal  Overweight  Obese
Occasional physical activity           66      72          47     35
Regular physical activity              70      62          16     22
High-level physical activity           34      55          42     50

To help with your reasoning, here are the row and column totals:

Row totals:
Occasional physical activity   220
Regular physical activity      170
High-level physical activity   181

Column totals:
Underweight  Normal  Overweight  Obese
        170     189         105    107

A code area is available to you.

Answer the questions below based on the table above:
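For a two-way contingency table like this one, the χ² test of independence (the .hcchi2bi snippet) is one candidate. Choosing the test is part of the exercise, so treat this as a hedged illustration only. The sketch rebuilds the table as a matrix, with English labels that are our translation, and recomputes the margins given above. It then checks that all expected frequencies are at least 5, the usual condition for the χ² approximation.

```r
bmi_table <- matrix(c(66, 72, 47, 35,
                      70, 62, 16, 22,
                      34, 55, 42, 50),
                    nrow = 3, byrow = TRUE,
                    dimnames = list(
                      activity = c("occasional", "regular", "high-level"),
                      bmi = c("underweight", "normal", "overweight", "obese")))

# Row and column totals (should match the values given above)
rowSums(bmi_table)
colSums(bmi_table)

# Chi-squared test of independence between activity level and BMI class
chi2 <- chisq.test(bmi_table)
chi2
# Expected frequencies under independence
round(chi2$expected, 1)
```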

Fisher's F distribution

Cultivation of Zea mays L. 1753

In a maize field (consider the number of maize plants to be very large), the mean plant height is 139 cm and the standard deviation is 22 cm.

Snippets are provided at the end of the question.
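The question itself is not reproduced here. Assuming plant height follows a normal distribution with mean 139 cm and standard deviation 22 cm, the .in snippets translate into calls like the ones below. The 160 cm threshold and the 95 % level are hypothetical values chosen only for illustration.

```r
mu <- 139     # mean height (cm)
sigma <- 22   # standard deviation (cm)

# Hypothetical question: proportion of plants taller than 160 cm
p_above_160 <- pnorm(160, mean = mu, sd = sigma, lower.tail = FALSE)
p_above_160

# Hypothetical question: height below which 95 % of the plants fall
q95 <- qnorm(0.95, mean = mu, sd = sigma, lower.tail = TRUE)
q95
```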

Snippets

### .iu: uniform distribution
punif(QUANTILES, min = 0, max = 1, lower.tail = TRUE)
qunif(PROBABILITIES, min = 0, max = 1, lower.tail = TRUE)

### .in: normal distribution
pnorm(QUANTILES, mean = 0, sd = 1, lower.tail = TRUE)
qnorm(PROBABILITIES, mean = 0, sd = 1, lower.tail = TRUE)

### .il: log-normal distribution
plnorm(QUANTILES, meanlog = 0, sdlog = 1, lower.tail = TRUE)
qlnorm(PROBABILITIES, meanlog = 0, sdlog = 1, lower.tail = TRUE)

### .it: Student's t distribution
.mu <- 0; .s <- 1; pt((QUANTILES - .mu)/.s, df = DEGREES_OF_FREEDOM, lower.tail = TRUE)
.mu <- 0; .s <- 1; .mu + .s * qt(PROBABILITIES, df = DEGREES_OF_FREEDOM, lower.tail = TRUE)

### .ib: binomial distribution
pbinom(QUANTILES, size = N_TRIALS, prob = SUCCESS_PROB, lower.tail = TRUE)
qbinom(PROBABILITIES, size = N_TRIALS, prob = SUCCESS_PROB, lower.tail = TRUE)

### .ip: Poisson distribution
ppois(QUANTILES, lambda = MEAN_OCCURENCES, lower.tail = TRUE)
qpois(PROBABILITIES, lambda = MEAN_OCCURENCES, lower.tail = TRUE)

### .ic: chi-squared distribution
pchisq(QUANTILES, df = DEGREES_OF_FREEDOM, lower.tail = TRUE)
qchisq(PROBABILITIES, df = DEGREES_OF_FREEDOM, lower.tail = TRUE)

### .if: F distribution
pf(QUANTILES, df1 = NUMERATOR_DF, df2 = DENOMINATOR_DF, lower.tail = TRUE)
qf(PROBABILITIES, df1 = NUMERATOR_DF, df2 = DENOMINATOR_DF, lower.tail = TRUE)

Student's t-test

set.seed(43)
weight <- tibble::tibble(weight = c(rnorm(n = 15, mean = 100, sd = 5),
                                    rnorm(n = 15, mean = 102, sd = 5)),
                         area = rep(c("a", "b"), each = 15))
weight$area <- as.factor(weight$area)

The weight dataset is available to you. Here is some information about it:

# dataset name: weight
# names of the variables in the dataset
names(weight)
[1] "weight" "area"
# summary of the variables
summary(weight)
     weight       area  
 Min.   : 90.47   a:15  
 1st Qu.: 97.71   b:15  
 Median :100.00         
 Mean   :100.33         
 3rd Qu.:102.71         
 Max.   :112.32         

Perform a two-sided Student's t-test at a significance level \(\alpha\) of 0.05, assuming unequal variances (Welch's variant).

Snippets are provided at the end of the question.

Quiz
# .hm
# .hmttestindep: independent Student's t-test
t.test(data = ${1:DF}, ${2:YNUM} ~ ${3:XFACTOR}, alternative = "two.sided", conf.level = 0.95, var.equal = TRUE)

# .hmttestpaired: paired Student's t-test
t.test(${1:DF}\$${2:XNUM}, ${1:DF}\$${3:YNUM}, alternative = "two.sided", conf.level = 0.95, paired = TRUE)

# .hmttestuni: univariate Student's t-test
t.test(${1:DF}\$${2:XNUM}, alternative = "two.sided", mu = 0, conf.level = 0.95)
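The .hmttestindep snippet sets `var.equal = TRUE`, which gives the classical Student test. For the unequal-variance version requested here, the minimal sketch below changes only that argument to `FALSE`; base R's `t.test()` then runs Welch's test. The dataset is rebuilt with `data.frame()` so that the example needs no extra package. The object name `res` is ours.

```r
set.seed(43)
weight <- data.frame(
  weight = c(rnorm(n = 15, mean = 100, sd = 5),
             rnorm(n = 15, mean = 102, sd = 5)),
  area = factor(rep(c("a", "b"), each = 15)))

# Two-sided test, alpha = 0.05, unequal variances (Welch): var.equal = FALSE
res <- t.test(data = weight, weight ~ area,
              alternative = "two.sided", conf.level = 0.95, var.equal = FALSE)
res

# Decision at alpha = 0.05 (TRUE means H0 of equal means is rejected)
res$p.value < 0.05
```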

Poisson distribution

Compute P(\(Y = 2\)) for a Poisson distribution with \(\lambda = 5\).

Snippets are provided at the end of the question.

## Distribution: poisson #########################################################################
# ..i (d)istribution: poisson
#.ip

# .ipcumul: poisson dist. - cumulative dens. plot
plot(0:(${1:MEAN_OCCURENCES}+20), ppois(0:(${1:MEAN_OCCURENCES}+20), lambda = ${1:MEAN_OCCURENCES}), type = "s",
     col = "black", xlab = "Quantiles", ylab = "Cumulative probability")

#.ipdens: poisson dist. - density plot
plot(0:(${1:MEAN_OCCURENCES}+20), dpois(0:(${1:MEAN_OCCURENCES}+20), lambda = ${1:MEAN_OCCURENCES}), type = "h",
     col = "black", xlab = "Quantiles", ylab = "Probability mass")

# .iptable: poisson dist. - table of probabilities
(.table <- data.frame(occurences = 0:(${1:MEAN_OCCURENCES}+20),
                      probability = dpois(0:(${1:MEAN_OCCURENCES}+20), lambda = ${1:MEAN_OCCURENCES})))

# .iprandom: poisson dist. - random
rpois(${1:N}, lambda = ${2:MEAN_OCCURENCES})

# .ipquant: poisson dist. - quantiles
qpois(${1:PROBABILITIES}, lambda = ${2:MEAN_OCCURENCES}, lower.tail = ${3:TRUE})

# .ipproba: poisson dist. - probabilities
ppois(${1:QUANTILES}, lambda = ${2:MEAN_OCCURENCES}, lower.tail = ${3:TRUE})


## Distribution: binomial #########################################################################
# ..i (d)istribution: binomial
#.ib

# .ibcumul: binomial dist. - cumulative dens. plot
plot(0:${1:N_TRIALS}, pbinom(0:${1:N_TRIALS}, size = ${1:N_TRIALS}, prob = ${2:SUCCESS_PROB}), type = "s",
     col = "black", xlab = "Quantiles", ylab = "Cumulative probability")

#.ibdens: binomial dist. - density plot
plot(0:${1:N_TRIALS}, dbinom(0:${1:N_TRIALS}, size = ${1:N_TRIALS}, prob = ${2:SUCCESS_PROB}), type = "h",
     col = "black", xlab = "Quantiles", ylab = "Probability mass")

# .ibtable: binomial dist. - table of probabilities
(.table <- data.frame(success = 0:${1:N_TRIALS},
                      probability = dbinom(0:${1:N_TRIALS}, size = ${1:N_TRIALS}, prob = ${2:SUCCESS_PROB})))

# .ibrandom: binomial dist. - random
rbinom(${1:N}, size = ${2:N_TRIALS}, prob = ${3:SUCCESS_PROB})

# .ibquant: binomial dist. - quantiles
qbinom(${1:PROBABILITIES}, size = ${2:N_TRIALS}, prob = ${3:SUCCESS_PROB}, lower.tail = ${4:TRUE})

# .ibproba: binomial dist. - probabilities
pbinom(${1:QUANTILES}, size = ${2:N_TRIALS}, prob = ${3:SUCCESS_PROB}, lower.tail = ${4:TRUE})
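The snippets above cover cumulative probabilities (`ppois()`) and quantiles, but P(Y = 2) is a point probability. The minimal sketch below uses base R's `dpois()` (the Poisson probability mass function). It cross-checks the result against the formula P(Y = k) = e^(-λ) λ^k / k! and against a difference of two cumulative probabilities.

```r
lambda <- 5

# Probability mass function: P(Y = 2)
p2 <- dpois(2, lambda = lambda)
p2  # approximately 0.0842

# Same value from the formula exp(-lambda) * lambda^k / k!
p2_formula <- exp(-lambda) * lambda^2 / factorial(2)

# Same value as P(Y <= 2) - P(Y <= 1)
p2_cumul <- ppois(2, lambda = lambda) - ppois(1, lambda = lambda)
```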

Growth of Ailuropoda melanoleuca (David, 1868)

Several Ailuropoda melanoleuca (David, 1868) cubs were born at the giant panda research and breeding centre in Sichuan province (China). The scientists weighed the newborns. Their masses, in grams, are:

117.442 104.566 98.818 116.754 88.41

The scientists want to know the mean and the standard deviation for these 5 new individuals. A code area is available for your calculations.

Answer the question below:
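Here is a minimal sketch in base R. `sd()` computes the sample standard deviation (denominator n - 1), which we assume is what the question asks for. The manual formula is included only to make that denominator explicit.

```r
masses <- c(117.442, 104.566, 98.818, 116.754, 88.41)  # newborn masses (g)
n <- length(masses)
m <- mean(masses)
s <- sd(masses)  # sample standard deviation, denominator n - 1

# Same value computed by hand
s_manual <- sqrt(sum((masses - m)^2) / (n - 1))

c(n = n, mean = m, sd = s)
```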

Gender parity within a company

A company is conducting a study of gender parity among its staff.

 Men Women
2312  2165

A code area is available for your calculations.

Snippets are provided at the end of the question.

Based on the analysis above, answer the question below:

Snippets

## Hypothesis tests ####################################################################################
snippet ...hypothesis tests
    ..h


## Hypothesis tests: Correlation #########################################################################
snippet ..hypothesis tests: correlation
    .hc

snippet .hccorr: correlation test
    cor.test(data = ${1:DF}, ${2:YNUM} ~ ${3:XNUM},
        alternative = "two.sided", method = "pearson")


## Hypothesis tests: Variances #########################################################################
snippet ..hypothesis tests: variances
    .hv

snippet .hvftest: two-variances F-test
    var.test(data = ${1:DF}, ${2:YNUM} ~ ${3:XFACTOR},
        alternative = "two.sided", conf.level = 0.95)

snippet .hvlevene: Levene test
    car::leveneTest(data = ${1:DF}, ${2:YNUM} ~ ${3:XFACTOR})

snippet .hvbartlett: Bartlett test
    bartlett.test(data = ${1:DF}, ${2:YNUM} ~ ${3:XFACTOR})


## Hypothesis tests: Proportions #######################################################################
snippet ..hypothesis tests: proportions
    .hp

snippet .hpuni: univariate proportion test
    prop.test(rbind(table(${1:DF}\$${2:XFACTOR})),
        alternative = "two.sided", p = ${3:0.5}, conf.level = 0.95, correct = FALSE)

snippet .hpbi: bivariate proportion test
    prop.test(rbind(table(${1:DF}\$${2:XFACTOR}, ${1:DF}\$${3:YFACTOR})),
        alternative = "two.sided", conf.level = 0.95, correct = FALSE)


## Hypothesis tests: Nonparametric ####################################################################
snippet ..hypothesis tests: nonparametric
    .hn

snippet .hnfried: Friedman test
    friedman.test(data = ${1:DF}, ${2:YNUM} ~ ${3:XFACTOR} | ${4:BLOCK})

snippet .hnkrusmult: Kruskal-Wallis - multiple comparisons [nparcomp]
    summary(kw_comp. <- nparcomp::nparcomp(data = ${1:DF}, ${2:YNUM} ~ ${3:XFACTOR}))
    plot(kw_comp.)

snippet .hnkrus: Kruskal-Wallis test
    kruskal.test(data = ${1:DF}, ${2:YNUM} ~ ${3:XFACTOR})

snippet .hnwilkindep: independent Wilcoxon test
    wilcox.test(data = ${1:DF}, ${2:YNUM} ~ ${3:XFACTOR},
        alternative = "two.sided", conf.level = 0.95)

snippet .hnwilkpaired: paired Wilcoxon test
    wilcox.test(${1:DF}\$${2:XNUM}, ${1:DF}\$${3:YNUM},
        alternative = "two.sided", conf.level = 0.95, paired = TRUE)


## Hypothesis tests: Means #############################################################################
snippet ..hypothesis tests: means
    .hm

snippet .hmanovamult: anova - multiple comparisons [multcomp]
    summary(anovaComp. <- confint(multcomp::glht(anova.,
        linfct = multcomp::mcp(${1:XFACTOR} = "Tukey")))) # Add a second factor if you want
    .oma <- par(oma = c(0, 5.1, 0, 0)); plot(anovaComp.); par(.oma); rm(.oma)

snippet .hmanovaresid: anova - residuals
    residuals(anova.)

snippet .hmanovaqqplot: anova - residuals QQ-plot
    #plot(anova., which = 2)
    anova. %>.%
        chart(broom::augment(.), aes(sample = .std.resid)) +
        geom_qq() +
        geom_qq_line(colour = "darkgray") +
        labs(x = "Theoretical quantiles", y = "Standardized residuals") +
        ggtitle("Normal Q-Q")

snippet .hmanova2nested: two-way ANOVA (nested model)
    anova(anova. <- lm(data = ${1:DF}, ${2:YNUM} ~ ${3:XFACTOR} + ${4:BLOCK} %in% ${3:XFACTOR}))

snippet .hmanova2noint: two-way ANOVA (without interactions)
    anova(anova. <- lm(data = ${1:DF}, ${2:YNUM} ~ ${3:XFACTOR1} + ${4:XFACTOR2}))

snippet .hmanova2: two-way ANOVA (complete model)
    anova(anova. <- lm(data = ${1:DF}, ${2:YNUM} ~ ${3:XFACTOR1} * ${4:XFACTOR2}))

snippet .hmanova2desc: two-way ANOVA (description)
    ${1:DF} %>.%
        group_by(., ${2:XFACTOR1}, ${3:XFACTOR2}) %>.%
        summarise(., mean = mean(${4:YNUM}), sd = sd(${4:YNUM}), count = sum(!is.na(${4:YNUM})))

snippet .hmanova1: one-way ANOVA
    anova(anova. <- lm(data = ${1:DF}, ${2:YNUM} ~ ${3:XFACTOR}))

snippet .hmanova1desc: one-way ANOVA (description)
    ${1:DF} %>.%
        group_by(., ${2:XFACTOR}) %>.%
        summarise(., mean = mean(${3:YNUM}), sd = sd(${3:YNUM}), count = sum(!is.na(${3:YNUM})))

snippet .hmttestindep: independent Student's t-test
    t.test(data = ${1:DF}, ${2:YNUM} ~ ${3:XFACTOR},
        alternative = "two.sided", conf.level = 0.95, var.equal = TRUE)

snippet .hmttestpaired: paired Student's t-test
    t.test(${1:DF}\$${2:XNUM}, ${1:DF}\$${3:YNUM},
        alternative = "two.sided", conf.level = 0.95, paired = TRUE)

snippet .hmttestuni: univariate Student's t-test
    t.test(${1:DF}\$${2:XNUM},
        alternative = "two.sided", mu = 0, conf.level = 0.95)


## Hypothesis tests: Distribution #######################################################################
snippet ..hypothesis tests: distribution
    .hd

snippet .hdnorm: Shapiro-Wilk test of normality
    shapiro.test(${1:DF}\$${2:XNUM})


## Hypothesis tests: Contingency #######################################################################
snippet ..hypothesis tests: contingency
    .hc

snippet .hcfisher: Fisher test of independence
    fisher.test(${1:TABLE})

snippet .hcchi2comp: Chi2 test (components)
    round(chisq.test(${1:TABLE})[["residuals"]]^2, 2)

snippet .hcchi2bi: Chi2 test (independence)
    (chi2. <- chisq.test(${1:TABLE})); cat("Expected frequencies:\n"); chi2.[["expected"]]

snippet .hcchi2uni: Chi2 test (univariate)
    chisq.test(${1:TABLE}, p = ${2:PROBABILITIES}, rescale.p = FALSE)
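The sketch below applies the .hpuni snippet to the staff counts above. It assumes the question is whether the proportion of men differs from 0.5 (perfect parity); that null value is our assumption, since the exam question itself is not reproduced. `rbind()` turns the counts into a 1 x 2 matrix (successes, failures), the same shape the snippet builds from `table()`. A χ² goodness-of-fit test against 50/50 is shown as a cross-check and gives the same statistic.

```r
staff <- c(Men = 2312, Women = 2165)

# H0: proportion of men = 0.5, two-sided, no continuity correction
res <- prop.test(rbind(staff), p = 0.5, alternative = "two.sided",
                 conf.level = 0.95, correct = FALSE)
res

# Cross-check: chi-squared goodness-of-fit test against equal proportions
gof <- chisq.test(staff, p = c(0.5, 0.5))
gof
```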

Conclusion

You have just finished your exam.

Leave us your impressions of this teaching tool, or keep experimenting in the area below. Remember that to put a comment in an R code area, you must start your sentences with a hash sign (#).

# Add your comments
# ...
# Not yet...

Data Processing I

Guyliann Engels & Philippe Grosjean