class: titleSlide, hide_logo

# Data Visualization

## Common geoms and plot styles to know

<br>
<center><img src="data:image/png;base64,#logo.png" width="200px"/></center>

---
class: left, hide_logo, hide-count

## Setup

.pull-left[
**Option 1**

* Download and unzip today's materials into `glhlth562/materials`
* Find and open the `dataviz3_template.Rmd` file in `materials/dataviz3`.
]

.pull-right[
**Option 2**

* Pull updates from [github](https://github.com/ericpgreen/glhlth562) (assumes you cloned this repo previously)
* Find and open the `dataviz3_template.Rmd` file in `materials/dataviz3`.
]

---
class: left

### Palmer penguins

The `palmerpenguins` data contains size measurements for three penguin species observed on three islands in the Palmer Archipelago, Antarctica.

.pull-left[
<img src="data:image/png;base64,#img/palmerpenguins.png" width="100%" style="display: block; margin: auto;" />
]

.pull-right[
<img src="data:image/png;base64,#img/penguins.png" width="100%" style="display: block; margin: auto;" />
]

---
class: left, hide-count

### Histograms and density plots

.panelset[
.panel[.panel-name[Code]

```r
# see chunk 'histogram'
ggplot(data = penguins, aes(x = flipper_length_mm)) +
  geom_histogram(aes(fill = species), alpha = 0.5, position = "identity") +
  scale_fill_manual(values = c("darkorange","purple","cyan4")) +
  theme_minimal() +
  labs(x = "Flipper length (mm)",
       y = "Frequency",
       title = "Penguin flipper lengths")
```
]

.panel[.panel-name[Plot]
<img src="data:image/png;base64,#dataviz3_deck_files/figure-html/histogram_plot-1.png" width="70%" />
]

.panel[.panel-name[Your Turn]

* Adjust `alpha`
* Change the colors in `scale_fill_manual()` (see what colors are known to R by running `colors()` in your Console, then try different hex colors)
* Replace `scale_fill_manual(values = c("darkorange","purple","cyan4"))` with `scale_fill_viridis_d()` (it ships with `ggplot2`, so no extra install is needed)
* Change `geom_histogram()` to `geom_density()`
* Move the legend to the bottom with `theme(legend.position = "something")`
]
]
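---
class: left, hide-count

### Histograms and density plots: one possible answer

The Your Turn steps for the histogram can be combined roughly as below. This is only a sketch: it assumes `ggplot2` and `palmerpenguins` are installed, and the object name `p_density` and the axis label "Density" are illustrative choices, not part of the template.

```r
library(ggplot2)
library(palmerpenguins)

# Density curves instead of histogram bars, one per species.
# na.rm = TRUE silently drops the rows with a missing flipper length.
p_density <- ggplot(data = penguins, aes(x = flipper_length_mm)) +
  geom_density(aes(fill = species), alpha = 0.5, na.rm = TRUE) +
  # scale_fill_viridis_d() is part of ggplot2 itself
  scale_fill_viridis_d() +
  theme_minimal() +
  labs(x = "Flipper length (mm)",
       y = "Density",
       title = "Penguin flipper lengths") +
  # valid positions include "bottom", "top", "left", "right", "none"
  theme(legend.position = "bottom")

p_density
```

Lower values of `alpha` make the overlap between species easier to see.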
---
class: left, hide-count

### Violin plots

.panelset[
.panel[.panel-name[Code]

```r
# see chunk 'violin'
ggplot(data = penguins, aes(x = species, y = flipper_length_mm, fill = species)) +
  geom_violin() +
  scale_fill_manual(values = c("darkorange","purple","cyan4")) +
  theme_minimal() +
  labs(y = "Flipper length (mm)",
       x = NULL,
       title = "Penguin flipper lengths") +
  theme(legend.position = "none")
```
]

.panel[.panel-name[Plot]
<img src="data:image/png;base64,#dataviz3_deck_files/figure-html/violin_plot-1.png" width="70%" />
]

.panel[.panel-name[Your Turn]

* Add `scale = "count"` inside `geom_violin()`...what does it change?
* Add `adjust = 0.5` inside `geom_violin()` and try different values
* Add points with `geom_jitter()` and adjust with arguments `height` and `width`
* Add a boxplot with `geom_boxplot(width=0.1, color="grey", alpha=0.2)`
* Flip the coordinates with `coord_flip()`
]
]
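---
class: left, hide-count

### Violin plots: one possible answer

The violin exercises stack naturally as extra layers. The sketch below shows one way to combine them; the jitter settings (`height = 0`, `width = 0.1`, `alpha = 0.4`) and the name `p_violin` are assumptions chosen for illustration.

```r
library(ggplot2)
library(palmerpenguins)

p_violin <- ggplot(data = penguins,
                   aes(x = species, y = flipper_length_mm, fill = species)) +
  # scale = "count" makes violin area proportional to the number of observations;
  # adjust < 1 uses a narrower smoothing bandwidth (a wigglier outline)
  geom_violin(scale = "count", adjust = 0.5, na.rm = TRUE) +
  # height = 0 keeps the measured values exact; only the x position is jittered
  geom_jitter(height = 0, width = 0.1, alpha = 0.4, na.rm = TRUE) +
  geom_boxplot(width = 0.1, color = "grey", alpha = 0.2, na.rm = TRUE) +
  scale_fill_manual(values = c("darkorange", "purple", "cyan4")) +
  theme_minimal() +
  labs(y = "Flipper length (mm)",
       x = NULL,
       title = "Penguin flipper lengths") +
  theme(legend.position = "none") +
  coord_flip()

p_violin
```

Layers are drawn in the order they are added, so the boxplot sits on top of the points here.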
---
class: left

### Raincloud plots

<img src="data:image/png;base64,#dataviz3_deck_files/figure-html/rain-1.png" width="100%" />

---
class: left, hide-count

### Bar plots

.panelset[
.panel[.panel-name[Code `geom_bar()`]

```r
# see chunk 'bar'
ggplot(penguins, aes(x = species)) +
  geom_bar(alpha = 0.8) +
  theme_minimal()
```
]

.panel[.panel-name[Plot `geom_bar()`]
<img src="data:image/png;base64,#dataviz3_deck_files/figure-html/bar_plot-1.png" width="70%" />
]

.panel[.panel-name[Your Turn]

* Reorder these 3 bars by setting `x` equal to `forcats::fct_infreq(species)`
* Remove the axis titles and provide an overall plot title
* Color the bars by species
* Try adding `+ coord_flip()`
* Remove `+ coord_flip()` and map `species` directly to `y` instead of `x`
]
]
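---
class: left, hide-count

### Bar plots: one possible answer

The bar plot exercises can end up looking something like the sketch below. It assumes `forcats` is installed alongside `ggplot2` and `palmerpenguins`; the plot title and the name `p_bar` are made up for illustration.

```r
library(ggplot2)
library(forcats)
library(palmerpenguins)

# fct_infreq() reorders the factor levels from most to least frequent.
# Mapping the factor to y gives horizontal bars without coord_flip().
p_bar <- ggplot(penguins, aes(y = fct_infreq(species), fill = species)) +
  geom_bar(alpha = 0.8, show.legend = FALSE) +
  labs(x = NULL,
       y = NULL,
       title = "Number of penguins observed, by species") +
  theme_minimal()

p_bar
```

The legend is dropped because the axis labels already name each species.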
---
class: left, hide-count

### Bar plots: under the hood

`?geom_bar`

> There are two types of bar charts: `geom_bar()` and `geom_col()`. `geom_bar()` makes the height of the bar proportional to the number of cases in each group (or if the weight aesthetic is supplied, the sum of the weights). If you want the heights of the bars to represent values in the data, use `geom_col()` instead. `geom_bar()` uses `stat_count()` by default: it counts the number of cases at each x position. `geom_col()` uses `stat_identity()`: it leaves the data as is.

---
class: left, hide-count

### Bar plots: count by default

> `geom_bar()` uses `stat_count()` by default: it counts the number of cases at each x position.

.pull-left[

```r
ggplot(penguins, aes(x = fct_infreq(species))) +
  geom_bar() +
  theme_minimal()
```

<img src="data:image/png;base64,#dataviz3_deck_files/figure-html/bar_plot_default_l-1.png" width="90%" />
]

.pull-right[

```r
penguins %>%
  group_by(species) %>%
  count() %>%
  arrange(desc(n))
```

```
## # A tibble: 3 × 2
## # Groups:   species [3]
##   species       n
##   <fct>     <int>
## 1 Adelie      152
## 2 Gentoo      124
## 3 Chinstrap    68
```
]

---
class: left, hide-count

### Bar plots: sum of `weight` instead of counts

> `geom_bar()` makes the height of the bar proportional to the number of cases in each group **(or if the weight aesthetic is supplied, the sum of the weights)**

.pull-left[

```r
ggplot(penguins, aes(x = species)) +
  geom_bar(aes(weight = flipper_length_mm)) +
  theme_minimal()
```

<img src="data:image/png;base64,#dataviz3_deck_files/figure-html/bar_plot_weight_l-1.png" width="90%" />
]

.pull-right[

```r
penguins %>%
  group_by(species) %>%
  summarize(total_mm = sum(flipper_length_mm, na.rm=TRUE))
```

```
## # A tibble: 3 × 2
##   species   total_mm
##   <fct>        <int>
## 1 Adelie       28683
## 2 Chinstrap    13316
## 3 Gentoo       26714
```
]

---
class: left, hide-count

### Bar plots: plot data as is with `geom_col()`

> If you want the heights of the bars to represent values in the data, use `geom_col()` instead...`geom_col()` uses `stat_identity()`

.pull-left[

```r
penguins %>%
  group_by(species) %>%
  summarize(flip_m = mean(flipper_length_mm, na.rm=TRUE)) %>%
  ggplot(aes(x = species, y = flip_m)) +
    geom_col() +
    theme_minimal()
```

<img src="data:image/png;base64,#dataviz3_deck_files/figure-html/bar_plot_col_l-1.png" width="90%" />
]

.pull-right[

```r
penguins %>%
  group_by(species) %>%
  summarize(flip_m = mean(flipper_length_mm, na.rm=TRUE))
```

```
## # A tibble: 3 × 2
##   species   flip_m
##   <fct>      <dbl>
## 1 Adelie      190.
## 2 Chinstrap   196.
## 3 Gentoo      217.
```
]

---
class: left, hide-count

### Dot plots

.panelset[
.panel[.panel-name[Code]

```r
# see chunk 'dot'
penguins %>%
  remove_missing() %>%
  group_by(species) %>%
  summarise(mean_bmg = mean(body_mass_g)) %>%
  ggplot() +
    geom_segment(aes(x = 0, xend = mean_bmg,
                     y = reorder(species, mean_bmg),
                     yend = reorder(species, mean_bmg)),
                 color = "grey", size = 2) +
    geom_point(aes(y = reorder(species, mean_bmg), x = mean_bmg),
               size = 5, color = "darkorange") +
    labs(x = NULL,
         y = NULL,
         title = "Gentoos are big birds",
         subtitle = "Average body mass (g) by species") +
    theme_minimal() +
    theme(plot.title = element_text(face="bold"),
          plot.title.position = "plot")
```
]

.panel[.panel-name[Plot]
<img src="data:image/png;base64,#dataviz3_deck_files/figure-html/dot_plot-1.png" width="70%" />
]

.panel[.panel-name[Your Turn]

* Change from mean body mass to mean bill length
* Change from grouping by species to by sex
* Update all labels
* Change the dot colors
]
]
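---
class: left, hide-count

### Dot plots: one possible answer

A version of the dot plot that follows the Your Turn steps might look like the sketch below. The names `bill_by_sex`, `mean_bill`, and `p_dot`, the title text, and the dot color are all illustrative assumptions. Rows with a missing `sex` or `bill_length_mm` are filtered out explicitly here instead of calling `remove_missing()`.

```r
library(dplyr)
library(ggplot2)
library(palmerpenguins)

# Summarise first: one row per sex with the mean bill length
bill_by_sex <- penguins %>%
  filter(!is.na(sex), !is.na(bill_length_mm)) %>%
  group_by(sex) %>%
  summarise(mean_bill = mean(bill_length_mm))

p_dot <- ggplot(bill_by_sex) +
  # size = 2 matches the deck; ggplot2 >= 3.4.0 prefers linewidth = 2 for lines
  geom_segment(aes(x = 0, xend = mean_bill,
                   y = reorder(sex, mean_bill),
                   yend = reorder(sex, mean_bill)),
               color = "grey", size = 2) +
  geom_point(aes(y = reorder(sex, mean_bill), x = mean_bill),
             size = 5, color = "cyan4") +
  labs(x = NULL,
       y = NULL,
       title = "Bill length by sex",
       subtitle = "Average bill length (mm) by sex") +
  theme_minimal() +
  theme(plot.title = element_text(face = "bold"),
        plot.title.position = "plot")

p_dot
```

Keeping the summary in its own object makes it easy to inspect the numbers before plotting them.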
---
class: newTopicSub, hide_logo

# Scatterplots and smoothing

---
class: left, hide-count

### Scatterplots and smoothing

.panelset[
.panel[.panel-name[Code]

```r
# see chunk 'scatter'
ggplot(data = penguins, aes(x = flipper_length_mm, y = bill_length_mm)) +
  geom_point(aes(color = species), size = 3, alpha = 0.8) +
  theme_minimal() +
  scale_color_manual(values = c("darkorange","purple","cyan4")) +
  labs(title = "Flipper and bill length",
       subtitle = "Dimensions for Adelie, Chinstrap and Gentoo Penguins at Palmer Station LTER",
       x = "Flipper length (mm)",
       y = "Bill length (mm)",
       color = "Penguin species",
       caption = "A caption!") +
  theme(legend.position = "bottom",
        plot.title.position = "plot",
        plot.caption = element_text(hjust = 0, face= "italic"),
        plot.caption.position = "plot")
```
]

.panel[.panel-name[Plot]
<img src="data:image/png;base64,#dataviz3_deck_files/figure-html/scatter_plot-1.png" width="70%" />
]

.panel[.panel-name[Your Turn]

* Add `shape = species` to the `aes()` mapping (and add `shape = "Penguin species"` to `labs()`...what effect does it have?)
* Add another geom: `geom_smooth(method = "lm", se = FALSE, aes(color = species))`
* Change the method in `geom_smooth()` to "loess" and try different values for `span = something` within this function
* Make small multiples for species by adding `facet_wrap(~something)` AND change the color mapping to `sex` (you may need to adjust the number of colors, and you'll need to update the labels)
]
]
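---
class: left, hide-count

### Scatterplots and smoothing: one possible answer

The first two scatterplot exercises (a shape mapping plus a per-species linear fit) could be combined as in the sketch below. The name `p_scatter` is illustrative, and `formula = y ~ x` is spelled out only to avoid the message `geom_smooth()` otherwise prints about its default formula.

```r
library(ggplot2)
library(palmerpenguins)

p_scatter <- ggplot(data = penguins,
                    aes(x = flipper_length_mm, y = bill_length_mm)) +
  # Mapping both color and shape to species gives one merged legend,
  # provided both legends get the same title in labs()
  geom_point(aes(color = species, shape = species),
             size = 3, alpha = 0.8, na.rm = TRUE) +
  # One straight-line fit per species, without the confidence ribbon
  geom_smooth(aes(color = species), method = "lm", formula = y ~ x,
              se = FALSE, na.rm = TRUE) +
  theme_minimal() +
  scale_color_manual(values = c("darkorange", "purple", "cyan4")) +
  labs(title = "Flipper and bill length",
       x = "Flipper length (mm)",
       y = "Bill length (mm)",
       color = "Penguin species",
       shape = "Penguin species") +
  theme(legend.position = "bottom",
        plot.title.position = "plot")

p_scatter
```

Swapping `method = "lm"` for `method = "loess"` and adding `span` is a small change from here.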
---
class: left

# Credits

Deck by Eric Green ([@ericpgreen](https://twitter.com/ericpgreen)), licensed under Creative Commons Attribution-ShareAlike [CC BY-SA 4.0](https://creativecommons.org/licenses/by-sa/4.0/)

* {[`xaringan`](https://github.com/yihui/xaringan)} for slides with help from {[`xaringanExtra`](https://github.com/gadenbuie/xaringanExtra)}
* Allison Horst, [Palmer penguins artwork and vignettes](https://allisonhorst.github.io/palmerpenguins/articles/intro.html)
* [Cédric Scherer](https://z3tt.github.io/OutlierConf2021/) shows off a bit with a custom raincloud plot