class: center, middle, inverse, title-slide

.title[
# Scientific workflows using R and git
]
.author[
### Sara Mortara & Andrea Sánchez-Tapia
]
.institute[
### re.green | ¡liibre!
]
.date[
### 2022-07-07
]

---

<style type="text/css">
.tiny .remark-code { /*Change made here*/
  font-size: 50% !important;
}
</style>

## today:

+ scientific workflows
+ good practices for writing scripts
+ use our .Rproj structure to run a script
+ generate outputs
+ `commit` of the day

---

## scientific workflows

+ reproducibility
    - for you, colleagues, and community
+ script-based tools (`R`, `python`)
+ version control (`git`)
+ share methods and protocols
+ peer review

---

## our project structure

__names__ and __paths__ are essential to a reproducible workflow

```r
project/
*    ├── .gitignore
     ├── data/
     ├── docs/
     ├── figs/
     ├── R/
*        └── 02_importing_data.R
     ├── output/
     ├── README.md
*    └── .Rproj
```

---

background-image: url(figs/jenny_bryan.jpeg)
background-size: 220px 240px
background-position: 85% 50%

## RStudio projects

forget `setwd()` and meet [Jenny Bryan](https://www.tidyverse.org/blog/2017/12/workflow-vs-script/)

![](figs/jenny_bryan_wf.png)

---

# .Rproj defines the wd

![](figs/rstudio_proj.png)

---

## .Rproj + git = <3

<img src="figs/01_git.png" width="917" />

---

## .Rproj + git = <3

<img src="figs/02_git.png" width="925" />

---

## .Rproj + git = <3

<img src="figs/03_git.png" width="1819" />

---

## .Rproj + git = <3

<img src="figs/04_git.png" width="1232" />

---

class: inverse, middle, center

## workflow: scripts

---

## writing scripts

+ Never create a single script with all the analysis
    + `01_read_and_format_data.R`
    + `02_diversity_analysis.R`
    + `03_pca_analysis.R`
    + `04_simulations.R`
    `...`

--

+ Ideally, each script starts by __reading__ a particular input/data and ends by __writing__ results

--

+ the next script can __read raw data__ or __results from previous scripts__.
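---

## writing scripts: a minimal sketch

A hedged illustration of the read-then-write pattern described above, not the course's own script. The toy data, the temporary folder standing in for the project root, the object names, and the output file `02_site_summary.csv` are all assumptions made up for this example.

```r
# Hypothetical stand-in for the project root. Inside a real .Rproj the
# working directory is already the project folder, so plain relative
# paths such as "data/data_raw.csv" would be used instead.
proj <- file.path(tempdir(), "project")
dir.create(file.path(proj, "data"), recursive = TRUE, showWarnings = FALSE)
dir.create(file.path(proj, "output"), showWarnings = FALSE)

# Toy raw data, invented for illustration only
raw <- data.frame(site = c("a", "a", "b", "b"),
                  abundance = c(3, NA, 5, 7))
write.csv(raw, file.path(proj, "data", "data_raw.csv"), row.names = FALSE)

# --- what a first script (e.g. R/01_data_clean.R) might do ---
# 1. read the input
dat <- read.csv(file.path(proj, "data", "data_raw.csv"))
# 2. process it: here, drop rows with missing abundance
dat_clean <- dat[!is.na(dat$abundance), ]
# 3. write the result so the next script can start from it
write.csv(dat_clean, file.path(proj, "data", "data_processed.csv"),
          row.names = FALSE)

# --- what a following script might do ---
# start from the file written above, not from objects left in memory
processed <- read.csv(file.path(proj, "data", "data_processed.csv"))
site_summary <- aggregate(abundance ~ site, data = processed, FUN = sum)
write.csv(site_summary, file.path(proj, "output", "02_site_summary.csv"),
          row.names = FALSE)
```

Because each step communicates only through files, either half can be rerun on its own from an empty workspace.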
---

## example

<small>

.pull-left[
+ `R/01_data_clean.R`
]

.pull-right[
+ reads __`data/data_raw.csv`__
+ writes __`data/data_processed.csv`__
]

--

.pull-left[
+ `R/02_diversity_analysis.R`
]

.pull-right[
+ reads __`data/data_processed.csv`__
+ writes __`results/02_diversity.csv`__ __`figs/02_diversity.png`__
]

</small>

---

## example

<small>

.pull-left[
+ `R/03_pca_analysis.R`
]

.pull-right[
+ reads __`data/data_processed.csv`__
+ writes __`figs/03_pca.png`__
]

--

.pull-left[
+ `R/04_simulations.R`
]

.pull-right[
+ reads __`data/data_processed.csv`__
+ saves __`results/04_simulations.rda`__ __`figs/04_simulations.png`__
]

</small>

---

## example

if an object is too large, or takes too long to compute, it can be saved as an R object (__`.rda`__)

example: __`save(object, file = "./results/04_simulations.rda")`__

+ following scripts can start by loading these objects:

example: in the script `05_analysing_simulations.R`

__`load("results/04_simulations.rda")`__

_but never save the workspace!_

---

class: middle, center

## organizing each script

.footnote[Screenshots from swcarpentry.github.io/r-novice-inflammation/06-best-practices-R]

---

## each script

+ a header containing who, how, when, where, and why __METADATA__

![](./figs/01meta.png)

--

+ a part loading all needed packages from the beginning with `library()`*

<!-- library gives an error when something goes wrong, while require says nothing -->

![](./figs/02library.png)

---

## each script

+ reads needed data (__empty workspace__)

![](./figs/01read.png)

--

+ Defining variables that will not change

--

+ Commenting every step

--

+ Writing the result of each step to disk

---

## each script

+ the script must be able to run in sequence from start to finish.
+ No repetitions
+ No lines out of order
+ No unclosed parentheses or calls (`png()` ---> `dev.off()`)
+ You should be able to erase the _workspace_ mid-session and rebuild
+ Do not define functions inside the script.
Put the functions in a separate script and folder __`/fct/edit.R`__ and call via __`source()`__.

---

## additional tips

+ use concise and informative names
    + __`a <- `__ NO
+ do not use names already taken by R functions: `cor <-` (color) clashes with `cor()`, and `c <-` with `c()`
+ __If you copy and paste more than three times, it's time to write a loop or a function__

---

## more tips

+ https://owi.usgs.gov/blog/intro-best-practices/
+ https://swcarpentry.github.io/r-novice-inflammation/06-best-practices-R/
+ https://www.r-bloggers.com/r-code-best-practices/
+ https://www.tidyverse.org/articles/2017/12/workflow-vs-script/

---

## Getting back to the project

.tiny[
```r
.
├── 2022_scientific_computing_intro.Rproj
├── data
│   ├── raw
*│   ├── cestes
*│   │   ├── comm.csv
*│   │   ├── coord.csv
*│   │   ├── envir.csv
*│   │   ├── README.md
*│   │   ├── splist.csv
*│   │   └── traits.csv
│   └── portal_data_joined.csv
├── docs
*│   └── scientific_workflows.Rmd
├── figs
├── output
├── R
*│   ├── 01_intro.R
*│   └── 02_importing_data.R
└── README.md
```
]

---

## Getting back to git

CRLF vs LF

`git config --global core.autocrlf false`

<img src="figs/typewriters.png" width="400" style="display: block; margin: auto;" />

More on this topic [here](https://www.aleksandrhovhannisyan.com/blog/crlf-vs-lf-normalizing-line-endings-in-git/#crlf-vs-lf-what-are-line-endings-anyway)

---

## Getting back to git

<svg viewBox="0 0 640 512" style="height:1em;fill:currentColor;position:relative;display:inline-block;top:.1em;" xmlns="http://www.w3.org/2000/svg"> <path d="M255.03 261.65c6.25 6.25 16.38 6.25 22.63 0l11.31-11.31c6.25-6.25 6.25-16.38 0-22.63L253.25 192l35.71-35.72c6.25-6.25 6.25-16.38 0-22.63l-11.31-11.31c-6.25-6.25-16.38-6.25-22.63 0l-58.34 58.34c-6.25 6.25-6.25 16.38 0 22.63l58.35 58.34zm96.01-11.3l11.31 11.31c6.25 6.25 16.38 6.25 22.63 0l58.34-58.34c6.25-6.25 6.25-16.38 0-22.63l-58.34-58.34c-6.25-6.25-16.38-6.25-22.63 0l-11.31 11.31c-6.25 6.25-6.25 16.38 0 22.63L386.75 192l-35.71 35.72c-6.25 6.25-6.25 16.38 0 22.63zM624 416H381.54c-.74 19.81-14.71 
32-32.74 32H288c-18.69 0-33.02-17.47-32.77-32H16c-8.8 0-16 7.2-16 16v16c0 35.2 28.8 64 64 64h512c35.2 0 64-28.8 64-64v-16c0-8.8-7.2-16-16-16zM576 48c0-26.4-21.6-48-48-48H112C85.6 0 64 21.6 64 48v336h512V48zm-64 272H128V64h384v256z"></path></svg>

`git pull origin main`

`git add .gitignore`

`git add .`

`git commit -m "adding project's first structure"`

---

## Running a script and generating outputs

<svg viewBox="0 0 640 512" style="height:1em;fill:currentColor;position:relative;display:inline-block;top:.1em;" xmlns="http://www.w3.org/2000/svg"> <path d="M255.03 261.65c6.25 6.25 16.38 6.25 22.63 0l11.31-11.31c6.25-6.25 6.25-16.38 0-22.63L253.25 192l35.71-35.72c6.25-6.25 6.25-16.38 0-22.63l-11.31-11.31c-6.25-6.25-16.38-6.25-22.63 0l-58.34 58.34c-6.25 6.25-6.25 16.38 0 22.63l58.35 58.34zm96.01-11.3l11.31 11.31c6.25 6.25 16.38 6.25 22.63 0l58.34-58.34c6.25-6.25 6.25-16.38 0-22.63l-58.34-58.34c-6.25-6.25-16.38-6.25-22.63 0l-11.31 11.31c-6.25 6.25-6.25 16.38 0 22.63L386.75 192l-35.71 35.72c-6.25 6.25-6.25 16.38 0 22.63zM624 416H381.54c-.74 19.81-14.71 32-32.74 32H288c-18.69 0-33.02-17.47-32.77-32H16c-8.8 0-16 7.2-16 16v16c0 35.2 28.8 64 64 64h512c35.2 0 64-28.8 64-64v-16c0-8.8-7.2-16-16-16zM576 48c0-26.4-21.6-48-48-48H112C85.6 0 64 21.6 64 48v336h512V48zm-64 272H128V64h384v256z"></path></svg>

`git pull origin main`

`git add output/02_envir_summary.csv`

`git add figs/02_species_abundance.png`

`git commit -m "a very informative message about the scripts you're adding"`

`git push origin main`

---

## Creating a report

<svg viewBox="0 0 640 512" style="height:1em;fill:currentColor;position:relative;display:inline-block;top:.1em;" xmlns="http://www.w3.org/2000/svg"> <path d="M255.03 261.65c6.25 6.25 16.38 6.25 22.63 0l11.31-11.31c6.25-6.25 6.25-16.38 0-22.63L253.25 192l35.71-35.72c6.25-6.25 6.25-16.38 0-22.63l-11.31-11.31c-6.25-6.25-16.38-6.25-22.63 0l-58.34 58.34c-6.25 6.25-6.25 16.38 0 22.63l58.35 58.34zm96.01-11.3l11.31 11.31c6.25 6.25 16.38 6.25 22.63 0l58.34-58.34c6.25-6.25 
6.25-16.38 0-22.63l-58.34-58.34c-6.25-6.25-16.38-6.25-22.63 0l-11.31 11.31c-6.25 6.25-6.25 16.38 0 22.63L386.75 192l-35.71 35.72c-6.25 6.25-6.25 16.38 0 22.63zM624 416H381.54c-.74 19.81-14.71 32-32.74 32H288c-18.69 0-33.02-17.47-32.77-32H16c-8.8 0-16 7.2-16 16v16c0 35.2 28.8 64 64 64h512c35.2 0 64-28.8 64-64v-16c0-8.8-7.2-16-16-16zM576 48c0-26.4-21.6-48-48-48H112C85.6 0 64 21.6 64 48v336h512V48zm-64 272H128V64h384v256z"></path></svg>

- R Markdown basic structure

run `docs/scientific_workflows.Rmd`

---

class: center, middle

# ¡Thanks!

<center>
<svg viewBox="0 0 512 512" style="position:relative;display:inline-block;top:.1em;fill:#A70000;height:1em;" xmlns="http://www.w3.org/2000/svg"> <path d="M476 3.2L12.5 270.6c-18.1 10.4-15.8 35.6 2.2 43.2L121 358.4l287.3-253.2c5.5-4.9 13.3 2.6 8.6 8.3L176 407v80.5c0 23.6 28.5 32.9 42.5 15.8L282 426l124.6 52.2c14.2 6 30.4-2.9 33-18.2l72-432C515 7.8 493.3-6.8 476 3.2z"></path></svg> [saramortara@gmail.com](mailto:saramortara@gmail.com) | [andreasancheztapia@gmail.com](mailto:andreasancheztapia@gmail.com)

<svg viewBox="0 0 512 512" style="position:relative;display:inline-block;top:.1em;fill:#A70000;height:1em;" xmlns="http://www.w3.org/2000/svg"> <path d="M459.37 151.716c.325 4.548.325 9.097.325 13.645 0 138.72-105.583 298.558-298.558 298.558-59.452 0-114.68-17.219-161.137-47.106 8.447.974 16.568 1.299 25.34 1.299 49.055 0 94.213-16.568 130.274-44.832-46.132-.975-84.792-31.188-98.112-72.772 6.498.974 12.995 1.624 19.818 1.624 9.421 0 18.843-1.3 27.614-3.573-48.081-9.747-84.143-51.98-84.143-102.985v-1.299c13.969 7.797 30.214 12.67 47.431 13.319-28.264-18.843-46.781-51.005-46.781-87.391 0-19.492 5.197-37.36 14.294-52.954 51.655 63.675 129.3 105.258 216.365 109.807-1.624-7.797-2.599-15.918-2.599-24.04 0-57.828 46.782-104.934 104.934-104.934 30.213 0 57.502 12.67 76.67 33.137 23.715-4.548 46.456-13.32 66.599-25.34-7.798 24.366-24.366 44.833-46.132 57.827 21.117-2.273 41.584-8.122 60.426-16.243-14.292 20.791-32.161 39.308-52.628 
54.253z"></path></svg> [@MortaraSara](https://twitter.com/MortaraSara) | [@SanchezTapiaA](https://twitter.com/SanchezTapiaA) <svg viewBox="0 0 496 512" style="position:relative;display:inline-block;top:.1em;fill:#A70000;height:1em;" xmlns="http://www.w3.org/2000/svg"> <path d="M165.9 397.4c0 2-2.3 3.6-5.2 3.6-3.3.3-5.6-1.3-5.6-3.6 0-2 2.3-3.6 5.2-3.6 3-.3 5.6 1.3 5.6 3.6zm-31.1-4.5c-.7 2 1.3 4.3 4.3 4.9 2.6 1 5.6 0 6.2-2s-1.3-4.3-4.3-5.2c-2.6-.7-5.5.3-6.2 2.3zm44.2-1.7c-2.9.7-4.9 2.6-4.6 4.9.3 2 2.9 3.3 5.9 2.6 2.9-.7 4.9-2.6 4.6-4.6-.3-1.9-3-3.2-5.9-2.9zM244.8 8C106.1 8 0 113.3 0 252c0 110.9 69.8 205.8 169.5 239.2 12.8 2.3 17.3-5.6 17.3-12.1 0-6.2-.3-40.4-.3-61.4 0 0-70 15-84.7-29.8 0 0-11.4-29.1-27.8-36.6 0 0-22.9-15.7 1.6-15.4 0 0 24.9 2 38.6 25.8 21.9 38.6 58.6 27.5 72.9 20.9 2.3-16 8.8-27.1 16-33.7-55.9-6.2-112.3-14.3-112.3-110.5 0-27.5 7.6-41.3 23.6-58.9-2.6-6.5-11.1-33.3 2.6-67.9 20.9-6.5 69 27 69 27 20-5.6 41.5-8.5 62.8-8.5s42.8 2.9 62.8 8.5c0 0 48.1-33.6 69-27 13.7 34.7 5.2 61.4 2.6 67.9 16 17.7 25.8 31.5 25.8 58.9 0 96.5-58.9 104.2-114.8 110.5 9.2 7.9 17 22.9 17 46.4 0 33.7-.3 75.4-.3 83.6 0 6.5 4.6 14.4 17.3 12.1C428.2 457.8 496 362.9 496 252 496 113.3 383.5 8 244.8 8zM97.2 352.9c-1.3 1-1 3.3.7 5.2 1.6 1.6 3.9 2.3 5.2 1 1.3-1 1-3.3-.7-5.2-1.6-1.6-3.9-2.3-5.2-1zm-10.8-8.1c-.7 1.3.3 2.9 2.3 3.9 1.6 1 3.6.7 4.3-.7.7-1.3-.3-2.9-2.3-3.9-2-.6-3.6-.3-4.3.7zm32.4 35.6c-1.6 1.3-1 4.3 1.3 6.2 2.3 2.3 5.2 2.6 6.5 1 1.3-1.3.7-4.3-1.3-6.2-2.2-2.3-5.2-2.6-6.5-1zm-11.4-14.7c-1.6 1-1.6 3.6 0 5.9 1.6 2.3 4.3 3.3 5.6 2.3 1.6-1.3 1.6-3.9 0-6.2-1.4-2.3-4-3.3-5.6-2z"></path></svg><svg viewBox="0 0 512 512" style="position:relative;display:inline-block;top:.1em;fill:#A70000;height:1em;" xmlns="http://www.w3.org/2000/svg"> <path d="M105.2 24.9c-3.1-8.9-15.7-8.9-18.9 0L29.8 199.7h132c-.1 0-56.6-174.8-56.6-174.8zM.9 287.7c-2.6 8 .3 16.9 7.1 22l247.9 184-226.2-294zm160.8-88l94.3 294 94.3-294zm349.4 88l-28.8-88-226.3 294 247.9-184c6.9-5.1 9.7-14 7.2-22zM425.7 
24.9c-3.1-8.9-15.7-8.9-18.9 0l-56.6 174.8h132z"></path></svg> [saramortara](http://github.com/saramortara) | [andreasancheztapia](http://github.com/andreasancheztapia)