I run this at the start of every session as a base.
N.B. Additional packages are loaded as needed.
##### Set libPaths and memory/parallel cores usage #####
.libPaths(c("/home/chongmorrison/R/x86_64-pc-linux-gnu-library/4.3",
            "/home/chongmorrison/R/4.4.1-Bioc3.19"))
.libPaths() # check .libPaths
options(Ncpus = 12) # no. of parallel processes used by install.packages()
getOption("Ncpus", 1L)

library(future)
options(future.globals.maxSize = 2000 * 1024^2) # limit of allowable object size (~2 GB)
# *could* enable parallelisation (workers > 1) for Seurat etc., but this currently breaks Python processes...
future::plan("multisession", workers = 1) # or use "sequential" mode
future::plan()

library(BiocParallel) # for parallelising Bioconductor packages

##### Import analysis packages #####
# Potential bug: the Python environment needs to be activated before loading Seurat
library(reticulate)
conda_list() # check available conda environments
use_condaenv("singlecell-scHPF", required = TRUE) # has leidenalg etc. installed

# R packages
library(Seurat)
library(tidyverse)
library(dittoSeq)
library(SingleCellExperiment)
library(grateful)

# set seed for reproducibility of random sampling
set.seed(584)
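`future.globals.maxSize` is given in bytes, so `2000 * 1024^2` is 2,097,152,000 bytes, i.e. about 1.95 GiB. Before handing a large Seurat object to a parallel worker it can help to check its size against that limit. A minimal sketch, assuming `utils::object.size()` is an acceptable approximation (future estimates the size of globals with its own method, so the numbers may differ); `fits_globals_limit()` is a hypothetical helper, not part of the future package.

```r
# Hypothetical helper: compare an object's in-memory size with the
# future.globals.maxSize limit (in bytes). object.size() is only an
# approximation of what future itself measures.
fits_globals_limit <- function(x, limit = getOption("future.globals.maxSize")) {
  if (is.null(limit)) {
    stop("future.globals.maxSize is not set; pass `limit` explicitly")
  }
  size_bytes <- as.numeric(utils::object.size(x))
  size_bytes <= limit
}

# The limit used in the setup, in bytes and in GiB
limit_bytes <- 2000 * 1024^2
limit_bytes / 1024^3 # 1.953125 GiB

# Example: a 1e6-element numeric vector (~8 MB) fits comfortably
fits_globals_limit(numeric(1e6), limit = limit_bytes)
```

In a session where the setup above has run, the `limit` argument can be omitted and the option value is used.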
2 Ensembl annotations
It is useful to have gene annotations at hand in CSV format, so that they can be loaded into the analysis at any time.
library(AnnotationHub)
library(ensembldb)

# Connect to AnnotationHub
ah <- AnnotationHub()

# Access the Ensembl database for organism
ahDb <- query(ah, pattern = c("Danio rerio", "EnsDb"), ignore.case = TRUE)

# Acquire the latest annotation files (the last record is normally the most recent release)
id <- ahDb %>%
  mcols() %>%
  rownames() %>%
  tail(n = 1)

# Download the appropriate Ensembldb database
edb <- ah[[id]]

# Extract gene-level information from database
annotations <- genes(edb, return.type = "data.frame")

# Select annotations of interest
annotations <- annotations %>%
  dplyr::select(gene_id, gene_name, seq_name, gene_biotype, description)

# Save for later use (create the output folder first if it does not exist)
dir.create("./annotations", showWarnings = FALSE, recursive = TRUE)
write.csv(annotations, file = "./annotations/ensembl_annotations.csv")
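Because `write.csv()` writes row names by default, the saved table can be read back in later sessions with `row.names = 1`. A minimal sketch of using it to map Ensembl gene IDs to gene names; `ensembl_to_name()` is a hypothetical helper, and the toy table uses made-up IDs purely for illustration.

```r
# Hypothetical helper: map Ensembl gene IDs to gene names using the saved
# annotation table; IDs with no match or an empty name are returned unchanged.
ensembl_to_name <- function(ids, annotations) {
  idx <- match(ids, annotations$gene_id)
  gene_names <- annotations$gene_name[idx]
  missing <- is.na(gene_names) | gene_names == ""
  gene_names[missing] <- ids[missing]
  gene_names
}

# Real usage would be:
# annotations <- read.csv("./annotations/ensembl_annotations.csv", row.names = 1)

# Toy table with made-up IDs, written and read back the same way as the real file
toy <- data.frame(gene_id = c("ENSDARG_TOY1", "ENSDARG_TOY2"),
                  gene_name = c("geneA", ""))
csv_path <- tempfile(fileext = ".csv")
write.csv(toy, file = csv_path)               # row names written by default
toy_back <- read.csv(csv_path, row.names = 1) # first column holds those row names

ensembl_to_name(c("ENSDARG_TOY2", "ENSDARG_TOY1", "ENSDARG_TOY3"), toy_back)
# returns c("ENSDARG_TOY2", "geneA", "ENSDARG_TOY3")
```

Falling back to the ID keeps genes without an annotated name distinguishable, rather than collapsing them all to `NA` or an empty string.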
3 Other databases
4 Zebrafish Information Network (ZFIN)
4.1 Gene expression data
ZFIN has an excellent curation of in vivo expression patterns, including those obtained by whole-mount in situ hybridisation (WISH). As an example, the following code retrieves the expression data for wild-type fish from https://zfin.org/downloads (‘Gene Expression’ > ‘Expression data for wildtype fish’); the Assay column records how each pattern was obtained.
# Download ZFIN expression data for wild-type fish (tab-separated, no header row)
gex <- read.delim(url("https://zfin.org/downloads/wildtype-expression_fish.txt"),
                  header = FALSE, sep = "\t")

# Add column IDs (based on Column Headers in the Downloads page above)
colnames(gex) <- c("GeneID", "GeneSymbol", "FishName",
                   "SuperStructureID", "SuperStructureName",
                   "SubStructureID", "SubStructureName",
                   "StartStage", "EndStage",
                   "Assay", "AssayMMOID", "PublicationID",
                   "ProbeID", "AntibodyID", "FishID")
head(gex, 5)
The data can then be filtered to include only the information of interest, e.g. the anatomical structure columns.
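A minimal sketch of such a filter, keeping only the gene and anatomical-structure columns named above, optionally restricted to one assay type. The label `"mRNA in situ hybridization"` is an assumption about how ZFIN spells the WISH assay, so check `unique(gex$Assay)` on the real data first; `filter_structures()` is a hypothetical helper and the toy rows are made up.

```r
# Hypothetical helper: keep gene + anatomy columns from the ZFIN table,
# optionally restricted to one assay type, and drop duplicate rows.
filter_structures <- function(gex, assay = NULL) {
  if (!is.null(assay)) {
    gex <- gex[gex$Assay %in% assay, , drop = FALSE]
  }
  keep <- c("GeneSymbol", "SuperStructureName", "SubStructureName",
            "StartStage", "EndStage")
  out <- unique(gex[, keep, drop = FALSE])
  rownames(out) <- NULL
  out
}

# Toy rows (made up) using the column names assigned above
toy_gex <- data.frame(
  GeneSymbol = c("geneA", "geneA", "geneB"),
  SuperStructureName = c("heart", "heart", "liver"),
  SubStructureName = c("", "", ""),
  StartStage = c("stage1", "stage1", "stage2"),
  EndStage = c("stage1", "stage1", "stage2"),
  Assay = c("mRNA in situ hybridization", "mRNA in situ hybridization",
            "Immunohistochemistry")
)

# Assay label is an assumption: inspect unique(gex$Assay) on the real data
filter_structures(toy_gex, assay = "mRNA in situ hybridization")
```

On the real table this would be called as `filter_structures(gex, assay = ...)`; dropping duplicates matters because the same gene and structure are often reported by several publications.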
# Download ZFIN's zebrafish-human orthologue table (tab-separated, no header row)
ZFIN_human <- read.delim(url("https://zfin.org/downloads/human_orthos.txt"),
                         header = FALSE, sep = "\t")
head(ZFIN_human, 5)

# Retrieve fish and human orthologues (column 2 = zebrafish symbol,
# column 4 = human symbol) and add column IDs
ZFIN_human <- unique(data.frame(zf_gene = ZFIN_human$V2,
                                human_gene = ZFIN_human$V4))
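The orthologue table can be joined to an expression table to annotate each zebrafish gene with its human orthologue(s). A minimal sketch using base R `merge()` as a left join; the mapping is not necessarily one-to-one, so genes can gain extra rows or have no match. The toy data frames are made up and only mimic the column names used above.

```r
# Toy inputs (made up) shaped like an expression table and ZFIN_human
toy_expr <- data.frame(GeneSymbol = c("geneA", "geneB", "geneC"),
                       SuperStructureName = c("heart", "liver", "brain"))
toy_orthos <- data.frame(zf_gene = c("geneA", "geneB", "geneB"),
                         human_gene = c("GENEA", "GENEB1", "GENEB2"))

# Left join: keep every expression row; all.x = TRUE gives NA where no
# human orthologue is listed, and one-to-many matches add extra rows.
expr_human <- merge(toy_expr, toy_orthos,
                    by.x = "GeneSymbol", by.y = "zf_gene", all.x = TRUE)
expr_human

# Equivalent with dplyr (loaded via tidyverse in the setup):
# dplyr::left_join(toy_expr, toy_orthos, by = c("GeneSymbol" = "zf_gene"))
```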
5 References
Mary Piper, Meeta Mistry, Jihe Liu, William Gammerdinger, & Radhika Khetani. (2022, January 6). hbctraining/scRNA-seq_online: scRNA-seq Lessons from HCBC (first release). Zenodo. https://doi.org/10.5281/zenodo.5826256.
Peter W Harrison, M Ridwan Amode, Olanrewaju Austine-Orimoloye, Andrey G Azov, Matthieu Barba, If Barnes, Arne Becker, Ruth Bennett, Andrew Berry, Jyothish Bhai, Simarpreet Kaur Bhurji, Sanjay Boddu, Paulo R Branco Lins, Lucy Brooks, Shashank Budhanuru Ramaraju, Lahcen I Campbell, Manuel Carbajo Martinez, Mehrnaz Charkhchi, Kapeel Chougule, Alexander Cockburn, Claire Davidson, Nishadi H De Silva, Kamalkumar Dodiya, Sarah Donaldson, Bilal El Houdaigui, Tamara El Naboulsi, Reham Fatima, Carlos Garcia Giron, Thiago Genez, Dionysios Grigoriadis, Gurpreet S Ghattaoraya, Jose Gonzalez Martinez, Tatiana A Gurbich, Matthew Hardy, Zoe Hollis, Thibaut Hourlier, Toby Hunt, Mike Kay, Vinay Kaykala, Tuan Le, Diana Lemos, Disha Lodha, Diego Marques-Coelho, Gareth Maslen, Gabriela Alejandra Merino, Louisse Paola Mirabueno, Aleena Mushtaq, Syed Nakib Hossain, Denye N Ogeh, Manoj Pandian Sakthivel, Anne Parker, Malcolm Perry, Ivana Piližota, Daniel Poppleton, Irina Prosovetskaia, Shriya Raj, José G Pérez-Silva, Ahamed Imran Abdul Salam, Shradha Saraf, Nuno Saraiva-Agostinho, Dan Sheppard, Swati Sinha, Botond Sipos, Vasily Sitnik, William Stark, Emily Steed, Marie-Marthe Suner, Likhitha Surapaneni, Kyösti Sutinen, Francesca Floriana Tricomi, David Urbina-Gómez, Andres Veidenberg, Thomas A Walsh, Doreen Ware, Elizabeth Wass, Natalie L Willhoft, Jamie Allen, Jorge Alvarez-Jarreta, Marc Chakiachvili, Bethany Flint, Stefano Giorgetti, Leanne Haggerty, Garth R Ilsley, Jon Keatley, Jane E Loveland, Benjamin Moore, Jonathan M Mudge, Guy Naamati, John Tate, Stephen J Trevanion, Andrea Winterbottom, Adam Frankish, Sarah E Hunt, Fiona Cunningham, Sarah Dyer, Robert D Finn, Fergal J Martin, and Andrew D Yates. (2024). Ensembl 2024. Nucleic Acids Research, 52(D1), D891–D899. https://doi.org/10.1093/nar/gkad1049
Bradford, Y.M., Van Slyke, C.E., Ruzicka, L., Singer, A., Eagle, A., Fashena, D., Howe, D.G., Frazer, K., Martin, R., Paddock, H., Pich, C., Ramachandran, S., Westerfield, M. (2022). Zebrafish Information Network, the knowledgebase for Danio rerio research. Genetics, 220(4). https://doi.org/10.1093/genetics/iyac016
6 Session Info
sessionInfo()
R version 4.4.2 (2024-10-31)
Platform: x86_64-pc-linux-gnu
Running under: Ubuntu 22.04.5 LTS
Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.10.0
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.10.0
locale:
[1] LC_CTYPE=en_GB.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_GB.UTF-8 LC_COLLATE=en_GB.UTF-8
[5] LC_MONETARY=en_GB.UTF-8 LC_MESSAGES=en_GB.UTF-8
[7] LC_PAPER=en_GB.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C
time zone: Europe/London
tzcode source: system (glibc)
attached base packages:
[1] stats graphics grDevices utils datasets methods base
loaded via a namespace (and not attached):
[1] compiler_4.4.2 fastmap_1.2.0 cli_3.6.3 tools_4.4.2
[5] htmltools_0.5.8.1 yaml_2.3.10 rmarkdown_2.29 knitr_1.49
[9] jsonlite_1.8.9 xfun_0.49 digest_0.6.37 rlang_1.1.4
[13] evaluate_1.0.1