Last updated: 2022-01-20
Checks: 6 1
Knit directory: Barley1kGBS_Proj/
This reproducible R Markdown analysis was created with workflowr (version 1.6.2). The Checks tab describes the reproducibility checks that were applied when the results were created. The Past versions tab lists the development history.
The R Markdown file has unstaged changes. To know which version of the R Markdown file created these results, you'll want to first commit it to the Git repo. If you're still working on the analysis, you can ignore this warning. When you're finished, you can run wflow_publish
to commit the R Markdown file and build the HTML.
Great job! The global environment was empty. Objects defined in the global environment can affect the analysis in your R Markdown file in unknown ways. For reproduciblity it's best to always run the code in an empty environment.
The command set.seed(20210831)
was run prior to running the code in the R Markdown file. Setting a seed ensures that any results that rely on randomness, e.g. subsampling or permutations, are reproducible.
Great job! Recording the operating system, R version, and package versions is critical for reproducibility.
Nice! There were no cached chunks for this analysis, so you can be confident that you successfully produced the results during this run.
Great job! Using relative paths to the files within your workflowr project makes it easier to run your code on other machines.
Great! You are using Git for version control. Tracking code development and connecting the code version to the results is critical for reproducibility.
The results in this page were generated with repository version b34d099. See the Past versions tab to see a history of the changes made to the R Markdown and HTML files.
Note that you need to be careful to ensure that all relevant files for the analysis have been committed to Git prior to generating the results (you can use wflow_publish
or wflow_git_commit
). workflowr only checks the R Markdown file, but you know if there are other scripts or data files that it depends on. Below is the status of the Git repository when the results were generated:
Ignored files:
Ignored: .Rhistory
Ignored: .Rproj.user/
Ignored: analysis/GenomicVarPartitioning_cache/
Untracked files:
Untracked: VennDiagram2022-01-20_09-51-27.log
Unstaged changes:
Modified: analysis/GenomeEnvAssociation.Rmd
Modified: analysis/PreprocessEnvData.Rmd
Modified: analysis/ResistanceSurface.Rmd
Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.
These are the previous versions of the repository in which changes were made to the R Markdown (analysis/ResistanceSurface.Rmd
) and HTML (public/ResistanceSurface.html
) files. If you've configured a remote Git repository (see ?wflow_git_remote
), click on the hyperlinks in the table below to view the files as they were in that past version.
File | Version | Author | Date | Message |
---|---|---|---|---|
html | 5e688f9 | Che-Wei Chang | 2022-01-18 | Build site. |
html | b38dcf2 | Che-Wei Chang | 2022-01-18 | Build site. |
html | 5f53139 | Che-Wei Chang | 2022-01-18 | Build site. |
Rmd | 597a627 | Che-Wei Chang | 2021-11-30 | update analyses. unPC, ResistanceGA and GO enrichment |
html | 597a627 | Che-Wei Chang | 2021-11-30 | update analyses. unPC, ResistanceGA and GO enrichment |
In this analysis, we aim to answer that whether the resistance of geography can explain genetic differentiation better than geographical distance. To do so, we follow the framework of Peterman et al. (2014). This analysis can be conducted by ResistanceGA
package (https://github.com/wpeterman/ResistanceGA).
ResistanceGA
is used to optimize the transformation between landscape features and resistance surfaces. In brief, it optimizes the parameters of the transformation equation with the genetic algorithm which takes genetic distances and resistance distances as response variables and independent variables respectively. After optimization, it takes transformed values and genetic distances to fit a linear model in order to evalute relative support of resistance surfaces.
To begin with, we need the genetic distances between accessions from pairs of collection sites.
source("./code/Rfunctions.R")
library(vcfR)
library(adegenet)
vcf <- read.vcfR("./data/GBS_244accessions_LDpruned_19601SNP.vcf.gz", verbose = F) # load VCF
grp.index <- sapply(strsplit(colnames(vcf@gt[,-1]), split = "_"), function(x){x[1]}) # get subpops
gind <- vcfR2genind(vcf) # convert VCF to genind
pop(gind) <- as.factor(grp.index)
Dps <- cal.pairwise.Dps(gi = gind)
saveRDS(Dps, "./output/Dps_wild_barley.RDS")
To run ResistanceGA
, we also need geographical coordinates.
env <- read.table("./data/RawEnv_244accessions.tsv", header = T)
Dps <- readRDS("./output/Dps_wild_barley.RDS")
# get the central coordinates of 58 groups (collection sites) -> x in the 1st col and y in the 2nd col
centroid <-
data.frame(
Lon = tapply(env$Longitude, INDEX = as.factor(env$GeoGroup), mean),
Lat = tapply(env$Latitude, INDEX = as.factor(env$GeoGroup), mean)
)
# sorting the coordinates to match the genetic distance matrix
centroid <- centroid[match(colnames(as.matrix(Dps)) ,rownames(centroid)),]
write.table(centroid, file = "./output/Geocoord_pop.csv", sep = ",",
col.names = TRUE, row.names = TRUE)
To calculate resistance distances, we considered three landscape features:
To produce a resistance surface of geography, we take the elevation data (.tif
) from the SRTM database (https://srtm.csi.cgiar.org/).
ISR_SRTM_GeoElevation.tif
Slopes were calculate based on the elevation data by using terrain
function in the raster
.
The data of water body is from Global Surface Water(Pekel et al. 2016; doi:10.1038/nature20584).
water <- raster("./data/occurrence_30E_40N_v1_1.tif")
water <- crop(x = water, y = elev@extent) # match raster
writeRaster(water, filename = "./output/water_body_SLevant.tif")
ResistanceGA
To evaluate geographical resistance, we would need Circuitscape(McRae et al.) to identify paths with the least resistance and calculate corresponding resistance distances. The Circuitscape
package in Julia (https://docs.circuitscape.org/Circuitscape.jl/latest/) was published recently and it is used by ResistanceGA
. Considering the considerable computational resources may required, we won't run ResistanceGA
on a personal computer.
## IMPORTANT NOTE:
# Because our server has only C++11, some errors happened when I tried to install "ResistanceGA". Specifically, the latest "ResistanceGA" (ver 4.2) requires the latest "GA" package (>= 3.2.1) that depends on the latest "Rcpp" requiring C++14. However, "Rcpp" and other R packages installed on our server are all depend on C++11. To avoid messing up everything, a less risky solution is to force "ResistanceGA" to accept an old version of "GA" (ver 3.2; published in 2019 Jan).
# This approach should not influence the function of "ResistanceGA" because it has been published in 2018 and we let it depend on "GA" published in 2019.
# To do so, I download .zip from the github of "ResistanceGA" and modify the `DESCRIPTION` file. Then, I build the R package locally with the command below:
#devtools::build(path = "~/barley/temp_ResistGA/", pkg = "~/barley/temp_ResistGA/ResistanceGA", vignettes = FALSE) -> make a local package that has a requirement of GA with only >=3.0.2
# After building the package, I upload `ResistanceGA_4.2.tar.gz` to our server and install it. You can find the command used to install the package below
## Install packages on our server
install.packages("devtools", lib="/work/workspaces/chewei/bin/R/R-3.5.2/library/")
https://cran.r-project.org/src/contrib/Archive/spdep/spdep_1.0-2.tar.gz
install.packages("https://cran.r-project.org/src/contrib/Archive/terra/terra_0.5-2.tar.gz", lib="/work/workspaces/chewei/bin/R/R-3.5.2/library/")
install.packages("https://cran.r-project.org/src/contrib/Archive/raster/raster_2.7-15.tar.gz", lib="/work/workspaces/chewei/bin/R/R-3.5.2/library/")
install.packages("https://cran.r-project.org/src/contrib/Archive/GA/GA_3.2.tar.gz", lib="/work/workspaces/chewei/bin/R/R-3.5.2/library/") # install an old version -> we don't have CXX14 and I don't want to bother to install local g++...
install.packages("https://cran.r-project.org/src/contrib/Archive/nloptr/nloptr_1.0.4.tar.gz", lib="/work/workspaces/chewei/bin/R/R-3.5.2/library/")
install.packages("https://cran.r-project.org/src/contrib/Archive/lme4/lme4_1.1-14.tar.gz", lib="/work/workspaces/chewei/bin/R/R-3.5.2/library/")
install.packages("https://cran.r-project.org/src/contrib/Archive/htmltools/htmltools_0.5.2.tar.gz", lib="/work/workspaces/chewei/bin/R/R-3.5.2/library/")
install.packages("sass", lib="/work/workspaces/chewei/bin/R/R-3.5.2/library/")
install.packages("https://cran.r-project.org/src/contrib/Archive/rgdal/rgdal_1.2-8.tar.gz",
lib="/work/workspaces/chewei/bin/R/R-3.5.2/library/")
install.packages("https://cran.r-project.org/src/contrib/Archive/JuliaCall/JuliaCall_0.17.1.tar.gz",repos = NULL,
lib="/work/workspaces/chewei/bin/R/R-3.5.2/library/")
.libPaths( c( "/work/workspaces/chewei/bin/R/R-3.5.2/library/" , .libPaths() ) )
library(withr)
library(devtools)
## install ResistanceGA that accepts a lower version of GA
install.packages("ResistanceGA_4.2.tar.gz", repos = NULL, type="source",
lib="/work/workspaces/chewei/bin/R/R-3.5.2/library/")
ResistanceGA
is able to take multiple surfaces to form a composite resistance surface. We would take four geographic data (elevation, slope, and water body) as inputs to generate such a composite resistance surface and calculate resistance distances of geography accordingly. The codes below are all executed on our server.
Because the ResistanceGA
with Circuitscape
would run into error with unknown reasons on our server, we decided to use the communte distance method, which is an alternative method provided in ResistanceGA
package.
raster
objects by using aggregate
.all_comb
function was used to test all possible combination of given landscape features (elevation, slope and water body)Resist.boot
was used to perform bootstrap analyses.# set working directory
setwd("/work/workspaces/chewei/Formal_analysis_B1K+IL+IPK/ResistGA/")
# path to local R library
.libPaths( c( "/work/workspaces/chewei/bin/R/R-3.5.2/library/" , .libPaths() ) )
# load packages
library(ResistanceGA)
library(raster)
gen.dist <- readRDS("Dps_wild_barley.RDS")
geo.coord <- read.csv("Geocoord_pop.csv", header = TRUE)
elev <- raster("ISR_SRTM_GeoElevation.tif")
slope <- raster("slope_SLevant.tif")
water <- raster("water_body_SLevant.tif")
# the extent of water body tif is slightly smaller than others, so we use `crop` to make all tif objects have the same extent
elev <- crop(x = elev, y = extent(water))
slope <- crop(x = slope, y = extent(water))
# because the resolution of water body tif is different from others', we use `projectRaster` to reduce its resolution.
water <- projectRaster(from = water, to = elev)
values(water)[values(water) > 0] <- 255 # force all water body become a single category
# to reduce the required compuational resources, we use `aggregate` to reduce the resolution of tif
f = 20
elev <- aggregate(elev, fact = f, fun = mean)
slope <- aggregate(slope, fact = f, fun = mean)
water <- aggregate(water, fact = f, fun = mean)
values(water)[values(water) > 255/2] <- 255
values(water)[values(water) <= 255/2] <- 0
values(water)[is.na(values(water))] <- NA
# combine three raster into one
r.stack <- stack(elev, slope, water)
dir.create("./all.comb") # create a folder to save outputs
GA.input <- GA.prep(ASCII.dir = r.stack,
Results.dir = "all.comb",
parallel = 48,
select.trans = list(8, # Inverse Ricker transformation
1, # Inverse-Reverse Monomolecular transformation
NA) # no transformation for categorical data
)
gdist.input <- gdist.prep(n.Pops = nrow(geo.coord),
response = as.vector(gen.dist),
samples = SpatialPoints(geo.coord)
)
MS.output <- all_comb(gdist.inputs = gdist.input,
GA.inputs = GA.input,
results.dir = "/work/workspaces/chewei/Formal_analysis_B1K+IL+IPK/ResistGA/all.comb/")
saveRDS(MS.output, "ResistGA_gdist_wildbarley_geo_output.RDS")
save(GA.input, gdist.input,file = "./ResistGA_wildbarley_geo_input.Rdata")
MS.output <-readRDS("ResistGA_gdist_wildbarley_geo_output.RDS")
resist.mat <- MS.output$all.cd
k <- MS.output$all.k$k
resp <- as.matrix(gen.dist)
colnames(resp) <- rownames(resp) <- NULL
library(plyr) # required for bootstrap function
# bootstrapping with model ranking based on R^2
R2.boot <- Resist.boot(mod.names = names(resist.mat),
dist.mat = resist.mat,
n.parameters = k,
sample.prop = 0.75,
iters = 1000,
rank.method = "R2",
obs = nrow(resp),
genetic.mat = resp)
write.csv(R2.boot, file = "ResistGA_gdist_wild_barley_geo_bootstrap_output.csv",
row.names = F)
quit("no")
Here are some notes regarding the errors we confronted when running ResistanceGA
. We hope the information below would be helpful for someone who is also interested in using ResistanceGA
.
libstdc++.so.6
When running ResistanceGA
with Circuitscape in Julia, an error happened:
/lib64/libstdc++.so.6: version
GLIBCXX_3.4.21' not found`
This error is likely due to that the functions of JuliaCall
cannot find the needed libstdc++ version. A solution is to export R_LD_LIBRARY_PATH
in bash
(see https://github.com/Non-Contradiction/JuliaCall). For example:
export R_LD_LIBRARY_PATH=/work/workspaces/chewei/julia/julia-1.6.3/lib/julia
If you install the Circuitscape.jl
of version <5.8.3, you may run into an error:
MethodError: no method matching connected_components(::SimpleWeightedGraphs.SimpleWeightedGraph{Int32,Float64})
This is due to a bug in Circuitscape.jl
generating when it moved from LightGraphs
to Graphs
. There are some Julia codes do not adapt to this change in the version <5.8.3.
To solve this issue, you need to install the latest version. It can be achieve by sepcifying the package version, like:
Pkg.add(Pkg.PackageSpec(;name="Circuitscape", version="5.9.1"))
and don't forget to test the package to ensure there is no error (run Pkg.test("Circuitscape")
).
Apart from this, you may get errors if there is any required Julia
package is not installed or not with an appropriate version. In this case, you need to respectively look at the error messages and install each package with Pkg.add()
in Julia
(don't forget to test packages with Pkg.test
after installation).
When I ran my analysis, there was an error always happen whenever I tried to conduct parallel compuation:
Error in unserialize(socklist[[n]]) : error reading from connection
I used the example data from the ResistanceGA
package to do some tests. It seems this error only happen if I try to run ResistanceGA
with Circuitscape.jl
in parallel. Therefore, I switch to using communte distance, which is another approach available in ResistanceGA
.
When performing parallel computation, users may confront error messages as below:
Error in unserialize(socklist[[n]]) : error reading from connection Calls: all_comb ... recvOneData -> recvOneData.SOCKcluster -> unserialize
This error could result from the shortage of memory if input data (raster
) have high resolution. If one want to carry out parallel computation, decreasing the resolution of raster
is needed. Therefore, we used aggregate
function of the raster
package.
ResistanceGA
The final output of ResistanceGA
framework includes a summary table that displays statistics to evaluate model fit, such as AIC or R2.
I used all_comb
function to let ResistanceGA
carry out optimization for all combinaiton of input landscape features. To further compare the model fit, Resist.boot
function was used to conduct bootstrapping with 1,000 iterations. I considered AIC and R2 as the criteria for ranking models.
In the tables, \(n\) is the iteration number that a model is suggested to have the best fit. We can find that a model with the resistance surface generated by combining elevation and slope is the best if considering R2 as a selection criterion. If AIC is used as a selection criterion, a model taking water bodies to calculate resistance distances is suggested to be the best model.
In the tables below, the leftmost column shows resistance surfaces used in linear mixed effect models. "avg.AIC" is the average Akaike information criterion. "avg.\(R_m^2\)" is the average marginal \(R^2\). "avg.LL" is the average log-likelihood. "n" and "Percent.top" is the number and percentage of iterations that a model is selected as the best-supported model. "k" is the number of parameters fitted in a model.
surface | avg.AIC | avg.R2m | avg.LL | n | Percent.top | k |
---|---|---|---|---|---|---|
Elev.Slope | -5725.286 | 0.5083 | 2869.643 | 1000 | 100 | 7 |
Elev.Slope.Water | -5734.975 | 0.4414 | 2876.487 | 0 | 0 | 9 |
Slope.Water | -5736.288 | 0.3311 | 2874.144 | 0 | 0 | 6 |
Elev | -5741.162 | 0.3159 | 2874.581 | 0 | 0 | 4 |
Water | -5783.527 | 0.3042 | 2894.764 | 0 | 0 | 3 |
Elev.Water | -5739.697 | 0.3010 | 2875.849 | 0 | 0 | 6 |
Distance | -5753.420 | 0.2113 | 2878.710 | 0 | 0 | 2 |
Slope | -5730.958 | 0.2074 | 2869.479 | 0 | 0 | 4 |
surface | avg.AIC | avg.R2m | avg.LL | n | Percent.top | k |
---|---|---|---|---|---|---|
Water | -5783.397 | 0.3055 | 2894.698 | 968 | 96.8 | 3 |
Distance | -5752.230 | 0.2115 | 2878.115 | 9 | 0.9 | 2 |
Elev | -5739.792 | 0.3169 | 2873.896 | 7 | 0.7 | 4 |
Elev.Water | -5738.073 | 0.3014 | 2875.036 | 12 | 1.2 | 6 |
Elev.Slope.Water | -5732.737 | 0.4412 | 2875.369 | 0 | 0.0 | 9 |
Slope.Water | -5733.853 | 0.3317 | 2872.926 | 4 | 0.4 | 6 |
Slope | -5729.806 | 0.2077 | 2868.903 | 0 | 0.0 | 4 |
Elev.Slope | -5723.318 | 0.5080 | 2868.659 | 0 | 0.0 | 7 |
sessionInfo()
R version 3.6.0 (2019-04-26)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 16.04.6 LTS
Matrix products: default
BLAS: /usr/lib/libblas/libblas.so.3.6.0
LAPACK: /usr/lib/lapack/liblapack.so.3.6.0
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] knitr_1.36 workflowr_1.6.2
loaded via a namespace (and not attached):
[1] Rcpp_1.0.7 whisker_0.4 magrittr_2.0.1 R6_2.5.1
[5] rlang_0.4.12 fastmap_1.1.0 fansi_0.5.0 highr_0.9
[9] stringr_1.4.0 tools_3.6.0 xfun_0.27 utf8_1.2.2
[13] git2r_0.26.1 htmltools_0.5.2 ellipsis_0.3.2 rprojroot_1.3-2
[17] yaml_2.2.1 digest_0.6.28 tibble_3.1.5 lifecycle_1.0.1
[21] crayon_1.4.1 later_1.3.0 vctrs_0.3.8 promises_1.2.0.1
[25] fs_1.5.0 glue_1.4.2 evaluate_0.14 rmarkdown_2.3
[29] stringi_1.7.5 compiler_3.6.0 pillar_1.6.4 backports_1.1.5
[33] httpuv_1.6.3 pkgconfig_2.0.3