Last updated: 2025-10-06
Checks: 7 0
Knit directory: DXR_continue/
This reproducible R Markdown analysis was created with workflowr (version 1.7.1). The Checks tab describes the reproducibility checks that were applied when the results were created. The Past versions tab lists the development history.
Great! Since the R Markdown file has been committed to the Git repository, you know the exact version of the code that produced these results.
Great job! The global environment was empty. Objects defined in the global environment can affect the analysis in your R Markdown file in unknown ways. For reproduciblity it’s best to always run the code in an empty environment.
The command set.seed(20250701) was run prior to running
the code in the R Markdown file. Setting a seed ensures that any results
that rely on randomness, e.g. subsampling or permutations, are
reproducible.
Great job! Recording the operating system, R version, and package versions is critical for reproducibility.
Nice! There were no cached chunks for this analysis, so you can be confident that you successfully produced the results during this run.
Great job! Using relative paths to the files within your workflowr project makes it easier to run your code on other machines.
Great! You are using Git for version control. Tracking code development and connecting the code version to the results is critical for reproducibility.
The results in this page were generated with repository version 2b33697. See the Past versions tab to see a history of the changes made to the R Markdown and HTML files.
Note that you need to be careful to ensure that all relevant files for
the analysis have been committed to Git prior to generating the results
(you can use wflow_publish or
wflow_git_commit). workflowr only checks the R Markdown
file, but you know if there are other scripts or data files that it
depends on. Below is the status of the Git repository when the results
were generated:
Ignored files:
Ignored: .Rhistory
Ignored: .Rproj.user/
Ignored: data/Cormotif_data/
Ignored: data/DER_data/
Ignored: data/Other_paper_data/
Ignored: data/TE_annotation/
Ignored: data/alignment_summary.txt
Ignored: data/all_peak_final_dataframe.txt
Ignored: data/cell_line_info_.tsv
Ignored: data/full_summary_QC_metrics.txt
Ignored: data/motif_lists/
Ignored: data/number_frag_peaks_summary.txt
Untracked files:
Untracked: analysis/Top2a_Top2b_expression.Rmd
Untracked: analysis/chromHMM.Rmd
Untracked: analysis/maps_and_plots.Rmd
Untracked: analysis/proteomics.Rmd
Untracked: analysis/summit_files_processing.Rmd
Untracked: code/making_analysis_file_summary.R
Unstaged changes:
Modified: analysis/Outlier_removal.Rmd
Modified: analysis/final_analysis.Rmd
Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.
These are the previous versions of the repository in which changes were
made to the R Markdown (analysis/repeatmasker_data.Rmd) and
HTML (docs/repeatmasker_data.html) files. If you’ve
configured a remote Git repository (see ?wflow_git_remote),
click on the hyperlinks in the table below to view the files as they
were in that past version.
| File | Version | Author | Date | Message |
|---|---|---|---|---|
| Rmd | 2b33697 | reneeisnowhere | 2025-10-06 | adding in millidiv/1e-9 |
| html | f5c590c | reneeisnowhere | 2025-10-06 | Build site. |
| Rmd | 95365f1 | reneeisnowhere | 2025-10-06 | adding in millidiv/2.2e-9 |
| html | 1f6e044 | reneeisnowhere | 2025-10-03 | Build site. |
| Rmd | d48317b | reneeisnowhere | 2025-10-03 | adding in repeatmasker |
| Rmd | 9d77dac | reneeisnowhere | 2025-09-24 | end of day |
| html | 1b0e001 | reneeisnowhere | 2025-09-23 | Build site. |
| Rmd | 2607095 | reneeisnowhere | 2025-09-23 | changing to parts per thousand =milliDiv |
| html | abd595b | reneeisnowhere | 2025-09-23 | Build site. |
| Rmd | c5fd708 | reneeisnowhere | 2025-09-23 | wflow_publish("analysis/repeatmasker_data.Rmd") |
| html | cf939c4 | reneeisnowhere | 2025-09-23 | Build site. |
| Rmd | 6b65c26 | reneeisnowhere | 2025-09-23 | wflow_publish("analysis/repeatmasker_data.Rmd") |
| html | fe42ebc | reneeisnowhere | 2025-09-16 | Build site. |
| Rmd | 6e58d85 | reneeisnowhere | 2025-09-16 | first setup |
| html | 967b7a8 | reneeisnowhere | 2025-09-16 | Build site. |
| Rmd | 48f3b75 | reneeisnowhere | 2025-09-16 | first setup |
| Rmd | 06d2af1 | reneeisnowhere | 2025-09-16 | wflow_git_commit("analysis/repeatmasker_data.Rmd") |
libraries
library(tidyverse)
library(DT)
library(genomation)
library(GenomicRanges)
library(BiocParallel)
library(TxDb.Hsapiens.UCSC.hg38.knownGene)
library(ChIPseeker)
library(plyranges)
library(ggsignif)
library(ggpubr)
Loading data files
repeatmasker <- read_delim("data/Other_paper_data/repeatmasker_20250911.txt",
delim = "\t", escape_double = FALSE,
trim_ws = TRUE)
peakAnnoList_H3K27ac <- readRDS("data/motif_lists/H3K27ac_annotated_peaks.RDS")
peakAnnoList_H3K27me3 <- readRDS("data/motif_lists/H3K27me3_annotated_peaks.RDS")
peakAnnoList_H3K36me3 <- readRDS("data/motif_lists/H3K36me3_annotated_peaks.RDS")
peakAnnoList_H3K9me3 <- readRDS("data/motif_lists/H3K9me3_annotated_peaks.RDS")
rmskr_std_gr <- repeatmasker %>%
makeGRangesFromDataFrame(., seqnames="genoName", keep.extra.columns=TRUE,
start.field="genoStart",
end.field="genoEnd",
starts.in.df.are.0based=TRUE) %>%
keepStandardChromosomes(., pruning.mode = "coarse")
##checking chromosomes
#seqlevels(rmskr_std_gr)
###making granges lists out of motifs
H3K27ac_sets_gr <- lapply(peakAnnoList_H3K27ac, function(df) {
as_granges(df)
})
H3K27me3_sets_gr <- lapply(peakAnnoList_H3K27me3, function(df) {
as_granges(df)
})
H3K36me3_sets_gr <- lapply(peakAnnoList_H3K36me3, function(df) {
as_granges(df)
})
H3K9me3_sets_gr <- lapply(peakAnnoList_H3K9me3, function(df) {
as_granges(df)
})
## this is now taking the overlaps and widths of the TE and roi, then calculating percentage overlaps for each (pct_te_overlap and pct_roi_overlap)
# H3K27ac_ols <- lapply(H3K27ac_sets_gr, function(df) {
# overlaps <- join_overlap_inner(df, rmskr_std_gr)
# # overlap base pairs
# })
H3K27ac_ols <- lapply(H3K27ac_sets_gr, function(df) {
# 1️⃣ find overlaps
hits <- findOverlaps(df, rmskr_std_gr)
# 2️⃣ intersect the overlapping ranges
overlaps_gr <- pintersect(df[queryHits(hits)], rmskr_std_gr[subjectHits(hits)])
# 3️⃣ convert to data frame
overlaps_df <- as.data.frame(overlaps_gr)
# 4️⃣ add overlap length
overlaps_df$overlap_bp <- width(overlaps_gr)
# 5️⃣ ROI lengths & % coverage
roi_widths <- width(df)
names(roi_widths) <- df$Peakid
overlaps_df$roi_len <- roi_widths[as.character(overlaps_df$Peakid)]
overlaps_df$pct_roi_overlap <- (overlaps_df$overlap_bp / overlaps_df$roi_len) * 100
# 6️⃣ TE lengths & % coverage
te_widths <- width(rmskr_std_gr)
overlaps_df$te_len <- te_widths[subjectHits(hits)]
overlaps_df$pct_te_overlap <- (overlaps_df$overlap_bp / overlaps_df$te_len) * 100
# 7️⃣ Add TE metadata from rmsk GRanges
overlaps_df$repName <- rmskr_std_gr$repName[subjectHits(hits)]
overlaps_df$repClass <- rmskr_std_gr$repClass[subjectHits(hits)]
overlaps_df$repFamily <- rmskr_std_gr$repFamily[subjectHits(hits)]
overlaps_df$milliDiv <- rmskr_std_gr$milliDiv[subjectHits(hits)]
overlaps_df$milliDel <- rmskr_std_gr$milliDel[subjectHits(hits)]
overlaps_df$milliIns <- rmskr_std_gr$milliIns[subjectHits(hits)]
overlaps_df
})
rmskr_nonoverlapping_H3K27ac_gr <- rmskr_std_gr[countOverlaps(rmskr_std_gr, H3K27ac_sets_gr[[4]]) == 0]
H3K27ac_ols$Set_1 %>%
as.data.frame() %>%
group_by(Peakid,repClass) %>%
summarise() %>%
group_by(Peakid) %>% tally %>%
group_by(n) %>% tally
# A tibble: 8 × 2
n nn
<int> <int>
1 1 40854
2 2 30144
3 3 12604
4 4 4120
5 5 1127
6 6 171
7 7 5
8 8 1
H3K27ac_ols$Set_1 %>%
as.data.frame() %>%
group_by(Peakid) %>%
summarize(.,
repClass=paste0(unique(repClass), collapse=";"))
# A tibble: 89,026 × 2
Peakid repClass
<chr> <chr>
1 chr10:100005111-100005718 LINE;SINE
2 chr10:100006009-100007728 SINE;LINE
3 chr10:100008216-100011014 SINE;Simple_repeat;Unknown;LINE
4 chr10:100043350-100044800 SINE;DNA;Low_complexity
5 chr10:100046129-100046853 LTR;SINE
6 chr10:100184879-100187424 DNA
7 chr10:100199091-100199540 SINE
8 chr10:10020802-10021452 SINE;LTR
9 chr10:100208189-100209822 SINE;LTR
10 chr10:100226355-100227311 LINE
# ℹ 89,016 more rows
H3K27ac_ols$Set_2 %>%
as.data.frame() %>%
group_by(Peakid,repClass) %>%
summarise() %>%
group_by(Peakid) %>% tally %>%
group_by(n) %>% tally
# A tibble: 6 × 2
n nn
<int> <int>
1 1 513
2 2 411
3 3 195
4 4 38
5 5 8
6 6 1
H3K27ac_ols$Set_2 %>%
as.data.frame() %>%
group_by(Peakid) %>%
summarize(.,
repClass=paste0(unique(repClass), collapse=";"))
# A tibble: 1,166 × 2
Peakid repClass
<chr> <chr>
1 chr10:103196972-103198089 LINE;SINE;Low_complexity
2 chr10:103898324-103899678 LINE;SINE
3 chr10:104775314-104776084 SINE;LINE
4 chr10:105604321-105604967 SINE;LINE;LTR;Simple_repeat
5 chr10:11058814-11060583 SINE;DNA
6 chr10:110845092-110847119 SINE;LINE;Simple_repeat
7 chr10:110865305-110866546 SINE
8 chr10:110946871-110947695 LINE
9 chr10:113326142-113327412 SINE
10 chr10:113524679-113525161 SINE
# ℹ 1,156 more rows
H3K27ac_ols$Set_3 %>%
as.data.frame() %>%
group_by(Peakid,repClass) %>%
summarise() %>%
group_by(Peakid) %>% tally %>%
group_by(n) %>% tally
# A tibble: 6 × 2
n nn
<int> <int>
1 1 2303
2 2 2054
3 3 974
4 4 317
5 5 67
6 6 9
H3K27ac_ols$Set_3 %>%
as.data.frame() %>%
group_by(Peakid) %>%
summarize(.,
repClass=paste0(unique(repClass), collapse=";"))
# A tibble: 5,724 × 2
Peakid repClass
<chr> <chr>
1 chr10:100738942-100739973 Simple_repeat
2 chr10:100790295-100791877 Low_complexity;Simple_repeat
3 chr10:101132954-101134846 Simple_repeat
4 chr10:101157763-101159378 RC;LTR
5 chr10:101227112-101229171 Low_complexity;Simple_repeat
6 chr10:101230211-101230842 Simple_repeat
7 chr10:101557797-101558624 SINE
8 chr10:101995380-101996279 LINE
9 chr10:102143712-102144701 LINE
10 chr10:102158807-102161487 SINE;Simple_repeat;LTR
# ℹ 5,714 more rows
H3K27ac_ols$all_H3K27ac %>%
as.data.frame() %>%
group_by(Peakid,repClass) %>%
summarise() %>%
group_by(Peakid) %>% tally %>%
group_by(n) %>% tally
# A tibble: 8 × 2
n nn
<int> <int>
1 1 55458
2 2 41836
3 3 17564
4 4 5519
5 5 1416
6 6 203
7 7 7
8 8 1
H3K27ac_ols$all_H3K27ac %>%
as.data.frame() %>%
group_by(Peakid) %>%
summarize(.,
repClass=paste0(unique(repClass), collapse=";"))
# A tibble: 122,004 × 2
Peakid repClass
<chr> <chr>
1 chr10:100005111-100005718 LINE;SINE
2 chr10:100006009-100007728 SINE;LINE
3 chr10:100008216-100011014 SINE;Simple_repeat;Unknown;LINE
4 chr10:100043350-100044800 SINE;DNA;Low_complexity
5 chr10:100046129-100046853 LTR;SINE
6 chr10:100179689-100180264 SINE
7 chr10:100184879-100187424 DNA
8 chr10:100199091-100199540 SINE
9 chr10:10020802-10021452 SINE;LTR
10 chr10:100208189-100209822 SINE;LTR
# ℹ 121,994 more rows
length(H3K27ac_sets_gr$all_H3K27ac)
[1] 150071
H3K27me3_ols <- lapply(H3K27me3_sets_gr, function(df) {
# 1️⃣ find overlaps
hits <- findOverlaps(df, rmskr_std_gr)
# 2️⃣ intersect the overlapping ranges
overlaps_gr <- pintersect(df[queryHits(hits)], rmskr_std_gr[subjectHits(hits)])
# 3️⃣ convert to data frame
overlaps_df <- as.data.frame(overlaps_gr)
# 4️⃣ add overlap length
overlaps_df$overlap_bp <- width(overlaps_gr)
# 5️⃣ ROI lengths & % coverage
roi_widths <- width(df)
names(roi_widths) <- df$Peakid
overlaps_df$roi_len <- roi_widths[as.character(overlaps_df$Peakid)]
overlaps_df$pct_roi_overlap <- (overlaps_df$overlap_bp / overlaps_df$roi_len) * 100
# 6️⃣ TE lengths & % coverage
te_widths <- width(rmskr_std_gr)
overlaps_df$te_len <- te_widths[subjectHits(hits)]
overlaps_df$pct_te_overlap <- (overlaps_df$overlap_bp / overlaps_df$te_len) * 100
# 7️⃣ Add TE metadata from rmsk GRanges
overlaps_df$repName <- rmskr_std_gr$repName[subjectHits(hits)]
overlaps_df$repClass <- rmskr_std_gr$repClass[subjectHits(hits)]
overlaps_df$repFamily <- rmskr_std_gr$repFamily[subjectHits(hits)]
overlaps_df$milliDiv <- rmskr_std_gr$milliDiv[subjectHits(hits)]
overlaps_df$milliDel <- rmskr_std_gr$milliDel[subjectHits(hits)]
overlaps_df$milliIns <- rmskr_std_gr$milliIns[subjectHits(hits)]
overlaps_df
})
rmskr_nonoverlapping_H3K27me3_gr <- rmskr_std_gr[countOverlaps(rmskr_std_gr, H3K27me3_sets_gr[[3]]) == 0]
H3K27me3_ols$Set_1 %>%
as.data.frame() %>%
group_by(Peakid,repClass) %>%
summarise() %>%
group_by(Peakid) %>% tally %>%
group_by(n) %>% tally
# A tibble: 8 × 2
n nn
<int> <int>
1 1 59299
2 2 34090
3 3 12830
4 4 5071
5 5 1774
6 6 379
7 7 35
8 8 1
H3K27me3_ols$Set_1 %>%
as.data.frame() %>%
group_by(Peakid) %>%
summarize(.,
repClass=paste0(unique(repClass), collapse=";"))
# A tibble: 113,479 × 2
Peakid repClass
<chr> <chr>
1 chr10:100046102-100046460 LTR;SINE
2 chr10:100054231-100054520 SINE
3 chr10:100078858-100079126 LINE
4 chr10:100084784-100089581 SINE;LINE;LTR;Low_complexity
5 chr10:100113114-100113630 SINE
6 chr10:100113867-100114131 SINE
7 chr10:100114436-100115458 LINE
8 chr10:100127919-100128315 LTR;LINE
9 chr10:100129022-100129787 LINE;SINE
10 chr10:100229003-100229494 Simple_repeat
# ℹ 113,469 more rows
H3K27me3_ols$Set_2 %>%
as.data.frame() %>%
group_by(Peakid,repClass) %>%
summarise() %>%
group_by(Peakid) %>% tally %>%
group_by(n) %>% tally
# A tibble: 3 × 2
n nn
<int> <int>
1 1 104
2 2 41
3 3 8
H3K27me3_ols$Set_2 %>%
as.data.frame() %>%
group_by(Peakid) %>%
summarize(.,
repClass=paste0(unique(repClass), collapse=";"))
# A tibble: 153 × 2
Peakid repClass
<chr> <chr>
1 chr10:104424708-104425325 LINE;SINE
2 chr10:113524664-113525075 SINE
3 chr10:129076374-129076858 Simple_repeat
4 chr10:130134682-130135251 LINE
5 chr10:132175296-132177004 LINE;Simple_repeat;tRNA
6 chr10:133324189-133324746 Simple_repeat
7 chr10:1398800-1399241 LINE
8 chr10:27256098-27256641 SINE;LTR
9 chr10:29968825-29969238 Simple_repeat
10 chr10:30348778-30349159 Simple_repeat
# ℹ 143 more rows
H3K27me3_ols$all_H3K27me3 %>%
as.data.frame() %>%
group_by(Peakid,repClass) %>%
summarise() %>%
group_by(Peakid) %>% tally %>%
group_by(n) %>% tally
# A tibble: 8 × 2
n nn
<int> <int>
1 1 59987
2 2 34344
3 3 12893
4 4 5078
5 5 1775
6 6 379
7 7 35
8 8 1
H3K27me3_ols$all_H3K27me3 %>%
as.data.frame() %>%
group_by(Peakid) %>%
summarize(.,
repClass=paste0(unique(repClass), collapse=";"))
# A tibble: 114,492 × 2
Peakid repClass
<chr> <chr>
1 chr10:100046102-100046460 LTR;SINE
2 chr10:100054231-100054520 SINE
3 chr10:100078858-100079126 LINE
4 chr10:100084784-100089581 SINE;LINE;LTR;Low_complexity
5 chr10:100113114-100113630 SINE
6 chr10:100113867-100114131 SINE
7 chr10:100114436-100115458 LINE
8 chr10:100127919-100128315 LTR;LINE
9 chr10:100129022-100129787 LINE;SINE
10 chr10:100229003-100229494 Simple_repeat
# ℹ 114,482 more rows
length(H3K27me3_sets_gr$all_H3K27me3)
[1] 150464
H3K36me3_ols <- lapply(H3K36me3_sets_gr, function(df) {
# 1️⃣ find overlaps
hits <- findOverlaps(df, rmskr_std_gr)
# 2️⃣ intersect the overlapping ranges
overlaps_gr <- pintersect(df[queryHits(hits)], rmskr_std_gr[subjectHits(hits)])
# 3️⃣ convert to data frame
overlaps_df <- as.data.frame(overlaps_gr)
# 4️⃣ add overlap length
overlaps_df$overlap_bp <- width(overlaps_gr)
# 5️⃣ ROI lengths & % coverage
roi_widths <- width(df)
names(roi_widths) <- df$Peakid
overlaps_df$roi_len <- roi_widths[as.character(overlaps_df$Peakid)]
overlaps_df$pct_roi_overlap <- (overlaps_df$overlap_bp / overlaps_df$roi_len) * 100
# 6️⃣ TE lengths & % coverage
te_widths <- width(rmskr_std_gr)
overlaps_df$te_len <- te_widths[subjectHits(hits)]
overlaps_df$pct_te_overlap <- (overlaps_df$overlap_bp / overlaps_df$te_len) * 100
# 7️⃣ Add TE metadata from rmsk GRanges
overlaps_df$repName <- rmskr_std_gr$repName[subjectHits(hits)]
overlaps_df$repClass <- rmskr_std_gr$repClass[subjectHits(hits)]
overlaps_df$repFamily <- rmskr_std_gr$repFamily[subjectHits(hits)]
overlaps_df$milliDiv <- rmskr_std_gr$milliDiv[subjectHits(hits)]
overlaps_df$milliDel <- rmskr_std_gr$milliDel[subjectHits(hits)]
overlaps_df$milliIns <- rmskr_std_gr$milliIns[subjectHits(hits)]
overlaps_df
})
rmskr_nonoverlapping_H3K36me3_gr <- rmskr_std_gr[countOverlaps(rmskr_std_gr, H3K36me3_sets_gr[[3]]) == 0]
H3K36me3_ols$Set_1 %>%
as.data.frame() %>%
group_by(Peakid,repClass) %>%
summarise() %>%
group_by(Peakid) %>% tally %>%
group_by(n) %>% tally
# A tibble: 7 × 2
n nn
<int> <int>
1 1 61918
2 2 24560
3 3 5127
4 4 1062
5 5 192
6 6 22
7 7 1
H3K36me3_ols$Set_1 %>%
as.data.frame() %>%
group_by(Peakid) %>%
summarize(.,
repClass=paste0(unique(repClass), collapse=";"))
# A tibble: 92,882 × 2
Peakid repClass
<chr> <chr>
1 chr10:100009339-100010339 SINE;Simple_repeat
2 chr10:100139015-100139306 LINE
3 chr10:100147443-100147738 LINE
4 chr10:100196474-100197370 SINE;DNA;LINE
5 chr10:100198988-100199626 SINE
6 chr10:100201468-100201937 DNA;SINE
7 chr10:100202789-100203153 SINE;LINE
8 chr10:100203832-100204198 LTR
9 chr10:100208051-100208777 SINE;LTR
10 chr10:100214861-100215206 LINE;SINE
# ℹ 92,872 more rows
H3K36me3_ols$Set_2 %>%
as.data.frame() %>%
group_by(Peakid,repClass) %>%
summarise() %>%
group_by(Peakid) %>% tally %>%
group_by(n) %>% tally
# A tibble: 6 × 2
n nn
<int> <int>
1 1 628
2 2 359
3 3 148
4 4 43
5 5 18
6 6 5
H3K36me3_ols$Set_2 %>%
as.data.frame() %>%
group_by(Peakid) %>%
summarize(.,
repClass=paste0(unique(repClass), collapse=";"))
# A tibble: 1,201 × 2
Peakid repClass
<chr> <chr>
1 chr10:110729433-110730934 LINE;SINE;Simple_repeat
2 chr10:110803615-110804488 SINE;LTR
3 chr10:114151594-114151957 DNA
4 chr10:117139090-117139567 Simple_repeat
5 chr10:117221071-117221394 DNA
6 chr10:118698372-118698876 SINE;LTR
7 chr10:122207358-122208228 DNA
8 chr10:123157127-123157564 LINE
9 chr10:124613974-124614749 SINE
10 chr10:124670751-124671297 LINE;LTR
# ℹ 1,191 more rows
H3K36me3_ols$all_H3K36me3 %>%
as.data.frame() %>%
group_by(Peakid,repClass) %>%
summarise() %>%
group_by(Peakid) %>% tally %>%
group_by(n) %>% tally
# A tibble: 7 × 2
n nn
<int> <int>
1 1 86790
2 2 34976
3 3 7463
4 4 1502
5 5 284
6 6 39
7 7 3
H3K36me3_ols$all_H3K36me3 %>%
as.data.frame() %>%
group_by(Peakid) %>%
summarize(.,
repClass=paste0(unique(repClass), collapse=";"))
# A tibble: 131,057 × 2
Peakid repClass
<chr> <chr>
1 chr10:100009339-100010339 SINE;Simple_repeat
2 chr10:100139015-100139306 LINE
3 chr10:100147443-100147738 LINE
4 chr10:100153009-100153887 SINE;LINE
5 chr10:100173135-100173409 DNA
6 chr10:100186943-100187317 DNA
7 chr10:100196474-100197370 SINE;DNA;LINE
8 chr10:100198988-100199626 SINE
9 chr10:100201468-100201937 DNA;SINE
10 chr10:100202789-100203153 SINE;LINE
# ℹ 131,047 more rows
length(H3K36me3_sets_gr$all_H3K36me3)
[1] 186724
H3K9me3_ols <- lapply(H3K9me3_sets_gr, function(df) {
# 1️⃣ find overlaps
hits <- findOverlaps(df, rmskr_std_gr)
# 2️⃣ intersect the overlapping ranges
overlaps_gr <- pintersect(df[queryHits(hits)], rmskr_std_gr[subjectHits(hits)])
# 3️⃣ convert to data frame
overlaps_df <- as.data.frame(overlaps_gr)
# 4️⃣ add overlap length
overlaps_df$overlap_bp <- width(overlaps_gr)
# 5️⃣ ROI lengths & % coverage
roi_widths <- width(df)
names(roi_widths) <- df$Peakid
overlaps_df$roi_len <- roi_widths[as.character(overlaps_df$Peakid)]
overlaps_df$pct_roi_overlap <- (overlaps_df$overlap_bp / overlaps_df$roi_len) * 100
# 6️⃣ TE lengths & % coverage
te_widths <- width(rmskr_std_gr)
overlaps_df$te_len <- te_widths[subjectHits(hits)]
overlaps_df$pct_te_overlap <- (overlaps_df$overlap_bp / overlaps_df$te_len) * 100
# 7️⃣ Add TE metadata from rmsk GRanges
overlaps_df$repName <- rmskr_std_gr$repName[subjectHits(hits)]
overlaps_df$repClass <- rmskr_std_gr$repClass[subjectHits(hits)]
overlaps_df$repFamily <- rmskr_std_gr$repFamily[subjectHits(hits)]
overlaps_df$milliDiv <- rmskr_std_gr$milliDiv[subjectHits(hits)]
overlaps_df$milliDel <- rmskr_std_gr$milliDel[subjectHits(hits)]
overlaps_df$milliIns <- rmskr_std_gr$milliIns[subjectHits(hits)]
overlaps_df
})
rmskr_nonoverlapping_H3K9me3_gr <- rmskr_std_gr[countOverlaps(rmskr_std_gr, H3K9me3_sets_gr[[4]]) == 0]
H3K9me3_ols$Set_1 %>%
as.data.frame() %>%
group_by(Peakid,repClass) %>%
summarise() %>%
group_by(Peakid) %>% tally %>%
group_by(n) %>% tally
# A tibble: 6 × 2
n nn
<int> <int>
1 1 107939
2 2 32414
3 3 4976
4 4 574
5 5 69
6 6 2
H3K9me3_ols$Set_1 %>%
as.data.frame() %>%
group_by(Peakid) %>%
summarize(.,
repClass=paste0(unique(repClass), collapse=";"))
# A tibble: 145,974 × 2
Peakid repClass
<chr> <chr>
1 chr10:100009524-100010017 Simple_repeat
2 chr10:100025478-100025829 LTR
3 chr10:100027743-100029126 LTR;SINE
4 chr10:100046119-100046433 LTR;SINE
5 chr10:100086477-100087363 LINE;LTR;SINE
6 chr10:100095279-100095524 LINE
7 chr10:100097281-100098436 SINE;LTR
8 chr10:100133860-100134292 LINE;Simple_repeat;LTR
9 chr10:10020906-10021393 SINE;LTR
10 chr10:100229039-100229275 Simple_repeat
# ℹ 145,964 more rows
H3K9me3_ols$Set_2 %>%
as.data.frame() %>%
group_by(Peakid,repClass) %>%
summarise() %>%
group_by(Peakid) %>% tally %>%
group_by(n) %>% tally
# A tibble: 4 × 2
n nn
<int> <int>
1 1 1362
2 2 266
3 3 26
4 4 2
H3K9me3_ols$Set_2 %>%
as.data.frame() %>%
group_by(Peakid) %>%
summarize(.,
repClass=paste0(unique(repClass), collapse=";"))
# A tibble: 1,656 × 2
Peakid repClass
<chr> <chr>
1 chr10:101006128-101006429 SINE
2 chr10:101777876-101778344 LINE
3 chr10:101951337-101951612 SINE
4 chr10:102418095-102418389 Simple_repeat
5 chr10:104329400-104329651 LINE
6 chr10:1101897-1102240 LINE
7 chr10:110227445-110227710 Low_complexity
8 chr10:114786610-114786850 Retroposon
9 chr10:117531528-117532237 SINE
10 chr10:11893983-11894609 SINE
# ℹ 1,646 more rows
H3K9me3_ols$Set_3 %>%
as.data.frame() %>%
group_by(Peakid,repClass) %>%
summarise() %>%
group_by(Peakid) %>% tally %>%
group_by(n) %>% tally
# A tibble: 5 × 2
n nn
<int> <int>
1 1 405
2 2 119
3 3 26
4 4 4
5 5 1
H3K9me3_ols$Set_3 %>%
as.data.frame() %>%
group_by(Peakid) %>%
summarize(.,
repClass=paste0(unique(repClass), collapse=";"))
# A tibble: 555 × 2
Peakid repClass
<chr> <chr>
1 chr10:114252483-114252701 SINE
2 chr10:115903919-115904555 LTR
3 chr10:124611053-124611641 SINE
4 chr10:1298991-1300057 SINE;LTR
5 chr10:1300214-1301077 LINE;SINE
6 chr10:132172535-132172962 LINE;Simple_repeat
7 chr10:132279104-132279514 SINE
8 chr10:132363329-132364326 Simple_repeat
9 chr10:132757314-132757720 DNA
10 chr10:24446759-24447078 SINE
# ℹ 545 more rows
H3K9me3_ols$all_H3K9me3 %>%
as.data.frame() %>%
group_by(Peakid,repClass) %>%
summarise() %>%
group_by(Peakid) %>% tally %>%
group_by(n) %>% tally
# A tibble: 6 × 2
n nn
<int> <int>
1 1 117258
2 2 34507
3 3 5210
4 4 591
5 5 70
6 6 2
H3K9me3_ols$all_H3K9me3 %>%
as.data.frame() %>%
group_by(Peakid) %>%
summarize(.,
repClass=paste0(unique(repClass), collapse=";"))
# A tibble: 157,638 × 2
Peakid repClass
<chr> <chr>
1 chr10:100006461-100006612 SINE;LINE
2 chr10:100009524-100010017 Simple_repeat
3 chr10:100025478-100025829 LTR
4 chr10:100027743-100029126 LTR;SINE
5 chr10:100046119-100046433 LTR;SINE
6 chr10:100086477-100087363 LINE;LTR;SINE
7 chr10:100095279-100095524 LINE
8 chr10:100097281-100098436 SINE;LTR
9 chr10:100133860-100134292 LINE;Simple_repeat;LTR
10 chr10:10020906-10021393 SINE;LTR
# ℹ 157,628 more rows
length(H3K9me3_sets_gr$all_H3K9me3)
[1] 218647
Below I am setting up the data frames, first I take all regions and annotate with the overlapping dataframe. Then I make new columns, TE_status, SINE_status, LINE_status, LTR_status, DNA_status, SVA_status. These columns mark Peakids if they have this attribute. Last, I am annotating a cluster column to add which set the Peakid exists in by histone. (field will be Set1, Set2, Set3, not_assigned)
All_H3K27ac <- H3K27ac_sets_gr$all_H3K27ac %>% as.data.frame()
All_H3K27me3 <- H3K27me3_sets_gr$all_H3K27me3 %>% as.data.frame()
All_H3K36me3 <- H3K36me3_sets_gr$all_H3K36me3 %>% as.data.frame()
All_H3K9me3 <- H3K9me3_sets_gr$all_H3K9me3 %>% as.data.frame()
H3K27ac_lookup <- imap_dfr(peakAnnoList_H3K27ac[1:3], ~
tibble(Peakid = .x@anno$Peakid, cluster = .y)
)
H3K27me3_lookup <- imap_dfr(peakAnnoList_H3K27me3[1:2], ~
tibble(Peakid = .x@anno$Peakid, cluster = .y)
)
H3K36me3_lookup <- imap_dfr(peakAnnoList_H3K36me3[1:2], ~
tibble(Peakid = .x@anno$Peakid, cluster = .y)
)
H3K9me3_lookup <- imap_dfr(peakAnnoList_H3K9me3[1:3], ~
tibble(Peakid = .x@anno$Peakid, cluster = .y)
)
annotated_H3K27ac <- All_H3K27ac %>%
left_join(., (H3K27ac_ols$all_H3K27ac %>%
as.data.frame() %>%
group_by(Peakid) %>%
summarize(.,
repClass=paste0(unique(repClass), collapse=";"),
repFamily=paste0(unique(repFamily),collapse=";"),
repName=paste0(unique(repName),collapse=";"))))%>%
mutate(TE_status = if_else(is.na(repClass), "wnot_TE","TE")) %>%
mutate(SINE_status = case_when(is.na(repClass)~ "wnot_SINE",
str_detect(repClass, "SINE") ~ "SINE",
TRUE ~"wnot_SINE")) %>%
mutate(LINE_status = case_when(is.na(repClass)~ "wnot_LINE",
str_detect(repClass, "LINE") ~ "LINE",
TRUE ~"wnot_LINE")) %>%
mutate(LTR_status = case_when(is.na(repClass)~ "wnot_LTR",
str_detect(repClass, "LTR") ~ "LTR",
TRUE ~"wnot_LTR")) %>%
mutate(DNA_status = case_when(is.na(repClass)~ "wnot_DNA",
str_detect(repClass, "DNA") ~ "DNA",
TRUE ~"wnot_DNA")) %>%
mutate(SVA_status = case_when(is.na(repClass)~ "wnot_SVA",
str_detect(repClass, "Retroposon") ~ "SVA",
TRUE ~"wnot_SVA")) %>%
left_join(H3K27ac_lookup, by = "Peakid") %>%
mutate(cluster = if_else(is.na(cluster), "not_assigned", cluster))
annotated_H3K27me3 <- All_H3K27me3 %>%
left_join(., (H3K27me3_ols$all_H3K27me3 %>%
as.data.frame() %>%
group_by(Peakid) %>%
summarize(.,
repClass=paste0(unique(repClass), collapse=";"),
repFamily=paste0(unique(repFamily),collapse=";"),
repName=paste0(unique(repName),collapse=";"))))%>%
mutate(TE_status = if_else(is.na(repClass), "wnot_TE","TE")) %>%
mutate(SINE_status = case_when(is.na(repClass)~ "wnot_SINE",
str_detect(repClass, "SINE") ~ "SINE",
TRUE ~"wnot_SINE")) %>%
mutate(LINE_status = case_when(is.na(repClass)~ "wnot_LINE",
str_detect(repClass, "LINE") ~ "LINE",
TRUE ~"wnot_LINE")) %>%
mutate(LTR_status = case_when(is.na(repClass)~ "wnot_LTR",
str_detect(repClass, "LTR") ~ "LTR",
TRUE ~"wnot_LTR")) %>%
mutate(DNA_status = case_when(is.na(repClass)~ "wnot_DNA",
str_detect(repClass, "DNA") ~ "DNA",
TRUE ~"wnot_DNA")) %>%
mutate(SVA_status = case_when(is.na(repClass)~ "wnot_SVA",
str_detect(repClass, "Retroposon") ~ "SVA",
TRUE ~"wnot_SVA")) %>%
left_join(H3K27me3_lookup, by = "Peakid") %>%
mutate(cluster = if_else(is.na(cluster), "not_assigned", cluster))
annotated_H3K36me3 <- All_H3K36me3 %>%
left_join(., (H3K36me3_ols$all_H3K36me3 %>%
as.data.frame() %>%
group_by(Peakid) %>%
summarize(.,
repClass=paste0(unique(repClass), collapse=";"),
repFamily=paste0(unique(repFamily),collapse=";"),
repName=paste0(unique(repName),collapse=";"))))%>%
mutate(TE_status = if_else(is.na(repClass), "wnot_TE","TE")) %>%
mutate(SINE_status = case_when(is.na(repClass)~ "wnot_SINE",
str_detect(repClass, "SINE") ~ "SINE",
TRUE ~"wnot_SINE")) %>%
mutate(LINE_status = case_when(is.na(repClass)~ "wnot_LINE",
str_detect(repClass, "LINE") ~ "LINE",
TRUE ~"wnot_LINE")) %>%
mutate(LTR_status = case_when(is.na(repClass)~ "wnot_LTR",
str_detect(repClass, "LTR") ~ "LTR",
TRUE ~"wnot_LTR")) %>%
mutate(DNA_status = case_when(is.na(repClass)~ "wnot_DNA",
str_detect(repClass, "DNA") ~ "DNA",
TRUE ~"wnot_DNA")) %>%
mutate(SVA_status = case_when(is.na(repClass)~ "wnot_SVA",
str_detect(repClass, "Retroposon") ~ "SVA",
TRUE ~"wnot_SVA")) %>%
left_join(H3K36me3_lookup, by = "Peakid") %>%
mutate(cluster = if_else(is.na(cluster), "not_assigned", cluster))
annotated_H3K9me3 <- All_H3K9me3 %>%
left_join(., (H3K9me3_ols$all_H3K9me3 %>%
as.data.frame() %>%
group_by(Peakid) %>%
summarize(.,
repClass=paste0(unique(repClass), collapse=";"),
repFamily=paste0(unique(repFamily),collapse=";"),
repName=paste0(unique(repName),collapse=";"))))%>%
mutate(TE_status = if_else(is.na(repClass), "wnot_TE","TE")) %>%
mutate(SINE_status = case_when(is.na(repClass)~ "wnot_SINE",
str_detect(repClass, "SINE") ~ "SINE",
TRUE ~"wnot_SINE")) %>%
mutate(LINE_status = case_when(is.na(repClass)~ "wnot_LINE",
str_detect(repClass, "LINE") ~ "LINE",
TRUE ~"wnot_LINE")) %>%
mutate(LTR_status = case_when(is.na(repClass)~ "wnot_LTR",
str_detect(repClass, "LTR") ~ "LTR",
TRUE ~"wnot_LTR")) %>%
mutate(DNA_status = case_when(is.na(repClass)~ "wnot_DNA",
str_detect(repClass, "DNA") ~ "DNA",
TRUE ~"wnot_DNA")) %>%
mutate(SVA_status = case_when(is.na(repClass)~ "wnot_SVA",
str_detect(repClass, "Retroposon") ~ "SVA",
TRUE ~"wnot_SVA")) %>%
left_join(H3K9me3_lookup, by = "Peakid") %>%
mutate(cluster = if_else(is.na(cluster), "not_assigned", cluster))
# annotated_tables <- list("annotated_H3K27ac"=annotated_H3K27ac,
# "annotated_H3K27me3"=annotated_H3K27me3,
# "annotated_H3K36me3"=annotated_H3K36me3,
# "annotated_H3K9me3"=annotated_H3K9me3)
#
# saveRDS(annotated_tables,"data/TE_annotation/annotated_tables_1bp_ol.RDS")
H3K27ac_data <- data.frame("total"=length(All_H3K27ac$Peakid),"Set1"=length(H3K27ac_lookup$cluster[H3K27ac_lookup$cluster=="Set_1"]),"Set2"=length(H3K27ac_lookup$cluster[H3K27ac_lookup$cluster=="Set_2"]),"Set3"=length(H3K27ac_lookup$cluster[H3K27ac_lookup$cluster=="Set_3"]))
H3K9me3_data <- data.frame("total"=length(All_H3K9me3$Peakid),"Set1"=length(H3K9me3_lookup$cluster[H3K9me3_lookup$cluster=="Set_1"]),"Set2"=length(H3K9me3_lookup$cluster[H3K9me3_lookup$cluster=="Set_2"]),"Set3"=length(H3K9me3_lookup$cluster[H3K9me3_lookup$cluster=="Set_3"]))
H3K27me3_data <- data.frame("total"=length(All_H3K27me3$Peakid),"Set1"=length(H3K27me3_lookup$cluster[H3K27me3_lookup$cluster=="Set_1"]),"Set2"=length(H3K27me3_lookup$cluster[H3K27me3_lookup$cluster=="Set_2"]),"Set3"=length(H3K27me3_lookup$cluster[H3K27me3_lookup$cluster=="Set_3"]))
H3K36me3_data <- data.frame("total"=length(All_H3K36me3$Peakid),"Set1"=length(H3K36me3_lookup$cluster[H3K36me3_lookup$cluster=="Set_1"]),"Set2"=length(H3K36me3_lookup$cluster[H3K36me3_lookup$cluster=="Set_2"]),"Set3"=length(H3K36me3_lookup$cluster[H3K36me3_lookup$cluster=="Set_3"]))
Region_dataframe <- bind_rows(list(H3K27ac=H3K27ac_data, H3K27me3=H3K27me3_data, H3K36me3=H3K36me3_data, H3K9me3=H3K9me3_data), .id="histone")
datatable(Region_dataframe)
50% counting histones
annotated_H3K27ac_50_per <- All_H3K27ac %>%
left_join(., (H3K27ac_ols$all_H3K27ac %>%
as.data.frame() %>%
dplyr::filter(pct_te_overlap>50) %>%
group_by(Peakid) %>%
summarize(.,
repClass=paste0(unique(repClass[pct_te_overlap>50]), collapse=";"),
repFamily=paste0(unique(repFamily[pct_te_overlap>50]),collapse=";"),
repName=paste0(unique(repName[pct_te_overlap>50]),collapse=";"))))%>%
mutate(TE_status = if_else(is.na(repClass), "wnot_TE","TE")) %>%
mutate(SINE_status = case_when(is.na(repClass)~ "wnot_SINE",
str_detect(repClass, "SINE") ~ "SINE",
TRUE ~"wnot_SINE")) %>%
mutate(LINE_status = case_when(is.na(repClass)~ "wnot_LINE",
str_detect(repClass, "LINE") ~ "LINE",
TRUE ~"wnot_LINE")) %>%
mutate(LTR_status = case_when(is.na(repClass)~ "wnot_LTR",
str_detect(repClass, "LTR") ~ "LTR",
TRUE ~"wnot_LTR")) %>%
mutate(DNA_status = case_when(is.na(repClass)~ "wnot_DNA",
str_detect(repClass, "DNA") ~ "DNA",
TRUE ~"wnot_DNA")) %>%
mutate(SVA_status = case_when(is.na(repClass)~ "wnot_SVA",
str_detect(repClass, "Retroposon") ~ "SVA",
TRUE ~"wnot_SVA")) %>%
left_join(H3K27ac_lookup, by = "Peakid") %>%
mutate(cluster = if_else(is.na(cluster), "not_assigned", cluster))
annotated_H3K27me3_50_per <- All_H3K27me3 %>%
left_join(., (H3K27me3_ols$all_H3K27me3 %>%
as.data.frame() %>%
dplyr::filter(pct_te_overlap>50) %>%
group_by(Peakid) %>%
summarize(.,
repClass=paste0(unique(repClass[pct_te_overlap>50]), collapse=";"),
repFamily=paste0(unique(repFamily[pct_te_overlap>50]),collapse=";"),
repName=paste0(unique(repName[pct_te_overlap>50]),collapse=";"))))%>%
mutate(TE_status = if_else(is.na(repClass), "wnot_TE","TE")) %>%
mutate(SINE_status = case_when(is.na(repClass)~ "wnot_SINE",
str_detect(repClass, "SINE") ~ "SINE",
TRUE ~"wnot_SINE")) %>%
mutate(LINE_status = case_when(is.na(repClass)~ "wnot_LINE",
str_detect(repClass, "LINE") ~ "LINE",
TRUE ~"wnot_LINE")) %>%
mutate(LTR_status = case_when(is.na(repClass)~ "wnot_LTR",
str_detect(repClass, "LTR") ~ "LTR",
TRUE ~"wnot_LTR")) %>%
mutate(DNA_status = case_when(is.na(repClass)~ "wnot_DNA",
str_detect(repClass, "DNA") ~ "DNA",
TRUE ~"wnot_DNA")) %>%
mutate(SVA_status = case_when(is.na(repClass)~ "wnot_SVA",
str_detect(repClass, "Retroposon") ~ "SVA",
TRUE ~"wnot_SVA")) %>%
left_join(H3K27me3_lookup, by = "Peakid") %>%
mutate(cluster = if_else(is.na(cluster), "not_assigned", cluster))
annotated_H3K36me3_50_per <- All_H3K36me3 %>%
left_join(., (H3K36me3_ols$all_H3K36me3 %>%
as.data.frame() %>%
dplyr::filter(pct_te_overlap>50) %>%
group_by(Peakid) %>%
summarize(.,
repClass=paste0(unique(repClass[pct_te_overlap>50]), collapse=";"),
repFamily=paste0(unique(repFamily[pct_te_overlap>50]),collapse=";"),
repName=paste0(unique(repName[pct_te_overlap>50]),collapse=";"))))%>%
mutate(TE_status = if_else(is.na(repClass), "wnot_TE","TE")) %>%
mutate(SINE_status = case_when(is.na(repClass)~ "wnot_SINE",
str_detect(repClass, "SINE") ~ "SINE",
TRUE ~"wnot_SINE")) %>%
mutate(LINE_status = case_when(is.na(repClass)~ "wnot_LINE",
str_detect(repClass, "LINE") ~ "LINE",
TRUE ~"wnot_LINE")) %>%
mutate(LTR_status = case_when(is.na(repClass)~ "wnot_LTR",
str_detect(repClass, "LTR") ~ "LTR",
TRUE ~"wnot_LTR")) %>%
mutate(DNA_status = case_when(is.na(repClass)~ "wnot_DNA",
str_detect(repClass, "DNA") ~ "DNA",
TRUE ~"wnot_DNA")) %>%
mutate(SVA_status = case_when(is.na(repClass)~ "wnot_SVA",
str_detect(repClass, "Retroposon") ~ "SVA",
TRUE ~"wnot_SVA")) %>%
left_join(H3K36me3_lookup, by = "Peakid") %>%
mutate(cluster = if_else(is.na(cluster), "not_assigned", cluster))
annotated_H3K9me3_50_per <- All_H3K9me3 %>%
left_join(., (H3K9me3_ols$all_H3K9me3 %>%
as.data.frame() %>%
dplyr::filter(pct_te_overlap>50) %>%
group_by(Peakid) %>%
summarize(.,
repClass=paste0(unique(repClass[pct_te_overlap>50]), collapse=";"),
repFamily=paste0(unique(repFamily[pct_te_overlap>50]),collapse=";"),
repName=paste0(unique(repName[pct_te_overlap>50]),collapse=";"))))%>%
mutate(TE_status = if_else(is.na(repClass), "wnot_TE","TE")) %>%
mutate(SINE_status = case_when(is.na(repClass)~ "wnot_SINE",
str_detect(repClass, "SINE") ~ "SINE",
TRUE ~"wnot_SINE")) %>%
mutate(LINE_status = case_when(is.na(repClass)~ "wnot_LINE",
str_detect(repClass, "LINE") ~ "LINE",
TRUE ~"wnot_LINE")) %>%
mutate(LTR_status = case_when(is.na(repClass)~ "wnot_LTR",
str_detect(repClass, "LTR") ~ "LTR",
TRUE ~"wnot_LTR")) %>%
mutate(DNA_status = case_when(is.na(repClass)~ "wnot_DNA",
str_detect(repClass, "DNA") ~ "DNA",
TRUE ~"wnot_DNA")) %>%
mutate(SVA_status = case_when(is.na(repClass)~ "wnot_SVA",
str_detect(repClass, "Retroposon") ~ "SVA",
TRUE ~"wnot_SVA")) %>%
left_join(H3K9me3_lookup, by = "Peakid") %>%
mutate(cluster = if_else(is.na(cluster), "not_assigned", cluster))
# annotated_tables_50_per <- list("annotated_H3K27ac_50_per"=annotated_H3K27ac_50_per,
# "annotated_H3K27me3_50_per"=annotated_H3K27me3_50_per,
# "annotated_H3K36me3_50_per"=annotated_H3K36me3_50_per,
# "annotated_H3K9me3_50_per"=annotated_H3K9me3_50_per)
#
# saveRDS(annotated_tables_50_per,"data/TE_annotation/annotated_tables_50per_ol.RDS")
te_cols <- c("TE_status","SINE_status","LINE_status","DNA_status","LTR_status","SVA_status")
H3K27ac_counts <- map(te_cols, ~ {
annotated_H3K27ac %>%
group_by(cluster, !!sym(.x)) %>% # dynamically use column
tally() %>%
pivot_wider(
id_cols = cluster,
names_from = !!sym(.x),
values_from = n
)
}) %>% set_names(te_cols)
H3K27ac_counts_long <- map2_dfr(
H3K27ac_counts, # the named list of wide tables
names(H3K27ac_counts), # names of TE columns
~ .x %>%
pivot_longer(
cols = -cluster, # everything except 'cluster'
names_to = "status", # new column for TE status (e.g., TE / wnot_TE)
values_to = "count"
) %>%
mutate(TE_type = .y) # add TE_type column based on list element name
)
H3K27ac_counts_long <- H3K27ac_counts_long %>%
group_by(TE_type) %>%
tidyr::complete(cluster, status, fill = list(count = 0)) %>%
ungroup()
H3K27me3_counts <- map(te_cols, ~ {
annotated_H3K27me3 %>%
group_by(cluster, !!sym(.x)) %>% # dynamically use column
tally() %>%
pivot_wider(
id_cols = cluster,
names_from = !!sym(.x),
values_from = n
)
}) %>% set_names(te_cols)
H3K27me3_counts_long <- map2_dfr(
H3K27me3_counts, # the named list of wide tables
names(H3K27me3_counts), # names of TE columns
~ .x %>%
pivot_longer(
cols = -cluster, # everything except 'cluster'
names_to = "status", # new column for TE status (e.g., TE / not_TE)
values_to = "count"
) %>%
mutate(TE_type = .y) # add TE_type column based on list element name
)
H3K27me3_counts_long <- H3K27me3_counts_long %>%
group_by(TE_type) %>%
tidyr::complete(cluster, status, fill = list(count = 0)) %>%
ungroup()
H3K36me3_counts <- map(te_cols, ~ {
annotated_H3K36me3 %>%
group_by(cluster, !!sym(.x)) %>% # dynamically use column
tally() %>%
pivot_wider(
id_cols = cluster,
names_from = !!sym(.x),
values_from = n
)
}) %>% set_names(te_cols)
H3K36me3_counts_long <- map2_dfr(
H3K36me3_counts, # the named list of wide tables
names(H3K36me3_counts), # names of TE columns
~ .x %>%
pivot_longer(
cols = -cluster, # everything except 'cluster'
names_to = "status", # new column for TE status (e.g., TE / not_TE)
values_to = "count"
) %>%
mutate(TE_type = .y) # add TE_type column based on list element name
)
H3K36me3_counts_long <- H3K36me3_counts_long %>%
group_by(TE_type) %>%
tidyr::complete(cluster, status, fill = list(count = 0)) %>%
ungroup()
H3K9me3_counts <- map(te_cols, ~ {
annotated_H3K9me3 %>%
group_by(cluster, !!sym(.x)) %>% # dynamically use column
tally() %>%
pivot_wider(
id_cols = cluster,
names_from = !!sym(.x),
values_from = n
)
}) %>% set_names(te_cols)
H3K9me3_counts_long <- map2_dfr(
H3K9me3_counts, # the named list of wide tables
names(H3K9me3_counts), # names of TE columns
~ .x %>%
pivot_longer(
cols = -cluster, # everything except 'cluster'
names_to = "status", # new column for TE status (e.g., TE / wnot_TE)
values_to = "count"
) %>%
mutate(TE_type = .y) # add TE_type column based on list element name
)
H3K9me3_counts_long <- H3K9me3_counts_long %>%
group_by(TE_type) %>%
tidyr::complete(cluster, status, fill = list(count = 0)) %>%
ungroup()
te_cols <- c("TE_status","SINE_status","LINE_status","DNA_status","LTR_status","SVA_status")
H3K27ac_counts_50per <- map(te_cols, ~ {
annotated_H3K27ac_50_per %>%
group_by(cluster, !!sym(.x)) %>% # dynamically use column
tally() %>%
pivot_wider(
id_cols = cluster,
names_from = !!sym(.x),
values_from = n
)
}) %>% set_names(te_cols)
H3K27ac_counts_long_50per <- map2_dfr(
H3K27ac_counts_50per, # the named list of wide tables
names(H3K27ac_counts_50per), # names of TE columns
~ .x %>%
pivot_longer(
cols = -cluster, # everything except 'cluster'
names_to = "status", # new column for TE status (e.g., TE / wnot_TE)
values_to = "count"
) %>%
mutate(TE_type = .y) # add TE_type column based on list element name
)
H3K27ac_counts_long_50per <- H3K27ac_counts_long_50per %>%
group_by(TE_type) %>%
tidyr::complete(cluster, status, fill = list(count = 0)) %>%
ungroup()
H3K27me3_counts_50per <- map(te_cols, ~ {
annotated_H3K27me3_50_per %>%
group_by(cluster, !!sym(.x)) %>% # dynamically use column
tally() %>%
pivot_wider(
id_cols = cluster,
names_from = !!sym(.x),
values_from = n
)
}) %>% set_names(te_cols)
H3K27me3_counts_long_50per <- map2_dfr(
H3K27me3_counts_50per, # the named list of wide tables
names(H3K27me3_counts_50per), # names of TE columns
~ .x %>%
pivot_longer(
cols = -cluster, # everything except 'cluster'
names_to = "status", # new column for TE status (e.g., TE / not_TE)
values_to = "count"
) %>%
mutate(TE_type = .y) # add TE_type column based on list element name
)
H3K27me3_counts_long_50per <- H3K27me3_counts_long_50per %>%
group_by(TE_type) %>%
tidyr::complete(cluster, status, fill = list(count = 0)) %>%
ungroup()
H3K36me3_counts_50per <- map(te_cols, ~ {
annotated_H3K36me3_50_per %>%
group_by(cluster, !!sym(.x)) %>% # dynamically use column
tally() %>%
pivot_wider(
id_cols = cluster,
names_from = !!sym(.x),
values_from = n
)
}) %>% set_names(te_cols)
H3K36me3_counts_long_50per <- map2_dfr(
H3K36me3_counts_50per, # the named list of wide tables
names(H3K36me3_counts_50per), # names of TE columns
~ .x %>%
pivot_longer(
cols = -cluster, # everything except 'cluster'
names_to = "status", # new column for TE status (e.g., TE / not_TE)
values_to = "count"
) %>%
mutate(TE_type = .y) # add TE_type column based on list element name
)
H3K36me3_counts_long_50per <- H3K36me3_counts_long_50per %>%
group_by(TE_type) %>%
tidyr::complete(cluster, status, fill = list(count = 0)) %>%
ungroup()
H3K9me3_counts_50per <- map(te_cols, ~ {
annotated_H3K9me3_50_per %>%
group_by(cluster, !!sym(.x)) %>% # dynamically use column
tally() %>%
pivot_wider(
id_cols = cluster,
names_from = !!sym(.x),
values_from = n
)
}) %>% set_names(te_cols)
H3K9me3_counts_long_50per <- map2_dfr(
H3K9me3_counts_50per, # the named list of wide tables
names(H3K9me3_counts_50per), # names of TE columns
~ .x %>%
pivot_longer(
cols = -cluster, # everything except 'cluster'
names_to = "status", # new column for TE status (e.g., TE / wnot_TE)
values_to = "count"
) %>%
mutate(TE_type = .y) # add TE_type column based on list element name
)
H3K9me3_counts_long_50per <- H3K9me3_counts_long_50per %>%
group_by(TE_type) %>%
tidyr::complete(cluster, status, fill = list(count = 0)) %>%
ungroup()
making the function
# # function to test enrichment for one TE type, one cluster pair
# test_pair_TE <- function(df_long, te_name, cluster1, cluster2) {
#
# # filter for TE type
# sub_df <- df_long %>% filter(TE_type == te_name)
#
# # get counts for cluster1 and cluster2
# c1 <- sub_df %>% dplyr::filter(cluster == cluster1)
# c2 <- sub_df %>% dplyr::filter(cluster == cluster2)
#
# # 2x2 table: rows = clusters, cols = TE / wnot_TE
# mat <- matrix(
# c(c2$count, sum(c2$count) - c2$count,
# c1$count, sum(c1$count) - c1$count),
# nrow = 2,
# byrow = TRUE,
# dimnames = list(
# cluster = c(cluster2, cluster1),
# category = c("TE", "wnot_TE")
# )
# )
#
# ft <- fisher.test(mat)
#
# tibble(
# TE_type = te_name,
# comparison = paste(cluster2, "vs", cluster1),
# odds_ratio = ft$estimate,
# lower_CI = ft$conf.int[1],
# upper_CI = ft$conf.int[2],
# p_value = ft$p.value
# )
# }
# Generic pairwise Fisher test
test_pair_TE_generic <- function(df_long, te_name, cluster1, cluster2) {
sub_df <- df_long %>% filter(TE_type == te_name)
# assume "status" column has TE vs wnot_TE automatically
statuses <- unique(sub_df$status)
if(length(statuses) != 2) {
# ensure we have exactly two categories, fill missing with 0
sub_df <- sub_df %>%
complete(cluster, status, fill = list(count = 0))
statuses <- unique(sub_df$status)
}
# extract counts for cluster1
c1_counts <- sub_df %>%
filter(cluster == cluster1) %>%
arrange(match(status, statuses)) %>% # ensure same order
pull(count)
# extract counts for cluster2
c2_counts <- sub_df %>%
filter(cluster == cluster2) %>%
arrange(match(status, statuses)) %>%
pull(count)
# build 2x2 matrix
mat <- matrix(
c(c2_counts, c1_counts),
nrow = 2,
byrow = TRUE,
dimnames = list(
cluster = c(cluster2, cluster1),
category = statuses
)
)
ft <- tryCatch(
fisher.test(mat, workspace = 2e8),
error = function(e) fisher.test(mat, simulate.p.value = TRUE, B = 1e5)
)
tibble(
TE_type = te_name,
comparison = paste(cluster2, "vs", cluster1),
odds_ratio = ft$estimate,
lower_CI = ft$conf.int[1],
upper_CI = ft$conf.int[2],
p_value = ft$p.value
)
}
applying to dataframes
TE_types <- te_cols
cluster_pairs_3 <- list(
c("Set_2","Set_1"),
c("Set_3","Set_1")
# add more pairs if needed or
)
cluster_pairs_2 <- list(
c("Set_2","Set_1"))
H3K27ac_results_1bp <- map_dfr(TE_types, function(te){
map_dfr(cluster_pairs_3, function(pair){
test_pair_TE_generic(H3K27ac_counts_long, te, cluster1 = pair[2], cluster2 = pair[1])
})
})
htmltools::tagList(
htmltools::h3("H3K27ac odds ratio 1bp"),
datatable(H3K27ac_results_1bp,options = list(
scrollY = "600px",
scrollCollapse = TRUE,
pageLength = 20,
autoWidth = TRUE
),
rownames = FALSE))
H3K27ac_counts_long %>%
dplyr::filter(cluster != "not_assigned") %>%
ggplot(aes(x=cluster, y=count, fill=status)) +
geom_bar(stat="identity", position = "fill") +
facet_wrap(~TE_type, scales="free_y")+
ggtitle("H3K27ac 1 bp overlap enrichment")+
theme_bw() +
theme(
axis.text.x = element_text(angle = 45, hjust = 1),
strip.text = element_text(face = "bold")
)

| Version | Author | Date |
|---|---|---|
| 967b7a8 | reneeisnowhere | 2025-09-16 |
###
H3K27ac_results_50per <- map_dfr(TE_types, function(te){
map_dfr(cluster_pairs_3, function(pair){
test_pair_TE_generic(H3K27ac_counts_long_50per, te, cluster1 = pair[2], cluster2 = pair[1])
})
})
htmltools::tagList(
htmltools::h3("H3K27ac odds ratio >50% overlap results"),
datatable(H3K27ac_results_50per,
options = list(
scrollY = "600px",
scrollCollapse = TRUE,
pageLength = 20,
autoWidth = TRUE
),
rownames = FALSE))
H3K27ac_counts_long_50per %>%
dplyr::filter(cluster != "not_assigned") %>%
ggplot(aes(x=cluster, y=count, fill=status)) +
geom_bar(stat="identity", position = "fill") +
facet_wrap(~TE_type, scales="free_y")+
ggtitle("H3K27ac 50% TE overlap enrichment")+
theme_bw() +
theme(
axis.text.x = element_text(angle = 45, hjust = 1),
strip.text = element_text(face = "bold")
)

| Version | Author | Date |
|---|---|---|
| 967b7a8 | reneeisnowhere | 2025-09-16 |
H3K27me3_results_1bp <- map_dfr(TE_types, function(te){
map_dfr(cluster_pairs_2, function(pair){
test_pair_TE_generic(H3K27me3_counts_long, te, cluster1 = pair[2], cluster2 = pair[1])
})
})
htmltools::tagList(
htmltools::h3("H3K27me3 odds ratio 1bp"),
datatable(H3K27me3_results_1bp,options = list(
scrollY = "600px",
scrollCollapse = TRUE,
pageLength = 20,
autoWidth = TRUE
),
rownames = FALSE))
H3K27me3_counts_long %>%
dplyr::filter(cluster != "not_assigned") %>%
ggplot(aes(x=cluster, y=count, fill=status)) +
geom_bar(stat="identity", position = "fill") +
facet_wrap(~TE_type, scales="free_y")+
ggtitle("H3K27me3 1 bp overlap enrichment")+
theme_bw() +
theme(
axis.text.x = element_text(angle = 45, hjust = 1),
strip.text = element_text(face = "bold")
)

| Version | Author | Date |
|---|---|---|
| 967b7a8 | reneeisnowhere | 2025-09-16 |
###
H3K27me3_results_50per <- map_dfr(TE_types, function(te){
map_dfr(cluster_pairs_2, function(pair){
test_pair_TE_generic(H3K27me3_counts_long_50per, te, cluster1 = pair[2], cluster2 = pair[1])
})
})
htmltools::tagList(
htmltools::h3("H3K27me3 odds ratio >50% overlap results"),
datatable(H3K27me3_results_50per,options = list(
scrollY = "600px",
scrollCollapse = TRUE,
pageLength = 20,
autoWidth = TRUE
),
rownames = FALSE))
H3K27me3_counts_long_50per %>%
dplyr::filter(cluster != "not_assigned") %>%
ggplot(aes(x=cluster, y=count, fill=status)) +
geom_bar(stat="identity", position = "fill") +
facet_wrap(~TE_type, scales="free_y")+
ggtitle("H3K27me3 50% TE overlap enrichment")+
theme_bw() +
theme(
axis.text.x = element_text(angle = 45, hjust = 1),
strip.text = element_text(face = "bold")
)

| Version | Author | Date |
|---|---|---|
| 967b7a8 | reneeisnowhere | 2025-09-16 |
H3K36me3_results_1bp <- map_dfr(TE_types, function(te){
map_dfr(cluster_pairs_2, function(pair){
test_pair_TE_generic(H3K36me3_counts_long, te, cluster1 = pair[2], cluster2 = pair[1])
})
})
htmltools::tagList(
htmltools::h3("H3K36me3 odds ratio 1bp result"),
datatable(H3K36me3_results_1bp,options = list(
scrollY = "600px",
scrollCollapse = TRUE,
pageLength = 20,
autoWidth = TRUE
),
rownames = FALSE))
H3K36me3_counts_long %>%
dplyr::filter(cluster != "not_assigned") %>%
ggplot(aes(x=cluster, y=count, fill=status)) +
geom_bar(stat="identity", position = "fill") +
facet_wrap(~TE_type, scales="free_y")+
ggtitle("H3K36me3 1 bp overlap enrichment")+
theme_bw() +
theme(
axis.text.x = element_text(angle = 45, hjust = 1),
strip.text = element_text(face = "bold")
)

| Version | Author | Date |
|---|---|---|
| 967b7a8 | reneeisnowhere | 2025-09-16 |
###
H3K36me3_results_50per <- map_dfr(TE_types, function(te){
map_dfr(cluster_pairs_2, function(pair){
test_pair_TE_generic(H3K36me3_counts_long_50per, te, cluster1 = pair[2], cluster2 = pair[1])
})
})
htmltools::tagList(
htmltools::h3("H3K36me3 odds ratio >50% overlap results"),
datatable(H3K36me3_results_50per,options = list(
scrollY = "600px",
scrollCollapse = TRUE,
pageLength = 20,
autoWidth = TRUE
),
rownames = FALSE))
H3K36me3_counts_long_50per %>%
dplyr::filter(cluster != "not_assigned") %>%
ggplot(aes(x=cluster, y=count, fill=status)) +
geom_bar(stat="identity", position = "fill") +
facet_wrap(~TE_type, scales="free_y")+
ggtitle("H3K36me3 50% TE overlap enrichment")+
theme_bw() +
theme(
axis.text.x = element_text(angle = 45, hjust = 1),
strip.text = element_text(face = "bold")
)

| Version | Author | Date |
|---|---|---|
| 967b7a8 | reneeisnowhere | 2025-09-16 |
H3K9me3_results_1bp <- map_dfr(TE_types, function(te){
map_dfr(cluster_pairs_3, function(pair){
test_pair_TE_generic(H3K9me3_counts_long, te, cluster1 = pair[2], cluster2 = pair[1])
})
})
htmltools::tagList(
htmltools::h3("H3K9me3 odds ratio 1bp result"),
datatable(H3K9me3_results_1bp,options = list(
scrollY = "600px",
scrollCollapse = TRUE,
pageLength = 20,
autoWidth = TRUE
),
rownames = FALSE))
H3K9me3_counts_long %>%
dplyr::filter(cluster != "not_assigned") %>%
ggplot(aes(x=cluster, y=count, fill=status)) +
geom_bar(stat="identity", position = "fill") +
facet_wrap(~TE_type, scales="free_y")+
ggtitle("H3K9me3 1 bp overlap enrichment")+
theme_bw() +
theme(
axis.text.x = element_text(angle = 45, hjust = 1),
strip.text = element_text(face = "bold")
)

| Version | Author | Date |
|---|---|---|
| 967b7a8 | reneeisnowhere | 2025-09-16 |
###
H3K9me3_results_50per <- map_dfr(TE_types, function(te){
map_dfr(cluster_pairs_3, function(pair){
test_pair_TE_generic(H3K9me3_counts_long_50per, te, cluster1 = pair[2], cluster2 = pair[1])
})
})
htmltools::tagList(
htmltools::h3("H3K9me3 odds ratio >50% overlap results"),
datatable(H3K9me3_results_50per,options = list(
scrollY = "600px",
scrollCollapse = TRUE,
pageLength = 20,
autoWidth = TRUE
),
rownames = FALSE))
H3K9me3_counts_long_50per %>%
dplyr::filter(cluster != "not_assigned") %>%
ggplot(aes(x=cluster, y=count, fill=status)) +
geom_bar(stat="identity", position = "fill") +
facet_wrap(~TE_type, scales="free_y")+
ggtitle("H3K9me3 50% TE overlap enrichment")+
theme_bw() +
theme(
axis.text.x = element_text(angle = 45, hjust = 1),
strip.text = element_text(face = "bold")
)

rmskr_nonoverlapping_H3K27ac_gr$te_len <- width(rmskr_nonoverlapping_H3K27ac_gr)
non_ol_H3K27ac_df <- rmskr_nonoverlapping_H3K27ac_gr %>%
as.data.frame() %>%
mutate(cluster="non_overlapping", Peakid="non_overlapping", pct_te_overlap=0) %>%
mutate(special_type=case_when(repClass=="SINE"~"SINE",
repClass=="LINE"~"LINE",
repClass=="LTR"~"LTR",
repClass=="DNA"~"DNA",
repClass=="Retroposon"~"SVA",
is.na(repClass)~"not_TE",
TRUE~"Other")) %>%
mutate(ppr_div=milliDiv/1e-9) %>%
dplyr::select(seqnames, start, end, width, strand, Peakid, te_len, pct_te_overlap, repName, repClass, repFamily, milliDiv, milliDel, milliIns, cluster, special_type, ppr_div)
# dplyr::select(seqnames, start, end, width, strand, Peakid, annotation, geneChr, geneStart, geneEnd, geneLength, geneStrand,geneId,transcriptId, distanceToTSS, hit, overlap_bp, roi_len,pct_roi_overlap,te_len, pct_te_overlap, repName, repclass, repFamily, milliDiv, milliDel, milliIns, cluster, special_type, ppr_div)
rmskr_nonoverlapping_H3K27me3_gr$te_len <- width(rmskr_nonoverlapping_H3K27me3_gr)
non_ol_H3K27me3_df <- rmskr_nonoverlapping_H3K27me3_gr %>%
as.data.frame() %>%
mutate(cluster="non_overlapping", Peakid="non_overlapping", pct_te_overlap=0) %>%
mutate(special_type=case_when(repClass=="SINE"~"SINE",
repClass=="LINE"~"LINE",
repClass=="LTR"~"LTR",
repClass=="DNA"~"DNA",
repClass=="Retroposon"~"SVA",
is.na(repClass)~"not_TE",
TRUE~"Other")) %>%
mutate(ppr_div=milliDiv/1e-9) %>%
dplyr::select(seqnames, start, end, width, strand, Peakid, te_len, pct_te_overlap, repName, repClass, repFamily, milliDiv, milliDel, milliIns, cluster, special_type, ppr_div)
# dplyr::select(seqnames, start, end, width, strand, Peakid, annotation, geneChr, geneStart, geneEnd, geneLength, geneStrand,geneId,transcriptId, distanceToTSS, hit, overlap_bp, roi_len,pct_roi_overlap,te_len, pct_te_overlap, repName, repclass, repFamily, milliDiv, milliDel, milliIns, cluster, special_type, ppr_div)
rmskr_nonoverlapping_H3K36me3_gr$te_len <- width(rmskr_nonoverlapping_H3K36me3_gr)
non_ol_H3K36me3_df <- rmskr_nonoverlapping_H3K36me3_gr %>%
as.data.frame() %>%
mutate(cluster="non_overlapping", Peakid="non_overlapping", pct_te_overlap=0) %>%
mutate(special_type=case_when(repClass=="SINE"~"SINE",
repClass=="LINE"~"LINE",
repClass=="LTR"~"LTR",
repClass=="DNA"~"DNA",
repClass=="Retroposon"~"SVA",
is.na(repClass)~"not_TE",
TRUE~"Other")) %>%
mutate(ppr_div=milliDiv/1e-9) %>%
dplyr::select(seqnames, start, end, width, strand, Peakid, te_len, pct_te_overlap, repName, repClass, repFamily, milliDiv, milliDel, milliIns, cluster, special_type, ppr_div)
# dplyr::select(seqnames, start, end, width, strand, Peakid, annotation, geneChr, geneStart, geneEnd, geneLength, geneStrand,geneId,transcriptId, distanceToTSS, hit, overlap_bp, roi_len,pct_roi_overlap,te_len, pct_te_overlap, repName, repclass, repFamily, milliDiv, milliDel, milliIns, cluster, special_type, ppr_div)
rmskr_nonoverlapping_H3K9me3_gr$te_len <- width(rmskr_nonoverlapping_H3K9me3_gr)
non_ol_H3K9me3_df <- rmskr_nonoverlapping_H3K9me3_gr %>%
as.data.frame() %>%
mutate(cluster="non_overlapping", Peakid="non_overlapping", pct_te_overlap=0) %>%
mutate(special_type=case_when(repClass=="SINE"~"SINE",
repClass=="LINE"~"LINE",
repClass=="LTR"~"LTR",
repClass=="DNA"~"DNA",
repClass=="Retroposon"~"SVA",
is.na(repClass)~"not_TE",
TRUE~"Other")) %>%
mutate(ppr_div=milliDiv/1e-9) %>%
dplyr::select(seqnames, start, end, width, strand, Peakid, te_len, pct_te_overlap, repName, repClass, repFamily, milliDiv, milliDel, milliIns, cluster, special_type, ppr_div)
# dplyr::select(seqnames, start, end, width, strand, Peakid, annotation, geneChr, geneStart, geneEnd, geneLength, geneStrand,geneId,transcriptId, distanceToTSS, hit, overlap_bp, roi_len,pct_roi_overlap,te_len, pct_te_overlap, repName, repclass, repFamily, milliDiv, milliDel, milliIns, cluster, special_type, ppr_div)
rmskr_std_gr$te_len <- width(rmskr_std_gr)
human_te_rmskr <- rmskr_std_gr %>%
as.data.frame() %>%
mutate(cluster="All_human_te", Peakid="human_tes", pct_te_overlap=0) %>%
mutate(special_type=case_when(repClass=="SINE"~"SINE",
repClass=="LINE"~"LINE",
repClass=="LTR"~"LTR",
repClass=="DNA"~"DNA",
repClass=="Retroposon"~"SVA",
is.na(repClass)~"not_TE",
TRUE~"Other")) %>%
mutate(ppr_div=milliDiv/1e-9) %>%
dplyr::select(seqnames, start, end, width, strand, Peakid, te_len, pct_te_overlap, repName, repClass, repFamily, milliDiv, milliDel, milliIns, cluster, special_type, ppr_div)
H3K27ac_ols$all_H3K27ac %>%
as.data.frame() %>%
left_join(., H3K27ac_lookup) %>%
mutate(cluster = if_else(is.na(cluster), "not_assigned", cluster)) %>%
dplyr::filter(cluster != "not_assigned") %>%
mutate(special_type=case_when(str_detect(repClass,"SINE")~"SINE",
str_detect(repClass, "LINE") ~ "LINE",
str_detect(repClass, "LTR") ~ "LTR",
str_detect(repClass, "DNA") ~ "DNA",
str_detect(repClass, "Retroposon") ~ "SVA",
is.na(repClass)~"not_TE",
TRUE~"Other")) %>%
dplyr::filter(special_type != "not_TE")%>%
mutate(ppr_div=milliDiv/1e-9) %>%
dplyr::select(seqnames, start, end, width, strand, Peakid, te_len, pct_te_overlap, repName, repClass, repFamily, milliDiv, milliDel, milliIns, cluster, special_type, ppr_div) %>%
bind_rows(non_ol_H3K27ac_df) %>%
bind_rows(human_te_rmskr ) %>%
ggplot(aes(y=ppr_div,x=cluster))+
geom_boxplot(aes(fill=cluster) ) +
facet_wrap(~special_type, scales="free_y") +
theme_classic()+
ggtitle("Divergence from overlapping TE H3K27ac 1bp")

H3K27ac_ols$all_H3K27ac %>%
as.data.frame() %>%
dplyr::filter(pct_te_overlap> 50) %>%
left_join(., H3K27ac_lookup) %>%
mutate(cluster = if_else(is.na(cluster), "not_assigned", cluster)) %>%
dplyr::filter(cluster != "not_assigned") %>%
mutate(special_type=case_when(str_detect(repClass,"SINE")~"SINE",
str_detect(repClass, "LINE") ~ "LINE",
str_detect(repClass, "LTR") ~ "LTR",
str_detect(repClass, "DNA") ~ "DNA",
str_detect(repClass, "Retroposon") ~ "SVA",
is.na(repClass)~"not_TE",
TRUE~"Other")) %>%
dplyr::filter(special_type != "not_TE") %>%
bind_rows(human_te_rmskr ) %>%
mutate(ppr_div=milliDiv/1e-9) %>%
ggplot(aes(y=ppr_div,x=cluster))+
geom_boxplot(aes(fill=cluster) ) +
facet_wrap(~special_type, scales="free_y") +
# stat_compare_means(
# aes(group = cluster), # tell ggpubr we want to compare clusters
# method = "wilcox.test", # same test as before
# comparisons = list(c("Set1", "Set2"), c("Set1", "Set3")),
# label = "p.signif", # show stars instead of p-values
# hide.ns = TRUE # hide non-significant comparisons
# ) +
theme_classic()+
# facet_wrap(~ special_type, scales = "free_y") +
ggtitle("Divergence from overlapping TE H3K27ac 50% overlap")

### try to calculate wilcox.test
comparisons <- list(
c("Set_1", "Set_2"),
c("Set_1", "Set_3")
)
comparisons_2 <- list(
c("Set_1","Set_2"))
#
H3K27ac_wilcox_results <- H3K27ac_ols$all_H3K27ac %>%
as.data.frame() %>%
left_join(., H3K27ac_lookup) %>%
mutate(cluster = if_else(is.na(cluster), "not_assigned", cluster)) %>%
dplyr::filter(cluster != "not_assigned") %>%
mutate(special_type=case_when(str_detect(repClass,"SINE")~"SINE",
str_detect(repClass, "LINE") ~ "LINE",
str_detect(repClass, "LTR") ~ "LTR",
str_detect(repClass, "DNA") ~ "DNA",
str_detect(repClass, "Retroposon") ~ "SVA",
is.na(repClass)~"not_TE",
TRUE~"Other")) %>%
dplyr::filter(special_type != "not_TE") %>%
mutate(ppr_div=milliDiv/1e-9) %>%
filter(cluster %in% c("Set_1", "Set_2", "Set_3")) %>%
group_by(special_type) %>%
group_map(~ {
stype <- unique(.x$special_type) # capture the TE type
map_dfr(comparisons, function(comp) {
dat <- .x %>% filter(cluster %in% comp)
test <- wilcox.test(ppr_div ~ cluster, data = dat)
tibble(
special_type = stype,
comparison = paste(comp, collapse = "_vs_"),
p.value = test$p.value,
W = test$statistic,
median_Set1 = median(dat$ppr_div[dat$cluster == "Set_1"]),
median_other = median(dat$ppr_div[dat$cluster == comp[2]])
)
})
}, .keep = TRUE) %>%
bind_rows()
H3K27ac_wilcox_results_per50 <- H3K27ac_ols$all_H3K27ac %>%
as.data.frame() %>%
dplyr::filter(pct_te_overlap> 50) %>%
left_join(., H3K27ac_lookup) %>%
mutate(cluster = if_else(is.na(cluster), "not_assigned", cluster)) %>%
dplyr::filter(cluster != "not_assigned") %>%
mutate(special_type=case_when(str_detect(repClass,"SINE")~"SINE",
str_detect(repClass, "LINE") ~ "LINE",
str_detect(repClass, "LTR") ~ "LTR",
str_detect(repClass, "DNA") ~ "DNA",
str_detect(repClass, "Retroposon") ~ "SVA",
is.na(repClass)~"not_TE",
TRUE~"Other")) %>%
dplyr::filter(special_type != "not_TE") %>%
mutate(ppr_div=milliDiv/1e-9) %>%
filter(cluster %in% c("Set_1", "Set_2", "Set_3")) %>%
group_by(special_type) %>%
group_map(~ {
stype <- unique(.x$special_type) # capture the TE type
map_dfr(comparisons, function(comp) {
dat <- .x %>% filter(cluster %in% comp)
test <- wilcox.test(ppr_div ~ cluster, data = dat)
tibble(
special_type = stype,
comparison = paste(comp, collapse = "_vs_"),
p.value = test$p.value,
W = test$statistic,
median_Set1 = median(dat$ppr_div[dat$cluster == "Set_1"]),
median_other = median(dat$ppr_div[dat$cluster == comp[2]])
)
})
}, .keep = TRUE) %>%
bind_rows()
datatable( H3K27ac_wilcox_results,caption = htmltools::tags$caption(
style = 'caption-side: top; text-align: center; font-weight: bold;',
"Wilcox test on percent divergence overlapped TEs by 1bp H3K27ac"
))
datatable( H3K27ac_wilcox_results_per50,caption = htmltools::tags$caption(
style = 'caption-side: top; text-align: center; font-weight: bold;',
"Wilcox test on percent divergence of >50% overlapped TEs H3K27ac"
))
# H3K27me3_ols$all_H3K27me3 %>%
# left_join(., H3K27me3_lookup) %>%
# mutate(cluster = if_else(is.na(cluster), "not_assigned", cluster)) %>%
# dplyr::filter(cluster != "not_assigned") %>%
# mutate(special_type=case_when(repClass=="SINE"~"SINE",
# repClass=="LINE"~"LINE",
# repClass=="LTR"~"LTR",
# repClass=="DNA"~"DNA",
# repClass=="Retroposon"~"SVA",
# is.na(repClass)~"not_TE",
# TRUE~"Other")) %>%
# mutate(ppr_div=milliDiv/1e-9) %>%
# ggplot(aes(x=ppr_div))+
# geom_histogram(aes(fill=cluster, alpha = 0.4) ) +
# facet_wrap(~special_type, scales="free_y") +
# theme_classic()
H3K27me3_ols$all_H3K27me3 %>%
as.data.frame() %>%
left_join(., H3K27me3_lookup) %>%
mutate(cluster = if_else(is.na(cluster), "not_assigned", cluster)) %>%
dplyr::filter(cluster != "not_assigned") %>%
mutate(special_type=case_when(str_detect(repClass,"SINE")~"SINE",
str_detect(repClass, "LINE") ~ "LINE",
str_detect(repClass, "LTR") ~ "LTR",
str_detect(repClass, "DNA") ~ "DNA",
str_detect(repClass, "Retroposon") ~ "SVA",
is.na(repClass)~"not_TE",
TRUE~"Other")) %>%
dplyr::filter(special_type != "not_TE") %>%
mutate(ppr_div=milliDiv/1e-9) %>%
dplyr::select(seqnames, start, end, width, strand, Peakid, te_len, pct_te_overlap, repName, repClass, repFamily, milliDiv, milliDel, milliIns, cluster, special_type, ppr_div) %>%
bind_rows(non_ol_H3K27me3_df) %>%
bind_rows(human_te_rmskr ) %>%
ggplot(aes(y=ppr_div,x=cluster))+
geom_boxplot(aes(fill=cluster) ) +
facet_wrap(~special_type, scales="free_y") +
theme_classic()+
ggtitle("Divergence from overlapping TE H3K27me3 1bp")

H3K27me3_ols$all_H3K27me3 %>%
as.data.frame() %>%
dplyr::filter(pct_te_overlap> 50) %>%
left_join(., H3K27me3_lookup) %>%
mutate(cluster = if_else(is.na(cluster), "not_assigned", cluster)) %>%
dplyr::filter(cluster != "not_assigned") %>%
mutate(special_type=case_when(str_detect(repClass,"SINE")~"SINE",
str_detect(repClass, "LINE") ~ "LINE",
str_detect(repClass, "LTR") ~ "LTR",
str_detect(repClass, "DNA") ~ "DNA",
str_detect(repClass, "Retroposon") ~ "SVA",
is.na(repClass)~"not_TE",
TRUE~"Other")) %>%
dplyr::filter(special_type != "not_TE") %>%
mutate(ppr_div=milliDiv/1e-9) %>%
bind_rows(human_te_rmskr ) %>%
ggplot(aes(y=ppr_div,x=cluster))+
geom_boxplot(aes(fill=cluster) ) +
facet_wrap(~special_type, scales="free_y") +
theme_classic()+
ggtitle("Divergence from overlapping TE H3K27me3 50% overlap")

H3K27me3_wilcox_results <- H3K27me3_ols$all_H3K27me3 %>%
as.data.frame() %>%
left_join(., H3K27me3_lookup) %>%
mutate(cluster = if_else(is.na(cluster), "not_assigned", cluster)) %>%
dplyr::filter(cluster != "not_assigned") %>%
mutate(special_type=case_when(str_detect(repClass,"SINE")~"SINE",
str_detect(repClass, "LINE") ~ "LINE",
str_detect(repClass, "LTR") ~ "LTR",
str_detect(repClass, "DNA") ~ "DNA",
str_detect(repClass, "Retroposon") ~ "SVA",
is.na(repClass)~"not_TE",
TRUE~"Other")) %>%
dplyr::filter(special_type != "not_TE") %>%
mutate(ppr_div=milliDiv/1e-9) %>%
filter(cluster %in% c("Set_1", "Set_2")) %>%
group_by(special_type) %>%
group_map(~ {
stype <- unique(.x$special_type) # capture the TE type
map_dfr(comparisons_2, function(comp) {
dat <- .x %>% filter(cluster %in% comp)
# Skip if both clusters are not present
if(length(unique(dat$cluster)) < 2) return(NULL)
test <- wilcox.test(ppr_div ~ cluster, data = dat)
tibble(
special_type = stype,
comparison = paste(comp, collapse = "_vs_"),
p.value = test$p.value,
W = test$statistic,
median_Set1 = median(dat$ppr_div[dat$cluster == "Set_1"]),
median_other = median(dat$ppr_div[dat$cluster == comp[2]])
)
})
}, .keep = TRUE) %>%
bind_rows()
H3K27me3_wilcox_results_per50 <- H3K27me3_ols$all_H3K27me3 %>%
as.data.frame() %>%
dplyr::filter(pct_te_overlap> 50) %>%
left_join(., H3K27me3_lookup) %>%
mutate(cluster = if_else(is.na(cluster), "not_assigned", cluster)) %>%
dplyr::filter(cluster != "not_assigned") %>%
mutate(special_type=case_when(str_detect(repClass,"SINE")~"SINE",
str_detect(repClass, "LINE") ~ "LINE",
str_detect(repClass, "LTR") ~ "LTR",
str_detect(repClass, "DNA") ~ "DNA",
str_detect(repClass, "Retroposon") ~ "SVA",
is.na(repClass)~"not_TE",
TRUE~"Other")) %>%
dplyr::filter(special_type != "not_TE") %>%
mutate(ppr_div=milliDiv/1e-9) %>%
filter(cluster %in% c("Set_1", "Set_2")) %>%
group_by(special_type) %>%
group_map(~ {
stype <- unique(.x$special_type) # capture the TE type
map_dfr(comparisons, function(comp) {
dat <- .x %>% filter(cluster %in% comp)
# Skip if both clusters are not present
if(length(unique(dat$cluster)) < 2) return(NULL)
test <- wilcox.test(ppr_div ~ cluster, data = dat)
tibble(
special_type = stype,
comparison = paste(comp, collapse = "_vs_"),
p.value = test$p.value,
W = test$statistic,
median_Set1 = median(dat$ppr_div[dat$cluster == "Set_1"]),
median_other = median(dat$ppr_div[dat$cluster == comp[2]])
)
})
}, .keep = TRUE) %>%
bind_rows()
datatable( H3K27me3_wilcox_results,caption = htmltools::tags$caption(
style = 'caption-side: top; text-align: center; font-weight: bold;',
"Wilcox test on percent divergence overlapped TEs by 1bp H3K27me3"
))
datatable( H3K27me3_wilcox_results_per50,caption = htmltools::tags$caption(
style = 'caption-side: top; text-align: center; font-weight: bold;',
"Wilcox test on percent divergence of >50% overlapped TEs H3K27me3"
))
#
# H3K36me3_ols$all_H3K36me3 %>%
# left_join(., H3K36me3_lookup) %>%
# mutate(cluster = if_else(is.na(cluster), "not_assigned", cluster)) %>%
# dplyr::filter(cluster != "not_assigned") %>%
# mutate(special_type=case_when(repClass=="SINE"~"SINE",
# repClass=="LINE"~"LINE",
# repClass=="LTR"~"LTR",
# repClass=="DNA"~"DNA",
# repClass=="Retroposon"~"SVA",
# is.na(repClass)~"not_TE",
# TRUE~"Other")) %>%
# mutate(ppr_div=milliDiv/1e-9) %>%
# ggplot(aes(x=ppr_div))+
# geom_histogram(aes(fill=cluster, alpha = 0.4) ) +
# facet_wrap(~special_type, scales="free_y") +
# theme_classic()
#
H3K36me3_ols$all_H3K36me3 %>%
as.data.frame() %>%
left_join(., H3K36me3_lookup) %>%
mutate(cluster = if_else(is.na(cluster), "not_assigned", cluster)) %>%
dplyr::filter(cluster != "not_assigned") %>%
mutate(special_type=case_when(str_detect(repClass,"SINE")~"SINE",
str_detect(repClass, "LINE") ~ "LINE",
str_detect(repClass, "LTR") ~ "LTR",
str_detect(repClass, "DNA") ~ "DNA",
str_detect(repClass, "Retroposon") ~ "SVA",
is.na(repClass)~"not_TE",
TRUE~"Other")) %>%
dplyr::filter(special_type != "not_TE") %>%
mutate(ppr_div=milliDiv/1e-9) %>%
dplyr::select(seqnames, start, end, width, strand, Peakid, te_len, pct_te_overlap, repName, repClass, repFamily, milliDiv, milliDel, milliIns, cluster, special_type, ppr_div) %>%
bind_rows(non_ol_H3K36me3_df) %>%
bind_rows(human_te_rmskr ) %>%
ggplot(aes(y=ppr_div,x=cluster))+
geom_boxplot(aes(fill=cluster) ) +
facet_wrap(~special_type, scales="free_y") +
theme_classic()+
ggtitle("Divergence from overlapping TE H3K36me3 1bp")

H3K36me3_ols$all_H3K36me3 %>%
as.data.frame() %>%
dplyr::filter(pct_te_overlap> 50) %>%
left_join(., H3K36me3_lookup) %>%
mutate(cluster = if_else(is.na(cluster), "not_assigned", cluster)) %>%
dplyr::filter(cluster != "not_assigned") %>%
mutate(special_type=case_when(str_detect(repClass,"SINE")~"SINE",
str_detect(repClass, "LINE") ~ "LINE",
str_detect(repClass, "LTR") ~ "LTR",
str_detect(repClass, "DNA") ~ "DNA",
str_detect(repClass, "Retroposon") ~ "SVA",
is.na(repClass)~"not_TE",
TRUE~"Other")) %>%
dplyr::filter(special_type != "not_TE") %>%
mutate(ppr_div=milliDiv/1e-9) %>%
bind_rows(human_te_rmskr ) %>%
ggplot(aes(y=ppr_div,x=cluster))+
geom_boxplot(aes(fill=cluster) ) +
facet_wrap(~special_type, scales="free_y") +
ggtitle("Divergence from overlapping TE H3K36me3 50% overlap")

H3K36me3_ols$all_H3K36me3 %>%
as.data.frame() %>%
dplyr::filter(pct_te_overlap> 50) %>%
left_join(., H3K36me3_lookup) %>%
mutate(cluster = if_else(is.na(cluster), "not_assigned", cluster)) %>%
dplyr::filter(cluster != "not_assigned") %>%
mutate(special_type=case_when(str_detect(repClass,"SINE")~"SINE",
str_detect(repClass, "LINE") ~ "LINE",
str_detect(repClass, "LTR") ~ "LTR",
str_detect(repClass, "DNA") ~ "DNA",
str_detect(repClass, "Retroposon") ~ "SVA",
is.na(repClass)~"not_TE",
TRUE~"Other")) %>%
dplyr::filter(special_type != "not_TE") %>%
mutate(ppr_div=milliDiv/1e-9) %>%
ggplot(aes(y=ppr_div,x=special_type))+
geom_boxplot(aes(fill=cluster) ) +
# facet_wrap(~special_type, scales="free_y") +
theme_classic()+
ggtitle("Divergence from overlapping TE H3K36me3 50% overlap")

H3K36me3_wilcox_results <- H3K36me3_ols$all_H3K36me3 %>%
as.data.frame() %>%
left_join(., H3K36me3_lookup) %>%
mutate(cluster = if_else(is.na(cluster), "not_assigned", cluster)) %>%
dplyr::filter(cluster != "not_assigned") %>%
mutate(special_type=case_when(str_detect(repClass,"SINE")~"SINE",
str_detect(repClass, "LINE") ~ "LINE",
str_detect(repClass, "LTR") ~ "LTR",
str_detect(repClass, "DNA") ~ "DNA",
str_detect(repClass, "Retroposon") ~ "SVA",
is.na(repClass)~"not_TE",
TRUE~"Other")) %>%
dplyr::filter(special_type != "not_TE") %>%
mutate(ppr_div=milliDiv/1e-9) %>%
filter(cluster %in% c("Set_1", "Set_2")) %>%
group_by(special_type) %>%
group_map(~ {
stype <- unique(.x$special_type) # capture the TE type
map_dfr(comparisons_2, function(comp) {
dat <- .x %>% filter(cluster %in% comp)
# Skip if both clusters are not present
if(length(unique(dat$cluster)) < 2) return(NULL)
test <- wilcox.test(ppr_div ~ cluster, data = dat)
tibble(
special_type = stype,
comparison = paste(comp, collapse = "_vs_"),
p.value = test$p.value,
W = test$statistic,
median_Set1 = median(dat$ppr_div[dat$cluster == "Set_1"]),
median_other = median(dat$ppr_div[dat$cluster == comp[2]])
)
})
}, .keep = TRUE) %>%
bind_rows()
H3K36me3_wilcox_results_per50 <- H3K36me3_ols$all_H3K36me3 %>%
as.data.frame() %>%
dplyr::filter(pct_te_overlap> 50) %>%
left_join(., H3K36me3_lookup) %>%
mutate(cluster = if_else(is.na(cluster), "not_assigned", cluster)) %>%
dplyr::filter(cluster != "not_assigned") %>%
mutate(special_type=case_when(str_detect(repClass,"SINE")~"SINE",
str_detect(repClass, "LINE") ~ "LINE",
str_detect(repClass, "LTR") ~ "LTR",
str_detect(repClass, "DNA") ~ "DNA",
str_detect(repClass, "Retroposon") ~ "SVA",
is.na(repClass)~"not_TE",
TRUE~"Other")) %>%
dplyr::filter(special_type != "not_TE") %>%
mutate(ppr_div=milliDiv/1e-9) %>%
filter(cluster %in% c("Set_1", "Set_2")) %>%
group_by(special_type) %>%
group_map(~ {
stype <- unique(.x$special_type) # capture the TE type
map_dfr(comparisons, function(comp) {
dat <- .x %>% filter(cluster %in% comp)
# Skip if both clusters are not present
if(length(unique(dat$cluster)) < 2) return(NULL)
test <- wilcox.test(ppr_div ~ cluster, data = dat)
tibble(
special_type = stype,
comparison = paste(comp, collapse = "_vs_"),
p.value = test$p.value,
W = test$statistic,
median_Set1 = median(dat$ppr_div[dat$cluster == "Set_1"]),
median_other = median(dat$ppr_div[dat$cluster == comp[2]])
)
})
}, .keep = TRUE) %>%
bind_rows()
datatable( H3K36me3_wilcox_results,caption = htmltools::tags$caption(
style = 'caption-side: top; text-align: center; font-weight: bold;',
"Wilcox test on percent divergence overlapped TEs by 1bp H3K36me3"
))
datatable( H3K36me3_wilcox_results_per50,caption = htmltools::tags$caption(
style = 'caption-side: top; text-align: center; font-weight: bold;',
"Wilcox test on percent divergence of >50% overlapped TEs H3K36me3"
))
#
# H3K9me3_ols$all_H3K9me3 %>%
# left_join(., H3K9me3_lookup) %>%
# mutate(cluster = if_else(is.na(cluster), "not_assigned", cluster)) %>%
# mutate(special_type=case_when(repClass=="SINE"~"SINE",
# repClass=="LINE"~"LINE",
# repClass=="LTR"~"LTR",
# repClass=="DNA"~"DNA",
# repClass=="Retroposon"~"SVA",
# is.na(repClass)~"not_TE",
# TRUE~"Other")) %>%
# mutate(ppr_div=milliDiv/1e-9) %>%
# ggplot(aes(x=ppr_div))+
# geom_histogram(aes(fill=cluster, alpha = 0.4) ) +
# facet_wrap(~special_type, scales="free_y") +
# theme_classic()
H3K9me3_ols$all_H3K9me3 %>%
as.data.frame() %>%
left_join(., H3K9me3_lookup) %>%
mutate(cluster = if_else(is.na(cluster), "not_assigned", cluster)) %>%
dplyr::filter(cluster != "not_assigned") %>%
mutate(special_type=case_when(str_detect(repClass,"SINE")~"SINE",
str_detect(repClass, "LINE") ~ "LINE",
str_detect(repClass, "LTR") ~ "LTR",
str_detect(repClass, "DNA") ~ "DNA",
str_detect(repClass, "Retroposon") ~ "SVA",
is.na(repClass)~"not_TE",
TRUE~"Other")) %>%
dplyr::filter(special_type != "not_TE") %>%
mutate(ppr_div=milliDiv/1e-9) %>%
dplyr::select(seqnames, start, end, width, strand, Peakid, te_len, pct_te_overlap, repName, repClass, repFamily, milliDiv, milliDel, milliIns, cluster, special_type, ppr_div) %>%
bind_rows(non_ol_H3K27ac_df) %>%
bind_rows(human_te_rmskr ) %>%
ggplot(aes(y=ppr_div,x=cluster))+
geom_boxplot(aes(fill=cluster) ) +
facet_wrap(~special_type, scales="free_y") +
theme_classic()+
ggtitle("Divergence from overlapping TE H3K9me3 1bp")

H3K9me3_ols$all_H3K9me3 %>%
as.data.frame() %>%
dplyr::filter(pct_te_overlap> 50) %>%
left_join(., H3K9me3_lookup) %>%
mutate(cluster = if_else(is.na(cluster), "not_assigned", cluster)) %>%
dplyr::filter(cluster != "not_assigned") %>%
mutate(special_type=case_when(str_detect(repClass,"SINE")~"SINE",
str_detect(repClass, "LINE") ~ "LINE",
str_detect(repClass, "LTR") ~ "LTR",
str_detect(repClass, "DNA") ~ "DNA",
str_detect(repClass, "Retroposon") ~ "SVA",
is.na(repClass)~"not_TE",
TRUE~"Other")) %>%
dplyr::filter(special_type != "not_TE") %>%
mutate(ppr_div=milliDiv/1e-9) %>%
bind_rows(human_te_rmskr ) %>%
ggplot(aes(y=ppr_div,x=cluster))+
geom_boxplot(aes(fill=cluster) ) +
facet_wrap(~special_type, scales="free_y") +
theme_classic()+
ggtitle("Divergence from overlapping TE H3K9me3 50% overlap")

H3K9me3_wilcox_results <- H3K9me3_ols$all_H3K9me3 %>%
as.data.frame() %>%
left_join(., H3K9me3_lookup) %>%
mutate(cluster = if_else(is.na(cluster), "not_assigned", cluster)) %>%
dplyr::filter(cluster != "not_assigned") %>%
mutate(special_type=case_when(str_detect(repClass,"SINE")~"SINE",
str_detect(repClass, "LINE") ~ "LINE",
str_detect(repClass, "LTR") ~ "LTR",
str_detect(repClass, "DNA") ~ "DNA",
str_detect(repClass, "Retroposon") ~ "SVA",
is.na(repClass)~"not_TE",
TRUE~"Other")) %>%
dplyr::filter(special_type != "not_TE") %>%
mutate(ppr_div=milliDiv/1e-9) %>%
filter(cluster %in% c("Set_1", "Set_2", "Set_3")) %>%
group_by(special_type) %>%
group_map(~ {
stype <- unique(.x$special_type) # capture the TE type
map_dfr(comparisons, function(comp) {
dat <- .x %>% filter(cluster %in% comp)
# Skip if both clusters are not present
if(length(unique(dat$cluster)) < 2) return(NULL)
test <- wilcox.test(ppr_div ~ cluster, data = dat)
tibble(
special_type = stype,
comparison = paste(comp, collapse = "_vs_"),
p.value = test$p.value,
W = test$statistic,
median_Set1 = median(dat$ppr_div[dat$cluster == "Set_1"]),
median_other = median(dat$ppr_div[dat$cluster == comp[2]])
)
})
}, .keep = TRUE) %>%
bind_rows()
H3K9me3_wilcox_results_per50 <- H3K9me3_ols$all_H3K9me3 %>%
as.data.frame() %>%
dplyr::filter(pct_te_overlap> 50) %>%
left_join(., H3K9me3_lookup) %>%
mutate(cluster = if_else(is.na(cluster), "not_assigned", cluster)) %>%
dplyr::filter(cluster != "not_assigned") %>%
mutate(special_type=case_when(str_detect(repClass,"SINE")~"SINE",
str_detect(repClass, "LINE") ~ "LINE",
str_detect(repClass, "LTR") ~ "LTR",
str_detect(repClass, "DNA") ~ "DNA",
str_detect(repClass, "Retroposon") ~ "SVA",
is.na(repClass)~"not_TE",
TRUE~"Other")) %>%
dplyr::filter(special_type != "not_TE") %>%
mutate(ppr_div=milliDiv/1e-9) %>%
filter(cluster %in% c("Set_1", "Set_2","Set_3")) %>%
group_by(special_type) %>%
group_map(~ {
stype <- unique(.x$special_type) # capture the TE type
map_dfr(comparisons, function(comp) {
dat <- .x %>% filter(cluster %in% comp)
# Skip if both clusters are not present
if(length(unique(dat$cluster)) < 2) return(NULL)
test <- wilcox.test(ppr_div ~ cluster, data = dat)
tibble(
special_type = stype,
comparison = paste(comp, collapse = "_vs_"),
p.value = test$p.value,
W = test$statistic,
median_Set1 = median(dat$ppr_div[dat$cluster == "Set_1"]),
median_other = median(dat$ppr_div[dat$cluster == comp[2]])
)
})
}, .keep = TRUE) %>%
bind_rows()
datatable( H3K9me3_wilcox_results,caption = htmltools::tags$caption(
style = 'caption-side: top; text-align: center; font-weight: bold;',
"Wilcox test on percent divergence overlapped TEs by 1bp H3K9me3"
))
datatable( H3K9me3_wilcox_results_per50,caption = htmltools::tags$caption(
style = 'caption-side: top; text-align: center; font-weight: bold;',
"Wilcox test on percent divergence of >50% overlapped TEs H3K9me3"
))
ungroup_H3K27ac_repclass <- H3K27ac_ols$all_H3K27ac %>%
left_join(., H3K27ac_lookup) %>%
mutate(cluster = if_else(is.na(cluster), "not_assigned", cluster)) %>%
mutate(special_type=case_when(str_detect("SINE", repClass)~"SINE",
str_detect("LINE", repClass)~"LINE",
str_detect("LTR", repClass)~"LTR",
str_detect("DNA", repClass)~"DNA",
str_detect("Retroposon", repClass)~"SVA",
is.na(repClass)~"not_TE",
TRUE~"Other"))%>%
dplyr::filter(cluster != "not_assigned")
# H3K27ac_repclass_sum <- ungroup_H3K27ac_repclass %>%
# group_by(Peakid, special_type) %>%
# summarize(width= min(width),
# min_te_len=min(te_len),
# min_roi_len=min(roi_len),
# min_overlap=min(overlap_bp),
# min_dist_TSS=min(distanceToTSS),
# max_dist_TSS=max(distanceToTSS),
# cluster=paste0(unique(cluster), collapse = ":"))
#
# H3K27ac_repclass_sum %>%
# ggplot(aes(x=special_type,y=min_te_len))+
# geom_boxplot() +
# facet_wrap(~cluster)+
# theme_classic()+
# ggtitle("Overlapping TE minimum length per class in H3K27ac")+
# coord_cartesian(ylim=c(0,2500))
ungroup_H3K27ac_repclass %>%
dplyr::select(seqnames, start, end, width, strand, Peakid, te_len, pct_te_overlap, repName, repClass, repFamily, milliDiv, milliDel, milliIns, cluster, special_type) %>%
bind_rows(non_ol_H3K27ac_df) %>%
bind_rows(human_te_rmskr ) %>%
ggplot(aes(x=cluster,y=te_len))+
geom_boxplot(aes(fill=cluster)) +
facet_wrap(~special_type)+
theme_classic()+
ggtitle("All overlapping TE lengths in H3K27ac")+
coord_cartesian(ylim=c(0,1700))

H3K27ac_wilcox_te_len <- ungroup_H3K27ac_repclass %>%
dplyr::filter(special_type != "not_TE") %>%
filter(cluster %in% c("Set_1", "Set_2", "Set_3")) %>%
group_by(special_type) %>%
group_map(~ {
stype <- unique(.x$special_type) # capture the TE type
map_dfr(comparisons, function(comp) {
dat <- .x %>% filter(cluster %in% comp)
test <- wilcox.test(te_len ~ cluster, data = dat)
tibble(
special_type = stype,
comparison = paste(comp, collapse = "_vs_"),
p.value = test$p.value,
W = test$statistic,
median_Set1 = median(dat$te_len[dat$cluster == "Set_1"]),
median_other = median(dat$te_len[dat$cluster == comp[2]])
)
})
}, .keep = TRUE) %>%
bind_rows()
datatable( H3K27ac_wilcox_te_len,caption = htmltools::tags$caption(
style = 'caption-side: top; text-align: center; font-weight: bold;',
"Wilcox test on TE length (>1 bp overlap) H3K27ac"))
All_H3K27ac_ROI_df <- H3K27ac_sets_gr$all_H3K27ac %>%
as.data.frame() %>%
left_join(., H3K27ac_lookup) %>%
mutate(cluster = if_else(is.na(cluster), "not_assigned", cluster))
All_H3K27ac_ROI_df %>%
mutate(cluster="all_regions") %>%
rbind(.,All_H3K27ac_ROI_df) %>%
dplyr::filter(cluster != "not_assigned") %>%
ggplot(aes(x=cluster, y=width))+
geom_boxplot()+
ggsignif::geom_signif(comparisons=list(c("all_regions","Set_1"),
c("all_regions","Set_2"),
c("all_regions","Set_3")),
y_position = c(2000, 2400, 2800))+
theme_classic()+
ggtitle(" All assigned ROI lengths in H3K27ac") +
coord_cartesian(ylim=c(0,8000))

ungroup_H3K27ac_repclass %>%
ggplot(aes(x=cluster,y=width))+
geom_boxplot() +
facet_wrap(~special_type)+
theme_classic()+
ggtitle("Width of TE_ROI overlaps in H3K27ac")+
coord_cartesian(ylim=c(0,1000))

ungroup_H3K27ac_repclass %>%
group_by(Peakid, special_type) %>%
summarize(distanceToTSS=min(distanceToTSS),
cluster=paste0(unique(cluster),collapse = ",")) %>%
ggplot(aes(x=cluster,y=distanceToTSS))+
geom_boxplot() +
facet_wrap(~special_type)+
theme_classic()+
ggtitle("Distance to TSS by class in H3K27ac of overlapping ROIs")+
coord_cartesian(ylim=c(-35000,35000))

| Version | Author | Date |
|---|---|---|
| 1f6e044 | reneeisnowhere | 2025-10-03 |
ungroup_H3K27ac_repclass %>%
ggplot(aes(x=cluster,y=pct_te_overlap))+
geom_boxplot(aes(fill=cluster))+
facet_wrap(~special_type)+
theme_classic()+
ggtitle("Percent of TE overlapped in H3K27ac data")

| Version | Author | Date |
|---|---|---|
| 1f6e044 | reneeisnowhere | 2025-10-03 |
comparisons <- list(
c("Set_1", "Set_2"),
c("Set_1", "Set_3")
)
Per_TE_cov_H3K27ac <- ungroup_H3K27ac_repclass %>%
group_by(special_type) %>%
group_map(~ {
stype <- unique(.x$special_type) # capture the TE type
map_dfr(comparisons, function(comp) {
dat <- .x %>% filter(cluster %in% comp)
# Skip if both clusters are not present
if(length(unique(dat$cluster)) < 2) return(NULL)
test <- wilcox.test(pct_te_overlap ~ cluster, data = dat)
tibble(
special_type = stype,
comparison = paste(comp, collapse = "_vs_"),
p.value = test$p.value,
W = test$statistic,
median_Set1 = median(dat$pct_te_overlap[dat$cluster == "Set_1"]),
median_other = median(dat$pct_te_overlap[dat$cluster == comp[2]])
)
})
}, .keep = TRUE) %>%
bind_rows()
datatable( Per_TE_cov_H3K27ac,caption = htmltools::tags$caption(
style = 'caption-side: top; text-align: center; font-weight: bold;',
"Wilcox test of Percent of TE coverage; H3K27ac",
))
ungroup_H3K27me3_repclass <- H3K27me3_ols$all_H3K27me3 %>%
left_join(., H3K27me3_lookup) %>%
mutate(cluster = if_else(is.na(cluster), "not_assigned", cluster)) %>%
mutate(special_type=case_when(str_detect("SINE", repClass)~"SINE",
str_detect("LINE", repClass)~"LINE",
str_detect("LTR", repClass)~"LTR",
str_detect("DNA", repClass)~"DNA",
str_detect("Retroposon", repClass)~"SVA",
is.na(repClass)~"not_TE",
TRUE~"Other"))%>%
dplyr::filter(cluster != "not_assigned")
# H3K27me3_repclass_sum <- ungroup_H3K27me3_repclass %>%
# group_by(Peakid, special_type) %>%
# summarize(width= min(width),
# min_te_len=min(te_len),
# min_roi_len=min(roi_len),
# min_overlap=min(overlap_bp),
# min_dist_TSS=min(distanceToTSS),
# max_dist_TSS=max(distanceToTSS),
# cluster=paste0(unique(cluster), collapse = ":"))
#
# H3K27me3_repclass_sum %>%
# ggplot(aes(x=special_type,y=min_te_len))+
# geom_boxplot() +
# facet_wrap(~cluster)+
# theme_classic()+
# ggtitle("Overlapping TE minimum length per class in H3K27me3")+
# coord_cartesian(ylim=c(0,2500))
ungroup_H3K27me3_repclass %>%
dplyr::select(seqnames, start, end, width, strand, Peakid, te_len, pct_te_overlap, repName, repClass, repFamily, milliDiv, milliDel, milliIns, cluster, special_type) %>%
bind_rows(non_ol_H3K27me3_df) %>%
bind_rows(human_te_rmskr ) %>%
ggplot(aes(x=cluster,y=te_len))+
geom_boxplot(aes(fill=cluster)) +
facet_wrap(~special_type)+
theme_classic()+
ggtitle("All overlapping TE lengths in H3K27me3")+
coord_cartesian(ylim=c(0,1500))

H3K27me3_wilcox_te_len <- ungroup_H3K27me3_repclass %>%
dplyr::filter(special_type != "not_TE") %>%
filter(cluster %in% c("Set_1", "Set_2")) %>%
group_by(special_type) %>%
group_map(~ {
stype <- unique(.x$special_type) # capture the TE type
map_dfr(comparisons_2, function(comp) {
dat <- .x %>% filter(cluster %in% comp)
# only run test if both groups are present
if (length(unique(dat$cluster)) == 2) {
test <- wilcox.test(te_len ~ cluster, data = dat)
tibble(
special_type = stype,
comparison = paste(comp, collapse = "_vs_"),
p.value = test$p.value,
W = test$statistic,
median_Set1 = median(dat$te_len[dat$cluster == "Set_1"]),
median_other = median(dat$te_len[dat$cluster == comp[2]])
)
} else {
tibble(
special_type = stype,
comparison = paste(comp, collapse = "_vs_"),
p.value = NA_real_,
W = NA_real_,
median_Set1 = median(dat$te_len[dat$cluster == "Set_1"], na.rm = TRUE),
median_other = median(dat$te_len[dat$cluster == comp[2]], na.rm = TRUE)
)
}
})
}, .keep = TRUE) %>%
bind_rows()
datatable(H3K27me3_wilcox_te_len,caption = htmltools::tags$caption(
style = 'caption-side: top; text-align: center; font-weight: bold;',
"Wilcox test on TE length (>1 bp overlap) H3K27me3"
))
All_H3K27me3_ROI_df <- H3K27me3_sets_gr$all_H3K27me3 %>%
as.data.frame() %>%
left_join(., H3K27me3_lookup) %>%
mutate(cluster = if_else(is.na(cluster), "not_assigned", cluster))
All_H3K27me3_ROI_df %>%
mutate(cluster="all_regions") %>%
rbind(.,All_H3K27me3_ROI_df) %>%
dplyr::filter(cluster != "not_assigned") %>%
ggplot(aes(x=cluster, y=width))+
geom_boxplot()+
ggsignif::geom_signif(comparisons=list(c("all_regions","Set_1"),
c("all_regions","Set_2")
),
y_position = c(2000, 2400, 2800))+
theme_classic()+
ggtitle(" All assigned ROI lengths in H3K27me3") +
coord_cartesian(ylim=c(0,6000))

ungroup_H3K27me3_repclass %>%
ggplot(aes(x=special_type,y=width))+
geom_boxplot() +
facet_wrap(~cluster)+
theme_classic()+
ggtitle("Width of TE_ROI overlaps in H3K27me3")+
coord_cartesian(ylim=c(0,2000))

ungroup_H3K27me3_repclass %>%
group_by(Peakid, special_type) %>%
summarize(distanceToTSS=min(distanceToTSS),
cluster=paste0(unique(cluster),collapse = ",")) %>%
ggplot(aes(x=cluster,y=distanceToTSS))+
geom_boxplot() +
facet_wrap(~special_type)+
theme_classic()+
ggtitle("Distance to TSS by class in H3K27me3 of overlapping ROIs")+
coord_cartesian(ylim=c(-35000,35000))

| Version | Author | Date |
|---|---|---|
| 1f6e044 | reneeisnowhere | 2025-10-03 |
distanceToTSS_H3K27me3 <- ungroup_H3K27me3_repclass %>%
group_by(special_type) %>%
group_map(~ {
stype <- unique(.x$special_type) # capture the TE type
map_dfr(comparisons_2, function(comp) {
dat <- .x %>% filter(cluster %in% comp)
# Skip if both clusters are not present
if(length(unique(dat$cluster)) < 2) return(NULL)
test <- wilcox.test(distanceToTSS ~ cluster, data = dat)
tibble(
special_type = stype,
comparison = paste(comp, collapse = "_vs_"),
p.value = test$p.value,
W = test$statistic,
median_Set1 = median(dat$distanceToTSS[dat$cluster == "Set_1"]),
median_other = median(dat$distanceToTSS[dat$cluster == comp[2]])
)
})
}, .keep = TRUE) %>%
bind_rows()
datatable( distanceToTSS_H3K27me3,caption = htmltools::tags$caption(
style = 'caption-side: top; text-align: center; font-weight: bold;',
"Wilcox test of Distance to TSS; H3K27me3",
))
ungroup_H3K27me3_repclass %>%
ggplot(aes(x=cluster,y=pct_te_overlap))+
geom_boxplot(aes(fill=cluster))+
facet_wrap(~special_type)+
theme_classic()+
ggtitle("Percent of TE overlapped in H3K27me3 data")

| Version | Author | Date |
|---|---|---|
| 1f6e044 | reneeisnowhere | 2025-10-03 |
ungroup_H3K27me3_repclass %>%
dplyr::filter(special_type=="Other") %>%
ggplot(aes(repFamily, y = pct_te_overlap))+
geom_boxplot(aes(fill=cluster))+
theme_classic()+
ggtitle("Percent of TE (just the Others) overlapped in H3K27me3 data") +
theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1) )

| Version | Author | Date |
|---|---|---|
| 1f6e044 | reneeisnowhere | 2025-10-03 |
Per_TE_cov_H3K27me3 <- ungroup_H3K27me3_repclass %>%
group_by(special_type) %>%
group_map(~ {
stype <- unique(.x$special_type) # capture the TE type
map_dfr(comparisons_2, function(comp) {
dat <- .x %>% filter(cluster %in% comp)
# Skip if both clusters are not present
if(length(unique(dat$cluster)) < 2) return(NULL)
test <- wilcox.test(pct_te_overlap ~ cluster, data = dat)
tibble(
special_type = stype,
comparison = paste(comp, collapse = "_vs_"),
p.value = test$p.value,
W = test$statistic,
median_Set1 = median(dat$pct_te_overlap[dat$cluster == "Set_1"]),
median_other = median(dat$pct_te_overlap[dat$cluster == comp[2]])
)
})
}, .keep = TRUE) %>%
bind_rows()
datatable( Per_TE_cov_H3K27me3,caption = htmltools::tags$caption(
style = 'caption-side: top; text-align: center; font-weight: bold;',
"Wilcox test of Percent of TE coverage; H3K27me3",
))
ungroup_H3K36me3_repclass <- H3K36me3_ols$all_H3K36me3 %>%
left_join(., H3K36me3_lookup) %>%
mutate(cluster = if_else(is.na(cluster), "not_assigned", cluster)) %>%
mutate(special_type=case_when(str_detect("SINE", repClass)~"SINE",
str_detect("LINE", repClass)~"LINE",
str_detect("LTR", repClass)~"LTR",
str_detect("DNA", repClass)~"DNA",
str_detect("Retroposon", repClass)~"SVA",
is.na(repClass)~"not_TE",
TRUE~"Other"))%>%
dplyr::filter(cluster != "not_assigned")
# H3K36me3_repclass_sum <- ungroup_H3K36me3_repclass %>%
# group_by(Peakid, special_type) %>%
# summarize(width= min(width),
# min_te_len=min(te_len),
# min_roi_len=min(roi_len),
# min_overlap=min(overlap_bp),
# min_dist_TSS=min(distanceToTSS),
# max_dist_TSS=max(distanceToTSS),
# cluster=paste0(unique(cluster), collapse = ":"))
#
# H3K36me3_repclass_sum %>%
# ggplot(aes(x=special_type,y=min_te_len))+
# geom_boxplot() +
# facet_wrap(~cluster)+
# theme_classic()+
# ggtitle("Overlapping TE minimum length per class in H3K36me3")+
# coord_cartesian(ylim=c(0,2500))
ungroup_H3K36me3_repclass %>%
dplyr::select(seqnames, start, end, width, strand, Peakid, te_len, pct_te_overlap, repName, repClass, repFamily, milliDiv, milliDel, milliIns, cluster, special_type) %>%
bind_rows(non_ol_H3K36me3_df) %>%
ggplot(aes(x=cluster,y=te_len))+
geom_boxplot(aes(fill=cluster)) +
facet_wrap(~special_type)+
theme_classic()+
ggtitle("All overlapping TE lengths in H3K36me3")+
coord_cartesian(ylim=c(0,2000))

| Version | Author | Date |
|---|---|---|
| 1f6e044 | reneeisnowhere | 2025-10-03 |
H3K36me3_wilcox_te_len <- ungroup_H3K36me3_repclass %>%
dplyr::filter(special_type != "not_TE") %>%
filter(cluster %in% c("Set_1", "Set_2", "Set_3")) %>%
group_by(special_type) %>%
group_map(~ {
stype <- unique(.x$special_type) # capture the TE type
map_dfr(comparisons_2, function(comp) {
dat <- .x %>% filter(cluster %in% comp)
# only run test if both groups are present
if (length(unique(dat$cluster)) == 2) {
test <- wilcox.test(te_len ~ cluster, data = dat)
tibble(
special_type = stype,
comparison = paste(comp, collapse = "_vs_"),
p.value = test$p.value,
W = test$statistic,
median_Set1 = median(dat$te_len[dat$cluster == "Set_1"]),
median_other = median(dat$te_len[dat$cluster == comp[2]])
)
} else {
tibble(
special_type = stype,
comparison = paste(comp, collapse = "_vs_"),
p.value = NA_real_,
W = NA_real_,
median_Set1 = median(dat$te_len[dat$cluster == "Set_1"], na.rm = TRUE),
median_other = median(dat$te_len[dat$cluster == comp[2]], na.rm = TRUE)
)
}
})
}, .keep = TRUE) %>%
bind_rows()
datatable( H3K36me3_wilcox_te_len,caption = htmltools::tags$caption(
style = 'caption-side: top; text-align: center; font-weight: bold;',
"Wilcox test on TE length (>1 bp overlap) H3K36me3"
))
All_H3K36me3_ROI_df <- H3K36me3_sets_gr$all_H3K36me3 %>%
as.data.frame() %>%
left_join(., H3K36me3_lookup) %>%
mutate(cluster = if_else(is.na(cluster), "not_assigned", cluster))
All_H3K36me3_ROI_df %>%
mutate(cluster="all_regions") %>%
rbind(.,All_H3K36me3_ROI_df) %>%
dplyr::filter(cluster != "not_assigned") %>%
ggplot(aes(x=cluster, y=width))+
geom_boxplot()+
ggsignif::geom_signif(comparisons=list(c("all_regions","Set_1"),
c("all_regions","Set_2")
),
y_position = c(2000, 2400, 2800))+
theme_classic()+
ggtitle(" All assigned ROI lengths in H3K36me3") +
coord_cartesian(ylim=c(0,5000))

ungroup_H3K36me3_repclass %>%
ggplot(aes(x=special_type,y=width))+
geom_boxplot() +
facet_wrap(~cluster)+
theme_classic()+
ggtitle("Width of TE_ROI overlaps in H3K36me3")+
coord_cartesian(ylim=c(0,2000))

ungroup_H3K36me3_repclass %>%
group_by(Peakid, special_type) %>%
summarize(distanceToTSS=min(distanceToTSS),
cluster=paste0(unique(cluster),collapse = ",")) %>%
ggplot(aes(x=cluster,y=distanceToTSS))+
geom_boxplot() +
facet_wrap(~special_type)+
theme_classic()+
ggtitle("Distance to TSS by class in H3K36me3 of overlapping ROIs")+
coord_cartesian(ylim=c(-35000,35000))

| Version | Author | Date |
|---|---|---|
| 1f6e044 | reneeisnowhere | 2025-10-03 |
distanceToTSS_H3K36me3 <- ungroup_H3K36me3_repclass %>%
group_by(special_type) %>%
group_map(~ {
stype <- unique(.x$special_type) # capture the TE type
map_dfr(comparisons_2, function(comp) {
dat <- .x %>% filter(cluster %in% comp)
# Skip if both clusters are not present
if(length(unique(dat$cluster)) < 2) return(NULL)
test <- wilcox.test(distanceToTSS ~ cluster, data = dat)
tibble(
special_type = stype,
comparison = paste(comp, collapse = "_vs_"),
p.value = test$p.value,
W = test$statistic,
median_Set1 = median(dat$distanceToTSS[dat$cluster == "Set_1"]),
median_other = median(dat$distanceToTSS[dat$cluster == comp[2]])
)
})
}, .keep = TRUE) %>%
bind_rows()
datatable( distanceToTSS_H3K36me3,caption = htmltools::tags$caption(
style = 'caption-side: top; text-align: center; font-weight: bold;',
"Wilcox test of Distance to TSS; H3K36me3",
))
ungroup_H3K36me3_repclass %>%
ggplot(aes(x=cluster,y=pct_te_overlap))+
geom_boxplot(aes(fill=cluster))+
facet_wrap(~special_type)+
theme_classic()+
ggtitle("Percent of TE overlapped in H3K36me3 data")

| Version | Author | Date |
|---|---|---|
| 1f6e044 | reneeisnowhere | 2025-10-03 |
ungroup_H3K36me3_repclass %>%
dplyr::filter(special_type=="Other") %>%
ggplot(aes(cluster, y = pct_te_overlap))+
geom_boxplot(aes(fill=cluster))+
facet_wrap(~repFamily, ncol=5)+
theme_classic()+
ggtitle("Percent of TE (just the Others) overlapped in H3K36me3 data") +
theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1) )

| Version | Author | Date |
|---|---|---|
| 1f6e044 | reneeisnowhere | 2025-10-03 |
Per_TE_cov_H3K36me3 <- ungroup_H3K36me3_repclass %>%
group_by(special_type) %>%
group_map(~ {
stype <- unique(.x$special_type) # capture the TE type
map_dfr(comparisons_2, function(comp) {
dat <- .x %>% filter(cluster %in% comp)
# Skip if both clusters are not present
if(length(unique(dat$cluster)) < 2) return(NULL)
test <- wilcox.test(pct_te_overlap ~ cluster, data = dat)
tibble(
special_type = stype,
comparison = paste(comp, collapse = "_vs_"),
p.value = test$p.value,
W = test$statistic,
median_Set1 = median(dat$pct_te_overlap[dat$cluster == "Set_1"]),
median_other = median(dat$pct_te_overlap[dat$cluster == comp[2]])
)
})
}, .keep = TRUE) %>%
bind_rows()
datatable( Per_TE_cov_H3K36me3,caption = htmltools::tags$caption(
style = 'caption-side: top; text-align: center; font-weight: bold;',
"Wilcox test of Percent of TE coverage; H3K36me3",
))
ungroup_H3K9me3_repclass <- H3K9me3_ols$all_H3K9me3 %>%
left_join(., H3K9me3_lookup) %>%
mutate(cluster = if_else(is.na(cluster), "not_assigned", cluster)) %>%
mutate(special_type=case_when(str_detect("SINE", repClass)~"SINE",
str_detect("LINE", repClass)~"LINE",
str_detect("LTR", repClass)~"LTR",
str_detect("DNA", repClass)~"DNA",
str_detect("Retroposon", repClass)~"SVA",
is.na(repClass)~"not_TE",
TRUE~"Other"))%>%
dplyr::filter(cluster != "not_assigned")
# H3K9me3_repclass_sum <- ungroup_H3K9me3_repclass %>%
# group_by(Peakid, special_type) %>%
# summarize(width= min(width),
# min_te_len=min(te_len),
# min_roi_len=min(roi_len),
# min_overlap=min(overlap_bp),
# min_dist_TSS=min(distanceToTSS),
# max_dist_TSS=max(distanceToTSS),
# cluster=paste0(unique(cluster), collapse = ":"))
#
# H3K9me3_repclass_sum %>%
# ggplot(aes(x=special_type,y=min_te_len))+
# geom_boxplot() +
# facet_wrap(~cluster)+
# theme_classic()+
# ggtitle("Overlapping TE minimum length per class in H3K9me3")+
# coord_cartesian(ylim=c(0,2500))
ungroup_H3K9me3_repclass %>%
dplyr::select(seqnames, start, end, width, strand, Peakid, te_len, pct_te_overlap, repName, repClass, repFamily, milliDiv, milliDel, milliIns, cluster, special_type) %>%
bind_rows(non_ol_H3K9me3_df) %>%
ggplot(aes(x=cluster,y=te_len))+
geom_boxplot() +
facet_wrap(~special_type)+
theme_classic()+
ggtitle("All overlapping TE lengths in H3K9me3")+
coord_cartesian(ylim=c(0,2000))

| Version | Author | Date |
|---|---|---|
| 1f6e044 | reneeisnowhere | 2025-10-03 |
H3K9me3_wilcox_te_len <- ungroup_H3K9me3_repclass %>%
dplyr::filter(special_type != "not_TE") %>%
filter(cluster %in% c("Set_1", "Set_2", "Set_3")) %>%
group_by(special_type) %>%
group_map(~ {
stype <- unique(.x$special_type) # capture the TE type
map_dfr(comparisons, function(comp) {
dat <- .x %>% filter(cluster %in% comp)
# only run test if both groups are present
if (length(unique(dat$cluster)) == 2) {
test <- wilcox.test(te_len ~ cluster, data = dat)
tibble(
special_type = stype,
comparison = paste(comp, collapse = "_vs_"),
p.value = test$p.value,
W = test$statistic,
median_Set1 = median(dat$te_len[dat$cluster == "Set_1"]),
median_other = median(dat$te_len[dat$cluster == comp[2]])
)
} else {
tibble(
special_type = stype,
comparison = paste(comp, collapse = "_vs_"),
p.value = NA_real_,
W = NA_real_,
median_Set1 = median(dat$te_len[dat$cluster == "Set_1"], na.rm = TRUE),
median_other = median(dat$te_len[dat$cluster == comp[2]], na.rm = TRUE)
)
}
})
}, .keep = TRUE) %>%
bind_rows()
datatable( H3K9me3_wilcox_te_len,caption = htmltools::tags$caption(
style = 'caption-side: top; text-align: center; font-weight: bold;',
"Wilcox test on TE length (>1 bp overlap) H3K9me3"
))
All_H3K9me3_ROI_df <- H3K9me3_sets_gr$all_H3K9me3 %>%
as.data.frame() %>%
left_join(., H3K9me3_lookup) %>%
mutate(cluster = if_else(is.na(cluster), "not_assigned", cluster))
All_H3K9me3_ROI_df %>%
mutate(cluster="all_regions") %>%
rbind(.,All_H3K9me3_ROI_df) %>%
dplyr::filter(cluster != "not_assigned") %>%
ggplot(aes(x=cluster, y=width))+
geom_boxplot()+
ggsignif::geom_signif(comparisons=list(c("all_regions","Set_1"),
c("all_regions","Set_2"),
c("all_regions","Set_3")),
y_position = c(2000, 2400, 2800),
tip_length = 0.01)+
theme_classic()+
ggtitle(" All assigned ROI lengths in H3K9me3") +
coord_cartesian(ylim=c(0,5000))

ungroup_H3K9me3_repclass %>%
ggplot(aes(x=special_type,y=width))+
geom_boxplot() +
facet_wrap(~cluster)+
theme_classic()+
ggtitle("Width of TE_ROI overlaps in H3K9me3")+
coord_cartesian(ylim=c(0,2000))

ungroup_H3K9me3_repclass %>%
ggplot(aes(x=cluster,y=distanceToTSS))+
geom_boxplot(aes(fill=cluster)) +
geom_signif(comparisons = list(c("Set_1","Set_2"),
c("Set_1","Set_3")),
map_signif_level = FALSE,
test = "wilcox.test",
y_position = c(10,50,100))+# note values are scaled internally by geomsignif
# step_increase = 0.1)+
facet_wrap(~special_type)+
theme_classic()+
ggtitle("Distance to TSS by class in H3K9me3 of overlapping ROIs")+
coord_cartesian(ylim=c(-45000,45000))

| Version | Author | Date |
|---|---|---|
| 1f6e044 | reneeisnowhere | 2025-10-03 |
ungroup_H3K9me3_repclass %>%
ggplot(aes(x=cluster,y=distanceToTSS))+
geom_boxplot(aes(fill=cluster)) +
geom_signif(comparisons = list(c("Set_1","Set_2"),
c("Set_1","Set_3")),
map_signif_level = FALSE,
test = "wilcox.test",
# y_position = c(10,50,100))+# note values are scaled internally by geomsignif
step_increase = 0.1)+
facet_wrap(~special_type)+
theme_classic()+
ggtitle("Distance to TSS by class in H3K9me3 of overlapping ROIs")

| Version | Author | Date |
|---|---|---|
| 1f6e044 | reneeisnowhere | 2025-10-03 |
distanceToTSS_H3K9me3 <- ungroup_H3K9me3_repclass %>%
group_by(special_type) %>%
group_map(~ {
stype <- unique(.x$special_type) # capture the TE type
map_dfr(comparisons, function(comp) {
dat <- .x %>% filter(cluster %in% comp)
# Skip if both clusters are not present
if(length(unique(dat$cluster)) < 2) return(NULL)
test <- wilcox.test(distanceToTSS ~ cluster, data = dat)
tibble(
special_type = stype,
comparison = paste(comp, collapse = "_vs_"),
p.value = test$p.value,
W = test$statistic,
median_Set1 = median(dat$distanceToTSS[dat$cluster == "Set_1"]),
median_other = median(dat$distanceToTSS[dat$cluster == comp[2]])
)
})
}, .keep = TRUE) %>%
bind_rows()
datatable( distanceToTSS_H3K9me3,caption = htmltools::tags$caption(
style = 'caption-side: top; text-align: center; font-weight: bold;',
"Wilcox test of Distance to TSS; H3K9me3",
))
ungroup_H3K9me3_repclass %>%
ggplot(aes(x=cluster,y=pct_te_overlap))+
geom_boxplot(aes(fill=cluster))+
facet_wrap(~special_type)+
theme_classic()+
ggtitle("Percent of TE overlapped in H3K9me3 data")

| Version | Author | Date |
|---|---|---|
| 1f6e044 | reneeisnowhere | 2025-10-03 |
ungroup_H3K9me3_repclass %>%
dplyr::filter(special_type=="Other") %>%
ggplot(aes(cluster, y = pct_te_overlap))+
geom_boxplot(aes(fill=cluster))+
facet_wrap(~repFamily, ncol=5)+
theme_classic()+
ggtitle("Percent of TE (just the Others) overlapped in H3K9me3 data") +
theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1) )

| Version | Author | Date |
|---|---|---|
| 1f6e044 | reneeisnowhere | 2025-10-03 |
Per_TE_cov_H3K9me3 <- ungroup_H3K9me3_repclass %>%
group_by(special_type) %>%
group_map(~ {
stype <- unique(.x$special_type) # capture the TE type
map_dfr(comparisons, function(comp) {
dat <- .x %>% filter(cluster %in% comp)
# Skip if both clusters are not present
if(length(unique(dat$cluster)) < 2) return(NULL)
test <- wilcox.test(pct_te_overlap ~ cluster, data = dat)
tibble(
special_type = stype,
comparison = paste(comp, collapse = "_vs_"),
p.value = test$p.value,
W = test$statistic,
median_Set1 = median(dat$pct_te_overlap[dat$cluster == "Set_1"]),
median_other = median(dat$pct_te_overlap[dat$cluster == comp[2]])
)
})
}, .keep = TRUE) %>%
bind_rows()
datatable( Per_TE_cov_H3K9me3,caption = htmltools::tags$caption(
style = 'caption-side: top; text-align: center; font-weight: bold;',
"Wilcox test of Percent of TE coverage; H3K9me3",
))
sessionInfo()
R version 4.4.2 (2024-10-31 ucrt)
Platform: x86_64-w64-mingw32/x64
Running under: Windows 11 x64 (build 26100)
Matrix products: default
locale:
[1] LC_COLLATE=English_United States.utf8
[2] LC_CTYPE=English_United States.utf8
[3] LC_MONETARY=English_United States.utf8
[4] LC_NUMERIC=C
[5] LC_TIME=English_United States.utf8
time zone: America/Chicago
tzcode source: internal
attached base packages:
[1] stats4 grid stats graphics grDevices utils datasets
[8] methods base
other attached packages:
[1] ggpubr_0.6.1
[2] ggsignif_0.6.4
[3] plyranges_1.26.0
[4] ChIPseeker_1.42.1
[5] TxDb.Hsapiens.UCSC.hg38.knownGene_3.20.0
[6] GenomicFeatures_1.58.0
[7] AnnotationDbi_1.68.0
[8] Biobase_2.66.0
[9] BiocParallel_1.40.2
[10] GenomicRanges_1.58.0
[11] GenomeInfoDb_1.42.3
[12] IRanges_2.40.1
[13] S4Vectors_0.44.0
[14] BiocGenerics_0.52.0
[15] genomation_1.38.0
[16] DT_0.33
[17] lubridate_1.9.4
[18] forcats_1.0.0
[19] stringr_1.5.1
[20] dplyr_1.1.4
[21] purrr_1.1.0
[22] readr_2.1.5
[23] tidyr_1.3.1
[24] tibble_3.3.0
[25] ggplot2_3.5.2
[26] tidyverse_2.0.0
[27] workflowr_1.7.1
loaded via a namespace (and not attached):
[1] splines_4.4.2
[2] later_1.4.2
[3] BiocIO_1.16.0
[4] bitops_1.0-9
[5] ggplotify_0.1.2
[6] R.oo_1.27.1
[7] XML_3.99-0.18
[8] lifecycle_1.0.4
[9] rstatix_0.7.2
[10] rprojroot_2.1.1
[11] processx_3.8.6
[12] lattice_0.22-7
[13] vroom_1.6.5
[14] crosstalk_1.2.2
[15] backports_1.5.0
[16] magrittr_2.0.3
[17] sass_0.4.10
[18] rmarkdown_2.29
[19] jquerylib_0.1.4
[20] yaml_2.3.10
[21] plotrix_3.8-4
[22] httpuv_1.6.16
[23] ggtangle_0.0.7
[24] cowplot_1.2.0
[25] DBI_1.2.3
[26] RColorBrewer_1.1-3
[27] abind_1.4-8
[28] zlibbioc_1.52.0
[29] R.utils_2.13.0
[30] RCurl_1.98-1.17
[31] yulab.utils_0.2.1
[32] rappdirs_0.3.3
[33] git2r_0.36.2
[34] GenomeInfoDbData_1.2.13
[35] enrichplot_1.26.6
[36] ggrepel_0.9.6
[37] tidytree_0.4.6
[38] codetools_0.2-20
[39] DelayedArray_0.32.0
[40] DOSE_4.0.1
[41] tidyselect_1.2.1
[42] aplot_0.2.8
[43] UCSC.utils_1.2.0
[44] farver_2.1.2
[45] matrixStats_1.5.0
[46] GenomicAlignments_1.42.0
[47] jsonlite_2.0.0
[48] Formula_1.2-5
[49] tools_4.4.2
[50] treeio_1.30.0
[51] TxDb.Hsapiens.UCSC.hg19.knownGene_3.2.2
[52] Rcpp_1.1.0
[53] glue_1.8.0
[54] SparseArray_1.6.2
[55] xfun_0.52
[56] qvalue_2.38.0
[57] MatrixGenerics_1.18.1
[58] withr_3.0.2
[59] fastmap_1.2.0
[60] boot_1.3-32
[61] callr_3.7.6
[62] caTools_1.18.3
[63] digest_0.6.37
[64] timechange_0.3.0
[65] R6_2.6.1
[66] gridGraphics_0.5-1
[67] seqPattern_1.38.0
[68] colorspace_2.1-1
[69] GO.db_3.20.0
[70] gtools_3.9.5
[71] dichromat_2.0-0.1
[72] RSQLite_2.4.3
[73] R.methodsS3_1.8.2
[74] utf8_1.2.6
[75] generics_0.1.4
[76] data.table_1.17.8
[77] rtracklayer_1.66.0
[78] httr_1.4.7
[79] htmlwidgets_1.6.4
[80] S4Arrays_1.6.0
[81] whisker_0.4.1
[82] pkgconfig_2.0.3
[83] gtable_0.3.6
[84] blob_1.2.4
[85] impute_1.80.0
[86] XVector_0.46.0
[87] htmltools_0.5.8.1
[88] carData_3.0-5
[89] fgsea_1.32.4
[90] scales_1.4.0
[91] png_0.1-8
[92] ggfun_0.2.0
[93] knitr_1.50
[94] rstudioapi_0.17.1
[95] tzdb_0.5.0
[96] reshape2_1.4.4
[97] rjson_0.2.23
[98] nlme_3.1-168
[99] curl_7.0.0
[100] cachem_1.1.0
[101] KernSmooth_2.23-26
[102] parallel_4.4.2
[103] restfulr_0.0.16
[104] pillar_1.11.0
[105] vctrs_0.6.5
[106] gplots_3.2.0
[107] promises_1.3.3
[108] car_3.1-3
[109] evaluate_1.0.5
[110] cli_3.6.5
[111] compiler_4.4.2
[112] Rsamtools_2.22.0
[113] rlang_1.1.6
[114] crayon_1.5.3
[115] labeling_0.4.3
[116] ps_1.9.1
[117] getPass_0.2-4
[118] plyr_1.8.9
[119] fs_1.6.6
[120] stringi_1.8.7
[121] gridBase_0.4-7
[122] Biostrings_2.74.1
[123] lazyeval_0.2.2
[124] GOSemSim_2.32.0
[125] Matrix_1.7-3
[126] BSgenome_1.74.0
[127] hms_1.1.3
[128] patchwork_1.3.2
[129] bit64_4.6.0-1
[130] KEGGREST_1.46.0
[131] SummarizedExperiment_1.36.0
[132] igraph_2.1.4
[133] broom_1.0.9
[134] memoise_2.0.1
[135] bslib_0.9.0
[136] ggtree_3.14.0
[137] fastmatch_1.1-6
[138] bit_4.6.0
[139] ape_5.8-1