Which version of the genome and annotation files should you use? | Hands-on test
Published: 2019-06-19


What Ensembl genome version should I use for alignments? (e.g. toplevel.fa vs. primary_assembly.fa)

This is a small detail, but a very practical one: which version should you actually use?

References:

toplevel

ensembl

 

First part options:

  • dna_sm - Repeats soft-masked (converts repeat nucleotides to lowercase)
  • dna_rm - Repeats hard-masked (converts repeats to N's)
  • dna - No masking

Second part options:

  • .toplevel - Includes haplotype information (not sure how aligners deal with this)

  • .primary_assembly - Single reference base per position
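
The two parts above combine into a single Ensembl file name. Here is a minimal sketch of that naming pattern; the function name `ensembl_fasta_name` and its defaults are my own, chosen for illustration, and the pattern should be checked against the README in the Ensembl release directory you actually download from.

```python
# Hypothetical helper: builds an Ensembl-style genome FASTA file name
# from the masking option (first part) and the assembly level (second part).
MASKING_OPTIONS = ("dna", "dna_sm", "dna_rm")
LEVEL_OPTIONS = ("toplevel", "primary_assembly")


def ensembl_fasta_name(species="Homo_sapiens", assembly="GRCh38",
                       masking="dna_sm", level="primary_assembly"):
    """Return a file name such as Homo_sapiens.GRCh38.dna_sm.primary_assembly.fa.gz."""
    if masking not in MASKING_OPTIONS:
        raise ValueError(f"unknown masking option: {masking!r}")
    if level not in LEVEL_OPTIONS:
        raise ValueError(f"unknown assembly level: {level!r}")
    return f"{species}.{assembly}.{masking}.{level}.fa.gz"


if __name__ == "__main__":
    # Print every combination of the two parts.
    for masking in MASKING_OPTIONS:
        for level in LEVEL_OPTIONS:
            print(ensembl_fasta_name(masking=masking, level=level))
```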

 

Most people recommend the soft-masked version, i.e. the one in which repeats are lowercased rather than replaced with N.
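
To make the difference concrete, here is a small sketch on a toy sequence (made up for illustration, not real genome data). It shows that a soft-masked sequence still carries every base, so a hard-masked one can always be derived from it, but not the other way round. The helper names are my own.

```python
def hard_mask(seq):
    """Turn a soft-masked sequence into a hard-masked one.

    Lowercase letters mark repeats in a soft-masked FASTA; hard masking
    replaces each of them with N, which discards the underlying bases.
    """
    return "".join("N" if base.islower() else base for base in seq)


def masked_fraction(seq):
    """Fraction of positions flagged as repeat (lowercase) in a soft-masked sequence."""
    if not seq:
        return 0.0
    return sum(base.islower() for base in seq) / len(seq)


if __name__ == "__main__":
    toy_soft_masked = "ACGTacgtacGTAC"  # toy example, not real data
    print(hard_mask(toy_soft_masked))
    print(f"{masked_fraction(toy_soft_masked):.2f}")
```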

 

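Download the hg19 genome: http://hgdownload.cse.ucsc.edu/goldenPath/hg19/bigZips/

Reference:

Download the hg19 annotation file from GENCODE: ftp://ftp.sanger.ac.uk/pub/gencode/Gencode_human/release_27/

UCSC also provides it, but only as an export from the web page: http://genome.ucsc.edu/cgi-bin/hgTables

Note: something seems to be wrong on the GENCODE side. On https://www.gencodegenes.org/releases/26lift37.html, the EBI links can no longer be downloaded.

Reference: http://www.biotrainee.com/thread-2035-1-1.html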

 

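A newer genome is not automatically a better one. Look at the latest CNS (Cell, Nature, Science) papers: very few of them use the newest genome version. Why? Because the annotation has not caught up, and your results may not line up with other people's.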

 

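Hands-on test

What actually happens when you use different genome versions?

I ran a transcriptome test with hg19 and GRCh38.

My conclusions:

1. How reads align to the genome is roughly the same, with essentially no difference;

2. With different annotation files, the gene expression results differ a lot, even though both runs used featureCounts.

GRCh38 results: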

Assigned        306852
Unassigned_Unmapped     0
Unassigned_MappingQuality       0
Unassigned_Chimera      0
Unassigned_FragmentLength       0
Unassigned_Duplicate    0
Unassigned_MultiMapping 36280
Unassigned_Secondary    0
Unassigned_Nonjunction  0
Unassigned_NoFeatures   56950
Unassigned_Overlapping_Length   0
Unassigned_Ambiguity    19771

//================================= Running ==================================\\
||                                                                            ||
|| Load annotation file /home/lizhixin/databases/ensembl/release91/Homo_s ... ||
||    Features : 1199851                                                      ||
||    Meta-features : 58302                                                   ||
||    Chromosomes/contigs : 47                                                ||
||                                                                            ||
|| Process BAM file /home/lizhixin/project/scRNA-seq/reanalyze/first_five ... ||
||    Paired-end reads are included.                                          ||
||    Assign fragments (read pairs) to features...                            ||
||                                                                            ||
||    WARNING: reads from the same pair were found not adjacent to each       ||
||             other in the input (due to read sorting by location or         ||
||             reporting of multi-mapping read pairs).                        ||
||                                                                            ||
||    Read re-ordering is performed.                                          ||
||                                                                            ||
||    Total fragments : 419853                                                ||
||    Successfully assigned fragments : 306852 (73.1%)                        ||
||    Running time : 0.05 minutes                                             ||

  

hg19 results:

Assigned        586467
Unassigned_Unmapped     0
Unassigned_MappingQuality       0
Unassigned_Chimera      0
Unassigned_FragmentLength       0
Unassigned_Duplicate    0
Unassigned_MultiMapping 66997
Unassigned_Secondary    0
Unassigned_Nonjunction  0
Unassigned_NoFeatures   133437
Unassigned_Overlapping_Length   0
Unassigned_Ambiguity    47278

//================================= Running ==================================\\
||                                                                            ||
|| Load annotation file /home/lizhixin/databases/cellranger_ref/refdata-c ... ||
||    Features : 1130716                                                      ||
||    Meta-features : 32738                                                   ||
||    Chromosomes/contigs : 45                                                ||
||                                                                            ||
|| Process BAM file /home/lizhixin/project/scRNA-seq/reanalyze/first_five ... ||
||    Paired-end reads are included.                                          ||
||    Assign fragments (read pairs) to features...                            ||
||    Total fragments : 834179                                                ||
||    Successfully assigned fragments : 586467 (70.3%)                        ||
||    Running time : 0.05 minutes                                             ||
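
To compare the two runs side by side, the summary counts can be turned into rates. Below is a minimal sketch that parses featureCounts-style summary text and recomputes the assigned fraction. The non-zero counts are copied from the two outputs above (the zero rows are left out for brevity); the function names `parse_summary` and `assigned_rate` are my own.

```python
def parse_summary(text):
    """Parse 'Status<whitespace>count' lines into a dict of integer counts."""
    counts = {}
    for line in text.strip().splitlines():
        fields = line.split()
        # Skip blank lines and the 'Status <bam file>' header of a real .summary file.
        if len(fields) != 2 or fields[0] == "Status":
            continue
        counts[fields[0]] = int(fields[1])
    return counts


def assigned_rate(counts):
    """Assigned fragments divided by the sum over all status categories."""
    total = sum(counts.values())
    return counts["Assigned"] / total if total else 0.0


# Non-zero rows copied from the featureCounts output shown above.
GRCH38_SUMMARY = """
Assigned        306852
Unassigned_MultiMapping 36280
Unassigned_NoFeatures   56950
Unassigned_Ambiguity    19771
"""

HG19_SUMMARY = """
Assigned        586467
Unassigned_MultiMapping 66997
Unassigned_NoFeatures   133437
Unassigned_Ambiguity    47278
"""

if __name__ == "__main__":
    for name, text in (("GRCh38", GRCH38_SUMMARY), ("hg19", HG19_SUMMARY)):
        counts = parse_summary(text)
        print(f"{name}: {sum(counts.values())} fragments, "
              f"{assigned_rate(counts):.1%} assigned")
```

The recomputed totals (419853 and 834179) and rates (73.1% and 70.3%) match what featureCounts printed. Because the two runs report different fragment totals, rates are the safer thing to compare than raw counts.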

 

Whatever you do, do not mix and match annotation files from different sources!!!
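
One cheap sanity check before counting is to confirm that the annotation and the genome even use the same contig names, since UCSC-style files say `chr1` where Ensembl-style files say `1`. This is only a sketch on toy in-memory data, and it is not the test I ran above. Matching names are necessary but do not prove that the annotation was built for that assembly. The helper names are my own.

```python
import io


def fasta_contigs(handle):
    """Collect contig names from FASTA header lines (text after '>' up to the first space)."""
    return {line[1:].split()[0] for line in handle
            if line.startswith(">") and len(line.strip()) > 1}


def gtf_contigs(handle):
    """Collect the first column (seqname) of every non-comment GTF line."""
    return {line.split("\t", 1)[0] for line in handle
            if line.strip() and not line.startswith("#")}


def contigs_missing_from_genome(fasta_handle, gtf_handle):
    """Return annotation contigs that do not exist in the genome FASTA."""
    return sorted(gtf_contigs(gtf_handle) - fasta_contigs(fasta_handle))


if __name__ == "__main__":
    # Toy data: a UCSC-style genome paired with an Ensembl-style annotation.
    toy_fasta = io.StringIO(">chr1 toy contig\nACGT\n>chr2\nACGT\n")
    toy_gtf = io.StringIO(
        "#!toy header\n"
        '1\ttoy\tgene\t1\t4\t.\t+\t.\tgene_id "toy_gene";\n'
    )
    missing = contigs_missing_from_genome(toy_fasta, toy_gtf)
    print("annotation contigs not in genome:", missing)
```

With real files you would pass open file handles (for example from `gzip.open(path, "rt")`) instead of the `io.StringIO` objects.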

  

 

Reposted from: http://zsgix.baihongyu.com/
