Demo entry 6691785

123

Submitted by 123 on Jan 14, 2018 at 07:19
Language: Python 3. Code size: 960 Bytes.

#!/usr/bin/env python
# -*- coding: utf-8 -*-
'''
    @author [吴九玉]
    @email [wujy@geneskies.com]
    @create date 2018-01-13 11:59:53
    @modify date 2018-01-13 11:59:53
    @desc [Sum the lengths of the unique CDS intervals in the current human CCDS file]
'''
import os
import re

# Download the latest CCDS record file from NCBI
os.system('wget ftp://ftp.ncbi.nlm.nih.gov/pub/CCDS/current_human/CCDS.current.txt')

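cds_length = 0  # running total of CDS length, not yet computed
exon_all = []  # CDS intervals, stored as 'chromosome-start-end' strings
with open('CCDS.current.txt', 'r') as f:  # open the file and read it line by line
    for line in f:
        if line.startswith('#'):
            continue  # skip the header line
        lists = line.rstrip('\n').split('\t')  # drop the trailing newline, then split on tabs
        exons = re.findall('[0-9]+-[0-9]+', lists[-2])  # the second-to-last column holds the CDS locations
        for exon in exons:
            # Prefix the chromosome so that identical coordinates on different
            # chromosomes stay distinct when the list is deduplicated with set()
            exon = lists[0] + '-' + exon
            exon_all.append(exon)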

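exon_all = set(exon_all)  # set() drops exact duplicates only; partially overlapping intervals are still counted separately
for i in exon_all:
    exon_one = i.split('-')  # [chromosome, start, end]
    # Length is taken as end - start; if both coordinates are inclusive,
    # each interval is one base longer than this
    cds_length += int(exon_one[2]) - int(exon_one[1])
print(cds_length)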
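
The set() call in the script only removes intervals whose 'chromosome-start-end' strings are identical, so two intervals that partially overlap on the same chromosome each contribute their full length. A minimal sketch of one way to avoid that double counting is to sort the intervals per chromosome and merge overlaps before summing. The function name `merged_length`, the `inclusive_end` flag, and the toy coordinates are hypothetical and not part of the original script; whether the end coordinate should be treated as inclusive is left as an assumption for the caller to set.

```python
from collections import defaultdict


def merged_length(intervals, inclusive_end=False):
    """Sum interval lengths per chromosome after merging overlaps.

    intervals: iterable of (chrom, start, end) tuples with integer coordinates.
    inclusive_end: assumption flag; when True, each merged interval
    counts end - start + 1 bases instead of end - start.
    """
    by_chrom = defaultdict(list)
    for chrom, start, end in intervals:
        by_chrom[chrom].append((start, end))

    extra = 1 if inclusive_end else 0
    total = 0
    for spans in by_chrom.values():
        spans.sort()  # order by start, then end
        cur_start, cur_end = spans[0]
        for start, end in spans[1:]:
            if start <= cur_end:
                # Overlaps (or touches) the current run: extend it
                cur_end = max(cur_end, end)
            else:
                # Gap found: close the current run and start a new one
                total += cur_end - cur_start + extra
                cur_start, cur_end = start, end
        total += cur_end - cur_start + extra
    return total


# Toy data (made up for illustration, not real CCDS records)
toy = [('1', 100, 200), ('1', 150, 250), ('1', 100, 200), ('2', 100, 200)]
print(merged_length(toy))  # 250
```

On the toy data, set-based deduplication followed by end - start would give 300, because the intervals 100-200 and 150-250 on chromosome 1 are both counted in full; merging them first gives 150 for chromosome 1 plus 100 for chromosome 2.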
