RichieJu: 2011

Friday, December 30, 2011

ID To Namelist

import os
import sys
import time
from Bio import SeqIO

filename = raw_input("Please enter the name of the Blastall.exe output txt file: ")
Approach = raw_input("Please enter the path of your database folder: ")

# Start timing
start = time.time()
wrerr = sys.stderr.write

# Create a folder named 'Results' in the current directory (empty it if it already exists)
if os.path.exists('Results'):
    for root, dirs, files in os.walk('Results'):
        for name in files:
            os.remove(os.path.join(root, name))
else:
    os.mkdir('Results')

# Map "<parent folder>-<file name>" to the list of sequence IDs in that FASTA file.
# Walk every folder and file under the database path, e.g. F:\python missions\YY\Datebase
a = {}
for root, dirs, files in os.walk(Approach):
    for name in files:
        path = os.path.join(root, name)
        key = os.path.basename(os.path.dirname(path)) + '-' + name
        a[key] = [record.id for record in SeqIO.parse(path, 'fasta')]

# Output ID_datebase.txt: one line per database file with its IDs
f1 = open(os.path.join('Results', 'ID_datebase.txt'), 'w')
f1.write('Namelist' + '\t' + 'Corresponding ID' + '\n')
for key in sorted(a):
    f1.write(key + '\t' + str(a[key]) + '\n')
f1.close()

# Sets give exact, fast membership tests
id_sets = dict((key, set(ids)) for key, ids in a.items())

# Name the IDs: column 2 of each BLAST hit line is the subject ID
num = 0
f = open(filename, 'r')
for line in f:
    num += 1
    if num % 100000 == 0:
        print str(num) + " lines have been processed!"
    cols = line.split('\t')
    if len(cols) < 2:
        continue  # skip blank or malformed lines
    subject_id = cols[1].strip()
    for key in id_sets:
        # Exact match; a substring test on str(list) would also match 'seq1' inside 'seq10'
        if subject_id in id_sets[key]:
            # Append the hit, tagged with its file name, to Results/<key>.txt
            f2 = open(os.path.join('Results', key + '.txt'), 'a')
            f2.write(line.strip() + '\t' + key + '\n')
            f2.close()
f.close()

end = time.time()
wrerr("OK, All Work Finished in %3.2f secs\n" % (end - start))
raw_input("Press <Enter> to close this window: ")
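The script compares every BLAST line against every database file. A minimal sketch of an alternative, written for Python 3: invert the mapping once into ID -> file names, so each hit costs a single dictionary lookup. It assumes, as the script does, that the subject ID is the second tab-separated column (as in blastall -m 8 tabular output); the helper names build_id_index and assign_names and the sample IDs are hypothetical.

```python
def build_id_index(id_lists):
    """Invert {file name: [IDs]} into {ID: [file names]}.

    An ID that occurs in several database files maps to all of them.
    """
    index = {}
    for name, ids in id_lists.items():
        for seq_id in ids:
            index.setdefault(seq_id, []).append(name)
    return index


def assign_names(blast_lines, index):
    """Yield (hit line, file name) for each tab-separated BLAST hit.

    Assumes the subject ID is the second column. Lines with fewer than
    two columns, or whose ID is not in the index, are skipped.
    """
    for line in blast_lines:
        cols = line.rstrip('\n').split('\t')
        if len(cols) < 2:
            continue
        for name in index.get(cols[1].strip(), []):
            yield line.rstrip('\n'), name


# Hypothetical example data
id_lists = {'db1-a.fasta': ['seq1', 'seq2'], 'db2-b.fasta': ['seq10']}
hits = ['q1\tseq1\t99.0\n', 'q2\tseq10\t98.5\n', 'q3\tseqX\t50.0\n', '\n']
for hit, name in assign_names(hits, build_id_index(id_lists)):
    print(hit + '\t' + name)
```

Writing each (hit, name) pair to Results/<name>.txt then works exactly as in the script.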

Thursday, November 10, 2011

Using Python's split to output an Excel file

Here’s an example of some data where the dates are not formatted well for easy import into Excel:

20 Sep, 263, 1148, 0, 1, 0, 0, 1, 12.1, 13.9, 1+1, 19.9

20 Sep, 263, 1118, 0, 1, 0, 360, 0, 14.1, 15.3, 1+1, 19.9

20 Sep, 263, 1048, 0, 1, 0, 0, 0, 14.2, 15.1, 1+1, 19.9

20 Sep, 263, 1018, 0, 1, 0, 360, 0, 14.2, 15.9, 1+1, 19.9

20 Sep, 263, 0948, 0, 1, 0, 0, 0, 14.4, 15.3, 1+1, 19.9

The first column has the day and month separated by a space. The second column is year-day, which we’ll ignore. The third column has the time. The data we’re interested in is in the 9th column (temperature). The goal is to have a simple Excel file where the first column is date, and the second column is temperature.

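The example above comes from a blog post that uses the xlrd module to write the Excel file, which felt quite complicated to me. I wrote my own Python script to do the same job, shown below: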

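f = open('weather.data.example.txt', 'r')
# result.xls is plain tab-separated text; Excel opens it,
# although newer versions may warn that the format and extension do not match
f1 = open('result.xls', 'w')

for line in f:
    L = line.strip().split(',')
    # Skip blank or incomplete lines (a full record has 12 columns)
    if len(L) < 12:
        continue
    date = L[0].strip()          # day and month, e.g. '20 Sep'
    time = L[2].strip()          # time as HHMM, e.g. '1148'
    temperature = L[8].strip()   # 9th column
    f1.write(date + '-' + time + '\t' + temperature + '\n')

f.close()
f1.close()
print 'OK, finished'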
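Since the stated goal is a date column Excel can recognize, here is a minimal sketch (Python 3) that uses the standard csv and datetime modules instead of split, turning "20 Sep" plus "1148" into a real timestamp. The data carries no year, so the year is an assumed parameter the caller must supply; the helper names parse_weather and write_tsv are hypothetical.

```python
import csv
import io
from datetime import datetime


def parse_weather(lines, year):
    """Return (timestamp, temperature) pairs from the comma-separated lines.

    Assumes the 12-column layout shown above: column 1 is 'day month',
    column 3 is the time as HHMM and column 9 is the temperature.
    The year is not in the data, so the caller has to supply it.
    """
    rows = []
    # skipinitialspace=True drops the blank that follows each comma
    for fields in csv.reader(lines, skipinitialspace=True):
        if len(fields) < 12:
            continue  # skip blank or incomplete lines
        stamp = datetime.strptime('%s %d %s' % (fields[0], year, fields[2]),
                                  '%d %b %Y %H%M')
        rows.append((stamp.strftime('%Y-%m-%d %H:%M'), float(fields[8])))
    return rows


def write_tsv(rows, out):
    """Write a header plus the rows as tab-separated text that Excel can open."""
    writer = csv.writer(out, delimiter='\t', lineterminator='\n')
    writer.writerow(['Date', 'Temperature'])
    writer.writerows(rows)
```

To use it on the real file, pass the opened weather.data.example.txt to parse_weather and an output file opened with newline='' to write_tsv.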

Processing lists or dictionaries with list(set()) and sorted

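1. Use list(set(a)) to remove duplicate elements from a list a (or from the values of a dictionary). Because a set is unordered, the order of the result is arbitrary.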

>>> a=['2','2','4','3','4','0','1','1']
>>> list(set(a))
['1', '0', '3', '2', '4']
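The order shown above is simply what Python 2 happened to print; on Python 3, string hashing is randomized per process, so the order can change between runs. When the original order matters, a minimal sketch (Python 3; the helper name dedupe is hypothetical) keeps the first occurrence of each element:

```python
def dedupe(items):
    """Remove duplicates, keeping the first occurrence of each item in order."""
    seen = set()
    result = []
    for item in items:
        if item not in seen:
            seen.add(item)
            result.append(item)
    return result


a = ['2', '2', '4', '3', '4', '0', '1', '1']
print(dedupe(a))
```

On Python 3.7 and later, list(dict.fromkeys(a)) gives the same result, since dictionaries keep insertion order; sorted(set(a)) instead returns the unique items in sorted order.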

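2. Use sorted to sort according to dictionary a. Sorting rule: the elements of list b are ordered by the size of their corresponding values in dictionary a. Reference: http://wiki.python.org/moin/HowTo/Sorting/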

>>> a={'a': '3', 'c': '2', 'e': '1', 'd': '0'}
>>> b=['a','e']
>>> sorted(b,key=a.__getitem__)
['e', 'a']
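Note that the values in a are strings, so sorted compares them as text, and '10' sorts before '3'. A minimal sketch (Python 3; the helper sort_by_value and its numeric flag are hypothetical) that converts the values with int() when numeric order is wanted:

```python
def sort_by_value(keys, mapping, numeric=False):
    """Sort keys by their values in mapping.

    With numeric=True the string values are compared as integers,
    so '10' sorts after '3' instead of before it.
    """
    if numeric:
        return sorted(keys, key=lambda k: int(mapping[k]))
    return sorted(keys, key=mapping.__getitem__)


a = {'a': '3', 'e': '10'}
b = ['a', 'e']
print(sort_by_value(b, a))                # text order
print(sort_by_value(b, a, numeric=True))  # numeric order
```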

As a special case:
sort the key list of dictionary a (a.keys()) by the size of the corresponding values, with the result output as a list:

>>> a={'a':'3','c':'2','d':'0','e':'1'}
>>> sorted(a.keys(),key=a.__getitem__)
['d', 'e', 'c', 'a']
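Sorting a.items() instead of a.keys() keeps each value next to its key. A minimal sketch using operator.itemgetter, the approach described in the Sorting HowTo referenced above (Python 3; the helper name items_by_value is hypothetical):

```python
from operator import itemgetter


def items_by_value(mapping, reverse=False):
    """Return the (key, value) pairs of mapping ordered by value."""
    return sorted(mapping.items(), key=itemgetter(1), reverse=reverse)


a = {'a': '3', 'c': '2', 'd': '0', 'e': '1'}
print(items_by_value(a))
print(items_by_value(a, reverse=True))
```

For the key-only version, sorted(a, key=a.get) is an equivalent way to write sorted(a.keys(), key=a.__getitem__).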