My most recent task: extract several columns from a fixed-width file with over 22 million lines, strip the leading zeros from two of those columns (turning them into proper numbers), and write the results to a new CSV file.
Here's what I came up with. It's quick and dirty, but it works.
Python code:
import re
i = open("zipcty10")
o = open("10.csv","w")
line = i.readline()
while line:
    # Slice out the wanted columns; the two re.sub calls drop leading zeros.
    o.write(line[0:5] + "," + re.sub(r"^[0]*","",line[15:19].strip()) + "," + re.sub(r"^[0]*","",line[19:23].strip()) + "," + line[23:25] + "," + line[25:28] + "," + line[28:53].strip() + "\n")
    line = i.readline()
i.close()
o.close()
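For anyone who wants something a little less quick and dirty, here is a rough Python 3 sketch of the same job. It reuses the column offsets from the script above; the names `FIELDS`, `strip_zeros`, `parse_line` and `convert` are made up for illustration. One deliberate difference, which is an assumption about what is wanted: the regex above turns an all-zero field such as `0000` into an empty string, while this sketch keeps a single `0`.

```python
import csv

# Column slices copied from the script above: (start, end, strip leading zeros?)
FIELDS = [
    (0, 5, False),
    (15, 19, True),
    (19, 23, True),
    (23, 25, False),
    (25, 28, False),
    (28, 53, False),
]


def strip_zeros(field):
    """Drop leading zeros from a numeric field."""
    text = field.strip()
    if not text:
        return ""  # a blank field stays blank
    # Assumption: an all-zero field should become "0" rather than "".
    return text.lstrip("0") or "0"


def parse_line(line):
    """Slice one fixed-width record into a list of CSV fields."""
    row = []
    for start, end, numeric in FIELDS:
        value = line[start:end]
        row.append(strip_zeros(value) if numeric else value)
    # The last column is space-padded text, so trim it as the original does.
    row[-1] = row[-1].strip()
    return row


def convert(src_path, dst_path):
    """Stream the fixed-width file line by line and write CSV rows."""
    with open(src_path) as src, open(dst_path, "w", newline="") as dst:
        writer = csv.writer(dst, lineterminator="\n")
        for line in src:
            writer.writerow(parse_line(line))
```

Calling `convert("zipcty10", "10.csv")` would mirror the script above. Iterating over the file object reads one line at a time, so the 22 million lines never have to fit in memory, and `csv.writer` quotes any field that happens to contain a comma.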