Wednesday, September 21, 2011

Parsing Files with Python

Recently, I've had to work with a lot of data stored in various text files: comma- and tab-delimited, fixed-width, etc. Based on a little research, I decided that Python would be my tool of choice for parsing and manipulating these files before inserting them into an Oracle database.
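For the delimited formats, Python's standard-library `csv` module does most of the work. The same `csv.reader` handles comma- and tab-delimited data; only the `delimiter` argument changes. Here is a minimal sketch that uses small made-up samples held in memory (`io.StringIO`) in place of real files. The helper name `read_delimited` is just for illustration.

```python
import csv
import io

def read_delimited(text, delimiter=","):
    """Parse delimited text into a list of rows (each row is a list of strings)."""
    # csv.reader accepts any iterable of lines; io.StringIO stands in for an open file
    return list(csv.reader(io.StringIO(text), delimiter=delimiter))

# Made-up sample data: the quoted field keeps its embedded comma
comma_rows = read_delimited('1,"Smith, John",42\n2,Doe,7\n')
tab_rows = read_delimited("1\tSmith\t42\n2\tDoe\t7\n", delimiter="\t")

print(comma_rows)
print(tab_rows)
```

With a real file you would pass the file object directly, e.g. `csv.reader(open(path, newline=""), delimiter="\t")`. The `newline=""` argument is what the `csv` documentation recommends so that quoted fields containing line breaks are read correctly.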

My most recent task: extract several columns from a fixed-width file with over 22 million lines, remove leading zeros from two of those columns (turning them into proper numbers), and write the results to a new CSV file.

Here's what I came up with. It's quick and dirty, but it works.

Python code:

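import csv

def strip_zeros(field):
    # Remove leading zeros; keep "0" for an all-zero field and "" for a blank one
    field = field.strip()
    return field.lstrip("0") or ("0" if field else "")

# Python 3: stream the input line by line; csv.writer quotes any field that contains a comma
with open("zipcty10") as i, open("10.csv", "w", newline="") as o:
    writer = csv.writer(o, lineterminator="\n")
    for line in i:
        writer.writerow([line[0:5], strip_zeros(line[15:19]), strip_zeros(line[19:23]),
                         line[23:25], line[25:28], line[28:53].strip()])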
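One way to make the slicing easier to maintain is to describe the layout once, as a table of (start, end) offsets, and build each output row from it. The sketch below reuses the same offsets as the script above. The column names are hypothetical labels, since the post doesn't say what each field holds, and unlike the script it strips whitespace from every field. It still reads the input one line at a time, so memory use stays flat even for a 22-million-line file.

```python
import csv
import io

# Hypothetical column names; offsets match the slices in the script above.
# Each entry: (name, start, end, strip_leading_zeros)
LAYOUT = [
    ("col_a", 0, 5, False),
    ("col_b", 15, 19, True),
    ("col_c", 19, 23, True),
    ("col_d", 23, 25, False),
    ("col_e", 25, 28, False),
    ("col_f", 28, 53, False),
]

def parse_line(line):
    """Slice one fixed-width line into a list of cleaned field values."""
    row = []
    for _name, start, end, strip_zeros in LAYOUT:
        value = line[start:end].strip()
        if strip_zeros:
            # Drop leading zeros but keep "0" for an all-zero field
            value = value.lstrip("0") or ("0" if value else "")
        row.append(value)
    return row

def convert(infile, outfile):
    """Stream fixed-width lines from infile and write them as CSV rows to outfile."""
    writer = csv.writer(outfile)
    for line in infile:
        writer.writerow(parse_line(line))

# Usage with a made-up 53-character record held in memory
sample = "01234" + "x" * 10 + "0042" + "0000" + "AB" + "CDE" + "Some Place".ljust(25) + "\n"
out = io.StringIO()
convert(io.StringIO(sample), out)
print(out.getvalue())
```

With real files, the call would be along the lines of `convert(open("zipcty10"), open("10.csv", "w", newline=""))`. Adding a column then means adding one line to `LAYOUT` rather than editing a long concatenation.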
