Sometimes when dumping data, it makes more sense from a performance perspective not to worry about removing duplicate rows in the SQL query itself and to remove them after the fact instead. This is easy to do in the shell:
sort --buffer-size=32M filename.csv | uniq > filename_uniq.csv
You can omit the --buffer-size argument to let sort use its default, or set it to whatever size you want.
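To see the effect on a small input, here is a minimal sketch. The sample rows are made up for illustration, and it assumes a sort that accepts --buffer-size (GNU coreutils does). The sort step matters because uniq only collapses duplicate lines that are adjacent.

```shell
# Hypothetical sample data with two duplicated rows.
printf 'b,2\na,1\nb,2\nc,3\na,1\n' > filename.csv

# Sorting places identical lines next to each other,
# then uniq drops the adjacent repeats.
sort --buffer-size=32M filename.csv | uniq > filename_uniq.csv

# Show the de-duplicated result.
cat filename_uniq.csv
```

With this input, filename_uniq.csv should end up with three lines, one per distinct row.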
Thursday, March 27, 2008
1 comment:
Even better, sort has a -u option so you don't have to use uniq.
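A minimal sketch of that variant, using made-up sample rows and again assuming a sort that accepts --buffer-size. For plain whole-line sorting like this, it should produce the same file as piping through uniq:

```shell
# Hypothetical sample data with two duplicated rows.
printf 'b,2\na,1\nb,2\nc,3\na,1\n' > filename.csv

# -u makes sort emit only one line from each run of equal lines,
# so the separate uniq step is not needed.
sort -u --buffer-size=32M filename.csv > filename_uniq.csv

# Show the de-duplicated result.
cat filename_uniq.csv
```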