In the world of data, CSV is the cockroach of file formats. It's simple, resilient, and seemingly impossible to kill off. While flashier formats have come and ...
While CSV and Excel can be enough for many kinds of data-handling, it is also very error-prone. Handling of commas and quotes is tricky, and excel adds another layer of complexity by randomly interpreting data as dates or currency or whatever.
Most (not all) of the interpretation issues in Excel are because people don’t know how to use it and click ok without reading message boxes.
Modern Office 365 Excel on the desktop asks if I want to convert numbers as text with leading zeroes into numerals (eliminating the leading zeroes) whenever I open a csv that has them in it, and has had this feature for at least a month now. You just hit no.
For more complicated data, open a new spreadsheet, go to the data tab, import from text or csv, select your csv. In the data import wizard thing that pops up, select every column and set format to text. Boom. Problem solved, no data mangling. Delete the link back to the source csv so there’s no weird sync being attempted, and the data is just flat data in your spreadsheet in table form.
For commas in “cell” contents in your source csv file, wrap the contents in quotes and excel won’t treat the comma within as a column separator. Exporting csvs from PowerShell does this automatically for string data.
Personally, I always try to keep the flat csv output as a separate copy from the xlsx file I format for human reading. Csv for at rest data storage, xlsx for display. Non-cosmetic edits get worked back into the program generating the csv, or I whip up a basic PowerShell script to import the csv data in, work with it as objects, then export back to csv.
Mixing the two use cases of display and data processing is where the footguns are all hidden. Business users absolutely deserve a better set of tools than Excel for data manipulation. It works so much better as just formatting.
You’re absolutely right. I only learned recently that there are different import options for CSV, and just opening the file doesn’t give all the options. You have to go to data -> import to have all the extra options, like setting all the desired data types.
While CSV and Excel can be enough for many kinds of data-handling, it is also very error-prone. Handling of commas and quotes is tricky, and excel adds another layer of complexity by randomly interpreting data as dates or currency or whatever.
Please pardon the ramble.
Most (not all) of the interpretation issues in Excel are because people don’t know how to use it and click ok without reading message boxes.
Modern Office 365 Excel on the desktop asks if I want to convert numbers as text with leading zeroes into numerals (eliminating the leading zeroes) whenever I open a csv that has them in it, and has had this feature for at least a month now. You just hit no.
For more complicated data, open a new spreadsheet, go to the data tab, import from text or csv, select your csv. In the data import wizard thing that pops up, select every column and set format to text. Boom. Problem solved, no data mangling. Delete the link back to the source csv so there’s no weird sync being attempted, and the data is just flat data in your spreadsheet in table form.
For commas in “cell” contents in your source csv file, wrap the contents in quotes and excel won’t treat the comma within as a column separator. Exporting csvs from PowerShell does this automatically for string data.
Personally, I always try to keep the flat csv output as a separate copy from the xlsx file I format for human reading. Csv for at rest data storage, xlsx for display. Non-cosmetic edits get worked back into the program generating the csv, or I whip up a basic PowerShell script to import the csv data in, work with it as objects, then export back to csv.
Mixing the two use cases of display and data processing is where the footguns are all hidden. Business users absolutely deserve a better set of tools than Excel for data manipulation. It works so much better as just formatting.
You’re absolutely right. I only learned recently that there are different import options for CSV, and just opening the file doesn’t give all the options. You have to go to data -> import to have all the extra options, like setting all the desired data types.