A CSV file is a text file that uses commas to separate values into columns. This data format has been around for along time and is extremely easy to use, but is also widely misunderstood. With such a simple definition you might be wondering why I’m writing this article. That, or you’re one of the people I work with on a daily basis who are utterly confused when I suggest that a CSV file is anything other than a Microsoft Excel file.
Before I go any further, let’s look at some content that would be typical of a CSV file:
The example above brings me to the first point I’d like to make and it’s one I already made in my opening statement:
A CSV file is a text file.
A text file is a file that contains nothing other than plain old text. Just good old letters, numbers and symbols with no formatting, images or anything fancy. Granted, some text editors (the programs used to read text files, like the infamous Windows Notepad displayed above) add quite a few bells and whistles such as colored text, but for all they do, they’re still aimed at one simple target, the plainest of all files, even if you add a color or two, the text file.
Personally, I think this fact is most often obscured by a major player in the CSV file reading game: Microsoft Excel. As far back as I can remember, computers with Excel installed on them would light up CSV files like Christmas trees and jump at the opportunity to open them when double-clicked. This brings me to my second point:
Be cautious when opening CSV files without a text editor.
Microsoft Excel and their ilk are much more insightful tools than text editors. They’ll try to make sense of the data you’re presenting to them. Give them a value made up exclusively with numbers and they’ll think you’re providing a literal number even if you’re not. Perhaps the following scenario sounds familiar to you:
- You’re given a CSV file that contains values made up purely of numbers that are not actual numbers (e.g., SSNs, employee IDs, etc). Some of the values having leading zeroes.
- You double-click the file and it opens in Excel.
- Upon inspecting the data, you find that leading zeroes, legitimate characters that should be a part of your data collection, have disappeared.
What happened? Excel (or your spreadsheet application of choice) read your data and formatted it as best it could. Would the same thing have happened in a text editor? Absolutely not. So, with a text editor, you get an unadulterated, true vision of the data contained within your CSV file, but with Excel, you don’t.
It’s important to note that despite its flaws, opening a CSV file in Excel, even one that might have trouble in some areas, is something you can do if there are benefits you need to to reap (e.g., if you need to sum a set of numbers). Just don’t save your work. Optionally, if you’re required to use a CSV file in Excel and want to have more control over it’s formatting, consider using the Text Import Wizard. With that, my third point surfaces:
The CSV file extension is more aesthetic than useful.
As I’ve implied throughout this article by constantly equating a CSV file to a text file, you don’t need a CSV extension to have a CSV file. The content of a text file determines whether it’s a CSV file or not, not the extension. So, sure, someone might tell you that they require their CSV files to have a CSV extension, but they’re surely not doing that because there’s a truly legitimate need for it. Instead, someone just put themselves in an awkward box that will limit their accessibility for absolutely no good reason not to mention cause confusion and prompt people like me to write articles like this.
Before you go…
There are some other considerations you should make in regards to using CSV files and even in spite of them:
- Qualify Real Commas – What happens when you need to use a real comma? Qualify the value by wrapping it with something like double quotes, e.g., My Company, Inc. can be written as “My Company, Inc.” Doing so would tell programs reading your data not to consider the comma within the quotes as one that’s separating two different values.
- Commas Aren’t the Only Delimiter – Delimiting data can be done in much more efficient terms than with commas. Tabs, pipes, tildes and any other lesser used characters are probably better delimiters than commas should you have that choice.
- Other Data Formats – Flat files that use a single delimiter to separate one value from another are an aging breed. Newer formats like XML and JSON offer so much more both in terms of clearly identifying values as well as adding a whole new world of possibilities to the dataset as a whole. The next time you have an opportunity to select the data format you work with, considering trying one of them.