Within the past decade, advances in data-driven technologies have fundamentally transformed the corporate landscape. In the past, most business problems could be tackled in Microsoft Excel - today this is no longer the case. There is simply too much data which is only accessible via programming, be it querying SQL databases or key-value stores, fetching data from APIs, scraping data from websites or downloading files from web servers. Transforming the data increasingly requires programming as well, such as parsing text files or applying complex analytical functions. Sometimes the size of data alone is the limiting factor - Excel for instance supports no more than 1,048,576 rows. As time goes on, programming will be the only way to adequately manage and analyze all of this data.
Despite this structural shift in how businesses approach problems, many firms - especially large, established ones - have been slow to extricate themselves from their dependence on Excel. Barring rare exceptions, there is little motivation to train analysts with a 21st-century skill set.
But while there is no direct cost of increasingly obsolete analytical skills, there is an enormous opportunity cost. Generalist analysts (such as investment banking analysts, management consulting analysts, business analysts, etc.) spend hundreds of hours manually traversing and copying data from flat files or PDFs, updating Excel workbooks and aggregating data found across remote corners of the Internet. In short, these are highly manual, highly automatable tasks. Business insights are overlooked when analysts allocate more time to manual work than analytical work. Even worse, analysts are prone to committing costly mistakes due to the sheer volume of manual labor.
The ability to program therefore offers a sizable multiplier on analyst productivity. After all, these are the same savvy analysts now technically equipped to automate otherwise tedious and redundant work. Employers need not (and should not) hire programmers to fill this role - rather, they must train their business-minded graduates to program. They must combine the business analyst skill set with the data analyst skill set. They must teach their analysts to code.
From point-and-click to type-and-enter
Before diving into the exact programming skills required for today's data analyst, it's important to understand conceptually why programming is so important. Using programming and data languages offers the following benefits:
- Adaptability: should the underlying dataset or schema change, the analysis can be quickly reproduced
- Replicability: there is a clear trail of what operations were performed on the data (by contrast, Excel's row or column deletion are "destructive," irreversible operations)
- Modularity: frequently used data operations can be applied to new projects (eg. timezone conversion, report generation, FX conversions)
- Scalability: data can be distributed across file systems (eg. HDFS) and databases (eg. Amazon Redshift), as well as analyzed across multiple cores. This not only means larger data sets can be accessed, but also analyzed at a faster speed
- An open source community: the vast majority of useful libraries for data analysis are open-source and actively maintained (eg. pandas, sci-kit learn, dplyr, ggplot2), meaning they generally get new features and bug fixes faster than closed-source counterparts (eg. Tableau, Excel)
- Systematic validation: unit tests can be used to validate the data at various stages along the pipeline (eg. ingestion, transformation)
- Advanced statistical modeling: most complicated and novel statistical models are not offered by closed-source software
- Complex data transformations: unlike software with a UI, whose data operations are constrained to what the core developers expose, programming languages have no limitation on what you can do with the data (eg. up/downsampling, string manipulations)
- Rapid visualization: easily facet the data across different dimensions and produce graphs without transforming the underlying data
While these benefits primarily concern data analysis, there is an added bonus: programming facilitates an understanding of non-analytical fields in data science, such as data engineering or webapp development. This "vertical integration" along the data science stack is non-trivial - it means you are able to manage your data from start to finish.
Hard technical skills for the modern business analyst
Data analysts should be able to tackle any business problem which involves, perhaps unsurprisingly, both (1) data and (2) analysis. For example, if data is stored in a database, then a programming language alone is insufficient to solve the business problem - a data language is required. Therefore, proficiency in these technologies (ordered from most to least commonly used) should cover the vast majority business problems faced by analysts:
- Microsoft Excel for basic data operations, formatting and organization
- SQL for large-scale data storage and querying
- A programming language for data-driven workflows
- A version control system (VCS) for saving periodic work revisions
- Bash scripting for file system operations and navigation
- NoSQL for schemaless data storage and fetching
- A business intelligence (BI) tool for analyzing and visualizing data
- Web development for sharing data analyses and visualizations (optional)
- Big data familiarity (this is a broad topic which I explore in a separate post)
I hope this post clarified the necessity of programming in solving business problems. Business analysts will always be on the front lines when it comes to working with data, and as a result they should be equipped with the right tools to handle the data.
These data analysis skills need not replace their business acumen - on the contrary, learning to code will only enhance it.