An open letter: Why analytics organizations in Southeast Asia must leave SAS behind to progress

Nick Huber
8 min read · Feb 25, 2020

After I graduated from college in 2009 with my shiny, near-top-of-my-class degree in Economics from Harvard, I took a 3-month internship on Facebook’s Monetization Analytics team. My newly formed team’s core responsibility was to use various Facebook data sources (e.g. engagements, clicks, surveys, status updates) to analyze the effectiveness of large brands’ (e.g. Pampers, Gatorade) Facebook campaigns, and to share these insights with our advertisers.

At first, everything was going so well. My manager was a caring, detail-oriented analyst with past experience at Google and an adorable dog. The canteen served scrambled eggs every morning and wonderful brown sugar lattes in the afternoon. I was ready to slice and dice some big data into meaningful stories for our clients. Then I discovered that the closed-source, proprietary languages that I had spent so much effort mastering as an undergrad were completely unused at Facebook. Rather than commercial analytics tools like Stata or SAS, we were expected to learn and use Python or R, the core tools that were already the lingua franca of the burgeoning field of data science.

Having taught myself Python and become a rabid enthusiast and practitioner over the following 10 years, I have to admit my former Facebook team was right: these open languages (let’s save the Python vs. R debate for another time) are vastly superior. In this post, I seek to convince business, analytics, and IT executives in Southeast Asia of their critical strategic value over their closed-source, commercial alternatives. I mostly focus on comparing Python with SAS specifically, as the former has become my analytics language of choice and the latter is still hugely present across Southeast Asia.

The core deficiencies of SAS vs. Python/R

There are so many reasons to favor Python/R over SAS — take your pick!

#1: Anything you can do in SAS, you can do in Python/R — and more.

Essentially the only people that develop new features, interfaces, and libraries for the SAS language/platform are the SAS Institute developers. In contrast, millions of users all around the world write, contribute, test, and extend Python and R every day through their most popular packages/modules. For instance, it took SAS more than 10 years to incorporate random forests, a standard data science technique, into their basic toolkit, as compared to one year for R:

Random forests is a case in point. Breiman and Cutler published their seminal article describing the technique in October, 2001; the following year, they published the randomForest package in R. In December, 2012, SAS released an “experimental” version of what it calls “HP Forests” in SAS High Performance Analytics, and in 2013 included the PROC in SAS Enterprise Miner 13.1.

In Python, when a new deep learning technique is published, there are contributors all around the world racing to be the first to implement and share it in PyTorch or an equivalent library. When developing variants of neural network architectures got too cumbersome, Keras was developed to provide higher-level objects for designing these models more easily, becoming the “grammar of neural nets” that other libraries later freely copied (with no complaint from the Keras authors that I’m aware of). Core data science Python libraries (numpy, scipy, pandas, jupyter, matplotlib, astropy) were even used to capture the first human-visible picture of a black hole last year. Simply none of this happens in the SAS ecosystem!
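To make the random forests point concrete, here is a minimal sketch of what fitting one looks like in Python today. It assumes the open-source scikit-learn package is installed (my choice for illustration; any comparable library would do) and uses synthetic data rather than anything from a real project.

```python
# Minimal sketch: train and evaluate a random forest on synthetic data.
# Assumes scikit-learn is installed; the dataset is generated, not real.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Generate a small synthetic binary classification problem.
X, y = make_classification(
    n_samples=500, n_features=10, n_informative=5, random_state=0
)

# Hold out a quarter of the rows for evaluation.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Fit a forest of 100 trees and score it on the held-out rows.
model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_train, y_train)
accuracy = model.score(X_test, y_test)

print(f"held-out accuracy: {accuracy:.2f}")
```

The whole thing is about a dozen lines, and swapping in a different model class usually means changing only the import and the constructor.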

#2: Python/R leverage large ecosystems of tools built on top of them, giving their users superpowers!

Beyond the many contributors to the core modules of Python/R, these languages also have a vibrant ecosystem of tools on top of the language itself that make them even more powerful.

For instance, the Jupyter notebook was originally developed for Python and has since become a core tool for interactive data exploration and modeling, allowing analysts to see the results of their code as they run it. Google Colaboratory, MLflow, Weights & Biases, Mode Analytics, and a whole host of other browser-based tools have further built on this core abstraction, each adding its own combination of collaboration, logging, and versioning features. None of them runs SAS or offers a SAS option. Airflow, a core tool for orchestrating ETL jobs across multiple sources, lets you template and deploy jobs in Python, but not in SAS.

On a practical level, this combination of a more expressive language and a vibrant ecosystem of related tools meant that, on my previous team, it was common for a junior data scientist using standard Python tools to improve, in 4+ weeks, on a model or line of analysis that a team of 3 analysts using SAS had spent 6–12 months unsuccessfully trying to improve. Your mileage may vary, but this, at least, has been mine.

#3: It’s completely unclear how to scale big data jobs in SAS.

Because SAS is a proprietary language, it’s non-trivial to scale up to more machines if you want to run a bigger data processing job than a single machine can handle. Server pricing isn’t even specified on the SAS website, and is apparently only provided on a custom quote basis. In contrast, for Python (and its most common modules), anyone can get an instance running in minutes on their favorite cloud platform, be it AWS, GCP, or Azure. For truly large data processing jobs, you’ll likely leave both Python and R behind in favor of more specialized massively parallel processing languages/frameworks, but you’re certainly in better shape with an open language than with SAS if/when this becomes a requirement.

In writing this blog post, I attempted to write some SAS code again to do a side-by-side comparison of a few standard benchmarks, and subsequently discovered it doesn’t even run on macOS!

#4: SAS is horrifically expensive vs. open-source.

SAS nominally starts at $8,700 per user per year for its most basic version, but it is common to spend $10–15k/seat/year for extra features. It gets even worse: prices typically rise 20–30% after the first year, as teams and their processes get locked into using SAS. Even within organizations, it’s very common for only 1–2 people to truly understand the SAS budget line item, as SAS pricing is so opaque. Even the SAS How to Buy website frustratingly emphasizes that “one size does not fit all.”

From talking to analytics leaders at banks and telcos across Southeast Asia, I’ve gathered it’s not uncommon for teams of 30–100 analysts to be spending $250,000 to $1 million/year on SAS.
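As a rough sanity check on those figures, here is a back-of-the-envelope sketch in Python. The function names are hypothetical, the inputs are the approximate numbers quoted above (not official SAS pricing), and I am assuming a single price increase after the first year, since I don't know how renewals are actually structured.

```python
# Back-of-the-envelope license arithmetic using the rough figures quoted above.
# All names here are illustrative; none of this is official SAS pricing.

def annual_license_cost(seats: int, price_per_seat: float) -> float:
    """Yearly spend for a team, given a flat per-seat price."""
    return seats * price_per_seat


def price_after_increase(price_per_seat: float, increase: float) -> float:
    """Per-seat price after a one-time fractional increase (e.g. 0.2 = 20%)."""
    return price_per_seat * (1 + increase)


# Small team on the basic tier vs. a large team paying for extra features.
small_team = annual_license_cost(seats=30, price_per_seat=8_700)
large_team = annual_license_cost(seats=100, price_per_seat=10_000)

# Assumed scenario: the small team's price rises 20% in year two.
year_two_price = price_after_increase(8_700, 0.20)
small_team_year_two = annual_license_cost(seats=30, price_per_seat=year_two_price)

print(f"30 seats, year one:  ${small_team:,.0f}")
print(f"100 seats, year one: ${large_team:,.0f}")
print(f"30 seats, year two:  ${small_team_year_two:,.0f}")
```

Thirty seats at the entry price and a hundred seats at $10k each land at roughly the two ends of the range I’ve heard from analytics leaders, which suggests those anecdotes are at least consistent with the list prices.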

Don’t get me wrong, there was a time and place for SAS. During the era of client-server computing, it was instrumental in making usable analytics software available to large numbers of analysts. But in this day and age of cloud computing, no aspect of SAS technology justifies this massive recurring investment, in my opinion. Much more value (and at a lower cost) can be unlocked with modern, open-source equivalents.

It should be mentioned that open-source technologies like Python/R aren’t completely free, of course: you still need some virtual or physical hardware to run them, plus a handful of managed services that fill in for (and improve upon) parts of the SAS suite. But these costs are peanuts compared to an analytics team’s typical yearly SAS license fees and the opportunity cost in lost productivity and man-hours.

If SAS is outdated, why is it still so prominent?

SAS has grown in Southeast Asia mostly through four key pillars: its penetration into education, effective enterprise sales strategies, brand strength/recall, and inertia.

Education

Education was always key to SAS’s growth strategy, perhaps knowing that the tools students learned in school would later create demand for these tools in students’ future workplaces. In 1997, the SAS Institute created an education division to focus on education partnerships and, in 2000, ran a $30 million nationwide television and radio campaign to advertise the free licenses it had so generously provided to North Carolina schools. [link] After the 2000s, SAS seemingly used a similar strategy to grow outside of the US (e.g. giving free/discounted SAS licenses to colleges and universities in SEA), and has since developed a geographically diverse revenue base, with ~40% of its revenue coming from Europe and ~15% from Asia in 2018.

Sales strategy

Arguably, SAS has had a very successful enterprise sales strategy across Southeast Asia, focused on building a strong community of SAS users:

  1. provide free or discounted books and training materials on data transformation to analytics and IT teams, all focused on the role of SAS
  2. provide free or discounted licenses to schools and universities to generate demand/pre-employment vendor lock-in

Open-source solutions and web-based solutions historically couldn’t offer this high-touch education/enterprise sales experience.

In recent years, SAS has not taken the rise of open-source technologies lying down. While open-source software such as Python, R, Airflow, Spark and too many others to mention have played a foundational role in modern big data, massively parallel infrastructures and advances in machine learning and artificial intelligence, SAS has taken the opposite tack and even hired communications firms to issue a report warning CIOs on the dangers of having too much open-source in their technology stacks…all while packaging open-source software with its core offerings.

Brand

SAS was able to become synonymous with data analytics in the 1990s-2000s, when analytics was first emerging as a critical area of investment for businesses. After years of enterprise sales investments, lots of CIOs/CEOs naturally think analytics = SAS, as in “Oh, do we do analytics? Yeah, we have a great relationship with SAS.”

Inertia

We all hate change! Whether it’s people starting new work-out routines, trying to stop smoking, or organizations trying to go digital, there is a large activation energy threshold to any kind of change, and more so the larger the organization. Even if the C-suite starts to suspect SAS might not be worth the investment, any good leader knows that the complexity and cost of any transition are almost always underestimated. Further, the expected payoff of such a transition perhaps didn’t historically justify the risk of proposing it.

Throwing down a challenge

Based on past experience and conversations with the largest analytics-consuming firms in Southeast Asia (e.g. banks, telecom operators, life science and pharmaceutical companies), I estimate that SAS revenue in the region runs into the millions of dollars. And I have yet to see a use case or SAS feature that could not be delivered faster, cheaper, better, and just as securely with Python/R tools.

With this in mind, I would like to throw down a challenge. I am issuing an open promise to any analytics and IT leader in Southeast Asia: if you are willing to open up about your licensing costs (no judgment!) and your team’s processes, use cases, and requirements, I am happy to respectfully partner with you to deploy analytics capabilities that (1) are significantly better, (2) your team will love, and (3) will cost your organization 50 to 90% less than your existing closed-source analytics software suite, simply by leveraging a combination of modern, open-source, and cloud-/web-native solutions tailored to your specific needs.

It’s difficult transitioning towards new tooling (I had to do it myself in the decade after Facebook), but it’s so, so worth the effort — for yourself and your organization.


Nick Huber

Hi, I’m Nick! 👋, a self-taught data scientist 📈, programmer 🖥️, and part-time investor 💵. VP at Thinking Machines, prev data science at Airbnb, Quora, FB.