Happy new year 2020! This is my first article of 2020. Since completing my career transition from Excel analyst to R data analyst, I have met a lot of people with a similar background who asked me: “How did you become a data analyst?”, “Can I become a data analyst too, even though I am not a computer science graduate?”, “How should I begin?”, and so on.
If you take a look at my LinkedIn profile, you will notice that my background is unusual and oddly out of place for a data analyst. I graduated in political science and development studies and worked at a public relations firm, but somehow managed to pick up the working knowledge of a data analyst.
According to the Stack Overflow 2019 developer survey, only 1.8% of the developers surveyed came from a social science background. I personally think data analyst jobs are the friendliest way for a non-STEM graduate to enter the data professional market. The entry barrier is probably much lower than for software developers, because Microsoft Excel is so popular that sometimes you do not need to learn how to code at all.
Excel: a weird love story
Actually it’s neither as magical nor as hard as it sounds. During the early days of my career, I had a short internship at an investment institution. Even though I didn’t continue my career as a financial analyst, a coworker taught me how to use VLOOKUP and Excel for financial analysis, which I kept using to manage my own investment portfolio. When I studied for my graduate degree, I also used Excel for my data analysis, which followed a mixed-methods approach: qualitative coding and an online survey. During my tenure at Edelman, I also did a lot of manual coding for sentiment analysis, data aggregation with pivot tables, and thematic coding for our clients during communication crises.
However, I never explored the full extent of automation until I moved to my current office.
The experience was not the most pleasant, as the previous data analyst had basically committed the regular sins of Excel users: copy-pasting tables from various Excel files into a single Excel file, which was then referred to as a “database”. Moreover, the previous analyst did not use VLOOKUP and spent a lot of time connecting one table with another using the search button on pivot tables in which the key had been split at its delimiter. I am actually impressed that the previous data analyst could stay sane while working this way for 5 years, since it took me only 3 months before I almost lost my sanity due to the amount of manual labor. I had a very hard time meeting my coworkers’ expectations to finish my reports on time (by the way, the previous data analyst is a computer science graduate, and I am still amazed she didn’t code).
That’s when I finally googled “Automation in Excel” and learned that you can actually automate your data cleaning in Excel using an add-in called Power Query.
After watching various tutorials here and there, I decided to implement Power Query in my work, replacing the manual labor in Excel. Whenever I had finished the manual work for the day, I went home and tried to reproduce what I had done at the office. It took me about 1-2 months before I was finally able to automate everything with Power Query. At last I could breathe a sigh of relief: I had more time to create insight instead of racing against time to copy-paste tables.
After using Power Query, I decided to up my game and learn Power BI as well. Because it is very similar to Power Query, I had no trouble switching to Power BI, learning to visualize my data, and creating a dashboard shared with my team for transparency and accountability.
At this level, I had entered a comfort zone, as I only needed to press a button to do my data transformation. But because my office laptop was not that powerful (a Lenovo V310 with 8 GB of RAM), it often crashed, and I ended up hard-resetting it. Whenever I googled “data transformation”, the result was always “learn how to code!”. That was the moment when I wondered to myself: “Me? Learn how to code? Impossible, I probably wouldn’t be able to do it.”
However, after reading so much news here and there about big data and how Excel analysts should pick up one or two languages to keep up with market trends, I couldn’t help but feel FOMO: if I didn’t learn to code, I risked becoming irrelevant a few years from now.
R or Python?
Choosing my first language was not an easy decision. I was aware of Python’s popularity among tech companies, as if it were the magical language that runs everything, and that R is a more specific, niche language catering to academics and statisticians. I decided to learn R because I felt THIS was the language that should have been taught during my academic study. In my mind, I was 14 years late in learning R.
I took several R courses on Udemy taught by Kirill Eremenko from SuperDataScience and was impressed by the course contents, as the teacher made learning a programming language feel “fun” and “easy”. It took me 3-6 months to finish the basic and advanced R courses from SuperDataScience.
When did I find time to learn, you ask? After office hours, of course. There were days when I sacrificed my free time, learning how to code by watching the R courses until 11 PM, on weekdays and weekends alike.
After finishing the courses, I picked up Jonathan Ng’s Tidyverse course on Udemy and learned about the existence of the tidyverse. That’s when things began to change: coding in R became so intuitive with the tidyverse.
After learning enough basic R and tidyverse, I was finally confident enough to reproduce my Power Query work in R. And yes, as usual, I used my after-office hours and my weekends to code. It took a lot of trial and error, but I finally managed to automate my tasks with R. I still use Excel to maintain a spreadsheet, as not everything can be automated and there is still some manual work to do, such as data entry, but it’s quite manageable.
Perhaps my greatest regret is that I didn’t learn to code sooner.
Most of my “default” network, aka my friends from university, are political science graduates. We often joked that we studied political science so we wouldn’t need to learn maths or complicated stuff such as coding. I graduated from the international relations department. Only one teacher in the faculty had a quantitative economics background, and I had only one introduction to statistics class, where we were taught to calculate things using… paper (this was 2005).
Similarly, during my time studying at Massey University, my lecturers were more proficient in qualitative and participatory research approaches, and their paradigm was along the lines of “let’s not oversimplify development into numbers”. That’s quite a valid point, but after I graduated, I found that qualitative research skills are valued less than quantitative research skills. My sister got her doctoral degree in psychology from the University of Queensland, but she didn’t learn to code either, so no one ever told me how useful it was.
However, after learning to code with R and pandas, I think coding is not that incredibly difficult if you know what you want to do and know how to achieve it using the right verbs. Just like learning a natural language, there is a learning curve. When I studied English for the first time, I struggled to form a sentence because I couldn’t find the right verbs and nouns to express my opinion. It’s similar with a formal language such as R or Python: you need to know the right verbs and nouns to form a sentence and create something out of them. In 2020, I decided to write more in Python, as I am confident in my proficiency in R and it’s time for me to move out of another comfort zone.
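To make the “verbs” idea a bit more concrete, here is a minimal sketch in plain Python of three verbs an Excel user already knows: looking up a value (like VLOOKUP), filtering rows, and totalling per group (like a pivot table). The sample data and the function names `lookup`, `keep`, and `summarise` are made up for illustration; they are not from any library, and this is not the code I use at work.

```python
from collections import defaultdict

# Hypothetical sample data, invented for illustration only.
sales = [
    {"client_id": "C1", "amount": 120},
    {"client_id": "C2", "amount": 80},
    {"client_id": "C1", "amount": 50},
    {"client_id": "C3", "amount": 30},
]
clients = {"C1": "Acme", "C2": "Globex"}  # lookup table: id -> name


def lookup(rows, table, key, new_column, default="unknown"):
    """Verb 1, 'look up': add a column by matching a key, like VLOOKUP."""
    return [{**row, new_column: table.get(row[key], default)} for row in rows]


def keep(rows, predicate):
    """Verb 2, 'filter': keep only the rows that pass a test."""
    return [row for row in rows if predicate(row)]


def summarise(rows, group_by, value):
    """Verb 3, 'group and sum': total a value per group, like a pivot table."""
    totals = defaultdict(int)
    for row in rows:
        totals[row[group_by]] += row[value]
    return dict(totals)


# Chain the verbs into a "sentence": look up names, drop unmatched rows, total.
named = lookup(sales, clients, "client_id", "client_name")
known = keep(named, lambda row: row["client_name"] != "unknown")
totals = summarise(known, "client_name", "amount")
print(totals)  # {'Acme': 170, 'Globex': 80}
```

In day-to-day work you would probably not write these helpers yourself: pandas (`merge`, `groupby`) and the tidyverse (`left_join()`, `filter()`, `summarise()`) ship the same verbs, but the sentence you build with them has the same shape.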
I am not sure whether my lack of coding experience is universal, because I have noticed that some people in my network who graduated in political science picked up R during their graduate studies but never used the skill in their careers.
My biggest criticism is probably directed at my previous schools and at all current schools that don’t teach students how to code: you are not equipping your graduates with basic skills for the big data era. OK, not everyone needs to learn how to code, but I think it’s good to offer it as a non-compulsory option so students can decide for themselves whether they want to join that market.
- Can a non-STEM graduate learn how to code? Absolutely. Look at me, someone who avoided programming like the plague for almost his entire life.
- Should I learn R and/or Python immediately? It depends! I would recommend learning Excel first and familiarizing yourself with pivot tables, VLOOKUP, Power Query, and Power BI.
- Why should I learn Excel first if you say R/Python are superior for data analysis? Because I personally think Excel is still useful for picturing what you can do with R/Python. If you know how to do a data transformation in Power Query, its user-friendly GUI helps you picture what the same step is doing in R/Python. And if you are feeling confident, no one forbids you from going ahead and learning R/Python right away.
- Should I learn VBA? Maybe not, unless you work in finance, where VBA seems pretty popular. Microsoft doesn’t seem to be actively developing VBA anymore, and I personally think it’s a legacy language. I didn’t learn VBA because Power Query was enough for everything I needed.
- Which one should I pick, R or Python? I recommend learning both and picking your “favorite”. I learned R first, then picked up the tidyverse, and ended up liking R more. Python is more widely used and will open up more of the job market for you. I coded exclusively in R for the past year and am now trying to polish my Python skills; maybe it would have been better had I learned Python first. Whichever language you choose, polish it before learning another. I chose to code in Python in 2020 because I am confident enough that I will not forget what I learned in R.
- Which courses / resources should I use? There are plenty of resources out there; choose the ones that suit your time and your pocket. I prefer to buy from Udemy, as the courses seem more practical. Believe me, you will spend a lot of time reading discussions on Stack Overflow when you are stuck.
- Is data analyst a fun job? It depends on your personality. I find data analysis and coding “fun” because they create value out of something untidy, just like cooking or building IKEA furniture. For some people it’s pure torture. I know some people who learned R during undergrad or grad school but haven’t used it since, because they are not interested in data analyst jobs and don’t feel any pressure to become a data analyst at all.
So tell me, as a non-STEM graduate, are you still interested in walking into uncharted territory?
Crossposted from my LinkedIn article.