The Mathematician Behind the Data Scientist!

As Data Science grows more popular, more and more enthusiasts dream of becoming Data Scientists or try to break into the field. However, the biggest question for an entry-level enthusiast remains “How much Mathematics?”. Data Science is the new new. Numerous blogs, online write-ups and online courses cover the subject. The Internet is flooded with resources and courses such as “Become a Data Scientist in a jiffy” or “Learn data science”, but what most of us don’t really understand is where to start learning. Even though learning paths and elaborate step-by-step guides are available, it is very hard to stop browsing for more and getting confused about which path to follow. That in turn often leads to procrastination and makes it difficult to even start. Below is a screenshot of a Quora question, and you can clearly see the numerous related questions on the subject.

[Screenshot: Quora question with its list of related questions]

How good do you need to be at Mathematics to become a Data Scientist?

As I say all this, I don’t claim to be an expert or a data scientist. I am one of those enthusiasts who has just crossed that initial stage. In other words, I do have some knowledge but need to learn a lot more, and it seems like a never-ending sea. When you get into this field, you will realize that everything you learn leads to some other concept, and you will feel like a novice all along. People who are already in the field will probably understand the feeling better. Having said that, I have no intention of discouraging people who want to become the most sought-after tech people in the industry. But if you are planning to enter the field, be prepared to feel like a lifelong student and be ready to keep updating yourself to stay at the cutting edge. Data Science is an active research field, with loads and loads of research papers being published every day. It is hard to keep track of which algorithms become outdated with each new piece of research. However, one thing stays the same behind all the change and updates: the Mathematics behind the algorithms. So that is a good place to start. Mathematics is the basic foundation of data science, and if you skip that part, trust me, you will never feel comfortable with the subject.

Now, some people will think that they learnt a lot of mathematics in their school or college days, so they just need to revise the concepts and are good to go. Then there is another category of people who find it overwhelming to start at the basics and reach advanced mathematics in a short span of time. (I used to be in the second category :|).

Now my advice to the first category would be: Great work guys!! If you have that confidence, I salute you, because I do not remember more than 10% of the mathematics that I was taught at school. (I was a lazy student back then. Ok. Maybe even now I’m no better!! ;]). However, you do need to go through all the concepts with a different mindset here. We are not going to solve equations by hand (computers are made to do that part for us). So it is better to take an application-oriented approach while revisiting the concepts. (I assure you that you have already learnt more than 80% of the required basic mathematics in some form or other if you have a Science background.) For every mathematical concept you read about, try to learn its applications in Data Science. Many common intuitions come with heavy terminology that you are expected to know. For example, we all know that division by zero is invalid in maths, and so are many operations with infinity. We also work with approximations like “-0.00000000000004 ≈ 0” or “infinity + 3 = infinity” in practice. This is a very common intuition. However, in numerical computation these effects are given names: rounding a tiny number to zero is called underflow, and a number growing too large to represent is called overflow. So if you go through the applications of the concepts in data science, you will gain a thorough understanding, which will help you a lot when you dig further into algorithms and start tweaking them.
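To make the underflow and overflow intuition concrete, here is a minimal sketch in plain Python (standard library only). The variable names and the `stable_softmax` helper are my own illustrative choices, not taken from any particular course or library. It shows a number overflowing to infinity, a number underflowing to zero, and one common trick (subtracting the largest score before exponentiating) that avoids overflow when turning scores into probabilities.

```python
import math
import sys

# Largest and smallest positive normal double-precision floats.
largest = sys.float_info.max
smallest = sys.float_info.min

# Overflow: a result too large to represent becomes infinity.
overflowed = largest * 10

# Underflow: a result too close to zero is rounded to exactly 0.0.
underflowed = smallest * 1e-20

# Infinity absorbs finite additions: "infinity + 3 = infinity".
inf_plus_three = math.inf + 3


def stable_softmax(scores):
    """Turn scores into probabilities without overflowing.

    Hypothetical helper for illustration: subtracting the largest score
    keeps every exponent at or below zero, so math.exp never overflows.
    """
    shift = max(scores)
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# math.exp(1000.0) on its own would raise OverflowError,
# but the shifted version handles the same scores without trouble.
probabilities = stable_softmax([1000.0, 1001.0, 1002.0])
```

The shift does not change the result mathematically, because the common factor cancels in the division. It only changes which intermediate numbers the computer has to represent.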

[Image. Source: http://davidmlane.com/hyperstat/humor.html]

Coming to the second category, let me tell you, I know the pain. I have felt it myself. The field of mathematics is huge, and we feel the time to learn is limited. But do not worry. I will list some resources that you can use to grasp the concepts and applications in a short amount of time. The key is persistence. In this case too, avoid just cramming the concepts, and go into detail about the applications instead. The applications will help you understand the concepts themselves better. Below I list the most basic branches and a few topics from each, which will help you get started. And who knows, once you finish this, you might already know most of the basic data science algorithms.

Know these Concepts

  • Linear Algebra
    • Scalars, Vectors, Matrices, Tensors
    • Types of Matrices
    • Trace, Span, Determinant and Rank of Matrices
    • Norms
    • Eigenvalues and Eigenvectors
    • Eigendecomposition
    • Singular Value Decomposition
  • Probability
    • Random Variables
    • Probability Distribution Functions (Discrete and Continuous)
    • Marginal and Conditional Probability
    • Chain Rule
    • Expectation, Variance and Covariance
    • Independence and Conditional independence of events
    • The Central Limit Theorem
  • Information Theory
    • Common Probability Distributions
    • Properties of Common Functions
    • Bayes’ Rule
    • Technical Details of Continuous Variables
  • Optimization
    • Gradient-Based Optimization
    • Constrained Optimization
    • Lagrange Multipliers

P.S.: This is not an exhaustive list, and it is in no particular order.
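As one example of how a topic from the list shows up in practice, here is a minimal sketch of gradient-based optimization: fitting a straight line to a handful of points by repeatedly stepping against the gradient of the mean squared error. The function name `fit_line`, the learning rate, the step count and the toy data are all assumptions chosen for illustration. Real projects would normally use a library for this.

```python
def fit_line(xs, ys, learning_rate=0.05, steps=2000):
    """Fit y = w * x + b by plain gradient descent on mean squared error.

    Illustrative sketch only: the hyperparameters are picked to work on
    the small toy data set below, not tuned for general use.
    """
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Partial derivatives of the mean squared error w.r.t. w and b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        # Step in the direction that reduces the error.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


# Toy data generated from the line y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
w, b = fit_line(xs, ys)
```

On this toy data the fitted `w` and `b` should end up very close to the slope and intercept used to generate the points. The same idea of computing a gradient and taking a small step sits behind the training of many machine learning models.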

Resources needed to give you a head start

Finally, about the resources. There are lots of resources readily available out there that can help you go the extra mile and learn the way you want to. One website that I am totally in love with is https://www.metacademy.org/. They have worked really hard to put together knowledge graphs along with their resources. Apart from that, there are many online courses on Data Science that help you learn and understand what’s inside a Data Scientist. A list of such courses is provided here.

This is my attempt to provide introductory help on being a Mathematician before being an actual Data Scientist, so I have skipped many useful advanced concepts in this article. Follow the concepts mentioned above, and it should not take you long to figure out for yourself which advanced concepts you need to know, using the resources available online.

It takes a lot of hard work, courage and patience to become a Data Scientist. The field does not let you rest. You need to keep pushing past your limits in order to be a Data Scientist. All these things put together make Data Scientist the most demanding job profile in the world.

P.S.: This is my first attempt at tech-blogging, and I have tried to keep things as simple as possible. I would appreciate your valuable feedback and suggestions. Also, stay tuned for more at the ParallelDots Blog.


