Data Science for Mathematicians
The Transition of Mathematics into Code
Mathematics is a science that deals with the logic of quantity, shape and arrangement. It is all around us and has become an important part of our daily lives. As society has evolved, so have the uses and applications of mathematics.
As mathematics has evolved, some problems have become too complicated and time-consuming for a human to solve alone, which has increased the demand for computing power. Even if a mathematician worked through such a problem by hand, for example predicting weather patterns or modelling the climate, they could not realistically test the result against the real world. Using computers for this work is therefore the obvious next step.
Mathematics underpins programming languages through its structure and logical reasoning. Most languages therefore provide built-in mathematical functions that make mathematical expressions easier to write.
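As a small illustration of such built-in functions, the sketch below uses Python's standard math module together with a few built-in operators. The numbers are made up for the example, and Python 3.8 or later is assumed for math.comb.

```python
import math

# Area of a circle with radius 2: pi * r^2
radius = 2.0
area = math.pi * radius ** 2

# Length of the hypotenuse of a right triangle with sides 3 and 4
hypotenuse = math.hypot(3.0, 4.0)

# Number of ways to choose 2 items from 5 (binomial coefficient)
ways = math.comb(5, 2)

# Built-in functions and operators cover basic arithmetic too
total = sum([1, 2, 3, 4])
remainder = 17 % 5

print(area, hypotenuse, ways, total, remainder)
```

Each expression reads almost exactly like the formula it represents, which is a large part of the appeal for mathematicians.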
One of the most prominent programming languages that mathematicians use is Python.
Python’s Popularity
Python offers many useful mathematical packages that are straightforward to understand, and its simple syntax makes them easy to use.
The prominence of Python in data science jobs has increased over the past few years, and it is now one of the skills most often requested in data science job listings. This popularity is easy to see when searching LinkedIn, Glassdoor, Indeed and other reputable job sites.
Part of the reason for this prominence is that Python is developed in the open, largely by volunteers, and is backed by a very large open-source community. Its ease of use also lets developers focus on solving the problem rather than on the code itself.
Python has an extensive standard library, backed by a large ecosystem of third-party packages (covered in the following section), and is well suited to areas such as data science, web development and task automation.
Standard Libraries in Python
The libraries treated as standard in data science are third-party packages rather than part of Python's built-in standard library: NLTK, TextBlob, NumPy, SciPy, Pandas, Matplotlib, Theano, Keras, TensorFlow, Scikit-Learn, Seaborn and Plotly. Data science programmers can use all of these libraries to build their programs.
Once a package is installed, importing from it is very simple. The image below shows how to import a linear regression function, fit a model and predict the result.
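A minimal sketch of such an import, assuming the function meant is the LinearRegression class from Scikit-Learn and that NumPy is installed alongside it. The data points are made up and lie exactly on the line y = 2x + 1, so the fitted model should recover that line.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Made-up illustrative data lying exactly on the line y = 2x + 1
X = np.array([[1.0], [2.0], [3.0], [4.0]])  # one row per sample, one column per feature
y = np.array([3.0, 5.0, 7.0, 9.0])

# Fit the model to the data
model = LinearRegression()
model.fit(X, y)

# Predict the value of y for a new input, x = 5
prediction = model.predict(np.array([[5.0]]))

print(model.coef_, model.intercept_, prediction)
```

The same three steps (import, fit, predict) apply to most other Scikit-Learn models.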
Jupyter Notebook’s Usefulness
Jupyter Notebook is particularly effective for teaching, as students can immediately see what happens when they change a section of code. It is also useful for research and development because it keeps code and its documentation together in one document. It supports Markdown, and tools such as Pandoc and nbconvert convert notebooks to formats such as Word, PDF and HTML.
Anaconda — an Open Source Platform for Scientific Computing
Its main aim is to simplify package management and deployment and to remove the need to install packages one by one. It is a distribution that bundles data science packages for Windows, Linux and macOS users, so that they can be installed as one group rather than individually.
Anaconda comes with over 250 packages, many of which are useful to data science programmers.
Data Visualisation
Two essential Python libraries for data visualisation are Seaborn and Matplotlib. Importing them is very straightforward: “import matplotlib.pyplot as plt” and “import seaborn as sns”.
Aliases are used when importing these libraries. In an import statement, the ‘as’ keyword binds the imported module or function to a shorter local name (an alias), such as ‘plt’ for ‘matplotlib.pyplot’; the module itself is not renamed. Aliases are very useful in modular programming, as they keep calls short and avoid clashes between identically named functions from different packages.
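The aliasing mechanism can be sketched as follows. To keep the example runnable without extra packages, it uses two standard-library modules; the alias names stats and root are arbitrary choices for illustration, and the same pattern applies to plt and sns.

```python
import statistics as stats     # module alias: stats now refers to the statistics module
from math import sqrt as root  # function alias: root now refers to math.sqrt

values = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up illustrative data

mean_value = stats.mean(values)  # same call as statistics.mean(values)
spread = stats.pstdev(values)    # population standard deviation
side = root(16)                  # same call as math.sqrt(16)

print(mean_value, spread, side)
```

The alias only exists in the importing file, so two files can import the same module under different names without affecting each other.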
Many kinds of plot can be produced with these libraries, e.g. bar plots, line plots, box plots, histograms, scatter plots, pie charts and heat maps.
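As a rough sketch of three of these plot types, the example below draws a bar plot, a line plot and a scatter plot side by side. It uses Matplotlib only and made-up data; Seaborn builds on Matplotlib and offers similar plots through its own functions.

```python
import matplotlib.pyplot as plt

# Made-up illustrative data
categories = ["A", "B", "C"]
counts = [5, 3, 8]
x = [0, 1, 2, 3, 4]
y = [0, 1, 4, 9, 16]

# One figure with three side-by-side axes
fig, (ax_bar, ax_line, ax_scatter) = plt.subplots(1, 3, figsize=(12, 4))

ax_bar.bar(categories, counts)
ax_bar.set_title("Bar plot")

ax_line.plot(x, y)
ax_line.set_title("Line plot")

ax_scatter.scatter(x, y)
ax_scatter.set_title("Scatter plot")

fig.tight_layout()
```

In a Jupyter Notebook the figure is rendered automatically; in a script, calling plt.show() displays it and fig.savefig with a file name of your choice writes it to disk.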
Python for Machine Learning and Deep Learning
Machine learning is the way a machine ‘learns’ from data without being explicitly programmed for the task. For example, a machine can be given images labelled ‘cat’ or ‘dog’ and learn the difference between them from that data. To use machine learning effectively, a programmer needs a solid working knowledge of Python and its libraries.
After this, they should move on to machine learning algorithms: supervised learning algorithms (where the data is labelled), such as nearest neighbour and linear regression, and unsupervised learning algorithms (where the data is unlabelled), such as k-means clustering and singular value decomposition.
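The difference between the two families can be sketched in plain Python. The nearest-neighbour function below uses the labels, while the k-means function never sees them. The data (a made-up weight and height for each animal), the function names and the choice of starting centroids are all assumptions made for illustration; in practice a library such as Scikit-Learn would normally be used. Python 3.8 or later is assumed for math.dist.

```python
import math

def nearest_neighbour(points, labels, query):
    """Supervised: return the label of the training point closest to query."""
    distances = [math.dist(p, query) for p in points]
    return labels[distances.index(min(distances))]

def k_means(points, centroids, iterations=10):
    """Unsupervised: group points around centroids (needs iterations >= 1)."""
    for _ in range(iterations):
        # Assignment step: index of the closest centroid for each point
        assignments = [
            min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its assigned points
        new_centroids = []
        for k in range(len(centroids)):
            members = [p for p, a in zip(points, assignments) if a == k]
            if members:
                new_centroids.append(
                    tuple(sum(coord) / len(members) for coord in zip(*members))
                )
            else:
                new_centroids.append(centroids[k])  # empty cluster: keep old centroid
        centroids = new_centroids
    return assignments, centroids

# Made-up data: (weight in kg, height in cm)
points = [(4.0, 25.0), (5.0, 23.0), (3.5, 24.0),
          (20.0, 55.0), (25.0, 60.0), (30.0, 58.0)]
labels = ["cat", "cat", "cat", "dog", "dog", "dog"]

# Supervised: the labels are used to classify a new animal
predicted = nearest_neighbour(points, labels, (4.5, 26.0))

# Unsupervised: the labels are ignored; two groups are found from the data alone
assignments, centroids = k_means(points, [points[0], points[3]])

print(predicted, assignments, centroids)
```

Here k-means recovers the same two groups as the labels, but it only numbers them 0 and 1; it has no way of knowing which group is ‘cat’ and which is ‘dog’.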
Once a person understands these algorithms, they can move on to more complex problems using an advanced technique called ‘deep learning’. Deep learning stacks multiple layers of simple computations, each of which transforms the output of the layer before it. This network of layers is referred to as a neural network and is loosely modelled on the way neurons in the human brain are connected.
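To make the idea of layers concrete, here is a minimal sketch of a two-layer network written with NumPy. It shows only the forward pass, i.e. how data flows through the layers; the weights are random and untrained, and the layer sizes, activation functions and input values are arbitrary assumptions. Real projects would typically use a library such as Keras or TensorFlow, which also handle training.

```python
import numpy as np

def relu(z):
    """Activation for the hidden layer: negative values become zero."""
    return np.maximum(z, 0.0)

def sigmoid(z):
    """Activation for the output layer: squashes values into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(seed=0)

# Untrained, randomly initialised parameters (sizes chosen arbitrarily)
W1 = rng.normal(size=(4, 3))  # layer 1: 3 inputs -> 4 hidden units
b1 = np.zeros(4)
W2 = rng.normal(size=(1, 4))  # layer 2: 4 hidden units -> 1 output
b2 = np.zeros(1)

def forward(x):
    """Pass an input vector through both layers in turn."""
    hidden = relu(W1 @ x + b1)          # first layer transforms the raw input
    return sigmoid(W2 @ hidden + b2)    # second layer transforms the hidden output

x = np.array([0.5, -1.2, 3.0])  # made-up input with 3 features
probability = forward(x)

print(probability)
```

Training would consist of adjusting W1, b1, W2 and b2 so that the output matches labelled examples; deeper networks simply repeat the same pattern with more layers.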
Summary
There are many programming languages available, but top tech companies such as Facebook, Google and Amazon make extensive use of Python. This is due to its simplicity and rich set of libraries. Python helps data scientists apply machine learning and deep learning algorithms with ease and is a practical and exciting language to learn.