Data Science for Mathematicians

The Transition of Mathematics into Code

Mathematics is the science of quantity, shape and arrangement, grounded in logical reasoning. Mathematics is everywhere, and because it is all around us, it has become an important part of our daily lives. As society has evolved, so have its uses and applications.

As mathematics has evolved, some problems have become too complicated and time-consuming for a human to solve alone, which has increased the demand for computing power. Problems such as predicting weather patterns and modelling the climate involve so many variables and calculations that a mathematician could not realistically work through them by hand, let alone test the results against real-world observations. Using computers to do this is therefore the obvious next step.

Programming languages are built on mathematical structure and logical reasoning, so most of them provide built-in mathematical functions that make mathematical expressions easier to write.

One of the most prominent programming languages that mathematicians use is Python.
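As an illustration (added here, not taken from the original article), Python's standard-library math module provides many of these built-in functions and constants out of the box. Note that math.comb requires Python 3.8 or later:

```python
import math

# Built-in mathematical functions and constants from the standard library
print(math.sqrt(2))                               # ≈ 1.41421356
print(math.factorial(5))                          # 5! = 120
print(math.comb(10, 3))                           # "10 choose 3" = 120
print(math.gcd(48, 18))                           # greatest common divisor = 6
print(math.isclose(math.sin(math.pi / 2), 1.0))   # True
```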

Python’s Popularity

Python is an open-source language that is easy to learn and work with, which has made it popular amongst mathematicians and the data science community.

It offers many useful mathematical packages that are straightforward to understand, and Python’s simple syntax makes them easy to use.

Python’s prominence in data science jobs has grown over the past few years, and it is now one of the most frequently requested skills in data science job listings. This popularity can be seen on LinkedIn, Glassdoor, Indeed and other reputable job sites.

Part of the reason for this prominence is that Python is developed in the open, largely by volunteers, and is supported by a very large open-source community. Because the language is easy to use, developers can focus more on solving the problem than on the code itself.

Python comes with an extensive standard library and has a large ecosystem of third-party packages (covered in the following section), which make it well suited to areas such as data science, web development and task automation.

Libraries in Python

One of the reasons Python is so popular is the large number of libraries available and how easy they are to use. Libraries are collections of reusable code that a programmer can call whenever it is needed, rather than writing everything from scratch. In this way, libraries support one of the most important principles of software development: code reusability.
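As a small illustration of reuse (the scores are invented, and mean_from_scratch is a hypothetical helper), the standard library’s statistics module replaces code you would otherwise write yourself:

```python
import statistics

def mean_from_scratch(values):
    """Hand-written mean: the kind of code a library saves you from rewriting."""
    return sum(values) / len(values)

scores = [72, 85, 90, 66, 85]  # illustrative data

# The statistics module provides tested, reusable versions of common calculations
print(mean_from_scratch(scores))   # 79.6
print(statistics.mean(scores))     # 79.6
print(statistics.median(scores))   # 85
print(statistics.mode(scores))     # 85
print(statistics.stdev(scores))    # sample standard deviation
```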

Widely used libraries for data science include NLTK, TextBlob, NumPy, SciPy, Pandas, Matplotlib, Theano, Keras, TensorFlow, Scikit-Learn, Seaborn and Plotly. Strictly speaking, these are third-party packages rather than part of Python’s standard library, but all of them can be used by data science programmers to build their programs.
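As a sketch of what one of these libraries offers a mathematician, NumPy can solve a small system of linear equations in a single call. The system below is an arbitrary example chosen for illustration:

```python
import numpy as np

# Solve the linear system
#   2x + y = 5
#    x - y = 1
A = np.array([[2.0, 1.0],
              [1.0, -1.0]])
b = np.array([5.0, 1.0])

solution = np.linalg.solve(A, b)
print(solution)        # approximately x = 2, y = 1

# Check the result by substituting back: A @ solution should reproduce b
print(np.allclose(A @ solution, b))  # True
```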

Once a package is installed, importing from it is very simple. The image below shows how to import a linear regression model, fit it to data and use it to predict a result.
Figure 1: Importing Linear Regression function
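Since the figure is an image, here is a minimal text sketch of the same idea, assuming the model is Scikit-Learn’s LinearRegression from sklearn.linear_model. The data are invented so that they follow y = 2x + 1 exactly:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Illustrative data following y = 2x + 1; scikit-learn expects a 2-D feature array
X = np.array([[0.0], [1.0], [2.0], [3.0], [4.0]])
y = np.array([1.0, 3.0, 5.0, 7.0, 9.0])

model = LinearRegression()
model.fit(X, y)                         # estimate slope and intercept by least squares

print(model.coef_, model.intercept_)    # approximately [2.] and 1.0

prediction = model.predict(np.array([[5.0]]))
print(prediction)                       # approximately [11.]
```

The fitted model stores the learned slope in coef_ and the intercept in intercept_, and predict applies them to new inputs.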

Jupyter Notebook’s Usefulness

There are several environments for writing Python, but one of the best known is Jupyter Notebook. A notebook lets a developer write Python code, and any installed package, including those mentioned above, can be used within it. It runs locally in a web browser and lets the user execute code one cell at a time, rather than running the entire program in one go. This greatly helps with understanding what each step is actually doing, and also makes debugging much easier.

It is particularly effective for teaching, as students can immediately see what happens when they change a section of code. Notebooks are also useful for research and development because they combine code with documentation: text cells are written in Markdown, and tools such as Nbconvert and Pandoc can convert a notebook into formats such as HTML, PDF and Word.
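To illustrate how a notebook mixes documentation with code: an .ipynb file is JSON containing a list of Markdown and code cells. The sketch below hand-writes a minimal notebook in the nbformat 4 layout; the file name example.ipynb and the cell contents are hypothetical:

```python
import json

# A minimal notebook: one Markdown (documentation) cell and one code cell
notebook = {
    "cells": [
        {
            "cell_type": "markdown",
            "id": "intro",
            "metadata": {},
            "source": ["# Mean of a sample\n", "We compute the mean of three values."],
        },
        {
            "cell_type": "code",
            "id": "compute",
            "execution_count": None,
            "metadata": {},
            "outputs": [],
            "source": ["values = [1, 2, 3]\n", "sum(values) / len(values)"],
        },
    ],
    "metadata": {},
    "nbformat": 4,
    "nbformat_minor": 5,
}

# Write the notebook to disk as JSON so Jupyter can open it
with open("example.ipynb", "w", encoding="utf-8") as f:
    json.dump(notebook, f, indent=1)
```

The resulting file can be opened in Jupyter, or converted with a command such as jupyter nbconvert --to html example.ipynb.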

Anaconda — an Open Source Platform for Scientific Computing

To make it easy for anyone to set up Python and its libraries on their workstation, Anaconda Inc. has created an open-source platform for scientific computing.

Its main aim is to simplify package management and deployment. Instead of installing each package separately, users install a single distribution that bundles data science packages and is available for Windows, Linux and macOS.

Anaconda comes with over 250 packages, many of which are useful for data science programmers.
Figure 2: Screenshot showing some of the platforms available for download from Anaconda
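After installing Anaconda, one way to check that the packages you need are available is to query their installed versions from Python. The helper installed_versions and the package list are illustrative choices, not part of Anaconda itself:

```python
from importlib.metadata import version, PackageNotFoundError

# Distribution names of some packages mentioned in this article (illustrative list)
CORE_PACKAGES = ["numpy", "scipy", "pandas", "matplotlib", "seaborn", "scikit-learn"]

def installed_versions(names):
    """Map each package name to its installed version, or None if it is missing."""
    found = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found

for name, ver in installed_versions(CORE_PACKAGES).items():
    print(f"{name:>12}: {ver or 'not installed'}")
```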
Data science uses scientific and mathematical methods to extract information from structured and unstructured data. Many online courses are available for new developers to learn these methods and then apply them to projects. Scientific libraries such as Pandas, NumPy, Scikit-learn, SciPy, Seaborn and Matplotlib are the core tools for beginners to master, as they provide the functionality to manipulate data into the desired format for analysis and visualisation.
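A minimal Pandas sketch of reshaping data into a format suited to analysis; the sites, months and values below are made up for illustration:

```python
import pandas as pd

# Invented measurements in "long" format: one row per site and month
df = pd.DataFrame({
    "site": ["A", "A", "B", "B"],
    "month": ["Jan", "Feb", "Jan", "Feb"],
    "rainfall_mm": [60.0, 45.0, 55.0, 40.0],
})

# Reshape into a site-by-month table, ready for analysis or plotting
table = df.pivot(index="site", columns="month", values="rainfall_mm")
print(table)

# Aggregate: mean rainfall per site
means = df.groupby("site")["rainfall_mm"].mean()
print(means)
```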

Data Visualisation

Data visualisation is the graphical representation of data. It is particularly important because it makes data easier for people to analyse: for example, a trend is easier to see on a line graph than in a list of the same numbers.

Two essential Python libraries for data visualisation are ‘Seaborn’ and ‘Matplotlib’. Importing them is straightforward, e.g. “import matplotlib.pyplot as plt” and “import seaborn as sns”.

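Aliases are used when importing these libraries. An alias is an alternative, usually shorter, name for a module or function, and it is created with the ‘as’ keyword: after “import matplotlib.pyplot as plt”, the module can be referred to simply as plt. The alias is just another name bound to the same module, so the module itself is unchanged. Aliases keep code concise, conventional aliases such as plt, sns, np and pd make code easier for other programmers to read, and aliases can also help avoid name clashes when two modules provide functions with the same name.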
Figure 3: Code showing how to import packages and libraries in Python using aliases
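Because the figure is an image, the same idea is sketched below with standard-library modules (chosen here so that nothing extra needs installing); the alias names stats and square_root are arbitrary:

```python
import math
import statistics as stats             # module alias
from math import sqrt as square_root   # function alias

# An alias is just another name bound to the same object
print(square_root is math.sqrt)   # True
print(stats.mean([1, 2, 3, 4]))   # 2.5
print(square_root(16))            # 4.0
```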

Many kinds of chart can be created with these libraries, e.g. bar plots, line plots, box plots, histograms, scatter plots, pie charts and heatmaps.

Figure 4: Example of a scatter plot created using Python (height versus weight, by gender)
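A sketch of how a scatter plot like Figure 4 might be produced with Matplotlib. The measurements below are invented for illustration, and the output file name scatter.png is arbitrary:

```python
import matplotlib
matplotlib.use("Agg")   # non-interactive backend: render to a file, no window needed
import matplotlib.pyplot as plt

# Invented sample points, for illustration only
heights_cm = {"female": [158, 162, 165, 170], "male": [172, 176, 180, 185]}
weights_kg = {"female": [55, 58, 62, 66], "male": [70, 75, 80, 86]}

fig, ax = plt.subplots()
for group in heights_cm:
    # One scatter call per group, so each group gets its own colour and legend entry
    ax.scatter(heights_cm[group], weights_kg[group], label=group)

ax.set_xlabel("Height (cm)")
ax.set_ylabel("Weight (kg)")
ax.set_title("Height vs weight by group")
ax.legend()

output_path = "scatter.png"
fig.savefig(output_path)
plt.close(fig)
```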

Python for Machine Learning and Deep Learning

When we talk about data science, it is not just about using and managing data; it also covers more complex areas such as machine learning and deep learning.

Machine learning is the way a machine ‘learns’ from data without being explicitly programmed for the task. For example, a model can be given images of cats and dogs that have been labelled ‘cat’ or ‘dog’, and learn from that data to tell the two apart. To apply machine learning effectively, a programmer needs a solid knowledge of Python and its libraries.

After this, they should move on to machine learning algorithms, e.g. supervised learning algorithms (where the data is labelled) such as k-nearest neighbours and linear regression, and unsupervised learning algorithms (where the data is unlabelled) such as k-means clustering and singular value decomposition (used for dimensionality reduction).
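A minimal Scikit-Learn sketch contrasting the two families on toy 2-D points (the coordinates and the ‘cat’/‘dog’ labels are invented): k-nearest neighbours learns from the labels, while k-means groups the same points without them:

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.cluster import KMeans

# Two well-separated groups of toy 2-D feature vectors
X = np.array([[1.0, 1.0], [1.5, 2.0], [2.0, 1.5],
              [8.0, 8.0], [8.5, 9.0], [9.0, 8.5]])

# Supervised: labels are provided, and the model learns to predict them
y = np.array(["cat", "cat", "cat", "dog", "dog", "dog"])
knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X, y)
predicted = knn.predict([[1.2, 1.4], [8.7, 8.8]])
print(predicted)      # ['cat' 'dog']

# Unsupervised: no labels, the algorithm groups similar points by itself
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
cluster_ids = kmeans.fit_predict(X)
print(cluster_ids)    # first three points share one id, last three share the other
```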

Once a person understands these algorithms, they can move on to more complex problems using an advanced technique called ‘deep learning’. Deep learning uses multiple layers of artificial neurons, each of which transforms the output of the layer before it. This layered structure is called a neural network, and its design is loosely inspired by the way neurons in the human brain are connected.
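To make the idea of layers concrete, here is a sketch of the forward pass of a tiny fully connected network in NumPy. The layer sizes and random weights are arbitrary assumptions and the network is untrained; in practice, libraries such as TensorFlow or Keras learn the weights from data:

```python
import numpy as np

def relu(z):
    """Rectified linear activation: keeps positive values, zeroes the rest."""
    return np.maximum(0.0, z)

rng = np.random.default_rng(seed=0)

# Layer sizes: 4 inputs -> 8 hidden units -> 8 hidden units -> 2 outputs
W1, b1 = rng.normal(size=(4, 8)), np.zeros(8)
W2, b2 = rng.normal(size=(8, 8)), np.zeros(8)
W3, b3 = rng.normal(size=(8, 2)), np.zeros(2)

def forward(x):
    """Pass a batch of inputs through each layer in turn."""
    h1 = relu(x @ W1 + b1)     # first hidden layer
    h2 = relu(h1 @ W2 + b2)    # second hidden layer
    logits = h2 @ W3 + b3      # output layer (raw scores)
    # Softmax turns each row of scores into probabilities that sum to 1
    exp = np.exp(logits - logits.max(axis=1, keepdims=True))
    return exp / exp.sum(axis=1, keepdims=True)

batch = rng.normal(size=(3, 4))   # 3 examples, 4 features each
probs = forward(batch)
print(probs.shape)                # (3, 2)
```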

Summary

Python programming is extremely useful for mathematicians because it helps them automate complex problems. Instead of manually working through every scenario or possible outcome, a mathematician can write Python code to run simulations and check results, and let the computer do the repetitive work automatically.
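As an example of letting the computer do the work, a Monte Carlo simulation can estimate π by random sampling instead of exact calculation. The function name estimate_pi and the sample size are illustrative choices:

```python
import random

def estimate_pi(num_samples, seed=None):
    """Monte Carlo estimate of pi: sample points in the unit square and
    count how many fall inside the quarter circle of radius 1."""
    rng = random.Random(seed)
    inside = 0
    for _ in range(num_samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            inside += 1
    # The quarter circle has area pi/4, so the hit ratio approximates pi/4
    return 4 * inside / num_samples

print(estimate_pi(100_000, seed=42))   # close to 3.14
```

Increasing num_samples makes the estimate more accurate, at the cost of more computation.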

There are many programming languages available, but top tech companies such as Facebook, Google and Amazon make extensive use of Python because of its simplicity and rich set of libraries. Python helps data scientists apply machine learning and deep learning algorithms with ease, and it is a practical and exciting language to learn.
