An Introduction to Data Science and Real-World Applications

Data science combines mathematics, statistics, and computer science to identify patterns within data and draw insights from them. Data can then be modelled to solve real-world problems.

What is Data Science and why is it important?

Advancements in technology and the ability to collate large amounts of data have made data analysis an easier task. Both individuals and organisations can now access real-world data with ease and use it to gain critical insights. For example, organisations can gather data on consumer spending habits and use this to plan product releases.

The use of data includes not only analysis through dashboards and reports, but also the extraction of data from multiple sources and the cleaning of that data, so that knowledge can be derived from it and interpreted clearly.

In the current job market, data science skills such as hands-on programming, statistical knowledge, visualisation of data and networks, machine learning and deep learning are in high demand. Jobs utilising them form the largest employable job sector, with an annual hiring growth of 74% and three times as many job postings as job searches.

Data science is versatile, and every industry would benefit from a data scientist who can interpret the data applicable to their field. Industries now understand the importance of data scientists and are hiring them not just to examine the patterns in their data, but also to make use of data which was previously left unanalysed.

Uses in the Real World

Data science is constantly evolving and assisting industries to maximise their capabilities. Every industry can utilise data science in multiple ways.
Let’s look at seven key industries and view their current and future capabilities.

1. Security Industry:

The establishment of digital trust and safety is considered the most important requirement in industries that provide products and services.

Fraud detection is a key concern for banks, businesses, and security departments. Data science and machine learning algorithms help companies detect fraud early and prevent it from occurring. These firms use insights from data to prevent customer account hacking, account takeover (taking over a customer’s online account, e.g. a credit card account, by using legitimate details), payment fraud and other violations of customer trust.
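
As a minimal sketch of how unusual payments might be flagged, the example below scores each transaction against a customer's typical spending using the modified z-score (based on the median and the median absolute deviation). The function name, the sample amounts and the 3.5 cut-off are assumptions chosen for illustration; production fraud systems combine many more signals and are not described in detail here.

```python
from statistics import median


def flag_unusual_transactions(amounts, threshold=3.5):
    """Return the indices of amounts whose modified z-score exceeds threshold."""
    med = median(amounts)
    # Median absolute deviation: a measure of spread that a few extreme
    # values barely affect, unlike the standard deviation.
    mad = median(abs(a - med) for a in amounts)
    if mad == 0:
        return []  # at least half the amounts equal the median; the score is undefined
    flagged = []
    for i, amount in enumerate(amounts):
        score = 0.6745 * abs(amount - med) / mad
        if score > threshold:
            flagged.append(i)
    return flagged


# Hypothetical card payments for one customer (illustrative values only).
history = [12.5, 8.0, 15.2, 9.9, 11.4, 950.0, 10.1, 13.3]
suspicious = flag_unusual_transactions(history)
print(suspicious)
```

Here only the 950.0 payment stands out from the customer's usual spending, so a reviewer or a downstream model would look at that transaction first.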

Intelligent cameras that use 3D human pose estimation to analyse pose and movement, combined with behavioural pattern recognition technology, help these industries prevent crimes on a large scale.

JPMorgan Chase, one of the leading banks today, released a case study explaining their use of data science. They have gathered an astounding 150 petabytes of data from 3.5 billion user accounts, which provides them with some staggering information. This has contributed to them being chosen by the US government to assist with financial and economic statistics, based on the data they have collected and analysed from their customers and customer transactions.

2. Agricultural Industry:

The introduction of data science to the agricultural industry gives farmers information about the crops they produce, the management of crop diseases and pests, and the way climate change affects their crops. This, in turn, increases crop yield.

Agrosmart, a Brazilian company specialising in collecting data through the Internet of Things (IoT), uses that unstructured data to help farmers control pests cost-effectively and with minimal environmental impact.
Developments in satellite imagery and image processing let farmers analyse the condition of plants and crops in real time and therefore react more quickly to problems that may lead to a poor yield.

Farmers Edge is a Canadian company known for developing data-driven technologies that help farmers run efficient operations while producing more food. They use satellite imagery and data from more than 5,000 connected weather stations to get daily updates about the potential problems that could affect their registered farmers’ yield. Innovation and advancements in technology have boosted their production of crops over the years but the introduction of advanced analytics and predictive modelling has increased production exponentially.

3. Healthcare Industry:

The healthcare industry is filled with data in both digital and non-digital form, which can be of great help to both doctors and patients. Data about heart rates, blood glucose levels, blood oxygen levels, the effectiveness and side effects of different medicines, etc. are now abundant and can be used to bring about change in the medical industry.

Availability of healthcare data will help doctors improve their diagnostic accuracy and efficiency. Data including audio, video and images can be used to train neural networks to recognise problems and react accordingly. For new learners, many data sets such as BrainWeb and fastMRI are available online and can be used to explore solutions, e.g. tailoring medication to a patient depending on their medical condition. This can improve the success rate of the health industry, as solving many problems automatically reduces the scope for human error.

Advances in research and technology improve the early detection of diseases and the likelihood of discovering cures. Deep learning is a type of machine learning loosely inspired by the way biological neurons process information in the brain. It plays a vital role in this area, as it helps doctors understand new and different types of cancer using image segmentation, a key topic in medical image processing and computer vision.

Currently data regarding COVID-19 is available online and updated daily for people to utilise for research. Countries are using data for detecting new and confirmed cases and monitoring the outbreak.

4. Insurance Industry:

Preventing financial loss is one of the biggest problems for insurance companies. There are multiple ways to detect fraudulent activities and subtle behavioural patterns. Data science helps predict and prevent these activities by using advanced analytical techniques involving statistical models.

Each insurance company uses a different algorithm for their price optimisation. Price optimisation algorithms can be updated using analytical techniques to detect new behavioural patterns in data. The policies for cost, expenses, and claims can be updated accordingly.

Customer service can be improved using personalised marketing. Each customer can be given a personalised service through tailored recommendations, policies, pricing and offers. This can be achieved using demographic representation, industrial marketing, communication, and branding based on the customer’s location.

Customer segmentation, the process of dividing customers into groups based on common characteristics, is one of the most important techniques that help insurance companies sell their policies and get a return on claims. TPL Insurance, one of the biggest insurance companies in Pakistan, utilises data collected about customers in order to build new recommendation systems.
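
To make the idea of segmentation concrete, here is a small sketch that groups customers with plain k-means clustering. The two features (age and number of claims), the sample customers and the function name are hypothetical, and k-means is just one common choice; the article does not state which method any particular insurer uses.

```python
from math import dist


def k_means(points, k, iterations=20):
    """Group points into k clusters with plain k-means (Lloyd's algorithm)."""
    # Assumption: the first k points act as the starting centroids, which
    # keeps this example deterministic. Real tools choose starts more carefully.
    centroids = [points[i] for i in range(k)]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [min(range(k), key=lambda c: dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return labels, centroids


# Hypothetical customers described as (age, claims in the last five years).
customers = [(22, 1), (58, 4), (25, 2), (61, 5), (24, 1), (63, 4)]
labels, centroids = k_means(customers, k=2)
print(labels)
```

With this sample the younger, low-claim customers end up in one group and the older, higher-claim customers in the other. In practice the features would be scaled first so that no single attribute, such as age, dominates the distance calculation.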

Techniques such as natural language processing (NLP), data mining and text analytics enable the industry to form more accurate predictions.

5. Education Industry:

Data science can be used in the education sector both as an area of interest for students to study within the curriculum, and also as a way to improve how the education sector itself operates.

Data science benefits the education industry in multiple ways. Advanced image analysis can help teachers detect the misuse of technology by students. Plagiarism can be detected by teachers using advanced Optical Character Recognition (OCR) methods. Paper checking can be done using image recognition based on Intelligent Character Recognition (ICR), which is a deep learning version of ‘handwriting analysis’.
Students, on the other hand, can be taught using a virtual assistant, which is a digitally generated character that provides information via voice.

Data science plays a key role in the success of classrooms world-wide. Georgia State University (GSU) uses various machine learning tools to analyse its student data. For example, GPS Advising is a tool that helps to identify issues of student retention and course completion. Using this analysis, the university was able to improve its student graduation rate from 32% to 54%. Similarly, other universities can monitor their student requirements, measure instructor performance, help students with their emotional and social skills, and innovate their curriculum with the use of data analysis.

IBM has created a platform named Cognos Analytics, a business intelligence solution that empowers users with AI-infused self-service capabilities to accelerate data preparation, analysis and report creation. It helps businesses improve their decision-making with AI-powered analytics. Many universities are using this platform as the main source of quantifiable data to understand their student performance and reduce the student dropout rate.

6. Transportation Industry:

The transportation industry has used data science for the invention of self-driving trains, and now similar technology has been introduced for self-driving cars.

Data science is used by logistics companies to handle the large amounts of data they have to deal with in real time. Uber, a platform that connects drivers to passengers, has a lab in Pittsburgh that hires data scientists specifically to manage and utilise its real-time data.

Other platforms like Careem, DHL, and FedEx also have their own teams to provide them with insight regarding customer experience and help manage their data.

7. Social Media:

Platforms like Facebook, Instagram, Snapchat, WhatsApp, etc. have become a major way for people to share content quickly, efficiently and, most importantly, in real time. According to Facebook’s performance report for Q3 2020, it now has over 1.8 billion daily active users, and roughly 100 billion messages are exchanged every day on WhatsApp alone.

The current era of personalisation has changed all these platforms. To improve the user experience, these platforms are using advanced Convolutional Neural Networks (CNNs) to update their systems, attract new customers and increase the length of time users stay on their platforms.

Facebook uses textual analytics via its in-house tool called DeepText to analyse text-based data and extract meaning from it. DeepFace, another application built by Facebook, performs facial recognition on its users for the purpose of personalisation and identifying people in photos. Facebook uses a variety of personalised recommendation models depending on the use case, e.g. media houses can create their own Facebook page and use that page for targeted advertisements that act as a self-learning recommendation system.

LinkedIn is another social media platform that connects professionals across the globe. LinkedIn uses data science to provide a better experience to its users by providing them with recommendations that help them connect with people with similar interests. Human resource departments often use this platform to find candidates to fill job vacancies within their companies.

Summary

Data science is a growing area that is of huge benefit to the world in countless ways. Security, agriculture, healthcare, insurance, education, transportation, and social media are some of the key industries where it is actively being utilised to improve efficiency, quality of life and access to information. In the future we would expect even more growth, especially in the areas of machine learning and deep learning.

The author of this article

Zawar Khan is a Software Engineer at inspired consulting. He started his career with a bachelor’s degree in computer science from the ‘National University Of Computer and Emerging Sciences’, FAST (NUCES). As an experienced Data Scientist with a demonstrated history of working in the information technology and service industry, he now supports our clients in simplifying their processes with data science and the various fields of applied mathematics that are at the core of machine learning.

 
