Exploratory Data Analysis is the process of performing initial investigations on data to discover patterns, spot outliers, and test hypotheses with summary statistics and graphical representations.
To share my understanding of the concept, I’ll take the example of a loan acceptance data set and explore it using Python.

Data Description

Download the data from the Analytics Vidhya website. The URL below provides the train and test CSV data.

URL: https://datahack.analyticsvidhya.com/contest/practice-problem-loan-prediction-iii/

The data set provided at the above link is from Dream Housing Finance, a company that grants home loans in urban, semi-urban, and rural areas.

It consists of the…
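A first pass of the kind described above (summary statistics, missing values, outliers) can be sketched with Python's standard library alone. The column names and rows here are hypothetical stand-ins, not the actual contents of the downloaded CSV, and the 1.5 * IQR rule is just one common rule of thumb for flagging outliers.

```python
import csv
import io
import statistics

# Hypothetical sample rows; the real train CSV has its own columns and values.
SAMPLE_CSV = """applicant_income,loan_amount,loan_status
5000,120,Y
3000,,N
4200,100,Y
81000,360,Y
2500,95,N
3900,110,Y
"""


def load_rows(text):
    """Parse CSV text into a list of dicts (one per row)."""
    return list(csv.DictReader(io.StringIO(text)))


def numeric_column(rows, name):
    """Return the non-missing values of a column as floats."""
    return [float(r[name]) for r in rows if r[name] != ""]


def summarize(values):
    """Basic summary statistics for a numeric column."""
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "min": min(values),
        "max": max(values),
    }


def iqr_outliers(values):
    """Flag values more than 1.5 * IQR outside the first or third quartile."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    spread = q3 - q1
    low, high = q1 - 1.5 * spread, q3 + 1.5 * spread
    return [v for v in values if v < low or v > high]


rows = load_rows(SAMPLE_CSV)
income = numeric_column(rows, "applicant_income")
missing_loan_amount = sum(1 for r in rows if r["loan_amount"] == "")

print(summarize(income))
print("missing loan_amount:", missing_loan_amount)
print("income outliers:", iqr_outliers(income))
```

With a real file, `load_rows` could be fed the text of the downloaded CSV instead of the inline sample. A library such as pandas is the more usual choice for this work.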

Ques 1: What is SQL?

Answer 1: SQL stands for Structured Query Language. It is the primary language used to interact with relational databases. With SQL, one can extract data from a database, modify it, and update it whenever required.
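As a rough illustration of extracting, updating, and deleting data with SQL, here is a minimal sketch using Python's built-in sqlite3 module and an in-memory database. The employees table and its rows are made up for the example.

```python
import sqlite3

# In-memory database; the table and its rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, salary INTEGER)"
)
conn.executemany(
    "INSERT INTO employees (name, salary) VALUES (?, ?)",
    [("Asha", 50000), ("Ravi", 42000), ("Meera", 61000)],
)

# Extract: read the rows that match a condition.
high_earners = conn.execute(
    "SELECT name FROM employees WHERE salary > ? ORDER BY name", (45000,)
).fetchall()

# Update: change values in existing rows.
conn.execute("UPDATE employees SET salary = salary + 3000 WHERE name = ?", ("Ravi",))

# Delete: remove rows that are no longer needed.
conn.execute("DELETE FROM employees WHERE name = ?", ("Meera",))

remaining = conn.execute("SELECT name, salary FROM employees ORDER BY id").fetchall()

print(high_earners)
print(remaining)
```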

Ques 2: What is DBMS?

Answer 2: A Database Management System (DBMS) is a program that controls the creation, maintenance, and use of a database. A DBMS can be thought of as a file manager that manages data in a database rather than saving it in a file system.

Ques 3: What is a SQL server?

SQL Server is…

There is no denying that shortcuts make one’s life easier. Some of the most important Microsoft Excel shortcuts for Windows are:

1 Ctrl + 5: For putting a strikethrough in a cell

2 Ctrl + 9: For hiding a row

3 Ctrl + 0: For hiding a column

4 Ctrl + ;: For entering the current date in a cell

5 Ctrl + Shift + :: For entering the current time in a cell

6 Ctrl + ': To copy the formula from the cell above

7 Ctrl + `: For displaying formulas

8 Ctrl +…

There is no denying that shortcuts make one’s life easier. Some of the most important Microsoft Excel shortcuts for Windows are:

1 Ctrl + N: Create a New Workbook

2 Ctrl + O: To open a saved workbook

3 Ctrl + S: To save a workbook

4 Ctrl + A: For selecting all the contents of a worksheet

5 Ctrl + B: To turn highlighted cells bold

6 Ctrl + C: To copy the highlighted cells

7 Ctrl + D: To fill the selected cell with the content of the cell right above.

8 Ctrl + F: For searching in the…

We often extract information from a website by copying and pasting it into a file. However, this manual process can be cumbersome if we want to obtain large amounts of information from a website as quickly as possible. In such situations, web scraping helps: it is a method of downloading large amounts of data from websites using code or an API.

Python is one of the most popular languages for web scraping, as it has libraries like Scrapy, Beautiful Soup, and Selenium that make scraping websites a cakewalk. Web scraping involves inspecting the web page…
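To give a feel for what extracting data from a page looks like in code, here is a minimal sketch that pulls links out of a small, hypothetical HTML snippet. It deliberately uses the standard library's html.parser rather than the libraries named above, so it runs without any installation. A real scraper would first download the page and would more likely use Beautiful Soup or Scrapy.

```python
from html.parser import HTMLParser

# Hypothetical page content; a real scraper would first download the HTML.
SAMPLE_HTML = """
<html><body>
  <h1>Products</h1>
  <ul>
    <li class="item"><a href="/p/1">Laptop</a></li>
    <li class="item"><a href="/p/2">Phone</a></li>
  </ul>
</body></html>
"""


class LinkCollector(HTMLParser):
    """Collect (href, text) pairs for every <a> tag that has an href."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._current_href = None
        self._text_parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._current_href = dict(attrs).get("href")
            self._text_parts = []

    def handle_data(self, data):
        # Only keep text that appears inside an open <a> tag.
        if self._current_href is not None:
            self._text_parts.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._current_href is not None:
            text = "".join(self._text_parts).strip()
            self.links.append((self._current_href, text))
            self._current_href = None


parser = LinkCollector()
parser.feed(SAMPLE_HTML)
print(parser.links)
```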

Machine Learning (ML) is a technique that uses algorithms to learn from data without being explicitly programmed. Thanks to the abundance of data and efficient data storage, ML has come into the limelight in recent times, but the foundational research in this field was done in the 1970s and 1980s.
There are different ways for a computer to learn from data: supervised learning, unsupervised learning, and reinforcement learning.

Supervised Learning

A supervised learning algorithm takes labeled data while training the model, and the trained model then makes predictions on new data. These problems can be divided into regression and classification problems.

  • Classification
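As a small illustration of learning from labeled data and then predicting on new data, here is a sketch of a one-nearest-neighbor classifier in plain Python. The features, labels, and values are hypothetical, and nearest neighbor is only one of many classification methods, chosen here because it is short.

```python
import math


def predict_1nn(train_X, train_y, point):
    """Return the label of the training example closest to `point`."""
    distances = [math.dist(x, point) for x in train_X]
    return train_y[distances.index(min(distances))]


# Hypothetical labeled data: (income in thousands, credit score) -> decision
train_X = [(30, 580), (35, 600), (80, 740), (95, 760)]
train_y = ["reject", "reject", "approve", "approve"]

# Predictions for two new, unlabeled points.
print(predict_1nn(train_X, train_y, (90, 750)))
print(predict_1nn(train_X, train_y, (32, 590)))
```

In practice the features would usually be scaled first, since a raw Euclidean distance lets the feature with the largest numeric range dominate.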

Real-world data sets often contain missing values, and a data scientist spends a major amount of time on data preparation, including data cleaning. Missing values can be the result of unrecorded observations or data corruption.

Types of Missing Data

  • Missing at Random (MAR): There is a relationship between the proportion of missing values and the observed data. For example, in the graph below, the proportion of missing values in the mileage column is correlated with the car’s manufacturing year. Therefore, this type of missing value can be predicted using other features.
Figure: Relationship between the percentage of missing values and the manufacturing year of the car.
  • Missing…
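One way to check for the kind of relationship described in the MAR example is to compute the share of missing values within each group of an observed feature. The sketch below does this for a hypothetical list of (manufacturing year, mileage) records. The numbers are invented purely to show the calculation.

```python
from collections import defaultdict

# Hypothetical records: (manufacturing_year, mileage); None marks a missing value.
cars = [
    (2005, None), (2005, None), (2005, 120000), (2005, None),
    (2012, 80000), (2012, None), (2012, 76000), (2012, 69000),
    (2019, 15000), (2019, 22000), (2019, 18000), (2019, 30000),
]


def missing_share_by_group(records):
    """Return {group: fraction of records in that group whose value is missing}."""
    total = defaultdict(int)
    missing = defaultdict(int)
    for group, value in records:
        total[group] += 1
        if value is None:
            missing[group] += 1
    return {group: missing[group] / total[group] for group in sorted(total)}


print(missing_share_by_group(cars))
```

If the share of missing values changes noticeably from one group to the next, as it does in this made-up sample, that is a hint the values are not missing completely at random.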

Lending institutions determine the creditworthiness of a borrower before granting a loan, so as to mitigate the risk of credit default. All lenders have their own criteria, but the factors stated below are the primary ones considered when approving a loan application.

  1. Credit Score: A loan provider needs to know which applicants are most and least likely to honor their loan obligations. Lenders use borrowers’ credit scores to assess credit risk. These scores take into account payment history, the current level of indebtedness, the length of credit history, the opening of new credit accounts, and the amount of credit outstanding while determining the…

In this article, we try to understand the profit and loss of credit card issuing companies. It should provide a framework for product managers or credit risk managers to think about how a risk or marketing strategy impacts revenue or expenses, and hence overall profits.

Profit, in general, is defined as the difference between revenue and expenses. Let’s start with the revenue drivers of the credit card business.

Profit = Revenue - Expense - Loss
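The formula above can be written as a small function to compare scenarios. All figures below are hypothetical and only illustrate how a strategy that raises revenue can still lower profit if losses rise faster.

```python
def profit(revenue, expense, loss):
    """Profit = Revenue - Expense - Loss, all in the same currency unit."""
    return revenue - expense - loss


# Hypothetical annual figures for one card portfolio (in thousands).
baseline = profit(revenue=1200, expense=700, loss=300)

# Hypothetical looser credit policy: more revenue, but also more credit losses.
looser_policy = profit(revenue=1500, expense=750, loss=600)

print(baseline)
print(looser_policy)
```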

Revenue Sources for a Card Issuer or an Issuing Bank:

Interchange Fees: Merchant discount is paid by the merchant for…

The coefficient of determination, or R-squared, represents the proportion of the variance in the dependent variable that is explained by the linear regression model. It is a scale-free score, i.e. irrespective of whether the values are small or large, the value of R-squared will be at most one.
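As a sketch of the definition, R-squared can be computed as one minus the ratio of the residual sum of squares to the total sum of squares. The function name and the sample values below are illustrative. The last call rescales both the targets and the predictions by 1000 to show that the score does not depend on the scale of the values.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    ss_tot = sum((y - mean_y) ** 2 for y in y_true)
    return 1 - ss_res / ss_tot


# Hypothetical observed values and model predictions.
y = [3.0, 5.0, 7.0, 9.0]
pred = [2.5, 5.5, 6.5, 9.5]

print(r_squared(y, y))            # perfect predictions
print(r_squared(y, [6.0] * 4))    # always predicting the mean of y
print(r_squared(y, pred))
print(r_squared([v * 1000 for v in y], [p * 1000 for p in pred]))
```

A model that predicts worse than the mean of the observed values gives a negative score under this definition, so the score is bounded above by one but not below by zero.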

The figure below is a graphical depiction of the coefficient of determination.

Akshita Chugh

I am a Data Analyst and Consultant who likes to write articles.
