Knowing how to use visualization and when to use it is essential.
Data = value = better decisions = more profits!
Data provides so much value and power to businesses, leading to better decisions and more profits. Visualizing data with charts, graphs, and maps is one of the most impactful ways to communicate complex data. The good thing about modern analytics is that there are a million ways to present data; the bad thing is that you have to choose the right one to make the data understandable.
I can safely say that today, data visualization is a necessary part of the business world. It is vital to follow best practices (as we do at our company, Jaggaer).
There are many different types of charts, graphs, and maps, and every detail on the dashboard affects the clarity of the visualization. A wrong title, subtitle, margin, axis, container, position, font size, colour, layout, order, or border can lead to the wrong conclusion and spoil the entire analysis. I will list some visualizations. Some of them are easy to understand, and some are complex. Each serves its purpose, but not all of them are best practices:
Bar Chart, Column Chart, Line Chart, Pie Chart, Donut Pie Chart, Multi-Level Pie Chart, Maps, Density Maps, Scatter Plot, Connected Scatter Plot, Gantt Chart, Bubble Chart, Treemap Chart, Word Cloud Chart, Heat Map Chart, Stacked Column Chart, Multi-Line Chart, Area Chart, Stacked Area Chart, Stacked Bar Chart, Spline Chart, Card, Table Chart, Gauge Chart, Histogram, Box Plot, Violin Plot, Density Plot, Sankey Chart, Chord Chart, Network Chart, Mosaic Plot or Mekko Chart, Population Pyramid, Spider Chart, Stock Chart, Flow Chart, Control Chart, Waterfall Chart, Hierarchy Chart, Trellis Plot, Trellis Bar Chart, Trellis Line Chart, Pareto Chart, Radar Chart, Function Plot, Sunburst Chart, Tree Chart, Contour Plot.
Before diving deeper into charts, you should get a better understanding of data types. The choice of chart depends on whether your variables are quantitative (numeric, either continuous or discrete) or qualitative (also known as categorical). Maybe I should write a blog post about this topic too.
Let’s explore HISTOGRAMS.
At first glance, you would probably think this is a bar chart; please do not mix up these charts. A bar chart compares values across different categories, while a histogram shows the distribution of a single numeric variable.
A histogram is my go-to choice when I need to show how a variable in a dataset is distributed. It makes it easy to see where the distribution peaks, whether it is skewed or symmetric, and whether there are any outliers.
When should you use it?
– You have a single continuous variable;
– You want to see the shape of the distribution, e.g. the lowest and highest values and which values are most common.

How to read the histogram:
The x-axis is the variable we are interested in: the ages. These ages are grouped into ‘bins’, that is, intervals. In this case, the bins are 20 to 25, 25 to 30, 30 to 35, and so on, up to 80 to 85 years. The y-axis is the percentage of award-winning actresses in each age bin. For example, the first thing you can see is that more than 25% of the winning actresses were between 30 and 35 years old, and that there were no winners between 50 and 60 years old.
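To make the binning concrete, here is a minimal Python sketch of how the y-axis values of such a chart could be computed. The `percent_per_bin` helper and the ten ages are hypothetical, made up for illustration; they are not the real award data behind the chart. It assumes half-open 5-year bins, so an age of exactly 30 falls into the 30 to 35 bin.

```python
def percent_per_bin(ages, start=20, stop=85, width=5):
    """Percentage of values in each half-open bin [low, low + width).

    Hypothetical helper for illustration; ages outside [start, stop) raise an error.
    """
    counts = {(low, low + width): 0 for low in range(start, stop, width)}
    for age in ages:
        if not start <= age < stop:
            raise ValueError(f"age {age} is outside [{start}, {stop})")
        low = start + (age - start) // width * width  # left edge of the bin holding this age
        counts[(low, low + width)] += 1
    total = len(ages)
    return {bin_range: 100 * count / total for bin_range, count in counts.items()}


# Hypothetical ages of ten award winners (illustrative only).
ages = [22, 26, 29, 31, 33, 33, 34, 41, 61, 80]
for (low, high), pct in percent_per_bin(ages).items():
    print(f"{low}-{high}: {pct:.0f}%")
```

Each printed line is the height of one bar in a percentage histogram; empty bins show up as 0%, just like the gap between 50 and 60 in the chart.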
You also need to know that the bin width (bandwidth) is significant. If the bins are too narrow, the histogram shows too much individual detail and the overall shape gets lost. If they are too wide, we will not be able to see the underlying trend in the data.
Here are two histograms of the same data, one with bins too narrow and one with bins too wide to support useful conclusions:


How could you change this graph to get more information from it?
You could increase the number of bins, which would reveal more about the ages of the people in attendance.
It is hard to know the best bin width before you draw the plot. The best way is to experiment with several values.
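A quick way to experiment is to count the values per bin for several widths and compare the shapes. The sketch below does that in plain Python. The ages are made up, and the two rules of thumb it prints (Sturges' rule and the Freedman-Diaconis rule) are common starting points I am adding for reference, not a replacement for looking at the plots.

```python
import math
import statistics


def bin_counts(data, width):
    """Count values in half-open bins of the given width, starting at min(data)."""
    start = min(data)
    n_bins = int((max(data) - start) // width) + 1  # the last bin contains max(data)
    counts = [0] * n_bins
    for x in data:
        counts[int((x - start) // width)] += 1
    return counts


def sturges_bins(n):
    """Sturges' rule: suggested number of bins for n values."""
    return math.ceil(math.log2(n)) + 1


def freedman_diaconis_width(data):
    """Freedman-Diaconis rule: bin width = 2 * IQR / n^(1/3)."""
    q1, _, q3 = statistics.quantiles(data, n=4)
    return 2 * (q3 - q1) / len(data) ** (1 / 3)


# Hypothetical ages, for illustration only.
ages = [22, 26, 27, 29, 31, 33, 33, 34, 35, 36, 38, 41, 45, 49, 61, 80]

for width in (1, 5, 20):  # too narrow, moderate, too wide
    print(f"width={width:>2}: {bin_counts(ages, width)}")

print("Sturges suggests", sturges_bins(len(ages)), "bins")
print(f"Freedman-Diaconis suggests a width of about {freedman_diaconis_width(ages):.1f} years")
```

With width 1, almost every bin holds zero or one value; with width 20, most values land in the first bin. The width-5 counts sit in between and show the peak in the early thirties. Treat the rule-of-thumb numbers as a first guess and still check the plot.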
Types of distributions:

Mean, median, and mode in symmetric, right-skewed, and left-skewed data
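The relationship between the three measures of centre can also be checked numerically. In this small Python sketch, the three samples are invented for illustration, and `center_summary` is a hypothetical helper built on the standard-library `statistics` module.

```python
import statistics

# Invented samples, one per distribution shape.
samples = {
    "symmetric": [1, 2, 3, 3, 3, 4, 5],
    "right-skewed": [2, 2, 2, 3, 4, 5, 9, 15, 30],   # long tail to the right
    "left-skewed": [2, 17, 23, 27, 28, 29, 30, 30, 30],  # long tail to the left
}


def center_summary(data):
    """Return (mean, median, mode) of the data."""
    return statistics.mean(data), statistics.median(data), statistics.mode(data)


for name, data in samples.items():
    mean, median, mode = center_summary(data)
    print(f"{name:>12}: mean={mean}, median={median}, mode={mode}")
```

In the right-skewed sample the long tail pulls the mean (8) above the median (4), which sits above the mode (2). The left-skewed sample shows the mirror image (mean 24, median 28, mode 30), and in the symmetric sample all three equal 3. This ordering is a useful rule of thumb for single-peaked data rather than a strict law.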

Individual histograms are great, but what if we need to draw many of them?
For example, suppose we need to see the distribution of ages for each continent. We cannot simply add a colour per continent to our histogram!

What a mess!
You should consider another visualization!
Like BOX PLOTS!
Box plots split a continuous variable, such as age, by a categorical variable, such as continent, so we can compare the resulting distributions efficiently.
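Computing what a grouped box plot draws takes only a few lines. The sketch below groups hypothetical (continent, age) records and computes each group's minimum, quartiles, and maximum with Python's `statistics.quantiles`. The `box_summary_by_group` helper and the `inclusive` quartile method are my choices for illustration; plotting tools may use slightly different quartile definitions, so their boxes can differ a little for small groups.

```python
import statistics
from collections import defaultdict


def box_summary_by_group(records):
    """Group (category, value) pairs and return each group's box-plot numbers."""
    groups = defaultdict(list)
    for group, value in records:
        groups[group].append(value)

    summary = {}
    for group, values in groups.items():
        # Three cut points split each group into four parts of roughly equal size.
        q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
        summary[group] = {"min": min(values), "q1": q1, "median": median,
                          "q3": q3, "max": max(values)}
    return summary


# Hypothetical records, for illustration only.
records = [
    ("Europe", 34), ("Europe", 41), ("Europe", 29), ("Europe", 52), ("Europe", 38),
    ("Asia", 27), ("Asia", 31), ("Asia", 45), ("Asia", 33), ("Asia", 24),
]
for continent, stats in box_summary_by_group(records).items():
    print(continent, stats)
```

Each dictionary holds exactly what one box in the chart needs, so comparing continents becomes a side-by-side comparison of five numbers instead of overlapping coloured bars.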
A box plot is also called a box-and-whisker plot. It is a diagram that uses a number line to split the data into four groups, each holding about 25% of the values. It identifies the minimum, lower quartile, median, upper quartile, and maximum.


How do you calculate the median and the lower and upper quartiles?
First of all, you need to find five numbers in the data set: the minimum, the maximum, Q1 (lower quartile), Q3 (upper quartile), and the median.
Start by sorting the values in ascending order. Then find the median by counting data values from the left and right ends towards the middle until you reach one or two numbers in the middle, depending on how many values are in the series. If the number of values is odd, there is just one middle number, and that is the median; if it is even, there are two, and the median is their average. To find Q1, do the same thing for the values between the smallest value and the median. For Q3, find the middle of the values between the median and the largest value. This makes sense because the median cuts the data in half, and if you cut each half in half one more time, you get quarters.
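Here is the same procedure as a short Python sketch. It assumes the common textbook convention that, when the number of values is odd, the median itself is left out of both halves before Q1 and Q3 are found; other conventions, such as the interpolating methods in Python's `statistics.quantiles`, can give slightly different quartiles. The function names are hypothetical.

```python
def median_of_sorted(values):
    """Median of a list that is already sorted in ascending order."""
    n = len(values)
    mid = n // 2
    if n % 2 == 1:
        return values[mid]                      # odd count: the single middle value
    return (values[mid - 1] + values[mid]) / 2  # even count: average of the two middle values


def five_number_summary(data):
    """Minimum, Q1, median, Q3, and maximum using the median-of-halves method."""
    if len(data) < 2:
        raise ValueError("need at least two values")
    s = sorted(data)
    n = len(s)
    half = n // 2
    lower = s[:half]          # values left of the median
    upper = s[half + n % 2:]  # values right of the median (skips the middle value when n is odd)
    return {
        "min": s[0],
        "q1": median_of_sorted(lower),
        "median": median_of_sorted(s),
        "q3": median_of_sorted(upper),
        "max": s[-1],
    }


print(five_number_summary([38, 25, 61, 31, 44, 29, 80, 35, 41, 33, 49]))
# {'min': 25, 'q1': 31, 'median': 38, 'q3': 49, 'max': 80}
```

Sorting first is what makes "counting in from both ends" work; after that, each quartile is just the median of one half.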
The picture below shows an example of a box plot broken down into categories:

Do you know what outliers are in a graph?
Outliers are unusual data points that lie far away from the range where most of the values fall.
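There is no single numeric definition of "far away". One widely used convention, and the one many box-plot tools use for the whiskers by default, is Tukey's rule: a value counts as an outlier if it lies more than 1.5 × IQR below Q1 or above Q3, where IQR = Q3 - Q1. Here is a minimal Python sketch of that rule; the `iqr_outliers` helper, the factor `k=1.5`, and the sample values are assumptions for illustration.

```python
import statistics


def iqr_outliers(data, k=1.5):
    """Return the values outside the fences [Q1 - k*IQR, Q3 + k*IQR] (Tukey's rule)."""
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - k * iqr, q3 + k * iqr
    return [x for x in data if x < low_fence or x > high_fence]


print(iqr_outliers([13, 40, 12, 10, 14]))  # [40]
```

Here Q1 is 12 and Q3 is 14, so the fences are 9 and 17, and only 40 falls outside them. Raising `k` makes the rule more tolerant; lowering it flags more points.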
Here is an example where you can compare how the same data set looks in different visualizations.

The dataset behind both histograms generates the same box plot in the center panel.
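You can reproduce the idea behind that figure with two tiny data sets. In the sketch below (made-up integers), one data set is spread evenly while the other piles up around 2 and 6, yet both give exactly the same minimum, quartiles, and maximum, so their box plots would be identical while their histograms would not. Quartiles come from `statistics.quantiles` with the `inclusive` method, which is an assumption on my part; other quartile conventions may not match as neatly.

```python
import statistics
from collections import Counter


def box_stats(data):
    """(min, Q1, median, Q3, max), using inclusive quartiles."""
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return min(data), q1, median, q3, max(data)


spread_out = [0, 1, 2, 3, 4, 5, 6, 7, 8]  # flat histogram
two_peaks = [0, 2, 2, 2, 4, 6, 6, 6, 8]   # peaks at 2 and 6

print(box_stats(spread_out) == box_stats(two_peaks))  # True: same box plot
print(Counter(spread_out) == Counter(two_peaks))      # False: different value frequencies
```

This is exactly why it pays to look at a histogram too: the box plot cannot tell a flat distribution from one with two peaks.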

What do box plots tell you that histograms don’t?
Box plots show you the smallest and largest values in the data and the median. They also show the lower and upper quartiles. From a histogram, those values can only be estimated.
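To see why a histogram only gives an estimate, here is a small Python sketch. It bins some made-up ages into 10-year bins and then estimates the median from the bin counts alone, using the standard grouped-data formula: assume the values are spread evenly inside the bin that contains the middle value, then interpolate. The helper names and the data are hypothetical.

```python
import statistics


def histogram_counts(data, start, width, n_bins):
    """Counts per half-open bin [start + i*width, start + (i+1)*width)."""
    counts = [0] * n_bins
    for x in data:
        i = int((x - start) // width)
        if 0 <= i < n_bins:
            counts[i] += 1
    return counts


def median_from_histogram(counts, start, width):
    """Estimate the median from bin counts by linear interpolation inside the median bin."""
    half = sum(counts) / 2
    cumulative = 0
    for i, count in enumerate(counts):
        if count and cumulative + count >= half:
            lower_edge = start + i * width
            return lower_edge + (half - cumulative) / count * width
        cumulative += count
    raise ValueError("histogram is empty")


ages = [22, 26, 27, 31, 33, 34, 36, 41, 48, 57]
counts = histogram_counts(ages, start=20, width=10, n_bins=4)
print(counts)                                 # [3, 4, 2, 1]
print(median_from_histogram(counts, 20, 10))  # 35.0 (estimate from the bins)
print(statistics.median(ages))                # 33.5 (exact, from the raw data)
```

The estimate (35.0) is close to the true median (33.5) but not equal, because the histogram no longer knows where inside the 30 to 40 bin the values actually sit. A box plot built from the raw data shows 33.5 directly.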
Pros and Cons
Histogram
#Pros
1. It divides the numeric data into uniform intervals and displays the number of data values within each bin.
2. It groups data into small chunks, which helps summarize numeric data by showing the rough distribution of values.
#Cons
1. A histogram doesn’t show what is happening within each bin of the graph.
2. It shows the number of values within each interval, but not the actual values.
Box Plot
#Pros
1. It is an excellent way to summarize large amounts of data.
2. It makes it easy to read the minimum value, median, outliers, quartiles, and maximum value.
#Cons
1. It is hard to identify the original data values.
I use a box plot when I have to display the range and distribution of the data.
A histogram shows the number of values within each interval.
