Identifying Correlations in Scatter Plots

![]()

What do correlations mean? How do you make use of correlations?

There is a very logical reason why data visualization is becoming so trendy.
As we enter the era of big data, with the endless output and storage of exabytes (1 exabyte aka 1 quintillion bytes aka a whole, whole lot) of data, being able to make data easy for others to understand is a real talent.

![]()

Humans are visual creatures, and thus making data easy often means making data visual.

The easiest way to create a scatter plot in Python is to use Matplotlib, a programming library specifically designed for data visualization in Python.

If you’re not sure what programming libraries are or want to read more about the 15 best libraries to know for Data Science and Machine Learning in Python, you can read all about them here.
But long story short: Matplotlib makes creating a scatter plot in Python very simple.
All you have to do is copy in the following Python code:

In this code, your “xData” and “yData” are just lists of the x and y coordinates of your data points.

Tip: if you don’t have any data on hand that you want to plot, but still want to try this code out for fun, you can just generate some random data using numpy like this:

In addition to making graphs so easy to create, Matplotlib also allows for a ton of cool, fancy customizations.

Let’s say we want to compare two sets of data, and we want them to have different symbols and colors so we can easily tell them apart.

In Matplotlib, all you have to do to change the colors and symbols of your points is this:

    plt.scatter(firstXData,firstYData,color="green",marker="*")
    plt.scatter(secondXData,secondYData,color="orange",marker="x")

How about creating something that looks like this fancy scatter plot, where we scale the points based on how many values there are at each point and change the color based on the distance to the origin?

To do that, we’ll just quickly create some random data for this:

Then we’ll create a new variable that contains the pairs of x-y points, and find the unique points we are going to plot along with the number of times each of those points showed up in our data.

We then also calculate the distance from the origin for each pair of points to use for scaling the color.

    uniquePoints, counts = np.unique(xyCoords, return_counts=True, axis=0)
    dists = np.sqrt(np.power(uniquePoints[:,0],2)+np.power(uniquePoints[:,1],2))

Now that we have our data prepared, all we have to do is:

    plt.scatter(uniquePoints[:,0],uniquePoints[:,1],s=counts,c=dists,cmap=plt.cm.jet)
    plt.title("Colored and sized scatter plot",fontsize=20)

And ta-dah! We get this impressive lookin’ and fancy scatter plot.

Of course, plotting a random distribution of numbers is more for showing what can be done than for being practical.

![]()

So let’s take a real look at how scatter plots can be used.

What to Use Scatter Plots For: 3 Applications of Scatter Plots

There are many approaches that you can take to identify clusters, but they can be simplified to be either algorithm-based or visualization-based.

We won’t get into the algorithms here, but I’ll provide a simple overview.
Clustering algorithms basically look for data points that are related or close together and group them, while separating different, or distant, data points.
These algorithms use a series of mathematical techniques to find general rules that can be used on any data set, and hence become pretty intricate, which is why we won’t go into any more detail on them.
There’s a whole field of unsupervised machine learning dedicated to this, though, called clustering, if you’re interested.

With visualizations, this task falls onto you, so to better understand how to identify clusters using visualization, let’s take a look at an example that I made up using some random data that I generated.

Imagine you’re analyzing monthly spending habits from your close friend group (let’s pretend we have this many friends), and you have a hunch that monthly spending and monthly income are related, so you plot them on a graph together and get a little something that looks like this.
A bit of an unfortunate disclaimer in the interest of being transparent: nothing is ever this obvious in real-world data, because again, I’ve just made up this data.
But just for the sake of this example, let’s assume for now that this is what we see.

![]()
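For anyone who wants to try the plots described in this post, here is a rough, self-contained sketch of the Matplotlib steps: a basic scatter plot, two data sets with different colors and markers, the sized-and-colored plot, and a made-up income vs. spending plot with two visible groups. The random data, the seed, the point-size scaling, the axis labels, and the off-screen "Agg" backend are all assumptions added here so the script runs on its own; they are not the post's original listings.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: draw off-screen so this runs without a display
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)  # assumed seed, only so reruns give the same data

# 1) Basic scatter plot: xData and yData hold the x and y coordinates.
xData = rng.random(50)
yData = rng.random(50)
plt.figure()
plt.scatter(xData, yData)

# 2) Two data sets drawn with different colors and markers.
firstXData, firstYData = rng.random(30), rng.random(30)
secondXData, secondYData = rng.random(30), rng.random(30)
plt.figure()
plt.scatter(firstXData, firstYData, color="green", marker="*")
plt.scatter(secondXData, secondYData, color="orange", marker="x")

# 3) Sized and colored scatter plot.
# Assumed data: 500 integer points on a 10x10 grid, so many points repeat.
xyCoords = rng.integers(0, 10, size=(500, 2))
# Unique (x, y) pairs and how many times each one occurs.
uniquePoints, counts = np.unique(xyCoords, return_counts=True, axis=0)
# Distance of each unique point from the origin, used for the color.
dists = np.sqrt(np.power(uniquePoints[:, 0], 2) + np.power(uniquePoints[:, 1], 2))
plt.figure()
# counts * 10 is an assumed scaling so the size differences are easy to see.
plt.scatter(uniquePoints[:, 0], uniquePoints[:, 1], s=counts * 10, c=dists, cmap=plt.cm.jet)
plt.title("Colored and sized scatter plot", fontsize=20)

# 4) Made-up income vs. spending data with two groups (all values assumed).
income = np.concatenate([rng.normal(3000, 300, 40), rng.normal(7000, 400, 40)])
spending = np.concatenate([rng.normal(1500, 200, 40), rng.normal(4000, 300, 40)])
plt.figure()
plt.scatter(income, spending)
plt.xlabel("Monthly income")
plt.ylabel("Monthly spending")

# To look at the figures, call plt.savefig("some_name.png") after each plot,
# or drop the "Agg" line above and call plt.show() in an interactive session.
```

Each `plt.figure()` call starts a fresh figure, so the four plots do not draw on top of one another.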