NVD3: How to Create a Simple Scatter Plot?

Author: Liron Shimrony

NVD3 is an easy-to-use visualization library. It is based on D3.js, and its big advantage is the ability to create reusable chart components. You can think of the relationship between D3.js and NVD3 like the difference between C and Python. Using C, you have control over everything, even at a very low level. Using Python is easier, but you cannot get the same control as you get in C.

Notes:

  1. The Wines Data Set can be downloaded from UCI
  2. The code for this tutorial can be downloaded from my GitHub repo

This tutorial is divided into 7 parts:

  1. Create a basic template to draw your chart on
  2. Create a basic chart with 4 hard-coded points
  3. Extend the previous figure by adding a legend and the ability to read data when hovering over points
  4. Get the Wines Data Set and convert it from CSV to JSON
  5. Create a scatter plot of alcohol vs. color intensity using the wines data set
  6. Extend the previous chart by adding a different color for each wine class and a variable point size according to the wine hue
  7. Conclusions

Step 0 - Create a basic template

In order to create the graphs, we need a simple HTML template. The following HTML code will be sufficient.
Note:
When you write the JavaScript code, make sure the id in the d3 selector matches the id used in the template (e.g. basicChart).

Step 1 - Create a basic chart with 4 hard coded points

The following code creates a simple scatter plot with 4 points. For simplicity, we removed the legend for now; we will put it back in Step 2.

As you can see, this is a plain chart. Lines 3 to 13 define the chart attributes, line 17 sets the data for the chart, and lines 20 to 25 add the chart to the page. You can find more information about the chart attributes in the code comments.
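
For readers who want to see the shape of such a chart, here is a minimal sketch of four hard-coded points in the series format that NVD3's scatter chart expects, plus a small rendering helper. The names fourPoints and drawBasicChart, the point coordinates, and the '#basicChart svg' selector are illustrative assumptions, not the tutorial's original listing, so the line numbers mentioned above do not apply to it. The helper assumes NVD3 1.x and D3 v3 are loaded on the page and that the template's container holds an svg element.

```javascript
// Hypothetical helper: one series holding four hard-coded points.
function fourPoints() {
  return [
    {
      key: 'points',
      values: [
        { x: 1, y: 1 },
        { x: 2, y: 4 },
        { x: 3, y: 2 },
        { x: 4, y: 3 }
      ]
    }
  ];
}

// Hypothetical helper: draws the series with NVD3.
// Assumes the globals `nv` and `d3` exist (i.e. the code runs in the page).
function drawBasicChart(selector, data) {
  nv.addGraph(function () {
    var chart = nv.models.scatterChart()
      .showLegend(false); // legend is off for now

    d3.select(selector)
      .datum(data)
      .call(chart);

    // Redraw the chart when the window is resized.
    nv.utils.windowResize(chart.update);
    return chart;
  });
}

// Only draw when NVD3 and D3 are actually available.
if (typeof nv !== 'undefined' && typeof d3 !== 'undefined') {
  drawBasicChart('#basicChart svg', fourPoints());
}
```

Keeping the data in its own function makes it easy to swap the hard-coded points for real data in the later steps.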

Step 2 - Adding a Legend and Setting the Axes Tick Format

Now that we have a basic figure, let's make some modifications.
Most likely, we will not deal only with discrete data. In order to deal with continuous data, we need to change the axis tick format. In this case, two digits after the decimal point will be sufficient.
Lines 19 and 20 set the axis tick format to 2 digits after the decimal point. A legend is an important thing to have, so we change the value in line 8 from false to true.
Lastly, we add the ability to hover over the points to see their coordinates. We do this by changing the value in line 11 from false to true.
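
To make the two-decimal tick format concrete, here is a small sketch of a formatter with that effect. With D3 v3 this is commonly written as d3.format('.02f'); the plain function twoDecimals below is a hypothetical stand-in that needs no library. It can be handed to an axis in the same way, because tickFormat accepts any function that maps a tick value to a string.

```javascript
// Hypothetical stand-in for d3.format('.02f'):
// always show two digits after the decimal point.
function twoDecimals(value) {
  return Number(value).toFixed(2);
}

// With a chart on the page it would be applied like this (assuming NVD3 1.x):
//   chart.xAxis.tickFormat(twoDecimals);
//   chart.yAxis.tickFormat(twoDecimals);

console.log(twoDecimals(12));  // prints 12.00
console.log(twoDecimals(0.5)); // prints 0.50
```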

Step 3 - Converting Data to JSON

The Data Set comes in a CSV format. However, when using JavaScript, JSON is my favorite data format. The following Python code will convert the data from a CSV to JSON.

You can copy the code and paste it in a file, then run it by calling python filename.py. Make sure that the CSV file is in the same folder as the script. Also, on line 6, change the file name in the infile variable to match the CSV file. On line 24, change the output file name. In this tutorial I will call this file data.json.
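
The converter used in this tutorial is a Python script. Purely as an illustration of the same idea in JavaScript, here is a minimal sketch that turns CSV text in the UCI wine layout (no header row; class label first, alcohol at index 1, color intensity at index 10, hue at index 11) into an array of objects. The function name winesCsvToRecords, the output key names, and the choice to keep only four columns are assumptions; the sample row is only there to exercise the parser.

```javascript
// Column positions in the UCI wine file (0-based); the file has no header row.
var COLUMNS = { wineClass: 0, alcohol: 1, colorIntensity: 10, hue: 11 };

// Hypothetical helper: parse CSV text into plain objects, skipping blank lines.
function winesCsvToRecords(csvText) {
  return csvText
    .split(/\r?\n/)
    .filter(function (line) { return line.trim() !== ''; })
    .map(function (line) {
      var cells = line.split(',');
      return {
        wineClass: parseInt(cells[COLUMNS.wineClass], 10),
        alcohol: parseFloat(cells[COLUMNS.alcohol]),
        colorIntensity: parseFloat(cells[COLUMNS.colorIntensity]),
        hue: parseFloat(cells[COLUMNS.hue])
      };
    });
}

// One sample row in the wine data layout (14 comma-separated values).
var sampleCsv = '1,14.23,1.71,2.43,15.6,127,2.8,3.06,.28,2.29,5.64,1.04,3.92,1065\n';
var sampleRecords = winesCsvToRecords(sampleCsv);

// JSON.stringify produces the text that would be saved as data.json.
console.log(JSON.stringify(sampleRecords));
```

Keeping only the columns the charts need keeps data.json small; more columns can be added to COLUMNS in the same way.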

Step 4 - Plotting the Wines Data Set

Now that we have converted our data to JSON, we can use NVD3 to visualize it. The following code creates a scatter plot of the data. Notice that NVD3 chooses the axis ranges for us, so the X axis does not start from 0. Also notice that some points lie above the highest tick on the Y axis. We will take care of that in the next step.

The only change made to the first function is in lines 30 to 31, which add the axis labels. We also added the function getDataPlain, which reads our data from the file and returns it to the chart.
This function is basically a simple Ajax request. It is important to mention that the data structure for NVD3 is a list of objects. For now, we will use only one object, which has 2 keys:

  1. key (wines)
  2. values which is an array of objects. Each object has the following keys:
    • x: the x axis coordinate
    • y: the y axis coordinate
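
The structure described above can be sketched as a small function that wraps parsed records in a single series. The record fields alcohol and colorIntensity and the function name toSingleSeries are assumptions about how data.json is laid out; this is not the tutorial's actual getDataPlain, and the two example records are illustrative.

```javascript
// Hypothetical helper: one series named "wines",
// with alcohol on the x axis and color intensity on the y axis.
function toSingleSeries(records) {
  return [
    {
      key: 'wines',
      values: records.map(function (r) {
        return { x: r.alcohol, y: r.colorIntensity };
      })
    }
  ];
}

// Illustrative records in the assumed data.json layout.
var exampleRecords = [
  { alcohol: 14.23, colorIntensity: 5.64 },
  { alcohol: 13.2, colorIntensity: 4.38 }
];

console.log(JSON.stringify(toSingleSeries(exampleRecords)));
```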
Looking at the figure above, it is hard to draw conclusions. In the next step we will organize the data in a way that actually teaches us something about it.

Step 5 - Visualizing the Wines Data Set

In the previous step, we plotted the data and realized that it is hard to learn something new from the current plot. We can visualize more than 2 dimensions on this scatter plot by creating a 3rd dimension, using a different color for every wine class, and a 4th dimension, changing the size of every point according to the wine hue. More hue means a larger point and vice versa.

The only change we made in the main function is in lines 16 to 17, which tell NVD3 the desired range for our axes. This fixes the points that fell outside the graph in the previous step. We also changed the way we construct the data in the getData function. Now we create an object for each wine class. From inspecting the data, we know that we have only 3 classes. Lines 52 to 53 create those objects, and lines 55 to 63 put each wine in the right class. Notice that we also added the hue as the size of the point. A larger size value means greater hue.
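
The grouping described above can be sketched as follows. This is a minimal illustration under assumed names, not the tutorial's getData: the function toClassSeries and the record fields wineClass, alcohol, colorIntensity and hue are hypothetical, and the example records are made up. It relies on the scatter chart reading each point's size field for the point size, which is NVD3's default.

```javascript
// Hypothetical helper: one series per wine class, with hue as the point size.
function toClassSeries(records) {
  // The data set has exactly three classes, labelled 1, 2 and 3.
  var series = [1, 2, 3].map(function (c) {
    return { key: 'Class ' + c, values: [] };
  });

  // Put each wine into the series of its class.
  records.forEach(function (r) {
    series[r.wineClass - 1].values.push({
      x: r.alcohol,
      y: r.colorIntensity,
      size: r.hue // larger hue gives a larger point
    });
  });

  return series;
}

// Made-up records, one per class, only to show the output shape.
var classExample = [
  { wineClass: 1, alcohol: 14.0, colorIntensity: 5.5, hue: 1.0 },
  { wineClass: 2, alcohol: 12.0, colorIntensity: 3.0, hue: 1.1 },
  { wineClass: 3, alcohol: 13.0, colorIntensity: 7.5, hue: 0.6 }
];

console.log(JSON.stringify(toClassSeries(classExample)));
```

Because each series gets its own key, NVD3 assigns it its own color and legend entry, which gives the 3rd dimension; the size field gives the 4th.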

Conclusion

After inspecting the new plot, we can immediately find some relationships between the wine classes. For example, almost all the wines from class 2 have less alcohol than the wines in class 1. Also, the color intensity of wines from class 3 is higher than that of wines from class 2. Lastly, most of the wines in class 3 have a low hue, since their points are the smallest.

References

  1. M. Lichman. UCI machine learning repository, 2013.