Data visualization

ggplot2: Data visualization on "Vehicle accidents" dataset

What is ggplot?

ggplot2 is one of the most popular and full-featured data visualization packages for the R programming language. It lets you build and customize graphics using the grammar of graphics, in which a plot is composed from data, aesthetic mappings, and geometric layers.
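As a rough illustration of the grammar-of-graphics idea, here is a minimal sketch on a tiny made-up data frame. The `crashes` data and its column names are assumptions for illustration only and are not taken from the actual dataset.

```r
library(ggplot2)

# Hypothetical toy data; column names and values are illustrative only.
crashes <- data.frame(
  severity = factor(c("Minor", "Serious", "Minor", "Non-injury", "Fatal", "Minor")),
  year     = c(2016, 2016, 2017, 2017, 2018, 2018)
)

# A plot is built by combining data, an aesthetic mapping, a geometry,
# and labels with the `+` operator.
p <- ggplot(crashes, aes(x = severity)) +
  geom_bar() +
  labs(x = "Severity", y = "Number of crashes",
       title = "Crashes by severity (toy data)")

print(p)
```

Each `+` adds one layer or setting, so swapping `geom_bar()` for another geometry changes the chart type without touching the data or the mapping.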

I used ggplot2 on the “Vehicle accidents” dataset to produce the visualization below, and I must say it is one of the more powerful tools in R for producing customizable data graphics.

Dataset info:

NZ Police report all traffic crash data to the Transport Agency, which then enters the reported data into the Crash Analysis System (CAS). By default, every crash on a New Zealand road, or in any area to which the public has legal access, is recorded in CAS. The dataset used here contains crash variables only, with no personal data.

Challenges:

  • Managing the different data types.
  • Some values within a variable do not match the variable's data type, are missing, or are NA.
  • Frequent errors while plotting (for example, "Error: cannot allocate vector of size 5.4 Gb").

Solutions:

  • Converting categorical variables to factors for easier analysis.
  • Breaking the data down into tibbles or simple data frames.
  • Removing irrelevant values from the data frame to avoid conflicting results.
  • Plotting smaller vectors to avoid the memory error. Alternatives are to increase the available memory or to use special-purpose packages.
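
The solutions above can be sketched roughly as follows in base R. This is only an illustration under stated assumptions: the `raw` data frame, its column names, the "Unknown" and "n/a" placeholder values, and the sample size of 2000 are all made up and are not the steps or values used on the real dataset.

```r
# Hypothetical raw data; column names and values are illustrative only.
set.seed(42)  # make the random sample reproducible
n <- 10000
raw <- data.frame(
  severity    = sample(c("Minor", "Serious", "Fatal", "Unknown", NA), n, replace = TRUE),
  speed_limit = sample(c("50", "80", "100", "n/a"), n, replace = TRUE),
  stringsAsFactors = FALSE
)

# Coerce to the intended type; values that cannot be parsed become NA.
raw$speed_limit <- suppressWarnings(as.numeric(raw$speed_limit))

# Treat the assumed placeholder value as missing, then drop incomplete rows.
raw$severity[raw$severity %in% "Unknown"] <- NA
clean <- raw[complete.cases(raw), ]

# Convert the categorical column to a factor with an explicit level order.
clean$severity <- factor(clean$severity, levels = c("Minor", "Serious", "Fatal"))

# Keep only the needed columns and a random subset of rows for plotting.
keep_rows <- sample(nrow(clean), min(2000, nrow(clean)))
plot_data <- clean[keep_rows, c("severity", "speed_limit")]

str(plot_data)
```

The resulting `plot_data` can be passed to `ggplot()` in place of the full data frame. Plotting a subset trades some detail for a much smaller memory footprint, so it is worth checking that the sample still reflects the overall distribution.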