Exploring the popularity of Pig and Hive
I first found the most popular Stack Overflow tag for each technology. Then I used the public Stack Exchange Data Explorer, entering the tags into a query, downloaded the results as CSV files, and brought them into R for some visualizations.
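```r
library(tidyverse)  # provides ggplot2 (the original session loaded the full tidyverse)

# Read the two CSV exports from the Stack Exchange Data Explorer
pig  <- read.csv("pig.csv", header = TRUE)
hive <- read.csv("hive.csv", header = TRUE)

# Combine the data into one data frame (both files share the same columns)
pighive <- rbind(pig, hive)

# Parse the month column as POSIXct, the date-time class ggplot2 handles most reliably
pighive$mo <- as.POSIXct(as.character(pighive$mo), format = "%Y-%m-%d %H:%M:%S")

ggplot(pighive, aes(mo, Total.Votes)) +
  geom_line(aes(color = TagName)) +
  ggtitle("Popularity of Pig vs Hive on Stack Overflow") +
  ylab("Tag Votes") +
  xlab("Time")
```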
The number of posts tagged apache-pig has plateaued and dropped slightly from its 2014 peak. Hive has grown in popularity and has more than 3x as many posts. Hive looks like the clear winner here.
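To put numbers behind the plateau and the 3x comparison rather than reading them off the chart, a small base-R helper could summarise each tag's peak month, latest value and total. This is a sketch: `summarise_tags` is a hypothetical name, and it assumes the combined `pighive` data frame has the `TagName`, `mo` and `Total.Votes` columns used in the plot code above.

```r
# Hypothetical helper: per-tag peak month, latest value and overall total.
# Assumes columns TagName, mo (date-time) and Total.Votes, as in the plot above.
summarise_tags <- function(df) {
  # drop = TRUE skips unused factor levels, so every group has rows
  tags <- split(df, df$TagName, drop = TRUE)
  out <- do.call(rbind, lapply(names(tags), function(tag) {
    d <- tags[[tag]]
    d <- d[order(d$mo), ]  # sort each tag's rows chronologically
    data.frame(
      TagName    = tag,
      peak_month = format(d$mo[which.max(d$Total.Votes)], "%Y-%m"),
      latest     = d$Total.Votes[nrow(d)],
      total      = sum(d$Total.Votes),
      stringsAsFactors = FALSE
    )
  }))
  rownames(out) <- NULL
  out
}
```

For example, `s <- summarise_tags(pighive)` followed by `s$total[s$TagName == "hive"] / s$total[s$TagName == "apache-pig"]` would give the Hive-to-Pig ratio over the whole period. The `"hive"` tag name is an assumption, so check the `TagName` values in the export first.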