Using dplyr to group, manipulate and summarize data

Working with large and complex sets of data is a day-to-day reality in applied statistics. The package dplyr provides a well-structured set of functions for manipulating such data collections and performing typical operations with a standard syntax that makes them easier to remember. It is also very fast, even with large collections. To increase its applicability, the functions work with connections to databases as well as data frames. dplyr builds on plyr and incorporates features of data.table, which is known for being fast and efficient in handling large datasets.

`setwd("~/Documents/Computing with Data/24_dplyr/")`

Loading dplyr prints the usual masking notices:

`# The following objects are masked from 'package:stats':`

`# The following objects are masked from 'package:base':`

As a data source to illustrate dplyr's properties, we'll use the flights data that we're already familiar with. Its variables are:

`# Year Month DayofMonth DayOfWeek DepTime ArrTime UniqueCarrier`
`# FlightNum TailNum ActualElapsedTime AirTime ArrDelay DepDelay Origin`
`# Dest Distance TaxiIn TaxiOut Cancelled CancellationCode Diverted`

`# $ UniqueCarrier : chr "AA" "AA" "AA" "AA" ...`
`# $ Origin : chr "IAH" "IAH" "IAH" "IAH" ...`

There are over a quarter of a million records and 21 variables, which is a good size.

dplyr can work fine with data frames like this, but converting it to a tbl_df object gives a nice summary view of the data:

`hflights_df …`

Operations can then be chained with `%.%`, for example to average the arrival and departure delays per day and keep the days with long delays:

`hflights %.% group_by(Year, Month, DayofMonth) %.% select(Year:DayofMonth, ArrDelay, DepDelay) %.% summarise(arr = mean(ArrDelay, na.rm = TRUE), dep = mean(DepDelay, … 30 | dep > 30)`
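The group, select, summarise and filter chain used in this post can be tried on a small scale. The following is a minimal sketch, not the post's original code. It assumes a recent dplyr (1.0 or later), where the `%>%` pipe has replaced `%.%` and `summarise()` accepts `.groups`. The `flights_toy` data frame is a hypothetical stand-in for the flights data, with made-up values, so the example runs without the real dataset.

```r
library(dplyr)

# Hypothetical toy data standing in for the flights data (values are invented).
flights_toy <- data.frame(
  Year       = c(2011, 2011, 2011, 2011),
  Month      = c(1, 1, 1, 1),
  DayofMonth = c(1, 1, 2, 2),
  ArrDelay   = c(40, 50, 5, NA),
  DepDelay   = c(10, 20, 0, 3)
)

daily_delays <- flights_toy %>%
  # One group per calendar day
  group_by(Year, Month, DayofMonth) %>%
  # Keep only the grouping columns and the two delay columns
  select(Year:DayofMonth, ArrDelay, DepDelay) %>%
  # Mean delays per day, ignoring missing values
  summarise(
    arr = mean(ArrDelay, na.rm = TRUE),
    dep = mean(DepDelay, na.rm = TRUE),
    .groups = "drop"
  ) %>%
  # Keep days where either mean delay exceeds 30 minutes
  filter(arr > 30 | dep > 30)

print(daily_delays)
```

With these toy values only 1 January passes the filter: its mean arrival delay is 45 and its mean departure delay is 15, while 2 January averages 5 and 1.5.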