Aggregation
import pandas as pd
bikes = pd.read_csv('../data/indego-trips.csv')
# One row per bike: summary statistics of trip duration plus a trip count
(
    bikes
    .groupby('bike_id')
    .agg(
        min_duration=pd.NamedAgg(column='duration', aggfunc='min'),
        max_duration=pd.NamedAgg(column='duration', aggfunc='max'),
        med_duration=pd.NamedAgg(column='duration', aggfunc='median'),
        avg_duration=pd.NamedAgg(column='duration', aggfunc='mean'),
        total_trips=pd.NamedAgg(column='trip_id', aggfunc='count')
    )
    .reset_index()  # move bike_id out of the index and back into a column
    .head()
)
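To see what named aggregation produces without the Indego CSV, here is a minimal sketch on a small made-up DataFrame. The data and the names `trips` and `summary` are hypothetical stand-ins for `bikes`; each keyword passed to `.agg()` becomes an output column name, and `pd.NamedAgg` says which input column and which function feed it.

```python
import pandas as pd

# Hypothetical toy data standing in for indego-trips.csv
trips = pd.DataFrame({
    'trip_id': [1, 2, 3, 4, 5],
    'bike_id': [10, 10, 20, 20, 20],
    'duration': [5, 15, 8, 12, 40],
})

summary = (
    trips
    .groupby('bike_id')
    .agg(
        min_duration=pd.NamedAgg(column='duration', aggfunc='min'),
        max_duration=pd.NamedAgg(column='duration', aggfunc='max'),
        med_duration=pd.NamedAgg(column='duration', aggfunc='median'),
        total_trips=pd.NamedAgg(column='trip_id', aggfunc='count'),
    )
    .reset_index()  # bike_id becomes a regular column again
)
print(summary)
```

Bike 10 has two trips (median 10.0) and bike 20 has three (median 12.0), so the result has one row per bike and one column per keyword.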
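We can pull out a single group from a groupby object with get_group(), which returns the rows for one key (here, the trips that started at station 3020):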
(
    bikes
    .groupby('start_station')
    .get_group(3020)
    .head()
)
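A quick way to convince yourself of what get_group() returns: on a small hypothetical DataFrame (the data and variable names are made up for illustration), it gives exactly the same rows, with the original index, as a boolean filter on the grouping column.

```python
import pandas as pd

# Hypothetical toy data with a start_station column
trips = pd.DataFrame({
    'trip_id': [1, 2, 3, 4],
    'start_station': [3020, 3021, 3020, 3022],
    'duration': [5, 15, 8, 12],
})

grouped = trips.groupby('start_station')
from_3020 = grouped.get_group(3020)

# Equivalent boolean filter; the original row labels (0 and 2) are kept in both
via_mask = trips[trips['start_station'] == 3020]

print(from_3020.equals(via_mask))  # True
print(grouped.ngroups)             # 3 distinct stations
```

get_group() is convenient when you already hold a groupby object; for a one-off lookup the boolean filter does the same job.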
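Another way to do multiple aggregations: pass plain (column, aggfunc) tuples instead of pd.NamedAgg. This is just shorthand, since pd.NamedAgg is a namedtuple with the fields column and aggfunc. Passing as_index=False to groupby keeps bike_id as a regular column, which replaces the reset_index() call used earlier.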
(
    bikes
    .groupby('bike_id', as_index=False)
    .agg(
        min_duration=('duration', 'min'),
        max_duration=('duration', 'max'),
        med_duration=('duration', 'median'),
        avg_duration=('duration', 'mean'),
        total_trips=('trip_id', 'count')
    )
    .head()
)
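To check the shorthand claim, this sketch compares the two styles on a small hypothetical DataFrame (the data and variable names are illustrative only): NamedAgg plus reset_index() on one side, plain tuples plus as_index=False on the other.

```python
import pandas as pd

# Hypothetical toy data standing in for indego-trips.csv
trips = pd.DataFrame({
    'trip_id': [1, 2, 3, 4, 5],
    'bike_id': [10, 10, 20, 20, 20],
    'duration': [5, 15, 8, 12, 40],
})

# Style 1: explicit NamedAgg, then move bike_id out of the index
with_namedagg = (
    trips
    .groupby('bike_id')
    .agg(
        avg_duration=pd.NamedAgg(column='duration', aggfunc='mean'),
        total_trips=pd.NamedAgg(column='trip_id', aggfunc='count'),
    )
    .reset_index()
)

# Style 2: plain (column, aggfunc) tuples, keys kept as a column
with_tuples = (
    trips
    .groupby('bike_id', as_index=False)
    .agg(
        avg_duration=('duration', 'mean'),
        total_trips=('trip_id', 'count'),
    )
)

print(with_namedagg.equals(with_tuples))
# NamedAgg is a namedtuple, so it is a tuple as well
print(isinstance(pd.NamedAgg('duration', 'mean'), tuple))  # True
```

Both styles should give the same frame here; the tuple form is shorter, while NamedAgg spells out which element is the column and which is the function.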