In [1]:
import pandas as pd

## Explode

> new in pandas 0.25

### List
If you have a list within a column and you want to create a row for each item in that list.

In [3]:
df = pd.DataFrame({
    'article': ['a', 'b', 'c'],
    'tags': [['happy', 'fun'], ['sad', 'frustrating', 'stressful'], ['ಠ_ಠ']],
    'views': [100, 10, 10000]
})

df

Unnamed: 0,article,tags,views
0,a,"[happy, fun]",100
1,b,"[sad, frustrating, stressful]",10
2,c,[ಠ_ಠ],10000


In [6]:
df.explode('tags')

Unnamed: 0,article,tags,views
0,a,happy,100
0,a,fun,100
1,b,sad,10
1,b,frustrating,10
1,b,stressful,10
2,c,ಠ_ಠ,10000
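Note that the output above repeats the original index label for every exploded row. A minimal sketch of how to give each row its own label, using a smaller made-up frame (the data here is illustrative, not from the notebook):

```python
import pandas as pd

# Hypothetical toy data, just to show the index behaviour.
df = pd.DataFrame({
    'article': ['a', 'b'],
    'tags': [['happy', 'fun'], ['sad']],
    'views': [100, 10],
})

# Default: each exploded row keeps the index label of the row it came from.
exploded = df.explode('tags')

# reset_index(drop=True) relabels the rows 0, 1, 2, ...
renumbered = exploded.reset_index(drop=True)
```

Newer pandas versions also accept `explode(..., ignore_index=True)`, which should give the same result in one step.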


### Comma-separated

If you have comma-separated values within a column and you want to create a row for each item in the string.

In [7]:
df = pd.DataFrame({
    'article': ['a', 'b', 'c'],
    'tags': ['happy,fun', 'sad,frustrating,stressful', 'ಠ_ಠ'],
    'views': [100, 10, 10000]
})

df

Unnamed: 0,article,tags,views
0,a,"happy,fun",100
1,b,"sad,frustrating,stressful",10
2,c,ಠ_ಠ,10000


In [11]:
(
    df
    .assign(tags=df.tags.str.split(","))
    .explode('tags')
)

Unnamed: 0,article,tags,views
0,a,happy,100
0,a,fun,100
1,b,sad,10
1,b,frustrating,10
1,b,stressful,10
2,c,ಠ_ಠ,10000
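Real-world strings often have stray spaces around the commas, which would survive the split. A possible sketch, assuming made-up data with inconsistent spacing, that strips each tag after exploding:

```python
import pandas as pd

# Hypothetical data with messy spacing around the commas.
df = pd.DataFrame({
    'article': ['a', 'b'],
    'tags': ['happy, fun', 'sad ,frustrating'],
})

tidy = (
    df
    .assign(tags=df.tags.str.split(','))          # string -> list of pieces
    .explode('tags')                              # one row per piece
    .assign(tags=lambda d: d.tags.str.strip())    # trim leftover whitespace
)
```

Using a lambda in the second `assign` lets it see the already-exploded column rather than the original `df`.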


## Query

> for easy-to-read, SQL-like filtering; works well in method chains

In [14]:
(
    df
    .query('views > 10 & views < 10000')
)

Unnamed: 0,article,tags,views
0,a,"happy,fun",100


> can also reference local variables by prefixing them with @

In [15]:
threshold = 100

In [16]:
(
    df
    .query('views == @threshold')
)

Unnamed: 0,article,tags,views
0,a,"happy,fun",100
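For comparison, here is a small sketch of the same filters written as plain boolean masks. The frame is a trimmed-down, illustrative copy of the one above; the point is only that `query` and masking should select the same rows:

```python
import pandas as pd

df = pd.DataFrame({
    'article': ['a', 'b', 'c'],
    'views': [100, 10, 10000],
})
threshold = 100

# query version: the condition is a string, no need to repeat `df.`
via_query = df.query('views > 10 & views < 10000')

# mask version: each comparison needs its own parentheses
via_mask = df[(df.views > 10) & (df.views < 10000)]

# the @threshold query corresponds to a direct comparison
via_threshold_mask = df[df.views == threshold]
```

The mask version needs a named dataframe to refer to, which is why `query` tends to read better in the middle of a chain.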


## Multiple Conditions

> Sometimes we want to filter our dataframe using a list

In [18]:
some_list = ['a', 'b']

In [19]:
df[df.article.isin(some_list)]

Unnamed: 0,article,tags,views
0,a,"happy,fun",100
1,b,"sad,frustrating,stressful",10


> we can reverse this easily with ~

In [20]:
df[~df.article.isin(some_list)]

Unnamed: 0,article,tags,views
2,c,ಠ_ಠ,10000
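The same list filter can also be written with `query`, which understands `in` and `not in`. A minimal sketch on an illustrative copy of the data:

```python
import pandas as pd

df = pd.DataFrame({
    'article': ['a', 'b', 'c'],
    'views': [100, 10, 10000],
})
some_list = ['a', 'b']

# equivalent to df[df.article.isin(some_list)]
kept = df.query('article in @some_list')

# equivalent to df[~df.article.isin(some_list)]
dropped = df.query('article not in @some_list')
```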


## Assign

Skipping basics since you can just look here: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.assign.html

> Create columns with variable names by unpacking a dict into `assign`; f-strings work for the keys (probably cleaner than `'{}'.format()`)

In [21]:
col_name = 'cat'

In [24]:
(
    df
    .assign(
        **{f'some_{col_name}': ['haku', 'nagi', 'poki']}
    )
    .style.set_caption("some random dataframe I imagined")
)

Unnamed: 0,article,tags,views,some_cat
0,a,"happy,fun",100,haku
1,b,"sad,frustrating,stressful",10,nagi
2,c,ಠ_ಠ,10000,poki


> can also set a caption for the dataframe lol (`.style` returns a Styler rather than a DataFrame, so keep it at the end of the chain)
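The dict-unpacking trick with f-string keys also combines with lambdas, so the new column names and their values can both be derived from a variable. A rough sketch, where the `_k` and `_rank` suffixes are made up for illustration:

```python
import pandas as pd

df = pd.DataFrame({
    'article': ['a', 'b', 'c'],
    'views': [100, 10, 10000],
})
col_name = 'views'

result = df.assign(**{
    # views in thousands, named from the variable
    f'{col_name}_k': lambda d: d[col_name] / 1000,
    # rank 1 = most views
    f'{col_name}_rank': lambda d: d[col_name].rank(ascending=False).astype(int),
})
```

Each lambda receives the dataframe as it exists at that point in the chain, so this works without a named intermediate variable.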