core

Monkey patches for pandas.

Utils


dummydf

 dummydf ()

Returns a small dummy DataFrame with columns col_1 and col_2, used in the examples on this page.
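
The example outputs later on this page suggest that dummydf returns five rows, with an integer column col_1 (100 to 104) and a string column col_2 ('a' to 'e'). The sketch below rebuilds such a frame from those outputs. `make_dummydf` is a hypothetical name and this is not the library's own code.

```python
import pandas as pd

def make_dummydf():
    # Hypothetical stand-in for dummydf(), inferred from the outputs on this page.
    return pd.DataFrame({'col_1': range(100, 105), 'col_2': list('abcde')})

dummy = make_dummydf()
print(dummy)
```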

Transformations


DataFrame.repetitions

 DataFrame.repetitions (col)

Counts how many times each value of column col appears.

df = pd.DataFrame({'a': [1, 2, 3, 4, 4, 5, 5, 6, 6, 6], 'b':[1, 1, 1, 1, 2, 2, 2, 3, 3, 4]})
df.repetitions('b')
b
1    4
2    3
3    2
4    1
dtype: int64
test(df.repetitions('b'), pd.Series({1:4, 2:3,3:2, 4:1}), all_equal)
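
For comparison, the same counts can be obtained with plain pandas. This is a sketch of an equivalent computation, assuming repetitions amounts to a group-by and size; the library may implement it differently.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3, 4, 4, 5, 5, 6, 6, 6],
                   'b': [1, 1, 1, 1, 2, 2, 2, 3, 3, 4]})

# Group the rows by the values of 'b' and count the rows in each group.
reps = df.groupby('b').size()
print(reps)
```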

DataFrame.repetition_counts

 DataFrame.repetition_counts (col)

Counts how many distinct values of column col share each repetition count.

In the following example, three values of a appear once (1, 2, 3), two values appear twice (4, 5), and one value appears three times (6).

df.repetition_counts('a')
1    3
2    2
3    1
dtype: int64
test(df.repetition_counts('a'), pd.Series({1: 3, 2:2, 3:1}), all_equal)
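
A plain-pandas sketch of the same idea: first count the repetitions of each value, then count how often each repetition count occurs. It assumes the result is ordered by repetition count, as in the output above, and is not necessarily how the library computes it.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3, 4, 4, 5, 5, 6, 6, 6],
                   'b': [1, 1, 1, 1, 2, 2, 2, 3, 3, 4]})

# Step 1: repetitions per value of 'a'.
group_sizes = df.groupby('a').size()
# Step 2: how many values share each repetition count, ordered by that count.
rep_counts = group_sizes.value_counts().sort_index()
print(rep_counts)
```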

DataFrame.single_events

 DataFrame.single_events (col)

Returns the rows whose value in column col appears only once.

df.single_events('a')
   a  b
0  1  1
1  2  1
2  3  1
test_eq(df.single_events('a'), df.loc[[0, 1, 2]])
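
The same selection can be sketched in plain pandas with a duplicate mask. This assumes single_events keeps only rows whose value in the column is unique, which matches the output above.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3, 4, 4, 5, 5, 6, 6, 6],
                   'b': [1, 1, 1, 1, 2, 2, 2, 3, 3, 4]})

# keep=False marks every occurrence of a repeated value, so negating the
# mask leaves only the rows whose value in 'a' appears exactly once.
singles = df[~df['a'].duplicated(keep=False)]
print(singles)
```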

Functions as methods

Pandas functions that are easier to execute as DataFrame/Series methods.


DataFrame.crosstab

 DataFrame.crosstab (index, column, **kwargs)

DataFrame.len

 DataFrame.len ()

Series.len

 Series.len ()
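
The pattern behind this section can be sketched as follows: write a function whose first argument is the object, then attach it to the class. The `*_sketch` names are hypothetical, chosen so that nothing in pandas is overwritten, and the crosstab wrapper assumes that index and column are column names.

```python
import pandas as pd

def crosstab_sketch(self, index, column, **kwargs):
    # Assumption: index and column name columns of the DataFrame.
    return pd.crosstab(self[index], self[column], **kwargs)

def len_sketch(self):
    # Same as the built-in len(), but usable in a method chain.
    return len(self)

# Monkey patch the classes under hypothetical names.
pd.DataFrame.crosstab_sketch = crosstab_sketch
pd.DataFrame.len_sketch = len_sketch
pd.Series.len_sketch = len_sketch

df = pd.DataFrame({'x': ['u', 'u', 'v'], 'y': ['p', 'q', 'p']})
table = df.crosstab_sketch('x', 'y')
n_rows = df.len_sketch()
n_vals = df['x'].len_sketch()
print(table)
```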

One-liners

These methods allow fast exploration of the data in one line.


Index.l

 Index.l ()

Series.minmax

 Series.minmax ()
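
The page does not document what minmax returns. One plausible reading, shown here purely as an assumption, is a (minimum, maximum) pair; `minmax_sketch` is a hypothetical helper.

```python
import pandas as pd

def minmax_sketch(s):
    # Assumption: the method returns the smallest and largest value as a tuple.
    return s.min(), s.max()

lo, hi = minmax_sketch(pd.Series([3, 1, 4, 1, 5]))
print(lo, hi)
```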

DataFrame.page

 DataFrame.page (page, page_size=5)

Shows the rows from (page-1)*page_size up to, but not including, page*page_size. Pages are numbered from 1.

df = pd.DataFrame({'a': range(12), 'b': range(12)})
df.page(3)
     a   b
10  10  10
11  11  11

Series.page

 Series.page (page, page_size=5)

Shows the rows from (page-1)*page_size up to, but not including, page*page_size. Pages are numbered from 1.

s = pd.Series(range(15))
s.page(2)
5    5
6    6
7    7
8    8
9    9
dtype: int64
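
Both outputs above (rows 10 and 11 for page 3 of a 12-row DataFrame, values 5 to 9 for page 2 of a Series) are consistent with pages numbered from 1. The sketch below reproduces them with positional slicing. `page_sketch` is a hypothetical function, not the library's implementation.

```python
import pandas as pd

def page_sketch(obj, page, page_size=5):
    # Assumption: page 1 covers positions 0 .. page_size-1.
    start = (page - 1) * page_size
    # iloc slicing past the end simply returns the rows that exist.
    return obj.iloc[start:start + page_size]

df = pd.DataFrame({'a': range(12), 'b': range(12)})
s = pd.Series(range(15))

df_page = page_sketch(df, 3)
s_page = page_sketch(s, 2)
print(df_page)
print(s_page)
```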

L.page

 L.page (page, page_size=10)

Shows elements between page*page_size and (page+1)*page_size

Method Variations

These methods are slight variations on existing DataFrame and Series methods.


DataFrame.renamec

 DataFrame.renamec (d, *args, **kwargs)

df = dummydf()
df.renamec({'col_1': 'col_a'}, 'col_2', 'bar')
   col_a  bar
0    100    a
1    101    b
2    102    c
3    103    d
4    104    e
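
The call above mixes a dictionary with extra positional arguments. One reading that matches the output is that the positional arguments are (old, new) name pairs merged into the dictionary. The sketch below implements that reading; `renamec_sketch` is a hypothetical function and the pairing rule is an assumption.

```python
import pandas as pd

def renamec_sketch(df, d, *args, **kwargs):
    mapping = dict(d)
    # Assumption: extra positional arguments alternate old name, new name.
    mapping.update(zip(args[::2], args[1::2]))
    return df.rename(columns=mapping, **kwargs)

df = pd.DataFrame({'col_1': range(100, 105), 'col_2': list('abcde')})
renamed = renamec_sketch(df, {'col_1': 'col_a'}, 'col_2', 'bar')
print(renamed)
```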

Series.notin

 Series.notin (values)
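
The name suggests the complement of Series.isin. Assuming that is the intent, the plain-pandas equivalent is a negated isin mask:

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4])

# True where the element is NOT among the given values.
mask = ~s.isin([2, 4])
kept = s[mask]
print(kept)
```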

Series.mapk

 Series.mapk (fun, **kwargs)
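
The signature suggests a map that forwards keyword arguments to the mapped function. The sketch below shows that interpretation, which is an assumption; `mapk_sketch` and `scale` are hypothetical names.

```python
import pandas as pd

def mapk_sketch(s, fun, **kwargs):
    # Assumption: kwargs are passed on to fun for every element.
    return s.map(lambda x: fun(x, **kwargs))

def scale(x, factor=1):
    return x * factor

scaled = mapk_sketch(pd.Series([1, 2, 3]), scale, factor=10)
print(scaled)
```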

DataFrame.sort

 DataFrame.sort (by, **kwargs)

temp = df.sample(df.len())
test_eq(temp.sort('col_1'), df)
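
The check above shuffles the rows and sorts them back. Assuming sort behaves like a shorthand for sort_values, the round trip looks like this in plain pandas; the frame is a stand-in for dummydf() and the fixed random_state is only there to make the sketch reproducible.

```python
import pandas as pd

df = pd.DataFrame({'col_1': range(100, 105), 'col_2': list('abcde')})

# Shuffle all rows, then restore the original order by sorting on col_1.
shuffled = df.sample(len(df), random_state=0)
restored = shuffled.sort_values('col_1')
print(restored)
```

sort_values keeps the original index labels attached to their rows, which is why the restored frame matches the original without resetting the index.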

Move columns to the front/back


DataFrame.c2back

 DataFrame.c2back (cols2back)

DataFrame.c2front

 DataFrame.c2front (cols2front)

df = dummydf()
df.c2back(['col_1'])
   col_2  col_1
0      a    100
1      b    101
2      c    102
3      d    103
4      e    104

df.c2back('col_1')
   col_2  col_1
0      a    100
1      b    101
2      c    102
3      d    103
4      e    104

df.c2front('col_2')
   col_2  col_1
0      a    100
1      b    101
2      c    102
3      d    103
4      e    104

df.c2front(['col_2'])
   col_2  col_1
0      a    100
1      b    101
2      c    102
3      d    103
4      e    104
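
The examples above accept either a single column name or a list of names. A minimal sketch of that behavior with plain column selection follows; the `*_sketch` functions and `_as_list` are hypothetical helpers, not the library's code.

```python
import pandas as pd

def _as_list(cols):
    # Accept a single column name or any iterable of names.
    return [cols] if isinstance(cols, str) else list(cols)

def c2front_sketch(df, cols2front):
    front = _as_list(cols2front)
    rest = [c for c in df.columns if c not in front]
    return df[front + rest]

def c2back_sketch(df, cols2back):
    back = _as_list(cols2back)
    rest = [c for c in df.columns if c not in back]
    return df[rest + back]

df = pd.DataFrame({'col_1': range(100, 105), 'col_2': list('abcde')})
back_cols = list(c2back_sketch(df, 'col_1').columns)
front_cols = list(c2front_sketch(df, ['col_2']).columns)
print(back_cols, front_cols)
```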

DataFrame.reorderc

 DataFrame.reorderc (to_front=[], to_back=[])

Reorder DataFrame columns.

df['col_3'] = df['col_1']
df.reorderc(['col_3'], ['col_1'])
   col_3  col_2  col_1
0    100      a    100
1    101      b    101
2    102      c    102
3    103      d    103
4    104      e    104
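
The reordering above can be sketched as three column lists joined together: the requested front columns, the untouched middle columns in their existing order, and the requested back columns. `reorderc_sketch` is a hypothetical function written to match the output shown, and it uses tuples as defaults rather than mutable lists.

```python
import pandas as pd

def reorderc_sketch(df, to_front=(), to_back=()):
    front, back = list(to_front), list(to_back)
    # Columns not mentioned keep their relative order in the middle.
    middle = [c for c in df.columns if c not in front and c not in back]
    return df[front + middle + back]

df = pd.DataFrame({'col_1': range(100, 105), 'col_2': list('abcde')})
df['col_3'] = df['col_1']
reordered = reorderc_sketch(df, ['col_3'], ['col_1'])
print(reordered)
```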