DataFrameTransformer

 DataFrameTransformer (transformer=None, input_cols=None,
                       output_cols=None, prev_step=None, append=False,
                       print_input_cols=False, print_output_cols=False,
                       print_out_df_cols=False)

Applies a transformer to a set of columns of a pandas DataFrame and returns the result as a DataFrame.

import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.preprocessing import OneHotEncoder
from sklearn.pipeline import Pipeline
X = pd.DataFrame(
    {'city': ['London', 'London', 'Paris', 'Sallisaw'],
     'title': ["His Last Bow", "How Watson Learned the Trick",
               "A Moveable Feast", "The Grapes of Wrath"],
     'expert_rating': [5, 3, 4, 5],
     'user_rating': [4, 5, 4, 3]})
X
       city                         title  expert_rating  user_rating
0    London                  His Last Bow              5            4
1    London  How Watson Learned the Trick              3            5
2     Paris              A Moveable Feast              4            4
3  Sallisaw           The Grapes of Wrath              5            3

OneHotEncoder expects a two-dimensional array as input, so we set input_cols to a list of columns. DataFrameTransformer uses the

enc_city = DataFrameTransformer(transformer=OneHotEncoder(dtype='int'),
                                input_cols=['city'],
                                append=True)
enc_city.fit_transform(X)
       city                         title  expert_rating  user_rating  city_London  city_Paris  city_Sallisaw
0    London                  His Last Bow              5            4            1           0              0
1    London  How Watson Learned the Trick              3            5            1           0              0
2     Paris              A Moveable Feast              4            4            0           1              0
3  Sallisaw           The Grapes of Wrath              5            3            0           0              1

CountVectorizer expects a one-dimensional array as input, so we set input_cols to a single string, which selects one column of the input DataFrame as a one-dimensional array.

enc_title = DataFrameTransformer(transformer=CountVectorizer(), input_cols='title', append=True)
enc_title.fit_transform(X)
       city                         title  expert_rating  user_rating  bow  feast  grapes  his  how  last  learned  moveable  of  the  trick  watson  wrath
0    London                  His Last Bow              5            4    1      0       0    1    0     1        0         0   0    0      0       0      0
1    London  How Watson Learned the Trick              3            5    0      0       0    0    1     0        1         0   0    1      1       1      0
2     Paris              A Moveable Feast              4            4    0      1       0    0    0     0        0         1   0    0      0       0      0
3  Sallisaw           The Grapes of Wrath              5            3    0      0       1    0    0     0        0         0   1    1      0       0      1

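Because both transformers use append=True, each step passes the original columns along with the new ones, so the two can be chained in one Pipeline.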

pipe = Pipeline([('enc_city', enc_city), ('enc_title', enc_title)])
pipe.fit_transform(X)
       city                         title  expert_rating  user_rating  city_London  city_Paris  city_Sallisaw  bow  feast  grapes  his  how  last  learned  moveable  of  the  trick  watson  wrath
0    London                  His Last Bow              5            4            1           0              0    1      0       0    1    0     1        0         0   0    0      0       0      0
1    London  How Watson Learned the Trick              3            5            1           0              0    0      0       0    0    1     0        1         0   0    1      1       1      0
2     Paris              A Moveable Feast              4            4            0           1              0    0      1       0    0    0     0        0         1   0    0      0       0      0
3  Sallisaw           The Grapes of Wrath              5            3            0           0              1    0      0       1    0    0     0        0         0   1    1      0       0      1