Reference

anti_join(right, by)

Returns a lambda function that selects rows from a pandas DataFrame that do not match any rows in another DataFrame.

This function allows the user to specify the right DataFrame and the columns to compare. The left DataFrame is passed as input to the returned lambda function.

Parameters:

Name Type Description Default
right pandas.DataFrame

The DataFrame to compare with the left DataFrame.

required
by str or list

The name(s) of the column(s) to use for the comparison. Columns must have the same name in both the left and right DataFrames.

required

Returns:

Name Type Description
function

A lambda function that takes a left DataFrame as input and returns a new DataFrame containing the rows from the left DataFrame that do not match any rows in the right DataFrame based on the specified columns.
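
The behaviour described above can be reproduced with plain pandas. The following is a minimal sketch, not the library's implementation: the name `anti_join_sketch` is hypothetical, and the use of `merge` with `indicator=True` is an assumption about one reasonable way to find unmatched rows.

```python
import pandas as pd


def anti_join_sketch(right, by):
    """Hypothetical stand-in for anti_join, built on plain pandas."""
    keys = [by] if isinstance(by, str) else list(by)

    def apply(left):
        # Left-merge against the distinct key combinations of `right`;
        # the indicator column records whether each left row found a match.
        marked = left.merge(
            right[keys].drop_duplicates(), on=keys, how="left", indicator=True
        )
        # Keep only unmatched rows, restricted to the original left columns.
        return marked.loc[marked["_merge"] == "left_only", list(left.columns)]

    return apply


left = pd.DataFrame({"id": [1, 2, 3], "v": ["a", "b", "c"]})
right = pd.DataFrame({"id": [2]})
result = anti_join_sketch(right, "id")(left)
```

Here `result` holds the rows of `left` whose `id` does not appear in `right`.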

arrange(*columns)

Returns a lambda function that can be used to rearrange the rows of a pandas DataFrame.

Parameters:

Name Type Description Default
*columns str

The names of the columns to use for sorting the rows, along with the desired sort order (ascending or descending). Columns should be specified in the format "column_name [desc]".

()

Returns:

Name Type Description
function

A lambda function that takes a DataFrame as input and returns a new DataFrame with the rows sorted according to the specified columns.
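
To illustrate the "column_name [desc]" format, here is a minimal sketch of how such strings could be parsed and handed to `DataFrame.sort_values`. The name `arrange_sketch` is hypothetical, and the whitespace-based parsing is an assumption (it would not handle column names that contain spaces).

```python
import pandas as pd


def arrange_sketch(*columns):
    """Hypothetical stand-in for arrange, built on sort_values."""
    names, ascending = [], []
    for spec in columns:
        parts = spec.split()
        names.append(parts[0])
        # A trailing "desc" token requests descending order (assumed syntax).
        ascending.append(not (len(parts) > 1 and parts[1].lower() == "desc"))
    return lambda df: df.sort_values(by=names, ascending=ascending)


df = pd.DataFrame({"g": ["b", "a", "a"], "x": [1, 2, 3]})
out = arrange_sketch("g", "x desc")(df)
```

The rows of `out` are ordered by `g` ascending, with ties broken by `x` descending.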

distinct(*columns, keep_all=False)

Returns a lambda function that can be used to select the unique rows of a pandas DataFrame based on the specified columns.

Parameters:

Name Type Description Default
*columns str

The names of the columns to use for selecting the unique rows.

()
keep_all bool

Whether or not to keep all columns in the resulting DataFrame. Defaults to False.

False

Returns:

Name Type Description
function

A lambda function that takes a DataFrame as input and returns a new DataFrame containing only the unique rows based on the specified columns.
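
The effect of `keep_all` can be sketched with `DataFrame.drop_duplicates`. This is an illustration under assumptions, not the library's code: the name `distinct_sketch` is hypothetical, and it assumes that `keep_all=False` returns only the listed columns while `keep_all=True` returns the first full row for each unique combination.

```python
import pandas as pd


def distinct_sketch(*columns, keep_all=False):
    """Hypothetical stand-in for distinct, built on drop_duplicates."""
    cols = list(columns)

    def apply(df):
        if not cols:
            # No columns given: deduplicate on every column.
            return df.drop_duplicates()
        deduped = df.drop_duplicates(subset=cols)
        # Assumed semantics: drop the other columns unless keep_all is set.
        return deduped if keep_all else deduped[cols]

    return apply


df = pd.DataFrame({"k": [1, 1, 2], "v": [10, 20, 30]})
keys_only = distinct_sketch("k")(df)
full_rows = distinct_sketch("k", keep_all=True)(df)
```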

filter(*criteria)

Filters a Pandas DataFrame based on the specified criteria.

Parameters:

Name Type Description Default
*criteria

One or more lambda functions, passed as positional arguments, that specify the filtering criteria.

()

Returns:

Type Description

A filtered Pandas DataFrame.
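
The entry above does not say how several criteria are combined or how the DataFrame is supplied. The sketch below assumes that each criterion receives the DataFrame and returns a boolean Series, that the criteria are combined with a logical AND, and that the DataFrame is passed to a returned function as with the other helpers. The name `filter_sketch` is hypothetical.

```python
import pandas as pd


def filter_sketch(*criteria):
    """Hypothetical stand-in for filter; AND-combination is an assumption."""

    def apply(df):
        # Start with every row selected, then narrow with each criterion.
        mask = pd.Series(True, index=df.index)
        for criterion in criteria:
            mask &= criterion(df)
        return df[mask]

    return apply


df = pd.DataFrame({"x": [1, 2, 3, 4], "y": ["a", "b", "a", "b"]})
kept = filter_sketch(lambda d: d["x"] > 1, lambda d: d["y"] == "b")(df)
```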

filter_index(custom=None, mode='custom')

Returns a lambda function that filters a pandas DataFrame based on a custom value or the max/min value of the index.

Parameters:

Name Type Description Default
custom int

The custom value to filter the index. Required if mode is 'custom'.

None
mode str

The mode of filtering. Can be 'custom', 'max', or 'min'. Defaults to 'custom'.

'custom'

Returns:

Name Type Description
function

A lambda function that takes a DataFrame as input and returns a new DataFrame filtered based on the specified value or max/min value of the index.
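
One possible reading of the three modes is shown below. This sketch assumes that "filtering" means keeping the rows whose index label equals the target value; the entry does not state this explicitly. The name `filter_index_sketch` is hypothetical.

```python
import pandas as pd


def filter_index_sketch(custom=None, mode="custom"):
    """Hypothetical stand-in for filter_index; equality match is assumed."""

    def apply(df):
        if mode == "custom":
            if custom is None:
                raise ValueError("custom is required when mode is 'custom'")
            target = custom
        elif mode == "max":
            target = df.index.max()
        elif mode == "min":
            target = df.index.min()
        else:
            raise ValueError(f"unknown mode: {mode!r}")
        # Keep the rows whose index label equals the target value.
        return df[df.index == target]

    return apply


df = pd.DataFrame({"x": ["a", "b", "c"]}, index=[10, 20, 30])
latest = filter_index_sketch(mode="max")(df)
chosen = filter_index_sketch(custom=20)(df)
```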

group_by(*columns)

Returns a lambda function that can be used to group the rows of a pandas DataFrame based on the specified columns.

Parameters:

Name Type Description Default
*columns str

The names of the columns to group the rows by.

()

Returns:

Name Type Description
function

A lambda function that takes a DataFrame as input and returns a new DataFrame with the rows grouped according to the specified columns.

head(n=5)

Returns a lambda function that retrieves the first n rows of a pandas DataFrame.

Parameters:

Name Type Description Default
n int

The number of rows to return. Defaults to 5.

5

Returns:

Name Type Description
function

A lambda function that takes a DataFrame as input and returns a new DataFrame containing the first n rows.

left_join(right, by)

Returns a lambda function that performs a left join of two pandas DataFrames.

This function allows the user to specify the right DataFrame and the columns to join on. The left DataFrame is passed as input to the returned lambda function.

Parameters:

Name Type Description Default
right pandas.DataFrame

The DataFrame to join with the left DataFrame.

required
by str or list

The name(s) of the column(s) to use for the merge. Columns must have the same name in both the left and right DataFrames.

required

Returns:

Name Type Description
function

A lambda function that takes a left DataFrame as input and returns a new DataFrame containing the left join of the left and right DataFrames.
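
In plain pandas terms, a left join on shared column names corresponds to `DataFrame.merge` with `how="left"`. The sketch below illustrates that equivalence; the name `left_join_sketch` is hypothetical and the body is an assumption about the behaviour, not the library's code.

```python
import pandas as pd


def left_join_sketch(right, by):
    """Hypothetical stand-in for left_join, built on DataFrame.merge."""
    # Every left row is kept; unmatched rows get NaN in the right columns.
    return lambda left: left.merge(right, on=by, how="left")


left = pd.DataFrame({"id": [1, 2], "v": ["a", "b"]})
right = pd.DataFrame({"id": [2], "w": ["x"]})
joined = left_join_sketch(right, "id")(left)
```

The row with `id` 1 has no partner in `right`, so its `w` value is missing in `joined`.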

mutate(**transformations)

Applies transformations to the columns of a Pandas DataFrame.

The transformation functions operate on the entire DataFrame, which is passed as a single argument to each function. This makes mutate suitable for transformations that use pd.Series methods.

If you need row-level processing, please refer to the mutate_row function.

Parameters:

Name Type Description Default
**transformations

Keyword arguments that map each new column name to a transformation function.

{}

Returns:

Type Description

A transformed Pandas DataFrame.

mutate_row(**transformations)

Applies transformations to columns of a Pandas DataFrame at a row-level.

This means that the transformation functions are applied to each row of the DataFrame, allowing for row-level processing.

If you need transformations that operate on the entire DataFrame, please refer to the mutate function.

Parameters:

Name Type Description Default
**transformations

Keyword arguments that map each new column name to a transformation function.

{}

Returns:

Type Description

A transformed Pandas DataFrame.
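
The difference between DataFrame-level and row-level transformations can be sketched as follows. Both helper names (`mutate_sketch`, `mutate_row_sketch`) are hypothetical, and returning a function that takes the DataFrame is an assumption made to match the other helpers. The first passes the whole DataFrame to each function via `DataFrame.assign`; the second calls each function once per row via `DataFrame.apply(axis=1)`.

```python
import pandas as pd


def mutate_sketch(**transformations):
    """Hypothetical: each function receives the whole DataFrame."""
    return lambda df: df.assign(**transformations)


def mutate_row_sketch(**transformations):
    """Hypothetical: each function receives one row at a time."""

    def apply(df):
        out = df.copy()
        for name, func in transformations.items():
            # axis=1 passes each row to func as a Series.
            out[name] = out.apply(func, axis=1)
        return out

    return apply


df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})
# Column-wise arithmetic works on whole Series at once.
summed = mutate_sketch(total=lambda d: d["a"] + d["b"])(df)
# Row-wise logic sees scalar values for a single row.
labelled = mutate_row_sketch(label=lambda row: f"{row['a']}-{row['b']}")(df)
```

The DataFrame-level form is usually faster because it is vectorized; the row-level form is convenient for logic that is awkward to express on whole columns.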

pipeline(data, *funcs)

Applies a sequence of functions to the input data, in the order they are provided.

Parameters:

Name Type Description Default
data any

The input data to be processed.

required
*funcs function

A variable number of functions to be applied to the input data.

()

Returns:

Name Type Description
data any

The result of applying all of the functions in the sequence to the input data.
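
Applying functions in order is a left fold over the function list. A minimal sketch with `functools.reduce` follows; the name `pipeline_sketch` is hypothetical and this is not necessarily how the library implements it.

```python
from functools import reduce


def pipeline_sketch(data, *funcs):
    """Hypothetical stand-in for pipeline: feed data through funcs in order."""
    return reduce(lambda acc, func: func(acc), funcs, data)


# Order matters: (3 + 1) * 2, not 3 * 2 + 1.
value = pipeline_sketch(3, lambda x: x + 1, lambda x: x * 2)
# With no functions, the input is returned unchanged.
untouched = pipeline_sketch("raw")
```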

rename(*mappings)

Returns a lambda function that can be used to rename the columns of a pandas DataFrame.

Parameters:

Name Type Description Default
*mappings str

Strings representing mappings of old column names to new column names. Each string should be in the format 'old_name = new_name'.

()

Returns:

Name Type Description
function

A lambda function that takes a DataFrame as input and returns a new DataFrame with the specified column name mappings applied.
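
The 'old_name = new_name' format can be turned into the dictionary that `DataFrame.rename` expects. The sketch below shows one way to do so; the name `rename_sketch` is hypothetical and splitting on the first `=` is an assumption about the parsing rules.

```python
import pandas as pd


def rename_sketch(*mappings):
    """Hypothetical stand-in for rename, built on DataFrame.rename."""
    pairs = {}
    for text in mappings:
        old, sep, new = text.partition("=")
        if not sep:
            raise ValueError(f"expected 'old_name = new_name', got {text!r}")
        # Surrounding whitespace is not part of either column name.
        pairs[old.strip()] = new.strip()
    return lambda df: df.rename(columns=pairs)


df = pd.DataFrame({"a": [1], "b": [2]})
renamed = rename_sketch("a = alpha", "b=beta")(df)
```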

select(*columns)

Returns a lambda function that can be used to select the specified columns from a pandas DataFrame.

Parameters:

Name Type Description Default
*columns str

The names of the columns to select.

()

Returns:

Name Type Description
function

A lambda function that takes a DataFrame as input and returns a new DataFrame containing only the selected columns.

squeeze()

Returns a lambda function that squeezes a pandas DataFrame into a pandas Series if it has only one column or one row.

Returns:

Name Type Description
function

A lambda function that takes a DataFrame as input and returns a pandas Series if the DataFrame has only one column or one row, otherwise returns the original DataFrame.
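
The description matches the behaviour of pandas' own `DataFrame.squeeze`, so a sketch can simply delegate to it. The name `squeeze_sketch` is hypothetical. One caveat of this assumed implementation: pandas returns a scalar, not a Series, when the DataFrame has exactly one row and one column.

```python
import pandas as pd


def squeeze_sketch():
    """Hypothetical stand-in for squeeze, delegating to DataFrame.squeeze."""
    return lambda df: df.squeeze()


one_column = squeeze_sketch()(pd.DataFrame({"x": [1, 2, 3]}))
one_row = squeeze_sketch()(pd.DataFrame({"x": [1], "y": [2]}))
unchanged = squeeze_sketch()(pd.DataFrame({"x": [1, 2], "y": [3, 4]}))
```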

summarize(*aggregations)

Returns a lambda function that can be used to apply aggregations to the groups of a pandas DataFrame.

Parameters:

Name Type Description Default
*aggregations str

A string in the format "column = aggregation_function()" specifying the column to aggregate and the aggregation function to use.

()

Returns:

Name Type Description
function

A lambda function that takes a DataFrame as input and returns a new DataFrame with the specified aggregations applied to each group.
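
The "column = aggregation_function()" format can be mapped onto pandas' `groupby(...).agg(...)`. The sketch below is one interpretation: the name `summarize_sketch` is hypothetical, the regular expression is an assumed parsing rule, and it assumes the input has already been grouped (for example by group_by) so that `agg` runs per group.

```python
import re

import pandas as pd


def summarize_sketch(*aggregations):
    """Hypothetical stand-in for summarize, built on GroupBy.agg."""
    spec = {}
    for text in aggregations:
        match = re.fullmatch(r"\s*(\w+)\s*=\s*(\w+)\(\)\s*", text)
        if match is None:
            raise ValueError(f"expected 'column = func()', got {text!r}")
        # Map the column name to the name of the aggregation to apply.
        spec[match.group(1)] = match.group(2)
    # reset_index turns the group keys back into ordinary columns.
    return lambda grouped: grouped.agg(spec).reset_index()


df = pd.DataFrame({"g": ["a", "a", "b"], "x": [1, 3, 5]})
means = summarize_sketch("x = mean()")(df.groupby("g"))
```

Here `means` has one row per value of `g`, with `x` replaced by the group mean.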

tail(n=6)

Returns a lambda function that retrieves the last n rows of a pandas DataFrame.

Parameters:

Name Type Description Default
n int

The number of rows to return. Defaults to 6.

6

Returns:

Name Type Description
function

A lambda function that takes a DataFrame as input and returns a new DataFrame containing the last n rows.

to_csv(filename, index=False, **kwargs)

Returns a lambda function that saves a pandas DataFrame to a CSV file.

Parameters:

Name Type Description Default
filename str

The name of the file to save the DataFrame to.

required
index bool

Whether or not to write row names (index). Defaults to False.

False
**kwargs

Additional keyword arguments to be passed to pandas.DataFrame.to_csv() function.

{}

Returns:

Name Type Description
function

A lambda function that takes a DataFrame as input and saves it to a CSV file.