How to read all the files inside a folder into a single dataframe

Have a bunch of csv files and don’t want to import them all individually? Well, I can’t blame you, and you shouldn’t anyways.

The function below reads all the files within a single folder and returns it as one Pandas dataframe. Of course, for it to work the files need to be of the same data structure. In other words, they all got to have the same column names in the same order.

# we import these first
# put them in the beginning of your file
import os
import pandas as pd

def read_data(folder):
    '''
    This function reads each the raw data files as dataframes and
    combines them into a single data frame.
    '''
    for i, file_name in enumerate(os.listdir(input_folder)):
        try:
            # df = pd.read_excel(os.path.join(input_folder, file_name)) # excel
            # df = pd.read_csv(os.path.join(input_folder, file_name), sep='\t') # tsv file
            df = pd.read_csv(os.path.join(input_folder, file_name)) # vanilla csv
            df['file_name'] = file_name
            if i == 0:
                final_df = df.copy()
            else:
                final_df = final_df.append(df)

        except Exception as e:
            print(f"Cannot read file: {file_name}")
            print(str(e))
    return final_df

# provide the folder path where your files reside
folder = 'G:/path/to/data/parent_folder_name'

# call the function
df = read_data(folder)