Creating Sub-Directories and Files from a Pandas DataFrame with Python

In this article, we’ll split a pandas DataFrame into groups and write each group to its own CSV file inside a per-group subdirectory. We’ll walk through the necessary steps and provide example code.

Introduction

Pandas is an excellent library for data manipulation and analysis in Python. However, our data often needs to be split across directories and files for further processing or storage, for example one folder per user with one file per session. In this article, we’ll show you how to do this with pandas and Python's built-in os module.

Requirements

Python 3.x
Pandas library (pip install pandas)
The example dataset is included in the code snippet below.

Creating Sub-Directories and Files

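We can accomplish this with the groupby method of a pandas DataFrame. Grouping by user and session_id yields one sub-DataFrame per (user, session) pair, so each pair can be written to its own CSV file inside that user's folder. Here’s how we can do it: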

import os
import pandas as pd

# Create a sample dataframe
data = {'user': [7, 7, 7, 7, 7, 7, 7, 11, 11, 11],
        'session_id': [15, 15, 15, 15, 31, 31, 31, 43, 43, 43],
        'logtime': ['2016-04-13 07:58:40','2016-04-13 07:58:41','2016-04-13 07:58:42',
            '2016-04-13 07:58:43','2016-04-01 20:29:37','2016-04-01 20:29:42',
            '2016-04-01 20:29:47','2016-03-30 06:21:59','2016-03-30 06:22:04',
            '2016-03-30 06:22:09'],
        'lat': [41.1872084,41.1870716,41.1869719,41.1868664,41.1471521,
                41.1472466,41.1473038,41.2372125,41.2371444,41.2369725],
        'lon': [-8.6038931,-8.6037318,-8.6036908,-8.6036423,-8.5878757,
                -8.5874314,-8.586632,-8.6720773,-8.6721269,-8.6718833]}

d = pd.DataFrame(data)

# Define the base folder (a relative path, so no write access to the filesystem root is needed; change it if needed)
base_folder = 'Data'

# Create the base folder if it does not exist yet
os.makedirs(base_folder, exist_ok=True)

# Grouping by two columns yields a (user, session_id) key tuple and the matching rows
for (user_id, sess_id), group in d.groupby(['user', 'session_id']):
    # Create a subdirectory for each user (exist_ok=True: no error if it already exists)
    user_folder = os.path.join(base_folder, str(user_id))
    os.makedirs(user_folder, exist_ok=True)

    # Write the logtime, lat, and lon columns of this session to a CSV file within that folder
    filename = os.path.join(user_folder, f'file_{sess_id}.csv')
    group.drop(['user', 'session_id'], axis=1).to_csv(filename, index=False)
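The same loop can also be wrapped in a reusable function. The sketch below uses pathlib instead of os, so paths are joined with the / operator and the base folder is created along with each user folder. The function name write_groups and its parameters are hypothetical, and the sample frame is a small subset of the data above; a temporary directory keeps the demo from leaving files behind.

```python
import tempfile
from pathlib import Path

import pandas as pd


def write_groups(df, base_folder, dir_key, file_key, prefix="file"):
    """Hypothetical helper: write one CSV per (dir_key, file_key) group
    to base_folder/<dir_key value>/<prefix>_<file_key value>.csv.

    Returns the list of paths written, in groupby (sorted key) order.
    """
    base = Path(base_folder)
    written = []
    for (dir_value, file_value), group in df.groupby([dir_key, file_key]):
        folder = base / str(dir_value)
        # parents=True also creates base_folder itself if it is missing
        folder.mkdir(parents=True, exist_ok=True)
        path = folder / f"{prefix}_{file_value}.csv"
        # Drop the grouping columns; their values are already encoded in the path
        group.drop(columns=[dir_key, file_key]).to_csv(path, index=False)
        written.append(path)
    return written


# Small illustrative frame (a subset of the article's sample data)
df = pd.DataFrame({
    "user": [7, 7, 11],
    "session_id": [15, 31, 43],
    "lat": [41.1872084, 41.1471521, 41.2372125],
    "lon": [-8.6038931, -8.5878757, -8.6720773],
})

with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp) / "Data"
    paths = write_groups(df, base, "user", "session_id")
    # Record the layout relative to the base folder before the temp dir is deleted
    layout = sorted(p.relative_to(base).as_posix() for p in paths)
    columns = list(pd.read_csv(paths[0]).columns)

print(layout)
print(columns)
```

Passing the folder and file columns as parameters lets the same function handle other groupings without editing the loop body.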

Additional Considerations

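If you need more control over the naming of your files or subdirectories, consider grouping in two stages with nested groupby loops: first by user, then by session within each user. The following example numbers each user's sessions sequentially (file_1.csv, file_2.csv, ...) instead of reusing the raw session IDs: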

for user_id, user_data in d.groupby('user'):
    # Create a subdirectory for each user
    user_folder = os.path.join(base_folder, str(user_id))
    os.makedirs(user_folder, exist_ok=True)

    # enumerate() numbers this user's sessions 1, 2, ...; each groupby item is a (sess_id, rows) pair
    for file_id, (sess_id, group) in enumerate(user_data.groupby('session_id'), start=1):
        # Write the logtime, lat, and lon columns to a CSV file within that folder
        filename = os.path.join(user_folder, f'file_{file_id}.csv')
        group.drop(['user', 'session_id'], axis=1).to_csv(filename, index=False)
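Names can also be derived from the data itself. The sketch below shows one hypothetical naming scheme, not part of the approach above: each user folder gets a user_ prefix, and each session file is named after the session's earliest logtime. This requires converting the logtime strings with pd.to_datetime first; the four-row frame is a subset of the sample data.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Two sessions for one user, with logtime stored as strings as in the sample data
df = pd.DataFrame({
    "user": [7, 7, 7, 7],
    "session_id": [15, 15, 31, 31],
    "logtime": ["2016-04-13 07:58:40", "2016-04-13 07:58:41",
                "2016-04-01 20:29:37", "2016-04-01 20:29:42"],
    "lat": [41.1872084, 41.1870716, 41.1471521, 41.1472466],
    "lon": [-8.6038931, -8.6037318, -8.5878757, -8.5874314],
})
df["logtime"] = pd.to_datetime(df["logtime"])

names = []
with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp)
    for user_id, user_data in df.groupby("user"):
        # Hypothetical folder naming: user_<id>
        user_folder = base / f"user_{user_id}"
        user_folder.mkdir(parents=True, exist_ok=True)
        for _, group in user_data.groupby("session_id"):
            # Hypothetical file naming: session start time, e.g. 20160413T075840
            start = group["logtime"].min().strftime("%Y%m%dT%H%M%S")
            path = user_folder / f"session_{start}.csv"
            group.drop(columns=["user", "session_id"]).to_csv(path, index=False)
            names.append(path.relative_to(base).as_posix())

print(names)
```

Timestamp-based names sort chronologically in a file browser, but two sessions that start in the same second would overwrite each other, so a unique key such as session_id is the safer default.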

Conclusion

In this article, we demonstrated how to create subdirectories and files from a pandas DataFrame using the groupby method. By leveraging the os module, you can automate the creation of directories and CSV files for further data processing or storage.

The same pattern applies to any pair of grouping columns: pick the column that defines the folders and the column that defines the files, and pass them to groupby.
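For later processing, the folder tree can be read back into a single DataFrame. The sketch below assumes the layout produced by the first loop (Data/<user>/file_<session_id>.csv) and recovers user and session_id from the folder and file names; it does not apply to the enumerate variant, where the file number is not the session ID. The function name read_tree is hypothetical, and the demo writes a tiny two-file tree into a temporary directory.

```python
import tempfile
from pathlib import Path

import pandas as pd


def read_tree(base_folder):
    """Hypothetical helper: rebuild one DataFrame from
    base_folder/<user>/file_<session_id>.csv files."""
    frames = []
    for path in sorted(Path(base_folder).glob("*/file_*.csv")):
        part = pd.read_csv(path)
        # Recover the grouping keys from the path: folder name and file stem
        part["user"] = int(path.parent.name)
        part["session_id"] = int(path.stem.split("_", 1)[1])
        frames.append(part)
    # pd.concat raises on an empty list, so return an empty frame instead
    if not frames:
        return pd.DataFrame()
    return pd.concat(frames, ignore_index=True)


with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp) / "Data"
    # Write a tiny tree by hand: Data/7/file_15.csv and Data/11/file_43.csv
    for user, sess, lat in [(7, 15, 41.1872084), (11, 43, 41.2372125)]:
        folder = base / str(user)
        folder.mkdir(parents=True, exist_ok=True)
        pd.DataFrame({"lat": [lat]}).to_csv(folder / f"file_{sess}.csv", index=False)
    combined = read_tree(base)

print(combined.sort_values("user").to_string(index=False))
```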

Last modified on 2025-04-03