Process Multiple Files in Python with One Script
In this article, we will explore a way to process multiple CSV files using Python and combine the results from all of them into a single output CSV file for each summary we compute.
Introduction
Processing large amounts of data can be challenging, especially when the data is spread across many files. In this article, we will use Python’s pandas library to read each CSV file, compute a summary of selected rows, and write the collected results out as CSV.
Step 1: Preparing Your Environment
Before you start processing your files, make sure the necessary libraries are available in your environment. This task uses pandas together with the csv and os modules. The csv and os modules ship with Python’s standard library, so only pandas needs to be installed.
To install pandas, use pip:
{< highlight bash >}
pip install pandas
{< /highlight >}
Understanding Pandas
Pandas is a powerful library used for data manipulation and analysis. It provides efficient data structures and operations for processing large datasets.
The pd.read_csv() function is used to read a CSV file into a DataFrame, which is a 2-dimensional labeled data structure with columns of potentially different types. The delimiter parameter specifies the delimiter used in the CSV file, and the skiprows parameter skips a specified number of rows at the beginning of the file.
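As a small illustration of those two parameters, the sketch below reads a semicolon-delimited CSV from an in-memory string and skips one preamble line. The sample data and the names `raw` and `df` are made up for this example.

```python
import io

import pandas as pd

# Hypothetical sample: one preamble line, then a semicolon-delimited table.
raw = "exported by logger v1\ntime;value\n0;1.5\n1;2.5\n2;3.5\n"

# delimiter=";" sets the field separator; skiprows=1 drops the preamble line.
df = pd.read_csv(io.StringIO(raw), delimiter=";", skiprows=1)

print(df.shape)          # rows and columns parsed
print(list(df.columns))  # column labels taken from the header line
```

With a real file, the `io.StringIO(raw)` argument would simply be the file path.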
Step 1.5: Processing Multiple Files
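Once you have your files ready, you can use the os.walk() function to traverse the directory tree, including any subdirectories, and find all CSV files. For each directory it visits, os.walk() yields the directory path, the subdirectory names, and the bare file names, so each file name has to be joined with its directory path before it can be opened.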
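The file discovery step on its own might look like the sketch below. The helper name `find_csv_files` is hypothetical, and the temporary directory only exists to make the example runnable anywhere.

```python
import os
import tempfile


def find_csv_files(top):
    """Return the full paths of all .csv files under `top`, sorted."""
    paths = []
    for root, _dirs, files in os.walk(top):
        for name in files:
            if name.lower().endswith(".csv"):
                # os.walk yields bare file names, so join them with `root`.
                paths.append(os.path.join(root, name))
    return sorted(paths)


# Demonstrate on a throwaway directory tree.
with tempfile.TemporaryDirectory() as top:
    os.makedirs(os.path.join(top, "sub"))
    for rel in ("a.csv", "notes.txt", os.path.join("sub", "b.csv")):
        with open(os.path.join(top, rel), "w") as fh:
            fh.write("x\n1\n")
    found = [os.path.relpath(p, top) for p in find_csv_files(top)]

print(found)
```

Collecting the paths first keeps the discovery logic separate from the processing logic, which makes each part easier to test.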
Here’s an example code snippet that demonstrates how to process multiple files:
{< highlight python >}
import csv
import os

import pandas as pd

input_dir = 'C:/Users/bla/bla/bla'
# Skip the output files so a re-run does not read them back as input
output_names = {'list10.csv', 'list15.csv'}

# Initialize lists to store the results
list10 = []
list15 = []

# Process each file
for root, dirs, files in os.walk(input_dir):
    for file in files:
        if file.endswith('.csv') and file not in output_names:
            # os.walk() yields bare file names, so build the full path
            df = pd.read_csv(os.path.join(root, file))
            # Calculate the per-column mean over two row ranges
            # (rows 61-239 and rows 241-419; slice ends are exclusive)
            datamean10 = df[61:240].mean(numeric_only=True)
            datamean15 = df[241:420].mean(numeric_only=True)
            # Append the results to the lists, replacing negative means with 0
            list10.append(datamean10.clip(lower=0))
            list15.append(datamean15.clip(lower=0))

# Write the results to two separate CSV files (one row per input file)
csvfile = "C:/Users/bla/bla/bla/list10.csv"
with open(csvfile, 'w', newline='') as output:
    writer = csv.writer(output, lineterminator='\n')
    writer.writerows(list10)

csvfile = "C:/Users/bla/bla/bla/list15.csv"
with open(csvfile, 'w', newline='') as output:
    writer = csv.writer(output, lineterminator='\n')
    writer.writerows(list15)
{< /highlight >}
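Because `df[61:240]` is easy to misread as a column selection, here is a small sketch on synthetic data showing that an integer slice selects rows by position and that `.mean()` then averages each column over those rows. The DataFrame and its column names are invented for illustration.

```python
import pandas as pd

# Synthetic data: 10 rows, two numeric columns.
df = pd.DataFrame({"a": range(10), "b": range(10, 20)})

# An integer slice selects rows by position; the end is exclusive,
# so this covers rows 2, 3 and 4.
window = df[2:5]

# mean() returns one value per column, as a Series indexed by column name.
means = window.mean()

# clip(lower=0) replaces any negative mean with 0.
clipped = means.clip(lower=0)

print(len(window))
print(clipped.to_dict())
```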
Step 2: Appending Results to an Existing List
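The results accumulate across files because list10 and list15 are created once, before the loop, and each file’s means are appended to them; nothing is reset between files. The version below performs the same processing but passes each output path directly to open() inside the with statement instead of storing it in a csvfile variable first.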
Here’s an updated code snippet that demonstrates how to do this:
{< highlight python >}
import csv
import os

import pandas as pd

input_dir = 'C:/Users/bla/bla/bla'
# Skip the output files so a re-run does not read them back as input
output_names = {'list10.csv', 'list15.csv'}

# Initialize lists to store the results
list10 = []
list15 = []

# Process each file
for root, dirs, files in os.walk(input_dir):
    for file in files:
        if file.endswith('.csv') and file not in output_names:
            # os.walk() yields bare file names, so build the full path
            df = pd.read_csv(os.path.join(root, file))
            # Calculate the per-column mean over two row ranges
            datamean10 = df[61:240].mean(numeric_only=True)
            datamean15 = df[241:420].mean(numeric_only=True)
            # Append the results to the lists, replacing negative means with 0
            list10.append(datamean10.clip(lower=0))
            list15.append(datamean15.clip(lower=0))

# Write the results to two separate CSV files (one row per input file)
with open("C:/Users/bla/bla/bla/list10.csv", 'w', newline='') as output:
    writer = csv.writer(output, lineterminator='\n')
    writer.writerows(list10)
with open("C:/Users/bla/bla/bla/list15.csv", 'w', newline='') as output:
    writer = csv.writer(output, lineterminator='\n')
    writer.writerows(list15)
{< /highlight >}
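The code above writes two output files. If a single output file is preferred, one option is to collect one row per input file and row range in a DataFrame and write it with `to_csv`. The sketch below assumes that approach; the function name `summarize`, the `source` and `window` columns, and the tiny row ranges are illustrative and not part of the original code.

```python
import io

import pandas as pd


def summarize(df, source, windows):
    """Return one dict per row window with the clipped column means."""
    rows = []
    for label, (start, stop) in windows.items():
        means = df.iloc[start:stop].mean(numeric_only=True).clip(lower=0)
        rows.append({"source": source, "window": label, **means.to_dict()})
    return rows


# Synthetic stand-ins for two input files.
frames = {
    "file1.csv": pd.DataFrame({"x": [1.0, 2.0, 3.0, 4.0]}),
    "file2.csv": pd.DataFrame({"x": [-5.0, -1.0, 2.0, 6.0]}),
}
windows = {"first": (0, 2), "second": (2, 4)}  # hypothetical row ranges

records = []
for name, frame in frames.items():
    records.extend(summarize(frame, name, windows))

summary = pd.DataFrame(records)

# Write everything to one CSV; a StringIO stands in for a real file path.
buffer = io.StringIO()
summary.to_csv(buffer, index=False)
print(buffer.getvalue())
```

Keeping the source file name and the row range as columns means the single file still records where each row of means came from.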
Step 3: Optimizing Performance
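If you are processing a large number of files, reading them one at a time can become slow. In that case, consider using the dask library (installed separately, for example with pip install dask) to run the per-file work in parallel. Whether this helps depends on the file sizes and the machine, so it is worth measuring before switching.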
Here’s an example code snippet that demonstrates how to use dask:
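{< highlight python >}
import csv
import os

import dask
import pandas as pd

input_dir = 'C:/Users/bla/bla/bla'
output_names = {'list10.csv', 'list15.csv'}  # skip earlier output files

# Dask DataFrames do not support positional row slicing such as df[61:240],
# so this version reads each file with pandas and uses dask.delayed to schedule the work.
@dask.delayed
def process(path):
    df = pd.read_csv(path)
    # Calculate the per-column mean over two row ranges
    datamean10 = df[61:240].mean(numeric_only=True)
    datamean15 = df[241:420].mean(numeric_only=True)
    return datamean10.clip(lower=0), datamean15.clip(lower=0)

# Build one lazy task per file (os.listdir() does not descend into subdirectories)
tasks = [
    process(os.path.join(input_dir, file))
    for file in os.listdir(input_dir)
    if file.endswith('.csv') and file not in output_names
]

# Run all tasks and split the results into the two lists
results = dask.compute(*tasks)
list10 = [r[0] for r in results]
list15 = [r[1] for r in results]

# Write the results to two separate CSV files
with open("C:/Users/bla/bla/bla/list10.csv", 'w', newline='') as output:
    writer = csv.writer(output, lineterminator='\n')
    writer.writerows(list10)
with open("C:/Users/bla/bla/bla/list15.csv", 'w', newline='') as output:
    writer = csv.writer(output, lineterminator='\n')
    writer.writerows(list15)
{< /highlight >}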
Conclusion
In this article, we demonstrated how to process multiple CSV files using Python and write the collected results to CSV. We covered reading the files with pandas, accumulating the per-file results in lists, and using Dask when there are many files to process.
By following these steps, you can process many files with one script and save the summarized results for analysis or further processing.
Last modified on 2025-03-28