Deriving a DataFrame from an Existing One: A Case Study on Data Transformation and Visualization

Deriving a DataFrame from an Existing One: A Case Study on Data Transformation and Visualization

In this article, we will explore the process of transforming a pandas DataFrame using various mathematical functions and then visualizing the results in a meaningful way. We will use Python with its popular libraries pandas, numpy, and matplotlib to achieve this.

Introduction

Pandas is a powerful library used for data manipulation and analysis in Python. It provides an efficient way to handle structured data, including tabular data such as spreadsheets and SQL tables. In our case study, we have a DataFrame containing company data with columns ‘cnpj’, ‘valor_cota’, ’ln_ativo’, ’ln_IBX’, ‘CDI’, and ’tracking’. We need to perform several calculations on this data, such as calculating the return of CDI (Cotacao Diaria Imobiliaria) for each company over a period of 5 years, the annualized return of the stock value, the volatility of the asset, the alpha and beta values, the Sharpe ratio, and the tracking error.

Step 1: Data Preparation

To start with, we need to prepare our data by importing necessary libraries, defining our DataFrame, and performing some basic data cleaning.

import numpy as np
import pandas as pd

# Define the company data
data = {'cnpj': ['cat1', 'cat1', 'cat1', 'cat2', 'cat2', 'cat2', 'cat3', 'cat3', 'cat3'],
        'valor_cota': [1114.7521, 1135.3557, 1149.3381, 500, 501, 510, 10000, 10050, 10100 ],
        'ln_ativo': [0.02715207, 0.01831395, 0.01224022, 0.00200200, 0.00199800, 0.01780462, -0.00697561, 0.00498754, 0.00496279 ],
        'ln_IBX': [-0.00365884, 0.00369080, 0.00795953, -0.00365884, 0.00369080, 0.00795953, -0.00365884, 0.00369080, 0.00795953 ],
        'CDI' : [23.89387945, 23.90473731, 23.9156001, 23.89387945, 23.90473731, 23.9156001, 23.89387945, 23.90473731, 23.9156001 ],
        'tracking' : [0.030810905, 0.014623144, 0.004280688, 0.005660841, -0.0016928, 0.009845095, -0.00331678, 0.001296739, -0.00299674]
    }

# Create the DataFrame
df = pd.DataFrame(data)

Step 2: Data Transformation

Now that we have our data prepared, let’s transform it by applying various mathematical functions to each column. We will calculate the return of CDI (Cotacao Diaria Imobiliaria) for each company over a period of 5 years, the annualized return of the stock value, the volatility of the asset, the alpha and beta values, the Sharpe ratio, and the tracking error.

# Function to calculate return of CDI for each company over a period of 5 years
def v1(g):
    first, last = g['CDI'].values[[0,-1]]
    return (last/first)-1

# Function to calculate annualized return of the stock value
def v2(g):
    first, last = g['valor_cota'].values[[0,-1]]
    result = last/first-1
    return ((1+result)**(1/5))-1

# Function to calculate volatility of the asset
def v4(g):
    return (g['ln_ativo']).std() * (252**0.5)

# Function to calculate alpha value
def v5(g):
    slope, intercept = np.polyfit(g['ln_IBX'], g['ln_ativo'], 1)
    return intercept

# Function to calculate beta value
def v6(g):
    slope, intercept = np.polyfit(g['ln_IBX'], g['ln_ativo'], 1)
    return slope

# Function to calculate Sharpe ratio
def v7(g):
    return (v2(g)-v1(g))/v4(g)

# Function to calculate tracking error
def v8(g):
    return ((g['tracking']).std())

# Function to calculate R-squared value
def v9(g):
    slope, intercept = np.polyfit(g['ln_ativo'], g['ln_IBX'], 1)
    r_squared = 1 - (sum((g['ln_IBX'] - (slope * g['ln_ativo'] + intercept))**2) / ((len(g['ln_IBX']) - 1) * np.var(g['ln_IBX'], ddof=1)))
    return r_squared

Step 3: Data Visualization

Now that we have transformed our data, let’s visualize the results in a meaningful way. We will group our DataFrame by ‘cnpj’ and apply each transformation function to get the desired output.

# Group the DataFrame by 'cnpj' and apply each transformation function
grouped = df.groupby('cnpj').apply(lambda g: [v1(g),v2(g),v4(g),v5(g),v6(g),v7(g),v8(g),v9(g)])

# Construct a new DataFrame with the result
out = pd.DataFrame(grouped.tolist(), index=grouped.index, 
                   columns=['ret_5_CDI','t_5_Asset','anual_5','vol','alfa',
                            'beta','sharpe','tracking_err','R^2']).reset_index()

# Display the output
print(out)

Conclusion

In this article, we explored how to transform a pandas DataFrame by applying various mathematical functions and then visualizing the results in a meaningful way. We used Python with its popular libraries pandas, numpy, and matplotlib to achieve this. The code provided can be easily extended or modified as per our requirements.

Hugo Shortcodes

https://gohugo.io/content-addressing/

https://pandas.pydata.org/

https://numpy.org/


Last modified on 2024-04-26