Step 1: Prepare the Setup and Retrieve Initial Data
- Goal: Configure the scraping script with the relevant input variables, including the target institution’s Central Index Key (CIK) and HTTP headers that emulate browser behavior.
- Aim: Collect a list of filing URLs from the EDGAR database for further processing.
Here’s how:
- Set up Python together with Selenium for the web browser automation tasks.
- Enter the necessary identifiers, such as the institution’s CIK, into your code as parameters.
- Use request headers to emulate legitimate browser activity, which helps minimize blocking by servers.
import requests

# Set up initial variables
cik = '0001067983'  # Example CIK for Berkshire Hathaway
headers = {'User-Agent': '[email protected]'}  # Placeholder: replace with your own contact email

# Fetch initial filing URLs
response = requests.get(f"https://data.sec.gov/submissions/CIK{cik}.json", headers=headers)
# Handling of the response goes here
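The response-handling step is left open above. One possible way to fill it in is sketched below. It assumes the submissions JSON exposes parallel `form` and `accessionNumber` lists under `filings.recent`, and that a filing's index page lives under `/Archives/edgar/data/<CIK>/<accession without dashes>/`. The helper name `build_filing_urls`, the `13F-HR` default, and the sample payload are illustrative and not part of the original script.

```python
def build_filing_urls(submissions, cik, form_type="13F-HR"):
    """Return index-page URLs for filings of the given form type.

    Assumes `submissions` is the parsed JSON dict from the submissions
    endpoint, with parallel lists under filings -> recent.
    """
    recent = submissions.get("filings", {}).get("recent", {})
    forms = recent.get("form", [])
    accessions = recent.get("accessionNumber", [])
    cik_number = int(cik)  # Assumed: archive paths use the CIK without leading zeros

    urls = []
    for form, accession in zip(forms, accessions):
        if form != form_type:
            continue
        folder = accession.replace("-", "")
        urls.append(
            f"https://www.sec.gov/Archives/edgar/data/{cik_number}/{folder}/{accession}-index.htm"
        )
    return urls


# Tiny made-up payload with the assumed shape (accession numbers are invented)
sample = {
    "filings": {
        "recent": {
            "form": ["13F-HR", "8-K", "13F-HR"],
            "accessionNumber": [
                "0000000000-24-000001",
                "0000000000-24-000002",
                "0000000000-24-000003",
            ],
        }
    }
}
filing_urls = build_filing_urls(sample, "0001067983")
print(filing_urls)
```

With a live response, the same helper could be called as `build_filing_urls(response.json(), cik)` after checking `response.status_code`.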
Step 2: Retrieving Complete Holdings Data with Selenium
In this step, the script goes through each filing URL to find and gather the XML URLs that contain the holdings information. This requires real-time interaction with the web page to ensure that all required links are identified and collected.
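from selenium import webdriver
from selenium.webdriver.common.by import By

# Initialize Selenium WebDriver
driver = webdriver.Chrome()
xml_urls = []
for filing_url in filing_urls:
    driver.get(filing_url)
    # Locate and extract the XML URL (example: first link whose href contains '.xml')
    xml_url = driver.find_element(By.XPATH, "//a[contains(@href, '.xml')]").get_attribute('href')
    xml_urls.append(xml_url)
driver.quit()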
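`find_element` returns only the first matching link, which may not be the holdings file. A hedged sketch of one way to choose among several candidates follows. It assumes the filing page lists more than one `.xml` link, that the cover-page file is named `primary_doc.xml`, and that rendered copies sit under a path segment starting with `xsl`. These naming conventions, the example URLs, and the helper `pick_holdings_xml` are assumptions to verify against real filing pages.

```python
def pick_holdings_xml(hrefs):
    """Return the first .xml link that looks like a raw holdings table, or None."""
    for href in hrefs:
        lowered = href.lower()
        if not lowered.endswith(".xml"):
            continue  # Not an XML document
        if lowered.endswith("primary_doc.xml"):
            continue  # Assumed name of the cover-page document
        if "/xsl" in lowered:
            continue  # Assumed marker of a rendered (styled) copy
        return href
    return None


# Made-up candidate links for illustration
candidates = [
    "https://example.com/filing/xslForm13F_X02/primary_doc.xml",
    "https://example.com/filing/primary_doc.xml",
    "https://example.com/filing/xslForm13F_X02/infotable.xml",
    "https://example.com/filing/infotable.xml",
    "https://example.com/filing/index.htm",
]
chosen = pick_holdings_xml(candidates)
print(chosen)
```

In the Selenium loop, the candidate list could come from `driver.find_elements(By.XPATH, "//a[contains(@href, '.xml')]")`, reading each element's `href` attribute.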
Step 3: Data Compilation and Refinement
Once the detailed data is collected, it is parsed and organized into a well-structured format. This process includes cleaning and reformatting the data to ensure its usefulness and integrity. The result is comparable to what an API returns, but it takes more manual intervention to get there.
import pandas as pd

# Example snippet to parse and clean data
dataframes = []
for xml_url in xml_urls:
    data = pd.read_xml(xml_url)
    cleaned_data = data.dropna()  # Simplified cleaning example
    dataframes.append(cleaned_data)
final_df = pd.concat(dataframes)
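`dropna()` alone rarely yields analysis-ready data. As an illustration of what the parsing and cleaning step can involve, the sketch below flattens a small, made-up holdings XML into a DataFrame using the standard library's `xml.etree.ElementTree`, strips namespaces, and converts values to numbers. The element names (`infoTable`, `nameOfIssuer`, `cusip`, `value`) and the namespace URI are illustrative assumptions; real files should be inspected first.

```python
import xml.etree.ElementTree as ET

import pandas as pd

# Made-up sample; the third entry deliberately lacks a <value> element
SAMPLE_XML = """<informationTable xmlns="http://example.com/ns">
  <infoTable><nameOfIssuer>ALPHA CORP</nameOfIssuer><cusip>000000001</cusip><value>1500</value></infoTable>
  <infoTable><nameOfIssuer>BETA INC</nameOfIssuer><cusip>000000002</cusip><value>2,500</value></infoTable>
  <infoTable><nameOfIssuer>GAMMA LTD</nameOfIssuer><cusip>000000003</cusip></infoTable>
</informationTable>"""


def local_name(tag):
    """Drop the '{namespace}' prefix that ElementTree adds to tag names."""
    return tag.rsplit("}", 1)[-1]


def holdings_to_frame(xml_text):
    """Flatten each holdings entry into one row and clean the value column."""
    root = ET.fromstring(xml_text)
    rows = []
    for entry in root:
        if local_name(entry.tag) != "infoTable":
            continue
        rows.append({local_name(child.tag): (child.text or "").strip() for child in entry})
    frame = pd.DataFrame(rows)
    # Remove thousands separators, then coerce anything unparseable to NaN
    frame["value"] = pd.to_numeric(
        frame["value"].str.replace(",", "", regex=False), errors="coerce"
    )
    # Keep only rows that have a usable numeric value
    return frame.dropna(subset=["value"]).reset_index(drop=True)


holdings = holdings_to_frame(SAMPLE_XML)
print(holdings)
```

Dropping rows only when the `value` field is missing, rather than when any field is missing, avoids discarding holdings over optional columns.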
2. Analyzing Investor Portfolio Evolution Over Time
After successfully retrieving institutional investors’ portfolio data, we can begin analyzing how their holdings change over time.
The first step involves converting the ‘date’ column into a datetime format. This enables us to easily extract the year and quarter from each data entry. Next, we select the relevant columns from the original DataFrame, renaming some for clarity. Specifically, we retain ‘year’, ‘quarter’, ‘investorName’, ‘symbol’ (which we rename to ‘ticker’), ‘weight’, and ‘marketValue’ (which we rename to ‘value’).
A key task is calculating each investor’s total portfolio value for every quarter. We use the groupby() function to aggregate the ‘value’ by ‘year’, ‘quarter’, and ‘investorName’, and store the total portfolio value in a new column called ‘total_portfolio_value’.
With the total portfolio value computed, we can then derive the weight of each asset within the portfolio. This is done by creating a new column, ‘calculated_weight’, which holds each holding’s value divided by the portfolio total (a fraction between 0 and 1). We then rename this column back to ‘weight’ to maintain consistency.
At this point, the transformed DataFrame contains all the essential information for analyzing changes in investor portfolios over time, including the year, quarter, investor name, ticker, weight, and total portfolio value.
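The weight calculation described above can be checked on a toy table. This sketch uses made-up numbers and `groupby(...).transform("sum")`, a shortcut that attaches each group's total directly to its rows and is an alternative to grouping and then merging. The column names follow the description above.

```python
import pandas as pd

# Made-up holdings for two hypothetical investors in a single quarter
toy = pd.DataFrame({
    "year": [2023, 2023, 2023, 2023],
    "quarter": [1, 1, 1, 1],
    "investorName": ["Fund A", "Fund A", "Fund B", "Fund B"],
    "ticker": ["AAA", "BBB", "AAA", "CCC"],
    "value": [300.0, 100.0, 50.0, 150.0],
})

group_keys = ["year", "quarter", "investorName"]

# Total portfolio value per investor and quarter, repeated on every row of the group
toy["total_portfolio_value"] = toy.groupby(group_keys)["value"].transform("sum")

# Each holding's share of its investor's portfolio (a fraction between 0 and 1)
toy["weight"] = toy["value"] / toy["total_portfolio_value"]

print(toy[["investorName", "ticker", "weight"]])
```

Within each investor and quarter the weights sum to 1, which is a quick sanity check worth running on real data as well.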
import requests
import pandas as pd
import matplotlib.pyplot as plt

# Set the base URL for API access
api_endpoint = 'https://financialmodelingprep.com/api/v4/institutional-ownership/portfolio-holdings'

# Define the quarter-end dates for the data we want to fetch
dates_of_interest = [
    '2022-03-31', '2022-06-30', '2022-09-30', '2022-12-31',
    '2023-03-31', '2023-06-30', '2023-09-30', '2023-12-31',
    '2024-03-31'
]

# List of institutional identifiers (CIKs) to query data for
investor_ciks = [
    '0001067983',
    '0000019617',
    '0000070858',
    '0000895421',
    '0000886982'
]

# Initialize an empty DataFrame to store all the retrieved data
consolidated_data = pd.DataFrame()

# Placeholder for your actual API key
api_token = ''

# Loop over the CIKs and quarter-end dates to retrieve data
for cik in investor_ciks:
    for date in dates_of_interest:
        # Set up the parameters for the API request
        params = {
            'date': date,  # Use the current quarter-end date
            'cik': cik,    # Use the current investor CIK
            'page': 0,     # We assume you're only interested in the first page of results
            'apikey': api_token
        }

        # Make the API request for the current CIK and quarter
        api_response = requests.get(api_endpoint, params=params)

        if api_response.status_code == 200:
            # If the request succeeds, convert the data to a DataFrame and append it
            fetched_data = api_response.json()
            cik_data_df = pd.DataFrame(fetched_data)
            consolidated_data = pd.concat([consolidated_data, cik_data_df], ignore_index=True)
        else:
            print(f"Failed to fetch data for CIK {cik} on {date}: {api_response.status_code}")
# Convert the date column and extract the year and quarter
consolidated_data['date'] = pd.to_datetime(consolidated_data['date'])
consolidated_data['year'] = consolidated_data['date'].dt.year
consolidated_data['quarter'] = consolidated_data['date'].dt.quarter

# Transform the data by selecting the relevant columns
portfolio_data = consolidated_data[['year', 'quarter', 'investorName', 'symbol', 'weight', 'marketValue']].copy()

# Rename columns for better readability
portfolio_data.rename(columns={'symbol': 'ticker', 'marketValue': 'value'}, inplace=True)

# Convert the 'value' column to numeric for further calculations
portfolio_data['value'] = pd.to_numeric(portfolio_data['value'])

# Calculate total portfolio value by grouping data by investor and quarter
portfolio_totals = portfolio_data.groupby(['year', 'quarter', 'investorName'])['value'].sum().reset_index(name='total_portfolio_value')

# Merge the total portfolio value back to the original data
portfolio_data = pd.merge(portfolio_data, portfolio_totals, on=['year', 'quarter', 'investorName'])

# Calculate each asset's weight in the total portfolio
portfolio_data['calculated_weight'] = portfolio_data['value'] / portfolio_data['total_portfolio_value']

# Drop the original 'weight' column and rename the new weight column
portfolio_data.drop('weight', axis=1, inplace=True)
portfolio_data.rename(columns={'calculated_weight': 'weight'}, inplace=True)

# Filter out assets with a weight of 5% or less for the final dataset
final_portfolio_data = portfolio_data[portfolio_data['weight'] > 0.05].copy()

# Create a string for the quarter period
final_portfolio_data['period'] = final_portfolio_data['year'].astype(str) + ' Q' + final_portfolio_data['quarter'].astype(str)

# Sort data for better readability in the plot
final_portfolio_data.sort_values(by=['period', 'investorName', 'ticker'], inplace=True)

# Calculate total value for each investor for each period
# (note: this sums only the holdings that passed the 5% filter)
total_portfolio_values = final_portfolio_data.groupby(['period', 'investorName'])['value'].sum().reset_index(name='total_value')

# Merge total portfolio value back to the final data
final_portfolio_data = pd.merge(final_portfolio_data, total_portfolio_values, on=['period', 'investorName'], how='left')

# Format the total value as a string for easy reading
final_portfolio_data['total_value_text'] = '$' + (final_portfolio_data['total_value'] / 1e9).round(2).astype(str) + 'B'
# Initialize a plot for the bar chart
fig, ax = plt.subplots(figsize=(24, 12))

# Collect the unique labels
unique_periods = final_portfolio_data['period'].unique()
unique_investors = final_portfolio_data['investorName'].unique()
unique_tickers = final_portfolio_data['ticker'].unique()

# Assign colors to each unique ticker
ticker_colors = {ticker: plt.cm.tab20(i/len(unique_tickers)) for i, ticker in enumerate(unique_tickers)}

# Create index mappings for easy position plotting
period_index_map = {period: i for i, period in enumerate(unique_periods)}
investor_index_map = {investorName: i for i, investorName in enumerate(unique_investors)}

# Set parameters for the bar plot
bar_width = 0.2
spacing_between_bars = 0.02

# Create the bars for each period and investor's portfolio
for period in unique_periods:
    period_index = period_index_map[period]

    for investorName in unique_investors:
        investor_portfolio = final_portfolio_data[(final_portfolio_data['period'] == period) & (final_portfolio_data['investorName'] == investorName)]
        if not investor_portfolio.empty:
            bar_bottom_position = 0
            total_portfolio_text = investor_portfolio['total_value_text'].iloc[0]

            for _, row in investor_portfolio.iterrows():
                bar_position = period_index + investor_index_map[investorName]*(bar_width + spacing_between_bars)
                asset_weight = row['weight']
                ax.bar(bar_position, asset_weight, bottom=bar_bottom_position, width=bar_width, color=ticker_colors[row['ticker']], edgecolor='white')
                bar_bottom_position += asset_weight

                # Add text annotations inside the bar
                ax.text(bar_position, bar_bottom_position - (asset_weight / 2), f"{row['ticker']}\n{row['weight']*100:.1f}%\n${row['value']/1e9:.1f}B", ha='center', va='center', fontsize=9, rotation=90)

            # Add total portfolio value beside the investor name
            investor_label = f"{investorName} ({total_portfolio_text})"
            ax.text(bar_position, bar_bottom_position + 0.03, investor_label, ha='center', va='bottom', fontsize=12, rotation=90, transform=ax.transData)

# Customize the plot layout and display the graph
ax.set_xticks([i + (bar_width + spacing_between_bars) * len(unique_investors) / 2 - bar_width/2 for i in range(len(unique_periods))])
ax.set_ylim(0, ax.get_ylim()[1] * 1.4)
ax.set_xticklabels(unique_periods)
ax.set_ylabel('Weight (%)')
ax.set_title('Quarterly Investor Portfolio Breakdown (Assets > 5% Weight)', fontsize=20)
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()
Assessing the changes in the portfolios of institutional investors on a quarterly basis reveals valuable insights into their trading patterns.
1. Define a custom colormap
We create a custom colormap using LinearSegmentedColormap from Matplotlib’s colors module. This colormap visualizes relative portfolio weight fluctuations, where:
- Red indicates a reduction.
- White indicates minimal or no change.
- Green indicates an increase.
2. Extract unique quarters
Next, we extract all unique quarters from the DataFrame all_data_df and sort them in reverse order, starting with the latest quarter.
3. Process each quarter’s data
For each quarter, we apply the following steps:
- Filter the data for that specific quarter.
- Group the data by investor name and stock ticker (symbol), computing the average value of the relative_change_pct column.
- Transform the grouped data into a matrix format, where:
  - Rows represent investors.
  - Columns represent stock tickers.
  - Each cell holds the average change in portfolio weight for the corresponding investor and stock.
4. Generate the heatmap plot
We use Seaborn’s heatmap() function to plot the data as a heatmap. The cells are colored using our previously defined colormap, with the midpoint representing no change (value = 0).
5. Enhance the heatmap’s appearance
The figure’s size, gridlines, and tick labels are customized to enhance visual clarity:
- Set the line widths and line colors for the grid.
- Label the x- and y-axes.
- Apply appropriate title and rotation settings for axis tick labels.
6. Display the heatmap
Finally, the plot is displayed with plt.show(), providing an accessible view of investor activity during the quarter, including the buy and sell actions for each ticker.
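Step 3 above, grouping and then reshaping into an investor-by-ticker matrix, can be seen on a small made-up table. The column names match those described in the steps; the investors and numbers are invented.

```python
import pandas as pd

# Made-up changes; Fund A has two rows for AAA that will be averaged
toy_changes = pd.DataFrame({
    "investorName": ["Fund A", "Fund A", "Fund A", "Fund B"],
    "symbol": ["AAA", "AAA", "BBB", "AAA"],
    "relative_change_pct": [10.0, 20.0, -5.0, 4.0],
})

# Average the change per investor and ticker
grouped = (
    toy_changes.groupby(["investorName", "symbol"])["relative_change_pct"]
    .mean()
    .reset_index()
)

# Reshape: investors as rows, tickers as columns, missing pairs filled with 0
matrix = grouped.pivot_table(
    index="investorName",
    columns="symbol",
    values="relative_change_pct",
    fill_value=0,
)
print(matrix)
```

Note that `fill_value=0` makes a position an investor never held look the same as an unchanged one, which is worth keeping in mind when reading the heatmap.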
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from matplotlib.colors import LinearSegmentedColormap

# Define a custom colormap from red (decrease) to white (no change) to green (increase)
colormap = ["red", "white", "green"]  # Represents decrease, no change, and increase
color_map_name = "custom_map"
custom_colormap = LinearSegmentedColormap.from_list(color_map_name, colormap)

# Extract all distinct quarter-year combinations
quarters_available = all_data_df['quarter_year'].unique()
quarters_available = sorted(quarters_available, key=lambda x: pd.Period(x), reverse=True)

# Iterate over the quarters to generate the heatmap for each one
for current_quarter in quarters_available:
    # Filter the data for the current quarter
    current_quarter_data = all_data_df[all_data_df['quarter_year'] == current_quarter]

    # Group the data by investor and asset ticker, calculating the mean relative change percentage
    grouped_data = current_quarter_data.groupby(['investorName', 'symbol'])['relative_change_pct'].mean().reset_index()

    # Pivot the grouped data into a matrix with investors as rows, tickers as columns, and relative change percentage as values
    pivot_data = grouped_data.pivot_table(index='investorName', columns='symbol', values='relative_change_pct', fill_value=0)

    # Plot the heatmap with custom figure size and gridline adjustments
    plt.figure(figsize=(30, 15))  # Increase the figure size for clearer visibility
    ax = sns.heatmap(pivot_data, cmap=custom_colormap, center=0, annot=False, cbar=True,
                     linewidths=0.05, linecolor='grey')  # Customize gridline appearance
    plt.title(f'Portfolio Change for {current_quarter} (Green: Increase, Red: Decrease, White: Little to No Change)', fontsize=16)
    plt.xlabel('Ticker (Asset)', fontsize=14)
    plt.ylabel('Investor', fontsize=14)
    plt.xticks(rotation=90, fontsize=12)  # Rotate and size x-axis labels
    plt.yticks(rotation=0, fontsize=12)   # Adjust y-axis label appearance
    plt.show()
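The heatmap code assumes that all_data_df already carries quarter_year and relative_change_pct columns, which the earlier snippets do not build. One plausible derivation, offered purely as an assumption, is the quarter-over-quarter percentage change in each holding's weight per investor and ticker. The sketch below shows it on made-up data; the `weights` table and its values are hypothetical.

```python
import pandas as pd

# Made-up weights for one hypothetical investor across two quarters
weights = pd.DataFrame({
    "date": pd.to_datetime(["2023-03-31", "2023-06-30", "2023-03-31", "2023-06-30"]),
    "investorName": ["Fund A", "Fund A", "Fund A", "Fund A"],
    "symbol": ["AAA", "AAA", "BBB", "BBB"],
    "weight": [0.20, 0.25, 0.10, 0.05],
})

# Order rows so each holding's quarters are consecutive and chronological
weights = weights.sort_values(["investorName", "symbol", "date"]).reset_index(drop=True)

# Quarter label such as '2023Q1', which pd.Period can parse for sorting
weights["quarter_year"] = weights["date"].dt.to_period("Q").astype(str)

# Percentage change in weight versus the previous quarter of the same holding
weights["relative_change_pct"] = (
    weights.groupby(["investorName", "symbol"])["weight"].pct_change() * 100
)
print(weights)
```

The first quarter of each holding has no predecessor, so its change is NaN; such rows would need to be dropped or filled before plotting.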