LLMs
Large Language Models (LLMs) can be used to summarize, interpret, and contextualize findings from BDX data. By processing HVAC and energy system data, LLMs can generate insights in a readable, action-oriented format, assisting building operators, facility managers, and analysts in making data-driven decisions.
Example Use Case: VAV Air System Analysis
This guide walks through an example that uses the OpenAI Python package to send BDX data for a VAV air system, prompting the model to generate a summary of airflow anomalies. The LLM can highlight abnormal trends, suggest possible causes, and recommend actions based on the data.
Using BDXpy with OpenAI API
This guide explains how to use bdxpy with an OpenAI API key to analyze HVAC airflow data. The script retrieves airflow data from a BDX instance, detects anomalies, and generates a summary using OpenAI's API.
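The script reads its credentials from environment variables via python-dotenv, so before running it you will need a .env file alongside the script. A minimal example (all values are placeholders):

```
# Placeholder values - replace with your own credentials
OPENAI_API_KEY=sk-your-key-here
BDX_URL=https://your-bdx-instance.example.com
BDX_USERNAME=your_username
BDX_PASSWORD=your_password
```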
Sending data to OpenAI is the focus of this example, but how you define data inputs, engineer your prompts, and select models is just as crucial; each engineer or developer needs to review and account for these choices.
OpenAI API Rate Limits
OpenAI imposes rate limits on API calls, which vary based on the model and subscription plan. As of recent updates, rate limits typically include:
- GPT-4o & GPT-4: Limited to a set number of requests per minute (RPM) and tokens per minute (TPM).
- GPT-3.5: More relaxed limits, but still subject to RPM and TPM constraints.
- Free-tier users have significantly lower limits than paid API subscription plans.
Refer to OpenAI’s official rate limits documentation for the most up-to-date details.
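If you expect to hit these limits, a simple guard is exponential backoff. Below is a minimal sketch using the openai Python package's v1+ client; the retry count and delays are illustrative and should be tuned to your plan:

```python
import time
import openai

client = openai.OpenAI()  # reads OPENAI_API_KEY from the environment

def chat_with_backoff(messages, model="gpt-4o", max_retries=5):
    """Retry a chat completion with exponential backoff on rate-limit errors."""
    for attempt in range(max_retries):
        try:
            return client.chat.completions.create(model=model, messages=messages)
        except openai.RateLimitError:
            time.sleep(2 ** attempt)  # back off 1s, 2s, 4s, ... before retrying
    raise RuntimeError("Rate limit retries exhausted")
```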
Why Sending Large Volumes of Raw BDX Data Is Not Recommended
The example script retrieves extensive airflow data from a BDX instance, but sending large datasets to OpenAI is inefficient due to:
1. API Rate Limits: Exceeding limits can result in throttling or failed requests.
2. High Token Costs: Large payloads consume more tokens, increasing costs (you can estimate this locally, as sketched after this list).
3. Performance Delays: Processing large text blocks slows down responses and increases API latency.
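As a quick sanity check before calling the API, you can estimate a payload's token count locally. This is a minimal sketch assuming a recent version of the tiktoken package is installed (an extra dependency, not part of bdxpy):

```python
import json
import tiktoken

def estimate_tokens(payload: dict, model: str = "gpt-4o") -> int:
    """Estimate how many tokens a JSON payload would consume as prompt text."""
    try:
        encoding = tiktoken.encoding_for_model(model)
    except KeyError:
        # Fall back to a general-purpose encoding for unknown model names
        encoding = tiktoken.get_encoding("o200k_base")
    return len(encoding.encode(json.dumps(payload)))
```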
Recommended Approach:
- Use OpenAI as a Framework: Instead of sending large volumes of data, focus on targeted statistics and smaller timeframes.
- Pre-process Data Locally: Summarize key metrics (e.g., top anomalies, AHU-wide trends) before sending anything to OpenAI, as sketched below.
- Batch Requests: Instead of one large request, break the work into smaller, meaningful prompts.
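For example, the percent-difference dictionary built in the example script at the end of this guide can be reduced to a handful of the largest movers before anything is sent to the API. This is a sketch, not part of bdxpy; the 20% threshold mirrors the cutoff used later in the script:

```python
import json
import pandas as pd

def top_anomalies(percent_diffs: dict, n: int = 5, threshold: float = 20.0) -> str:
    """Reduce a full set of VAV percent differences to a small JSON payload.

    percent_diffs maps a display name (e.g. "VAV_3_3") to its week-over-week
    airflow change in percent; only the n largest moves beyond the threshold
    are kept, so the prompt stays small and cheap.
    """
    s = pd.Series(percent_diffs, dtype=float)
    significant = s[s.abs() > threshold]
    top = significant.sort_values(key=abs, ascending=False).head(n)
    return json.dumps(top.round(2).to_dict(), indent=2)
```

The resulting JSON string can be dropped straight into a prompt, keeping each request well under typical TPM limits.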
Importance of Prompt Engineering
Prompt engineering plays a critical role in obtaining useful outputs from OpenAI models.
Key Strategies:
- Be Specific: Provide context and constraints to guide responses.
- Use Formatting Cues: Structure prompts using bullet points, tables, or numbered lists for better parsing.
- Avoid Ambiguity: Clearly define what constitutes an anomaly or significant event in BDX data.
- Iterate and Refine: Test different prompts to improve accuracy and relevance.
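To make these strategies concrete, here is a hypothetical prompt builder in the same spirit as the one in the example script at the end of this guide; the threshold, wording, and output format are all assumptions to tune against your own data:

```python
def build_prompt(anomalies_json: str, threshold: float = 20.0) -> str:
    """Assemble a specific, well-structured prompt from pre-summarized data."""
    return f"""
You are reviewing weekly VAV airflow changes for a single building.

Data (JSON, percent change week over week):
{anomalies_json}

Instructions:
- An anomaly is any change whose magnitude exceeds {threshold}%.
- Report at most 5 anomalies as a Markdown bullet list.
- For each, give one likely cause and one recommended action.
- If nothing exceeds the threshold, reply exactly: "No major anomalies detected this week."
"""
```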
Model Selection Impact
Different OpenAI models produce varying results:
- GPT-4o: Best for complex, structured data insights.
- GPT-3.5: Faster and cheaper, but less accurate for nuanced analysis.
- Fine-tuned Models: Can be trained on historical BDX data for better contextual responses.
Choosing the Right Model
| Use Case | Recommended Model |
|---|---|
| Detailed anomaly detection | GPT-4o |
| General HVAC summaries | GPT-3.5 |
| Custom BDX optimizations | Fine-tuned model |
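In code, this choice can be a simple parameter rather than a hardcoded string. A small illustrative sketch (the task names and mapping are assumptions, not part of bdxpy or the OpenAI API):

```python
# Illustrative task names; adjust model strings to what your account offers.
MODEL_BY_TASK = {
    "anomaly_detection": "gpt-4o",       # complex, structured insights
    "general_summary": "gpt-3.5-turbo",  # faster and cheaper
}

def model_for(task: str) -> str:
    """Pick a model per task, defaulting to the most capable one."""
    return MODEL_BY_TASK.get(task, "gpt-4o")
```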
Final Recommendations
- Use OpenAI strategically to analyze key data points, not raw time-series data.
- Fine-tune prompts to get actionable insights instead of general responses.
- Optimize data selection before API calls to reduce costs and improve relevance.
- Choose models based on complexity and budget.
By following these best practices, you can maximize OpenAI's value while efficiently leveraging BDX data for meaningful insights.
Example Code
Below is example code into which you can insert your own BDXpy logic; it generates a network chart on the left and an HTML-formatted summary of the OpenAI API response on the right.
Note: this code is purely for API example purposes and will need heavy modification for custom implementations elsewhere.
```python
import openai
from pyvis.network import Network
import pandas as pd
from bdx.core import BDX
from bdx.auth import UsernameAndPasswordAuthenticator
from bdx.types import TimeFrame, AggregationLevel
import matplotlib.pyplot as plt
import matplotlib.colors as mcolors
import json
import os
import markdown
from dotenv import load_dotenv
# Load environment variables
load_dotenv()
# OpenAI API Key
OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")
# BDX Credentials
BDX_URL = os.getenv("BDX_URL")
USERNAME = os.getenv("BDX_USERNAME")
PASSWORD = os.getenv("BDX_PASSWORD")
BUILDING_NAME = "Apex Building"  # Building name to match component lookups on
# AHUs to lookup VAVs on per matching logic below
AHU_NUMBERS = [1, 2, 3, 4, 6, 8]
# Connect to BDX
auth = UsernameAndPasswordAuthenticator(USERNAME, PASSWORD)
with BDX(BDX_URL, auth) as bdx:
    buildings = bdx.buildings.list()
    matching_buildings = [b for b in buildings if b.name.lower() == BUILDING_NAME.lower()]
    if not matching_buildings:
        print(f"No building found with the name: {BUILDING_NAME}")
        exit()
    BUILDING_ID = matching_buildings[0].componentInstanceId
    all_components = bdx.components.by_building(building_id=BUILDING_ID)
    # Map AHUs
    ahu_names = {f"AHU_{num}": f"AHU {num}" for num in AHU_NUMBERS}
    # Filter for VAVs
    vav_components = [
        comp for comp in all_components
        if "VAV_" in comp.path.displayName and any(comp.path.displayName.startswith(f"VAV_{ahu}_") for ahu in AHU_NUMBERS)
    ]
    # Map VAVs to AHUs
    vav_to_ahu = {}
    for vav in vav_components:
        ahu_number = vav.path.displayName.split("_")[1]
        if f"AHU_{ahu_number}" in ahu_names:
            vav_to_ahu[vav.path.displayName] = f"AHU_{ahu_number}"
    # Retrieve airflow data for two timeframes
    timeframe_current = TimeFrame.last_7_days()
    timeframe_previous = TimeFrame.last_n_days(14)
    properties = [{"componentPathId": vav.path.componentPathId, "propertyName": "airFlow"} for vav in vav_components]
    # Fetch Data
    trend_data_current = bdx.trending.retrieve_data(properties, timeframe_current, AggregationLevel.HOURLY)
    trend_data_previous = bdx.trending.retrieve_data(properties, timeframe_previous, AggregationLevel.HOURLY)
    df_current = trend_data_current.dataframe.fillna(0).set_index("time")
    df_previous = trend_data_previous.dataframe.fillna(0).set_index("time")
    # Keep only the first week of the 14-day window so both periods cover the same number of hourly samples
    df_previous = df_previous.iloc[:len(df_current)]
# Compute Percent Differences
anomalies = []
all_percent_diffs = {}
all_current_airflows = {}
for vav in vav_components:
    comp_id = vav.path.componentPathId
    display_name = vav.path.displayName
    current_airflow = df_current.sum().get(f"{comp_id}_airFlow", 0)
    previous_airflow = df_previous.sum().get(f"{comp_id}_airFlow", 0)
    if previous_airflow != 0:
        percent_diff = ((current_airflow - previous_airflow) / previous_airflow) * 100
    else:
        percent_diff = 0
    all_percent_diffs[display_name] = percent_diff
    all_current_airflows[display_name] = current_airflow
    if abs(percent_diff) > 20:
        anomalies.append({
            "VAV": display_name,
            "Current Airflow": round(current_airflow, 2),
            "Previous Airflow": round(previous_airflow, 2),
            "Change (%)": round(percent_diff, 2)
        })
# -------------------
# Generate Summary with OpenAI (Using GPT-4o)
# -------------------
def generate_summary(anomalies):
    if not anomalies:
        return "<p>No significant anomalies detected in VAV airflow this week.</p>"
    # Customize this prompt depending on your model, needs, response quality, etc.
    # This prompt engineering is a very important step.
    prompt_text = f"""
Given the following data on airflow changes for VAVs in a building:
{json.dumps(anomalies, indent=2)}
### **Summary Instructions**
- **Only report the most significant anomalies** (up to **5 individual VAVs**) OR if there is a **system-wide AHU issue** (total airflow of all VAVs under an AHU changes drastically).
- **Exclude moderate changes** – I only care about extreme cases that could indicate performance, comfort, or system inefficiencies.
- **If there are no significant changes**, state: "No major anomalies detected this week."
- **Airflow data is provided as an accumulation of CFM, so units are CF.**
- **Format the response as concise bullet points**, using **Markdown formatting** for readability.
### **Response Format**
- **Key Findings**
- **VAV_3_3:** Airflow increased **+89.09%**
🔹 Likely cause: [Occupancy shift / Calibration issue / Setpoint change]
🔹 Recommended action: [Verify control settings / Check mechanical operation]
- **If AHU-wide issues exist, summarize them separately**
- **AHU-1 System-Wide Change:** Total airflow increased by **+250,000 CF**, possibly due to [scheduling changes / pressure setpoint shift].
🔹 Recommended action: [Check AHU damper settings / Review scheduling].
Make sure the response is **short, direct, and action-oriented**.
"""
    client = openai.OpenAI(api_key=OPENAI_API_KEY)
    # If using a specific role/content/assistant in OpenAI, make sure to specify it correctly in client.chat.completions.create()
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": "You are an expert in HVAC systems analyzing airflow changes."},
            {"role": "user", "content": prompt_text}
        ]
    )
    markdown_summary = response.choices[0].message.content
    return markdown.markdown(markdown_summary)
summary_text = generate_summary(anomalies)
# -------------------
# Generate PyVis Network Chart (With Hover Labels & Correct Sizing)
# -------------------
html_path = "VAV_network_summary_openai.html"
net = Network(height="100vh", width="100%", notebook=True, directed=False)
# Maintain PyVis physics settings
net.barnes_hut(gravity=-7000, central_gravity=0.2, spring_length=50, spring_strength=0.03)
# Add Building Node
net.add_node("Building", size=100, color="#3e3e3e", label=f"Building: {BUILDING_NAME}", font={"size": 50})
# Add AHU Nodes
for ahu, ahu_label in ahu_names.items():
    net.add_node(ahu, size=50, color="#f5d76e", label=ahu_label, font={"size": 40})
    net.add_edge("Building", ahu)
# Determine maximum airflow for scaling
overall_max_airflow = max(all_current_airflows.values(), default=1)
# Add VAV Nodes with Hover Labels and Correct Sizing
colormap = plt.get_cmap("RdBu_r")
vmin = min(all_percent_diffs.values(), default=-1)
vmax = max(all_percent_diffs.values(), default=1)
# TwoSlopeNorm requires vmin < vcenter < vmax, so widen the range around zero if needed
vmin = min(vmin, -1)
vmax = max(vmax, 1)
norm = mcolors.TwoSlopeNorm(vmin=vmin, vcenter=0, vmax=vmax)
for vav_name, ahu_name in vav_to_ahu.items():
    percent_diff = all_percent_diffs.get(vav_name, 0)
    current_airflow = all_current_airflows.get(vav_name, 0)
    rgba_color = colormap(norm(percent_diff))
    hex_color = mcolors.to_hex(rgba_color)
    # Scale VAV size based on airflow
    node_size = 5 + (50 * (current_airflow / overall_max_airflow))
    net.add_node(
        vav_name,
        size=node_size,
        color=hex_color,
        title=f"{vav_name} - % Change: {percent_diff:.2f}%, Airflow: {current_airflow:.2f}",
        font={"size": 30}
    )
    net.add_edge(ahu_name, vav_name, width=1)
# Save chart
net.save_graph(html_path)
# -------------------
# Write Final HTML
# -------------------
with open(html_path, "w", encoding="utf-8") as f:
    f.write(f"""
<html>
<head>
<style>
body {{
font-size: 12px; /* Base font size for body text */
}}
.container {{
display: flex;
height: 100vh;
width: 100%;
}}
.left {{
width: 50%;
height: 100%;
overflow-y: hidden; /* No scrollbar on left (chart) */
}}
.right {{
width: 50%;
background: #f4f4f4;
padding: 20px;
overflow-y: auto; /* Keep scrollbar on right (text) */
font-size: 10px !important; /* Force body text size with !important */
line-height: 1.5;
box-sizing: border-box;
}}
/* Force smaller header sizes in .right with higher specificity */
.right h1 {{
font-size: 16px !important; /* Smaller headline */
}}
.right h2 {{
font-size: 14px !important; /* Smaller subhead */
}}
.right h3 {{
font-size: 12px !important; /* Match body text size */
}}
/* Ensure all text in .right inherits the base size */
.right * {{
font-size: 10px !important; /* Apply to all elements in .right */
}}
</style>
</head>
<body>
<div class="container">
<div class="left">{net.generate_html()}</div>
<div class="right">{summary_text}</div>
</div>
</body>
</html>
""")
print(f"Final version saved in '{html_path}'. Open in a browser.")