BMS/IMPROVEMENTS.md
2026-03-19 11:32:17 +00:00

# BMS Improvement Plan — Singapore DC01
> Read this file at the start of the next session to restore context.
> Generated from full page review (all 9 pages read and analysed).
---
## Phased Execution Plan
### Phase 1 — Frontend Quick Wins (no backend/simulator changes)
| # | Page | Improvement | Status |
|---|------|-------------|--------|
| 1.1 | Alarms | Escalation timer — colour-ramping counter for unacknowledged critical alarms | [x] |
| 1.2 | Alarms | MTTR stat card — derived from triggered_at → resolved_at | deferred to Phase 3 (needs resolved_at from backend) |
| 1.3 | Assets | Sortable inventory table columns | [x] |
| 1.4 | Environmental | Humidity overlay toggle on heatmap | [x] |
| 1.5 | Environmental | Dew point derived client-side (Magnus formula from temp + humidity) | [x] |
| 1.6 | Environmental | ASHRAE A1 compliance table per rack | [x] |
| 1.8 | Capacity | Stranded power total kW shown prominently | [x] |
| 1.9 | Environmental | Dew point vs. supply air temp chart (client-side derived) | [x] |
| 1.10 | Floor Map | Alarm badge overlay option | [x] |
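
Items 1.5 and 1.9 derive dew point from temperature and relative humidity with the Magnus formula. The sketch below shows that calculation in Python for reference. The coefficient pair (17.62, 243.12 °C) is one common parameterisation and is an assumption, since the plan does not say which constants the frontend uses. The function names are hypothetical.

```python
import math

# Magnus coefficients over water. Assumed values: the plan does not
# state which parameterisation the client-side code uses.
MAGNUS_B = 17.62
MAGNUS_C = 243.12  # degrees C


def dew_point_c(temp_c: float, rh_pct: float) -> float:
    """Approximate dew point (deg C) from dry-bulb temp and relative humidity."""
    if not 0.0 < rh_pct <= 100.0:
        raise ValueError("rh_pct must be in (0, 100]")
    gamma = math.log(rh_pct / 100.0) + (MAGNUS_B * temp_c) / (MAGNUS_C + temp_c)
    return (MAGNUS_C * gamma) / (MAGNUS_B - gamma)


def condensation_margin_c(supply_air_c: float, room_temp_c: float, room_rh_pct: float) -> float:
    """Supply air temperature minus room dew point; small or negative means risk."""
    return supply_air_c - dew_point_c(room_temp_c, room_rh_pct)
```

A margin near zero or below it would mark the condensation risk zone on the dew point vs. supply air chart; the alert threshold itself is left to the operator.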
### Phase 2 — Simulator Expansion (new bots + topology)
| # | Bot | Status |
|---|-----|--------|
| 2.1 | GeneratorBot — fuel_pct, load_kw, run_hours, state, scenarios: GENERATOR_FAILURE / LOW_FUEL | [x] |
| 2.2 | AtsBot — active_feed, transfer_count, last_transfer_ms, scenario: ATS_TRANSFER | [x] |
| 2.3 | ChillerBot — chw_supply/return_c, flow_gpm, cop, condenser_pressure_bar, scenario: CHILLER_FAULT | [x] |
| 2.4 | VesdaBot — level (normal/alert/action/fire), obscuration_pct, zone_id, scenarios: VESDA_ALERT / VESDA_FIRE | [x] |
| 2.5 | Extend PduBot — per-phase kW + amps (A/B/C), imbalance_pct, scenario: PHASE_IMBALANCE | [x] |
| 2.6 | Extend WaterLeakBot — floor_zone, under_floor, near_crac metadata | [x] |
| 2.7 | Topology update — generators, ats, chillers, vesda zones, extra leak sensors | [x] |
### Phase 3 — Backend API Expansion
| # | Endpoint | Status |
|---|----------|--------|
| 3.1 | GET /api/generator/status | [x] |
| 3.2 | GET /api/power/ats | [x] |
| 3.3 | GET /api/power/phase | [x] |
| 3.4 | GET /api/power/redundancy | [x] |
| 3.5 | GET /api/cooling/status (chiller) | [x] |
| 3.6 | GET /api/cooling/history (COP + capacity over time) | [x] |
| 3.7 | GET /api/fire/status (VESDA zones) | [x] |
| 3.8 | GET /api/leak/status (with location metadata) | [x] |
| 3.9 | GET /api/power/utility (grid import, tariff, monthly kWh) | [x] |
| 3.10 | GET /api/reports/energy (kWh cost, PUE 30-day trend) | [x] |
| 3.11 | Extend cooling/{crac_id} detail — add airflow_cfm | [x] (was already done in env.py) |
### Phase 4 — Existing Pages Wired Up (uses Phase 2+3 data)
| # | Page | Improvement | Status |
|---|------|-------------|--------|
| 4.1 | Dashboard | Generator status KPI card | [x] |
| 4.2 | Dashboard | Leak detection KPI card | [x] |
| 4.3 | Dashboard | UPS worst-case runtime card | deferred (UPS runtime already shown on Power page) |
| 4.4 | Power | Generator section | [x] |
| 4.5 | Power | ATS transfer switch panel | [x] |
| 4.6 | Power | PDU branch circuit section | [x] phase imbalance table |
| 4.7 | Power | Phase imbalance warning on UPS cards | [x] |
| 4.8 | Power | Power redundancy level indicator | [x] |
| 4.9 | Cooling | COP trend chart per CRAC | [x] (in CRAC detail sheet) |
| 4.10 | Cooling | Chiller plant summary panel | [x] |
| 4.11 | Cooling | Predictive filter replacement estimate | [x] |
| 4.12 | Cooling | Airflow CFM tile in fleet summary | [x] |
| 4.13 | Environmental | Leak sensor map panel | [x] |
| 4.14 | Environmental | VESDA/smoke status panel | [x] |
| 4.15 | Floor Map | Leak sensor overlay layer | [x] (panel below map) |
| 4.16 | Floor Map | Power feed (A/B) overlay layer | [x] |
| 4.17 | Floor Map | Humidity 3rd overlay | [x] (done in Phase 1) |
| 4.18 | Capacity | N+1 cooling margin indicator | [x] |
| 4.19 | Capacity | Capacity runway chart | [x] |
| 4.20 | Alarms | Generator alarm category | [x] (alarm engine raises gen alarms automatically) |
| 4.21 | Alarms | Leak alarm category with floor map link | [x] (alarm engine already handles leak) |
| 4.22 | Alarms | Fire/VESDA alarm category | [x] (alarm engine raises vesda_level alarms) |
| 4.23 | Assets | PDU as asset type | [x] (PDU phase monitoring section in assets grid) |
| 4.24 | Assets | Rack elevation diagram in RackDetailSheet | [x] (already implemented as RackDiagram) |
| 4.25 | Reports | PUE 30-day trend graph | [x] (daily IT kW trend + estimated PUE) |
| 4.26 | Reports | Energy cost section | [x] |
### Phase 5 — New Pages
| # | Page | Status |
|---|------|--------|
| 5.1 | Generator & Power Path | [x] |
| 5.2 | Leak Detection | [x] |
| 5.3 | Fire & Life Safety | [x] |
### Phase 6 — Low Priority & Polish
| # | Item | Status |
|---|------|--------|
| 6.1 | Alarms: assigned-to column + maintenance window suppression | [x] (assigned-to with localStorage) |
| 6.2 | Alarms: root cause correlation | [x] (5-rule RootCausePanel above stat cards) |
| 6.3 | Assets: warranty expiry + lifecycle status | [x] (lifecycle status column added) |
| 6.4 | Assets: CSV import/export for CMDB | [x] (CSV export added) |
| 6.5 | Reports: comparison period (this week vs last) | [x] |
| 6.6 | Reports: scheduled PDF email | [ ] |
| 6.7 | New page: Network Infrastructure | [x] |
| 6.8 | New page: Energy & Sustainability | [x] |
| 6.9 | New page: Maintenance windows | [x] |
| 6.10 | Environmental: particle count (ISO 14644) | [ ] |
| 6.11 | Dashboard: room quick-status grid (Hall A / Hall B avg temp, power, CRAC state) — visual rack-grid thumbnail deferred to backlog | [x] |
| 6.12 | Floor Map: zoom/pan + CRAC coverage shading | [ ] |
### Phase 7 — Untracked Additions
| # | Item | Status |
|---|------|--------|
| 7.1 | Settings page — Profile, Notifications, Thresholds, Site Config tabs | [x] |
| 7.2 | Floor layout editor — server-side persistence via site_config table (PUT/GET /api/floor-layout) | [x] |
| 7.3 | Rack naming convention updated to SG1A01.xx / SG1B01.xx format across all topology files | [x] |
| 7.4 | 80-rack topology — Hall A and Hall B each have 2 rows × 20 racks | [x] |
---
## Dashboard (`/dashboard`)
| # | Type | Improvement | Priority |
|---|------|-------------|----------|
| 1 | Sensor | Add Generator status KPI card (fuel %, run-hours, transfer state) | High |
| 2 | Sensor | Add Water/Leak Detection KPI card — badge showing any active leaks | High |
| 3 | Sensor | Add Raised floor differential pressure widget | Medium |
| 4 | Sensor | Show UPS state in KPI row (mains vs. battery, worst-case runtime) | High |
| 5 | Visual | Dashboard KPI row: add 5th card or replace PUE with site health score | Medium |
| 6 | Visual | Add mini floor map thumbnail as 4th bottom-row panel | Medium |
| 7 | Info | Show carbon intensity / CO2e alongside PUE | Low |
| 8 | Info | Add MTBF / uptime streak counter for critical infrastructure | Low |
---
## Cooling (`/cooling`)
| # | Type | Improvement | Priority |
|---|------|-------------|----------|
| 1 | Sensor | Add Chiller plant metrics — CHW supply/return temps, flow rate, chiller COP, condenser pressure | High |
| 2 | Sensor | Add Cooling tower stats — approach temp, basin level, blow-down rate, fan speed | Medium |
| 3 | Sensor | Glycol/refrigerant level indicator per CRAC | High |
| 4 | Sensor | Airflow (CFM) per CRAC — not just fan % | Medium |
| 5 | Sensor | Condenser water inlet/outlet temperature for water-cooled units | Medium |
| 6 | Sensor | Raised floor tile differential pressure — 0.04–0.08 in. W.C. target range | High |
| 7 | Sensor | Hot/cold aisle containment breach indicator — door open, blanking panels | Medium |
| 8 | Sensor | Chilled water flow rate (GPM) and heat rejection kW | Medium |
| 9 | Visual | COP trend chart over time per unit (currently only static value) | High |
| 10 | Visual | Fleet summary: add total fleet airflow (CFM) tile | Medium |
| 11 | Visual | Add cooling efficiency vs. IT load scatter/trend chart | Medium |
| 12 | Info | Predictive filter replacement — estimated days until change-out based on dP rate of rise | Medium |
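
Item 12 estimates days until filter change-out from the rate of rise of filter differential pressure. A minimal sketch of one way to do this follows: fit a least-squares line to recent dP samples and extrapolate to a change-out limit. The sample shape, the Pa unit, and the limit value are assumptions, not taken from the existing code.

```python
from typing import List, Optional, Tuple


def days_until_changeout(
    samples: List[Tuple[float, float]],  # (day, filter dP in Pa), oldest first
    limit_pa: float,                     # assumed change-out threshold
) -> Optional[float]:
    """Days from the last sample until dP reaches limit_pa, or None if not rising."""
    n = len(samples)
    if n < 2:
        return None
    mean_t = sum(t for t, _ in samples) / n
    mean_p = sum(p for _, p in samples) / n
    var_t = sum((t - mean_t) ** 2 for t, _ in samples)
    if var_t == 0:
        return None  # all samples share one timestamp
    # Least-squares slope in Pa per day.
    slope = sum((t - mean_t) * (p - mean_p) for t, p in samples) / var_t
    _, last_p = samples[-1]
    if last_p >= limit_pa:
        return 0.0  # already at or past the limit
    if slope <= 0:
        return None  # dP flat or falling, no estimate possible
    return (limit_pa - last_p) / slope


# Example: dP rising 1 Pa/day from 100 Pa, limit 150 Pa.
estimate = days_until_changeout([(0, 100.0), (10, 110.0), (20, 120.0)], 150.0)
```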
---
## Power (`/power`)
| # | Type | Improvement | Priority |
|---|------|-------------|----------|
| 1 | Sensor | Add Generator status section — active/standby, fuel %, last test date, load kW | High |
| 2 | Sensor | Add ATS/STS transfer switch status — which feed active (Utility A/B), transfer time | High |
| 3 | Sensor | Add PDU branch circuit monitoring — per-phase kW, amps, trip status | High |
| 4 | Sensor | Power quality metrics — THD, voltage sag/swell events, neutral current | Medium |
| 5 | Sensor | Busway / overhead busbar load per tap-off box | Medium |
| 6 | Sensor | Utility metering — grid import kW, tariff period, cost/kWh, monthly kWh | Medium |
| 7 | Sensor | Phase imbalance per panel/UPS — flag >5% imbalance | High |
| 8 | Visual | UPS cards: add input voltage/frequency per phase, bypass mode status | Medium |
| 9 | Info | Add power redundancy level indicator — N, N+1, 2N — highlight single points of failure | High |
| 10 | Info | Annualised energy cost projection alongside kWh | Low |
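
Item 7 flags phase imbalance above 5%. The plan does not say how `imbalance_pct` is computed, so the sketch below assumes a common definition: the largest deviation of any phase from the three-phase mean, as a percentage of that mean. The function names are hypothetical.

```python
IMBALANCE_WARN_PCT = 5.0  # threshold from item 7


def phase_imbalance_pct(a_kw: float, b_kw: float, c_kw: float) -> float:
    """Max deviation from the mean phase load, as a percentage of the mean.

    Assumed definition; the simulator's imbalance_pct may be computed differently.
    """
    loads = (a_kw, b_kw, c_kw)
    mean = sum(loads) / 3.0
    if mean <= 0:
        return 0.0  # no load, nothing to compare
    return max(abs(x - mean) for x in loads) / mean * 100.0


def is_imbalanced(a_kw: float, b_kw: float, c_kw: float) -> bool:
    """True when imbalance exceeds the warning threshold."""
    return phase_imbalance_pct(a_kw, b_kw, c_kw) > IMBALANCE_WARN_PCT


# Example: phases at 10, 11 and 12 kW give roughly 9.1% imbalance.
example_pct = phase_imbalance_pct(10.0, 11.0, 12.0)
```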
---
## Environmental (`/environmental`)
| # | Type | Improvement | Priority |
|---|------|-------------|----------|
| 1 | Sensor | Add Dew point derived value per room — approaching supply temp = condensation risk | High |
| 2 | Sensor | Add Water/leak detection sensors map — floor, under-floor, drip trays, pipe runs | High |
| 3 | Sensor | Smoke detector / VESDA status panel — aspirating detector alarm levels | High |
| 4 | Sensor | Raised floor pressure differential trend chart | Medium |
| 5 | Sensor | Hot aisle inlet temperature per rack row (return air) | Medium |
| 6 | Sensor | Server inlet temperature sensors from IPMI per device | Medium |
| 7 | Sensor | Particle count (ISO 14644 class) | Low |
| 8 | Visual | Heatmap: add humidity overlay toggle (currently separate chart only) | High |
| 9 | Visual | Add ASHRAE compliance table per rack — flag racks outside A1/A2 envelope | Medium |
| 10 | Visual | Add dew point vs. supply air temp chart with condensation risk zone | Medium |
| 11 | Info | Show absolute humidity (g/kg) alongside RH for ASHRAE compliance | Low |
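
Item 9 flags racks outside the ASHRAE envelope. A minimal sketch of such a check follows. The limits (15 to 32 °C, 8 to 80% RH) are assumed values for the A1 allowable range and should be confirmed against the ASHRAE edition the site follows. The full envelope also constrains dew point, which this sketch omits. All names are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Envelope:
    temp_min_c: float
    temp_max_c: float
    rh_min_pct: float
    rh_max_pct: float


# Assumed A1 allowable limits; verify before relying on them.
A1_ALLOWABLE = Envelope(temp_min_c=15.0, temp_max_c=32.0, rh_min_pct=8.0, rh_max_pct=80.0)


def check_rack(inlet_temp_c: float, rh_pct: float, env: Envelope = A1_ALLOWABLE) -> List[str]:
    """Return the list of violated limits; an empty list means compliant."""
    reasons = []
    if inlet_temp_c < env.temp_min_c:
        reasons.append("temp_low")
    elif inlet_temp_c > env.temp_max_c:
        reasons.append("temp_high")
    if rh_pct < env.rh_min_pct:
        reasons.append("rh_low")
    elif rh_pct > env.rh_max_pct:
        reasons.append("rh_high")
    return reasons


# Example: a rack at 34 deg C and 50% RH is flagged for temperature only.
example = check_rack(34.0, 50.0)
```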
---
## Floor Map (`/floor-map`)
| # | Type | Improvement | Priority |
|---|------|-------------|----------|
| 1 | Sensor | Add leak sensor overlay — highlight tiles where water sensors are placed | High |
| 2 | Sensor | Add smoke/VESDA zone overlay | Medium |
| 3 | Sensor | Add PDU/power path overlay — show which feed (A/B) each rack is on | High |
| 4 | Visual | Add 3rd overlay: humidity | Medium |
| 5 | Visual | Add airflow arrows showing cold aisle → rack → hot aisle direction | Low |
| 6 | Visual | Show blank rack slots count on each rack tile (U available) | Medium |
| 7 | Visual | Add rack-level alarm badge as an overlay option | High |
| 8 | Visual | Add zoom/pan for larger floor plans | Medium |
| 9 | Info | Add CRAC coverage radius shading showing which racks each CRAC thermally serves | Medium |
---
## Capacity (`/capacity`)
| # | Type | Improvement | Priority |
|---|------|-------------|----------|
| 1 | Visual | Add capacity runway chart — at current growth rate, weeks until power/cooling capacity is reached | High |
| 2 | Sensor | Add U-space utilisation per rack — units occupied vs. total 42U | Medium |
| 3 | Sensor | Generator fuel capacity as a capacity dimension | Medium |
| 4 | Info | Thermal capacity per CRAC vs. current IT load — N+1 cooling margin | High |
| 5 | Info | Add growth projection input — operator enters expected kW/month to forecast capacity date | Medium |
| 6 | Visual | Cross-room comparison radar chart (Power %, Cooling %, Space %) | Medium |
| 7 | Visual | Show stranded power total in kW (not just per-rack list) | Medium |
| 8 | Sensor | Weight capacity per rack — floor load (kg/m²) | Low |
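
Items 1 and 5 both reduce to the same arithmetic: remaining headroom divided by growth rate. The sketch below assumes linear growth and hypothetical function names. Item 5 takes kW/month, so a conversion helper is included (52 weeks / 12 months).

```python
from typing import Optional

WEEKS_PER_MONTH = 52.0 / 12.0


def runway_weeks(capacity_kw: float, current_kw: float, growth_kw_per_week: float) -> Optional[float]:
    """Weeks until load reaches capacity under linear growth.

    Returns 0.0 if already at capacity, None if load is not growing.
    """
    if current_kw >= capacity_kw:
        return 0.0
    if growth_kw_per_week <= 0:
        return None
    return (capacity_kw - current_kw) / growth_kw_per_week


def runway_weeks_from_monthly(capacity_kw: float, current_kw: float, growth_kw_per_month: float) -> Optional[float]:
    """Same as runway_weeks, with growth entered in kW/month (item 5)."""
    return runway_weeks(capacity_kw, current_kw, growth_kw_per_month / WEEKS_PER_MONTH)


# Example: 400 kW capacity, 310 kW load, growing 2.5 kW/week.
example = runway_weeks(400.0, 310.0, 2.5)
```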
---
## Alarms (`/alarms`)
| # | Type | Improvement | Priority |
|---|------|-------------|----------|
| 1 | Sensor | Add Generator alarm category (fuel low, start fail, overload) | High |
| 2 | Sensor | Add Leak alarm category with direct link to leak sensor on floor map | High |
| 3 | Sensor | Add Fire/VESDA alarm category with severity escalation | High |
| 4 | Sensor | Add Network device alarm category (switch down, link fault, LACP failure) | Medium |
| 5 | Visual | Add escalation timer — how long a critical alarm has been unacknowledged, with colour ramp | High |
| 6 | Visual | Add MTTR stat card alongside existing stat cards | Medium |
| 7 | Visual | Alarm table: add "Assigned to" column | Low |
| 8 | Visual | Add alarm suppression / maintenance window toggle | Medium |
| 9 | Info | Root cause correlation — surface linked alarms (e.g. rack temp high + CRAC fan low) | Medium |
---
## Assets (`/assets`)
| # | Type | Improvement | Priority |
|---|------|-------------|----------|
| 1 | Sensor | Per-device power draw from PDU outlet monitoring (not estimated) | High |
| 2 | Sensor | Server inlet temperature from IPMI/iDRAC per device | High |
| 3 | Sensor | Add PDUs as asset type with per-outlet monitoring | High |
| 4 | Sensor | Network device status (switch uptime, port count, active links) | Medium |
| 5 | Visual | Inventory table: add sortable columns (currently unsortable) | High |
| 6 | Visual | Add rack elevation diagram (visual U-space view) in RackDetailSheet | High |
| 7 | Visual | Add device age / warranty expiry column in inventory | Medium |
| 8 | Info | Add DCIM-style lifecycle status — Active / Decomm / Planned | Low |
| 9 | Info | Add asset import/export (CSV) for CMDB sync | Medium |
---
## Reports (`/reports`)
| # | Type | Improvement | Priority |
|---|------|-------------|----------|
| 1 | Sensor | Add energy cost report — kWh, estimated cost at tariff, month-to-date | High |
| 2 | Visual | Add PUE trend graph — 30-day rolling PUE vs. target | High |
| 3 | Visual | Add cooling efficiency (kW IT / kW cooling) over time | Medium |
| 4 | Visual | Add alarm MTTR and alarm volume trend per week | Medium |
| 5 | Info | Add scheduled report configuration — email PDF daily/weekly | Medium |
| 6 | Info | Add comparison period — this week vs. last week | Medium |
| 7 | Info | Add sustainability section — CO2e, renewable fraction, WUE | Low |
| 8 | Info | Add SLA compliance section — uptime %, incidents, breach risk | Medium |
| 9 | Info | Expand CSV exports: PDU branch data, CRAC detailed logs, humidity history | Medium |
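
Items 1 and 2 rest on two simple formulas: PUE is total facility energy divided by IT energy, and cost is kWh times tariff. The sketch below shows both, plus a trailing-window PUE for the 30-day trend. The input shape (daily kWh pairs) and the function names are assumptions.

```python
from typing import List, Tuple


def pue(total_facility_kwh: float, it_kwh: float) -> float:
    """Power usage effectiveness: total facility energy / IT energy."""
    if it_kwh <= 0:
        raise ValueError("it_kwh must be positive")
    return total_facility_kwh / it_kwh


def rolling_pue(daily: List[Tuple[float, float]], window: int = 30) -> List[float]:
    """PUE over a trailing window of days.

    daily holds (total_facility_kwh, it_kwh) per day, oldest first.
    Early entries use however many days are available so far.
    """
    out = []
    for i in range(len(daily)):
        chunk = daily[max(0, i - window + 1): i + 1]
        out.append(pue(sum(t for t, _ in chunk), sum(it for _, it in chunk)))
    return out


def energy_cost(kwh: float, tariff_per_kwh: float) -> float:
    """Estimated cost at a flat tariff; time-of-use tariffs are not modelled."""
    return kwh * tariff_per_kwh


# Example with a 2-day window.
trend = rolling_pue([(150.0, 100.0), (140.0, 100.0), (130.0, 100.0)], window=2)
```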
---
## New Pages to Build
| Page | Description | Priority |
|------|-------------|----------|
| Generator & Power Path | ATS status, generator load, fuel level, transfer switch history | High |
| Leak Detection | Site-wide leak sensor map, sensor status, historical events | High |
| Fire & Life Safety | VESDA levels, smoke detector zones, suppression system status | High |
| Network Infrastructure | Core/edge switch health, port utilisation, link status | Medium |
| Energy & Sustainability | kWh cost, PUE trend, CO2e, WUE | Medium |
| Maintenance | Planned outages, maintenance windows, alarm suppression | Low |