BMS Improvement Plan — Singapore DC01
Read this file at the start of the next session to restore context. Generated from full page review (all 9 pages read and analysed).
Phased Execution Plan
Phase 1 — Frontend Quick Wins (no backend/simulator changes)
| # | Page | Improvement | Status |
|---|---|---|---|
| 1.1 | Alarms | Escalation timer — colour-ramping counter for unacknowledged critical alarms | [x] |
| 1.2 | Alarms | MTTR stat card — derived from triggered_at → resolved_at | deferred to Phase 3 (needs resolved_at from backend) |
| 1.3 | Assets | Sortable inventory table columns | [x] |
| 1.4 | Environmental | Humidity overlay toggle on heatmap | [x] |
| 1.5 | Environmental | Dew point derived client-side (Magnus formula from temp + humidity) | [x] |
| 1.6 | Environmental | ASHRAE A1 compliance table per rack | [x] |
| 1.8 | Capacity | Stranded power total kW shown prominently | [x] |
| 1.9 | Environmental | Dew point vs. supply air temp chart (client-side derived) | [x] |
| 1.10 | Floor Map | Alarm badge overlay option | [x] |
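
Item 1.5 derives dew point from temperature and relative humidity with the Magnus formula. The sketch below shows that arithmetic in Python for illustration; the coefficients (17.62 and 243.12 °C) are one commonly used set and are an assumption, since this plan does not record which constants the frontend uses. The function names are hypothetical.

```python
import math

# Magnus coefficients over liquid water. Assumed values: the plan does not
# state which constants the client-side implementation uses.
MAGNUS_B = 17.62
MAGNUS_C = 243.12  # degrees Celsius


def dew_point_c(temp_c: float, rh_pct: float) -> float:
    """Approximate dew point (°C) from dry-bulb temperature and relative humidity."""
    if not 0 < rh_pct <= 100:
        raise ValueError("relative humidity must be in (0, 100]")
    gamma = math.log(rh_pct / 100.0) + MAGNUS_B * temp_c / (MAGNUS_C + temp_c)
    return MAGNUS_C * gamma / (MAGNUS_B - gamma)


def condensation_margin_c(supply_temp_c: float, room_temp_c: float, rh_pct: float) -> float:
    """Supply air temperature minus room dew point; a small or negative value means risk."""
    return supply_temp_c - dew_point_c(room_temp_c, rh_pct)
```

A room at 25 °C and 50 % RH gives a dew point of roughly 13.9 °C, so a supply air temperature of 18 °C would leave a margin of about 4 °C.
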
Phase 2 — Simulator Expansion (new bots + topology)
| # | Bot | Status |
|---|---|---|
| 2.1 | GeneratorBot — fuel_pct, load_kw, run_hours, state, scenarios: GENERATOR_FAILURE / LOW_FUEL | [x] |
| 2.2 | AtsBot — active_feed, transfer_count, last_transfer_ms, scenario: ATS_TRANSFER | [x] |
| 2.3 | ChillerBot — chw_supply/return_c, flow_gpm, cop, condenser_pressure_bar, scenario: CHILLER_FAULT | [x] |
| 2.4 | VesdaBot — level (normal/alert/action/fire), obscuration_pct, zone_id, scenarios: VESDA_ALERT / VESDA_FIRE | [x] |
| 2.5 | Extend PduBot — per-phase kW + amps (A/B/C), imbalance_pct, scenario: PHASE_IMBALANCE | [x] |
| 2.6 | Extend WaterLeakBot — floor_zone, under_floor, near_crac metadata | [x] |
| 2.7 | Topology update — generators, ats, chillers, vesda zones, extra leak sensors | [x] |
Phase 3 — Backend API Expansion
| # | Endpoint | Status |
|---|---|---|
| 3.1 | GET /api/generator/status | [x] |
| 3.2 | GET /api/power/ats | [x] |
| 3.3 | GET /api/power/phase | [x] |
| 3.4 | GET /api/power/redundancy | [x] |
| 3.5 | GET /api/cooling/status (chiller) | [x] |
| 3.6 | GET /api/cooling/history (COP + capacity over time) | [x] |
| 3.7 | GET /api/fire/status (VESDA zones) | [x] |
| 3.8 | GET /api/leak/status (with location metadata) | [x] |
| 3.9 | GET /api/power/utility (grid import, tariff, monthly kWh) | [x] |
| 3.10 | GET /api/reports/energy (kWh cost, PUE 30-day trend) | [x] |
| 3.11 | Extend cooling/{crac_id} detail — add airflow_cfm | [x] (was already done in env.py) |
Phase 4 — Existing Pages Wired Up (uses Phase 2+3 data)
| # | Page | Improvement | Status |
|---|---|---|---|
| 4.1 | Dashboard | Generator status KPI card | [x] |
| 4.2 | Dashboard | Leak detection KPI card | [x] |
| 4.3 | Dashboard | UPS worst-case runtime card | deferred (UPS runtime already shown on Power page) |
| 4.4 | Power | Generator section | [x] |
| 4.5 | Power | ATS transfer switch panel | [x] |
| 4.6 | Power | PDU branch circuit section | [x] (phase imbalance table) |
| 4.7 | Power | Phase imbalance warning on UPS cards | [x] |
| 4.8 | Power | Power redundancy level indicator | [x] |
| 4.9 | Cooling | COP trend chart per CRAC | [x] (in CRAC detail sheet) |
| 4.10 | Cooling | Chiller plant summary panel | [x] |
| 4.11 | Cooling | Predictive filter replacement estimate | [x] |
| 4.12 | Cooling | Airflow CFM tile in fleet summary | [x] |
| 4.13 | Environmental | Leak sensor map panel | [x] |
| 4.14 | Environmental | VESDA/smoke status panel | [x] |
| 4.15 | Floor Map | Leak sensor overlay layer | [x] (panel below map) |
| 4.16 | Floor Map | Power feed (A/B) overlay layer | [x] |
| 4.17 | Floor Map | Humidity 3rd overlay | [x] (done in Phase 1) |
| 4.18 | Capacity | N+1 cooling margin indicator | [x] |
| 4.19 | Capacity | Capacity runway chart | [x] |
| 4.20 | Alarms | Generator alarm category | [x] (alarm engine raises gen alarms automatically) |
| 4.21 | Alarms | Leak alarm category with floor map link | [x] (alarm engine already handles leak) |
| 4.22 | Alarms | Fire/VESDA alarm category | [x] (alarm engine raises vesda_level alarms) |
| 4.23 | Assets | PDU as asset type | [x] (PDU phase monitoring section in assets grid) |
| 4.24 | Assets | Rack elevation diagram in RackDetailSheet | [x] (already implemented as RackDiagram) |
| 4.25 | Reports | PUE 30-day trend graph | [x] (daily IT kW trend + estimated PUE) |
| 4.26 | Reports | Energy cost section | [x] |
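
Item 4.18 adds an N+1 cooling margin indicator. One way to compute it, sketched here as an assumption rather than the implemented logic, is to remove the largest cooling unit from the fleet and compare the remaining capacity with the heat load. The function name and the idea of passing IT load as the heat load are hypothetical.

```python
from typing import Sequence


def n_plus_one_margin_kw(unit_capacities_kw: Sequence[float], heat_load_kw: float) -> float:
    """Cooling capacity left after losing the largest unit, minus the heat load.

    A positive result means the room still meets N+1; a negative result means
    a single unit failure would leave the load uncovered.
    """
    if not unit_capacities_kw:
        raise ValueError("at least one cooling unit is required")
    surviving_kw = sum(unit_capacities_kw) - max(unit_capacities_kw)
    return surviving_kw - heat_load_kw
```

With three 100 kW units and a 150 kW load the margin is 50 kW; with only two such units it drops to -50 kW.
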
Phase 5 — New Pages
| # | Page | Status |
|---|---|---|
| 5.1 | Generator & Power Path | [x] |
| 5.2 | Leak Detection | [x] |
| 5.3 | Fire & Life Safety | [x] |
Phase 6 — Low Priority & Polish
| # | Item | Status |
|---|---|---|
| 6.1 | Alarms: assigned-to column + maintenance window suppression | [x] (assigned-to with localStorage) |
| 6.2 | Alarms: root cause correlation | [x] (5-rule RootCausePanel above stat cards) |
| 6.3 | Assets: warranty expiry + lifecycle status | [x] (lifecycle status column added) |
| 6.4 | Assets: CSV import/export for CMDB | [x] (CSV export added) |
| 6.5 | Reports: comparison period (this week vs last) | [x] |
| 6.6 | Reports: scheduled PDF email | [ ] |
| 6.7 | New page: Network Infrastructure | [x] |
| 6.8 | New page: Energy & Sustainability | [x] |
| 6.9 | New page: Maintenance windows | [x] |
| 6.10 | Environmental: particle count (ISO 14644) | [ ] |
| 6.11 | Dashboard: room quick-status grid (Hall A / Hall B avg temp, power, CRAC state) — visual rack-grid thumbnail deferred to backlog | [x] |
| 6.12 | Floor Map: zoom/pan + CRAC coverage shading | [ ] |
Phase 7 — Untracked Additions
| # | Item | Status |
|---|---|---|
| 7.1 | Settings page — Profile, Notifications, Thresholds, Site Config tabs | [x] |
| 7.2 | Floor layout editor — server-side persistence via site_config table (PUT/GET /api/floor-layout) | [x] |
| 7.3 | Rack naming convention updated to SG1A01.xx / SG1B01.xx format across all topology files | [x] |
| 7.4 | 80-rack topology — Hall A and Hall B each have 2 rows × 20 racks | [x] |
Dashboard (/dashboard)
| # | Type | Improvement | Priority |
|---|---|---|---|
| 1 | Sensor | Add Generator status KPI card (fuel %, run-hours, transfer state) | High |
| 2 | Sensor | Add Water/Leak Detection KPI card — badge showing any active leaks | High |
| 3 | Sensor | Add Raised floor differential pressure widget | Medium |
| 4 | Sensor | Show UPS state in KPI row (mains vs. battery, worst-case runtime) | High |
| 5 | Visual | Dashboard KPI row: add 5th card or replace PUE with site health score | Medium |
| 6 | Visual | Add mini floor map thumbnail as 4th bottom-row panel | Medium |
| 7 | Info | Show carbon intensity / CO2e alongside PUE | Low |
| 8 | Info | Add MTBF / uptime streak counter for critical infrastructure | Low |
Cooling (/cooling)
| # | Type | Improvement | Priority |
|---|---|---|---|
| 1 | Sensor | Add Chiller plant metrics — CHW supply/return temps, flow rate, chiller COP, condenser pressure | High |
| 2 | Sensor | Add Cooling tower stats — approach temp, basin level, blow-down rate, fan speed | Medium |
| 3 | Sensor | Glycol/refrigerant level indicator per CRAC | High |
| 4 | Sensor | Airflow (CFM) per CRAC — not just fan % | Medium |
| 5 | Sensor | Condenser water inlet/outlet temperature for water-cooled units | Medium |
| 6 | Sensor | Raised floor tile differential pressure — 0.04–0.08 in. W.C. target range | High |
| 7 | Sensor | Hot/cold aisle containment breach indicator — door open, blanking panels | Medium |
| 8 | Sensor | Chilled water flow rate (GPM) and heat rejection kW | Medium |
| 9 | Visual | COP trend chart over time per unit (currently only static value) | High |
| 10 | Visual | Fleet summary: add total fleet airflow (CFM) tile | Medium |
| 11 | Visual | Add cooling efficiency vs. IT load scatter/trend chart | Medium |
| 12 | Info | Predictive filter replacement — estimated days until change-out based on dP rate of rise | Medium |
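
Item 12 estimates days until filter change-out from the rate of rise of differential pressure. A minimal sketch of that idea follows, assuming a least-squares straight line through recent readings and a change-out limit supplied by the caller; the function name, the sample format, and the use of pascals are all assumptions, not the implemented behaviour.

```python
from typing import Optional, Sequence, Tuple


def days_until_changeout(
    samples: Sequence[Tuple[float, float]], limit_pa: float
) -> Optional[float]:
    """Estimate days until filter dP reaches limit_pa.

    samples holds (day, dp_pa) pairs in time order. Returns None when the
    trend is flat or falling, or when there is too little data to fit a slope.
    """
    n = len(samples)
    if n < 2:
        return None
    mean_t = sum(t for t, _ in samples) / n
    mean_p = sum(p for _, p in samples) / n
    var_t = sum((t - mean_t) ** 2 for t, _ in samples)
    if var_t == 0:
        return None
    # Least-squares slope in Pa per day.
    slope = sum((t - mean_t) * (p - mean_p) for t, p in samples) / var_t
    if slope <= 0:
        return None
    latest_dp = samples[-1][1]
    if latest_dp >= limit_pa:
        return 0.0
    # Extrapolate from the latest reading rather than the fitted intercept.
    return (limit_pa - latest_dp) / slope
```

Readings of 100, 102 and 104 Pa on consecutive days with a 124 Pa limit give an estimate of 10 days.
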
Power (/power)
| # | Type | Improvement | Priority |
|---|---|---|---|
| 1 | Sensor | Add Generator status section — active/standby, fuel %, last test date, load kW | High |
| 2 | Sensor | Add ATS/STS transfer switch status — which feed active (Utility A/B), transfer time | High |
| 3 | Sensor | Add PDU branch circuit monitoring — per-phase kW, amps, trip status | High |
| 4 | Sensor | Power quality metrics — THD, voltage sag/swell events, neutral current | Medium |
| 5 | Sensor | Busway / overhead busbar load per tap-off box | Medium |
| 6 | Sensor | Utility metering — grid import kW, tariff period, cost/kWh, monthly kWh | Medium |
| 7 | Sensor | Phase imbalance per panel/UPS — flag >5% imbalance | High |
| 8 | Visual | UPS cards: add input voltage/frequency per phase, bypass mode status | Medium |
| 9 | Info | Add power redundancy level indicator — N, N+1, 2N — highlight single points of failure | High |
| 10 | Info | Annualised energy cost projection alongside kWh | Low |
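
Item 7 flags phase imbalance above 5 %. The plan does not define how imbalance_pct is calculated, so the sketch below assumes a common definition: the largest deviation of any phase from the three-phase average, as a percentage of that average. The function names are hypothetical; only the 5 % threshold comes from the table above.

```python
IMBALANCE_LIMIT_PCT = 5.0  # threshold from the Power table (item 7)


def imbalance_pct(phase_a: float, phase_b: float, phase_c: float) -> float:
    """Largest deviation from the mean phase load, as a percentage of the mean.

    Assumed definition; inputs may be per-phase amps or kW.
    """
    loads = (phase_a, phase_b, phase_c)
    mean = sum(loads) / 3
    if mean == 0:
        return 0.0
    return max(abs(x - mean) for x in loads) / mean * 100


def is_imbalanced(phase_a: float, phase_b: float, phase_c: float) -> bool:
    """True when the imbalance exceeds the 5 % flag threshold."""
    return imbalance_pct(phase_a, phase_b, phase_c) > IMBALANCE_LIMIT_PCT
```

Phase currents of 12 A, 10 A and 8 A average 10 A with a 2 A maximum deviation, which is 20 % and would be flagged.
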
Environmental (/environmental)
| # | Type | Improvement | Priority |
|---|---|---|---|
| 1 | Sensor | Add Dew point derived value per room — approaching supply temp = condensation risk | High |
| 2 | Sensor | Add Water/leak detection sensors map — floor, under-floor, drip trays, pipe runs | High |
| 3 | Sensor | Smoke detector / VESDA status panel — aspirating detector alarm levels | High |
| 4 | Sensor | Raised floor pressure differential trend chart | Medium |
| 5 | Sensor | Hot aisle (return air) temperature per rack row | Medium |
| 6 | Sensor | Server inlet temperature sensors from IPMI per device | Medium |
| 7 | Sensor | Particle count (ISO 14644 class) | Low |
| 8 | Visual | Heatmap: add humidity overlay toggle (currently separate chart only) | High |
| 9 | Visual | Add ASHRAE compliance table per rack — flag racks outside A1/A2 envelope | Medium |
| 10 | Visual | Add dew point vs. supply air temp chart with condensation risk zone | Medium |
| 11 | Info | Show absolute humidity (g/kg) alongside RH for ASHRAE compliance | Low |
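
Item 11 asks for humidity in g/kg alongside RH. A minimal sketch of that conversion follows, assuming the value wanted is the humidity ratio (grams of water vapour per kilogram of dry air), a Magnus-type saturation pressure formula, and standard sea-level pressure as the default. The function names and the default pressure are assumptions.

```python
import math

STANDARD_PRESSURE_HPA = 1013.25  # assumed default; pass the site pressure if known


def saturation_vapour_pressure_hpa(temp_c: float) -> float:
    """Saturation vapour pressure over water (hPa), Magnus-type approximation."""
    return 6.112 * math.exp(17.62 * temp_c / (243.12 + temp_c))


def humidity_ratio_g_per_kg(
    temp_c: float, rh_pct: float, pressure_hpa: float = STANDARD_PRESSURE_HPA
) -> float:
    """Grams of water vapour per kilogram of dry air."""
    if not 0 <= rh_pct <= 100:
        raise ValueError("relative humidity must be in [0, 100]")
    vapour_hpa = rh_pct / 100.0 * saturation_vapour_pressure_hpa(temp_c)
    # 622 is the ratio of the molar masses of water and dry air, times 1000.
    return 622.0 * vapour_hpa / (pressure_hpa - vapour_hpa)
```

At 25 °C and 50 % RH this gives a little under 10 g/kg.
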
Floor Map (/floor-map)
| # | Type | Improvement | Priority |
|---|---|---|---|
| 1 | Sensor | Add leak sensor overlay — highlight tiles where water sensors are placed | High |
| 2 | Sensor | Add smoke/VESDA zone overlay | Medium |
| 3 | Sensor | Add PDU/power path overlay — show which feed (A/B) each rack is on | High |
| 4 | Visual | Add 3rd overlay: humidity | Medium |
| 5 | Visual | Add airflow arrows showing cold aisle → rack → hot aisle direction | Low |
| 6 | Visual | Show blank rack slots count on each rack tile (U available) | Medium |
| 7 | Visual | Add rack-level alarm badge as an overlay option | High |
| 8 | Visual | Add zoom/pan for larger floor plans | Medium |
| 9 | Info | Add CRAC coverage radius shading showing which racks each CRAC thermally serves | Medium |
Capacity (/capacity)
| # | Type | Improvement | Priority |
|---|---|---|---|
| 1 | Visual | Add capacity runway chart: at the current growth rate, weeks until power/cooling capacity is reached | High |
| 2 | Sensor | Add U-space utilisation per rack — units occupied vs. total 42U | Medium |
| 3 | Sensor | Generator fuel capacity as a capacity dimension | Medium |
| 4 | Info | Thermal capacity per CRAC vs. current IT load — N+1 cooling margin | High |
| 5 | Info | Add growth projection input — operator enters expected kW/month to forecast capacity date | Medium |
| 6 | Visual | Cross-room comparison radar chart (Power %, Cooling %, Space %) | Medium |
| 7 | Visual | Show stranded power total in kW (not just per-rack list) | Medium |
| 8 | Sensor | Weight capacity per rack — floor load (kg/m²) | Low |
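
Items 1 and 5 together describe a runway figure: the operator enters expected growth in kW/month and the page reports weeks until capacity is reached. The sketch below assumes linear growth and converts months to weeks at 52/12; the function name and that conversion factor are assumptions, not the implemented logic.

```python
from typing import Optional

WEEKS_PER_MONTH = 52 / 12  # assumed conversion factor


def runway_weeks(
    capacity_kw: float, load_kw: float, growth_kw_per_month: float
) -> Optional[float]:
    """Weeks until load reaches capacity under linear growth.

    Returns 0.0 if capacity is already reached, and None when growth is zero
    or negative (no finite runway).
    """
    headroom_kw = capacity_kw - load_kw
    if headroom_kw <= 0:
        return 0.0
    if growth_kw_per_month <= 0:
        return None
    return headroom_kw / growth_kw_per_month * WEEKS_PER_MONTH
```

A room with 1000 kW capacity, 800 kW load and 20 kW/month growth has 10 months of headroom, or about 43 weeks.
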
Alarms (/alarms)
| # | Type | Improvement | Priority |
|---|---|---|---|
| 1 | Sensor | Add Generator alarm category (fuel low, start fail, overload) | High |
| 2 | Sensor | Add Leak alarm category with direct link to leak sensor on floor map | High |
| 3 | Sensor | Add Fire/VESDA alarm category with severity escalation | High |
| 4 | Sensor | Add Network device alarm category (switch down, link fault, LACP failure) | Medium |
| 5 | Visual | Add escalation timer: how long a critical alarm has gone unacknowledged, with colour ramp | High |
| 6 | Visual | Add MTTR stat card alongside existing stat cards | Medium |
| 7 | Visual | Alarm table: add "Assigned to" column | Low |
| 8 | Visual | Add alarm suppression / maintenance window toggle | Medium |
| 9 | Info | Root cause correlation — surface linked alarms (e.g. rack temp high + CRAC fan low) | Medium |
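
The MTTR stat card (item 6) is derived from triggered_at → resolved_at, as noted in Phase 1. A minimal sketch of that derivation follows, assuming each alarm is a dict carrying those two fields as datetime values and that unresolved alarms have resolved_at set to None or missing. The record shape and function name are assumptions.

```python
from datetime import datetime, timedelta
from typing import Iterable, Mapping, Optional


def mean_time_to_resolve(alarms: Iterable[Mapping[str, object]]) -> Optional[timedelta]:
    """Average of resolved_at - triggered_at over resolved alarms.

    Alarms without a resolved_at value are skipped. Returns None when no
    alarm has been resolved yet.
    """
    durations = [
        a["resolved_at"] - a["triggered_at"]
        for a in alarms
        if a.get("resolved_at") is not None
    ]
    if not durations:
        return None
    return sum(durations, timedelta()) / len(durations)


if __name__ == "__main__":
    t0 = datetime(2024, 1, 1, 12, 0)
    sample = [
        {"triggered_at": t0, "resolved_at": t0 + timedelta(minutes=10)},
        {"triggered_at": t0, "resolved_at": t0 + timedelta(minutes=30)},
        {"triggered_at": t0, "resolved_at": None},  # still open, ignored
    ]
    print(mean_time_to_resolve(sample))
```
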
Assets (/assets)
| # | Type | Improvement | Priority |
|---|---|---|---|
| 1 | Sensor | Per-device power draw from PDU outlet monitoring (not estimated) | High |
| 2 | Sensor | Server inlet temperature from IPMI/iDRAC per device | High |
| 3 | Sensor | Add PDUs as asset type with per-outlet monitoring | High |
| 4 | Sensor | Network device status (switch uptime, port count, active links) | Medium |
| 5 | Visual | Inventory table: add sortable columns (currently unsortable) | High |
| 6 | Visual | Add rack elevation diagram (visual U-space view) in RackDetailSheet | High |
| 7 | Visual | Add device age / warranty expiry column in inventory | Medium |
| 8 | Info | Add DCIM-style lifecycle status — Active / Decomm / Planned | Low |
| 9 | Info | Add asset import/export (CSV) for CMDB sync | Medium |
Reports (/reports)
| # | Type | Improvement | Priority |
|---|---|---|---|
| 1 | Sensor | Add energy cost report — kWh, estimated cost at tariff, month-to-date | High |
| 2 | Visual | Add PUE trend graph — 30-day rolling PUE vs. target | High |
| 3 | Visual | Add cooling efficiency (kW IT / kW cooling) over time | Medium |
| 4 | Visual | Add alarm MTTR and alarm volume trend per week | Medium |
| 5 | Info | Add scheduled report configuration — email PDF daily/weekly | Medium |
| 6 | Info | Add comparison period — this week vs. last week | Medium |
| 7 | Info | Add sustainability section — CO2e, renewable fraction, WUE | Low |
| 8 | Info | Add SLA compliance section — uptime %, incidents, breach risk | Medium |
| 9 | Info | Expand CSV exports: PDU branch data, CRAC detailed logs, humidity history | Medium |
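
Item 2 calls for a 30-day rolling PUE. PUE is total facility energy divided by IT energy; the sketch below assumes daily kWh totals for both are available and computes an energy-weighted ratio over a trailing window. The function name and input shape are assumptions, and the first days use a shorter window until 30 days of data exist.

```python
from typing import List, Sequence, Tuple


def rolling_pue(
    daily_kwh: Sequence[Tuple[float, float]], window_days: int = 30
) -> List[float]:
    """Trailing-window PUE for each day.

    daily_kwh holds (facility_kwh, it_kwh) per day in time order. Each output
    value is sum(facility) / sum(IT) over the last window_days days up to and
    including that day.
    """
    if window_days < 1:
        raise ValueError("window_days must be at least 1")
    result = []
    for i in range(len(daily_kwh)):
        window = daily_kwh[max(0, i - window_days + 1) : i + 1]
        facility = sum(f for f, _ in window)
        it = sum(t for _, t in window)
        if it <= 0:
            raise ValueError("IT energy must be positive to compute PUE")
        result.append(facility / it)
    return result
```

Summing energy before dividing keeps low-load days from skewing the trend the way a plain average of daily ratios would.
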
New Pages to Build
| Page | Description | Priority |
|---|---|---|
| Generator & Power Path | ATS status, generator load, fuel level, transfer switch history | High |
| Leak Detection | Site-wide leak sensor map, sensor status, historical events | High |
| Fire & Life Safety | VESDA levels, smoke detector zones, suppression system status | High |
| Network Infrastructure | Core/edge switch health, port utilisation, link status | Medium |
| Energy & Sustainability | kWh cost, PUE trend, CO2e, WUE | Medium |
| Maintenance | Planned outages, maintenance windows, alarm suppression | Low |