**Author**: Evgueni Poloukarov
**Commit**: feat: complete Phase 1 ENTSO-E asset-specific outage validation (27cb60a)
# Final Domain Collection Research
## Summary of Findings
### Available Methods in jao-py
The `JaoPublicationToolPandasClient` class provides three domain query methods:
1. **`query_final_domain(mtu, presolved, cne, co, use_mirror)`** (Line 233)
- Final Computation: final flow-based (FB) parameters after long-term nominations (LTN) are applied
- Published: 10:30 D-1
- Most complete dataset (recommended for Phase 2)
2. **`query_prefinal_domain(mtu, presolved, cne, co, use_mirror)`** (Line 248)
- Pre-Final (EarlyPub) - Pre-final FB parameters before LTN
- Published: 08:00 D-1
- Earlier publication time, but before LTN application
3. **`query_initial_domain(mtu, presolved, cne, co)`** (Line 264)
- Initial Computation (Virgin Domain) - Initial flow-based parameters
- Published: Early in D-1
- Before any adjustments
### Method Parameters
```python
def query_final_domain(
    mtu: pd.Timestamp,               # Market Time Unit (1 hour, timezone-aware)
    presolved: bool | None = None,   # Filter: True=binding, False=non-binding, None=ALL
    cne: str | None = None,          # CNEC name keyword filter (NOT EIC-based!)
    co: str | None = None,           # Contingency keyword filter
    use_mirror: bool = False,        # Use mirror.flowbased.eu for faster bulk download
) -> pd.DataFrame
```
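Since `mtu` must be a timezone-aware hourly timestamp, the calling code needs to build MTUs explicitly. A minimal sketch (pandas only; the commented client call assumes a configured jao-py client):

```python
import pandas as pd

# Build the 24 hourly MTUs for one delivery day in Europe/Amsterdam,
# as expected by query_final_domain (one call per MTU).
day = pd.Timestamp("2025-09-23", tz="Europe/Amsterdam")
mtus = pd.date_range(day, periods=24, freq=pd.Timedelta(hours=1))

assert mtus[0].tzinfo is not None  # timezone-aware, as required
assert mtus[-1].hour == 23         # last MTU of the delivery day

# Each MTU is then passed to the client, e.g.:
# df = client.query_final_domain(mtu=mtus[0], presolved=None)  # None -> ALL CNECs
```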
### Key Findings
1. **DENSE Data Acquisition**:
- Set `presolved=None` to get ALL CNECs (binding + non-binding)
- This provides the DENSE format needed for Phase 2 feature engineering
2. **Filtering Limitations**:
- ❌ NO EIC-based filtering on server side
- ✅ Only keyword-based filters (cne, co) available
- **Solution**: Download all CNECs, filter locally by EIC codes
3. **Query Granularity**:
- Method queries **1 hour at a time** (mtu = Market Time Unit)
- For 24 months: Need 17,520 API calls (1 per hour)
- Alternative: Use `use_mirror=True` for whole-day downloads
4. **Mirror Option** (Recommended for bulk collection):
- URL: `https://mirror.flowbased.eu/dacc/final_domain/YYYY-MM-DD`
- Returns full day (24 hours) as CSV in ZIP file
- Much faster than hourly API calls
- Set `use_mirror=True` OR set env var `JAO_USE_MIRROR=1`
5. **Data Structure** (from `parse_final_domain()`):
- Returns pandas DataFrame with columns:
- **Identifiers**: `mtu` (timestamp), `tso`, `cnec_name`, `cnec_eic`, `direction`
- **Contingency**: `contingency_*` fields (nested structure flattened)
- **Presolved field**: Indicates if CNEC is binding (True) or redundant (False)
- **RAM breakdown**: `ram`, `fmax`, `imax`, `frm`, `fuaf`, `amr`, `lta_margin`, etc.
- **PTDFs**: `ptdf_AT`, `ptdf_BE`, ..., `ptdf_SK` (12 Core zones)
- Timestamps converted to Europe/Amsterdam timezone
- snake_case column names (except PTDFs)
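Because there is no server-side EIC filter (finding 2 above), the EIC selection happens locally after download. A minimal sketch on a toy frame (column names follow the parsed schema above; the EIC values are fabricated for illustration):

```python
import pandas as pd

# Toy stand-in for a parsed final-domain frame (EIC values are made up).
df_day = pd.DataFrame({
    "cnec_eic": ["10T-AT-DE-00001", "10T-BE-FR-00002", "10T-NL-DE-00003"],
    "ram": [812.0, 455.0, 990.0],
})
target_cnec_eics = ["10T-AT-DE-00001", "10T-NL-DE-00003"]

# Local EIC filter, replacing the missing server-side option.
df_filtered = df_day[df_day["cnec_eic"].isin(target_cnec_eics)]
assert len(df_filtered) == 2
```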
### Recommended Implementation for Phase 2
**Option A: Mirror-based (FASTEST)**:
```python
from pathlib import Path

import pandas as pd
import polars as pl

# JAOClient is this project's wrapper around jao-py's
# JaoPublicationToolPandasClient.

def collect_final_domain_sample(
    start_date: str,
    end_date: str,
    target_cnec_eics: list[str],  # 200 EIC codes from Phase 1
    output_path: Path,
) -> pl.DataFrame:
    """Collect DENSE CNEC data for specific CNECs using the mirror."""
    client = JAOClient()
    all_data = []
    for date in pd.date_range(start_date, end_date):
        # Query the full day (all CNECs) via the mirror
        df_day = client.query_final_domain(
            mtu=pd.Timestamp(date, tz='Europe/Amsterdam'),
            presolved=None,   # ALL CNECs (DENSE!)
            use_mirror=True,  # Fast bulk download
        )
        # Keep only the target CNECs
        df_filtered = df_day[df_day['cnec_eic'].isin(target_cnec_eics)]
        all_data.append(df_filtered)
    # Combine and save
    df_full = pd.concat(all_data)
    pl_df = pl.from_pandas(df_full)
    pl_df.write_parquet(output_path)
    return pl_df
```
**Option B: Hourly API calls (SLOWER, but more granular)**:
```python
def collect_final_domain_hourly(
    start_date: str,
    end_date: str,
    target_cnec_eics: list[str],
    output_path: Path,
) -> pl.DataFrame:
    """Collect DENSE CNEC data hour by hour."""
    client = JAOClient()
    all_data = []
    # freq='h' (lowercase) replaces the 'H' alias deprecated in pandas 2.2
    for ts in pd.date_range(start_date, end_date, freq='h'):
        try:
            df_hour = client.query_final_domain(
                mtu=pd.Timestamp(ts, tz='Europe/Amsterdam'),
                presolved=None,  # ALL CNECs
            )
            df_filtered = df_hour[df_hour['cnec_eic'].isin(target_cnec_eics)]
            all_data.append(df_filtered)
        except NoMatchingDataError:
            continue  # Hour may have no data
    df_full = pd.concat(all_data)
    pl_df = pl.from_pandas(df_full)
    pl_df.write_parquet(output_path)
    return pl_df
```
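With up to 17,520 hourly calls, transient network failures are likely. A small retry-with-backoff wrapper (not part of jao-py; the attempt count and delays are illustrative assumptions) can harden the hourly loop:

```python
import time

def with_retries(fn, attempts=3, base_delay=1.0):
    """Call fn(); on failure, retry with exponential backoff (illustrative values)."""
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: propagate the last error
            time.sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s, ...

# Usage inside the hourly loop, e.g.:
# df_hour = with_retries(lambda: client.query_final_domain(mtu=ts, presolved=None))
```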
### Data Volume Estimates
**Full Download (all ~20K CNECs)**:
- 20,000 CNECs × 17,520 hours = 350M records
- ~27 columns × 8 bytes/value = ~75 GB uncompressed
- Parquet compression: ~10-20 GB
**Filtered (200 target CNECs)**:
- 200 CNECs × 17,520 hours = 3.5M records
- ~27 columns × 8 bytes/value = ~750 MB uncompressed
- Parquet compression: ~100-150 MB
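The volume estimates above follow from simple arithmetic and can be reproduced directly:

```python
HOURS_24_MONTHS = 730 * 24       # ~2 years of hourly MTUs
COLS, BYTES_PER_VALUE = 27, 8    # per the column estimate above

full = 20_000 * HOURS_24_MONTHS  # records, all CNECs
filtered = 200 * HOURS_24_MONTHS # records, 200 target CNECs

assert HOURS_24_MONTHS == 17_520
assert full == 350_400_000       # ~350M records
assert filtered == 3_504_000     # ~3.5M records

full_gb = full * COLS * BYTES_PER_VALUE / 1e9        # ~75.7 GB uncompressed
filtered_mb = filtered * COLS * BYTES_PER_VALUE / 1e6  # ~757 MB uncompressed
```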
### Implementation Strategy
1. **Phase 1 complete**: Identify top 200 CNECs from SPARSE data
2. **Extract EIC codes**: Save to `data/processed/critical_cnecs_eic_codes.csv`
3. **Test on 1 week**: Validate DENSE collection with mirror
```python
# Test: 2025-09-23 to 2025-09-30 (8 days)
# Expected: 200 CNECs × 192 hours = 38,400 records
```
4. **Collect 24 months**: Using mirror for speed
5. **Validate DENSE structure**:
```python
# df is the polars DataFrame returned by the collection step
unique_cnecs = df['cnec_eic'].n_unique()
unique_hours = df['mtu'].n_unique()
expected = unique_cnecs * unique_hours
actual = len(df)
assert actual == expected, f"Not DENSE! {actual} != {expected}"
```
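The same completeness check can be packaged as a reusable helper. This is a pandas variant (an assumption, for pipelines that stay in pandas before converting to polars), demonstrated on a toy frame:

```python
import pandas as pd

def assert_dense(df: pd.DataFrame) -> None:
    """Fail unless the frame is a full CNEC x hour grid (DENSE)."""
    expected = df["cnec_eic"].nunique() * df["mtu"].nunique()
    actual = len(df)
    assert actual == expected, f"Not DENSE! {actual} != {expected}"

# Toy example: 2 CNECs x 2 hours = 4 rows -> DENSE
toy = pd.DataFrame({
    "cnec_eic": ["A", "A", "B", "B"],
    "mtu": pd.to_datetime(["2025-01-01 00:00", "2025-01-01 01:00"] * 2),
})
assert_dense(toy)
```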
### Advantages of Mirror Method
- ✅ Faster: 1 request/day vs 24 requests/day
- ✅ Rate limit friendly: 730 requests vs 17,520 requests
- ✅ More reliable: Less chance of timeout/connection errors
- ✅ Complete days: Guarantees all 24 hours present
### Next Steps
1. Add `collect_final_domain_dense()` method to `collect_jao.py`
2. Test on 1-week sample with target EIC codes
3. Validate DENSE structure and data quality
4. Run 24-month collection after Phase 1 complete
5. Use DENSE data for Tier 1 & Tier 2 feature engineering
---
**Research completed**: 2025-11-05
**jao-py version**: 0.6.2
**Source**: C:\Users\evgue\projects\fbmc_chronos2\.venv\Lib\site-packages\jao\jao.py