
Understanding Federated Queries in Industrial Data

A deep dive into how federated query execution works and why it's the key to unlocking insights from distributed OT systems.

Suman Gajavelly | January 14, 2026 | 4 min read

One of the core innovations in Conduit is our federated query engine. In this post, we'll explore what federated queries are, how they work, and why they're essential for modern industrial data architectures.

The Traditional Approach: Centralization

Historically, when organizations wanted to analyze data from multiple systems, they followed a predictable pattern:

  1. Extract data from source systems
  2. Transform it into a common format
  3. Load it into a central data warehouse or lake

This ETL (Extract, Transform, Load) approach has been the standard for decades. But in industrial environments, it creates significant challenges:

  • Latency: Batch ETL means your "current" data is often hours or days old
  • Cost: Moving and storing petabytes of time-series data is expensive
  • Governance: Duplicated data creates compliance and security concerns
  • Maintenance: ETL pipelines are fragile and require constant attention

Enter Federated Queries

Federated query execution flips this model on its head. Instead of moving data to where the query runs, we move the query to where the data lives.

How It Works

When you submit a query to Conduit, here's what happens:

1. Parse the query and identify required data sources
2. Generate optimized sub-queries for each source system
3. Execute sub-queries in parallel against source systems
4. Stream results back to the federation layer
5. Merge, correlate, and return unified results
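Step 1 above can be sketched in a few lines. This is only an illustration, not Conduit's parser: it assumes tables are written as `source.table` and uses a regular expression where a real engine would use a full SQL parser. The names `TABLE_REF` and `identify_sources` are hypothetical.

```python
import re

# Hypothetical pattern: a "source.table" reference following FROM or JOIN.
TABLE_REF = re.compile(r"\b(?:FROM|JOIN)\s+(\w+)\.(\w+)", re.IGNORECASE)


def identify_sources(sql: str) -> dict[str, list[str]]:
    """Group the tables referenced in a query by the source system that owns them."""
    sources: dict[str, list[str]] = {}
    for source, table in TABLE_REF.findall(sql):
        sources.setdefault(source, []).append(table)
    return sources


query = """
SELECT p.timestamp, p.temperature, a.alarm_type, a.severity
FROM pi.temperatures p
JOIN ignition.alarms a
  ON p.asset_id = a.asset_id
"""

print(identify_sources(query))
# {'pi': ['temperatures'], 'ignition': ['alarms']}
```

Each key in the result becomes the target of one sub-query in step 2.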

Let's walk through a concrete example.

Example: Cross-System Correlation

Suppose you want to correlate temperature readings from your PI historian with alarm events from Ignition:

SELECT
  p.timestamp,
  p.temperature,
  a.alarm_type,
  a.severity
FROM pi.temperatures p
JOIN ignition.alarms a
  ON p.asset_id = a.asset_id
  AND p.timestamp BETWEEN a.start_time AND a.end_time
WHERE p.timestamp > NOW() - INTERVAL '24 hours'

Conduit breaks this into two parallel operations:

Sub-query 1 (PI):

SELECT timestamp, temperature, asset_id
FROM temperatures
WHERE timestamp > NOW() - INTERVAL '24 hours'

Sub-query 2 (Ignition):

SELECT alarm_type, severity, asset_id, start_time, end_time
FROM alarms
-- Filter on end_time so alarms that began earlier but overlap the window are kept
WHERE end_time > NOW() - INTERVAL '24 hours'

These execute simultaneously. Results stream back to Conduit, where the join operation correlates records by asset and time window.
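The join at the federation layer can be sketched as a hash lookup on asset followed by a time-window check. This is a minimal illustration under assumptions: rows are plain dictionaries, the function name `interval_join` is hypothetical, and the sample rows are invented for the example.

```python
from collections import defaultdict
from datetime import datetime


def interval_join(readings, alarms):
    """Yield one row per reading that falls inside an alarm window on the same asset."""
    # Index alarms by asset so each reading only checks its own asset's alarms.
    alarms_by_asset = defaultdict(list)
    for alarm in alarms:
        alarms_by_asset[alarm["asset_id"]].append(alarm)

    for r in readings:
        for a in alarms_by_asset.get(r["asset_id"], []):
            # Same condition as the SQL: timestamp BETWEEN start_time AND end_time
            if a["start_time"] <= r["timestamp"] <= a["end_time"]:
                yield {
                    "timestamp": r["timestamp"],
                    "temperature": r["temperature"],
                    "alarm_type": a["alarm_type"],
                    "severity": a["severity"],
                }


# Invented sample rows, standing in for the two sub-query result streams.
readings = [
    {"asset_id": "R1", "timestamp": datetime(2026, 1, 14, 10, 0), "temperature": 181.5},
    {"asset_id": "R1", "timestamp": datetime(2026, 1, 14, 12, 0), "temperature": 150.0},
    {"asset_id": "R2", "timestamp": datetime(2026, 1, 14, 10, 0), "temperature": 90.0},
]
alarms = [
    {
        "asset_id": "R1",
        "alarm_type": "HIGH_TEMP",
        "severity": "critical",
        "start_time": datetime(2026, 1, 14, 9, 30),
        "end_time": datetime(2026, 1, 14, 10, 30),
    },
]

rows = list(interval_join(readings, alarms))
print(len(rows), rows[0]["temperature"])  # 1 181.5
```

Only the 10:00 reading on R1 falls inside the alarm window, so a single correlated row comes back.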

Query Optimization

Naive federation would be slow. The key to performance is intelligent query planning:

Predicate Pushdown

Filter conditions are pushed to source systems, reducing data transfer:

Original: SELECT * FROM pi.temps WHERE value > 100
Pushed:   PI executes "value > 100" filter locally
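The effect on data transfer is easy to see in a toy model. The sketch below is not a real PI client: `FakeSource` is a hypothetical stand-in for a remote system that counts how many rows it sends back.

```python
class FakeSource:
    """Stand-in for a remote system; counts rows sent over the 'network'."""

    def __init__(self, rows):
        self.rows = rows
        self.rows_sent = 0

    def scan(self, predicate=None):
        """Return rows, applying the predicate at the source if one is given."""
        for row in self.rows:
            if predicate is None or predicate(row):
                self.rows_sent += 1
                yield row


rows = [{"value": v} for v in range(0, 200, 10)]  # 20 rows: 0, 10, ..., 190

# Without pushdown: fetch everything, then filter at the federation layer.
naive = FakeSource(rows)
result_naive = [r for r in naive.scan() if r["value"] > 100]

# With pushdown: the source applies the filter before sending anything.
pushed = FakeSource(rows)
result_pushed = list(pushed.scan(lambda r: r["value"] > 100))

assert result_naive == result_pushed
print(naive.rows_sent, pushed.rows_sent)  # 20 9
```

Both paths return the same rows, but the pushed-down version transfers 9 rows instead of 20.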

Projection Pruning

Only requested columns are retrieved:

Original: SELECT temperature FROM pi.readings
Pruned:   PI returns only temperature column, not all 50 columns
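One way to decide which columns a source must return is to take the union of the columns that are selected, joined on, and filtered on. The helpers below are hypothetical and only sketch the idea. Applied to the PI side of the cross-system example, they reproduce Sub-query 1.

```python
def prune_columns(selected, join_keys, filter_columns):
    """Columns the source must return: anything selected, joined on, or filtered on."""
    # dict.fromkeys removes duplicates while preserving first-seen order.
    return list(dict.fromkeys([*selected, *join_keys, *filter_columns]))


def build_source_query(table, columns):
    """Render the pruned column list as a SELECT for the source system."""
    return f"SELECT {', '.join(columns)} FROM {table}"


pi_columns = prune_columns(
    selected=["timestamp", "temperature"],
    join_keys=["asset_id", "timestamp"],
    filter_columns=["timestamp"],
)
print(build_source_query("temperatures", pi_columns))
# SELECT timestamp, temperature, asset_id FROM temperatures
```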

Join Reordering

The planner chooses a join order that keeps intermediate results small, typically by joining the smallest or most selective inputs first.
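To illustrate why order matters, the sketch below scores every left-deep join order with a deliberately crude cost model. It makes two loud assumptions: each join keeps the same fixed fraction of row pairs (`selectivity`), and cost is the sum of intermediate result sizes. The row counts are invented, and production optimizers use far richer statistics.

```python
from itertools import permutations


def plan_cost(order, rows, selectivity):
    """Sum of intermediate result sizes for a left-deep join in this order."""
    size = rows[order[0]]
    cost = 0.0
    for table in order[1:]:
        # Assumed model: each join keeps a fixed fraction of the row pairs.
        size = size * rows[table] * selectivity
        cost += size
    return cost


def best_order(rows, selectivity):
    """Exhaustively pick the cheapest order (fine for a handful of tables)."""
    return min(permutations(rows), key=lambda o: plan_cost(o, rows, selectivity))


# Invented row-count estimates for three inputs.
estimated_rows = {"readings": 1_000_000, "alarms": 500, "assets": 50}

order = best_order(estimated_rows, selectivity=0.001)
worst_first = ("readings", "alarms", "assets")
print(order)
print(plan_cost(order, estimated_rows, 0.001) < plan_cost(worst_first, estimated_rows, 0.001))
```

Under this model the largest input, `readings`, is joined last, so the first intermediate result stays small.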

Parallel Execution

Independent sub-queries execute in parallel across source systems.
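A minimal sketch of the idea using Python's standard thread pool is shown below. The `run_subquery` function is hypothetical, and `time.sleep` stands in for network and source query time. The delays are invented.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def run_subquery(source: str, delay_s: float) -> tuple[str, list[dict]]:
    """Pretend to query a source system; the sleep simulates its response time."""
    time.sleep(delay_s)
    return source, [{"source": source}]


# Invented per-source response times in seconds.
subqueries = {"pi": 0.2, "ignition": 0.3}

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=len(subqueries)) as pool:
    futures = [pool.submit(run_subquery, s, d) for s, d in subqueries.items()]
    results = dict(f.result() for f in futures)
elapsed = time.perf_counter() - start

print(sorted(results), f"{elapsed:.2f}s")
```

Total wall time tracks the slowest sub-query (about 0.3 s here) rather than the sum of both (0.5 s).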

Handling Heterogeneous Data

Industrial systems store data differently:

  • Historians use time-series models (timestamp, tag, value)
  • SCADA uses event-driven models (state changes)
  • MES uses relational models (orders, batches, products)

Conduit's semantic layer maps these different models to a unified schema. When you query "temperature for asset X", Conduit knows:

  • In PI, this is tag T-101.PV
  • In Ignition, this is Tags/Building1/Reactor1/Temperature
  • In the SQL database, this is sensors.temperature WHERE asset_id = 'X'
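Conceptually, this mapping is a lookup from a logical name to a per-source locator. The sketch below uses the three bindings listed above. The `Binding` type, the `SEMANTIC_MAP` structure, and the source key `"sql"` are assumptions made for illustration, not Conduit's actual data model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Binding:
    source: str   # which system holds the data
    locator: str  # tag name, tag path, or SQL fragment in that system


# Hypothetical map: (asset, measurement) -> where it lives in each source.
SEMANTIC_MAP = {
    ("X", "temperature"): [
        Binding("pi", "T-101.PV"),
        Binding("ignition", "Tags/Building1/Reactor1/Temperature"),
        Binding("sql", "sensors.temperature WHERE asset_id = 'X'"),
    ],
}


def resolve(asset: str, measurement: str, source: str) -> str:
    """Return the source-specific locator for a logical measurement."""
    for binding in SEMANTIC_MAP.get((asset, measurement), []):
        if binding.source == source:
            return binding.locator
    raise KeyError(f"no binding for {measurement!r} on asset {asset!r} in {source!r}")


print(resolve("X", "temperature", "pi"))  # T-101.PV
```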

Performance Characteristics

Federated queries have different performance characteristics than centralized queries:

| Aspect | Centralized | Federated |
|--------|-------------|-----------|
| Query latency | Lower (local data) | Higher (network hops) |
| Data freshness | Batch delayed | Real-time |
| Storage cost | High (copies) | Low (no copies) |
| Governance | Complex | Simple |

For most operational queries, the slight latency increase is worth the benefits of real-time data and simplified architecture.

When to Use Federated Queries

Federated queries excel for:

  • Operational dashboards requiring real-time data
  • Ad-hoc analysis across multiple systems
  • Compliance queries where data residency matters
  • Integration without ETL pipelines

They're less suitable for:

  • Heavy analytics requiring repeated scans of historical data
  • Machine learning training on large datasets

For these use cases, consider using Conduit to populate a purpose-built analytics store.

Conclusion

Federated query execution is a paradigm shift in industrial data architecture. By moving queries to data instead of data to queries, organizations can get real-time insights without the cost and complexity of centralized data lakes.

Want to see federated queries in action? Request a demo and we'll show you cross-system correlation on your own data.
