# Latency Optimization in High-Frequency Trading

## Why Latency Matters
In algorithmic trading, latency isn't just a performance metric—it's a competitive advantage. The difference between 50ms and 150ms can mean:
- Missing arbitrage opportunities that exist for 100-500ms
- Worse execution prices due to market movement
- Reduced ability to react to flash crashes
- Lower signal-to-noise ratio in microstructure analysis
Our target: Sub-150ms p95 latency end-to-end (market data → signal → order execution)
## Our Latency Budget
Breaking down our 150ms target:
| Component | Target | Actual (p95) |
|---|---|---|
| Market data ingestion | 20ms | <10ms |
| Order book processing | 30ms | 15-20ms |
| Signal generation | 40ms | 25-35ms |
| Risk checks | 20ms | 10-15ms |
| Order routing | 40ms | 30-40ms |
| **Total** | **150ms** | **90-120ms** |
We maintain 30ms+ headroom for spikes and degradation under load.
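The budget can be encoded as a simple invariant check for monitoring or CI (the component keys here are illustrative, not our actual metric names):

```python
# Illustrative per-component p95 targets (ms), mirroring the budget table.
BUDGET_MS = {
    "market_data_ingestion": 20,
    "order_book_processing": 30,
    "signal_generation": 40,
    "risk_checks": 20,
    "order_routing": 40,
}

TOTAL_TARGET_MS = 150

def headroom_ms(measured_p95_ms: dict) -> int:
    """Headroom left after summing measured per-component p95 latencies."""
    return TOTAL_TARGET_MS - sum(measured_p95_ms.values())

# Worst-case actuals from the table sum to 120ms, leaving 30ms of headroom.
worst_case = {
    "market_data_ingestion": 10,
    "order_book_processing": 20,
    "signal_generation": 35,
    "risk_checks": 15,
    "order_routing": 40,
}
```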
## WebSocket Streams: Sub-Millisecond Updates

### Why WebSockets Over Polling

**HTTP Polling (traditional approach):**
- Poll exchange every 100-500ms
- Miss trades between polls
- Server overhead from constant requests
- Typical latency: 200-1000ms
**WebSocket Streams (our approach):**
- Push-based: exchange sends updates instantly
- No missed data
- Persistent connection, lower overhead
- Typical latency: <10ms p95
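A minimal sketch of the push-based consumer, with the transport abstracted so any client library (e.g. the `websockets` package) can supply the message stream; the payload keys `p`/`q`/`T` are hypothetical exchange field names, not a real schema:

```python
import json

def parse_trade(raw: str) -> dict:
    """Normalize one raw trade message (field names are hypothetical)."""
    msg = json.loads(raw)
    return {"price": float(msg["p"]), "qty": float(msg["q"]), "ts": int(msg["T"])}

async def consume_trades(stream, on_trade):
    """Handle each pushed update as it arrives -- no polling interval,
    no trades missed between requests."""
    async for raw in stream:
        on_trade(parse_trade(raw))
```

In production the `stream` would be a reconnecting WebSocket client; the point is that the handler fires per pushed message rather than per poll cycle.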
### Multi-Exchange Aggregation
We maintain WebSocket connections to 10+ exchanges simultaneously, consolidating order books in real-time for arbitrage detection and best execution routing.
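A toy version of the consolidation step: keep top-of-book per venue and derive the best bid/ask across all of them (venue names and the quote shape are illustrative):

```python
def consolidate_top_of_book(books: dict) -> dict:
    """books maps venue -> {"bid": price, "ask": price}. Returns the
    cross-venue best bid and best ask, plus whether the consolidated book
    is crossed (best bid above best ask on different venues -- an
    arbitrage candidate)."""
    bid_venue = max(books, key=lambda v: books[v]["bid"])
    ask_venue = min(books, key=lambda v: books[v]["ask"])
    best_bid = books[bid_venue]["bid"]
    best_ask = books[ask_venue]["ask"]
    return {
        "bid": best_bid, "bid_venue": bid_venue,
        "ask": best_ask, "ask_venue": ask_venue,
        "crossed": best_bid > best_ask and bid_venue != ask_venue,
    }
```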
## Event-Driven Architecture

### Asynchronous Processing

Instead of blocking on each step sequentially, we use async/await to run independent stages concurrently:
```python
import asyncio

async def trading_pipeline():
    # Fetch data and run risk prechecks in parallel
    data_task = asyncio.create_task(fetch_market_data())
    risk_task = asyncio.create_task(precheck_limits())
    data, risk_status = await asyncio.gather(data_task, risk_task)

    if risk_status.ok:
        signals = await generate_signals(data)
        await execute_orders(signals)

# Total: ~80ms with parallelization vs ~100ms sequential
```
## Signal Generation Optimization

### Columnar Storage for Speed

We use Apache Arrow + Polars for fast time-series operations:
- 10-100x faster than Pandas
- SIMD vectorization for calculations
- Zero-copy data sharing
- Lazy evaluation for efficiency
```python
import polars as pl

# Fast VWAP calculation
df = pl.scan_parquet("trades/*.parquet")
vwap = (
    df.filter(pl.col("timestamp") > cutoff)
    .select(
        ((pl.col("price") * pl.col("volume")).sum()
         / pl.col("volume").sum()).alias("vwap")
    )
    .collect()  # Executes the lazy query in parallel
)

# Typical time: <10ms for 1M rows
```
### Cached Computations

We pre-compute expensive calculations and cache them with a 60s TTL:
- Cache hit: <1ms
- Cache miss: 20-30ms
- Hit rate: 85-90% in production
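A minimal in-process sketch of the pattern; the clock is injectable so the TTL behavior is deterministic under test (the text doesn't specify our cache backend, so this stands in for whatever store is actually used):

```python
import time

class TTLCache:
    """Cache with per-entry time-to-live; tracks hit/miss counts."""

    def __init__(self, ttl_s: float = 60.0, clock=time.monotonic):
        self.ttl = ttl_s
        self._clock = clock
        self._store = {}  # key -> (value, expiry_timestamp)
        self.hits = 0
        self.misses = 0

    def get_or_compute(self, key, compute):
        now = self._clock()
        entry = self._store.get(key)
        if entry is not None and entry[1] > now:
            self.hits += 1          # fresh entry: <1ms path
            return entry[0]
        self.misses += 1            # missing or expired: recompute (20-30ms path)
        value = compute()
        self._store[key] = (value, now + self.ttl)
        return value
```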
## Order Execution Speed

### Direct WebSocket Order Placement
Instead of REST API calls, we use WebSocket connections for order placement:
- REST API: 50-100ms latency
- WebSocket: 30-80ms latency
- Reduction: 20-40ms per order
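Sketch of placing an order over the already-open connection rather than issuing a fresh HTTPS request; the frame layout and the `op` field are hypothetical, and the transport is injected as a plain callable so the gateway is transport-agnostic:

```python
import itertools
import json

class WsOrderGateway:
    """Serializes orders onto a persistent connection (transport injected)."""

    def __init__(self, send):
        self._send = send            # e.g. ws.send for a live connection
        self._ids = itertools.count(1)

    def place_limit(self, symbol, side, price, qty):
        """Send one limit order frame; returns the client order id."""
        oid = next(self._ids)
        frame = json.dumps({
            "op": "order.place",     # hypothetical op name
            "id": oid,
            "symbol": symbol,
            "side": side,
            "price": price,
            "qty": qty,
        })
        self._send(frame)
        return oid
```

The latency saving comes from skipping TCP/TLS setup and HTTP framing per order, not from anything in this sketch itself.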
### Smart Order Routing
We route to the fastest available exchange with sufficient liquidity:
```python
def select_exchange(symbol, liquidity_needed):
    candidates = []
    for exchange in active_exchanges:
        if exchange.get_liquidity(symbol) >= liquidity_needed:
            candidates.append({
                'exchange': exchange,
                'latency': exchange.avg_latency_p95,
                'fee': exchange.taker_fee,
            })
    if not candidates:
        return None  # caller falls back (e.g. split the order or wait)
    # Lowest latency wins; fees break ties
    return min(candidates, key=lambda x: (x['latency'], x['fee']))
```
## Production Performance Metrics
Our live system achieves:
- ✅ Order book update latency: <10ms p95
- ✅ Signal generation: <35ms end-to-end
- ✅ Order execution: <100ms to exchange
- ✅ Total round-trip: 90-120ms p95
### Comparison to Industry
| Platform Type | Typical Latency | Our Latency | Advantage |
|---|---|---|---|
| Retail bots | 500-2000ms | 90-120ms | 4-20x faster |
| Pro platforms | 200-500ms | 90-120ms | 2-5x faster |
| HFT firms (co-located) | 1-50ms | N/A | Different league |
We're dramatically faster than retail/pro platforms without requiring expensive co-location.
## Real-World Impact

### Arbitrage Capture
With 90-120ms latency:
- ✅ Capture 60-70% of arbitrage opportunities (exist for 100-500ms)
- ✅ Execute before prices converge
- ✅ Sufficient speed for cross-exchange strategies
### Flash Crash Response
During flash crashes:
- Market drops 5-10% in seconds
- Our system detects and responds in <500ms
- Cancels orders, flattens positions, waits for stability
- Re-enters at discounted prices opportunistically
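The detection step can be as simple as a rolling peak-to-last drawdown check; the 5% threshold and window size below are illustrative, not our production parameters:

```python
def crash_detected(prices, window: int, threshold: float = 0.05) -> bool:
    """True if price fell more than `threshold` from its peak within the
    last `window` observations."""
    recent = prices[-window:]
    peak = max(recent)
    return (peak - recent[-1]) / peak >= threshold
```

On a positive signal, the handler would cancel resting orders and flatten positions before re-evaluating.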
### Microstructure Signal Quality

Lower latency means fresher data:
- Order flow imbalance (OFI) calculations use <50ms old data
- VPIN calculated on near-real-time volume buckets
- Microprice reflects current market conditions
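For concreteness, two of the quantities named above in their standard forms: the size-weighted microprice, and one common formulation of single-update order flow imbalance. Inputs are top-of-book tuples `(bid, bid_qty, ask, ask_qty)`; this is a textbook sketch, not our production feature code:

```python
def microprice(bid, bid_qty, ask, ask_qty):
    """Size-weighted mid: leans toward the ask when the bid queue is
    larger, anticipating where price is likely headed."""
    return (bid * ask_qty + ask * bid_qty) / (bid_qty + ask_qty)

def order_flow_imbalance(prev, curr):
    """One-update OFI in the spirit of Cont/Kukanov/Stoikov; positive
    values indicate net buying pressure."""
    b0, qb0, a0, qa0 = prev
    b1, qb1, a1, qa1 = curr
    e = 0.0
    if b1 >= b0:
        e += qb1
    if b1 <= b0:
        e -= qb0
    if a1 <= a0:
        e -= qa1
    if a1 >= a0:
        e += qa0
    return e
```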
## Continuous Monitoring
We track latency metrics 24/7 with alerting:
- Warning: >100ms p95 (any component)
- Critical: >150ms p95 (triggers investigation)
- Automatic degradation: Reduce trading frequency if latency spikes
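The thresholds above map to a simple check over a sliding window of latency samples; nearest-rank p95 is one common estimator, and our production pipeline may compute it differently:

```python
import math

def p95(samples_ms):
    """Nearest-rank 95th percentile of a latency sample window."""
    ordered = sorted(samples_ms)
    rank = max(1, math.ceil(0.95 * len(ordered)))
    return ordered[rank - 1]

def alert_level(p95_ms, warn_ms=100.0, crit_ms=150.0):
    """Map a component's p95 latency to the alerting tiers above."""
    if p95_ms > crit_ms:
        return "critical"
    if p95_ms > warn_ms:
        return "warning"
    return "ok"
```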
For more technical details, see our system architecture page.