Alex Merced's Data, Dev and AI Blog

Connect Vertica to Dremio Cloud: Federation for Analytics-Optimized Data

2026-03-01 | 10 min read | Dremio, Connectors, Data Integration

Cover Image

Vertica is a columnar analytics database engineered for fast aggregate queries on large datasets. It was built from the ground up for analytical workloads — column-oriented storage, massively parallel processing, and automatic database design optimization. Organizations running Vertica typically have years of investment in analytics infrastructure: curated schemas, optimized projections, and sophisticated workloads that depend on Vertica's high-performance query engine.

But Vertica has limitations that become more painful as data ecosystems grow. Licensing costs scale with data volume. Federation with non-Vertica sources requires complex ETL. And connecting Vertica data to modern cloud tools, AI platforms, and cross-cloud architectures requires exporting data or building custom connectors.

Dremio Cloud connects to Vertica and queries it alongside your other data sources. Dremio's predicate pushdowns leverage Vertica's columnar engine for filtering and aggregation, while Reflections cache results to reduce ongoing Vertica compute load. You keep Vertica for what it does well and extend its reach to every other system in your organization.

Why Vertica Users Need Dremio

Reduce Vertica License Costs

Vertica's licensing model ties cost to data volume and node count. Every analytical query consumes cluster resources. As your data grows and more teams want access, the cost of scaling Vertica becomes significant. Dremio's Reflections provide an alternative: pre-compute the results of your most common queries and serve them from Dremio's cache instead of hitting Vertica on every request. Dashboard queries, scheduled reports, and ad-hoc exploration can all be served from Reflections, reducing the compute pressure on your Vertica cluster.

Federate with Cloud Sources

Vertica excels at analytical queries on its own data, but your organization's data lives in many places: S3 data lakes, PostgreSQL application databases, Snowflake cloud warehouses, MongoDB document stores. Without a federation layer, combining these with Vertica data requires ETL pipelines that extract from each source, transform, and load into Vertica. Dremio queries each source in place and joins the results — no data movement needed.

Modernize Without a Big-Bang Migration

Migrating away from Vertica is a large, risky project. Dremio lets you gradually shift analytical workloads. Start by querying Vertica through Dremio alongside new cloud-native sources (Apache Iceberg tables, S3 data lakes). As confidence grows, migrate specific datasets from Vertica to Iceberg tables in Dremio's Open Catalog, where they benefit from automated maintenance and lower storage costs. The migration happens incrementally, and Vertica continues serving critical workloads throughout.

Unified Governance

Vertica has its own access control, but it doesn't extend to your other data sources. Dremio's Fine-Grained Access Control applies consistent column masking and row-level filtering across Vertica, PostgreSQL, S3, and every other connected source from a single governance layer.

Prerequisites

  • Vertica hostname or IP address — the coordinator node of your Vertica cluster
  • Port — Vertica defaults to 5433
  • Database name
  • Username and password — a Vertica user with SELECT privileges on the tables you want to query
  • Network access — port 5433 must be reachable from Dremio Cloud
  • Dremio Cloud accountsign up free for 30 days with $400 in compute credits

Step-by-Step: Connect Vertica to Dremio Cloud

1. Add the Vertica Source

In the Dremio console, click the "+" button in the left sidebar and select Vertica from the database source types.

2. Configure General Settings

  • Name: A descriptive identifier (e.g., analytics-vertica or web-analytics). This name appears in SQL queries as the source prefix.
  • Host: Your Vertica coordinator host.
  • Port: Default 5433.
  • Database: The Vertica database name.

3. Set Authentication

Provide the username and password for a Vertica user with read access. You can also use Secret Resource URL for password management through AWS Secrets Manager.

4. Configure Advanced Options

Setting Purpose Default
Record fetch size Rows per batch from Vertica 200
Maximum Idle Connections Idle connection pool size 8
Connection Idle Time (s) Seconds before idle connections close 60
Connection Properties Custom JDBC parameters None

5. Set Reflection and Metadata Refresh

Configure how often Reflections refresh (re-query Vertica) and how often Dremio checks for new tables or schema changes. Click Save.

Query Vertica Data from Dremio

SELECT device_type, COUNT(*) AS sessions, AVG(session_duration_seconds) AS avg_duration, 
  SUM(page_views) AS total_page_views
FROM "analytics-vertica".web.sessions
WHERE session_date >= '2024-01-01' AND session_date < '2024-07-01'
GROUP BY device_type
ORDER BY sessions DESC;

Dremio pushes the date filter and aggregation to Vertica's columnar engine, which processes them efficiently against its compressed, column-oriented storage.

Federate Vertica with Other Sources

-- Join Vertica web analytics with PostgreSQL CRM and S3 marketing data
SELECT
  c.customer_segment,
  COUNT(v.session_id) AS total_sessions,
  AVG(v.session_duration_seconds) AS avg_session_duration,
  COUNT(DISTINCT v.user_id) AS unique_visitors,
  SUM(s3.ad_spend) AS marketing_spend,
  ROUND(COUNT(v.session_id) / NULLIF(SUM(s3.ad_spend), 0) * 1000, 2) AS sessions_per_thousand_dollars
FROM "analytics-vertica".web.sessions v
JOIN "postgres-crm".public.customers c ON v.user_id = c.customer_id
LEFT JOIN "s3-marketing".campaigns.spend_by_segment s3 ON c.customer_segment = s3.segment
WHERE v.session_date >= '2024-01-01'
GROUP BY c.customer_segment
ORDER BY total_sessions DESC;

Vertica handles the session aggregation, PostgreSQL handles the customer lookup, and Dremio handles the cross-source join.

Build a Semantic Layer

CREATE VIEW analytics.gold.web_performance AS
SELECT
  v.device_type,
  v.session_date,
  COUNT(*) AS sessions,
  AVG(v.session_duration_seconds) AS avg_duration_seconds,
  SUM(v.page_views) AS total_page_views,
  SUM(CASE WHEN v.converted = true THEN 1 ELSE 0 END) AS conversions,
  ROUND(SUM(CASE WHEN v.converted = true THEN 1 ELSE 0 END) * 100.0 / COUNT(*), 2) AS conversion_rate_pct,
  CASE
    WHEN AVG(v.session_duration_seconds) > 300 THEN 'High Engagement'
    WHEN AVG(v.session_duration_seconds) > 120 THEN 'Moderate Engagement'
    ELSE 'Low Engagement'
  END AS engagement_tier
FROM "analytics-vertica".web.sessions v
GROUP BY v.device_type, v.session_date;

Navigate to the Catalog, click Edit (pencil icon) on the view, and Generate Wiki and Generate Tags. This creates the business context that powers AI features.

AI-Powered Analytics on Vertica Data

Dremio AI Agent

The built-in AI Agent lets users ask questions about your Vertica data in plain English. Instead of writing complex analytical SQL, a marketing manager can ask "What's our conversion rate on mobile this quarter?" The Agent reads the wiki descriptions attached to your views, understands what "conversion rate" and "mobile" mean in your data, and generates the correct SQL.

The quality of the AI Agent's responses depends directly on the quality of your semantic layer. Wikis that explain "conversion_rate_pct is the percentage of web sessions that resulted in a purchase" produce better results than technical column names alone.

Dremio MCP Server

The Dremio MCP Server extends AI capabilities to external chat clients. Connect Claude or ChatGPT to your Dremio data through the hosted MCP Server with OAuth authentication:

  1. Create a Native OAuth application in Dremio Cloud
  2. Configure redirect URLs for your AI client
  3. Connect using mcp.dremio.cloud/mcp/{project_id}

Now your team can ask Claude "Analyze our web engagement trends from Vertica data this quarter" and get accurate, governed results — without writing SQL or accessing Vertica directly.

AI SQL Functions

Use AI SQL functions directly in queries to enrich Vertica data:

-- Classify web sessions by potential value
SELECT
  session_id,
  device_type,
  page_views,
  session_duration_seconds,
  AI_CLASSIFY(
    'Based on this browsing behavior, classify the user intent',
    'Device: ' || device_type || ', Pages: ' || CAST(page_views AS VARCHAR) || ', Duration: ' || CAST(session_duration_seconds AS VARCHAR) || 's',
    ARRAY['Purchase Intent', 'Research', 'Browsing', 'Bounced']
  ) AS predicted_intent
FROM "analytics-vertica".web.sessions
WHERE session_date = CURRENT_DATE;

AI_CLASSIFY runs LLM inference inside your SQL query, classifying each web session from Vertica data into intent categories. AI_GENERATE can produce narrative summaries, and AI_SIMILARITY can find semantic matches between text fields.

Accelerate Vertica Queries with Reflections

Create Reflections on your most frequently queried views:

  1. Navigate to the view in the Catalog
  2. Click the Reflections tab
  3. Choose Raw Reflection (full cache) or Aggregation Reflection (pre-computed metrics)
  4. Select columns and aggregations
  5. Set the Refresh Interval — for Vertica data that updates daily, daily refresh works; for real-time dashboards, match the refresh to your SLA
  6. Click Save

BI tools connected via Arrow Flight or ODBC get sub-second response times from Reflections, even though the underlying data lives in Vertica. A conversion analytics dashboard that queries Vertica 96 times per day with a daily Reflection refresh consumes Vertica resources only once — a 99% reduction in cluster load.

Governance Across Vertica and Other Sources

Dremio's Fine-Grained Access Control (FGAC) provides governance that extends beyond Vertica to every connected source:

  • Column masking: Mask conversion rates, revenue data, or user identifiers from specific roles. A product manager sees engagement metrics but not raw revenue.
  • Row-level filtering: Restrict data visibility based on user roles. Regional teams see only their region's data automatically.
  • Unified policies: Same governance applies across Vertica, PostgreSQL, S3, BigQuery, and all other sources — no per-database policy management.

These policies apply across SQL Runner, BI tools (Arrow Flight/ODBC), AI Agent, MCP Server, and Arrow Flight clients.

Connect BI Tools via Arrow Flight

Arrow Flight provides 10-100x faster data transfer than JDBC/ODBC:

  • Tableau: Dremio connector for direct Arrow Flight access
  • Power BI: Dremio ODBC driver or native connector
  • Python/Pandas: pyarrow.flight for programmatic data access to Vertica analytics
  • Looker: Connect via JDBC
  • dbt: dbt-dremio for SQL-based transformations

All queries benefit from Reflections, governance, and the semantic layer.

VS Code Copilot Integration

Dremio's VS Code extension with Copilot integration lets developers query Vertica data from their IDE. Ask Copilot "Show me conversion rates by device type from web analytics" and get SQL generated from your semantic layer — without switching to the Dremio console.

When to Keep Data in Vertica vs. Migrate to Iceberg

Keep in Vertica: Active analytical workloads optimized with Vertica projections, data with complex Vertica-specific features (database designer optimizations, flex tables), workloads that depend on Vertica's sub-second response times for real-time dashboards.

Migrate to Iceberg: Historical data and archives, datasets consumed by non-Vertica tools, data where Vertica licensing cost per TB exceeds the analytical value, datasets that benefit from time travel and automated compaction.

For data that stays in Vertica, create manual Reflections to reduce query load. For migrated data, Dremio's Open Catalog provides automated compaction, time travel, and Autonomous Reflections at a fraction of the per-TB cost.

Vertica Deployment Modes and Dremio

Vertica has two deployment modes, both compatible with Dremio:

Enterprise Mode (On-Premises)

Traditional deployment with local storage. Dremio connects via JDBC and pushes SQL operations to Vertica's engine when possible. Reflections are particularly valuable here — they offload analytical queries and reduce the on-premises compute needed.

EON Mode (Cloud-Optimized)

Vertica's compute-storage separation architecture on AWS, Azure, or GCP. Dremio connects the same way, but EON mode's elastic compute makes Reflections' cost-saving impact even more significant — when Dremio serves cached results, EON subclusters can scale down.

Vertica-Specific SQL Considerations

Dremio handles most Vertica SQL natively. For Vertica-specific syntax:

  • Projections: Vertica projections are transparent to Dremio — Vertica automatically uses optimal projections for queries pushed down
  • Flex tables: Dremio reads flex table columns as VARCHAR — cast to appropriate types in your Dremio views
  • COPY LOCAL: Not available through Dremio — use Dremio's own CREATE TABLE AS SELECT for data loading
  • Vertica ML functions: Use external queries for Vertica-specific ML functions: SELECT * FROM TABLE("vertica-analytics".EXTERNAL_QUERY('SELECT PREDICT_LINEAR...'))

Migration ROI Example

A mid-sized organization with 50TB in Vertica Enterprise:

  • Current cost: ~$500K/year in Vertica licensing (per-TB pricing)
  • Migrate 30TB of historical data to Iceberg: Eliminates 60% of licensed data volume
  • Remaining 20TB in Vertica: Active analytical workloads, protected by Reflections
  • Net result: Potential 40-60% reduction in Vertica licensing costs, with improved analytics capabilities (AI, federation, governance) on all data

Get Started

Vertica users can reduce licensing pressure, federate with cloud sources, modernize incrementally, and add AI analytics — all through Dremio Cloud. Connect your Vertica cluster to Dremio, create Reflections on your most-queried tables, and start tracking the reduction in Vertica query load as Dremio serves cached results.

Try Dremio Cloud free for 30 days and connect your Vertica cluster alongside your other data sources.

Check out these related posts:

📬 Join the Mailing List

Get updates directly to your inbox.

Subscribe Now

Menu

Search