Connect to data sources behind a firewall in Microsoft Fabric

Intro

Many large organisations have a strategy to keep their data secure. Part of this strategy is to have firewall rules enabled on all data sources. This complicates connecting to an Azure data source from Microsoft Fabric or Power BI.

This article describes the options available to connect to a data source in Azure that has firewall rules enabled.

Microsoft Provided Gateways

By default, connections in Microsoft Fabric are made using the Microsoft provided gateways. For Fabric workspaces to reach a data source through these gateways, you have to allow traffic from an extensive list of IP addresses of gateways hosted by Microsoft.

However, this gives access to every tenant in every region from within Microsoft Fabric and Power BI. With the introduction of Python notebooks, this has turned into a risk. Anyone who has a Microsoft tenant, including free development and trial tenants, can now easily write Python scripts for brute-force attacks on your data sources.
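To make the exposure concrete, the following minimal Python sketch shows what an IP allow list amounts to: a firewall that admits a set of address ranges admits every caller whose traffic originates from those ranges, whichever tenant that caller belongs to. The ranges are documentation placeholders (RFC 5737), not real Microsoft gateway addresses, and `is_allowed` is a hypothetical helper introduced only for illustration.

```python
import ipaddress

# Placeholder ranges (RFC 5737 documentation blocks) standing in for the
# published gateway ranges. They are NOT real Microsoft addresses.
ALLOWED_RANGES = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]


def is_allowed(source_ip: str) -> bool:
    """Return True when source_ip falls inside any allowed range."""
    address = ipaddress.ip_address(source_ip)
    return any(address in network for network in ALLOWED_RANGES)


# The firewall only sees the source address, never the tenant behind it.
print(is_allowed("192.0.2.17"))   # address inside an allowed range
print(is_allowed("203.0.113.5"))  # address outside every allowed range
```

The check has no notion of tenant or identity, which is why an allow list of shared gateway addresses cannot distinguish your own workspaces from anyone else's.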

Connection categories

Within Fabric, there are basically two categories of connections. The first are connections that are centrally managed and visible under “Manage Connections and Gateways”. The second are connections that are created individually, each time a connection is needed.

Centrally managed connections

These are the ‘traditional’ Power BI connections that show up under “Manage Connections and Gateways” and can be managed from there. These connections can be reused in multiple Fabric and Power BI items. The main advantage of these connections is that the credentials are stored centrally and do not need to be provided every time a data source is connected to a Fabric or Power BI item.

Fabric connections

The second category consists of connections that are made from within Fabric items and do not use the centrally managed connections. For this category of connections, the credentials and connection details have to be provided each time a connection is set up or created. These are typically connections made from a Spark notebook, lakehouse or warehouse.

Centrally managed connections

| Connection type | Key features | Limitations |
| --- | --- | --- |
| Cloud connection, Shared cloud connection | Traditional connections; data sources in Azure; data sources accessible via internet; Microsoft provided gateways | Only connects to public endpoint; firewall settings on data source can prevent connections |
| On-prem data gateway | Self-hosted (on-premises, personal or VM); on-prem data sources; data sources accessible via internet; support for custom drivers and ODBC | Server must be AD domain joined; data source on Azure must connect to public endpoint, or to private endpoint via routing |
| VNet data gateway | Provides a secure connection; can connect to private endpoints | Premium feature |

1 Cloud connections

Cloud connections are the ‘traditional’ Power BI connections made from a semantic model when no on-prem or VNet data gateway is specified. As mentioned before, these connections are made by using the Microsoft provided gateways and should not be used to connect to firewall-enabled data sources.

They can be used to connect to SaaS solutions like SharePoint and Dataverse as SaaS solutions typically have no firewall enabled.

There are two kinds of cloud connections:

  • Cloud connection: Only the owner of the connection can use and edit the connection.
  • Shared Cloud connection: Connection can be shared with other users, groups or service principals.

When working in a team the Shared Cloud connection should be used. The connection remains intact when other team members edit or republish the semantic model.

2 On-prem data gateway

Connections can be made through an on-prem data gateway. Although the on-prem data gateway was originally introduced for connecting to on-prem data sources like a SQL Server or Oracle server, it can also be used to connect to data sources in the cloud.

3 VNet data gateway

A VNet data gateway is a gateway hosted as a cloud resource. The advantage is that it gives the highest degree of isolation for the data source. Because the VNet data gateway runs in the cloud, it does not have the latency involved when using an on-prem data gateway. A VNet data gateway can be used to connect to the service endpoint or a private endpoint of the data source.

Setting up a VNet data gateway requires some configuration, as described in the article “How to create a Virtual Network (VNet) data gateway”.

For data sources that have the public endpoint disabled, the VNet data gateway is the only option for the centrally managed connections. Using the VNet data gateway requires both the VNet data gateway and the workspace to be on a Premium or Fabric capacity.

Fabric Connections

| Connection type | Key features | Limitations |
| --- | --- | --- |
| Service Tag | Firewall setting on VNet of data source; configured in a Network Security Group or Azure Firewall; can be used by all Fabric experiences | Gives access to the Microsoft Fabric workloads from all tenants in all regions; only to be used as a last option and in combination with other security measures |
| Workspace identity & Trusted workspace access | Firewall setting on data source; allows traffic from the defined workspace(s); works in combination with shared cloud connections | Fabric capacity (F64 or higher); can only connect to Storage account; connects to public endpoint; can only be used by Shortcuts and Fabric Data Factory – Copy data |
| Managed private endpoint | Provides a secure connection; reroutes cloud connections to private endpoints | Fabric capacity (F64 or higher); can only be used by Spark workloads; in batch operations, hardcoded credentials are needed |

1 Service Tag

The individual Fabric connections are also made by using the Microsoft provided gateways. One way of allowing these connections to a data source is configuring the Service Tag “PowerBI”. The Service Tag has to be defined on a Network Security Group (NSG). A Service Tag is actually a shortcut for the long list of IP addresses mentioned above, so it also gives all tenants in all regions access to the data source. For this reason, using Service Tags is not a secure way to give access to data sources.

Unfortunately, when dealing with connections of the category “Individual Fabric Connections”, the available options to connect to a firewall-enabled data source are limited. First try to import the data by using one of the other available options. If, as a last resort, a Service Tag is used to give access to a data source, it has to be combined with additional measures to prevent unauthorized access to the data source.
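As an illustration of what such a rule looks like, the sketch below builds the body of an inbound NSG security rule as a plain Python dictionary. The property names follow the Azure Resource Manager schema for NSG security rules; the rule name, destination address, port and priority are hypothetical values chosen for the example, and `build_service_tag_rule` is not part of any SDK.

```python
import json


def build_service_tag_rule(destination_ip: str, destination_port: int, priority: int) -> dict:
    """Build an inbound NSG rule that admits traffic carrying the PowerBI service tag."""
    return {
        "name": "allow-powerbi-service-tag",  # hypothetical rule name
        "properties": {
            "direction": "Inbound",
            "access": "Allow",
            "protocol": "Tcp",
            "priority": priority,
            "sourceAddressPrefix": "PowerBI",  # the Service Tag named in the text
            "sourcePortRange": "*",
            # Keep the destination as narrow as possible: one address, one port.
            "destinationAddressPrefix": destination_ip,
            "destinationPortRange": str(destination_port),
        },
    }


# Hypothetical private address and SQL port, for illustration only.
rule = build_service_tag_rule("10.0.1.4", 1433, priority=300)
print(json.dumps(rule, indent=2))
```

Narrowing the destination to a single address and port limits what the tag can reach, but it does not change who can reach it. The additional security measures mentioned above remain necessary.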

2 Workspace identity & Trusted workspace access

Workspace identity & Trusted workspace access is not a connection but the ability to route a shared cloud connection through a firewall. For now this is only supported by:

  • ADLS Gen2 storage account

Not all Fabric items can use or create a connection of this type. For now it is limited to:

  • Lakehouse Shortcuts
  • Fabric Data Factory – Copy data
  • T-SQL COPY INTO statement

To set up a connection of this type, both Workspace identity on the workspace and Trusted workspace access on the storage account have to be enabled. After that, a shared cloud connection will be able to pass the firewall.

To create and use a Workspace identity, the workspace has to be on a Fabric capacity of F64 or higher. Trusted workspace access connects to the public endpoint of the storage account. When the workspace is removed from the Fabric capacity or moved to a capacity with a SKU lower than F64, connections over Workspace identity & Trusted workspace access will stop working.

Connections running over Workspace identity & Trusted workspace access are the odd one out. Although this is a typical connection for Fabric workloads, it also makes use of a centrally managed connection. The connection can only partly be managed under “Manage Connections and Gateways”. Some settings can be changed, but the connection will show as offline when performing the check. Only when the connection is initiated from the workspace itself will it run over Trusted workspace access and bypass the firewall.
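On the storage account side, Trusted workspace access is expressed as a resource instance rule in the account's network rules. The sketch below builds that fragment as a Python dictionary. The `resourceAccessRules` property is part of the Azure storage account network rule set, but the workspace resource ID format (an all-zero subscription and a resource group named `Fabric`) is an assumption based on the format Microsoft documents for Fabric workspaces and should be verified against the current documentation. The tenant and workspace GUIDs are placeholders.

```python
def build_workspace_access_rule(tenant_id: str, workspace_id: str) -> dict:
    """Build one resource instance rule that admits a single Fabric workspace."""
    # Assumed resource ID format for a Fabric workspace; verify before use.
    resource_id = (
        "/subscriptions/00000000-0000-0000-0000-000000000000"
        "/resourcegroups/Fabric/providers/Microsoft.Fabric"
        f"/workspaces/{workspace_id}"
    )
    return {"tenantId": tenant_id, "resourceId": resource_id}


# Placeholder GUIDs, for illustration only.
TENANT_ID = "11111111-1111-1111-1111-111111111111"
WORKSPACE_ID = "22222222-2222-2222-2222-222222222222"

network_acls = {
    "defaultAction": "Deny",  # the firewall stays closed for everything else
    "resourceAccessRules": [build_workspace_access_rule(TENANT_ID, WORKSPACE_ID)],
}
print(network_acls)
```

Unlike a Service Tag, a rule of this shape names one workspace in one tenant, which is what makes this option considerably narrower.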

3 Managed Private Endpoint

Managed Private Endpoints create a Private Endpoint to the data source and can be created for any data source that supports managed Private Endpoints. In Fabric, however, only Spark workloads can use a connection over a managed private endpoint. This limits the use to Spark notebooks and Spark job definitions.

When Spark notebooks run in batch mode, they run under the credentials of the notebook owner, and connections for which no explicit credentials are supplied use those owner credentials as well. When your organization has enabled MFA for user accounts, the usability of connections to data sources in batch mode is limited. A connection might continue to work for some time, but ultimately MFA will be required and the connection will stop working. Only when notebooks are used interactively can a reliable connection to an Azure Key Vault be made without explicitly storing credentials.

To create and use a managed private endpoint, the workspace has to be on a Fabric capacity of F64 or higher. When the workspace is removed from the Fabric capacity or moved to a capacity with a SKU lower than F64, connections over a managed private endpoint will stop working.
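Because both Workspace identity and managed private endpoints depend on the capacity SKU, it can help to check the SKU before planning a capacity move. The following is a minimal sketch of such a check, using the F64 threshold as stated in this article. `meets_capacity_requirement` is a hypothetical helper, and treating every SKU name that is not of the form F<number> as unsupported is a simplification.

```python
import re

MINIMUM_F_SKU = 64  # threshold as stated in this article


def meets_capacity_requirement(sku: str) -> bool:
    """Return True when the SKU name is an F SKU of at least F64."""
    match = re.fullmatch(r"F(\d+)", sku.strip().upper())
    if match is None:
        # Unknown naming scheme: treated as unsupported in this sketch.
        return False
    return int(match.group(1)) >= MINIMUM_F_SKU


for sku in ("F32", "F64", "F128"):
    print(sku, meets_capacity_requirement(sku))
```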

Choosing the right connection

Options to connect to a firewall-enabled Azure data source from Fabric workloads

| Fabric workload | Available connections | Managed connection | Considerations |
| --- | --- | --- | --- |
| Semantic model, Dataflow (Gen2) | On-prem data gateway, VNet data gateway | Yes | |
| Lakehouse Shortcut to ADLS Gen2 | Workspace identity & Trusted workspace access | Yes *) | Requires F64 capacity. The shortcut can be used by other Fabric workloads to read data from ADLS Gen2 |
| Fabric Data Factory – Copy data | On-prem data gateway, VNet data gateway | Yes | Copy Data activity is better at handling large data volumes than a Dataflow Gen2 |
| Fabric Data Factory – Copy data | Workspace identity & Trusted workspace access | Yes *) | Only connects to ADLS Gen2 |
| Spark jobs | Managed private endpoint | No | Requires F64 capacity. For batch processing hardcoded credentials are needed |

*) Shared cloud connection that can only be used by the workload it was created in.
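The table above can also be read as a simple lookup from workload to connection options. The sketch below encodes it that way; the workload keys and the `options_for` helper are naming choices made for this example, not Fabric identifiers.

```python
# Connection options per Fabric workload, transcribed from the table above.
CONNECTION_OPTIONS = {
    "semantic model": ["On-prem data gateway", "VNet data gateway"],
    "dataflow gen2": ["On-prem data gateway", "VNet data gateway"],
    "lakehouse shortcut": ["Workspace identity & Trusted workspace access"],
    "copy data": [
        "On-prem data gateway",
        "VNet data gateway",
        "Workspace identity & Trusted workspace access",
    ],
    "spark jobs": ["Managed private endpoint"],
}


def options_for(workload: str) -> list[str]:
    """Return the connection options for a workload, or an empty list if unknown."""
    return list(CONNECTION_OPTIONS.get(workload.strip().lower(), []))


for name in ("Semantic model", "Spark jobs", "Warehouse"):
    print(f"{name}: {options_for(name) or 'not covered by the table'}")
```

An empty result means only that the table does not cover the workload, not that no connection is possible.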
