Streamlining Data Integration: Extracting Data from SAP with AWS AppFlow via VPC Peering

Tech Community • 8 min read

The world of business is evolving at a rapid pace, and companies are constantly on the lookout for efficient solutions to streamline their operations and enhance their decision-making processes. 

One critical aspect of this digital transformation is the seamless integration of various business applications and data sources. SAP, being one of the most widely used enterprise resource planning (ERP) systems, holds a treasure trove of valuable business data. 

Extracting this data and integrating it with other applications can unlock valuable insights and drive innovation. AWS AppFlow emerges as a powerful tool in this context, providing a seamless and secure way to transfer data between SAP and various AWS services. 

When combined with VPC Peering, AWS AppFlow ensures an even more secure and efficient data transfer. This blog post delves into how businesses can leverage AWS AppFlow and VPC Peering for extracting data from SAP, ensuring a secure, reliable, and efficient integration.

Understanding the Challenge

I have recently been working on an interesting project for one of our clients: building a data lake on AWS, supercharged by Nordcloud Cloud Foundations (NCF), that involves multiple technology stacks, with Snowflake, SAP and Salesforce to name a few.

During this very dynamic project (requirements could change daily) we had to solve a number of challenges in retrieving data from SAP systems hosted in a managed environment (SAP HEC).

SAP systems are complex, and extracting data from them can be a daunting task, especially when aiming for a seamless and real-time integration. Businesses often grapple with issues related to data format inconsistencies, connectivity challenges, and security concerns. Additionally, the traditional ETL (Extract, Transform, Load) processes can be resource-intensive, requiring significant time and effort from IT teams.

AWS AppFlow: A Game-Changing Solution

AWS AppFlow addresses these challenges head-on, offering a user-friendly and secure platform for integrating SAP with AWS services such as Amazon S3 and AWS Lambda, and with SaaS applications such as Salesforce, ServiceNow, and Slack.

Key Features of AWS AppFlow:

  1. Simplicity: With its intuitive interface, AppFlow simplifies the integration process, enabling users to create and configure flows without the need for extensive coding or specialised skills.
  2. Security: AppFlow ensures that your data is transferred securely, providing options for encryption and fine-grained access control. It also allows for VPC (Virtual Private Cloud) connectivity, ensuring that your data does not traverse the public internet.
  3. Scalability: AppFlow is built to handle large volumes of data, providing a scalable solution that grows with your business needs.
  4. Real-Time Integration: AppFlow supports event-driven architecture, allowing for real-time data integration, ensuring that your systems are always up to date.
  5. Data Transformation: AppFlow provides built-in data transformation capabilities, enabling you to cleanse, enrich, and format your data on the fly as it is transferred between systems.

Why Choose AWS AppFlow with VPC Peering for SAP Integration?

VPC Peering allows for the direct, private connection between two Virtual Private Clouds (VPCs), bypassing the public internet and enhancing both security and performance. When integrating SAP data with AWS services, utilising VPC Peering with AWS AppFlow ensures:

  1. Enhanced Security: By keeping data within the Amazon network, the risk of exposure to potential internet-based threats is significantly reduced.
  2. Consistent Network Performance: The private connection ensures stable and predictable network performance, crucial for real-time or near-real-time data integration.
  3. Data Sovereignty: For businesses with strict data residency requirements, VPC Peering ensures that data does not traverse outside the geographical boundaries defined by the organisation.
  4. Simplified Network Architecture: Integrating directly within the AWS ecosystem reduces the complexity of network configurations and management.

Getting started with AWS AppFlow.

Step 1: Set Up Your AWS Environment

Before you start, ensure that you have an AWS account and that you have configured the necessary permissions for AppFlow.
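As an illustration of what "necessary permissions" can mean when the flow writes to Amazon S3, here is a minimal Terraform sketch of a bucket policy that lets the AppFlow service principal write to a destination bucket. The bucket name and resource labels are assumptions made for this example and are not part of the original project; the list of actions follows the S3 destination requirements as I understand them from the AppFlow documentation, so verify it against the current docs before use.

```hcl
# Hypothetical destination bucket; the name is an assumption for illustration.
resource "aws_s3_bucket" "appflow_landing" {
  bucket = "example-sap-appflow-landing"
}

# Allow the AppFlow service principal to write objects into the bucket.
data "aws_iam_policy_document" "appflow_landing" {
  statement {
    sid    = "AllowAppFlowDestinationActions"
    effect = "Allow"

    principals {
      type        = "Service"
      identifiers = ["appflow.amazonaws.com"]
    }

    actions = [
      "s3:PutObject",
      "s3:AbortMultipartUpload",
      "s3:ListMultipartUploadParts",
      "s3:ListBucketMultipartUploads",
      "s3:GetBucketAcl",
      "s3:PutObjectAcl",
    ]

    resources = [
      aws_s3_bucket.appflow_landing.arn,
      "${aws_s3_bucket.appflow_landing.arn}/*",
    ]
  }
}

resource "aws_s3_bucket_policy" "appflow_landing" {
  bucket = aws_s3_bucket.appflow_landing.id
  policy = data.aws_iam_policy_document.appflow_landing.json
}
```

The IAM user or role that creates the flow needs its own AppFlow permissions as well; those are not shown here.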

Step 2: Configure SAP Connectivity

  1. In the AWS AppFlow console, navigate to the 'Connectors' section and select SAP OData.
  2. Provide the necessary connection details such as the SAP OData URL, authentication credentials, and any additional settings required for your SAP environment.

Step 3: Create a Flow

  1. Navigate to the 'Flows' section and click on 'Create flow'.
  2. Select SAP as the source and choose the AWS service or SaaS application as the destination.
  3. Configure the flow settings, including the trigger type (on demand, scheduled, or event-driven where the connector supports it), and map the source and destination fields.

Step 4: Transform and Map Your Data

  1. Utilise AppFlow’s transformation capabilities to format, cleanse, and enrich your data as needed.
  2. Map the SAP fields to the corresponding fields in the destination system.
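Steps 3 and 4 can also be expressed in Terraform rather than in the console. The following is only a sketch under stated assumptions: the flow name, connection name, OData object path, field names, and bucket are all hypothetical placeholders, and the two-task layout (a projection followed by a one-to-one mapping) mirrors what the console typically generates rather than anything taken from the original project.

```hcl
resource "aws_appflow_flow" "sap_to_s3" {
  name = "sap-material-to-s3" # hypothetical flow name

  source_flow_config {
    connector_type         = "SAPOData"
    connector_profile_name = "sap-hec-odata" # hypothetical; an existing SAP OData connection

    source_connector_properties {
      sapo_data {
        # Hypothetical OData entity set; replace with a service exposed by your SAP system.
        object_path = "/sap/opu/odata/sap/ZEXAMPLE_SRV/MaterialSet"
      }
    }
  }

  destination_flow_config {
    connector_type = "S3"

    destination_connector_properties {
      s3 {
        bucket_name   = "example-sap-appflow-landing" # hypothetical bucket
        bucket_prefix = "sap/material"
      }
    }
  }

  # Select which source fields are read at all (projection).
  task {
    task_type     = "Filter"
    source_fields = ["Material", "Plant"] # hypothetical field names

    connector_operator {
      sapo_data = "PROJECTION"
    }
  }

  # Map one source field to a destination field without transformation.
  task {
    task_type         = "Map"
    source_fields     = ["Material"]
    destination_field = "Material"

    connector_operator {
      sapo_data = "NO_OP"
    }
  }

  task {
    task_type         = "Map"
    source_fields     = ["Plant"]
    destination_field = "Plant"

    connector_operator {
      sapo_data = "NO_OP"
    }
  }

  # Run only when started explicitly; a schedule can be configured instead.
  trigger_config {
    trigger_type = "OnDemand"
  }
}
```

Keeping the flow in code makes the field mapping reviewable, which helps on a project where requirements change frequently.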

Step 5: Test and Deploy

  1. Once you have configured the flow, use the ‘Test flow’ feature to ensure that everything is set up correctly.
  2. After testing, deploy the flow, and monitor its performance using Amazon CloudWatch.
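For the monitoring part, a CloudWatch alarm on failed flow runs is one option. This is a hedged sketch: the alarm name and flow name are hypothetical, and the namespace, metric name, and dimension (AWS/AppFlow, FlowExecutionsFailed, FlowName) are what I understand AppFlow to publish, so check them against the metrics visible in your own account before relying on the alarm.

```hcl
resource "aws_cloudwatch_metric_alarm" "sap_flow_failed" {
  alarm_name        = "appflow-sap-flow-failed" # hypothetical alarm name
  alarm_description = "At least one AppFlow run for the SAP flow has failed"

  namespace   = "AWS/AppFlow"
  metric_name = "FlowExecutionsFailed"

  dimensions = {
    FlowName = "sap-material-to-s3" # hypothetical flow name
  }

  statistic           = "Sum"
  period              = 300
  evaluation_periods  = 1
  threshold           = 1
  comparison_operator = "GreaterThanOrEqualToThreshold"

  # No data points simply means the flow did not run in that period.
  treat_missing_data = "notBreaching"

  # Add an SNS topic ARN here to be notified; left empty in this sketch.
  alarm_actions = []
}
```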

AWS has multiple blog posts on SAP connectivity (for example, "How to connect SAP solutions running on AWS with AWS accounts and services"). We wanted to use a Transit Gateway, which is one of the core services for NCF, but sadly we were not able to.

Hence we opted for VPC peering, where each workload account is connected to the SAP HEC VPC via its own peering connection. Nordcloud Cloud Foundations supports VPC peering natively via one of its own modules; in this particular case, however, the peering was set up manually.

Network connectivity diagram.

Once the connectivity decision had been made and the peering established (don’t forget to update the route tables), we were ready to proceed.
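The route table update is easy to forget when the peering is created by hand. A minimal Terraform sketch of that step follows, assuming the peering connection already exists; all three variables are hypothetical inputs introduced for this example, not names from the project.

```hcl
# Hypothetical inputs: adjust names and values to your environment.
variable "sap_hec_cidr" {
  description = "CIDR block of the SAP HEC VPC"
  type        = string
}

variable "peering_connection_id" {
  description = "ID of the already established VPC peering connection (pcx-...)"
  type        = string
}

variable "workload_route_table_ids" {
  description = "Route tables in the workload VPC that need a route to SAP HEC"
  type        = set(string)
}

# One route per route table, sending SAP-bound traffic over the peering connection.
resource "aws_route" "to_sap_hec" {
  for_each = var.workload_route_table_ids

  route_table_id            = each.value
  destination_cidr_block    = var.sap_hec_cidr
  vpc_peering_connection_id = var.peering_connection_id
}
```

The SAP HEC side needs the corresponding return routes, which in a managed environment usually have to be requested from the provider.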

Note the IICS Agent in the diagram: originally Informatica was meant to play a crucial role in transferring and transforming data between various sources, but the client wanted to explore other possibilities and opted for AWS AppFlow.

Make sure that all requirements are met before you begin (especially on the SAP side).

Challenges with SAP connectivity.

When we first approached the connectivity to SAP we ran into a number of challenges:

  1. We were unable to request additional services in the SAP managed account.
  2. EC2 instances running SAP applications had only internal DNS names. These were resolvable from the internal DNS servers (and from AWS via Route 53), but not publicly.
  3. If AppFlow connected through SAP Web Dispatcher, the Web Dispatcher would need a public SSL certificate; setting it up and renewing it would add operational overhead.

Final Architecture and requirements.

For everything to work you will need the following (snippets of Terraform code are included below):

  1. A public DNS zone (you can use Route 53 for that), for example appflow-sap.com:
resource "aws_route53_zone" "sap_appflow" {
  name = "${local.environment}.appflow-sap.com"
}
  2. A public hostname (e.g. extractor.appflow-sap.com) that points to the private IP address of the SAP HEC system:
resource "aws_route53_record" "sap" {
  zone_id = aws_route53_zone.sap_appflow.zone_id
  name    = "extractor"
  type    = "A"
  ttl     = 60
  records = [""] # Private IP address of the SAP HEC system; adjust accordingly
}
  3. A public certificate issued by AWS Certificate Manager (ACM):
resource "aws_acm_certificate" "sap_appflow" {
  domain_name       = aws_route53_record.sap.fqdn
  validation_method = "DNS"

  lifecycle {
    create_before_destroy = true
  }
}
# Waits until ACM has validated the certificate via the DNS records below
resource "aws_acm_certificate_validation" "sap_appflow" {
  certificate_arn         = aws_acm_certificate.sap_appflow.arn
  validation_record_fqdns = [for record in aws_route53_record.sap_acm_validation : record.fqdn]
}
# DNS validation records requested by ACM, created in the public zone
resource "aws_route53_record" "sap_acm_validation" {
  depends_on = [aws_acm_certificate.sap_appflow]

  for_each = {
    for dvo in aws_acm_certificate.sap_appflow.domain_validation_options : dvo.domain_name => {
      name   = dvo.resource_record_name
      record = dvo.resource_record_value
      type   = dvo.resource_record_type
    }
  }

  allow_overwrite = true
  name            = each.value.name
  records         = [each.value.record]
  ttl             = 60
  type            = each.value.type
  zone_id         = aws_route53_zone.sap_appflow.zone_id
  lifecycle {
    create_before_destroy = true
  }
}
  4. The VPC Endpoint Service requires a Network Load Balancer, which in turn requires a number of other resources (such as a security group):
resource "aws_security_group" "nlb_sap_vpce" {
  name   = "sg-nlb-sap-vpce"
  vpc_id = "" # Adjust accordingly
  lifecycle {
    create_before_destroy = true
  }
}
resource "aws_security_group_rule" "nlb_sap_allow_all_outbound" {
  type              = "egress"
  description       = "Allow all outbound"
  to_port           = 0
  protocol          = "-1"
  from_port         = 0
  cidr_blocks       = ["0.0.0.0/0"] # Adjust accordingly
  security_group_id = aws_security_group.nlb_sap_vpce.id
}
resource "aws_security_group_rule" "nlb_sap_allow_tcp443" {
  type              = "ingress"
  description       = "Allow https inbound"
  to_port           = 443
  protocol          = "tcp"
  from_port         = 443
  cidr_blocks       = ["0.0.0.0/0"] # Adjust accordingly
  security_group_id = aws_security_group.nlb_sap_vpce.id
}
  5. A target group (note the protocol!):
resource "aws_lb_target_group" "sap" {
  name        = "nlb-target-group-sap"
  port        = 443
  protocol    = "TLS"
  target_type = "ip"
  vpc_id      = "" # Adjust accordingly
}
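  6. Target group attachments:
resource "aws_lb_target_group_attachment" "sap" {
  # for_each needs a set or a map, so the list of target IPs is converted with toset()
  for_each = toset([""]) # IP addresses of the SAP targets; adjust accordingly

  target_group_arn  = aws_lb_target_group.sap.arn
  target_id         = each.key
  port              = 443 # Adjust accordingly
  availability_zone = "all" # Required because the target IPs are outside the load balancer's VPC
}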
  7. The Network Load Balancer itself:
resource "aws_lb" "sap_vpce" {
  name                       = "nlb-sap-vpce"
  internal                   = true
  load_balancer_type         = "network"
  subnets                    = [""] # Adjust accordingly
  security_groups            = [aws_security_group.nlb_sap_vpce.id]
  enable_deletion_protection = true
}
  8. A listener:
resource "aws_lb_listener" "sap_web_dispatcher" {
  load_balancer_arn = aws_lb.sap_vpce.arn
  port              = 443
  protocol          = "TLS"
  ssl_policy        = "ELBSecurityPolicy-TLS13-1-2-2021-06"
  # Reference the validation resource so the listener is created only after the certificate is issued
  certificate_arn   = aws_acm_certificate_validation.sap_appflow.certificate_arn
  default_action {
    type             = "forward"
    target_group_arn = aws_lb_target_group.sap.arn
  }
}
  9. The VPC Endpoint Service itself:
resource "aws_vpc_endpoint_service" "sap_vpce" {
  acceptance_required        = false
  network_load_balancer_arns = [aws_lb.sap_vpce.arn]
  private_dns_name           = aws_route53_record.sap.fqdn
  lifecycle {
    replace_triggered_by = [aws_lb.sap_vpce.arn]
  }
}
  10. And don’t forget about permissions:
resource "aws_vpc_endpoint_service_allowed_principal" "sap_appflow" {
  vpc_endpoint_service_id = aws_vpc_endpoint_service.sap_vpce.id
  principal_arn           = "appflow.amazonaws.com"
}

Once everything is configured you can start querying SAP via AppFlow.
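The last piece on the AppFlow side is a private SAP OData connection that uses the endpoint service defined above. The sketch below is an illustration under assumptions rather than the project's actual configuration: the connection name, the two credential variables, the client number, the logon language, and the use of basic authentication are placeholders, and the catalog service path is the commonly used default that may differ in your system.

```hcl
# Hypothetical credential inputs; in practice these would come from a secrets store.
variable "sap_username" {
  type      = string
  sensitive = true
}

variable "sap_password" {
  type      = string
  sensitive = true
}

resource "aws_appflow_connector_profile" "sap" {
  name            = "sap-hec-odata" # hypothetical connection name
  connector_type  = "SAPOData"
  connection_mode = "Private"

  connector_profile_config {
    connector_profile_credentials {
      sapo_data {
        basic_auth_credentials {
          username = var.sap_username
          password = var.sap_password
        }
      }
    }

    connector_profile_properties {
      sapo_data {
        # Public hostname that resolves to the private IP address of the SAP system
        application_host_url     = "https://${aws_route53_record.sap.fqdn}"
        application_service_path = "/sap/opu/odata/iwfnd/catalogservice;v=2" # assumed default catalog path
        client_number            = "100" # placeholder; adjust accordingly
        logon_language           = "EN"  # placeholder; adjust accordingly
        port_number              = 443

        # Ties the connection to the VPC Endpoint Service in front of the NLB
        private_link_service_name = aws_vpc_endpoint_service.sap_vpce.service_name
      }
    }
  }

  # AppFlow must be an allowed principal before it can create its endpoint.
  depends_on = [aws_vpc_endpoint_service_allowed_principal.sap_appflow]
}
```

Creating a private connection can take several minutes while AppFlow provisions its interface endpoint, so a slow apply at this step is not necessarily a failure.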

Conclusion

Integrating SAP data with AWS services opens up a plethora of opportunities for businesses to enhance their operational efficiency, data insights, and decision-making processes. 

Utilising AWS AppFlow in conjunction with VPC Peering offers a secure, reliable, and high-performance solution for this integration, ensuring that businesses can make the most out of their SAP data while maintaining the highest standards of security and performance. 

By following the steps outlined in this guide, organisations can set the stage for a future-proof, integrated ecosystem that drives innovation and business success.


Ilja Summala
Ilja’s passion and tech knowledge help customers transform how they manage infrastructure and develop apps in cloud.
Group CTO