Thursday, 15 December 2016

Some AWS papers worth reading....

aws-building-fault-tolerant-applications_1428519022

https://drive.google.com/open?id=0B_FWc-VYFuxSejluQk1VOHJRLUU


aws-amazon-emr-best-practices_1427755536

https://drive.google.com/open?id=0B_FWc-VYFuxSQWF5TjE3eUpwenM

aws-cloud-architectures_1427123268

https://drive.google.com/open?id=0B_FWc-VYFuxSWEtCRnQtSkpuLUk

aws-cloud-best-practices_1427123288

https://drive.google.com/open?id=0B_FWc-VYFuxSeVl5VWVyaExGS28


aws-csa-2015-slides_1479830830

https://drive.google.com/open?id=0B_FWc-VYFuxSVWxFZDdxUVZHUm8


aws-disaster-recovery_1428162405

https://drive.google.com/open?id=0B_FWc-VYFuxSNzQ3R09ZSUhWS0k

aws-security-best-practices_1427123343

https://drive.google.com/open?id=0B_FWc-VYFuxSUDVNUEVJZ1hhOHM

aws-security-whitepaper_1427042776

https://drive.google.com/open?id=0B_FWc-VYFuxSMmdOb1lUQXpsVmc

aws-storage-options_1427123308

https://drive.google.com/open?id=0B_FWc-VYFuxSV0l6N0lvRzhId1U

exam-blueprint-new_1479416469

https://drive.google.com/open?id=0B_FWc-VYFuxSNzBYU0ZIUENNWDQ

intro-to-aws-security_1469227682

https://drive.google.com/open?id=0B_FWc-VYFuxSSjF1d0kzZ09ZMXM







Thursday, 8 December 2016

Some random things about AWS that cross my mind

Some things to consider while architecting an AWS environment:

  • Deploy instances across two or more AZs.
  • Purchase Reserved Instances in your DR region, because that gives you guaranteed instance capacity.
  • Use Route 53 to implement DNS failover or latency-based routing.
  • Maintain backup strategies.
  • Keep AMIs up to date.
  • Copy AMIs/EBS snapshots to multiple regions.
  • Design your system for failure (read about Chaos Monkey) and have good resilience testing in place.
  • Use Elastic IP addresses to fail over to standby instances when Auto Scaling and load balancing are not available.
  • Decouple application components using services like SQS.
  • Throw away broken instances.
  • Use bootstrapping to bring up new instances with minimal configuration.
  • Use CloudWatch to monitor infrastructure changes and health.
  • Always enable RDS Multi-AZ and automated backups.
  • Build self-healing applications.
  • Use multipart upload for S3 uploads.
  • Cache static content using CloudFront.
  • Protect data in transit by using HTTPS/SSL endpoints, and connect to instances via bastion hosts or VPN connections.
  • Protect data at rest using encrypted file systems or the EBS/S3 encryption options.
  • Use AWS Key Management Service, and use centralized sign-on for on-prem users mapped to EC2 instances/IAM logins.
  • Never ever store API keys on an AMI; use IAM roles on EC2 instances instead (see the sketch after this list).
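
A minimal sketch of the last two points using boto3 (Python): launching an instance with an IAM role attached and a bootstrap script passed as user data. The AMI ID, key pair, region and instance profile name below are placeholders, not real values:

    import boto3

    ec2 = boto3.client('ec2', region_name='us-east-1')

    # Bootstrap script passed as user data; cloud-init runs it on first boot.
    user_data = "#!/bin/bash\nyum -y install httpd\nsystemctl enable --now httpd\n"

    ec2.run_instances(
        ImageId='ami-xxxxxxxx',                          # placeholder AMI ID
        InstanceType='t2.micro',
        MinCount=1,
        MaxCount=1,
        KeyName='my-keypair',                            # placeholder key pair
        IamInstanceProfile={'Name': 'app-server-role'},  # IAM role instead of baked-in API keys
        UserData=user_data,
    )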

Use CloudWatch to shut down inactive instances. CloudWatch provides basic monitoring checks out of the box, but the following metrics need custom scripts (a sketch follows this list):
  • Disk usage / available disk space.
  • Swap usage / available swap.
  • Memory usage / available memory.
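
For example, a custom memory metric could be pushed from a script or cron job on the instance with boto3. The namespace, dimension and instance ID here are arbitrary placeholders, and psutil is assumed to be installed for reading memory stats:

    import boto3
    import psutil  # assumed to be installed; used to read memory utilisation

    cloudwatch = boto3.client('cloudwatch', region_name='us-east-1')

    # Publish current memory utilisation as a custom CloudWatch metric.
    cloudwatch.put_metric_data(
        Namespace='Custom/System',  # arbitrary custom namespace
        MetricData=[{
            'MetricName': 'MemoryUtilization',
            'Dimensions': [{'Name': 'InstanceId', 'Value': 'i-0123456789abcdef0'}],  # placeholder
            'Value': psutil.virtual_memory().percent,
            'Unit': 'Percent',
        }],
    )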


Use the AWS Config service, which provides detailed configuration information about an environment. It takes point-in-time snapshots of all AWS resources so you can determine the state of your environment, lets you view historical configurations by browsing those snapshots, sends notifications when configurations change, and shows the relationships between different AWS resources.

Use AWS CloudTrail for security and compliance, and to monitor all API actions against the AWS account.

Know when to use S3, S3 RRS and Glacier for storage.
Know when to run databases on EC2 and when to use the managed RDS service.

Have Elasticity/Scalability:

  • Proactive cyclic scaling, proactive event-based scaling, and demand-based auto scaling.
  • Vertical scaling - adding more resources to your instances, i.e. increasing their size/type.
  • Horizontal scaling - adding more instances alongside the existing ones.
  • E.g. DynamoDB has high availability and scalability built in. Use read replicas to offload heavy read traffic or increase the instance size, and use ElastiCache clusters for caching DB session information (a scaling-policy sketch follows this list).
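
A minimal sketch of demand-based horizontal scaling with boto3: attaching a simple scaling policy to an existing Auto Scaling group. The group name, policy name and cooldown are placeholders/assumptions:

    import boto3

    autoscaling = boto3.client('autoscaling', region_name='us-east-1')

    # Add one instance whenever the policy is triggered (e.g. by a CloudWatch CPU alarm).
    autoscaling.put_scaling_policy(
        AutoScalingGroupName='web-asg',       # placeholder Auto Scaling group name
        PolicyName='scale-out-on-high-cpu',
        AdjustmentType='ChangeInCapacity',
        ScalingAdjustment=1,
        Cooldown=300,                         # wait 5 minutes before scaling again
    )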

Data Security with AWS :

There is a shared responsibility model.
AWS is responsible for:

  • Facilities
  • Physical security of hardware
  • Network infrastructure
  • Virtualization infrastructure
Customers (we) are responsible for:
  • AMIs
  • OS
  • Apps
  • Data in transit / at rest / in data stores
  • Credentials
  • Policies and configurations
AWS provides network-level firewalls. Inside the VPC, security groups provide another layer of security at the EC2 instance level. You can also run OS-level firewalls such as iptables, firewalld or Windows Firewall on individual EC2 instances, and antivirus software like Trend Micro that integrates with EC2 instances.

Remember: you need permission from AWS to perform any type of port scanning, even within your own private cloud.



  • S3 has built-in AES-256 encryption that encrypts data at rest. Objects are decrypted as they are sent back to the customer on download.
  • EBS encrypted volumes - data is encrypted on the EC2 instance and copied to EBS for storage. Any snapshot taken of an encrypted volume is automatically encrypted.
  • RDS encryption - MySQL, Oracle, PostgreSQL and MS SQL Server all support this feature. It encrypts the underlying storage for that instance; automated backups, snapshots and read replicas are encrypted as well. RDS also provides SSL to encrypt the connection to a DB instance (an S3 example follows this list).
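
As a sketch of the S3 option, server-side encryption can be requested per object with boto3. The bucket name and object key are placeholders:

    import boto3

    s3 = boto3.client('s3')

    # Ask S3 to encrypt this object at rest with AES-256 (SSE-S3).
    s3.put_object(
        Bucket='my-example-bucket',      # placeholder bucket name
        Key='reports/2016-12.csv',       # placeholder object key
        Body=b'some,data\n',
        ServerSideEncryption='AES256',
    )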

CloudHSM:

A Hardware Security Module is a dedicated physical appliance, isolated in order to store security keys and other types of encryption keys used within an application. HSMs have special security mechanisms: they are physically separated from other resources and are tamper resistant.

Not even AWS engineers have access to the keys on a CloudHSM appliance.


AMIs can be shared between different user accounts, but only within the same region!

You cannot use CNAMEs at the apex of a domain in Route 53. The apex is just google.com, with no subdomain.
You can, however, put a CNAME for a load balancer on a subdomain like news.google.com.
For the apex, create an Alias record and select the load balancer name as the alias target (see the sketch below).
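
A sketch of creating such an alias record at the zone apex with boto3; the hosted zone ID, domain name and ELB details are placeholders:

    import boto3

    route53 = boto3.client('route53')

    # Alias A record at the zone apex pointing at an ELB (no CNAME needed).
    route53.change_resource_record_sets(
        HostedZoneId='Z1EXAMPLE',  # placeholder hosted zone ID of example.com
        ChangeBatch={
            'Changes': [{
                'Action': 'UPSERT',
                'ResourceRecordSet': {
                    'Name': 'example.com.',
                    'Type': 'A',
                    'AliasTarget': {
                        'HostedZoneId': 'Z2EXAMPLE',  # placeholder hosted zone ID of the ELB
                        'DNSName': 'my-elb-123456.us-east-1.elb.amazonaws.com.',  # placeholder
                        'EvaluateTargetHealth': False,
                    },
                },
            }],
        },
    )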

As long as the instances are in a public subnet, the ELB will direct traffic to them even if the instances have no public IP assigned.
You can configure access logs on the ELB so that you can view the source IPs of the traffic requests.




Disaster Recovery 


  • Recovery Time Objective (RTO): the time it takes after a disruption to restore operations back to their regular service level, as defined by the company's operational level agreement. E.g. if the RTO is 3 hours, we have 3 hours to restore service to an acceptable level.
  • Recovery Point Objective (RPO): the acceptable amount of data loss, measured in time. E.g. if the system goes down at 4 PM and the RPO is 2 hours, the system should recover all data as it was before 2 PM.


  1. Pilot light: a minimal version of the production environment is running on AWS. Replication from on-prem to AWS is brought up in the event of a disaster, and a DNS switch can be made.
  2. Warm standby: larger than pilot light. Critical apps run in standby mode in AWS, while the rest of the instances run in pilot-light mode.
  3. Multi-site solution: a clone of the production environment, so you just flip the switch. It is costly, but there is almost no downtime in case of a disaster.


AWS Direct Connect

When we connect to our cloud network we usually connect over the open internet, which brings network costs and network latency due to public routing. All of this can be avoided by having a direct connection from your on-prem network to your cloud network.

Thus Direct Connect reduces your bandwidth commitment, data transferred over Direct Connect is billed at a lower rate, and the dedicated private connection reduces latency compared with sending traffic via public routing.

So it is effectively a dedicated private connection, and we can also run multiple virtual interfaces over it.

Over a private virtual interface, Direct Connect only allows you to connect to private internal IP addresses; you cannot access public IP addresses that way.

We can use two Direct Connect links in active-active or active-passive mode for HA. You can also use a public virtual interface to connect to public services like S3, DynamoDB, etc.

A Direct Connect link provides access to only one specific region.


Amazon Kinesis

It is a developer service: a real-time data processing service used to capture and store large amounts of data and to power real-time streaming dashboards of incoming data streams.
Kinesis dashboards can be created using the AWS SDKs. Kinesis also allows us to export data to other AWS services for storage.
Multiple Kinesis applications can process the same incoming data stream.
Kinesis syncs data across 3 data centers (AZs) in one region, and the data is preserved for up to 24 hours.
Very highly scalable.

Kinesis can be used in applications like real time gaming, real time analytics, application alerts, mobile data etc.

Kinesis workflow:
  • Create a stream.
  • Build producers to put data into the stream.
  • Consumers process the stream; destinations can be S3, EMR, Redshift, etc.
1 shard = 1 MB per second of input (a producer sketch follows below).
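
A minimal producer sketch with boto3; the stream name, partition key and payload are placeholders:

    import boto3
    import json

    kinesis = boto3.client('kinesis', region_name='us-east-1')

    # Put a single record onto the stream; the partition key decides which shard receives it.
    kinesis.put_record(
        StreamName='clickstream',  # placeholder stream name
        Data=json.dumps({'user': 'u123', 'action': 'click'}).encode('utf-8'),
        PartitionKey='u123',
    )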

Amazon CloudFront

It is a global CDN which delivers content from an origin location to edge locations.
The origin can be an S3 bucket or an ELB CNAME that distributes requests among origin instances.
Signed URLs can be used to provide time-limited access to private content. CloudFront can also work with Route 53 for alternate CNAMEs like cdn.xyz.com.

CloudFront is designed for caching. It will serve the cached file stored at the edge until the cache expires, but only if caching is enabled.

In order to push a new version of an object you have to either upload it under a new name or invalidate the existing object. Remember: invalidations on CloudFront cost money; the first 1,000 invalidation paths each month are free (a sketch of an invalidation follows below).
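
A sketch of invalidating a single path with boto3; the distribution ID and path are placeholders:

    import boto3
    import time

    cloudfront = boto3.client('cloudfront')

    # Invalidate one object so the edge locations fetch a fresh copy from the origin.
    cloudfront.create_invalidation(
        DistributionId='E1EXAMPLE',  # placeholder distribution ID
        InvalidationBatch={
            'Paths': {'Quantity': 1, 'Items': ['/index.html']},
            'CallerReference': str(time.time()),  # must be unique per request
        },
    )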

You can also view additional information, such as what percentage of viewers are on mobile, desktop, tablet, etc.

CloudFront can be granted list permissions on an S3 bucket origin even when the bucket itself has no list permission policy!

You can also create separate behaviors to restrict access to certain files, e.g. by a path pattern like finance/*, or a behavior where everything ending in *.xml goes to a particular origin, etc.


Route 53

A DNS hosting service which can be used for failover.
Route 53 also provides latency-based routing.
It also provides weighted record sets for routing.
Geolocation-based record sets allow us to send requests coming from specific regions to a specific endpoint.


SNS (Simple Notification Service)

Used to receive notifications when events occur in the environment. 

SNS consists of topics (where messages are sent) and subscription endpoints (HTTP/HTTPS, email/email-JSON, SMS, SQS queues, etc.), as in the sketch below.
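
A sketch of publishing to a topic with boto3; the topic ARN, subject and message are placeholders:

    import boto3

    sns = boto3.client('sns', region_name='us-east-1')

    # Publish a message; SNS fans it out to every endpoint subscribed to the topic.
    sns.publish(
        TopicArn='arn:aws:sns:us-east-1:123456789012:ops-alerts',  # placeholder topic ARN
        Subject='Instance alarm',
        Message='CPU utilisation above threshold on i-0123456789abcdef0',
    )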

SQS (Simple Queue Service)

A highly available queue service that allows the creation of distributed, decoupled application components.
Each message can contain up to 256 KB of text in any format.
SQS guarantees message delivery but does not guarantee ordering or prevent duplicate messages.
Generally a worker instance will poll the queue to retrieve waiting messages for processing.

I have worked with Azure queues at one of the firms where I worked. It is a really good service if you want to build a distributed application.

There are 2 types of polling:
  • Long polling - allows the SQS service to wait until a message is available in the queue before sending a response; the connection stays open for the defined wait time (see the sketch after this list).
  • Short polling - samples only a subset of the servers, so it may not return all available messages in a single poll, which increases the number of API calls.
The difference between SQS and SWF is that SWF is task oriented.
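
A sketch of a worker using long polling with boto3; the queue URL is a placeholder:

    import boto3

    sqs = boto3.client('sqs', region_name='us-east-1')
    queue_url = 'https://sqs.us-east-1.amazonaws.com/123456789012/work-queue'  # placeholder

    # Long poll: wait up to 20 seconds for a message instead of returning immediately.
    response = sqs.receive_message(
        QueueUrl=queue_url,
        MaxNumberOfMessages=10,
        WaitTimeSeconds=20,
    )

    for message in response.get('Messages', []):
        print(message['Body'])  # process the message here
        # Delete the message after successful processing so it is not delivered again.
        sqs.delete_message(QueueUrl=queue_url, ReceiptHandle=message['ReceiptHandle'])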

Amazon EMR (Elastic Map Reduce)

EMR is a service which deploys EC2 instances based on the Hadoop big data framework. EMR is used to analyze and process vast amounts of data.
EMR can run any open-source application built on Hadoop.

EMR simply launches pre-built AMIs with Hadoop baked in, and it gives admin access to the underlying OS. It integrates with AWS services like S3, DynamoDB, etc. Bootstrap actions provide the ability to pass configuration information before Hadoop starts when a new cluster is created.

Basically EMR is made up of mappers and reducers. Mappers split the large data file for processing; reducers take the results and combine them into an output file on disk.

Files are processed in chunks of 128 MB/64 MB.
Every EC2 instance size has its own mapper and reducer limits, e.g. one m1.small EC2 instance has 2 mappers and 1 reducer.

The goal should be to use as many mappers as you can without running out of memory for loading the data.

Master node - only provides information about the data, like where it is coming from and where it should go.
Core node - processes data and stores it on HDFS or S3.
Task node - processes data and sends it back to a core node for storage on either HDFS or S3.
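
A sketch of launching a small EMR cluster with boto3. The log bucket is a placeholder, and the default EMR roles are assumed to already exist in the account:

    import boto3

    emr = boto3.client('emr', region_name='us-east-1')

    # Launch a minimal Hadoop cluster: one master node and two core nodes.
    emr.run_job_flow(
        Name='analysis-cluster',
        ReleaseLabel='emr-5.2.0',                  # an EMR release current in late 2016
        Applications=[{'Name': 'Hadoop'}],
        Instances={
            'MasterInstanceType': 'm3.xlarge',
            'SlaveInstanceType': 'm3.xlarge',
            'InstanceCount': 3,
            'KeepJobFlowAliveWhenNoSteps': False,  # terminate once all steps finish
        },
        LogUri='s3://my-emr-logs/',                # placeholder log bucket
        JobFlowRole='EMR_EC2_DefaultRole',         # assumes the default EMR roles exist
        ServiceRole='EMR_DefaultRole',
    )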

Amazon CloudFormation

The pure definition of infrastructure as code: we can create resources using JSON templates.
You can create a new stack from a template, or create a template from an existing infrastructure setup.
There is a default limit on the number of CloudFormation stacks per account, and stack names must be unique within your environment (a sketch follows below).
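
A sketch of creating a stack from an inline JSON template with boto3; the template defines just one hypothetical security group, and the stack name is a placeholder:

    import boto3
    import json

    # Minimal template: a single security group allowing HTTP from anywhere.
    template = {
        'AWSTemplateFormatVersion': '2010-09-09',
        'Resources': {
            'WebSecurityGroup': {
                'Type': 'AWS::EC2::SecurityGroup',
                'Properties': {
                    'GroupDescription': 'Allow HTTP from anywhere',
                    'SecurityGroupIngress': [
                        {'IpProtocol': 'tcp', 'FromPort': 80, 'ToPort': 80, 'CidrIp': '0.0.0.0/0'},
                    ],
                },
            },
        },
    }

    cloudformation = boto3.client('cloudformation', region_name='us-east-1')

    # Stack names must be unique within the account/region.
    cloudformation.create_stack(
        StackName='web-sg-stack',  # placeholder stack name
        TemplateBody=json.dumps(template),
    )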

Troubleshooting AWS

Ports are closed by default on security groups; you need to open the required ports to connect to EC2 instances.
EBS volumes must be in the same Availability Zone as the EC2 instances they attach to.
Watch out for EC2 instance capacity limits.
An EC2 instance needs a public IP address and needs to be in a public subnet in order to connect to the internet.
In EC2-Classic the Elastic IP address is detached when the instance is stopped.
AMIs are available only in the regions where they were created; AMIs copied to new regions receive new AMI IDs.

Make sure to auto-assign a public IP address to EC2 instances while creating them so you don't have to assign one manually.
You should add on-prem routes to the Virtual Private Gateway route table in order to access the extended resources.
When you add a NAT instance you need to add a 0.0.0.0/0 route pointing to i-xxx in the route table for the private subnets.
Peering connections can only be made between VPCs in the same region.
Make sure the proper ports are open in both NACLs and security groups (a sketch follows after this block).
Only one internet gateway can be attached to a VPC.
Only one virtual private gateway is needed on a VPC.
You can assign an EC2 instance to multiple security groups.
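
A sketch of opening a port on a security group with boto3; the group ID and CIDR range are placeholders:

    import boto3

    ec2 = boto3.client('ec2', region_name='us-east-1')

    # Allow inbound SSH from a single office IP range on an existing security group.
    ec2.authorize_security_group_ingress(
        GroupId='sg-0123456789abcdef0',  # placeholder security group ID
        IpPermissions=[{
            'IpProtocol': 'tcp',
            'FromPort': 22,
            'ToPort': 22,
            'IpRanges': [{'CidrIp': '203.0.113.0/24'}],  # placeholder office CIDR
        }],
    )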

You should enable cross-zone load balancing.
Check the health checks assigned to the load balancer in order to see whether instances are healthy or not.
The ELB needs to have a security group with port 80 open.
Enable ELB access logs to an S3 bucket for better troubleshooting.
Subnets need to be added to the ELB in order to add their instances for load balancing.


Sunday, 4 December 2016

Insights into Dockers and Containers !

Docker Inception

Installation : 
Docker requires a 64-bit installation. First we can go ahead and create a Docker repository file in the /etc/yum.repos.d folder (be aware I am demonstrating this on CentOS), giving the appropriate base URL, GPG keys, etc.
Do a sudo yum -y install docker-engine to install Docker.
Then enable Docker using sudo systemctl enable docker and also start the Docker daemon service with sudo systemctl start docker.

Now that you have installed Docker, you will have noticed that in order to run the docker images command you have to be root (sudo). This is because your current user has not been added to the docker group: the client needs to connect to the docker.sock file, which is owned by root and belongs to the docker group. Adding your user to that group with sudo usermod -aG docker <user> (and logging back in) fixes this.


Like Git, we have repositories for Docker too. The common place to get Docker images is Docker Hub.

Go ahead and create an account on hub.docker.com. It is similar to the Chef Supermarket: one place for public and private Docker images.

Remember: base images do not run on their own; we need to create a container from them with all the configuration. All base images are built from a Dockerfile.

Watch out for this space! Much more incoming.
Watch out for this space ! Much more incoming.