AWS storage

Available options

  1. block storage
  2. object storage

Block storage

| EC2 instance store volumes | EBS (Elastic Block Store) | EFS (Elastic File System) |
| --- | --- | --- |
| built-in, free (included in the instance fee) | independent of the instance, in the same AZ; billed per GB per month | can be mounted by EC2 instances or by on-premises servers |
| high IOPS, since data is stored directly on the instance (no network hop) | up to 16 TB per volume | lower IOPS; files are transferred to the instance over the network |
| data is lost when the instance shuts down, with no way to retrieve it | durable: multiple replicas are kept within the same AZ, so data is safe | replicas across multiple AZs |
| no snapshots or backups | snapshots and backups available | |
| | found under the EC2 service | |

Object storage

| S3 | Amazon Glacier |
| --- | --- |
| object storage | archival storage |
| transfers happen over HTTP | transfers happen over more dedicated protocols |
| slow transfer | very slow retrieval |
| auto-scales | |
| good for static assets and logs | good for keeping rarely needed bulk data for long periods |
| | objects can be transferred from S3 into Glacier for more permanent storage |

Object vs Block storage

| object storage | block storage |
| --- | --- |
| | similar to a filesystem or a DB |
| treats the object as one whole entity | can index the blocks inside a single file |
| updating an object removes the old object and creates a new one from scratch with the updates | updates can be applied to a specific block in the file, without re-uploading the whole file |
| better for static assets | best when file contents are updated regularly |
| | suitable if you need filesystem-like abilities with better IOPS |
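The update-semantics difference above can be sketched with two toy in-memory models (illustrative Python, not any AWS API): the object store replaces the whole value on every put, while the block device rewrites a single block in place.

```python
# Toy in-memory models contrasting object-storage and block-storage
# update semantics. Purely illustrative sketches, not AWS APIs.

class ObjectStore:
    """Objects are immutable wholes: an update replaces the entire object."""
    def __init__(self):
        self._objects = {}

    def put(self, key, data):
        # Even a one-byte change re-uploads and rewrites the whole object.
        self._objects[key] = data

    def get(self, key):
        return self._objects[key]


class BlockDevice:
    """Storage is addressed in fixed-size blocks; one block can be
    rewritten in place without touching the rest."""
    def __init__(self, num_blocks, block_size=4):
        self.block_size = block_size
        self._blocks = [bytes(block_size) for _ in range(num_blocks)]

    def write_block(self, index, data):
        assert len(data) == self.block_size
        self._blocks[index] = data          # only this block is rewritten

    def read(self):
        return b"".join(self._blocks)


store = ObjectStore()
store.put("report.txt", b"hello world!")
store.put("report.txt", b"HELLO world!")    # whole object replaced

dev = BlockDevice(num_blocks=3)
dev.write_block(0, b"HELL")                 # in-place update of block 0
```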

lifecycle rules

  • rules that can be applied to objects (s3) or vaults (glacier) once the objects reach a certain age.
  • examples:

    1. when an s3 object becomes 7 days old, move it from standard storage to standard IA (infrequent access), which has a lower cost.
    2. when an s3 object becomes 30 days old, move it to glacier archive standard storage.
    3. when a glacier archive becomes 30 days old, move it from standard storage to glacier standard IA (infrequent access), which has a lower cost.
    4. when a glacier archive becomes 2,555 days (7 years) old, delete it permanently.
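The age thresholds above can be sketched as a small decision function. A toy Python illustration that borrows S3's storage-class names; it is not part of any AWS SDK or the real lifecycle API:

```python
# Toy decision function mirroring the lifecycle examples above:
# objects move to cheaper storage classes as they age, and are
# eventually deleted. Illustration only, not an AWS API.

def lifecycle_action(age_days):
    if age_days >= 2555:       # example 4: ~7 years old -> delete
        return "DELETE"
    if age_days >= 30:         # example 2: archive to Glacier
        return "GLACIER"
    if age_days >= 7:          # example 1: infrequent-access tier
        return "STANDARD_IA"
    return "STANDARD"
```

In the real service these transitions are declared as lifecycle configuration on the bucket; AWS evaluates object age and applies them automatically.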

AWS snowball

  • petabyte-scale data transport solution
  • when you sign up, AWS will send you a device to which you can upload your data, then you ship the device back to AWS.
  • AWS then will upload the device content to S3.
  • data uploaded to the device is encrypted using AES-256.

AWS EBS

  • abbreviation of Elastic Block Store
  • can be found under EC2 service.
  • you can create a block storage volume and attach it to an EC2 instance; when the instance terminates, the data on the EBS volume will still be available
  • EBS volume must be in the same availability zone as the EC2 instance
  • it can be detached from one instance and attached to another without any data loss, much like an external USB drive
  • suitable for running database servers on the instance itself, but the DB files must be saved on the EBS volume so the data can be persisted.
  • achieves high IOPS performance even though the instance communicates with the volume over the network.
  • once the EBS volume is created and attached to the instance:
    1. the volume must be formatted by creating a file system on it
    2. the volume then must be mounted to a specific location in the instance file system
    3. the files on the volume can be accessed through the mounted location on the instance file system.

volume snapshots

  • a snapshot of an EBS volume is saved to S3, so the snapshot is available across availability zones (s3 durability), as opposed to the volume itself, which lives in a single availability zone.
  • the s3 buckets holding snapshots are managed by AWS, and we don't need to worry about them.
  • a snapshot can easily be copied to other regions, as opposed to the volume itself.
  • snapshots can be shared across multiple accounts.
  • snapshots use incremental backups.
  • snapshots support lifecycle rules, just like any other S3 bucket.
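The incremental-backup idea can be sketched in a few lines of Python: the first snapshot copies every block, and each later snapshot stores only the blocks that changed since the previous one. This is a toy model of block-level deltas, not how EBS is actually implemented.

```python
# Toy model of incremental snapshots. The first snapshot stores every
# block; later snapshots store only blocks that differ from the last
# captured state. Illustration only -- not the real EBS mechanism.

def snapshot_delta(volume, last_state):
    """volume: list of block contents; last_state: full block map from
    the previous snapshot, or None for the first (full) snapshot."""
    if last_state is None:
        return dict(enumerate(volume))       # full copy of all blocks
    return {i: b for i, b in enumerate(volume) if last_state[i] != b}

volume = [b"aaaa", b"bbbb", b"cccc"]
snap1 = snapshot_delta(volume, None)         # full snapshot: 3 blocks
state = dict(enumerate(volume))              # state captured by snap1
volume[1] = b"BBBB"                          # one block changes
snap2 = snapshot_delta(volume, state)        # incremental: 1 block
```

Restoring a volume then means replaying the full snapshot plus every delta taken after it, which is why deleting an old snapshot doesn't break newer ones: AWS consolidates the referenced blocks behind the scenes.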

AWS S3

  • no filesystem, instead, a flat structure inside a bucket.
  • s3 cluster -> bucket -> object.
  • every region has a single s3 cluster.
  • no folder hierarchy, but object key names may contain slashes (/), which makes them look like folders.
  • max object size is 5 TB.
  • upload limit is 5 GB per PUT operation.
  • s3 supports multipart upload.
  • s3 supports server side encryption (SSE) using the Advanced Encryption Standard (AES-256)
  • s3 uses an encryption key per object, so if a key is compromised, only a single object is affected.
  • objects are versioned: every time an object is modified, s3 creates a new version, and we can roll back to earlier versions.
  • query in place with S3 Select: you can run queries or sub-selects on an object's contents at the S3 level, without pulling the object and processing it with your own compute.
  • cross-region replication: you can choose a bucket to be replicated to other regions, and any modification in one region is applied to the others.
  • S3 event notifications fire when anything in the bucket changes; you can subscribe to the stream of events with a subscriber (e.g. a lambda function)
  • S3 transfer acceleration helps boost transfer speed for long-distance communications.
  • S3 static website, S3 buckets can serve static html files directly.
  • S3 bucket names must be globally unique, across all accounts and all regions
  • bucket naming rules: https://docs.aws.amazon.com/AmazonS3/latest/userguide/bucketnamingrules.html
  • there are no folders in s3; instead, s3 creates the illusion of folders using object key prefixes.
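  • the folder illusion can be sketched in Python: keys are flat strings, and a "folder" listing is just a prefix + delimiter query over them. A toy re-implementation of the Prefix/Delimiter listing semantics, not the AWS API:

```python
# Toy sketch of how S3 fakes folders: keys live in one flat namespace,
# and a listing groups keys by the first delimiter after the prefix,
# mimicking the semantics of S3's Prefix/Delimiter list parameters.

def list_keys(keys, prefix="", delimiter="/"):
    """Return (common_prefixes, objects) the way an S3 listing would."""
    common, objects = set(), []
    for key in sorted(keys):
        if not key.startswith(prefix):
            continue
        rest = key[len(prefix):]
        if delimiter in rest:
            # everything up to the first delimiter looks like a subfolder
            common.add(prefix + rest.split(delimiter, 1)[0] + delimiter)
        else:
            objects.append(key)
    return sorted(common), objects

keys = ["logs/2024/app.log", "logs/2024/db.log", "logs/readme.txt", "index.html"]
# listing with prefix="logs/" reports "logs/2024/" as a common prefix
# (a pseudo-folder) and "logs/readme.txt" as an object at that level
```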
  • example of a policy that grants public access to an S3 bucket:

    {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": "*",
                "Action": "s3:GetObject",
                "Resource": ["arn:aws:s3:::${bucketName}/*"]
            }
        ]
    }
    
  • example of a policy that allows specific aws account to save logs into a logs bucket:

    {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": { "AWS": "${AWS_ACCOUNT_ID}" },
                "Action": "s3:PutObject",
                "Resource": ["arn:aws:s3:::app-logs", "arn:aws:s3:::app-logs/*"]
            }
        ]
    }
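Tying the size limits above together (5 TB max object, 5 GB per PUT): anything larger than a single PUT allows goes through multipart upload, which splits the object into byte ranges uploaded independently. A sketch of the part-planning arithmetic; the 256 MiB part size is an arbitrary choice for illustration (S3 accepts parts from 5 MB to 5 GB, up to 10,000 parts per upload):

```python
# Sketch of multipart-upload arithmetic: split an object into
# (offset, length) byte ranges. The part size here is an illustrative
# choice; S3 allows parts between 5 MB and 5 GB (up to 10,000 parts).

MiB = 1024 ** 2
GiB = 1024 ** 3

def plan_parts(object_size, part_size):
    """Return the (offset, length) ranges to upload as separate parts."""
    parts = []
    offset = 0
    while offset < object_size:
        length = min(part_size, object_size - offset)
        parts.append((offset, length))
        offset += length
    return parts

parts = plan_parts(object_size=1 * GiB, part_size=256 * MiB)
```

The last part may be shorter when the object size is not an exact multiple of the part size; S3 permits that for the final part only.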
    

AWS Glacier

  • archiving data for long-term storage at low cost.
  • glacier -> vault -> archive.
  • data can be transferred from s3 or uploaded directly.
  • data can be retrieved later by submitting a retrieval request, which involves a 3-5 hour wait before the data is accessible.
  • we can pay an additional fee for expedited retrieval so the wait can be bypassed.
  • pricing depends on the number of retrieval requests, billed in $/1000 requests