Evgeniy Gantman | +972 (55) 966 25 15 / +375 29 695 55 52 | egDevOps@gmail.com

Managing the three areas of DevOps: internal infrastructure, customer infrastructure, and CI/CD automation.

I. Internal infrastructure

- Infrastructure that provides internal customers and colleagues with quality services: FW, VPN, Internet, labs, stages, AWS, Scaleway, etc.
- Organization and optimization of internal resource management: resource allocation, resource life cycle, and systematization of these processes.
- Preparing, together with the CI/CD team, the infrastructure side of the testing and release cycle for the company's products and projects (automating the creation, modification, and deletion of test areas with the required environment and parameters; organizing the environment for publishing software and projects to external customers).
- Organizing or outsourcing system administration of colleagues' equipment and the office.
- Documentation of the internal infrastructure for employees: information about infrastructure capabilities, test benches, access, etc.
- Setting up and restricting access.

II. Customer infrastructure

- Documentation and publication of information on project infrastructure in various sections (environment, addressing, physical and logical diagrams, areas of responsibility and responsible persons).
- Creating (together with the customer and architects) infrastructure proposals for the project, implementing them, and keeping the infrastructure in a consistent state.
- Identifying potential infrastructure problems in terms of performance, security, and updates; preparing and implementing plans to solve them.
- Tracking trends in infrastructure implementation approaches and applying them to projects.
- Training on cloud, hybrid, and on-premise approaches to project infrastructure and their application in software and project architecture.
- Automating project infrastructure maintenance tasks.
- Monitoring: generating standard and template metrics for the company's products and projects; embedding monitoring templates and collectors into software releases.
- Preparing documentation sections for products and projects.

III. CI/CD Automation

- Automating infrastructure preparation, taking into account the build direction, type of testing, and software distribution options.
- Configuring the software distribution environment (Nexus), taking into account versioning and product branches, access control, and permissions.
- Documenting the processes for employees involved in development, describing the capabilities and working rules provided by automation.

Knowledge of AWS

Analytics: Athena, EMR

Compute: EC2, EC2 Auto Scaling

Containers: Elastic Container Registry (ECR), Elastic Container Service (ECS), Elastic Kubernetes Service (EKS), AWS Fargate

Database: DynamoDB, RDS, Redshift

Developer Tools: AWS Cloud Development Kit (AWS CDK), AWS CloudShell, AWS CodeArtifact, AWS CodeBuild, AWS CodeCommit, AWS CodeDeploy, AWS CodePipeline, AWS CodeStar, AWS Command Line Interface (CLI), AWS X-Ray

Management and Governance: AWS CloudFormation, AWS CloudTrail, CloudWatch, AWS Config, AWS OpsWorks, AWS Organizations, AWS Systems Manager, AWS Trusted Advisor

Networking and Content Delivery: API Gateway, AWS Client VPN, CloudFront, Route 53, AWS Site-to-Site VPN, AWS Transit Gateway, VPC, Elastic Load Balancing

Security, Identity, and Compliance: GuardDuty, AWS Identity and Access Management (IAM), Inspector, AWS Key Management Service (AWS KMS), AWS Secrets Manager, AWS Single Sign-On, AWS WAF

Serverless: EventBridge (CloudWatch Events), AWS Lambda, Simple Notification Service (SNS), Simple Queue Service (SQS)

Storage: Elastic Block Store (EBS), Elastic File System (EFS), S3, AWS Storage Gateway

Knowledge of application architecture

Performance: capacity, resource utilization, logic, data storage, caching, concurrency

Scalability: vertical, horizontal (replication, services, caching, async processing, partitioning), load balancing (LB, service discovery, DNS and geo load balancing), microservices
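Partitioning across horizontally scaled nodes (caches, shards) is commonly done with consistent hashing, so that adding or removing a node remaps only a fraction of the keys. A minimal Python sketch, assuming MD5 as the ring hash and 100 virtual nodes per server (both arbitrary choices); `HashRing` and the node names in the usage note are hypothetical:

```python
import bisect
import hashlib


class HashRing:
    """Consistent-hash ring with virtual nodes (illustrative sketch)."""

    def __init__(self, nodes, vnodes=100):
        self.vnodes = vnodes
        self._ring = []  # sorted list of (hash, node) pairs
        for node in nodes:
            self.add(node)

    @staticmethod
    def _hash(value):
        # MD5 is used only for an even key spread, not for security.
        return int(hashlib.md5(value.encode()).hexdigest(), 16)

    def add(self, node):
        # Each physical node owns several points on the ring to balance load.
        for i in range(self.vnodes):
            bisect.insort(self._ring, (self._hash(f"{node}#{i}"), node))

    def remove(self, node):
        self._ring = [(h, n) for h, n in self._ring if n != node]

    def get(self, key):
        # First virtual node clockwise from the key's hash, wrapping at the end.
        idx = bisect.bisect(self._ring, (self._hash(key), ""))
        if idx == len(self._ring):
            idx = 0
        return self._ring[idx][1]
```

For example, with `HashRing(["cache-a", "cache-b", "cache-c"])`, adding `"cache-d"` moves roughly a quarter of the keys, all of them to the new node; a naive `hash(key) % n` scheme would remap most keys instead.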

Reliability: redundancy, fault detection and recovery, timeouts, retries, circuit breaker, fail fast, shed load, back-pressure.
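Timeouts, retries, circuit breaker, and fail fast usually work together. A minimal Python sketch of retries with exponential backoff and full jitter, guarded by a simple consecutive-failure circuit breaker; the thresholds and delays are illustrative assumptions, and the clock and sleep functions are injectable so the logic can be tested without waiting:

```python
import random
import time


class CircuitOpenError(Exception):
    """Raised instead of calling the dependency while the breaker is open (fail fast)."""


class CircuitBreaker:
    """Opens after `failure_threshold` consecutive failures; once `reset_timeout`
    seconds have passed it lets a trial call through (half-open)."""

    def __init__(self, failure_threshold=3, reset_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def call(self, func, *args, **kwargs):
        if self.opened_at is not None and self.clock() - self.opened_at < self.reset_timeout:
            raise CircuitOpenError("circuit open, failing fast")
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            # Trip on too many failures, or re-open at once if a half-open trial fails.
            if self.opened_at is not None or self.failures >= self.failure_threshold:
                self.opened_at = self.clock()
            raise
        self.failures = 0
        self.opened_at = None  # a success closes the circuit
        return result


def retry(func, attempts=4, base_delay=0.1, max_delay=2.0, sleep=time.sleep):
    """Call `func` up to `attempts` times with exponential backoff and full jitter."""
    for attempt in range(attempts):
        try:
            return func()
        except CircuitOpenError:
            raise  # an open breaker means "stop trying", not "try again"
        except Exception:
            if attempt == attempts - 1:
                raise
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, delay))
```

Composed as `retry(lambda: breaker.call(fetch_status))`, where `fetch_status` is a hypothetical remote call, transient errors are retried, while a tripped breaker rejects calls immediately instead of adding load to a failing dependency.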

Security: network (keys, certificates and signing, HTTPS, firewall), identity management (transfer, verification, storage), access management (role-based access, OAuth 2.0, JWT tokens, token verification), vulnerabilities (SQL injection, XSS, XSRF), encryption (symmetric, public key, hashing, digital signatures and certificates), safety regulations within the company
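Token verification for JWTs signed with HS256 (HMAC-SHA256) can be illustrated with the Python standard library alone. This is a sketch for understanding the mechanism, not a replacement for an audited library such as PyJWT; `sign_hs256` and `verify_hs256` are hypothetical helper names, and only the `exp` claim is checked:

```python
import base64
import hashlib
import hmac
import json
import time


def _b64url_decode(segment: str) -> bytes:
    # JWT segments are base64url without padding; restore it before decoding.
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


def _b64url_encode(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()


def sign_hs256(claims: dict, secret: bytes) -> str:
    header = _b64url_encode(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    payload = _b64url_encode(json.dumps(claims).encode())
    sig = hmac.new(secret, f"{header}.{payload}".encode(), hashlib.sha256).digest()
    return f"{header}.{payload}.{_b64url_encode(sig)}"


def verify_hs256(token: str, secret: bytes, now=None) -> dict:
    """Return the claims if the signature and `exp` are valid, else raise ValueError."""
    try:
        header_b64, payload_b64, sig_b64 = token.split(".")
    except ValueError:
        raise ValueError("malformed token") from None
    header = json.loads(_b64url_decode(header_b64))
    # Pin the algorithm; never let the token header choose it (e.g. "none").
    if header.get("alg") != "HS256":
        raise ValueError("unexpected algorithm")
    expected = hmac.new(secret, f"{header_b64}.{payload_b64}".encode(), hashlib.sha256).digest()
    # Constant-time comparison avoids leaking the signature through timing.
    if not hmac.compare_digest(expected, _b64url_decode(sig_b64)):
        raise ValueError("bad signature")
    claims = json.loads(_b64url_decode(payload_b64))
    current = now if now is not None else time.time()
    if "exp" in claims and current >= claims["exp"]:
        raise ValueError("token expired")
    return claims
```

Two details matter most: the verifier pins the algorithm instead of taking it from the token header, and the signature is compared with `hmac.compare_digest` in constant time.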

Deployment: application, infrastructure, operations, VM, docker, k8s, rolling upgrades, blue-green deployments, recreate deployments, canary deployments.

Technology stack: CDN, HTTP cache (session, static, dynamic), logs, data search, reports, data storage.

Web containers (Apache, nginx, Node.js), object caching (Memcached, Redis), asynchronous messaging (Redis, RabbitMQ, Kafka), service mesh (Istio), datastores (MariaDB, PostgreSQL, MongoDB, HBase, Cassandra), analytics (Logstash, Fluentd, Hadoop HDFS, Apache Spark, Elasticsearch, Kafka, Storm, Flink)


Software Development Lifecycle Automation

1.1 Apply concepts required to automate a CI/CD pipeline
· Set up repositories
· Set up build services
· Integrate automated testing (e.g., unit tests, integrity tests)
· Set up deployment products/services
· Orchestrate multiple pipeline stages

1.2 Determine source control strategies and how to implement them
· Determine a workflow for integrating code changes from multiple contributors
· Assess security requirements and recommend code repository access design
· Reconcile running application versions to repository versions (tags)
· Differentiate between source control types

1.3 Apply concepts required to automate and integrate testing
· Run integration tests as part of the code merge process
· Run load/stress testing and benchmark applications at scale
· Measure application health based on application exit codes (robust Health Check)
· Automate unit tests to check pass/fail, code coverage
  o CodePipeline, CodeBuild, etc.
· Integrate tests with pipeline

1.4 Apply concepts required to build and manage artifacts securely
· Distinguish storage options based on artifact security classification
· Translate application requirements into Operating System and package configuration (build specs)
· Determine the code/environment dependencies and required resources
  o Example: CodeDeploy AppSpec, CodeBuild buildspec
· Run a code build process

1.5 Determine deployment/delivery strategies (e.g., A/B, Blue/green, Canary, Red/black) and how to implement them using AWS services
· Determine the correct delivery strategy based on business needs
· Critique existing deployment strategies and suggest improvements
· Recommend DNS/routing strategies (e.g., Route 53, ELB, ALB, load balancer) based on business continuity goals
· Verify deployment success/failure and automate rollbacks
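Verifying a canary and deciding on rollback can be reduced to comparing canary metrics against the baseline once enough traffic has been observed. A minimal sketch, assuming request, error, and p99 latency figures are already collected (for example from CloudWatch); `Metrics`, `canary_decision`, and the thresholds are hypothetical, and in AWS a comparable effect is often achieved with CodeDeploy automatic rollback on CloudWatch alarms:

```python
from dataclasses import dataclass


@dataclass
class Metrics:
    requests: int
    errors: int
    p99_latency_ms: float

    @property
    def error_rate(self) -> float:
        return self.errors / self.requests if self.requests else 0.0


def canary_decision(baseline: Metrics, canary: Metrics,
                    min_requests=500, max_error_delta=0.01, max_latency_ratio=1.2) -> str:
    """Return 'wait', 'promote' or 'rollback' by comparing the canary to the baseline."""
    if canary.requests < min_requests:
        return "wait"  # not enough traffic yet for a meaningful comparison
    if canary.error_rate - baseline.error_rate > max_error_delta:
        return "rollback"  # canary fails noticeably more often than the baseline
    if baseline.p99_latency_ms and canary.p99_latency_ms > baseline.p99_latency_ms * max_latency_ratio:
        return "rollback"  # canary is noticeably slower at the tail
    return "promote"
```

Returning an explicit `"wait"` keeps a canary with too little traffic from being promoted or rolled back on noise.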

Configuration Management and Infrastructure as Code

2.1 Determine deployment services based on deployment needs
· Demonstrate knowledge of process flows of deployment models
· Given a specific deployment model, classify and implement relevant AWS services to meet requirements
  o Given the requirement to have DynamoDB, choose CloudFormation instead of OpsWorks
  o Determine what to do with rolling updates

2.2 Determine application and infrastructure deployment models based on business needs
· Balance different considerations (cost, availability, time to recovery) based on business requirements to choose the best deployment model
· Determine a deployment model given specific AWS services
· Analyze risks associated with deployment models and relevant remedies

2.3 Apply security concepts in the automation of resource provisioning
· Choose the best automation tool given requirements
· Demonstrate knowledge of security best practices for resource provisioning (e.g., encrypting data bags, generating credentials on the fly)
· Review IAM policies and assess if sufficient but least privilege is granted for all lifecycle stages of a deployment (e.g., create, update, promote)
· Review credential management solutions (e.g., EC2 Parameter Store, third party)
· Build the automation
  o CloudFormation templates, Chef recipes, cookbooks, CodePipeline, etc.

2.4 Determine how to implement lifecycle hooks on a deployment
· Determine appropriate integration techniques to meet project requirements
· Choose the appropriate hook solution (e.g., implement leader node selection after a node failure) in an Auto Scaling group
· Evaluate hook implementation for failure impacts (e.g., a remote call fails, or a dependent service such as Amazon S3 is temporarily unavailable) and recommend resiliency improvements
· Evaluate deployment rollout procedures for failure impacts and evaluate rollback/recovery processes

2.5 Apply concepts required to manage systems using AWS configuration management tools and services
· Identify pros and cons of AWS configuration management tools
· Demonstrate knowledge of configuration management components
· Show the ability to run configuration management services end to end with no assistance while adhering to industry best practices

Monitoring and Logging

3.1 Determine how to set up the aggregation, storage, and analysis of logs and metrics
· Implement and configure distributed log collection and processing (e.g., agents, syslog, flumed, CW agent)
· Aggregate logs (e.g., Amazon S3, CW Logs, intermediate systems (EMR), Kinesis FH – Transformation, ELK/BI)
· Implement custom CW metrics, Log subscription filters
· Manage Log storage lifecycle (e.g., CW to S3, S3 lifecycle, S3 events)

3.2 Apply concepts required to automate monitoring and event management of an environment
· Parse logs (e.g., Amazon S3 data events/event logs/ELB/ALB/CF access logs) and correlate with other alarms/events (e.g., CW events to AWS Lambda) and take appropriate action
· Use CloudTrail/VPC flow logs for detective control (e.g., CT, CW log filters, Athena, NACL or WAF rules) and take dependent actions (AWS Step Functions) based on error handling logic (state machine)
· Configure and implement Patch/inventory/state management using ESM (SSM), Inspector, CodeDeploy, OpsWorks, and CW agents
  o EC2 retirement/maintenance
· Handle scaling/failover events (e.g., ASG, DB HA, route table/DNS update, Application Config, Auto Recovery, Personal Health Dashboard, Trusted Advisor)
· Determine how to automate the creation of monitoring

3.3 Apply concepts required to audit, log, and monitor operating systems, infrastructures, and applications
· Monitor end to end service metrics (DDB/S3) using available AWS tools (X-Ray with EB and Lambda)
· Verify environment/OS state through auditing (Inspector), Config rules, CloudTrail (process and action), and AWS APIs
· Enable, configure, and analyze custom metrics (e.g., Application metrics, memory, KCL/KPL) and take action
· Ensure container monitoring (e.g., task state, placement, logging, port mapping, LB)
· Distinguish between services that enable service level or OS level monitoring
  o Example: AWS services that use OS agents (e.g., Inspector, SSM)

3.4 Determine how to implement tagging and other metadata strategies
· Segregate authority based on tagging (lifecycle stages – dev/prod) with Condition context keys
· Utilize Amazon S3 system/user-defined metadata for classification and automation
· Design and implement tag-based deployment groups with CodeDeploy
· Best practice for cost allocation/optimization with tagging
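Log subscription filters (3.1) that target Lambda deliver log events as a base64-encoded, gzip-compressed JSON document under `awslogs.data`. A minimal handler sketch that decodes the payload and flags matching lines; the `"ERROR"` rule and the printed output format are hypothetical, and a real function would typically publish a custom metric or trigger an action instead:

```python
import base64
import gzip
import json


def decode_subscription_event(event: dict) -> dict:
    """Decode the base64-encoded, gzip-compressed payload that CloudWatch Logs
    delivers to a Lambda subscription."""
    compressed = base64.b64decode(event["awslogs"]["data"])
    return json.loads(gzip.decompress(compressed))


def handler(event, context):
    payload = decode_subscription_event(event)
    if payload.get("messageType") != "DATA_MESSAGE":
        return {"errors": 0}  # CONTROL_MESSAGE only checks that the destination is reachable
    # Hypothetical rule: treat any line containing "ERROR" as an error event.
    errors = [e for e in payload["logEvents"] if "ERROR" in e["message"]]
    for e in errors:
        print(f'{payload["logGroup"]} {e["timestamp"]}: {e["message"]}')
    return {"errors": len(errors)}
```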

Policies and Standards Automation

4.1 Apply concepts required to enforce standards for logging, metrics, monitoring, testing, and security
· Detect, report, and respond to governance and security violations
· Apply logging standards across application, operating system, and infrastructure
· Apply context-specific application health and performance monitoring
· Outline standards for delivery models for logs and metrics (e.g., JSON, XML, Data Normalization)

4.2 Determine how to optimize cost through automation
· Prioritize automation effort to reduce labor costs
· Implement right sizing of workload based on metrics
· Assess ways to improve time to market through automating process orchestration and repeatable tasks
· Diagnose outliers to determine use case fit
  o Example: Configuration drift
· Measure and automate cost optimization through events
  o Example: Trusted Advisor

4.3 Apply concepts required to implement governance strategies
· Generalize governance standards across CI/CD pipeline
· Outline and measure the real-time status of compliance with governance strategies
· Report on compliance with governance strategies
· Deploy governance policies related to self-service capabilities
  o Example: Service Catalog, CFN Nag
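Configuration drift (4.2) comes down to comparing the declared state (for example, what a template or playbook defines) with the state actually observed. AWS Config and CloudFormation drift detection do this for AWS resources; the sketch below only illustrates the comparison on plain nested dictionaries, is not how those services are implemented, and `detect_drift` is a hypothetical name:

```python
def detect_drift(desired: dict, actual: dict, prefix: str = "") -> list:
    """Recursively compare desired vs actual configuration and list the differences."""
    drift = []
    for key in sorted(desired.keys() | actual.keys()):
        path = f"{prefix}{key}"
        if key not in actual:
            drift.append(f"missing: {path}")  # declared but not present
        elif key not in desired:
            drift.append(f"unexpected: {path}")  # present but never declared
        elif isinstance(desired[key], dict) and isinstance(actual[key], dict):
            drift.extend(detect_drift(desired[key], actual[key], prefix=f"{path}."))
        elif desired[key] != actual[key]:
            drift.append(f"changed: {path} {desired[key]!r} -> {actual[key]!r}")
    return drift
```

An empty list means no drift; a non-empty one is the kind of outlier that would feed an alert or an automated remediation.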

Incident and Event Response

5.1 Troubleshoot issues and determine how to restore operations
· Given an issue, evaluate how to narrow down the unhealthy components as quickly as possible
· Given an increase in load, determine what steps to take to mitigate the impact
· Determine the causes and impacts of a failure
  o Example: Deployment, operations
· Determine the best way to restore operations after a failure occurs
· Investigate and correlate logged events with application components
  o Example: application source code

5.2 Determine how to automate event management and alerting
· Set up automated restores from backup in the event of a catastrophic failure
· Set up methods to deliver alerts and notifications that are appropriate for different types of events
· Assess the quality/actionability of alerts
· Configure metrics appropriate to an application’s SLAs
· Proactively update limits

5.3 Apply concepts required to implement automated healing
· Set up the correct scaling strategy to enable auto-healing when a failure occurs (e.g., with Auto Scaling policies)
· Use the correct rollback strategy to avoid impact from failed deployments
· Configure Route 53 to ensure cross-Region failover
· Detect and respond to maintenance or Spot termination events

5.4 Apply concepts required to set up event-driven automated actions
· Configure Lambda functions or CloudWatch actions to implement automated actions
· Set up CloudWatch event rules and/or Config rules and targets
· Use AWS Systems Manager or Step Functions to coordinate components (e.g., Lambda, use maintenance windows)
· Configure a build/roll-out process to automatically respond to critical software updates

High Availability, Fault Tolerance, and Disaster Recovery

6.1 Determine appropriate use of multi-AZ versus multi-Region architectures
· Determine deployment strategy based on HA/DR requirements
· Determine data replication strategy based on cost and durability requirements
· Determine infrastructure, platform, and services based on HA/DR requirements
· Design for HA/FT/DR based on service availability (i.e., global/regional/single AZ)

6.2 Determine how to implement high availability, scalability, and fault tolerance
· Design deployment strategy to support HA/FT/scalability
· Assess statefulness of application infrastructure components
· Use load balancing to distribute traffic across multiple AZ/ASGs/instance types (spot/M4 vs C4)/targets
· Use appropriate caching solutions to improve availability and performance

6.3 Determine the right services based on business needs (e.g., RTO/RPO, cost)
· Determine cost-effective storage solution for your application
  o Example: tiered, archival, EBS type, hot/cold
· Choose a database platform and configuration to meet business requirements
· Choose a cost-effective compute platform based on business requirements
  o Example: Spot
· Choose a deployment service/model based on business requirements
  o Example: CodeDeploy, Blue/Green deployment
· Determine when to use managed service vs. self-managed infrastructure (Docker on EC2 vs. ECS)

6.4 Determine how to design and automate disaster recovery strategies
· Automate failure detection
· Automate components/environment recovery
· Choose appropriate deployment strategy for environment recovery
· Design automation to support failover in hybrid environment

6.5 Evaluate a deployment for points of failure
· Determine appropriate deployment-specific health checks
· Implement failure detection during deployment
· Implement failure event handling/response
· Ensure that resources/components/processes exist to react to failures during deployment
· Look for exit codes on each event of the deployment
· Map errors to different points of deployment
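Checking exit codes on each deployment event and mapping errors to deployment points (6.5) can be sketched as a runner that executes stages in order and stops at the first non-zero exit code. `run_pipeline` is a hypothetical name, and the stage names and commands are placeholders; in practice each `argv` would be a real build, deploy, or health-check script:

```python
import subprocess
import sys


def run_pipeline(stages):
    """Run (name, argv) stages in order and stop at the first non-zero exit code.

    Returns (failed_stage_or_None, [(stage_name, exit_code), ...]) so that an
    error can be mapped to the exact point of the deployment where it occurred.
    """
    results = []
    for name, argv in stages:
        proc = subprocess.run(argv, capture_output=True, text=True)
        results.append((name, proc.returncode))
        if proc.returncode != 0:
            print(f"stage '{name}' failed with exit code {proc.returncode}: "
                  f"{proc.stderr.strip()}", file=sys.stderr)
            return name, results
    return None, results
```

The returned stage name tells the surrounding automation which point of the deployment failed, which is the input a rollback or alert needs.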

We use:

- name: virtualbox
- name: proxmox
- name: openstack
- name: vmware

- name: k3s
- name: k8s
- name: helm
- name: istio

- name: hashicorp-vagrant
- name: hashicorp-packer
- name: bash
- name: ansible
- name: hashicorp-terraform
- name: rancher
- name: kubespray
- name: moxaba
- name: jenkins
- name: gitea
- name: gitlab

- name: choco

- name: openproject
- name: nextcloud
- name: glpi

- name: BookStack
- name: MkDocs
- name: hugo
- name: jira
- name: Wiki_js

- name: crowdsec
- name: conjur
- name: hashicorp-vault
- name: keepass
- name: opnsense
- name: Rootkit_Hunter
- name: wazuh
- name: IPBan
- name: OpenVAS

- name: cassandra
- name: mariadb
- name: mongodb
- name: postgresql
- name: hbase
- name: hadoophdfs
- name: kafka
- name: rabbitmq
- name: redis
- name: apachespark
- name: memcache

- name: cerf
- name: minio
- name: longhorn
- name: nexus
- name: nfs

- name: elk
- name: fluentd
- name: monit
- name: loki
- name: vector
- name: zabbix

- name: freefilesync
- name: automysqlbackup
- name: autopostgresqlbackup
- name: bacula
- name: stash
- name: velero
- name: urbackup

- name: dns
- name: guacamole
- name: hashicorp-consul
- name: Headlamp
- name: MikoPBX
- name: nginx
- name: pfSense
- name: portableaps
- name: r-studio
- name: UFS_Explorer_Professional_Recovery
- name: Wireproxy
- name: ntopng