Summary:
Experienced SRE with many years of designing and implementing HA systems
in both bare-metal and cloud environments. Excellent ability to understand
how complex systems fail.
Skills:
Systems
- 15+ years of experience in Unix/Linux system administration
- Linux Distros: Debian, CentOS, Fedora, Ubuntu
- Packages: Apache, PHP, postfix, ldap, MySQL, lvm, kvm
- Scripting Languages: python, bash, sh, sed, awk
- Provisioning tools: Ansible, Puppet, Chef, Terraform
Network
- Application Protocols: LDAP, HTTP(S), SSH, SSL/TLS, SMTP
- Service Protocols: DNS, NFS, Samba, DHCP, etc.
- Security Protocols: IPSec, X.509
- IPv4 routing and diagnostics, VRRP, and other layer 2 protocols
- Familiarity with IPv6 protocols and implementation
Programming
- Languages: Python, Terraform, Java, SQL, Go, C, & C++
- Source control: git, subversion, perforce
Database administration
Cloud
- Kubernetes administration
- Backblaze B2 storage
- AWS EC2, S3
- VMware vCenter and ESXi administration
Experience:
September 2022 to present: Sr. Site Reliability Engineer at Backblaze in San Mateo, CA
- Decreased human intervention and increased data durability by developing automated filesystem maintenance tools in Python
- Automated systems deployment using Ansible
- Automated backups to both AWS S3 and Backblaze B2 using restic
- Managed incidents while on-call
April 2017 to July 2022: Senior Systems Engineer, Shutterfly, Redwood City, CA
- Set up and maintained an HA redundant LDAP cluster distributed across multiple sites
- Automated backup and restore of etcd datastores in a Kubernetes cluster
- Helped maintain Kubernetes clusters in production and pre-prod
- Implemented pod-restart detection in Kubernetes
- Optimized ECS microservice service definitions in AWS
- Assisted in migrating an Apache reverse proxy cluster from an on-prem physical cluster to an AWS ECS service
September 2015 to March 2017: Systems Engineer, Facebook, Menlo Park, CA
- Maintained an 8-server distributed IRC cluster
- Developed a system to synchronize VMware VM inventory information with an internal accounting system
- Automated several processes using Python and the pyvmomi API
April 2011 to September 2015: Systems Engineer, Shutterfly Inc., Redwood City, CA
- Implemented redundant subversion repository for configuration management
- Performance analysis and capacity planning for several clusters
- Managed several MongoDB replica sets
- Managed a multi-datacenter Cassandra cluster
- Automated and parallelized a fulfillment processing system using mod_perl and Apache
July 2010 to February 2011: Sysadmin and Operations Engineer, Rapleaf Inc., San Francisco, CA
- Set up a multi-primary failover pair of LDAP servers
- Implemented trial high-availability Kerberos 5 authentication servers
- Automated switch configuration management
- Set up a multi-primary HA MySQL pair using keepalived for automated failover
- Administered an Amanda backup system writing to an LTO5 tape library
- Documented the proper use of GPG-based file encryption for the support staff
- Administered a 200+ node Hadoop cluster
March 2009 to June 2010: Independent Consultant and Developer
- Coordinated development of a website for a small non-profit.
- Set up back-end services for that website and their office (email, mailing lists, database, WordPress install and configuration).
- Developed an embedded system and PCB for an educational toy.
- Worked on several educational websites.
September 2005 to February 2009: Site Reliability Engineer, Google, Inc., Mountain View, CA
- Responsible for production HR, productivity, project and bug tracking systems.
- Implemented replicable installs for all supported applications.
- Implemented and advised on various monitoring systems.
- Debugged failures in production systems.
- Implemented trans-continental and inter-continental database replication and failover.
- Developed tools for diagnosing and repairing data problems in production systems.
Education:
California State University, Hayward
Master of Science degree in Computer Science
Coursework in: Cryptography, Network Security, Distributed Systems, Database Theory, Operating Systems, Compression, Compilers, Artificial Intelligence, etc.
University of Wyoming, Laramie, Wyoming
3 semesters of graduate study pursuing a Ph.D. in Astrophysics
Humboldt State University, Arcata, California
Bachelor of Science degree in Physics
Coursework in: Astronomy, Physics, Math & Computers
References available on request