About Tigo
Tigo is the worldwide leader in Flex MLPE (Module Level Power Electronics) with innovative solutions that significantly enhance safety, increase energy production, and decrease operating costs of photovoltaic (PV) systems. Tigo’s TS4 platform maximizes the benefit of PV systems and provides customers with the most scalable, versatile, and reliable MLPE solution available.
Tigo was founded in Silicon Valley in 2007 to accelerate the adoption of solar energy worldwide. Tigo systems operate on seven continents and produce gigawatt-hours of reliable, clean, affordable, and safe solar energy daily.
We need top-notch individuals with a passion for solving complex problems and bringing renewable energy to the masses. Members of the team enjoy rewarding salaries, excellent benefits, an uninhibited work culture, and the satisfaction of helping to reduce the world’s dependency on fossil fuels. We work hard knowing our results will impact the affordability, reliability, and safety of clean and renewable energy systems.
Job Description
Help power the future of energy intelligence by building and running the infrastructure behind Tigo’s AI/ML forecasting (Predict⁺) and global solar monitoring platform (Tigo EI). We’re still a relatively small global team, operating with a startup mindset: fast decisions, high ownership, and tight collaboration across functions.
What you’ll work on
You’ll be part of a small, hands-on DevOps team that owns the infrastructure behind:
- Predict⁺ – AI/ML-based forecasting for energy generation and consumption.
- Tigo Energy Intelligence (EI) – a monitoring and analytics platform used in 100+ countries.
We run a hybrid environment: Azure cloud plus bare metal (Hetzner + on-prem server rooms), with Talos Kubernetes, modern observability, and GitOps.
Role overview
As a mid–senior DevOps Engineer, you will:
- Run and evolve our Kubernetes platforms (Talos on bare metal, AKS on Azure).
- Own Infrastructure as Code and GitOps-driven delivery.
- Improve reliability, observability (OTEL), security, and cost efficiency of production systems.
- Work in a fully remote, global, highly asynchronous team with a lot of autonomy and ownership.
Responsibilities
Kubernetes & infrastructure
- Operate and improve Talos-based bare-metal clusters and Azure AKS.
- Handle deployments, upgrades, scaling, backup/restore, and troubleshooting.
- Contribute to security hardening (RBAC, network policies, image/secret hygiene).
Cloud & bare metal (Azure + Hetzner + on-prem)
- Help manage Azure subscriptions, networking, identity, and security baselines.
- Work with bare-metal servers (Hetzner + on-prem) and connectivity between them and Azure (VPNs, routing).
- Support capacity planning and cost-aware designs.
IaC, automation & delivery
- Use Terraform as the source of truth for infrastructure.
- Use Ansible for configuration and repeatable provisioning.
- Build and maintain CI/CD pipelines with self-hosted GitLab CI and Azure DevOps.
- Implement and operate GitOps with Argo CD and self-managed GitLab.
Observability, data & incident response
- Extend and maintain Prometheus, Grafana, Zabbix and OpenTelemetry (OTEL) for metrics, logs, and traces.
- Work with self-hosted PostgreSQL and ClickHouse, plus Kafka, Redis, and pub/sub and advanced queuing systems.
- Help define alerts, SLOs, and runbooks; participate in on-call and post-incident reviews.
Security & compliance
- Apply DevOps practices aligned with ISO 27001 (access, logging, change management, backups).
- Contribute to secrets management, least-privilege access, and image/infra hardening.
Requirements
You don’t need every single item, but you should recognize yourself in most of these:
- Experience with Azure (subscriptions, basic networking, identity, security concepts).
- 4+ years in DevOps / SRE / Infrastructure roles with real production ownership.
- Strong hands-on experience with Kubernetes (self-hosted and/or AKS).
- Solid experience with Terraform and Ansible.
- Experience building and running CI/CD with GitLab CI and/or Azure DevOps.
- Strong Linux fundamentals and troubleshooting skills.
- Good understanding of networking (TCP/IP, DNS, VPNs, load balancers, firewalls).
- Exposure to Prometheus / Grafana / Zabbix or similar monitoring tools.
- Scripting in Bash and/or Python.
- Excellent communication skills and very strong written and spoken English.
- Comfortable working in a fully remote, asynchronous, global team and owning work with minimal hand-holding.
- Willingness to participate in a production on-call rotation.
Nice to have
- Running globally distributed, customer-facing SaaS or energy/IoT systems.
- Production use of OpenTelemetry (OTEL) and modern observability stacks.
- Hands-on with Talos Kubernetes and hybrid Azure + bare-metal setups.
- GitOps at scale with Argo CD and self-managed GitLab (SCM, runners, registry).
- Operating ClickHouse at scale (backup/restore, performance, retention).
- Keycloak / OIDC / SAML, ELK/Wazuh, SAST/DAST, or other security tooling.
- Background in bare-metal operations (servers, storage, virtualization, networking).
Why you’ll enjoy working here
- Impact: Your work directly supports platforms that monitor, optimize, and forecast clean energy production worldwide.
- Environment: Public company stability (NASDAQ: TYGO) with small-team, startup-style execution and ownership.
- Stack: Modern tools (Kubernetes, Talos, Terraform, GitOps, OTEL, Kafka, ClickHouse, Redis) with room to shape direction.