How I Resolved Terraform State Drift After Manual Changes

Fixing Terraform State Drift Caused by Manual Changes: A Step-by-Step Solution

Introduction:

Terraform is one of the most popular Infrastructure-as-Code (IaC) tools for automating infrastructure management. It keeps infrastructure definitions consistent and under version control. However, Terraform relies heavily on its state file to map the resources in your configuration to real objects and to record their last-known attributes. When infrastructure is changed outside Terraform (for example, by hand in a cloud console), the real resources stop matching what the state and configuration describe. This mismatch is known as state drift.

In this post, I’ll walk you through an issue I encountered where manual changes to the infrastructure caused Terraform to show differences between the state file and the actual resources. I'll explain how I identified the cause, synchronized the state, and implemented best practices to avoid similar issues in the future.

Issue I Faced:

I was using Terraform to manage the infrastructure for a project, and everything was working smoothly. However, after making a few manual changes to the infrastructure (outside of Terraform), I ran terraform plan and noticed significant discrepancies between the state file and the actual resources.

Terraform reported that certain resources no longer matched what it expected, and it planned to modify or even destroy resources I had already adjusted by hand. This was a clear sign that Terraform’s view of the infrastructure (its state and configuration) was no longer in sync with reality.

What Wasn’t Obvious:

At first, I couldn’t pinpoint exactly where the discrepancies were coming from. I knew I had made some manual changes to the infrastructure, but Terraform’s output didn’t make it obvious which specific resources had drifted because of those changes.

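Terraform relies on the state file to track the infrastructure it manages, so any change made outside Terraform creates a discrepancy. By default, terraform plan refreshes the resources already recorded in state, so it can detect attributes that were changed by hand. It cannot, however, discover resources that were created outside Terraform: those stay invisible until you import them. Detecting a change is also not the same as accepting it. Unless the configuration is updated to match, Terraform will propose reverting the manual change.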
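
When a plan is long, it can help to pull out just the per-resource summary lines. The helper below is a minimal sketch and assumes the current human-readable plan format. In that format, each affected resource is introduced by a line indented two spaces that starts with "# ", for example "# aws_instance.my_instance will be updated in-place". That format is not a stable interface, so treat this as a reading aid rather than an automation tool. The name summarize_plan is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: list the per-resource summary lines from human-readable
# `terraform plan` output.
# ASSUMPTION: each resource entry starts with a line indented by exactly
# two spaces followed by "# ", e.g.
#   "  # aws_instance.my_instance will be updated in-place".
# Deeper-indented "# (N unchanged attributes hidden)" lines are skipped.

# Read plan text on stdin; print one summary line per resource.
summarize_plan() {
  grep -E '^  # ' | sed -E 's/^  # //'
}

# Example usage (needs terraform and an initialized working directory):
#   terraform plan -no-color | summarize_plan
```

For scripted checks, a saved plan rendered with terraform show -json is a more robust input than the text output.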

Troubleshooting Process:

  1. Ran terraform plan to Identify Differences:

    The first step I took was running terraform plan to see what Terraform thought needed to be changed. The plan showed that Terraform wanted to destroy or modify several resources, as it detected discrepancies between the state and the actual infrastructure.

     terraform plan
    

    The output confirmed that there were resources that didn’t match the expected configuration, and I suspected these were the resources that had been manually modified.

  2. Checked the Terraform State File:

    After identifying the differences, I checked the Terraform state file to see if it reflected the actual state of the resources. Since I had made manual changes, Terraform no longer had an accurate understanding of the current infrastructure. The state file was outdated and needed to be synchronized.

  3. Used terraform refresh to Synchronize the State:

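    I ran terraform refresh to try to update the state file with the actual resource state. On current Terraform versions this command is deprecated (since v0.15.4) in favor of terraform apply -refresh-only. That command performs the same update but shows the detected changes and asks for confirmation first. The refresh reads the current attributes of every resource already recorded in state and updates the state file accordingly: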

     terraform refresh
    

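    This helped update the state file for some of the resources, but not all of them. A refresh only updates resources that are already recorded in state, so resources that had been created manually or were otherwise unmanaged by Terraform were still missing. A refresh also leaves the configuration untouched. Wherever the .tf files still describe the old values, terraform plan keeps proposing to revert the manual change until the configuration is updated to match (or the change is reverted).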

  4. Used terraform import for Missing Resources:

    To bring the missing resources into the Terraform state, I used the terraform import command. It associates an existing, manually created resource with a resource address in the configuration, so a matching resource block must already exist in the .tf files before you run it.

    Here’s an example of how I imported a resource:

     terraform import aws_instance.my_instance i-1234567890abcdef0
    

    After importing the missing resources, the state file was brought into sync with the actual infrastructure.

  5. Implemented Remote State Backends (S3 & DynamoDB):

    To reduce state problems in the future, I decided to move the state to a remote backend, using AWS S3 for storage and DynamoDB for state locking. Storing the state remotely provides a single shared source of truth, especially in team environments. Everyone works with the same state file instead of separate local copies that can fall out of date.

    I configured the S3 backend and DynamoDB table for state locking as follows:

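     terraform {
       backend "s3" {
         bucket         = "my-terraform-state"      # existing S3 bucket that stores the state
         key            = "state/terraform.tfstate" # object path of the state file in the bucket
         region         = "us-west-2"
         dynamodb_table = "my-terraform-lock"       # lock table (partition key: LockID)
         encrypt        = true                      # server-side encryption of the state object
       }
     }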
    

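    The remote backend keeps the state in one central place, and the DynamoDB lock prevents two Terraform runs from writing to it at the same time. That protects the state file from conflicting or stale updates. It does not stop someone from changing resources by hand, so avoiding manual changes still matters.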
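
For step 2, Terraform’s own state subcommands are safer than opening the raw .tfstate file. The sketch below wraps terraform state list and terraform state show. It also includes a small comparison that prints resource addresses you expect Terraform to manage but that are missing from state; those are the candidates for terraform import. The helper names, the TF override variable, and the expected-address list file are hypothetical additions for illustration.

```shell
#!/usr/bin/env bash
# Sketch: inspect Terraform state through the CLI and find resource
# addresses that are expected but not yet tracked in state.
# ASSUMPTIONS (hypothetical, for illustration): TF names the terraform
# binary, and the "expected" file is a hand-maintained list of addresses.

TF="${TF:-terraform}"

# Print every address in state, then the recorded attributes of one.
inspect_state() {
  "$TF" state list
  "$TF" state show "$1"
}

# Print addresses listed in expected_file but absent from state_file.
# Both files contain one resource address per line.
missing_from_state() {
  local expected_file="$1" state_file="$2"
  # -v: invert, -x: whole-line match, -F: fixed strings, -f: patterns file.
  grep -vxF -f "$state_file" "$expected_file" || true
}

# Example usage:
#   terraform state list > state.txt
#   missing_from_state expected.txt state.txt   # candidates for import
```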
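
For step 3, here is a sketch of the refresh-only workflow that replaces terraform refresh on Terraform v0.15.4 and later. The plan variant only reports drift. The apply variant shows the same changes and asks for confirmation before writing them to state. Neither one modifies the real resources. The function names and the TF variable are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: refresh-only workflow replacing the deprecated `terraform refresh`
# (assumes Terraform v0.15.4 or later). TF is a hypothetical override.

TF="${TF:-terraform}"

# Read-only: compare real resources with state and report the drift.
# Nothing is written to state or to the infrastructure.
preview_drift() {
  "$TF" plan -refresh-only
}

# Write the detected drift into state. Terraform shows the changes and
# asks for confirmation first; real resources are not modified.
accept_drift() {
  "$TF" apply -refresh-only
}

# Afterwards, update the .tf files to match the manual changes (or revert
# those changes); otherwise the next `terraform plan` proposes undoing them.
```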

Resolution:

  1. Synchronized the State with terraform refresh:

    The first step in fixing the issue was using terraform refresh (terraform apply -refresh-only on current Terraform versions) to record the real attributes of the existing resources in the state file. This resolved most of the discrepancies, but not all of them.

  2. Imported Missing Resources Using terraform import:

    I imported the manually created resources into the Terraform state to ensure that Terraform could track them going forward. This was crucial for synchronizing the state and preventing Terraform from trying to destroy or modify resources unnecessarily.

  3. Configured Remote State Backends:

    To keep the state consistent going forward, I configured Terraform to use AWS S3 for remote state storage and DynamoDB for state locking. This means every team member works from the same locked state file instead of a local copy that can go stale.
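
The CLI import in step 4 needs a matching resource block to exist before it runs. On Terraform v1.5 and later, an import block is a declarative alternative. In addition, terraform plan -generate-config-out=FILE can draft resource blocks for import targets that don’t have one yet. Below is a minimal sketch that writes an import block from the shell, reusing the example address and instance ID from step 4. The helper names and file names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: declarative import (assumes Terraform v1.5 or later).
# Helper names, file names, and the TF override are hypothetical.

TF="${TF:-terraform}"

# Write an import block that maps a resource address to an existing
# object ID. The unquoted EOF lets the shell expand the variables.
write_import_block() {
  local address="$1" id="$2" file="${3:-imports.tf}"
  cat > "$file" <<EOF
import {
  to = ${address}
  id = "${id}"
}
EOF
}

# Plan the import; Terraform drafts resource blocks for import targets
# that have none yet and writes them to the given file.
plan_import() {
  "$TF" plan -generate-config-out="${1:-generated.tf}"
}

# Example usage (values from the CLI import example above):
#   write_import_block aws_instance.my_instance i-1234567890abcdef0
#   plan_import    # review generated.tf, then run: terraform apply
```

Review the generated configuration before applying. The import is complete once the resource appears in state and a follow-up terraform plan shows no changes for it.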
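
The S3 backend expects the bucket and the lock table to exist before terraform init runs, and the DynamoDB lock table must use a string partition key named LockID. Below is a one-time bootstrap sketch using the AWS CLI, reusing the names from the backend block in step 5. Enabling bucket versioning is an optional extra that is not part of the setup described above. The function name and the AWS/TF override variables are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: one-time bootstrap for the S3 backend configured above.
# Names and region match that backend block. AWS and TF are hypothetical
# overrides for the CLI binaries.

AWS="${AWS:-aws}"
TF="${TF:-terraform}"

bootstrap_backend() {
  local bucket="my-terraform-state" table="my-terraform-lock" region="us-west-2"

  # Bucket for the state file; regions other than us-east-1 need an
  # explicit location constraint.
  "$AWS" s3api create-bucket --bucket "$bucket" --region "$region" \
    --create-bucket-configuration LocationConstraint="$region" || return

  # Optional extra: keep earlier state versions so they can be restored.
  "$AWS" s3api put-bucket-versioning --bucket "$bucket" \
    --versioning-configuration Status=Enabled || return

  # Lock table: the S3 backend expects a string partition key named LockID.
  "$AWS" dynamodb create-table --table-name "$table" --region "$region" \
    --attribute-definitions AttributeName=LockID,AttributeType=S \
    --key-schema AttributeName=LockID,KeyType=HASH \
    --billing-mode PAY_PER_REQUEST || return

  # Copy any existing local state into the new backend (prompts to confirm).
  "$TF" init -migrate-state
}
```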

Key Takeaways:

Here are the key lessons I learned from this experience:

  • Avoid Manual Changes to Infrastructure: One of the best practices in Terraform (and infrastructure automation in general) is to avoid making manual changes outside of the infrastructure code. Manual changes can lead to state drift, making it difficult for Terraform to manage resources accurately.

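  • Use terraform refresh to Sync the State: If you encounter state drift or discrepancies, use terraform refresh (or terraform apply -refresh-only, its replacement since Terraform v0.15.4) to synchronize the Terraform state file with the actual infrastructure. Then update the configuration to match, or the next apply will revert the manual change.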

  • Import Resources with terraform import: If Terraform is not aware of certain resources, use the terraform import command to bring them under Terraform management. Terraform then stops planning to create duplicates of them or to change them unnecessarily.

  • Implement Remote State Backends: Always use a remote backend (such as S3 with DynamoDB locking) for storing the Terraform state. This keeps the state file shared, locked, and consistent across your team or deployment environment. On its own it does not prevent drift from manual changes, but it removes stale or conflicting state copies as a source of problems.
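
Avoiding manual changes is easier to enforce when drift is also detected automatically. Below is a sketch of a check that could run on a schedule, for example as a nightly CI job (the scheduling itself is not shown). It is built on terraform plan -detailed-exitcode, which exits 0 when there are no changes, 1 on error, and 2 when changes are pending. Exit code 2 also covers configuration changes that simply haven’t been applied yet, so a failure means "investigate" rather than proof of a manual edit. The helper names and the TF variable are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: scheduled drift check for CI. Relies on the documented exit
# codes of `terraform plan -detailed-exitcode`:
#   0 = no changes, 1 = error, 2 = changes pending.
# Helper names and the TF override are hypothetical.

TF="${TF:-terraform}"

# Map a plan exit code to a short, log-friendly status.
describe_plan_exit_code() {
  case "$1" in
    0) echo "in sync" ;;
    2) echo "changes pending (possible drift)" ;;
    *) echo "plan failed" ;;
  esac
}

check_drift() {
  local code=0
  # plan refreshes by default; -input=false prevents interactive prompts.
  "$TF" plan -detailed-exitcode -input=false -no-color || code=$?
  describe_plan_exit_code "$code"
  # Propagate the code so the CI job fails on drift (2) or errors (1).
  return "$code"
}

# Example usage in a CI step:
#   check_drift
```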


Conclusion:

Terraform is a powerful tool for managing infrastructure as code, but manual changes can disrupt its ability to keep state consistent. By using commands like terraform refresh and terraform import, I was able to resolve the discrepancies between the state file and the infrastructure. Moving the state to a remote, locked backend also keeps the whole team working from one consistent state file. Combined with avoiding manual changes, that makes future drift much less likely.

If you’ve faced similar issues with state drift in Terraform or have additional strategies for maintaining state consistency, feel free to share your experiences in the comments. Let’s continue improving our Terraform workflows together!