Schedule Virtual Machine on Microsoft Azure with Data Factory

Victor Bonnet
Apr 22, 2022

Here are the main topics of the article:

  1. Create resources: Virtual machine, Storage Account, Data Factory, Key Vault
  2. Create secrets
  3. Build a pipeline in Data Factory and get Secrets from the Virtual machine
    3.1. Set Data Factory access
    3.2. Set virtual machine access
    3.3. Virtual Machine setup
    3.4. Build the Data Factory pipeline
  4. Schedule the pipeline

Prerequisite: An Azure subscription

1. Create resources

1.1. Resource group

Create a Resource group in which the resources will be saved, choosing a name and a region. Then try to keep the same region for all the resources created later on.

1.2. Storage account

Create a Storage account in the previously created Resource group.

Choose a name for it and, in the Advanced tab, optionally select the Enable hierarchical namespace option (this is not mandatory). The other parameters can be kept at their defaults.

Once the Storage account is created, click the Containers link in the left pane. Then add a new container by clicking the +Container button and give it a name; here we name it sandbox.
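If you prefer the command line, the Resource group, Storage account and container can also be created from the Cloud Shell. A minimal sketch, where the names (myresourcegroup, mystorageacct, sandbox) and the region are assumptions to replace with your own:

# create the resource group
az group create --name myresourcegroup --location westeurope
# create the storage account (hierarchical namespace is optional)
az storage account create --name mystorageacct --resource-group myresourcegroup --location westeurope --sku Standard_LRS --kind StorageV2 --enable-hierarchical-namespace true
# create the container
az storage container create --name sandbox --account-name mystorageacct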

1.3. Virtual Machine

Create a Virtual machine in the previously created Resource group.

  • Choose a name for it, here we name it scheduledvm
  • Select an Ubuntu image
  • As the Authentication type, select SSH public key. Choose a Username or keep the default one, azureuser. For the SSH public key source parameter, select Generate new key pair
  • In the Management tab, under Identity, select System assigned managed identity
  • The other parameters can be kept at their defaults
  • After clicking the Create button, when prompted, click Download private key and create resource to continue.

Save the key in a known location for future use

  • Once the Virtual machine is created and running, Stop it for now
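The same Virtual machine can also be created from the CLI. A rough sketch, assuming the names used above; note that the image alias depends on your CLI version and that the generated SSH key lands in ~/.ssh instead of being downloaded as a .pem file:

# create an Ubuntu VM with a system-assigned managed identity and a generated SSH key pair
az vm create --resource-group myresourcegroup --name scheduledvm --image Ubuntu2204 --admin-username azureuser --generate-ssh-keys --assign-identity
# stop (deallocate) it for now
az vm deallocate --resource-group myresourcegroup --name scheduledvm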

1.4. Data Factory

Create a Data Factory resource in the previously created Resource group with the default parameters: give it a name; Git configuration is not required.

1.5. Key Vault

Create a Key Vault resource in the previously created Resource group with the default parameters.
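Both resources can also be created from the CLI; the Data Factory command lives in a separate extension, so this is only a sketch with assumed names (mykeyvault123, mydatafactory123):

# create the key vault (the name must be globally unique)
az keyvault create --name mykeyvault123 --resource-group myresourcegroup --location westeurope
# create the data factory (requires the datafactory extension)
az extension add --name datafactory
az datafactory create --resource-group myresourcegroup --name mydatafactory123 --location westeurope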

2. Secrets

To grant access to the storage account from the Virtual machine, we can use a Shared Access Signature (SAS) token.

  • Select the Storage Account resource and under Security + networking on the left pane, click Shared access signature
  • Select all elements in Allowed resource types (Service, Container, Object) and all elements in Allowed permissions (at minimum, Read and Write, and possibly List, are required)
  • Choose the expiry date of your choice for the token
  • Click the Generate SAS and connection string button

The generated SAS token will appear below, among other values; select and copy it.

  • Go to Key Vault resource previously created
  • Select Secrets under Settings section on the left pane
  • Click +Generate/Import tab
  • Choose a name for the secret and paste the SAS token as the value
  • Click Create

Now go to the Storage account resource and select the previously created container.

  • Go to the container's Properties and copy the container URL that appears on the screen
  • Create a second secret in the Key Vault containing this URL
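The same two secrets can also be created from the Cloud Shell. A sketch, reusing the assumed names from above and hypothetical secret names sandbox-sas and sandbox-url:

# generate an account-level SAS token (adjust permissions and expiry to your needs; the account key is queried automatically if you have access to it)
SAS=$(az storage account generate-sas --account-name mystorageacct --services b --resource-types sco --permissions rwl --expiry 2023-12-31T00:00Z -o tsv)
# store the SAS token; the leading '?' is added so the value matches the token as copied from the portal
az keyvault secret set --vault-name mykeyvault123 --name sandbox-sas --value "?$SAS"
# store the container URL
az keyvault secret set --vault-name mykeyvault123 --name sandbox-url --value "https://mystorageacct.blob.core.windows.net/sandbox"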

At this point we have the two following secrets in the Key Vault: one containing the SAS token and one containing the container URL.

3. Build a pipeline in Data Factory and get Secrets from the Virtual Machine

The pipeline in Data Factory consists of three activities: one to start the Virtual machine, one to make it execute the script, and one to stop it.

3.1. Set Data Factory access

Allow the Data Factory access to the Virtual machine so it can start it, ask it to execute a script and then stop it:

  • Go to the Virtual machine resource
  • Select Access control (IAM)
  • Select +Add > Add role assignment
  • Select an appropriate role, for example Virtual Machine Contributor
  • On the Members tab, select Managed identity and add the Data Factory as a member
  • Click on the Review + assign button
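The same assignment can be done from the CLI. A sketch, where <adf-principal-id> stands for the Object (principal) ID of the Data Factory's system-assigned managed identity (visible on the Data Factory resource in the portal):

# give the Data Factory identity the Virtual Machine Contributor role, scoped to the VM only
az role assignment create --assignee <adf-principal-id> --role "Virtual Machine Contributor" --scope $(az vm show --resource-group myresourcegroup --name scheduledvm --query id -o tsv)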

3.2. Set virtual machine access

Here the Virtual machine needs to access the Key Vault.

To get the secrets saved in the Key Vault from the Virtual machine, we will use the Azure CLI, and to be able to use the Azure CLI we need to log in to the Azure account.

So we need to do two things:

a. Allow the Virtual machine to log in to the Azure account
b. Allow the Virtual machine to get secrets from the Key Vault

3.2.1. Allow the virtual machine to log in to the Azure account

Here we have to set the Virtual machine access at subscription level.

  • Select the subscription in the Azure account
  • In the left pane, click on Access control (IAM)
  • Click +Add and Add role assignment
  • In the Role tab, select Contributor role
  • In the Members tab, select Assign access to Managed identity and select the Virtual machine as a member
  • Select Review+assign
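A CLI sketch of the same assignment, using the VM's own managed identity and a placeholder subscription ID:

# object ID of the VM's system-assigned managed identity
VM_PRINCIPAL=$(az vm show --resource-group myresourcegroup --name scheduledvm --query identity.principalId -o tsv)
# Contributor role at subscription scope
az role assignment create --assignee $VM_PRINCIPAL --role Contributor --scope /subscriptions/<subscription-id>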

3.2.2. Allow virtual machine access to the Key Vault

Here we give the Virtual machine access to the Key Vault. This time we need to use Access policies.

  • Open the Key vault resource
  • Under Settings menu on the left pane, select Access Policies
  • Click on + Add Access Policy
  • Select at least the Get permission for Secrets
  • Select the Virtual Machine as principal
  • Click Add button
  • Don’t forget to Save for the new access policy to be taken into account
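Or from the CLI, a sketch with the assumed Key Vault name:

# object ID of the VM's system-assigned managed identity
VM_PRINCIPAL=$(az vm show --resource-group myresourcegroup --name scheduledvm --query identity.principalId -o tsv)
# allow it to read secrets from the Key Vault
az keyvault set-policy --name mykeyvault123 --object-id $VM_PRINCIPAL --secret-permissions get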

3.3. Virtual Machine setup

Log in to the Virtual machine using the Cloud Shell:

  • Open the Virtual machine resource previously created and start it
  • Once the Virtual machine is running, click on the Connect button and select SSH
  • Click on Cloud shell button on the top right menu in Azure portal
  • In the terminal that opens at the bottom of the screen, click Upload/Download files and upload the private key (.pem file) that was generated at Virtual machine creation
  • As indicated in the Virtual machine connection screen, execute chmod 400 (or chmod 600) on the key
  • Then connect to the Virtual machine with the ssh command shown on the Connect screen, replacing the <private key path> with the path to the downloaded key (note that your IP address will not be the same); see the sketch below
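A sketch of the connection command (the key file name and IP address are placeholders):

# connect to the VM with the downloaded private key
ssh -i ./scheduledvm_key.pem azureuser@<public-ip-address>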

Once logged in, execute the following commands:

# update the package list
sudo apt-get update
# install Azure-CLI
curl -sL https://aka.ms/InstallAzureCLIDeb | sudo bash
# install jq to extract JSON data
sudo apt-get install -y jq
# install azcopy to copy files to storage account
wget https://aka.ms/downloadazcopy-v10-linux
tar -xvf downloadazcopy-v10-linux
sudo rm -f /usr/bin/azcopy
sudo cp ./azcopy_linux_amd64_*/azcopy /usr/bin/
sudo chmod 755 /usr/bin/azcopy
rm -f downloadazcopy-v10-linux
rm -rf ./azcopy_linux_amd64_*/
  • Create a .sh file; here we name it mainScript.sh
  • Edit it with the editor of your choice (here we use the Vim editor) and copy into it the content described below. (I will not explain how to use Vim here; everything needed can be found on the internet, for example at https://vim.rtorr.com/)
  • Execute the following command: chmod 777 mainScript.sh
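Here is a minimal sketch of what mainScript.sh could contain; the Key Vault name and the secret names (sandbox-sas, sandbox-url) are the assumed ones from section 2, so adapt them to the names you actually chose:

#!/bin/bash
# log in with the VM's system-assigned managed identity (no prompt)
az login --identity
# retrieve the secrets from the Key Vault; jq extracts the value field from the JSON output
SAS_TOKEN=$(az keyvault secret show --vault-name mykeyvault123 --name sandbox-sas | jq -r '.value')
CONTAINER_URL=$(az keyvault secret show --vault-name mykeyvault123 --name sandbox-url | jq -r '.value')
# create a file named according to the current date and time
FILENAME="$(date +%Y%m%d_%H%M%S).txt"
echo "Generated on $(date)" > "$FILENAME"
# copy the file to the storage container; the stored SAS token is assumed to start with '?'
azcopy copy "$FILENAME" "${CONTAINER_URL}/${FILENAME}${SAS_TOKEN}"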

The script above does the following actions:

  • Logs in to the Azure account without prompts
  • Uses the Azure CLI to retrieve the secrets from the Key Vault
  • Creates a file named according to the current date and time
  • Saves the file in the Storage account using the SAS token

At this point, executing the command sh mainScript.sh should work and a file should appear in the Storage account.

3.4. Build the Data Factory pipeline

Open the Data Factory resource previously created and create a new pipeline:

3.4.1. Set pipeline parameters

To make things easier later, let's add some pipeline parameters, typically the subscription ID, the resource group name and the Virtual machine name:

Note that the secret URLs can be found in the Key Vault by clicking on the previously created secrets.

3.4.2. Web activity to start the virtual machine

Add a new Web activity and give it a name. Then set the parameters on the Settings tab as described below:

  • URL: click Add dynamic content and write the expression shown after this list (note that it depends on your pipeline parameter names)
  • Method: POST
  • Body: must not be blank; it can be anything
  • Authentication: System Assigned Managed Identity
  • Resource: https://management.azure.com
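For the URL, the dynamic content expression can look like the sketch below. The parameter names (SubscriptionID, ResourceGroup, VMName) are assumed pipeline parameters and the api-version is one valid value among several; adapt both to your setup:

@concat('https://management.azure.com/subscriptions/', pipeline().parameters.SubscriptionID,
    '/resourceGroups/', pipeline().parameters.ResourceGroup,
    '/providers/Microsoft.Compute/virtualMachines/', pipeline().parameters.VMName,
    '/start?api-version=2022-03-01')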

At this point, debugging the pipeline should start the Virtual machine.

3.4.3. Make the virtual machine execute a script

To do this, we need another Web activity.

Add a new Web activity and give it a name. Then set the parameters on the Settings tab as described below:

  • URL: click Add dynamic content and write the expression shown after the note below (again, it depends on your pipeline parameter names)
  • Method: POST
  • Body: this is what makes the Virtual machine execute the script as the azureuser user; see the sketch after the note below

(note that you might need to replace azureuser in the body below with the username chosen at Virtual machine creation)
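A sketch of what the two fields can contain, using the Run Command operation of the Azure management API; the same assumed pipeline parameters are used, the script path assumes mainScript.sh sits in azureuser's home directory, and Authentication/Resource are presumably set as in the previous activity:

URL (dynamic content):
@concat('https://management.azure.com/subscriptions/', pipeline().parameters.SubscriptionID,
    '/resourceGroups/', pipeline().parameters.ResourceGroup,
    '/providers/Microsoft.Compute/virtualMachines/', pipeline().parameters.VMName,
    '/runCommand?api-version=2022-03-01')

Body:
{
  "commandId": "RunShellScript",
  "script": [
    "cd /home/azureuser && sudo -u azureuser -H sh mainScript.sh"
  ]
}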

Link the two activities so that this activity runs on success of the StartVM activity.

At this point, running the pipeline should start the virtual machine and execute the script.

3.4.4. Stop the virtual machine

Copy/paste the previously created StartVM activity and rename it. The only change to be made is in the URL input on the Settings tab: the keyword start should be replaced by deallocate.

Link this last activity so that it runs after the script activity completes.

4. Schedule the pipeline

The last step in Data Factory is to schedule the pipeline to automate the script execution on the virtual machine.

For this we can use Data Factory triggers.

  • In the Data Factory left pane select Manage and then Triggers
  • Click on +New to create a new trigger with the desired parameters
  • Select the Start trigger on creation option
  • Click OK button to create
  • Go back to the pipeline and click Add trigger and then New/Edit
  • Select the previously created trigger and click OK
  • Ignore the next screen about parameters by clicking OK
  • Click Publish all in the top menu

Now the Data Factory pipeline should execute according to the trigger settings, and a file named after the pipeline execution date and time should appear in the storage account container.

Conclusion

This is one way to automate script execution on Microsoft Azure; there are many other ways to do it. This is a simple example, but it could be the starting point for more complex projects.

I hope you enjoyed the article and that it will be helpful in solving problems. If you liked it, please clap, and if you have any suggestions for improvement, feel free to comment.
