Tag Archives: python

Calling rsync with Python’s Subprocess module

I was recently trying to script multiple file transfers via rsync, but unfortunately, I was unable to control the file names.  I chose to use Python and issue commands to the OS to initiate each transfer.  Initially, everything was working great, but as soon as I encountered a space or parenthesis in a file name, the script blew up!

In this tutorial, I’m showing how to transfer a single file, but rsync is a very powerful tool capable of much more.  The principles discussed in this post can be adapted to other uses of rsync, but I’m considering general rsync usage to be out of scope here.  There is a very good online man page here: (http://linux.die.net/man/1/rsync).  I’ve chosen to initiate transfers one file at a time so I can easily have multiple connection streams running instead of a single connection stream that transmits all files in sequence.  It is much faster this way with my current ISP, as I suspect they shape my traffic.  Also note that these methods can be applied to scp file transfers as well.

We will start with a very basic rsync command to copy /tmp/test.txt from a remote location to my local pc.  Before starting, I’ve set up public key authentication between my home pc and the remote server.  I initiate the connection from my home pc, as I don’t care to put private keys in remote locations that could provide access to my home network.

/usr/bin/rsync -va myusername@host.domain.com:/tmp/test.txt /tmp/test.txt

This works very well, but what happens when the file has a space? With most commands, we can just wrap quotes around it, and it works.

rsync myusername@host.domain.com:'/tmp/with space.txt' '/tmp/with space.txt'
rsync: link_stat "/tmp/with" failed: No such file or directory (2)
rsync: link_stat "/home/myusername/space.txt" failed: No such file or directory (2)
rsync error: some files/attrs were not transferred (see previous errors) (code 23) at main.c(1637) [Receiver=3.1.0]

Unfortunately, in this case, the local shell strips the quotes before rsync runs, and the shell on the remote side then splits the path at the space.  The remote system therefore sees “/tmp/with space.txt” as two separate files, /tmp/with and $HOME/space.txt.  What we need to do for the remote location is both wrap it with quotes and escape it.  We could also double escape the filename, but I chose to keep things looking a little bit sane.

/usr/bin/rsync myusername@host.domain.com:'/tmp/with\ space.txt' '/tmp/with space.txt'
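
To see why one layer of quoting is not enough, the two rounds of shell parsing can be approximated with Python's shlex module.  This is only a sketch written for Python 3: shlex.split mimics POSIX word splitting and is not the actual shell on either machine, and two_pass_parse is a hypothetical helper name used for illustration.

```python
import shlex

def two_pass_parse(remote_arg):
    """Approximate what happens to one remote path argument.

    Pass 1 stands in for the local shell, pass 2 for the shell on the
    remote host. Returns (text after pass 1, words after pass 2).
    """
    after_local = shlex.split(remote_arg)[0]
    after_remote = shlex.split(after_local)
    return after_local, after_remote

# Quotes only: pass 1 strips them, pass 2 splits at the space.
quoted_only = two_pass_parse("'/tmp/with space.txt'")

# Quotes plus a backslash: the backslash survives pass 1 and
# protects the space during pass 2.
quoted_and_escaped = two_pass_parse(r"'/tmp/with\ space.txt'")

print(quoted_only)
print(quoted_and_escaped)
```

The first result ends up as two words, which matches the two link_stat errors above; the second stays a single file name.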

That is fine, but we need a good way to do this on the fly when we are given file names in bulk.  There are three key libraries I like to use when doing this:

  • subprocess – This is an extremely powerful library for spawning processes on the OS.
  • os.path – This submodule of os contains very useful tools for manipulating filesystem object strings.
  • re – Regular expression operations provides an easy to use escape function.

In a nutshell, here is the operation that needs to happen to create the command and execute it:

import re
import subprocess

full_remote_path = "/tmp/filename with space.txt"
full_local_path = "/tmp/filename with space.txt"
remote_username = "myusername"
remote_hostname = "host.domain.com"

# Here we use re.escape to escape the paths.
escaped_remote = re.escape(full_remote_path)
escaped_local = re.escape(full_local_path)

# I've chosen to just escape the local path and leave off the quotes.
cmd = "/usr/bin/rsync -va %s@%s:'%s' %s" % (remote_username, remote_hostname, escaped_remote, escaped_local)
print cmd

p = subprocess.Popen(cmd, shell=True).wait()

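Here is the rsync command that is sent to the OS.  Under Python 2.7, re.escape puts a backslash in front of every non-alphanumeric character, slashes and dots included:

/usr/bin/rsync -va myusername@host.domain.com:'\/tmp\/filename\ with\ space\.txt' \/tmp\/filename\ with\ space\.txt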
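
As a possible variation on the approach above, the standard library also has shlex.quote, which is meant for shell quoting (re.escape is designed for regular expressions), and subprocess accepts a list of arguments so that no local shell is involved at all.  The sketch below assumes Python 3 and an rsync like the 3.1.0 shown in the error output, where the remote shell still parses the path; newer rsync releases handle remote arguments differently, so check the man page for your version.  build_rsync_cmd is a hypothetical helper name.

```python
import shlex

def build_rsync_cmd(user, host, remote_path, local_path):
    """Return an argument list for rsync (hypothetical helper, not part of rsync)."""
    # Only the remote path is parsed by a shell (on the remote host),
    # so it is the only part that gets quoted.
    remote = "%s@%s:%s" % (user, host, shlex.quote(remote_path))
    # The list is handed to rsync directly, without shell=True,
    # so the local path needs no escaping.
    return ["/usr/bin/rsync", "-va", remote, local_path]

cmd = build_rsync_cmd("myusername", "host.domain.com",
                      "/tmp/filename with space.txt",
                      "/tmp/filename with space.txt")
print(cmd)

# To actually run the transfer:
#   import subprocess
#   exit_code = subprocess.call(cmd)
```

Because the local path is passed as its own list element, parentheses and spaces in it are never seen by a shell.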

Now that we have this working, I can explain how os.path fits in.  Should you be copying /tmp/test.txt on the remote system to /tmp/mydirectory on your local system when /tmp/mydirectory does not exist, you will receive an error:

rsync -qv myusername@host.domain.com:/tmp/test.txt /tmp/mydirectory/test.txt
rsync: change_dir#3 "/tmp/mydirectory" failed: No such file or directory (2)
rsync error: errors selecting input/output files, dirs (code 3) at main.c(694) [Receiver=3.1.0]

The easiest way to avoid this is to run a simple mkdir -p command on /tmp/mydirectory before beginning.  Should the directory exist, the command does nothing.  Should it be missing, it will be created along with any necessary parent directories.  In a case where you are copying a file to a remote machine, you can pass this command to the remote machine via ssh.

To do this in Python, I like to take the full filename and split it to get the complete directory path.

import os
import re
import subprocess

local = "/tmp/mydirectory/test.txt"

localdir = os.path.split(local)[0]
localdir = "%s/" % localdir
localdir = re.escape(localdir)

mkdir_cmd = '/bin/mkdir -p %s' % localdir
p = subprocess.Popen(mkdir_cmd, shell=True).wait()
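
For the local side, the same preparation can also be done without spawning mkdir at all, which sidesteps escaping entirely.  A minimal sketch, assuming Python 3.2 or later (for the exist_ok argument); ensure_parent_dir is a hypothetical helper name, and the demonstration uses a throwaway temporary directory rather than the real /tmp/mydirectory.

```python
import os
import tempfile

def ensure_parent_dir(path):
    """Create the directory that will contain `path`, parents included."""
    parent = os.path.dirname(path)
    if parent:
        # exist_ok=True mirrors mkdir -p: no error if the directory exists.
        os.makedirs(parent, exist_ok=True)
    return parent

# Demonstration inside a temporary directory.
base = tempfile.mkdtemp()
local = os.path.join(base, "my directory (new)", "test.txt")
created = ensure_parent_dir(local)
ensure_parent_dir(local)  # calling it again does nothing
print(os.path.isdir(created))
```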

Here is my full example code that I created to test and demo this technique:

#! /usr/bin/python

import subprocess
import os
import re

def do_rsync(rh, ru, rd, rf, ld):

 # The full file path is the directory plus file.
 remote = os.path.join(rd, rf)

 # escape all characters in the full file path.
 remote = re.escape(remote)

 # here we format the remote location as username@hostname:'location'
 remote = "%s@%s:'%s'" % (ru, rh, remote)

 # here we define the desired full path of the new file.
 local = os.path.join(ld, rf)

 # This statement will provide the containing directory of the file
 # this is useful in case the file passed as rf contains a directory
 localdir = os.path.split(local)[0]

 # os.path.split always returns a directory without the trailing /
 # We add it back here
 localdir = "%s/" % localdir

 # escape all characters in the local filename/directory
 local = re.escape(local)
 localdir = re.escape(localdir)

 # before issuing the rsync command, I've been running a mkdir command
 # Without this, if the directory did not exist, rsync would fail.
 # If the directory exists, then the mkdir command does nothing.
 # If you are copying the file to the remote directory, the mkdir command can be passed by ssh
 mkdir_cmd = '/bin/mkdir -p %s' % localdir

 # create the rsync command
 rsync_cmd = '/usr/bin/rsync -va %s %s' % (remote, local)

 # Now we run the commands.
 # shell=True is used because, without a shell, the escaped characters would cause failures.
 p1 = subprocess.Popen(mkdir_cmd, shell=True).wait()
 p2 = subprocess.Popen(rsync_cmd, shell=True).wait()
 print ""
 return 0

rh = "host.domain.com"
ru = "myusername"
rd = "/tmp"
rf = "test.txt"
ld = "/tmp"

print "Here we do a simple test with test.txt"
do_rsync(rh, ru, rd, rf, ld)

rf = "this is a filename - with (stuff) in it.dat"

print "Here is a filename with a bit more character."
do_rsync(rh, ru, rd, rf, ld)

exit()

A function like this could be put into place very easily, but a few changes would be necessary to make it production ready.  The rsync output can be minimized by changing the -v to a -q, but in doing this, you will want to check the exit status using subprocess to determine whether the transfer was successful.  In my case, I chose to use the Process and Queue classes from the multiprocessing module to manage multiple streams.
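
As a rough sketch of those two changes, the code below captures the exit status of each command and runs several commands at once.  It assumes Python 3 and uses a thread pool from concurrent.futures, which is a simpler stand-in and not the multiprocessing Process and Queue setup I actually used.  run_quiet and run_parallel are hypothetical names, and the demo commands are Python one-liners so the sketch runs without a remote host.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor

def run_quiet(cmd):
    """Run one command (argument list) and return (exit_code, stderr_text)."""
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    _, err = proc.communicate()
    return proc.returncode, err.decode(errors="replace")

def run_parallel(cmds, streams=3):
    """Run several commands at once; results come back in input order."""
    with ThreadPoolExecutor(max_workers=streams) as pool:
        return list(pool.map(run_quiet, cmds))

# Stand-in commands. In practice each entry would be an rsync argument
# list such as ["/usr/bin/rsync", "-qa", source, destination].
demo = [
    [sys.executable, "-c", "import sys; sys.exit(0)"],
    [sys.executable, "-c", "import sys; sys.stderr.write('boom'); sys.exit(23)"],
]
results = run_parallel(demo)
for code, err in results:
    if code == 0:
        print("transfer ok")
    else:
        print("transfer failed (exit %d): %s" % (code, err))
```

An exit status of zero means the command succeeded; anything else should be treated as a failed transfer.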

Safe SCP and Delete

My current project involves creating many files on a Raspberry Pi, then immediately transferring them to a more traditional Linux server with normal spinning-disk hard drives and more system resources.

In order to reduce writes on the Raspberry Pi’s SD card, I intend to store these files on a ramdisk, so I need a safe way to copy the files to the remote server.  In the event a connection is unavailable, this script will detect that the transfer failed and move the file to a fallback directory (on the SD card) instead of deleting it after the scp command completes.  If the file is moved to the fallback directory, an accompanying JSON file is also created to store the necessary information.

Assumptions:

  • Script will be run from the source computer.
  • Both computers will be running current versions of OpenSSH.
  • Public key authentication is set up so that it does not require a password to ssh from the source to the destination.
  • scp is installed on the source, and located at /usr/bin/scp. (Verify using the command: which scp).

Variables required for the function:

  • source_file – This is the absolute location of the file to be transferred
    Example: /home/user/temp/transfer_this_file.jpg
  • destination_directory – This is the destination location as would be specified when using the scp command.
    Example: user@remotehost:/home/user/destination_directory/
  • fallback_directory – Absolute location of the fallback directory on the source computer.
    Example: /home/user/failback/

Python modules imported:

  • subprocess – Yes, I realize that os could be used instead of subprocess, but I’m already using subprocess for other functions in the same file.
  • os
  • uuid
  • json

The variable p is returned to report on the status of the original transfer.  A zero is returned if the transfer was successful.  A non-zero value (normally one) is returned if the transfer failed.

#! /usr/bin/python

import subprocess
import os
import shutil
import uuid
import json

destination_directory = "user@remotehost:/home/user/incoming"
fail_directory = "user@notahost:/notadirectory"
fail_user = "bobert@remotehost:/home/ryan/temp"
fallback = "/home/pi/motion/fallback/"

def safe_scp(source_file, destination_directory, fallback_directory):
	# /usr/bin/scp -qB /source_filesystem/source_file.jpg user@host:/destination/
	# Build the command as a list so a file name containing spaces stays one argument.
	cmd = ["/usr/bin/scp", "-qB", str(source_file), str(destination_directory)]
	p = subprocess.Popen(cmd).wait()
	if p != 0:
		# upload failed (any non-zero exit status), move it to the fallback directory
		filename = os.path.split(source_file)[1]
		#
		# Using os.path to create an absolute filename underneath the fallback
		# directory for the move.
		fallback_file = os.path.abspath(os.path.join(fallback_directory, filename))
		# shutil.move also works when the source is on a different filesystem
		# (such as a ramdisk), where os.rename would raise an error.
		shutil.move(source_file, fallback_file)
		# generate filename with info to upload
		json_data = {
			"source_file" : fallback_file,
			"destination_directory" : destination_directory,
			"fallback_directory" : fallback_directory }
		# using uuid to ensure a truly unique filename
		json_filename = "failed_upload-"+str(uuid.uuid4())+".json"
		# creating the absolute filename under the fallback directory
		json_filename = os.path.join(fallback_directory, json_filename)
		f = open(json_filename, "w")
		f.write(json.dumps(json_data))
		f.close()
	else:
		# delete source file in case of a successful transfer.
		os.remove(source_file)
	return p

# usage: I've created a bunch of blank text documents to test the transfer.

# simulating errors
print safe_scp("/home/pi/motion/1.txt", fail_directory, fallback)
#print safe_scp("/home/pi/motion/2.txt", fail_user, fallback)
# above line was commented out later in testing after a conflict with denyhosts
print safe_scp("/home/pi/motion/2.txt", fail_directory, fallback)

# normal operation
print safe_scp("/home/pi/motion/3.txt", destination_directory, fallback)
print safe_scp("/home/pi/motion/4.txt", destination_directory, fallback)
print safe_scp("/home/pi/motion/5.txt", destination_directory, fallback)
exit()
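
The script above only writes the JSON files; picking them up again later is not shown.  One possible starting point, assuming the JSON layout written by safe_scp and Python 3, is to collect the stored records so a later job can retry them.  pending_uploads is a hypothetical helper name, and the demonstration uses a temporary directory standing in for the fallback directory.

```python
import glob
import json
import os
import tempfile

def pending_uploads(fallback_directory):
    """Return the records stored in failed_upload-*.json, sorted by file name."""
    pattern = os.path.join(fallback_directory, "failed_upload-*.json")
    records = []
    for json_path in sorted(glob.glob(pattern)):
        with open(json_path) as f:
            record = json.load(f)
        # Remember which JSON file to delete once a retry succeeds.
        record["json_file"] = json_path
        records.append(record)
    return records

# Demonstration with a made-up record in a temporary directory.
demo_dir = tempfile.mkdtemp()
with open(os.path.join(demo_dir, "failed_upload-demo.json"), "w") as f:
    json.dump({"source_file": os.path.join(demo_dir, "1.txt"),
               "destination_directory": "user@remotehost:/home/user/incoming",
               "fallback_directory": demo_dir}, f)

records = pending_uploads(demo_dir)
for record in records:
    print(record["source_file"], "->", record["destination_directory"])
```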

A quick note: Be careful when playing around with intentionally failed authentication.  My computer is set up to block an IP after a number of failed login attempts.  I actually locked myself out from my Raspberry Pi when testing, and it took me longer than I care to admit to figure out what the exact problem was.  Hope this helps!

PiFace Digital Review

I was selected by Element14 to write a review for the PiFace Digital.


They were kind enough to ship me a free one, so I’m only going to link to the review on their page:
http://www.element14.com/community/roadTestReviews/1457

Here is the purchase link:
http://www.newark.com/piface/piface-digital/daugther-card-pi-face-expansion/dp/48W3976?COM=rasp-accessory-group

I’m currently working on a project to control and monitor the garage doors in my home.  I will document that project here once I’m able to work on it more, and get something working.

Cleverbot vs. Cleverbot

So, I stumbled upon pycleverbot, a nice little module to interface with the Cleverbot website.

Of course, what is the first thing everybody wants to do?  Make Cleverbot talk to itself!

With the module, coding is quite simple.  The version I’m running outputs the conversation to an HTML page on my server, but the syntax was screwing up in WordPress, so I’ve left that out.

import cleverbot

# beginning two different cleverbot sessions
# I use steve and bob to keep things separate while coding.
# This is not reflected in the html output.
steve = cleverbot.Session()
bob = cleverbot.Session()

# Gathering info....boring stuff
convo_start = raw_input("How would you like to begin the conversation? : ")
print ""
cycles = int(raw_input("How many cycles would you like to run? : "))
print ""

# Starting the conversation
print "Bob: "+convo_start
reply = steve.Ask(convo_start)
print "Steve: "+reply

i = 0

# continuing the conversation in the loop.
while (i <= cycles):
    reply = bob.Ask(reply)
    print "Bob: "+reply

    reply = steve.Ask(reply)
    print "Steve: "+reply

    i = i + 1

# ....and now to tie up my loose ends.
exit()

The output is definitely interesting.  For one run, I grabbed text from the website’s “Think For Me” feature to see what it would yield.  At one point, they started quoting the song “Still Alive” from Portal!

I’ve also found that this bot seems to be loop resistant.  I groaned when I saw “Yes you did!; No I didn’t!” cycle about 3 times, but the bot actually recovered.  I’m actually surprised that the bot seems to be great at bringing up its own subjects too.

Then every once in a while, the bot will give me a facepalm moment of just utter stupidity.  I’ve learned that as soon as one of them announces that they are Cleverbot that I should just ignore the next 10 lines.  Regardless, I accomplished what I wanted to, and I can live with that.

Now for some examples!

I’d be more than happy to start the script with any examples given to me, just let me know!

Is this already running?

I currently have some of my Google Voice scripts running once every 5 minutes to avoid overlap.  My processing script and sending script will both be fine if multiple instances run, due to the tracking system I’ve developed on the database, but I worry about the script that actually fetches the incoming messages from Google Voice.  Duplicate instances of that script may cause messages to be fetched twice, and I have yet to write a script to check for duplicate incoming messages.  I might as well make an effort to kill some of the duplicates at the source first.

Here is the function:

# check_if_running.py
import os
import commands
import sys

def checkRunning():

	# ps -ef | grep <script name> | grep -v grep | grep -v <pid of this process>

	psCommand = "ps -ef | grep "+str(sys.argv[0])+" | grep -v grep | grep -v "+str(os.getpid())
	psOutput = commands.getoutput(psCommand)

	if (len(psOutput) == 0):
		functionReturn = 0
	else:
		functionReturn = 1
	return functionReturn

print "Function returns: "+str(checkRunning())

exit()
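
A different technique from the ps-based check above is to hold an exclusive lock on a file for as long as the script runs.  The sketch below assumes Python 3 on Linux (the fcntl module is not available on Windows); acquire_lock and the lock file name are hypothetical.  The second call in the demonstration stands in for a second instance of the script starting up.

```python
import fcntl
import os
import tempfile

def acquire_lock(lock_path):
    """Return an open file holding an exclusive lock, or None if already locked."""
    handle = open(lock_path, "w")
    try:
        # LOCK_NB makes flock fail immediately instead of waiting.
        fcntl.flock(handle, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except OSError:
        handle.close()
        return None
    # The caller must keep this object alive for as long as the script runs.
    return handle

lock_path = os.path.join(tempfile.mkdtemp(), "check_if_running_demo.lock")
first = acquire_lock(lock_path)
second = acquire_lock(lock_path)  # simulates a second instance
print("first instance got the lock:", first is not None)
print("second instance got the lock:", second is not None)
```

The lock is released automatically when the process exits, so a crashed run does not leave a stale marker behind.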

Speaking of duplicates, I have noticed that some outgoing messages are sent multiple times.  My theory is that the phone is actually sending duplicate messages in low signal areas when it cannot be certain if the message has been sent or not, therefore still necessitating the duplicate checking script even if it is just a secondary check on the existing processes.

PS: This is an example of a new-to-me WordPress plugin, WP-Syntax.  Pretty slick?  Yes, I think it is.