commcare-export is a library designed to let you export your data from CommCare HQ in a simple, configurable, and reusable way.
commcare-export will automatically pull your form and case data from CommCare HQ and save it locally, typically to either an excel spreadsheet or to a local database. You can configure it to pull data on a schedule to automatically keep your data up to date, or run it as a one-time data dump. Here are the basic steps:
- Configure your export files (excel or JSON)
- Run export
- (Optionally) configure export to run automatically
- Use data
The rest of this page describes how to use
commcare-export in more detail.
This section is for you if...
- You are looking to export data from CommCareHQ into a SQL database or Excel Workbook.
- You are moderately technical, comfortable running command-line programs on your platform of choice. (If you are a programmer you will be better served by going directly to the README on Github)
Prerequisites and Conventions
- If you are looking to export into a SQL database, you must have this database set up, and you should know how to provide a URL for it.
- For Windows instructions, if something is typed in monospace font starting with a
>sign, this is something for you to type into the command prompt, accessible via Start > Run > cmd. For example, the following indicates that you should type "this is a command" into the Command Prompt.
> this is a command
- For non-Windows instructions, if something is typed in monospace font starting with a
$sign, this is something for you to type into a command prompt. For example, the following indicates that you should type "this is a command" into a terminal.
$ this is a command
This section guides you through getting your system ready to use the
commcare-export command-line tool.
1. Install Python
Visit the Python for Windows page and download a package appropriate for your system. The following instructions assume that you have select Python 2.7, but this only affects the location of Python on your hard drive. This tool works with all recent version of Pythons.
On Mac OSX:
On recent versions of Mac OSX, an OK version of Python is already installed. Lucky you!
If you are on a Linux-based operating system, you probably have an automatic package manager available. Lucky you! You probably already have Python installed or know how to install it, but if not, then you will need to invoke the package manager for your operating system. Here are some possible commands.
$ apt-get install python
$ yum install python
$ emerge python
pip (a program that automatically installs Python programs)
Visit the following links for Windows packages for Python and install the following:
- Distribute - the file version you want to install depends on a) the type of windows operating system (32 or 64 bit system) and b) what version of python you have installed on your computer.
*Note that in the following instructions, the Windows version will use
easy_install while other platforms use
pip. Do not be alarmed - these are essentially equivalent tools, except that recent versions of Python and Windows appear to ship a broken
pip, so we need to use the older
On Mac OSX and Linux:
$ sudo easy_install pip
(with pip) > c:\python27\scripts\pip install commcare-export
(with easy_install) > c:\python27\scripts\easy_install commcare-export
On Mac OSX and Linux:
$ sudo pip install commcare-export
Getting started with commcare-export
This section describes how to run the examples that come with commcare-export against a sample project from the CommCareHQ exchange. After working through it, you should have some familiarity with the, its configuration, and how it is used.
1. Set up some demonstration data in CommCareHQ
- Visit the CommCareHQ Exchange and add the Simple CommCare Demo/Tutorial" app to a new project space.
- Visit the Release Manager, make a build, click the star to release it.
- Visit CloudCare and fill out a bunch of forms.
2. Run the examples and see the output in the Terminal
Download the examples demo-registrations.xlsx, demo-deliveries.xlsx and demo-pregnancy-cases.xlsx attached to this page. In the following, you will need to adjust the path to the file to be wherever you downloaded them and fill in the name of your project - the bits in bold. The following commands will run the examples in such a way that the "tables" will be simply printed.
> C:\python27\scripts\commcare-export --output-format markdown --project YOUR_PROJECT --query demo-registrations.xlsx
> C:\python27\scripts\commcare-export --output-format markdown --project YOUR_PROJECT --query demo-deliveries.xlsx
> C:\python27\scripts\commcare-export --output-format markdown --project YOUR_PROJECT --query demo-pregnancy-cases.xlsx
$ commcare-export --output-format markdown --project YOUR_PROJECT --query demo-registrations.xlsx
$ commcare-export --output-format markdown --project YOUR_PROJECT --query demo-deliveries.xlsx
$ commcare-export --output-format markdown --project YOUR_PROJECT --query demo-pregnancy-cases.xlsx
- 'YOUR_PROJECT' refers to the name of your domain, not the entire domain URL. For instance, the URL would be https://www.commcarehq.org/a/YOUR_PROJECT. For the tool, simply type in YOUR_PROJECT.
- If you are using a domain that isn't stored on www.commcarehq.org, then you will need to use the "--commcare-hq YOUR_URL" flag in the cmd prompt.
3. Run the examples and save the output to an Excel Workbook
Nearly the same command will save the output to an Excel Workbook.
> C:\python27\scripts\commcare-export --output-format xlsx --output registrations-export-results.xlsx --project YOUR_PROJECT --query demo-registrations.xlsx
(try modifying the command above to save the results of each of the examples to a file)
On Mac OSX / Linux:
(try coming up with this command yourself based on reading the Windows example above!)
- If you are using windows, the command line for any path reference to folder names with spaces in the titles need to be included in quotes. For exampe, the folder 'My Data' would need to be referenced by "C:\Users\YOU\My Data" (quotes included in the command)
4. Save the output of an example query to a SQL database
By choosing "
--output-format sql" and
"--output URL" you can save the output to a SQL database. First, let us try it with SQLite, a simple database that is almost certainly already installed and ready to be used on any modern computer, so you do not have to have anything set up to run these examples. In the following, when you see a lite that starts with "
sqlite>" it is mimicking what it is like to interact with the database, so you should type in the bits that come after that.
In these commands, the term
sqlite:///registrations.db is a standard URL format for databases. If, for example, you have a PostgreSQL database hosted on
registrations that you would like to save into instead, the URL might look like
postgresql://db.example.com/registrations (you may need to include username and password in the URL, but fully explaining the format is beyond the scope of this tutorial).
> C:\python27\scripts\commcare-export --output-format sql --output sqlite:///registrations.db --project YOUR_PROJECT --query demo-registrations.xlsx
(you will not see any output)
> sqlite3 registrations.db
sqlite> SELECT * FROM "Registrations"
$ commcare-export --output-format sql --output sqlite:///registrations.db --project YOUR_PROJECT --query demo-registrations.xlsx
(you will not see any output)
$ sqlite3 registrations.db
sqlite> SELECT * FROM "Registrations"
5. Synchronize recent changes periodically
One expected way to use this tool is to run a weekly/daily/hourly export of just those forms and cases which have changed. To set this up you will need knowledge of how to set up scheduled tasks, so this tutorial will just show a single example command and trust that you are comfortable adjusting it to suit your needs. For
SOME_RECENT_DATE, any typical date format should be understood.
$ commcare-export --output-format sql --output YOUR_DATABASE_URL --since SOME_RECENT_DATE (the rest of the command as above)
NOTE: It is always safe to run this for the same time period twice, or overlapping time periods, as the command always inserts new data, but simply updates existing rows.
Excel Configuration Details
This section describes in detail the Excel configuration for commcare-export. You should open one (or many) of the examples and compare them to the explanations below.
One Configuration Sheet Per Output Table:
Each sheet of the configuration workbook is the configuration for an output table. The table will have the same name as the sheet. The sheet consist of a series of columns with names that are special to commcare-export. These column headers are required but also freeing - you may have any other columns you like as long as they do not have one of the reserved titles.
Columns of a Configuration Sheet
- Data Source: (required) This column specifies whether you want to export from the
formAPI or the
caseAPI. In most cases, you will simply put one of those two words in the first cell of the column. (Advanced Usage: You may specify a deep JSONPath into the data in order to iterate over repeats. If these terms don't sound familiar, then you probably don't need them!)
- Filter Name: (optional) This column specifies the names of filters to use for forms and cases. The fields that you can filter on are restricted: For forms, you may use
xmlns.exactto filter by form type and
app_idto filter by your application's ID. For cases, you may use
typeto filter by the case type.
- Filter Value: (optional) This column must be the same length as Filter Name as the values are paired with names in order to filter the results you request from CommCareHQ.
- Field: Each row of this column specifies the name of one output field. Note that for use with a SQL database, a field named
- Source Field: Each row of this column specifies where the data of the corresponding Field should be extracted from. This must be the same length as Field, naturally!
How To Reference Fields in Forms and Cases
In general, the name of form and case fields matches those that you wrote when you created your application, with predictable modifications. You should open an example Excel configuration file right now to compare what you see there with the explanations below.
Top-Level Form Fields: A field named FIELD in a form is referenced by
form.FIELD. For example, the field
age is referenced by
form.age in an Excel config file.
Top-Level Case Fields: A field name FIELD in a case is referenced by
properties.FIELD. For example, the field
delivery_date is referenced by
properties.delivery_date in an Excel config file.
Nested Fields: A field nested more deeply within a form or case may be referenced by separating each level of nesting with a period. For example in demo-deliveries.xlsx you will see the nested field
form.delivery_information.delivery_type which will pull out the value the delivery_type question inside the group delivery_information. This will always match the structure of your application. As of the time this document was written, you can find the right name by looking at the column name in a regular form/case export from CommCareHQ.
(Power Users / Programmers Only) To find out exactly what fields are available, you must view the raw output of the CommCareHQ Case List API and Form List API. You can then reference these fields using an arbitrary JSONPath into the structure.