How to Create a Pentaho Report Using the REST Client
Prerequisites
- Pentaho Kettle (see earlier blog post)
- Pentaho Report Designer
- You will get an error message in Pentaho Report Designer that says 'java.lang.NoClassDefFoundErr: com/sun/jersey/api/client/ClientConfig' if you try to run a report against the Kettle job. To fix this error copy the following files from C:\pentaho\data-integration\libext\jersey\ to C:\pentaho\prd-ce-3.9.1-GA\report-designer\lib\. If you plan to ublish the report to the BI Server, also copy these files to C:\pentaho\biserver-ce\tomcat\webapps\pentaho\WEB-INF\lib\ as well -
- jersey-apache-client-1.7.jar
- jersey-client-1.7.jar
- jersey-core-1.7.jar
The Payload
- My example returns an array of objects
[
{
"id":"1",
"name":"item1",
"description":"someDescription1"
},
{
"id":"2",
"name":"item2",
"description":"someDescription2"
},
{
"id":"3",
"name":"item3",
"description":"someDescription3"
}
]
Set Up the ETL
- I used four widgets in this eample but you'll need at least the first three to set it up. You can search for them by name in the Spoon workbench
- Hook them all up together. When you click on the first widget, you'll get a drop-down menu. Choose the rightmost item in the dropdown and then click on the next widget to make it the target (and so on until they're all connected)
Generate Rows
- Even though the REST Client widget has a URL field, it doesn't seem to work and you will need the 'Generate Rows' to pass in the URL. You'll probably need to use one anyway if you're building an industrial-strength application
- You only have three fields to fill in -
- Name: We'll use 'Url'. You can use any arbitrary string, but this field name will be passed on to downstream widgets
- Type: 'String', naturally
- Value: the REST address URL
- That is all you need for this step
REST Client
- The 'General' Tab:
- Leave the URL empty. It doesn't work as far as I can tell
- Select 'Accept URL from field name'
- URL field name: This is the field name you created in the 'Generate Rows' widget, so 'Url'
- Accept the default values for the remaining fields
- The Authentication tab is self explanatory
- Select 'premptive' if you want your credentials sent with every request
- The Headers tab has just the one field that we created earlier - 'Url'
- We're done with this widget
Json Input
- We're only interested in the 'File' and 'Fields' tabs
- 'File' tab:
- Make sure 'Source is defined in a field' is selected
- For 'Get source from field', make sure 'result' is selected
- In the 'Fields' tab you can define the fields you are interested in. We're interested in all three fields in my example so I fill in their -
- Name: Arbitrary
- Path: If you aren't sure about this, this site might be able to help - jsonpath.curiousconcept.com/
- Type: the data type
- That wraps it up fo the Json Input
Select Values
- You don't really need this. I used it in this example to get rid of a couple of fields that I will never use in my report - the 'Url' and 'result' fields
- Now preview your workflow. You should see the results of your query -
- Close the preview window and save your workflow. This will create a Kettle (.ktr) file. You'll need this for the next step
Creating the Report
- Fire up Pentaho Report Designer and select the Report Wizard
- Pick a template and click 'Next'
- You should be presented with the 'Data Source' window. Click on the '+' on the top right corner to add a new datasource
- Scroll down to the bottom and select 'Pentaho Data Integration' from the list
- In the next pop-up window select the green '+' icon to create the new data source
- Name: Query 1 is the default name but you can change that
- Browse to the .ktr file you created in Spoon
- You must select 'Preview' to make sure that the Report Designer can run the workflow or the Wizard won't let you proceed to the next step. The preview should much like the preview in Spoon
- Click 'OK' and then 'Next' which will bring you to the layout step where you pick the fields you want. Select the fields you want in the lower window and click 'Finish'
Comments
Empty row generator step can be used, with URL getting passed into the Client step.
Thus, we can pass variables into the URL field of Client step.
{ result = [
{
"id":"1",
"name":"item1",
"description":"someDescription1"
},
{
"id":"2",
"name":"item2",
"description":"someDescription2"
},
{
"id":"3",
"name":"item3",
"description":"someDescription3"
}
]}