Version: 2025-04-10

RikAI2-Extract Prompting

RikAI2-Extract is designed to excel at extraction tasks, particularly where you may need to extract many fields or return information across many pages of a longer document.

Basic Principles

Extract receives the prompt in a JSON schema format and returns the answer in the same JSON schema.

In your JSON, the key is part of your prompt. If your key is precise and clear, Extract may be able to return your intended answer using only the key, or you can include more specific instructions in the value.

Example:

This key with an empty value is preferable:

"patient_name": ""

This vague key is less preferable:

"field1": "Return the patient name"

Using the key for instructions or prompt questions is also incorrect:

"What is the patient's name?": ""

This model performs best when prompted with the JSON schema you ultimately want returned and filled out. There is also specific formatting to follow if you want to use the returnConfidence = true parameter to return confidence scores and bounding boxes.

Our general prompting tips apply to Extract. It is very important to be direct and clear in your prompt or descriptions.

Prompt Structure

The model is prompted with a JSON schema, and the response returned by the model reflects that same prompt schema. You have two options for your schema, based on whether you want to use returnConfidence = true to return field-level confidence scores and bounding boxes.

returnConfidence = false

"Key": "Value",

Replace "Key" with the one- to three-word label you want to use in the returned JSON, using a specific name for what you are extracting. If the field is extremely straightforward, the key alone may be all you need.

Replace "Value" with more verbose instructions or a question.

Example:

"patient_name": ""

"patient_name": "What is the patient's name?"

"patient_name": "Full patient name"

On a straightforward field and form, all of these should return the same answer. For complex requests, use the value to give more precise instructions.
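For illustration, with returnConfidence = false the response mirrors your prompt schema, with the extracted text filled in as each value. The name below is a placeholder, not output from a real document:

{
  "patient_name": "Jane Doe"
}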

returnConfidence = true

{
  "Key": {
    "data": "instructions here",
    "page_number": 0
  }
}

When you want to return bounding boxes and confidence scores, use this structure. Do not change the key "data" or the key-value pair "page_number": 0.

Example:

{
  "patient_name": {
    "data": "What is the name of the patient?",
    "page_number": 0
  }
}

You can nest your JSON up to two levels deep and still use returnConfidence = true.

Example:

{
  "patient_info": {
    "first_name": {
      "data": "Patient first name",
      "page_number": 0
    },
    "last_name": {
      "data": "Patient last name",
      "page_number": 0
    }
  }
}

Capabilities

Checkboxes

Extract is equipped to handle checkboxes.

You can specify the options that might be checked, or ask a natural language question about the checkbox.

Example:

"accident_type": ""

"accident_type_2": "Work/Auto/Other"

"accident_type_3": "What option is checked for Type of Accident?"

Tables

Asking for table information can return table columns or information delimited by the "|" character.

You can prompt the model to create a JSON using the table information with separate keys and values. Even if you don't know the total number of rows, you can use a prompt like the example below and the model will create a nested JSON to return the requested table information.

Example:

"Table name": {
  "chemicals": "extract each row of content with the chemical as the key and the rest of the information as the value"
}
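For illustration, on a document listing chemicals in a table, a prompt like this might return a nested JSON along these lines. The chemical names and details below are placeholders, and the exact wording will depend on your document:

{
  "Table name": {
    "chemicals": {
      "Acetone": "Flammable liquid | 5 L | Cabinet A",
      "Toluene": "Flammable liquid | 2 L | Cabinet B"
    }
  }
}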

Dynamic JSON Schema

Similar to the prompt in the Tables section above, you can prompt the model to create a JSON schema to match the number of times a type of information needs to be extracted, even if that will vary per document.

Example:

"procedure_#": "For every procedure named in this document, return the procedure name as the value number the key accordingly"

This will return a list of procedures:

"procedure_1": "appendectomy",

"procedure_2": "wisdom teeth removal",

"procedure_3": "EKG"

Prompting for a ["list"]

Using ["square brackets"] in the JSON value indicates the answer should take the form of a list.

Example: {"medications": ["medication_name"]} should return a list of all medications in the response.
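For illustration, on a document that mentions several medications, the response might look like the following (the medication names below are placeholders):

{
  "medications": ["lisinopril", "metformin", "atorvastatin"]
}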

Prompting for reasoning tasks

Your key is very important for helping Extract understand the goal described in your value. You will need to direct the model to infer an answer rather than extract details.

returnConfidence must be set to false for reasoning tasks; Extract will use most or all of your file to reach its decision and will not be able to return bounding boxes.

Example:

{"risks_summary":"You are a risk inspector. Identify the property risks in this file. Then, provide your determination as to whether this property is low or high risk, and why."}

Extract will generally provide shorter explanations and summaries than our other RikAI models, but you can still specify a length or verbosity as desired.

Providing instructions applicable to all fields

You can include initial key/value pairs in your prompt with instructions for output, then follow with your list of fields you’re prompting for.

This keeps each field's instructions less verbose and repetitive, and also makes it easier to change one instruction and apply it to all fields.

Example:

{"instructions_for_output":
{"definitions":"Include definitions here about terms in your documents or prompts",
"output_question_answer_guidelines":"Use the 'Yes' answer when (xyz). Use the 'No' answer when (xyz). Use the 'N/A' answer for all other cases."}}

Please Note:

  • The response will alphabetize your keys by default, according to JSON best practices.

  • When using returnConfidence = true, if the resulting confidence score is low and the bounding-box coordinates are impossible, the model will not return an answer. With returnConfidence = false, the model may still return an answer.