# PDF Conversion API

Developer Portal: <https://api.market/store/magicapi/pdf-extract>

### About

Designed for developers and businesses alike, the PDF Conversion API extracts the content of PDF documents and returns it as plain text or as HTML. Each conversion is a single HTTP request, so calling the API takes only a few lines of code in any language.

To use the API, send a `POST` request from your application or workflow with your API key in the `x-api-market-key` header. There are four endpoints, one for each combination of input and output:

- `/pdf-to-text-url/` extracts text from a PDF hosted online, given its URL.
- `/pdf-to-text-file/` extracts text from a PDF file uploaded from your machine.
- `/pdf-to-html-url/` converts a PDF hosted online to HTML.
- `/pdf-to-html-file/` converts an uploaded PDF file to HTML.
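The four endpoints differ only in their input (a URL or an uploaded file) and their output (text or HTML), so a client can derive the path from those two choices. A minimal sketch, assuming the `https://prod.api.market/api/v1/magicapi/pdf-extract` base URL used in the request examples on this page; `endpoint_url` is a hypothetical helper, not part of the API.

```python
# Base URL used by every request example on this page.
BASE_URL = "https://prod.api.market/api/v1/magicapi/pdf-extract"

# The two choices that determine which endpoint to call.
OUTPUTS = ("text", "html")   # plain text or HTML
SOURCES = ("url", "file")    # PDF hosted online or uploaded from disk


def endpoint_url(output: str, source: str) -> str:
    """Return the full endpoint URL for a given output format and input source.

    Hypothetical helper: endpoint paths follow /pdf-to-<output>-<source>/.
    """
    if output not in OUTPUTS:
        raise ValueError(f"output must be one of {OUTPUTS}, got {output!r}")
    if source not in SOURCES:
        raise ValueError(f"source must be one of {SOURCES}, got {source!r}")
    # Keep the trailing slash: every example on this page includes it.
    return f"{BASE_URL}/pdf-to-{output}-{source}/"


print(endpoint_url("text", "file"))
# prints https://prod.api.market/api/v1/magicapi/pdf-extract/pdf-to-text-file/
```

Keeping the mapping in one place means switching between a hosted and an uploaded PDF changes a single argument, although the request body changes too: URL endpoints take a JSON `pdf_url`, file endpoints a multipart `pdf_file` field.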

Why use the PDF Conversion API? It automates a step that is otherwise done by hand: instead of copying content out of PDFs manually, your code sends the document and receives its contents in a form it can process directly. That makes it a fit for content analysis tools, for indexing PDFs for search, or for generating web pages from PDF documents.

The output format depends on the endpoint. Text endpoints return a JSON object whose `text` field holds the extracted plain text, which is easy to parse, analyse, or pass on to other applications. HTML endpoints return a JSON object whose `html` field holds the converted markup: it reproduces the layout of each page with absolutely positioned elements, labels pages with anchors such as `<a name="1">Page 1</a>`, and keeps font information (family and size), which makes it suitable for web publishing or repurposing content.
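Because both output formats arrive as JSON, a client can pull the converted content out with one small helper. A minimal sketch, assuming the `{"text": ...}` and `{"html": ...}` response shapes shown in the examples on this page; `extract_output` is a hypothetical name, and the sample body is copied from the `/pdf-to-text-url/` response example.

```python
import json


def extract_output(response_body: str, output: str) -> str:
    """Return the converted content ("text" or "html") from a JSON response body."""
    payload = json.loads(response_body)
    if not isinstance(payload, dict) or output not in payload:
        # Surface anything unexpected (for example an error object) to the caller.
        raise KeyError(f"no {output!r} field in response: {response_body[:200]}")
    return payload[output]


body = '{"text": "Sample PDF\\n \\n \\nThis is a sample PDF file that I will use to test the Sitecore publishing.\\n "}'
print(extract_output(body, "text").splitlines()[0])  # prints "Sample PDF"
```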

### Request and Response Examples

#### For the `/pdf-to-html-url/` endpoint

#### Request

{% tabs %}
{% tab title="CURL" %}

```bash
curl -X 'POST' \
  'https://prod.api.market/api/v1/magicapi/pdf-extract/pdf-to-html-url/' \
  -H 'accept: application/json' \
  -H 'x-api-market-key: API-KEY' \
  -H 'Content-Type: application/json' \
  -d '{
  "pdf_url": "https://www.cedarville.edu/-/media/Files/PDF/Web-Development-Services/SamplePDF.pdf?la=en&hash=1B9D390C8225C1DDE2155F786C6515A3CEF9D4EC"
}'
```

{% endtab %}

{% tab title="NodeJS" %}

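```javascript
// Node.js 18+ provides fetch globally. On older versions, install
// node-fetch v2 (v3 is ESM-only) and uncomment the next line.
// const fetch = require('node-fetch');

const url = 'https://prod.api.market/api/v1/magicapi/pdf-extract/pdf-to-html-url/';

const options = {
  method: 'POST',
  headers: {
    'accept': 'application/json',
    'x-api-market-key': 'API-KEY', // replace with your API key
    'content-type': 'application/json',
  },
  body: JSON.stringify({
    pdf_url: 'https://www.cedarville.edu/-/media/Files/PDF/Web-Development-Services/SamplePDF.pdf?la=en&hash=1B9D390C8225C1DDE2155F786C6515A3CEF9D4EC',
  }),
};

fetch(url, options)
  .then(res => res.json())
  .then(json => console.log(json)) // the converted markup is in json.html
  .catch(err => console.error('error:' + err));
```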

{% endtab %}

{% tab title="Python" %}

```python
import http.client
import json

# Same host as the cURL example: https://prod.api.market
conn = http.client.HTTPSConnection("prod.api.market")

payload = json.dumps({
    "pdf_url": "https://www.cedarville.edu/-/media/Files/PDF/Web-Development-Services/SamplePDF.pdf?la=en&hash=1B9D390C8225C1DDE2155F786C6515A3CEF9D4EC"
})

headers = {
    'accept': "application/json",
    'x-api-market-key': "API-KEY",  # replace with your API key
    'content-type': "application/json"
}

conn.request("POST", "/api/v1/magicapi/pdf-extract/pdf-to-html-url/", payload, headers)

res = conn.getresponse()
data = res.read()
conn.close()

print(data.decode("utf-8"))
```

{% endtab %}

{% tab title="Java" %}

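```java
// Requires OkHttp (com.squareup.okhttp3:okhttp).
import okhttp3.MediaType;
import okhttp3.OkHttpClient;
import okhttp3.Request;
import okhttp3.RequestBody;
import okhttp3.Response;

public class PdfToHtmlUrl {
  public static void main(String[] args) throws Exception {
    OkHttpClient client = new OkHttpClient();

    MediaType mediaType = MediaType.parse("application/json");
    RequestBody body = RequestBody.create(mediaType, "{\"pdf_url\":\"https://www.cedarville.edu/-/media/Files/PDF/Web-Development-Services/SamplePDF.pdf?la=en&hash=1B9D390C8225C1DDE2155F786C6515A3CEF9D4EC\"}");
    Request request = new Request.Builder()
      .url("https://prod.api.market/api/v1/magicapi/pdf-extract/pdf-to-html-url/")
      .post(body)
      .addHeader("accept", "application/json")
      .addHeader("x-api-market-key", "API-KEY") // replace with your API key
      .addHeader("content-type", "application/json")
      .build();

    // try-with-resources releases the connection once the body has been read
    try (Response response = client.newCall(request).execute()) {
      System.out.println(response.body().string());
    }
  }
}
```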

{% endtab %}
{% endtabs %}

#### Response

{% code overflow="wrap" %}

```json
{
  "html": "<html><head>\n<meta http-equiv=\"Content-Type\" content=\"text/html; charset=utf-8\">\n</head><body>\n<span style=\"position:absolute; border: gray 1px solid; left:0px; top:50px; width:612px; height:792px;\"></span>\n<div style=\"position:absolute; top:50px;\"><a name=\"1\">Page 1</a></div>\n<span style=\"position:absolute; border: black 1px solid; left:0px; top:50px; width:612px; height:792px;\"></span>\n<span style=\"position:absolute; border: black 1px solid; left:0px; top:50px; width:612px; height:792px;\"></span>\n<span style=\"position:absolute; border: black 1px solid; left:0px; top:50px; width:612px; height:792px;\"></span>\n<span style=\"position:absolute; border: black 1px solid; left:0px; top:122px; width:612px; height:720px;\"></span>\n<span style=\"font-family: ArialMT; font-size:11px\">Sample PDF  This is a sample PDF file that I will use to test the Sitecore publishing. "
}
```

{% endcode %}
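The returned HTML positions every element absolutely, so getting readable text back out of it means walking the markup. The sketch below uses Python's standard `html.parser` module and assumes the structure visible in the sample response (`<a name="N">Page N</a>` page markers, text inside `<span>` elements); `PdfHtmlText` is a hypothetical class, and the sample string is a trimmed copy of that response.

```python
from html.parser import HTMLParser


class PdfHtmlText(HTMLParser):
    """Collect page numbers and visible text from the API's HTML output."""

    def __init__(self):
        super().__init__()
        self.pages = []          # page numbers taken from <a name="N"> anchors
        self.chunks = []         # text content, excluding the "Page N" labels
        self._in_anchor = False  # True while inside a numbered page anchor

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            name = dict(attrs).get("name")
            if name is not None and name.isdigit():
                self.pages.append(int(name))
                self._in_anchor = True

    def handle_endtag(self, tag):
        if tag == "a":
            self._in_anchor = False

    def handle_data(self, data):
        # Skip whitespace-only runs between tags and the page labels themselves.
        if not self._in_anchor and data.strip():
            self.chunks.append(data.strip())


sample = (
    '<html><body><div style="position:absolute; top:50px;"><a name="1">Page 1</a></div>'
    '<span style="font-family: ArialMT; font-size:11px">Sample PDF  This is a sample PDF file '
    'that I will use to test the Sitecore publishing. </span></body></html>'
)
parser = PdfHtmlText()
parser.feed(sample)
parser.close()
print(parser.pages)  # [1]
print(" ".join(parser.chunks))
```

If you only need plain text, the `/pdf-to-text-url/` endpoint returns it directly; parsing the HTML is mainly useful when you also want the page boundaries.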

#### For the `/pdf-to-text-url/` endpoint

#### Request

{% tabs %}
{% tab title="CURL" %}

```bash
curl -X 'POST' \
  'https://prod.api.market/api/v1/magicapi/pdf-extract/pdf-to-text-url/' \
  -H 'accept: application/json' \
  -H 'x-api-market-key: API-KEY' \
  -H 'Content-Type: application/json' \
  -d '{
  "pdf_url": "https://www.cedarville.edu/-/media/Files/PDF/Web-Development-Services/SamplePDF.pdf?la=en&hash=1B9D390C8225C1DDE2155F786C6515A3CEF9D4EC"
}'
```

{% endtab %}

{% tab title="NodeJS" %}

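```javascript
// Node.js 18+ provides fetch globally. On older versions, install
// node-fetch v2 (v3 is ESM-only) and uncomment the next line.
// const fetch = require('node-fetch');

const url = 'https://prod.api.market/api/v1/magicapi/pdf-extract/pdf-to-text-url/';

const options = {
  method: 'POST',
  headers: {
    'accept': 'application/json',
    'x-api-market-key': 'API-KEY', // replace with your API key
    'content-type': 'application/json',
  },
  body: JSON.stringify({
    pdf_url: 'https://www.cedarville.edu/-/media/Files/PDF/Web-Development-Services/SamplePDF.pdf?la=en&hash=1B9D390C8225C1DDE2155F786C6515A3CEF9D4EC',
  }),
};

fetch(url, options)
  .then(res => res.json())
  .then(json => console.log(json)) // the extracted text is in json.text
  .catch(err => console.error('error:' + err));
```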

{% endtab %}

{% tab title="Python" %}

```python
import http.client
import json

# Same host as the cURL example: https://prod.api.market
conn = http.client.HTTPSConnection("prod.api.market")

payload = json.dumps({
    "pdf_url": "https://www.cedarville.edu/-/media/Files/PDF/Web-Development-Services/SamplePDF.pdf?la=en&hash=1B9D390C8225C1DDE2155F786C6515A3CEF9D4EC"
})

headers = {
    'accept': "application/json",
    'x-api-market-key': "API-KEY",  # replace with your API key
    'content-type': "application/json"
}

conn.request("POST", "/api/v1/magicapi/pdf-extract/pdf-to-text-url/", payload, headers)

res = conn.getresponse()
data = res.read()
conn.close()

print(data.decode("utf-8"))
```

{% endtab %}

{% tab title="Java" %}

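```java
// Requires OkHttp (com.squareup.okhttp3:okhttp).
import okhttp3.MediaType;
import okhttp3.OkHttpClient;
import okhttp3.Request;
import okhttp3.RequestBody;
import okhttp3.Response;

public class PdfToTextUrl {
  public static void main(String[] args) throws Exception {
    OkHttpClient client = new OkHttpClient();

    MediaType mediaType = MediaType.parse("application/json");
    RequestBody body = RequestBody.create(mediaType, "{\"pdf_url\":\"https://www.cedarville.edu/-/media/Files/PDF/Web-Development-Services/SamplePDF.pdf?la=en&hash=1B9D390C8225C1DDE2155F786C6515A3CEF9D4EC\"}");
    Request request = new Request.Builder()
      .url("https://prod.api.market/api/v1/magicapi/pdf-extract/pdf-to-text-url/")
      .post(body)
      .addHeader("accept", "application/json")
      .addHeader("x-api-market-key", "API-KEY") // replace with your API key
      .addHeader("content-type", "application/json")
      .build();

    // try-with-resources releases the connection once the body has been read
    try (Response response = client.newCall(request).execute()) {
      System.out.println(response.body().string());
    }
  }
}
```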

{% endtab %}
{% endtabs %}

#### Response

{% code overflow="wrap" %}

```json
{
  "text": "Sample PDF\n \n \nThis is a sample PDF file that I will use to test the Sitecore publishing.\n "
}
```

{% endcode %}
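The Python tab uses `http.client`. Another option that needs only the standard library is `urllib.request`, sketched below; building the request separately from sending it keeps the headers and body easy to inspect. `build_text_url_request` and `pdf_to_text` are hypothetical helper names, and reading the result from the `text` field assumes the response shape shown above.

```python
import json
import urllib.request

API_URL = "https://prod.api.market/api/v1/magicapi/pdf-extract/pdf-to-text-url/"


def build_text_url_request(pdf_url: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a POST request for /pdf-to-text-url/."""
    body = json.dumps({"pdf_url": pdf_url}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={
            "accept": "application/json",
            "x-api-market-key": api_key,
            "Content-Type": "application/json",
        },
    )


def pdf_to_text(pdf_url: str, api_key: str, timeout: float = 60.0) -> str:
    """Send the request and return the "text" field of the JSON response."""
    req = build_text_url_request(pdf_url, api_key)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)["text"]
```

Calling `pdf_to_text(pdf_url, "API-KEY")` with your own key performs the request; `urlopen` raises `urllib.error.HTTPError` if the API answers with an error status.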

#### For the `/pdf-to-text-file/` endpoint

#### Request

```bash
curl -X 'POST' \
  'https://prod.api.market/api/v1/magicapi/pdf-extract/pdf-to-text-file/' \
  -H 'accept: application/json' \
  -H 'x-api-market-key: API-KEY' \
  -H 'Content-Type: multipart/form-data' \
  -F 'pdf_file=@TestPDFfile.pdf;type=application/pdf'
```

#### Response

{% code overflow="wrap" %}

```json
{
    "text": "This is a test PDF file "
}
```

{% endcode %}
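The cURL example uploads the PDF as multipart form data in a field named `pdf_file`. Python's standard library has no multipart encoder, so the sketch below builds the body by hand following the `multipart/form-data` format (RFC 7578). `build_multipart` and `pdf_file_to_text` are hypothetical helpers, and reading the `text` field assumes the response shape shown above.

```python
import json
import uuid
import urllib.request
from pathlib import Path

API_URL = "https://prod.api.market/api/v1/magicapi/pdf-extract/pdf-to-text-file/"


def build_multipart(field: str, filename: str, content: bytes,
                    content_type: str = "application/pdf") -> tuple[bytes, str]:
    """Encode one file field like curl -F 'field=@file;type=...'.

    Returns the request body and the matching Content-Type header value.
    """
    # A random boundary is very unlikely to occur inside the file bytes.
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + content + tail, f"multipart/form-data; boundary={boundary}"


def pdf_file_to_text(path: str, api_key: str, timeout: float = 120.0) -> str:
    """Upload a local PDF as the `pdf_file` field and return the "text" field."""
    pdf = Path(path)
    body, content_type = build_multipart("pdf_file", pdf.name, pdf.read_bytes())
    req = urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={
            "accept": "application/json",
            "x-api-market-key": api_key,
            # The boundary parameter tells the server where the file part starts and ends.
            "Content-Type": content_type,
        },
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)["text"]
```

Unlike the cURL example, the `Content-Type` header here carries its `boundary` parameter explicitly, because nothing else adds it. Third-party clients such as `requests` (`files={"pdf_file": ...}`) generate this encoding for you.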

#### For the `/pdf-to-html-file/` endpoint

#### Request

```bash
curl -X 'POST' \
  'https://prod.api.market/api/v1/magicapi/pdf-extract/pdf-to-html-file/' \
  -H 'accept: application/json' \
  -H 'x-api-market-key: API-KEY' \
  -H 'Content-Type: multipart/form-data' \
  -F 'pdf_file=@TestPDFfile.pdf;type=application/pdf'
```

#### Response

{% code overflow="wrap" %}

```json
{
    "html": "<html><head>\n<meta http-equiv=\"Content-Type\" content=\"text/html; charset=utf-8\">\n</head><body>\n<span style=\"position:absolute; border: gray 1px solid; left:0px; top:50px; width:612px; height:792px;\"></span>\n<div style=\"position:absolute; top:50px;\"><a name=\"1\">Page 1</a></div>\n<span style=\"font-family: Calibri; font-size:11px\">This is a test PDF file "
}
```

{% endcode %}
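Since the HTML endpoints return the markup inside a JSON string, publishing it as a web page takes one extra step: decoding the JSON and writing the `html` field to a file. A minimal sketch; `save_html_response` is a hypothetical name, and it writes UTF-8 because the returned markup declares `charset=utf-8` in its `<meta>` tag.

```python
import json
from pathlib import Path


def save_html_response(response_body: str, out_path: str) -> Path:
    """Write the "html" field of a /pdf-to-html-*/ response to a UTF-8 file."""
    html = json.loads(response_body)["html"]
    path = Path(out_path)
    # Match the charset=utf-8 declared in the returned <meta> tag.
    path.write_text(html, encoding="utf-8")
    return path
```

The resulting file can be opened directly in a browser, where the absolutely positioned elements approximate the original page layout.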


