Pipfeed's Extract API Developer Documentation

Extract data from online news & articles. Get full metadata with content, images, authors, summary, category, keywords, topics, and more.

Developer Portal: https://api.market/store/pipfeed/parse

Replit: https://replit.com/@hello737/Pipfeed-article-extract

GitHub Repo: https://github.com/imshashank/pipfeed-article-extract-demo

Use Cases of Pipfeed API

Avoid months of development time building your custom URL extractor

We have spent months fine-tuning each small part of our article extract API so you don't have to. One API can cover a variety of use cases. These are some of the use cases Pipfeed's API helps our customers enable:

Add "Reader Mode" to your apps.
Send "summarized" content to your readers.
Use parsed data in training AI models.
Allow readers to download the content and read it offline.

Endpoint

URL: https://api.magicapi.dev/api/v1/pipfeed/parse/extract
Method: POST

Headers:

x-magicapi-key: Your unique API key.

Note: Do not share this key publicly.

Content-Type: Must be set to application/json.

Authentication

To use Pipfeed's Extract API, you must include your API key in the request header.

Header Name: x-magicapi-key
Value: your-api-key
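
A minimal Python sketch of building these headers without hard-coding the key. The helper name build_headers and the environment variable PIPFEED_API_KEY are assumptions made for illustration; neither is part of the API.

```python
import os


def build_headers(api_key: str) -> dict:
    """Return the two request headers this page lists for the endpoint."""
    if not api_key:
        raise ValueError("an API key is required")
    return {
        "x-magicapi-key": api_key,
        "Content-Type": "application/json",
    }


if __name__ == "__main__":
    # PIPFEED_API_KEY is a hypothetical variable name; use whatever your deployment provides.
    key = os.environ.get("PIPFEED_API_KEY", "")
    if key:
        headers = build_headers(key)
        # Print header names only, so the key itself never lands in logs.
        print(sorted(headers))
    else:
        print("Set PIPFEED_API_KEY before sending requests.")
```

Reading the key from the environment keeps it out of source control, which is one way to follow the note about not sharing it publicly.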

If you just want to test the API, sign up for the "FREE" plan. This plan allows you to send 1000 API requests per month at no charge, and no credit card is required.

Request Payload

url (required): The URL of the article you wish to extract data from.

{
  "url": "https://www.wsj.com/tech/the-underground-network-sneaking-nvidia-chips-into-china-f733aaa6?mod=hp_lead_pos3"
}
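
The request body can also be built programmatically instead of written by hand. The sketch below serializes the single url field and rejects values that are not absolute http(s) URLs before any request is sent. The helper name build_payload and the client-side check are assumptions; the API's own validation rules are not described on this page.

```python
import json
from urllib.parse import urlparse


def build_payload(article_url: str) -> bytes:
    """Serialize the request body, rejecting values that are not http(s) URLs."""
    parsed = urlparse(article_url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"not an absolute http(s) URL: {article_url!r}")
    # json.dumps takes care of escaping quotes and non-ASCII characters.
    return json.dumps({"url": article_url}).encode("utf-8")


if __name__ == "__main__":
    body = build_payload("https://techcrunch.com/2022/04/18/web-scraping-legal-court/")
    print(body.decode("utf-8"))
```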

Response Data

The API will return the following data:

url: The original URL of the article.
title: The title of the article.
author: The author's name.
html: The article's HTML content.
text: The plaintext version of the article.
length: The number of characters in the text content.
description: A brief description or sub-title of the article.
siteName: The website or source name.
topImage: The URL of the main image of the article.
date: The published date of the article.
keywords: Keywords related to the article.
summary: An array of summarized points or sentences from the article.
sentiment: A sentiment analysis of the article's content.
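
As an illustration of working with these fields, the sketch below copies a subset of them from a decoded JSON response into a small Python dataclass. The class Article, the function article_from_response, and the snake_case attribute names are hypothetical. Whether the API ever omits a field or returns null is not documented here, so the empty defaults are an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class Article:
    """Subset of the documented response fields (names converted to snake_case)."""
    url: str = ""
    title: str = ""
    author: str = ""
    text: str = ""
    length: int = 0
    site_name: str = ""
    top_image: str = ""
    date: str = ""
    summary: list = field(default_factory=list)
    sentiment: dict = field(default_factory=dict)


def article_from_response(data: dict) -> Article:
    """Build an Article from a decoded response, using empty defaults for gaps."""
    return Article(
        url=data.get("url") or "",
        title=data.get("title") or "",
        author=data.get("author") or "",
        text=data.get("text") or "",
        length=data.get("length") or 0,
        site_name=data.get("siteName") or "",
        top_image=data.get("topImage") or "",
        date=data.get("date") or "",
        summary=data.get("summary") or [],
        sentiment=data.get("sentiment") or {},
    )


if __name__ == "__main__":
    # Abridged values taken from the example response on this page.
    sample = {
        "url": "https://techcrunch.com/2022/04/18/web-scraping-legal-court/",
        "author": "Zack Whittaker",
        "length": 3533,
        "siteName": "TechCrunch",
        "date": "2022-04-18T19:16:57+00:00",
    }
    article = article_from_response(sample)
    print(article.site_name, article.length, article.date)
```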

Sentiment Analysis

The sentiment analysis object contains:

score: A numerical score representing the overall sentiment. Positive numbers indicate positive sentiment, and negative numbers indicate negative sentiment.
comparative: The score normalized by the length of the text.
calculation: An array of words and their corresponding sentiment scores.
positive: An array of positive words found in the article.
negative: An array of negative words found in the article.
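
In the example response on this page, the values in calculation add up to the reported score of 31, and 31 divided by the comparative value of 0.05486725663716814 is about 565. That suggests comparative is the score divided by a word or token count, but this reading is inferred from a single sample and is not confirmed by the API. The Python sketch below shows both checks; the function names are hypothetical and the small sample in the demo is made up for illustration.

```python
def summed_score(sentiment: dict) -> int:
    """Add up the per-word values in the calculation array."""
    return sum(
        value
        for entry in sentiment.get("calculation", [])
        for value in entry.values()
    )


def implied_token_count(sentiment: dict):
    """Estimate the length used for normalization as score / comparative.

    Returns None when comparative is missing or zero.
    """
    comparative = sentiment.get("comparative")
    if not comparative:
        return None
    return round(sentiment["score"] / comparative)


if __name__ == "__main__":
    # Made-up sample in the documented shape, not real API output.
    sample = {
        "score": 3,
        "comparative": 0.3,
        "calculation": [{"good": 3}, {"no": -1}, {"legal": 1}],
    }
    print(summed_score(sample) == sample["score"])
    print(implied_token_count(sample))
```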

Code Examples and Response

Example Request

cURL:

curl -X 'POST' \
  'https://api.magicapi.dev/api/v1/pipfeed/parse/extract' \
  -H 'accept: application/json' \
  -H 'x-magicapi-key: API_KEY' \
  -H 'Content-Type: application/json' \
  -d '{
  "url": "https://techcrunch.com/2022/04/18/web-scraping-legal-court/"
}'

Node.js:

// require() works with node-fetch v2; v3 is ESM-only and needs import instead.
const fetch = require('node-fetch');

const url = 'https://api.magicapi.dev/api/v1/pipfeed/parse/extract';

const options = {
  method: 'POST',
  headers: {'x-magicapi-key': 'API_KEY', 'content-type': 'application/json'},
  // JSON.stringify avoids hand-escaping the request body.
  body: JSON.stringify({url: 'https://techcrunch.com/2022/04/18/web-scraping-legal-court/'})
};

// Send the request and print the parsed JSON response.
fetch(url, options)
  .then(res => res.json())
  .then(json => console.log(json))
  .catch(err => console.error('error:' + err));

Python:

import http.client
import json

conn = http.client.HTTPSConnection("api.magicapi.dev")

# json.dumps builds the request body without hand-escaped quotes.
payload = json.dumps({"url": "https://techcrunch.com/2022/04/18/web-scraping-legal-court/"})

headers = {
    "x-magicapi-key": "API_KEY",
    "content-type": "application/json",
}

conn.request("POST", "/api/v1/pipfeed/parse/extract", payload, headers)

# Read the full response body, then release the connection.
res = conn.getresponse()
data = res.read()
conn.close()

print(data.decode("utf-8"))

Java (OkHttp):

// Requires the OkHttp library; the classes below come from the okhttp3 package.
OkHttpClient client = new OkHttpClient();

MediaType mediaType = MediaType.parse("application/json");
RequestBody body = RequestBody.create(mediaType, "{\"url\":\"https://techcrunch.com/2022/04/18/web-scraping-legal-court/\"}");
Request request = new Request.Builder()
  .url("https://api.magicapi.dev/api/v1/pipfeed/parse/extract")
  .post(body)
  .addHeader("x-magicapi-key", "API_KEY")
  .addHeader("content-type", "application/json")
  .build();

// try-with-resources closes the response once the body has been read.
try (Response response = client.newCall(request).execute()) {
  System.out.println(response.body().string());
}

Example Response

{
  "url": "https://techcrunch.com/2022/04/18/web-scraping-legal-court/",
  "title": "Web scraping is legal, US appeals court reaffirms | TechCrunch",
  "author": "Zack Whittaker",
  "html": "<div class=\"page\" id=\"readability-page-1\"><div><p id=\"speakable-summary\">Good news for archivists, academics, researchers and journalists: Scraping publicly accessible data is legal, according to a U.S. appeals court ruling.</p>\n<p>The landmark ruling by the U.S. Ninth Circuit of Appeals is the latest in a long-running legal battle brought by LinkedIn aimed at stopping a rival company from web scraping personal information from users’ public profiles. The case <a href=\"https://techcrunch.com/2021/06/14/supreme-court-revives-linkedin-bid-to-protect-user-data-from-web-scrapers/\">reached the U.S. Supreme Court</a> last year but was sent back to the Ninth Circuit for the original appeals court to re-review the case.</p>\n\t\n\t\n\n\n<p>In its second ruling on Monday, the Ninth Circuit reaffirmed its original decision and found that scraping data that is publicly accessible on the internet is not a violation of <a href=\"https://techcrunch.com/2020/11/29/supreme-court-van-buren-hacking/\">the Computer Fraud and Abuse Act</a>, or CFAA, which governs what constitutes computer hacking under U.S. law.</p>\n<p>The Ninth Circuit’s decision is a major win for archivists, academics, researchers and journalists who use tools to mass collect, or scrape, information that is publicly accessible on the internet. Without a ruling in place, long-running projects to archive websites no longer online and using publicly accessible data for academic and research studies have been left in legal limbo.</p>\n\t\n\t\n\n\n<p>But there have been egregious cases of web scraping that have sparked privacy and security concerns. Facial recognition startup Clearview AI claims to have scraped billions of social media profile photos, prompting <a href=\"https://techcrunch.com/2021/04/21/data-brokers-bill-wyden-paul-privacy-clearview/\">several tech giants to file lawsuits</a> against the startup. 
Several companies, including <a href=\"https://techcrunch.com/2019/09/04/facebook-phone-numbers-exposed/\">Facebook</a>, Instagram, <a href=\"https://techcrunch.com/2021/01/11/scraped-parler-data-is-a-metadata-goldmine/\">Parler</a>, <a href=\"https://techcrunch.com/2019/06/16/millions-venmo-transactions-scraped/\">Venmo</a>&nbsp;and Clubhouse have all had users’ data scraped over the years.</p>\n<p>The case before the Ninth Circuit was originally brought by LinkedIn against Hiq Labs, a company that uses public data to analyze employee attrition. LinkedIn said Hiq’s mass web scraping of LinkedIn user profiles was against its terms of service, amounted to hacking and was therefore a violation of the CFAA. LinkedIn first lost <a href=\"https://techcrunch.com/2016/08/15/linkedin-sues-scrapers/\">the case against Hiq</a> in 2019 after the Ninth Circuit found that the CFAA does not bar anyone from scraping data that’s publicly accessible.</p>\n<p>On its second pass of the case, the Ninth Circuit said it relied on <a href=\"https://techcrunch.com/2021/06/03/supreme-court-hacking-cfaa-ruling/\">a Supreme Court decision</a> last June, during which the U.S. top court took its first look at the decades-old CFAA. 
In its ruling, the Supreme Court narrowed what constitutes a violation of the CFAA as those who gain unauthorized access to a computer system — rather than a broader interpretation of exceeding existing authorization, which the court argued could have attached criminal penalties to “a breathtaking amount of commonplace computer activity.” Using a “gate-up, gate-down” analogy, the Supreme Court said that when a computer or website’s gates are up — and therefore information is publicly accessible — no authorization is required.</p>\n\t\n\t\n\n\n\n<p>The Ninth Circuit, in referencing the Supreme Court’s “gate-up, gate-down” analogy, ruled that “the concept of ‘without authorization’ does not apply to public websites.”</p>\n\t\n\t\n\n\n<p>“We’re disappointed in the court’s decision. This is a preliminary ruling and the case is far from over,” said LinkedIn spokesperson Greg Snapper in a statement. “We will continue to fight to protect our members’ ability to control the information they make available on LinkedIn. When your data is taken without permission and used in ways you haven’t agreed to, that’s not okay. On LinkedIn, our members trust us with their information, which is why we prohibit unauthorized scraping on our platform.”</p>\n<blockquote data-secret=\"ILEgjf6kYo\"><p><a href=\"https://techcrunch.com/2021/06/03/supreme-court-hacking-cfaa-ruling/\">Supreme Court limits US hacking law in landmark CFAA ruling</a></p></blockquote>\n\n</div></div>",
  "text": "Good news for archivists, academics, researchers and journalists: Scraping publicly accessible data is legal, according to a U.S. appeals court ruling.\nThe landmark ruling by the U.S. Ninth Circuit of Appeals is the latest in a long-running legal battle brought by LinkedIn aimed at stopping a rival company from web scraping personal information from users’ public profiles. The case reached the U.S. Supreme Court last year but was sent back to the Ninth Circuit for the original appeals court to re-review the case.\n\t\n\t\n\n\nIn its second ruling on Monday, the Ninth Circuit reaffirmed its original decision and found that scraping data that is publicly accessible on the internet is not a violation of the Computer Fraud and Abuse Act, or CFAA, which governs what constitutes computer hacking under U.S. law.\nThe Ninth Circuit’s decision is a major win for archivists, academics, researchers and journalists who use tools to mass collect, or scrape, information that is publicly accessible on the internet. Without a ruling in place, long-running projects to archive websites no longer online and using publicly accessible data for academic and research studies have been left in legal limbo.\n\t\n\t\n\n\nBut there have been egregious cases of web scraping that have sparked privacy and security concerns. Facial recognition startup Clearview AI claims to have scraped billions of social media profile photos, prompting several tech giants to file lawsuits against the startup. Several companies, including Facebook, Instagram, Parler, Venmo and Clubhouse have all had users’ data scraped over the years.\nThe case before the Ninth Circuit was originally brought by LinkedIn against Hiq Labs, a company that uses public data to analyze employee attrition. LinkedIn said Hiq’s mass web scraping of LinkedIn user profiles was against its terms of service, amounted to hacking and was therefore a violation of the CFAA. 
LinkedIn first lost the case against Hiq in 2019 after the Ninth Circuit found that the CFAA does not bar anyone from scraping data that’s publicly accessible.\nOn its second pass of the case, the Ninth Circuit said it relied on a Supreme Court decision last June, during which the U.S. top court took its first look at the decades-old CFAA. In its ruling, the Supreme Court narrowed what constitutes a violation of the CFAA as those who gain unauthorized access to a computer system — rather than a broader interpretation of exceeding existing authorization, which the court argued could have attached criminal penalties to “a breathtaking amount of commonplace computer activity.” Using a “gate-up, gate-down” analogy, the Supreme Court said that when a computer or website’s gates are up — and therefore information is publicly accessible — no authorization is required.\n\t\n\t\n\n\n\nThe Ninth Circuit, in referencing the Supreme Court’s “gate-up, gate-down” analogy, ruled that “the concept of ‘without authorization’ does not apply to public websites.”\n\t\n\t\n\n\n“We’re disappointed in the court’s decision. This is a preliminary ruling and the case is far from over,” said LinkedIn spokesperson Greg Snapper in a statement. “We will continue to fight to protect our members’ ability to control the information they make available on LinkedIn. When your data is taken without permission and used in ways you haven’t agreed to, that’s not okay. On LinkedIn, our members trust us with their information, which is why we prohibit unauthorized scraping on our platform.”\nSupreme Court limits US hacking law in landmark CFAA ruling\n\n",
  "length": 3533,
  "description": "The landmark web scraping case was bounced back to the Ninth Circuit by the U.S. Supreme Court.",
  "siteName": "TechCrunch",
  "topImage": "https://techcrunch.com/wp-content/uploads/2022/04/GettyImages-1303427084-reworked.jpg",
  "date": "2022-04-18T19:16:57+00:00",
  "keywords": "",
  "summary": [
    "The landmark ruling by the U.S.",
    "The case reached the U.S.",
    "Supreme Court last year but was sent back to the Ninth Circuit for the original appeals court to re-review the case.",
    "But there have been egregious cases of web scraping that have sparked privacy and security concerns.",
    "Supreme Court limits US hacking law in landmark CFAA ruling"
  ],
  "sentiment": {
    "score": 31,
    "comparative": 0.05486725663716814,
    "calculation": [
      {
        "landmark": 2
      },
      {
        "limits": -1
      },
      {
        "supreme": 4
      },
      {
        "prohibit": -1
      },
      {
        "trust": 1
      },
      {
        "agreed": 1
      },
      {
        "ability": 2
      },
      {
        "protect": 1
      },
      {
        "fight": -1
      },
      {
        "disappointed": -2
      },
      {
        "supreme": 4
      },
      {
        "no": -1
      },
      {
        "accessible": 1
      },
      {
        "supreme": 4
      },
      {
        "breathtaking": 5
      },
      {
        "criminal": -3
      },
      {
        "gain": 2
      },
      {
        "violation": -2
      },
      {
        "supreme": 4
      },
      {
        "top": 2
      },
      {
        "supreme": 4
      },
      {
        "accessible": 1
      },
      {
        "lost": -3
      },
      {
        "violation": -2
      },
      {
        "lawsuits": -2
      },
      {
        "recognition": 2
      },
      {
        "legal": 1
      },
      {
        "accessible": 1
      },
      {
        "no": -1
      },
      {
        "accessible": 1
      },
      {
        "win": 4
      },
      {
        "abuse": -3
      },
      {
        "fraud": -4
      },
      {
        "violation": -2
      },
      {
        "accessible": 1
      },
      {
        "supreme": 4
      },
      {
        "reached": 1
      },
      {
        "stopping": -1
      },
      {
        "battle": -1
      },
      {
        "legal": 1
      },
      {
        "landmark": 2
      },
      {
        "legal": 1
      },
      {
        "accessible": 1
      },
      {
        "good": 3
      }
    ],
    "postive": [
      "landmark",
      "supreme",
      "trust",
      "agreed",
      "ability",
      "protect",
      "supreme",
      "accessible",
      "supreme",
      "breathtaking",
      "gain",
      "supreme",
      "top",
      "supreme",
      "accessible",
      "recognition",
      "legal",
      "accessible",
      "accessible",
      "win",
      "accessible",
      "supreme",
      "reached",
      "legal",
      "landmark",
      "legal",
      "accessible",
      "good"
    ],
    "negative": [
      "limits",
      "prohibit",
      "fight",
      "disappointed",
      "no",
      "criminal",
      "violation",
      "lost",
      "violation",
      "lawsuits",
      "no",
      "abuse",
      "fraud",
      "violation",
      "stopping",
      "battle"
    ]
  }
}
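
The field list on this page names the key positive, while the example response above spells it postive. Which spelling a live response uses is not confirmed here, so the Python sketch below checks both. It also joins the first few summary entries into one string. The function names positive_words and brief are hypothetical.

```python
def positive_words(sentiment: dict) -> list:
    """Return the positive-word list under either spelling of the key."""
    return sentiment.get("positive") or sentiment.get("postive") or []


def brief(article: dict, max_points: int = 3) -> str:
    """Join the first few summary entries into a single line of text."""
    points = article.get("summary") or []
    return " ".join(points[:max_points])


if __name__ == "__main__":
    # Abridged values taken from the example response above.
    article = {
        "summary": [
            "The landmark ruling by the U.S.",
            "The case reached the U.S.",
        ],
        "sentiment": {"postive": ["landmark", "supreme", "trust"]},
    }
    print(brief(article))
    print(positive_words(article["sentiment"]))
```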

Considerations

Always keep your API.market key confidential.
Respect the terms of use for the websites you are extracting information from.
Ensure you handle the data you obtain responsibly and ethically.

By utilizing this API, you can quickly gather rich data from articles and further analyze or display the extracted information as per your application's needs.

Try this API with 1000 FREE calls at https://api.market/store/pipfeed/parse
