Extract data from online news & articles. Get full metadata with content, images, authors, summary, category, keywords, topics, and more.
Use Cases of Pipfeed API
Avoid months of development time building your custom URL extractor
We have spent months fine-tuning each part of our article extraction API so you don't have to. One API covers a variety of use cases. These are some of the use cases Pipfeed's API helps our customers enable:
Add "Reader Mode" to your apps.
Send "summarized" content to your readers.
Use parsed data in training AI models.
Allow readers to download the content and read it offline.
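The offline-reading use case could be sketched roughly as follows. This is a minimal illustration, not Pipfeed's own code: it assumes the title and html fields of an API response have already been pulled out with a JSON parser of your choice, and the class name, method names, and output file name are hypothetical.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class ReaderModePage {
    // Escape characters that are significant in HTML text, for use inside <title>.
    static String escapeText(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    // Wrap the response's "html" field (already markup) in a minimal standalone document.
    static String render(String title, String articleHtml) {
        return "<!DOCTYPE html>\n<html>\n<head>\n<meta charset=\"utf-8\">\n<title>"
                + escapeText(title) + "</title>\n</head>\n<body>\n"
                + articleHtml + "\n</body>\n</html>\n";
    }

    // Write the rendered page to disk as UTF-8 so it can be opened offline.
    static Path save(Path target, String title, String articleHtml) throws IOException {
        return Files.write(target, render(title, articleHtml).getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws IOException {
        // Values copied from the example response; "article.html" is a hypothetical file name.
        Path out = save(
                Path.of("article.html"),
                "Web scraping is legal, US appeals court reaffirms | TechCrunch",
                "<p>Good news for archivists, academics, researchers and journalists: "
                        + "Scraping publicly accessible data is legal, according to a U.S. appeals court ruling.</p>");
        System.out.println("Saved " + out.toAbsolutePath());
    }
}
```

The html field is inserted as-is, so it is worth sanitizing it first if the page will be opened somewhere that runs scripts.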
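import okhttp3.MediaType;
import okhttp3.OkHttpClient;
import okhttp3.Request;
import okhttp3.RequestBody;
import okhttp3.Response;

OkHttpClient client = new OkHttpClient();
MediaType mediaType = MediaType.parse("application/json");

// JSON body containing the URL of the article to extract.
RequestBody body = RequestBody.create(mediaType, "{\"url\":\"https://techcrunch.com/2022/04/18/web-scraping-legal-court/\"}");

// POST to the extract endpoint; replace SOME_STRING_VALUE with your API key.
Request request = new Request.Builder()
    .url("https://api.magicapi.dev/api/v1/pipfeed/parse/extract")
    .post(body)
    .addHeader("x-magicapi-key", "SOME_STRING_VALUE")
    .addHeader("content-type", "application/json")
    .build();

// Send the request synchronously.
Response response = client.newCall(request).execute();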
Example Response
{"url":"https://techcrunch.com/2022/04/18/web-scraping-legal-court/","title":"Web scraping is legal, US appeals court reaffirms | TechCrunch","author":"Zack Whittaker", "html": "<div class=\"page\" id=\"readability-page-1\"><div><p id=\"speakable-summary\">Good news for archivists, academics, researchers and journalists: Scraping publicly accessible data is legal, according to a U.S. appeals court ruling.</p>\n<p>The landmark ruling by the U.S. Ninth Circuit of Appeals is the latest in a long-running legal battle brought by LinkedIn aimed at stopping a rival company from web scraping personal information from users’ public profiles. The case <a href=\"https://techcrunch.com/2021/06/14/supreme-court-revives-linkedin-bid-to-protect-user-data-from-web-scrapers/\">reached the U.S. Supreme Court</a> last year but was sent back to the Ninth Circuit for the original appeals court to re-review the case.</p>\n\t\n\t\n\n\n<p>In its second ruling on Monday, the Ninth Circuit reaffirmed its original decision and found that scraping data that is publicly accessible on the internet is not a violation of <a href=\"https://techcrunch.com/2020/11/29/supreme-court-van-buren-hacking/\">the Computer Fraud and Abuse Act</a>, or CFAA, which governs what constitutes computer hacking under U.S. law.</p>\n<p>The Ninth Circuit’s decision is a major win for archivists, academics, researchers and journalists who use tools to mass collect, or scrape, information that is publicly accessible on the internet. Without a ruling in place, long-running projects to archive websites no longer online and using publicly accessible data for academic and research studies have been left in legal limbo.</p>\n\t\n\t\n\n\n<p>But there have been egregious cases of web scraping that have sparked privacy and security concerns. Facial recognition startup Clearview AI claims to have scraped billions of social media profile photos, prompting <a href=\"https://techcrunch.com/2021/04/21/data-brokers-bill-wyden-paul-privacy-clearview/\">several tech giants to file lawsuits</a> against the startup. Several companies, including <a href=\"https://techcrunch.com/2019/09/04/facebook-phone-numbers-exposed/\">Facebook</a>, Instagram, <a href=\"https://techcrunch.com/2021/01/11/scraped-parler-data-is-a-metadata-goldmine/\">Parler</a>, <a href=\"https://techcrunch.com/2019/06/16/millions-venmo-transactions-scraped/\">Venmo</a> and Clubhouse have all had users’ data scraped over the years.</p>\n<p>The case before the Ninth Circuit was originally brought by LinkedIn against Hiq Labs, a company that uses public data to analyze employee attrition. LinkedIn said Hiq’s mass web scraping of LinkedIn user profiles was against its terms of service, amounted to hacking and was therefore a violation of the CFAA. LinkedIn first lost <a href=\"https://techcrunch.com/2016/08/15/linkedin-sues-scrapers/\">the case against Hiq</a> in 2019 after the Ninth Circuit found that the CFAA does not bar anyone from scraping data that’s publicly accessible.</p>\n<p>On its second pass of the case, the Ninth Circuit said it relied on <a href=\"https://techcrunch.com/2021/06/03/supreme-court-hacking-cfaa-ruling/\">a Supreme Court decision</a> last June, during which the U.S. top court took its first look at the decades-old CFAA. In its ruling, the Supreme Court narrowed what constitutes a violation of the CFAA as those who gain unauthorized access to a computer system – rather than a broader interpretation of exceeding existing authorization, which the court argued could have attached criminal penalties to “a breathtaking amount of commonplace computer activity.” Using a “gate-up, gate-down” analogy, the Supreme Court said that when a computer or website’s gates are up – and therefore information is publicly accessible – no authorization is required.</p>\n\t\n\t\n\n\n\n<p>The Ninth Circuit, in referencing the Supreme Court’s “gate-up, gate-down” analogy, ruled that “the concept of ‘without authorization’ does not apply to public websites.”</p>\n\t\n\t\n\n\n<p>“We’re disappointed in the court’s decision. This is a preliminary ruling and the case is far from over,” said LinkedIn spokesperson Greg Snapper in a statement. “We will continue to fight to protect our members’ ability to control the information they make available on LinkedIn. When your data is taken without permission and used in ways you haven’t agreed to, that’s not okay. On LinkedIn, our members trust us with their information, which is why we prohibit unauthorized scraping on our platform.”</p>\n<blockquote data-secret=\"ILEgjf6kYo\"><p><a href=\"https://techcrunch.com/2021/06/03/supreme-court-hacking-cfaa-ruling/\">Supreme Court limits US hacking law in landmark CFAA ruling</a></p></blockquote>\n\n</div></div>",
"text": "Good news for archivists, academics, researchers and journalists: Scraping publicly accessible data is legal, according to a U.S. appeals court ruling.\nThe landmark ruling by the U.S. Ninth Circuit of Appeals is the latest in a long-running legal battle brought by LinkedIn aimed at stopping a rival company from web scraping personal information from users’ public profiles. The case reached the U.S. Supreme Court last year but was sent back to the Ninth Circuit for the original appeals court to re-review the case.\n\t\n\t\n\n\nIn its second ruling on Monday, the Ninth Circuit reaffirmed its original decision and found that scraping data that is publicly accessible on the internet is not a violation of the Computer Fraud and Abuse Act, or CFAA, which governs what constitutes computer hacking under U.S. law.\nThe Ninth Circuit’s decision is a major win for archivists, academics, researchers and journalists who use tools to mass collect, or scrape, information that is publicly accessible on the internet. Without a ruling in place, long-running projects to archive websites no longer online and using publicly accessible data for academic and research studies have been left in legal limbo.\n\t\n\t\n\n\nBut there have been egregious cases of web scraping that have sparked privacy and security concerns. Facial recognition startup Clearview AI claims to have scraped billions of social media profile photos, prompting several tech giants to file lawsuits against the startup. Several companies, including Facebook, Instagram, Parler, Venmo and Clubhouse have all had users’ data scraped over the years.\nThe case before the Ninth Circuit was originally brought by LinkedIn against Hiq Labs, a company that uses public data to analyze employee attrition. LinkedIn said Hiq’s mass web scraping of LinkedIn user profiles was against its terms of service, amounted to hacking and was therefore a violation of the CFAA. LinkedIn first lost the case against Hiq in 2019 after the Ninth Circuit found that the CFAA does not bar anyone from scraping data that’s publicly accessible.\nOn its second pass of the case, the Ninth Circuit said it relied on a Supreme Court decision last June, during which the U.S. top court took its first look at the decades-old CFAA. In its ruling, the Supreme Court narrowed what constitutes a violation of the CFAA as those who gain unauthorized access to a computer system – rather than a broader interpretation of exceeding existing authorization, which the court argued could have attached criminal penalties to “a breathtaking amount of commonplace computer activity.” Using a “gate-up, gate-down” analogy, the Supreme Court said that when a computer or website’s gates are up – and therefore information is publicly accessible – no authorization is required.\n\t\n\t\n\n\n\nThe Ninth Circuit, in referencing the Supreme Court’s “gate-up, gate-down” analogy, ruled that “the concept of ‘without authorization’ does not apply to public websites.”\n\t\n\t\n\n\n“We’re disappointed in the court’s decision. This is a preliminary ruling and the case is far from over,” said LinkedIn spokesperson Greg Snapper in a statement. “We will continue to fight to protect our members’ ability to control the information they make available on LinkedIn. When your data is taken without permission and used in ways you haven’t agreed to, that’s not okay. On LinkedIn, our members trust us with their information, which is why we prohibit unauthorized scraping on our platform.”\nSupreme Court limits US hacking law in landmark CFAA ruling\n\n",
"length":3533,"description":"The landmark web scraping case was bounced back to the Ninth Circuit by the U.S. Supreme Court.","siteName":"TechCrunch","topImage":"https://techcrunch.com/wp-content/uploads/2022/04/GettyImages-1303427084-reworked.jpg","date":"2022-04-18T19:16:57+00:00","keywords":"","summary": ["The landmark ruling by the U.S.","The case reached the U.S.", "Supreme Court last year but was sent back to the Ninth Circuit for the original appeals court to re-review the case.",
"But there have been egregious cases of web scraping that have sparked privacy and security concerns.","Supreme Court limits US hacking law in landmark CFAA ruling" ],"sentiment": {"score":31,"comparative":0.05486725663716814,"calculation": [ {"landmark":2 }, {"limits":-1 }, {"supreme":4 }, {"prohibit":-1 }, {"trust":1 }, {"agreed":1 }, {"ability":2 }, {"protect":1 }, {"fight":-1 }, {"disappointed":-2 }, {"supreme":4 }, {"no":-1 }, {"accessible":1 }, {"supreme":4 }, {"breathtaking":5 }, {"criminal":-3 }, {"gain":2 }, {"violation":-2 }, {"supreme":4 }, {"top":2 }, {"supreme":4 }, {"accessible":1 }, {"lost":-3 }, {"violation":-2 }, {"lawsuits":-2 }, {"recognition":2 }, {"legal":1 }, {"accessible":1 }, {"no":-1 }, {"accessible":1 }, {"win":4 }, {"abuse":-3 }, {"fraud":-4 }, {"violation":-2 }, {"accessible":1 }, {"supreme":4 }, {"reached":1 }, {"stopping":-1 }, {"battle":-1 }, {"legal":1 }, {"landmark":2 }, {"legal":1 }, {"accessible":1 }, {"good":3 } ],"postive": ["landmark","supreme","trust","agreed","ability","protect","supreme","accessible","supreme","breathtaking","gain","supreme","top","supreme","accessible","recognition","legal","accessible","accessible","win","accessible","supreme","reached","legal","landmark","legal","accessible","good" ],"negative": ["limits","prohibit","fight","disappointed","no","criminal","violation","lost","violation","lawsuits","no","abuse","fraud","violation","stopping","battle" ] }}
Considerations
Always keep your API.market key (the value sent in the x-magicapi-key header) confidential.
Respect the terms of use for the websites you are extracting information from.
Ensure you handle the data you obtain responsibly and ethically.
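One way to follow the first point is to read the key from the environment instead of hard-coding it in source. The sketch below is only an illustration under stated assumptions: it uses the JDK's built-in java.net.http classes (Java 11+) rather than OkHttp, it only builds the request without sending it, and the environment variable name PIPFEED_API_KEY and the class and method names are hypothetical.

```java
import java.net.URI;
import java.net.http.HttpRequest;

class ExtractRequest {
    // Endpoint taken from the request example in this document.
    static final String ENDPOINT = "https://api.magicapi.dev/api/v1/pipfeed/parse/extract";

    // Escape backslashes and double quotes so the URL is a valid JSON string value.
    static String jsonBody(String articleUrl) {
        String escaped = articleUrl.replace("\\", "\\\\").replace("\"", "\\\"");
        return "{\"url\":\"" + escaped + "\"}";
    }

    // Build the POST request; fails fast if no key was supplied.
    static HttpRequest build(String apiKey, String articleUrl) {
        if (apiKey == null || apiKey.isBlank()) {
            throw new IllegalArgumentException("API key is missing");
        }
        return HttpRequest.newBuilder()
                .uri(URI.create(ENDPOINT))
                .header("x-magicapi-key", apiKey)
                .header("content-type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(jsonBody(articleUrl)))
                .build();
    }

    public static void main(String[] args) {
        // Hypothetical variable name; set it in your shell or secret manager.
        String apiKey = System.getenv("PIPFEED_API_KEY");
        HttpRequest request = build(apiKey, "https://techcrunch.com/2022/04/18/web-scraping-legal-court/");
        System.out.println(request.method() + " " + request.uri());
    }
}
```

The built request can then be sent with HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString()).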
By utilizing this API, you can quickly gather rich data from articles and further analyze or display the extracted information as per your application's needs.
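As a small example of further analysis, the sentiment object in the example response can be sanity-checked. In that one sample, score appears to equal the sum of the per-word values in calculation, and comparative appears to be score divided by the number of tokens. This is an observation from the sample, not documented behavior. The sketch below checks it with values copied by hand from the example response; the class and method names are hypothetical.

```java
import java.util.Arrays;

class SentimentCheck {
    // Per-word values copied, in order, from "calculation" in the example response.
    static final int[] CALCULATION = {
        2, -1, 4, -1, 1, 1, 2, 1, -1, -2, 4,
        -1, 1, 4, 5, -3, 2, -2, 4, 2, 4, 1,
        -3, -2, -2, 2, 1, 1, -1, 1, 4, -3, -4,
        -2, 1, 4, 1, -1, -1, 1, 2, 1, 1, 3
    };

    // Sum of all per-word values.
    static int sum(int[] values) {
        return Arrays.stream(values).sum();
    }

    // Token count implied by the assumption comparative = score / tokens.
    static long impliedTokenCount(int score, double comparative) {
        return Math.round(score / comparative);
    }

    public static void main(String[] args) {
        int score = sum(CALCULATION);
        System.out.println("score = " + score);  // prints "score = 31", matching "score":31
        System.out.println("implied tokens = " + impliedTokenCount(score, 0.05486725663716814));  // prints 565
    }
}
```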