wrong document type error #95

Open
tacman opened this issue Nov 15, 2024 · 1 comment

Comments

@tacman

tacman commented Nov 15, 2024

I'm trying to run it against https://www.privatdozent.co/p/the-battle-line-at-louvain-1914 and I'm getting an error:

15:13:45 INFO      [graby] Opengraph "article:" data: [] ["ogData" => []]
15:13:45 INFO      [graby] JSON-LD data: ["@context" => "https://schema.org","@type" => "NewsArticle","url" => "https://www.privatdozent.co/p/the-battle-line-at-louvain-1914","mainEntityOfPage" => "https://www.privatdozent.co/p/the-battle-line-at-louvain-1914","headline" => "The Battle Line at Louvain (1914)","description" => "“Where they burn books, they will also burn people” — Heinrich Heine","image" => [["@type" => "ImageObject","url" => "https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fca982839-4161-4d7b-90a3-9ff1bdeca5f0_1280x939.jpeg"]],"datePublished" => "2024-11-15T09:42:48+00:00","dateModified" => "2024-11-15T09:42:48+00:00","isAccessibleForFree" => true,"author" => [["@type" => "Person","name" => "Jørgen Veisdal","url" => "https://substack.com/@privatdozent","description" => "Author of Privatdozent. Associate Professor.","identifier" => "user:3088938","sameAs" => ["https://twitter.com/JorgenVeisdal"],"image" => ["@type" => "ImageObject","contentUrl" => "https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F86ca7756-940a-4d3c-affc-fd5e6a968f2a_2653x2653.jpeg","thumbnailUrl" => "https://substackcdn.com/image/fetch/w_128,h_128,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F86ca7756-940a-4d3c-affc-fd5e6a968f2a_2653x2653.jpeg"]]],"publisher" => ["@type" => "Organization","name" => "Privatdozent","url" => "https://www.privatdozent.co","description" => "Essays on the history of mathematics. 10k+ subscribers. 
Substack Bestseller (2024) 🥇, Grow Feature (2022) 📈, Featured Substack Newsletter (2021) 🌟","interactionStatistic" => ["@type" => "InteractionCounter","name" => "Subscribers","interactionType" => "https://schema.org/SubscribeAction","userInteractionCount" => 10000],"identifier" => "pub:14134","logo" => ["@type" => "ImageObject","url" => "https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Ff0a20ecc-bc78-4d0a-bed0-7901fce9e3e8_1280x1280.png","contentUrl" => "https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Ff0a20ecc-bc78-4d0a-bed0-7901fce9e3e8_1280x1280.png","thumbnailUrl" => "https://substackcdn.com/image/fetch/w_128,h_128,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Ff0a20ecc-bc78-4d0a-bed0-7901fce9e3e8_1280x1280.png"],"image" => ["@type" => "ImageObject","url" => "https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Ff0a20ecc-bc78-4d0a-bed0-7901fce9e3e8_1280x1280.png","contentUrl" => "https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Ff0a20ecc-bc78-4d0a-bed0-7901fce9e3e8_1280x1280.png","thumbnailUrl" => "https://substackcdn.com/image/fetch/w_128,h_128,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Ff0a20ecc-bc78-4d0a-bed0-7901fce9e3e8_1280x1280.png"],"sameAs" => ["https://twitter.com/dozentprivat"]]] ["JsonLdData" => ["@context" => "https://schema.org","@type" => "NewsArticle","url" => 
"https://www.privatdozent.co/p/the-battle-line-at-louvain-1914","mainEntityOfPage" => "https://www.privatdozent.co/p/the-battle-line-at-louvain-1914","headline" => "The Battle Line at Louvain (1914)","description" => "“Where they burn books, they will also burn people” — Heinrich Heine","image" => [["@type" => "ImageObject","url" => "https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fca982839-4161-4d7b-90a3-9ff1bdeca5f0_1280x939.jpeg"]],"datePublished" => "2024-11-15T09:42:48+00:00","dateModified" => "2024-11-15T09:42:48+00:00","isAccessibleForFree" => true,"author" => [["@type" => "Person","name" => "Jørgen Veisdal","url" => "https://substack.com/@privatdozent","description" => "Author of Privatdozent. Associate Professor.","identifier" => "user:3088938","sameAs" => ["https://twitter.com/JorgenVeisdal"],"image" => ["@type" => "ImageObject","contentUrl" => "https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F86ca7756-940a-4d3c-affc-fd5e6a968f2a_2653x2653.jpeg","thumbnailUrl" => "https://substackcdn.com/image/fetch/w_128,h_128,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F86ca7756-940a-4d3c-affc-fd5e6a968f2a_2653x2653.jpeg"]]],"publisher" => ["@type" => "Organization","name" => "Privatdozent","url" => "https://www.privatdozent.co","description" => "Essays on the history of mathematics. 10k+ subscribers. 
Substack Bestseller (2024) 🥇, Grow Feature (2022) 📈, Featured Substack Newsletter (2021) 🌟","interactionStatistic" => ["@type" => "InteractionCounter","name" => "Subscribers","interactionType" => "https://schema.org/SubscribeAction","userInteractionCount" => 10000],"identifier" => "pub:14134","logo" => ["@type" => "ImageObject","url" => "https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Ff0a20ecc-bc78-4d0a-bed0-7901fce9e3e8_1280x1280.png","contentUrl" => "https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Ff0a20ecc-bc78-4d0a-bed0-7901fce9e3e8_1280x1280.png","thumbnailUrl" => "https://substackcdn.com/image/fetch/w_128,h_128,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Ff0a20ecc-bc78-4d0a-bed0-7901fce9e3e8_1280x1280.png"],"image" => ["@type" => "ImageObject","url" => "https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Ff0a20ecc-bc78-4d0a-bed0-7901fce9e3e8_1280x1280.png","contentUrl" => "https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Ff0a20ecc-bc78-4d0a-bed0-7901fce9e3e8_1280x1280.png","thumbnailUrl" => "https://substackcdn.com/image/fetch/w_128,h_128,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Ff0a20ecc-bc78-4d0a-bed0-7901fce9e3e8_1280x1280.png"],"sameAs" => ["https://twitter.com/dozentprivat"]]]]
15:13:45 INFO      [graby] date matched from JsonLd: 2024-11-15T09:42:48+00:00 ["date" => "2024-11-15T09:42:48+00:00"]
15:13:45 INFO      [graby] date matched from JsonLd: 2024-11-15T09:42:48+00:00 ["date" => "2024-11-15T09:42:48+00:00"]
15:13:45 INFO      [graby] author matched from JsonLd: Jørgen Veisdal ["author" => "Jørgen Veisdal"]
15:13:45 INFO      [graby] title matched from JsonLd: {The Battle Line at Louvain (1914)} ["title" => "The Battle Line at Louvain (1914)"]
15:13:45 INFO      [graby] Trying //meta[@property="og:title"]/@content for title ["pattern" => "//meta[@property="og:title"]/@content"]
15:13:45 INFO      [graby] title matched: The Battle Line at Louvain (1914) ["title" => "The Battle Line at Louvain (1914)"]
15:13:45 INFO      [graby] ...XPath match: {pattern} ["pattern","//meta[@property="og:title"]/@content"]
15:13:45 INFO      [graby] Trying //meta[@property="article:published_time"]/@content for date ["pattern" => "//meta[@property="article:published_time"]/@content"]
15:13:45 INFO      [graby] Trying //html[@lang]/@lang for language ["pattern" => "//html[@lang]/@lang"]
15:13:45 INFO      [graby] Trying //meta[@name="DC.language"]/@content for language ["pattern" => "//meta[@name="DC.language"]/@content"]
15:13:45 INFO      [graby] Trying //*[contains(@class, 'google-dfp-ad-wrapper')] to strip element ["pattern" => "//*[contains(@class, 'google-dfp-ad-wrapper')]"]
15:13:45 INFO      [graby] Trying //iframe/@srcdoc to strip element ["pattern" => "//iframe/@srcdoc"]
15:13:45 INFO      [graby] Trying sharedaddy to strip element ["string" => "sharedaddy"]
15:13:45 INFO      [graby] Trying i-amphtml-replaced-content to strip element ["string" => "i-amphtml-replaced-content"]
15:13:45 INFO      [graby] Using Readability

In Readability.php line 268:
                        
  [DOMException (4)]    
  Wrong Document Error  
                        

Exception trace:
  at /home/tac/g/sites/feeds/vendor/j0k3r/php-readability/src/Readability.php:268
 DOMNode->appendChild() at /home/tac/g/sites/feeds/vendor/j0k3r/php-readability/src/Readability.php:268
 Readability\Readability->init() at /home/tac/g/tacman/graby/src/Extractor/ContentExtractor.php:484
 Graby\Extractor\ContentExtractor->process() at /home/tac/g/tacman/graby/src/Graby.php:352
 Graby\Graby->doFetchContent() at /home/tac/g/tacman/graby/src/Graby.php:177
 Graby\Graby->fetchContent() at /home/tac/g/sites/feeds/src/Parser/Internal.php:25
 App\Parser\Internal->parse() at /home/tac/g/sites/feeds/src/Content/Extractor.php:117
 App\Content\Extractor->parseContent() at /home/tac/g/sites/feeds/src/Content/Import.php:97
 App\Content\Import->process() at /home/tac/g/sites/feeds/src/Command/FetchItemsCommand.php:155
 App\Command\FetchItemsCommand->execute() at /home/tac/g/sites/feeds/vendor/symfony/console/Command/Command.php:279
 Symfony\Component\Console\Command\Command->run() at /home/tac/g/sites/feeds/vendor/symfony/console/Application.php:1094
 Symfony\Component\Console\Application->doRunCommand() at /home/tac/g/sites/feeds/vendor/symfony/framework-bundle/Console/Application.php:123
 Symfony\Bundle\FrameworkBundle\Console\Application->doRunCommand() at /home/tac/g/sites/feeds/vendor/symfony/console/Application.php:342
 Symfony\Component\Console\Application->doRun() at /home/tac/g/sites/feeds/vendor/symfony/framework-bundle/Console/Application.php:77
 Symfony\Bundle\FrameworkBundle\Console\Application->doRun() at /home/tac/g/sites/feeds/vendor/symfony/console/Application.php:193
 Symfony\Component\Console\Application->run() at /home/tac/g/sites/feeds/vendor/symfony/runtime/Runner/Symfony/ConsoleApplicationRunner.php:49
 Symfony\Component\Runtime\Runner\Symfony\ConsoleApplicationRunner->run() at /home/tac/g/sites/feeds/vendor/autoload_runtime.php:29
 require_once() at /home/tac/g/sites/feeds/c:11

feed:fetch-items [--slug [SLUG]] [--use_queue] [--] [<age>]

This is graby calling this library, but I'm stuck and don't really understand DOM manipulation in PHP.

I'm running PHP 8.3, and I'm wondering if it's stricter about appending DOM elements.
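For anyone else who hits this: the same exception can be reproduced outside graby in a few lines. This is a minimal sketch, not the php-readability code, and the variable names (`$target`, `$other`, `$overlay`) are hypothetical. It only assumes the bundled `ext/dom` extension: an element created by one `DOMDocument` is appended under an element owned by a different `DOMDocument`, which is what the `DOMNode->appendChild()` frame at Readability.php line 268 suggests is happening.

```php
<?php
// Minimal reproduction of "Wrong Document Error" (DOMException code 4).
// Hypothetical names; two separate DOMDocument instances stand in for the
// document that owns <body> and the document that created the new node.
$target = new DOMDocument();
$target->loadHTML('<html><body></body></html>');
$body = $target->getElementsByTagName('body')->item(0);

$other = new DOMDocument();
$overlay = $other->createElement('div', 'article content');

$caught = null;
try {
    // $overlay is owned by $other, so appending it under $target's <body>
    // throws instead of moving the node across documents.
    $body->appendChild($overlay);
} catch (DOMException $e) {
    $caught = $e;
}

if ($caught !== null) {
    echo get_class($caught), ' (', $caught->getCode(), '): ', $caught->getMessage(), PHP_EOL;
}
```

The exception code matches the `[DOMException (4)]` box above; `DOM_WRONG_DOCUMENT_ERR` is the ext/dom constant for that code.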

@tacman

tacman commented Nov 15, 2024

I made some progress by following https://stackoverflow.com/questions/1759137/domelement-cloning-and-appending-wrong-document-error

I'm not sure what I'm doing, though.

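            // Import $overlay into the document that owns $this->body first;
            // appendChild() rejects nodes owned by a different DOMDocument.
            $node = $this->body->ownerDocument->importNode($overlay, true);
            $this->body->appendChild($node);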
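The `importNode()` workaround above can be wrapped in a small helper that only imports when the owner documents differ. This is a sketch under assumptions, not a patch for php-readability: `appendToDocumentOf()` is a hypothetical name, and the `readOverlay` id is just sample data. Note that `importNode()` returns a copy owned by the target document, so later changes to the original `$overlay` will not show up in the appended node.

```php
<?php
// Hypothetical helper: append $child under $parent, importing it first if it
// belongs to a different DOMDocument (the importNode() fix from above).
function appendToDocumentOf(DOMNode $parent, DOMNode $child): DOMNode|false
{
    // A DOMDocument has no ownerDocument, so use the document itself.
    $targetDoc = $parent instanceof DOMDocument ? $parent : $parent->ownerDocument;

    if ($child->ownerDocument !== $targetDoc) {
        // Deep import (true) so descendants are copied along with the node.
        $child = $targetDoc->importNode($child, true);
    }

    return $parent->appendChild($child);
}

$target = new DOMDocument();
$target->loadHTML('<html><body></body></html>');
$body = $target->getElementsByTagName('body')->item(0);

// Build the new content in a separate document, as in the reproduction case.
$other = new DOMDocument();
$overlay = $other->createElement('div');
$overlay->setAttribute('id', 'readOverlay');
$overlay->appendChild($other->createElement('p', 'article content'));

$appended = appendToDocumentOf($body, $overlay);
echo $target->saveHTML($body), PHP_EOL;
```

The original `$overlay` stays detached in `$other`; only the imported copy ends up in the target `<body>`.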
