About this file
This file is a simple test that multiple content blocks on a page (as on a blog page) are extracted appropriately.
The second entry in a blog-like page should not be regarded as the main content of the page. Or, if the entries seem to be continuous, the scoring heuristics may regard them as a single content block.
You can adjust the parameters of the scoring heuristics.
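As a rough illustration of how such parameters could change the outcome, here is a minimal sketch. It is not the actual extractor: the `Block` type, the `select_main_content` function, and the `min_score` and `max_gap` parameters are all hypothetical names made up for this example, and the per-block scores are assumed to be computed elsewhere.

```python
from dataclasses import dataclass


@dataclass
class Block:
    text: str
    score: float  # hypothetical per-block content score, computed elsewhere


def select_main_content(blocks, min_score=1.0, max_gap=0):
    """Return the highest-scoring run of nearby blocks that pass min_score.

    Blocks scoring below min_score are skipped. A run is broken once more
    than max_gap consecutive blocks have been skipped.
    """
    runs, current, gap = [], [], 0
    for block in blocks:
        if block.score >= min_score:
            current.append(block)
            gap = 0
        elif current:
            gap += 1
            if gap > max_gap:
                runs.append(current)
                current, gap = [], 0
    if current:
        runs.append(current)
    if not runs:
        return []
    # Pick the run with the largest total score.
    return max(runs, key=lambda run: sum(b.score for b in run))


page = [
    Block("first entry", 5.0),
    Block("separator", 0.0),
    Block("second entry", 4.0),
    Block("comment", 0.5),
]

# Strict settings: only the first entry is the main content.
strict = [b.text for b in select_main_content(page)]

# Tolerating one low-scoring block between entries merges them.
merged = [b.text for b in select_main_content(page, max_gap=1)]
```

With the default settings only the first entry is selected. Raising `max_gap` to 1 treats the two entries as continuous, and lowering `min_score` below the comment's score would pull the comment in as well.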
Comments on the article should not be regarded as content.
Or should they be?