# PHP Brute Force Sitemap Generator

Generate sitemaps by crawling your website for static pages and using hooks for
dynamic content. The intermediate list of sitemap URIs is stored in relative
form. When a sitemap is requested, the final document is generated on the fly by
prefixing each relative URI with the configured base URI, so the served sitemap
contains only absolute URIs.

Features:
* crawl your website for static content and generate a list of URIs;
* seed the crawler with an existing URI list, or add URIs manually and recrawl,
    so that content is not missed in the future;
* store URIs in relative form;
* when serving sitemaps, convert relative URIs to absolute form using the
    configured base URI.



## Target users

Sitemaps should be as accurate as possible, which means that your application
needs infrastructure for enumerating all the content it serves. Many
applications do not support this, or support it only partially.

Users who are stuck with such applications and still have to provide sitemaps
are usually left with the option of pre-generating them using public web
crawlers. This results in inaccurate and stale sitemaps.

This is where Brute Force Sitemap Generator (BFSG) steps in.



## Modes of operation

Definition of terms:
* **base URI**: URI under which the sitemap will reside, e.g. https://example.com/ (without the trailing "sitemap.*")
* **transData**: short for "transitional data"; sitemap data that do not
    contain absolute URIs. Absolute URIs are generated at the very last stage,
    when an HTTP request for a sitemap triggers generation of the final document by
    prefixing all relative URIs with the base URI, which is obtained dynamically.
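
The last-stage conversion described above can be sketched as follows. This is a minimal illustration, not BFSG's actual code: the function names `toAbsoluteUris`, `renderTxtSitemap` and `renderXmlSitemap` and the sample URIs are assumptions made for this example, and PHP 7.4 or newer is assumed.

```php
<?php
declare(strict_types=1);

/**
 * Prefix each relative URI with the base URI.
 * Hypothetical helper for illustration; not part of BFSG's API.
 *
 * @param string[] $relativeUris
 * @return string[]
 */
function toAbsoluteUris(array $relativeUris, string $baseUri): array
{
    // Normalise slashes so that exactly one "/" joins base and path.
    $base = rtrim($baseUri, '/');
    return array_map(
        static fn (string $uri): string => $base . '/' . ltrim($uri, '/'),
        $relativeUris
    );
}

/** Render a plain-text sitemap: one absolute URI per line. */
function renderTxtSitemap(array $absoluteUris): string
{
    return implode("\n", $absoluteUris) . "\n";
}

/** Render a minimal XML sitemap using the sitemaps.org 0.9 namespace. */
function renderXmlSitemap(array $absoluteUris): string
{
    $xml = '<?xml version="1.0" encoding="UTF-8"?>' . "\n";
    $xml .= '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">' . "\n";
    foreach ($absoluteUris as $uri) {
        // URIs must be entity-escaped inside <loc> (e.g. "&" becomes "&amp;").
        $loc = htmlspecialchars($uri, ENT_XML1 | ENT_QUOTES, 'UTF-8');
        $xml .= '  <url><loc>' . $loc . '</loc></url>' . "\n";
    }
    return $xml . '</urlset>' . "\n";
}

// Sample transData (made up for this example).
$transData = ['/', '/about.html', 'blog/post-1?lang=en&page=2'];
$absolute = toAbsoluteUris($transData, 'https://example.com/');

echo renderTxtSitemap($absolute);
echo renderXmlSitemap($absolute);
```

Because only relative URIs are stored, the same transData can be served under a different base URI without recrawling. A `.gz` variant could be produced by passing the rendered string through `gzencode()` where the zlib extension is available.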

BFSG implements the following operations:
* create transData by crawling an existing website:
    * crawling may be seeded by the base URI only;
    * crawling may be seeded by existing transData (a list of URIs that were previously encountered);
* augment crawler-generated transData using a callback (for dynamically generated pages);
* use the transData cache to generate and output the final sitemap.(xml|txt)(.gz)?
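
To make the operations above more concrete, here is a rough sketch of a seeded crawl followed by callback-based augmentation. It is an illustration under stated assumptions, not BFSG's implementation: `crawl`, `augment`, the `$fetchLinks` callable and the in-memory `$site` array are all hypothetical, and the "website" is simulated so that the example runs without network access (PHP 7.4 or newer is assumed).

```php
<?php
declare(strict_types=1);

/**
 * Breadth-first crawl over relative URIs (illustrative only).
 *
 * @param string[] $seeds      starting URIs: just "/" or previously stored transData
 * @param callable $fetchLinks hypothetical: relative URI -> string[] of linked relative URIs
 * @return string[] sorted, de-duplicated list of discovered relative URIs
 */
function crawl(array $seeds, callable $fetchLinks): array
{
    $seen = [];
    $queue = $seeds;
    while ($queue !== []) {
        $uri = array_shift($queue);
        if (isset($seen[$uri])) {
            continue; // already visited
        }
        $seen[$uri] = true;
        foreach ($fetchLinks($uri) as $link) {
            if (!isset($seen[$link])) {
                $queue[] = $link;
            }
        }
    }
    // Array keys may be cast to int by PHP, so convert back to strings.
    $uris = array_map('strval', array_keys($seen));
    sort($uris, SORT_STRING);
    return $uris;
}

/**
 * Merge crawler output with URIs supplied by an application callback.
 *
 * @param string[] $transData
 * @param callable $dynamicUris hypothetical: () -> string[] of relative URIs
 * @return string[]
 */
function augment(array $transData, callable $dynamicUris): array
{
    $merged = array_values(array_unique(array_merge($transData, $dynamicUris())));
    sort($merged, SORT_STRING);
    return $merged;
}

// Simulated site: page => links found on that page (made up for this example).
$site = [
    '/'            => ['/about.html', '/news/'],
    '/about.html'  => ['/'],
    '/news/'       => ['/news/1.html'],
    '/orphan.html' => ['/'], // not linked from anywhere
];
$fetchLinks = static fn (string $uri): array => $site[$uri] ?? [];

// Seeded by the base only: the orphan page is never discovered.
$fresh = crawl(['/'], $fetchLinks);

// Seeded by existing transData that already knows about the orphan page.
$reseeded = crawl(array_merge($fresh, ['/orphan.html']), $fetchLinks);

// Augmented with dynamically generated pages reported by the application.
$final = augment($reseeded, static fn (): array => ['/product/42', '/news/1.html']);

echo implode("\n", $final), "\n";
```

Seeding with earlier transData is what keeps pages such as the unlinked `/orphan.html` in the sitemap once they have been added manually, while the callback covers pages that no crawl would reach.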

BFSG can be glued to your application in the following ways:
1. add BFSG to your project as a git submodule:
    * you need to create a sitemap-glue.php file that returns the configuration details BFSG needs from your project;
    * sitemap-glue.php must reside at the same path level as the main BFSG directory (just outside the BFSG source tree);
    * the reasoning is that you will want to commit your glue code to your project repository rather than to BFSG's git repo;
1. install BFSG with composer - TODO
1. Symfony: add BFSG as a bundle - TODO



## License

BFSG is released under the MIT license. See the LICENSE file at the root of the
repository for additional info.



## Credits

Brute Force Sitemap Generator was created and is maintained by Bostjan Skufca and Teon d.o.o.