# skip file: ftp: and mailto: urls
-^(file|ftp|mailto):

# skip image and other suffixes we can't parse or are not likely to be relevant
# if you want to crawl images or videos or archives then you should comment out this line
-(?i)\.(gif|jpg|png|ico|css|sit|eps|wmf|zip|gz|rpm|tgz|mov|exe|jpeg|bmp|js|mpg|mp3|mp4)(\?|&|$)

# skip URLs with slash-delimited segment that repeats 3+ times, to break loops
# very time-consuming : use only if necessary
# -.*(/[^/]+)/[^/]+\1/[^/]+\1/

# accept anything else
+.