YaCy Release 1.72

In-between release for LinuxTag. This will also be the last release
supporting Java 1.6.

Major Changes   

Commit / Description
Mon May 05 23:16:01 CEST 2014
by Marc Nause
Improved Blacklist API:

*) added JSON support
*) fixed Exception in case of missing parameters
*) renamed parameter for items in "add entry" and "delete entry" from
"entry" to "item" to match term in XML
Changed Files: htroot/api/blacklists/add_entry_p.java, htroot/api/blacklists/add_entry_p.json, htroot/api/blacklists/delete_entry_p.java, htroot/api/blacklists/delete_entry_p.json, htroot/api/blacklists/get_list_p.java, htroot/api/blacklists/get_list_p.json, htroot/api/blacklists/get_metadata_p.java, htroot/api/blacklists/get_metadata_p.json
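
A request against the JSON variant of the "add entry" servlet might be assembled as in the following sketch. The servlet path and the parameter names "list" and "item" are taken from the commit messages in this release; the base URL, the blacklist name and the class name BlacklistRequest are assumptions made for illustration only.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of YaCy.
final class BlacklistRequest {

    // Builds the URL for the "add entry" servlet with JSON output.
    // "list" names the blacklist, "item" is the entry to add.
    static String addEntryUrl(String baseUrl, String list, String item) {
        return baseUrl + "/api/blacklists/add_entry_p.json"
                + "?list=" + encode(list)
                + "&item=" + encode(item);
    }

    private static String encode(String value) {
        // URLEncoder.encode(String, Charset) requires Java 10 or newer.
        return URLEncoder.encode(value, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Base URL and list name are assumed values.
        System.out.println(addEntryUrl("http://localhost:8090", "url.default.black", "example.org/ads/*"));
    }
}
```

The same pattern would apply to delete_entry_p.json, since both servlets use the renamed "item" parameter.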
Wed Apr 30 00:48:38 CEST 2014
by Marc Nause
First draft of a blacklist API.
Changed Files: htroot/Blacklist_p.java, htroot/api/blacklists/add_entry_p.java, htroot/api/blacklists/add_entry_p.xml, htroot/api/blacklists/delete_entry_p.java, htroot/api/blacklists/delete_entry_p.xml, htroot/api/blacklists/get_list_p.java, htroot/api/blacklists/get_list_p.xml, htroot/api/blacklists/get_metadata_p.java, htroot/api/blacklists/get_metadata_p.xml, htroot/api/blacklists_p.java, source/net/yacy/repository/BlacklistHelper.java
Sun Apr 20 01:41:30 CEST 2014
by reger
refactor URIMetadataNode to further unify interaction with the index
- URIMetadataNode now extends SolrDocument
- use language as stored (String), reducing conversions to String
- optimize debug code in transferIndex
Changed Files: htroot/api/yacydoc.java, htroot/yacy/crawlReceipt.java, htroot/yacy/transferURL.java, source/net/yacy/data/ymark/YMarkMetadata.java, source/net/yacy/kelondro/data/meta/URIMetadataNode.java, source/net/yacy/kelondro/data/word/WordReferenceVars.java, source/net/yacy/peers/Protocol.java, source/net/yacy/repository/Blacklist.java, source/net/yacy/search/query/SearchEvent.java, source/net/yacy/search/ranking/ReferenceOrder.java, source/net/yacy/search/schema/CollectionConfiguration.java, source/net/yacy/search/snippet/ResultEntry.java
Thu Apr 17 13:21:43 CEST 2014
by Michael Peter Christen
added crawl depth for failed documents
Changed Files: htroot/Crawler_p.java, htroot/HostBrowser.java, htroot/yacy/crawlReceipt.java, source/net/yacy/crawler/CrawlStacker.java, source/net/yacy/crawler/data/CrawlQueues.java, source/net/yacy/crawler/retrieval/FTPLoader.java, source/net/yacy/crawler/retrieval/HTTPLoader.java, source/net/yacy/repository/LoaderDispatcher.java, source/net/yacy/search/Switchboard.java, source/net/yacy/search/index/DocumentIndex.java, source/net/yacy/search/index/ErrorCache.java, source/net/yacy/search/schema/CollectionConfiguration.java, source/net/yacy/search/snippet/MediaSnippet.java
Wed Apr 16 22:16:20 CEST 2014
by Michael Peter Christen
removed clickdepth_i field and related postprocessing. This information
is now available in the crawldepth_i field which is identical to
clickdepth_i because of a specific crawler strategy.
Changed Files: defaults/solr.collection.schema, defaults/solr.webgraph.schema, defaults/yacy.init, htroot/HostBrowser.java, htroot/RankingSolr_p.java, source/net/yacy/cora/document/id/DigestURL.java, source/net/yacy/cora/federate/solr/ProcessType.java, source/net/yacy/cora/federate/solr/SchemaConfiguration.java, source/net/yacy/search/Switchboard.java, source/net/yacy/search/index/Segment.java, source/net/yacy/search/schema/CollectionConfiguration.java, source/net/yacy/search/schema/CollectionSchema.java, source/net/yacy/search/schema/WebgraphConfiguration.java, source/net/yacy/search/schema/WebgraphSchema.java
Wed Apr 16 21:34:28 CEST 2014
by Michael Peter Christen
- added a new Crawler Balancer: HostBalancer and HostQueues:
This organizes all URLs to be loaded in separate queues for each host.
Within each host, every crawl depth gets its own queue. The primary
rule for URLs taken from any queue is that the crawl depth is minimal.
This produces a crawl depth which is identical to the clickdepth.
Furthermore, the crawl is able to create a much better balancing over
all hosts which is fair to all hosts that are in the queue.
This process will create a very large number of files for wide crawls in
the QUEUES folder: for each host a directory, for each crawl depth a
file inside the directory. A crawl with maxdepth = 4 will be able to
create tens of thousands of files. To be able to use that many file
readers, it was necessary to implement a new index data structure which
opens the file only if an access is wanted (OnDemandOpenFileIndex). The
usage of such an on-demand file reader shall prevent the number of open
file pointers from exceeding the system limit, which is usually about
10,000 open files. Some parts of YaCy had to be adapted to handle the
crawl depth number correctly. The logging and the IndexCreateQueues
servlet had to be adapted to show the crawl queues differently, because
the host name is attached to the port on the host to differentiate
between http, https, and ftp services.
Changed Files: defaults/yacy.logging, htroot/ConfigPortal.java, htroot/Crawler_p.java, htroot/IndexCreateQueues_p.html, htroot/IndexCreateQueues_p.java, source/net/yacy/cora/document/id/DigestURL.java, source/net/yacy/crawler/Balancer.java, source/net/yacy/crawler/HostBalancer.java, source/net/yacy/crawler/HostQueue.java, source/net/yacy/crawler/LegacyBalancer.java, source/net/yacy/crawler/data/CrawlQueues.java, source/net/yacy/crawler/data/Latency.java, source/net/yacy/crawler/data/NoticedURL.java, source/net/yacy/crawler/retrieval/Response.java, source/net/yacy/document/Document.java, source/net/yacy/document/TextParser.java, source/net/yacy/document/importer/MediawikiImporter.java, source/net/yacy/document/parser/bzipParser.java, source/net/yacy/document/parser/gzipParser.java, source/net/yacy/document/parser/sevenzipParser.java, source/net/yacy/document/parser/tarParser.java, source/net/yacy/document/parser/zipParser.java, source/net/yacy/kelondro/index/OnDemandOpenFileIndex.java, source/net/yacy/kelondro/table/ChunkIterator.java, source/net/yacy/kelondro/table/Table.java, source/net/yacy/repository/LoaderDispatcher.java, source/net/yacy/search/Switchboard.java, source/net/yacy/search/index/DocumentIndex.java, source/net/yacy/search/index/Segment.java, source/net/yacy/search/query/QueryModifier.java, source/net/yacy/search/schema/CollectionConfiguration.java
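
The queue layout described above (one group of queues per host, one queue per crawl depth, smallest depth served first) can be sketched in memory as follows. This is a simplified illustration and not the HostBalancer or HostQueue code: the real implementation stores each depth in a file, and the class name HostDepthQueues and the "host:port" key format are assumptions.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory model of per-host, per-depth crawl queues.
final class HostDepthQueues {

    // host key (assumed "host:port") -> crawl depth -> URLs waiting at that depth
    private final Map<String, TreeMap<Integer, Deque<String>>> queues = new HashMap<>();

    void push(String hostKey, int depth, String url) {
        queues.computeIfAbsent(hostKey, k -> new TreeMap<>())
              .computeIfAbsent(depth, d -> new ArrayDeque<>())
              .addLast(url);
    }

    // Returns the next URL of the host, always from the smallest depth
    // that still has entries, or null if nothing is queued for the host.
    String pop(String hostKey) {
        TreeMap<Integer, Deque<String>> byDepth = queues.get(hostKey);
        if (byDepth == null) return null;
        Map.Entry<Integer, Deque<String>> lowest = byDepth.firstEntry();
        String url = lowest.getValue().pollFirst();
        // Drop empty queues so that firstEntry() always points at real work.
        if (lowest.getValue().isEmpty()) byDepth.remove(lowest.getKey());
        if (byDepth.isEmpty()) queues.remove(hostKey);
        return url;
    }
}
```

Because a URL at depth n+1 is only handed out once no URL at depth n remains for that host, the depth at which a page is first loaded matches its click distance from the start URL, which is the property the commit message relies on.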
Thu Apr 10 18:58:03 CEST 2014
by Michael Peter Christen
major redesign of the html parser: object recursion is now done using a
stack of html tag objects instead of a recursive parse-again method,
which may cause bad performance and huge memory allocation. The new
method also produces better parsed image objects with exact anchor text
references.
Changed Files: source/net/yacy/document/parser/html/AbstractScraper.java, source/net/yacy/document/parser/html/AbstractTransformer.java, source/net/yacy/document/parser/html/ContentScraper.java, source/net/yacy/document/parser/html/ContentTransformer.java, source/net/yacy/document/parser/html/Scraper.java, source/net/yacy/document/parser/html/Transformer.java, source/net/yacy/document/parser/html/TransformerWriter.java, source/net/yacy/document/parser/htmlParser.java, source/net/yacy/document/parser/images/genericImageParser.java, source/net/yacy/search/schema/HyperlinkGraph.java, source/net/yacy/search/schema/WebgraphConfiguration.java
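
The idea of handling nested tags with an explicit stack instead of re-parsing inner content can be sketched as follows. This is a toy model and not the ContentScraper code: it assumes the input is already split into tokens, it does not check that closing tag names match, and the class name TagStackSketch is hypothetical.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical toy scraper that tracks open tags on a stack.
final class TagStackSketch {

    private static final class OpenTag {
        final String name;
        final StringBuilder text = new StringBuilder();
        OpenTag(String name) { this.name = name; }
    }

    // Consumes tokens such as "<a>", "some text", "</a>" and returns
    // "name=text" for every element, in the order the closing tags appear.
    static List<String> scrape(List<String> tokens) {
        Deque<OpenTag> stack = new ArrayDeque<>();
        List<String> closed = new ArrayList<>();
        for (String token : tokens) {
            if (token.startsWith("</")) {
                if (stack.isEmpty()) continue; // stray closing tag: ignore
                OpenTag done = stack.pop();
                closed.add(done.name + "=" + done.text);
                // The enclosing element also receives the text of its child,
                // so nothing has to be parsed a second time.
                if (!stack.isEmpty()) stack.peek().text.append(done.text);
            } else if (token.startsWith("<")) {
                stack.push(new OpenTag(token.substring(1, token.length() - 1)));
            } else if (!stack.isEmpty()) {
                stack.peek().text.append(token);
            }
        }
        return closed;
    }
}
```

Each token is visited exactly once, and memory use grows with the nesting depth rather than with the number of times inner content would otherwise be re-parsed.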
Wed Apr 09 12:45:04 CEST 2014
by Michael Peter Christen
new structure and enhancements for link graph computation:
- added order option to solr queries to be able to retrieve document
lists in specific order, here: link length
- added HyperlinkEdge class which manages the link structure
- integrated the HyperlinkEdge class into clickdepth computation
- extended the linkstructure.json servlet to also show the clickdepth
and other statistical information
Changed Files: htroot/HostBrowser.java, htroot/IndexDeletion_p.java, htroot/api/citation.java, htroot/api/linkstructure.java, htroot/api/linkstructure.json, htroot/js/hypertree.js, source/net/yacy/cora/federate/opensearch/OpenSearchConnector.java, source/net/yacy/cora/federate/solr/SchemaConfiguration.java, source/net/yacy/cora/federate/solr/connector/AbstractSolrConnector.java, source/net/yacy/cora/federate/solr/connector/CachedSolrConnector.java, source/net/yacy/cora/federate/solr/connector/ConcurrentUpdateSolrConnector.java, source/net/yacy/cora/federate/solr/connector/EmbeddedSolrConnector.java, source/net/yacy/cora/federate/solr/connector/MirrorSolrConnector.java, source/net/yacy/cora/federate/solr/connector/SolrConnector.java, source/net/yacy/search/index/ErrorCache.java, source/net/yacy/search/index/Fulltext.java, source/net/yacy/search/index/ReindexSolrBusyThread.java, source/net/yacy/search/index/Segment.java, source/net/yacy/search/schema/CollectionConfiguration.java, source/net/yacy/search/schema/HyperlinkEdge.java, source/net/yacy/search/schema/HyperlinkGraph.java
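
Click depth is commonly understood as the smallest number of links that must be followed from a root page to reach a document. Under that assumption, it can be computed with a breadth-first search over the link edges, as in the sketch below. This is not the HyperlinkGraph or HyperlinkEdge code; the class name ClickDepthSketch and the plain map used as edge storage are assumptions.

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical breadth-first click depth computation.
final class ClickDepthSketch {

    // links maps a source URL to the URLs it links to.
    // The result maps every reachable URL to its click depth (root = 0).
    static Map<String, Integer> clickDepths(String root, Map<String, List<String>> links) {
        Map<String, Integer> depth = new HashMap<>();
        Deque<String> queue = new ArrayDeque<>();
        depth.put(root, 0);
        queue.add(root);
        while (!queue.isEmpty()) {
            String page = queue.remove();
            int next = depth.get(page) + 1;
            for (String target : links.getOrDefault(page, Collections.emptyList())) {
                // The first visit is the shortest path in a breadth-first search.
                if (!depth.containsKey(target)) {
                    depth.put(target, next);
                    queue.add(target);
                }
            }
        }
        return depth;
    }
}
```

Pages that are not reachable from the root do not appear in the result at all.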
Sun Apr 06 10:45:03 CEST 2014
by Michael Peter Christen
replaced solr 4.6.1 with solr 4.7.1 and added index migration to
lucene_47
Changed Files: .classpath, addon/YaCy.app/Contents/Info.plist, build.xml, defaults/solr/solrconfig.xml, lib/lucene-analyzers-common-4.7.1.jar, lib/lucene-analyzers-phonetic-4.7.1.jar, lib/lucene-classification-4.7.1.jar, lib/lucene-codecs-4.7.1.jar, lib/lucene-core-4.7.1.jar, lib/lucene-facet-4.7.1.jar, lib/lucene-grouping-4.7.1.jar, lib/lucene-highlighter-4.7.1.jar, lib/lucene-join-4.7.1.jar, lib/lucene-memory-4.7.1.jar, lib/lucene-misc-4.7.1.jar, lib/lucene-queries-4.7.1.jar, lib/lucene-queryparser-4.7.1.jar, lib/lucene-spatial-4.7.1.jar, lib/lucene-suggest-4.7.1.jar, lib/solr-core-4.7.1.jar, lib/solr-solr-4.7.1.License, lib/solr-solrj-4.7.1.License, lib/solr-solrj-4.7.1.jar, lib/spatial4j-0.4.1.jar, source/net/yacy/search/index/Fulltext.java


Bugfixes   

Commit / Description
Wed Apr 30 06:21:53 CEST 2014
by Michael Peter Christen
enhanced HostBrowser buttons and fixed text input alignment
Changed Files: htroot/HostBrowser.html, htroot/env/base.css
Wed Apr 30 05:14:01 CEST 2014
by Michael Peter Christen
fix for strange fail reason
Changed Files: htroot/IndexCreateParserErrors_p.java
Tue Apr 29 19:50:33 CEST 2014
by Michael Peter Christen
fix for slow crawling and better logging in balancer
Changed Files: source/net/yacy/crawler/HostBalancer.java, source/net/yacy/crawler/HostQueue.java
Tue Apr 29 19:24:05 CEST 2014
by Michael Peter Christen
npe fix
Changed Files: source/net/yacy/crawler/CrawlSwitchboard.java
Tue Apr 29 19:13:54 CEST 2014
by Michael Peter Christen
fix to menu colours
Changed Files: skins/pdbootstrap.css
Tue Apr 29 16:24:21 CEST 2014
by Michael Peter Christen
fix for result display
Changed Files: htroot/yacysearchtrailer.html
Tue Apr 29 16:24:01 CEST 2014
by Michael Peter Christen
design fixes to better use the new colours
Changed Files: htroot/Network.html, htroot/js/yacyinteractive.js
Sun Apr 27 20:52:06 CEST 2014
by reger
optimize and fix lat / lon assignment
Changed Files: source/net/yacy/kelondro/data/meta/URIMetadataNode.java
Fri Apr 25 09:26:20 CEST 2014
by orbiter
npe fix
Changed Files: source/net/yacy/kelondro/blob/HeapReader.java
Fri Apr 25 09:23:10 CEST 2014
by orbiter
npe fix
Changed Files: source/net/yacy/kelondro/blob/Tables.java
Wed Apr 23 23:13:07 CEST 2014
by orbiter
fixed a situation where finished crawls had not been detected.
Changed Files: source/net/yacy/search/Switchboard.java
Thu Apr 17 16:58:17 CEST 2014
by Michael Peter Christen
fix for deadlocks in crawler
Changed Files: source/net/yacy/crawler/HostBalancer.java, source/net/yacy/crawler/HostQueue.java, source/net/yacy/crawler/data/CrawlQueues.java, source/net/yacy/crawler/data/NoticedURL.java
Wed Apr 16 22:24:04 CEST 2014
by Michael Peter Christen
fix for display bug
Changed Files: htroot/HostBrowser.java
Fri Apr 11 15:12:34 CEST 2014
by Michael Peter Christen
fix for virtual root nodes
Changed Files: source/net/yacy/search/schema/HyperlinkGraph.java
Fri Apr 11 09:56:44 CEST 2014
by Michael Peter Christen
fix for maximum tag length in parser
Changed Files: source/net/yacy/document/parser/html/ContentScraper.java
Thu Apr 10 09:08:59 CEST 2014
by Michael Peter Christen
fix for wrong status codes of error pages
Changed Files: htroot/Crawler_p.java, source/net/yacy/crawler/data/CrawlQueues.java, source/net/yacy/crawler/retrieval/HTTPLoader.java, source/net/yacy/repository/LoaderDispatcher.java, source/net/yacy/search/index/ErrorCache.java


Other Changes   

Commit / Description
Tue May 06 18:54:56 CEST 2014
by Michael Peter Christen
Release 1.72
Changed Files: build.properties
Tue May 06 16:48:50 CEST 2014
by Michael Peter Christen
enhanced snippets: remove lines which are identical to the title and
choose longer versions if possible. Prefer the description part.
Changed Files: source/net/yacy/cora/federate/solr/responsewriter/GSAResponseWriter.java, source/net/yacy/cora/federate/solr/responsewriter/OpensearchResponseWriter.java, source/net/yacy/cora/federate/solr/responsewriter/YJsonResponseWriter.java, source/net/yacy/http/servlets/GSAsearchServlet.java, source/net/yacy/http/servlets/SolrSelectServlet.java, source/net/yacy/peers/Protocol.java, source/net/yacy/search/query/SearchEvent.java
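
Two of the rules named above (drop lines identical to the title, prefer longer text) can be sketched as a simple selection function. The preference for the description part is not modelled here, and the sketch is not the code used in the response writers; the class name SnippetSketch and the case-insensitive comparison are assumptions.

```java
import java.util.List;

// Hypothetical snippet selection: skip title duplicates, keep the longest line.
final class SnippetSketch {

    static String choose(String title, List<String> candidates) {
        String normalizedTitle = title.trim();
        String best = "";
        for (String candidate : candidates) {
            String line = candidate.trim();
            // A snippet that only repeats the title adds no information.
            if (line.equalsIgnoreCase(normalizedTitle)) continue;
            if (line.length() > best.length()) best = line;
        }
        return best; // empty string if every candidate equals the title
    }
}
```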
Tue May 06 05:58:51 CEST 2014
by orbiter
fix for navigation steering / p2p mode
see also:
http://forum.yacy-websuche.de/viewtopic.php?f=5&t=5198&p=29958#p29958
Changed Files: htroot/env/templates/submenuAccessTracker.template, htroot/env/templates/submenuCrawlMonitor.template, htroot/env/templates/submenuIndexControl.template
Mon May 05 13:24:41 CEST 2014
by sixcooler
do not check for segments-count on optimize:
this is also done in Solr and our getSegmentsCount() does not return
up-to-date values
Changed Files: source/net/yacy/cora/federate/solr/connector/SolrServerConnector.java
Sun May 04 09:29:07 CEST 2014
by reger
content of surrogates/out never accessed (remove)
After import the content is never accessed but may take up a lot of disk space;
getLoadedOAIServer (which lists the files in surrogate out) is not used either,
making the surrogate.out obsolete. The XML files are no longer kept after import.
Changed Files: source/net/yacy/document/importer/OAIPMHImporter.java, source/net/yacy/search/Switchboard.java, source/net/yacy/search/SwitchboardConstants.java
Sat May 03 21:57:06 CEST 2014
by reger
Merge origin/master
Changed Files: source/net/yacy/kelondro/table/Table.java
Sat May 03 21:55:10 CEST 2014
by reger
fix input-group layout on index.html
see bug http://mantis.tokeek.de/view.php?id=391
Changed Files: htroot/index.html
Fri May 02 22:55:47 CEST 2014
by sixcooler
remove tables from tabletracker on close to avoid lots of dead entries in
/PerformanceMemory_p.html
Changed Files: source/net/yacy/kelondro/table/Table.java
Fri May 02 19:32:09 CEST 2014
by reger
fix NPE on continuing crawls after YaCy restart
(Agent is then null)
Changed Files: source/net/yacy/crawler/HostBalancer.java
Fri May 02 14:18:52 CEST 2014
by Marc Nause
Key for parameter "blacklist name" is "list" in all servlets now.
Changed Files: htroot/api/blacklists/add_entry_p.java, htroot/api/blacklists/delete_entry_p.java, htroot/api/blacklists/get_list_p.java
Fri May 02 01:15:03 CEST 2014
by reger
adjust search page layout - search box to current style
Changed Files: htroot/ConfigSearchPage_p.html
Fri May 02 00:35:54 CEST 2014
by reger
remove obsolete css class bookmarkfieldset
Changed Files: htroot/Bookmarks.html
Wed Apr 30 13:26:32 CEST 2014
by Michael Peter Christen
added configuration option for maximum load and minimum RAM for
postprocessing
Changed Files: defaults/yacy.init, source/net/yacy/search/Switchboard.java
Wed Apr 30 06:46:06 CEST 2014
by Michael Peter Christen
input-group for main search input window
Changed Files: htroot/index.html
Wed Apr 30 05:05:02 CEST 2014
by Michael Peter Christen
use submitted default userAgent if cloning a crawl
Changed Files: htroot/CrawlStartExpert.html, htroot/CrawlStartExpert.java
Tue Apr 29 22:51:01 CEST 2014
by reger
add display filter (active/disabled) to IndexSchema_p.html config
for easier overview of schema fields
Changed Files: htroot/IndexSchema_p.html, htroot/IndexSchema_p.java
Tue Apr 29 18:46:50 CEST 2014
by Michael Peter Christen
small changes to search headline colour
Changed Files: defaults/yacy.init, skins/pdbootstrap.css
Tue Apr 29 16:23:42 CEST 2014
by Michael Peter Christen
new default skin pdbootstrap which keeps the design shapes but slightly
changes the colours to match with bootstrap colours
Changed Files: defaults/yacy.init, skins/pdbootstrap.css
Tue Apr 29 16:22:31 CEST 2014
by Michael Peter Christen
better buttons
Changed Files: htroot/CrawlResults.html, htroot/Crawler_p.html, htroot/Table_API_p.html
Tue Apr 29 00:41:29 CEST 2014
by reger
add html5 audio/video <source> tag to html content scraper
- <source src=.. type=..> tag content is added to embed collection
Changed Files: source/net/yacy/document/parser/html/ContentScraper.java
Mon Apr 28 11:52:13 CEST 2014
by Michael Peter Christen
bootstrap update
Changed Files: htroot/env/bootstrap/css/bootstrap-rtl.css, htroot/env/bootstrap/css/bootstrap-rtl.min.css, htroot/env/bootstrap/css/bootstrap.css, htroot/env/bootstrap/css/bootstrap.css.map, htroot/env/bootstrap/css/bootstrap.min.css, htroot/env/bootstrap/js/bootstrap.js, htroot/env/bootstrap/js/bootstrap.min.js
Mon Apr 28 04:59:47 CEST 2014
by reger
fix contentscraper img height/width parsing
prevent NumberFormatException on the common "100px" property value

- include in test case
Changed Files: source/net/yacy/document/parser/html/ContentScraper.java, test/net/yacy/cora/document/id/DigestURLTest.java, test/net/yacy/document/parser/htmlParserTest.java
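
A tolerant way to read such attribute values is to parse only the leading digits, as in the sketch below. It is a minimal illustration under assumptions, not the ContentScraper fix itself: the class name DimensionSketch and the use of -1 for "no usable number" are hypothetical, and a value such as "100%" would also yield 100 although it is not a pixel size.

```java
// Hypothetical lenient parser for img width/height attribute values.
final class DimensionSketch {

    // Returns the leading integer of values such as "100" or "100px",
    // or -1 if the value does not start with a digit.
    static int parseDimension(String value) {
        if (value == null) return -1;
        String s = value.trim();
        int end = 0;
        while (end < s.length() && Character.isDigit(s.charAt(end))) end++;
        if (end == 0) return -1;
        try {
            return Integer.parseInt(s.substring(0, end));
        } catch (NumberFormatException e) {
            return -1; // digit run too long for an int
        }
    }
}
```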
Sun Apr 27 23:54:34 CEST 2014
by malykhin.dmitry
Update russian translation
Changed Files: locales/ru.lng
Sun Apr 27 22:22:00 CEST 2014
by reger
remove redundant javascript & id in index.html
to set focus to query field in IE11
Changed Files: htroot/index.html
Sun Apr 27 18:20:33 CEST 2014
by reger
reimplement tighter lat/lon calc in URIMetadataNode
from old MetadataRow, considering http://mantis.tokeek.de/view.php?id=272
Changed Files: source/net/yacy/kelondro/data/meta/URIMetadataNode.java
Sat Apr 26 22:27:59 CEST 2014
by reger
add exit proxy link to UrlProxy
on proxied pages a link to exit proxy is added to top of page.
Link text can be configured in a web.xml init-parameter (see defaults/web.xml). If it is missing, no link is displayed.
Changed Files: defaults/web.xml, source/net/yacy/http/servlets/UrlProxyServlet.java
Sat Apr 26 01:30:51 CEST 2014
by reger
throw MalformedURLException on unknown protocol,
i.e. on anything other than the supported http, https, ftp, file, smb, \\ and mailto
Changed Files: source/net/yacy/cora/document/id/MultiProtocolURL.java
Fri Apr 25 20:15:55 CEST 2014
by reger
fix: resolve url without path but searchpart 
e.g. http://yacy.net?q=test was resolved as host "yacy.net?q=test" now host="yacy.net" path="/"
fixes http://mantis.tokeek.de/view.php?id=47

added test case for getHost
Changed Files: source/net/yacy/cora/document/id/MultiProtocolURL.java, test/net/yacy/cora/document/id/MultiProtocolURLTest.java
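
The corrected behaviour can be illustrated with a small splitter that ends the host at the first '/', '?' or '#', whichever comes first. This is a sketch under assumptions and not the MultiProtocolURL code: the class name HostSplitSketch is hypothetical, and ports or user information are left inside the host part.

```java
// Hypothetical host/path splitter for URLs of the form scheme://host...
final class HostSplitSketch {

    // Returns a two-element array: {host, path}.
    static String[] hostAndPath(String url) {
        int start = url.indexOf("://");
        start = (start < 0) ? 0 : start + 3;

        // The host ends at the first '/', '?' or '#' after the scheme.
        int end = firstOf(url, start, '/', '?', '#');
        String host = url.substring(start, end);

        // The path is whatever follows, up to the query or fragment.
        String rest = url.substring(end);
        String path = rest.substring(0, firstOf(rest, 0, '?', '#'));
        if (path.isEmpty()) path = "/";
        return new String[] {host, path};
    }

    // Index of the earliest of the given characters at or after 'from',
    // or the string length if none of them occurs.
    private static int firstOf(String s, int from, char... stops) {
        int best = s.length();
        for (char stop : stops) {
            int p = s.indexOf(stop, from);
            if (p >= 0 && p < best) best = p;
        }
        return best;
    }
}
```

With this rule, http://yacy.net?q=test yields host "yacy.net" and path "/", matching the result stated in the commit message.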
Fri Apr 25 01:05:28 CEST 2014
by reger
recover sax fatal error on OAI-PMH import of xml with entity error
this allows the import to continue with the next resumptionToken even if an import file caused a sax parser error
fix http://mantis.tokeek.de/view.php?id=63
Changed Files: htroot/IndexImportOAIPMHList_p.java, source/net/yacy/document/importer/OAIListFriendsLoader.java, source/net/yacy/document/importer/OAIPMHImporter.java, source/net/yacy/document/importer/OAIPMHLoader.java, source/net/yacy/document/importer/ResumptionToken.java
Wed Apr 23 23:41:10 CEST 2014
by reger
add current css to HTMLResponseWriter to fix metadata view
(using css from metas.template except js links)
Changed Files: htroot/env/templates/metas.template, source/net/yacy/cora/federate/solr/responsewriter/HTMLResponseWriter.java
Wed Apr 23 23:12:08 CEST 2014
by orbiter
better removal of stored urls when doing a crawl start
Changed Files: htroot/Crawler_p.java
Wed Apr 23 23:11:37 CEST 2014
by orbiter
enhanced Host Balancer strategy: fair round robin
Changed Files: source/net/yacy/crawler/HostBalancer.java
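
The commit message does not describe the strategy further. One plausible reading of "fair round robin" is that each host with pending URLs is served once per round, so that a host with many URLs cannot starve the others. The sketch below shows that reading only; the actual HostBalancer may differ, and the class name RoundRobinSketch is hypothetical.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.Map;

// Hypothetical round-robin scheduling over per-host URL queues.
final class RoundRobinSketch {

    // Drains the given per-host queues (the queues are emptied in place)
    // and returns the URLs in the order they would be loaded.
    static List<String> drain(Map<String, Deque<String>> perHost) {
        List<String> order = new ArrayList<>();
        Deque<String> hosts = new ArrayDeque<>(perHost.keySet());
        while (!hosts.isEmpty()) {
            String host = hosts.pollFirst();
            Deque<String> urls = perHost.get(host);
            String url = urls.pollFirst();
            if (url != null) order.add(url);
            // A host with remaining work goes to the back of the line.
            if (!urls.isEmpty()) hosts.addLast(host);
        }
        return order;
    }
}
```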
Wed Apr 23 08:41:36 CEST 2014
by orbiter
do not apply lazy value instantiation for numeric or boolean values
because that is misleading and confusing in case of 0- or false-values
and may cause NPEs in retrieval functions.
Changed Files: source/net/yacy/cora/federate/solr/SchemaConfiguration.java
Wed Apr 23 08:37:19 CEST 2014
by orbiter
in case of short memory, do not cut down robinson peers to 1, just
reduce by 50%
Changed Files: source/net/yacy/peers/RemoteSearch.java
Wed Apr 23 00:55:16 CEST 2014
by reger
exclude html tags in in/outboundlinks_anchortext_txt parsed text
- some outboundlinks_anchortext_txt in index contain e.g. <span>text</span> or more tags,
remove all tags for text property (inline img tags are still parsed)
- added test case for above (to htmlParserTest)
- fix solr test case
Changed Files: source/net/yacy/document/parser/html/ContentScraper.java, test/net/yacy/cora/federate/solr/connector/EmbeddedSolrConnectorTest.java, test/net/yacy/document/parser/htmlParserTest.java
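
The effect of the change on anchor text can be illustrated with a simple tag-stripping function. The regular expression used here is a stand-in chosen for brevity and is an assumption: it is not how ContentScraper removes tags, and it would mishandle a '>' inside an attribute value. The class name AnchorTextSketch is hypothetical.

```java
// Hypothetical helper that reduces anchor markup to plain text.
final class AnchorTextSketch {

    static String stripTags(String anchorHtml) {
        return anchorHtml
                .replaceAll("<[^>]*>", " ") // replace every tag with a space
                .replaceAll("\\s+", " ")    // collapse runs of whitespace
                .trim();
    }
}
```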
Tue Apr 22 23:14:54 CEST 2014
by orbiter
added new button to terminate all crawls
Changed Files: htroot/Crawler_p.html, htroot/Crawler_p.java
Tue Apr 22 23:14:05 CEST 2014
by orbiter
catch IllegalArgumentException for wrong process types (that is needed
for migrations when new process types are introduced or disappear)
Changed Files: source/net/yacy/search/schema/CollectionConfiguration.java
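
The situation arises because Enum.valueOf throws an IllegalArgumentException for a name that the enum no longer (or does not yet) contain. A defensive lookup might look like the sketch below; the enum constants shown and the class name ProcessTypeSketch are hypothetical and do not claim to match YaCy's ProcessType.

```java
// Hypothetical tolerant lookup of a stored process type name.
final class ProcessTypeSketch {

    // Illustrative constants only.
    enum ProcessType { CITATION, UNIQUE }

    // Returns the matching constant, or null if the stored name is
    // unknown to this version (e.g. written by an older or newer release).
    static ProcessType parseOrNull(String stored) {
        if (stored == null) return null;
        try {
            return ProcessType.valueOf(stored);
        } catch (IllegalArgumentException e) {
            return null; // unknown process type: skip instead of failing
        }
    }
}
```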
Tue Apr 22 19:48:49 CEST 2014
by orbiter
fix for NPE in IndexCreateParserErrors_p.html caused by bad handling of
lazy value instantiation of 0-value in crawldepth_i
Changed Files: htroot/HostBrowser.java, source/net/yacy/search/schema/CollectionConfiguration.java
Tue Apr 22 19:35:15 CEST 2014
by orbiter
removed warnings
Changed Files: source/net/yacy/kelondro/data/meta/URIMetadataNode.java
Mon Apr 21 17:28:21 CEST 2014
by reger
add custom Jetty errorhandler 
to provide custom error page footer line
- remove redundant mime check in UrlProxyServlet
Changed Files: source/net/yacy/http/Jetty8HttpServerImpl.java, source/net/yacy/http/YaCyErrorHandler.java, source/net/yacy/http/servlets/UrlProxyServlet.java
Mon Apr 21 17:16:06 CEST 2014
by reger
defer creation of new ArrayList after possible early return
(to skip not used object allocation)
Changed Files: source/net/yacy/peers/Protocol.java
Fri Apr 18 22:03:16 CEST 2014
by reger
- remove empty http0_9 status text array
  and unused default_charset = ISO-8859-1
Changed Files: source/net/yacy/cora/protocol/HeaderFramework.java, source/net/yacy/cora/protocol/ResponseHeader.java, source/net/yacy/server/http/HTTPDemon.java
Fri Apr 18 19:57:35 CEST 2014
by reger
- remove unused manual http KeepAlive config
    (reducing references to obsolete httpdemon)
- add port info to settings_http
Changed Files: defaults/yacy.init, htroot/SettingsAck_p.html, htroot/SettingsAck_p.java, htroot/Settings_Http.inc, htroot/Settings_p.java, source/net/yacy/server/http/HTTPDemon.java
Fri Apr 18 06:51:46 CEST 2014
by Michael Peter Christen
add canonical links to the same crawldepth, not the next crawldepth
Changed Files: source/net/yacy/document/Document.java, source/net/yacy/search/Switchboard.java
Fri Apr 18 06:51:10 CEST 2014
by Michael Peter Christen
increased runtime for postprocessing query job
Changed Files: source/net/yacy/search/schema/CollectionConfiguration.java
Fri Apr 18 06:50:07 CEST 2014
by Michael Peter Christen
special strategy for balancer: do not remove targets with zero wait time
from the queue
Changed Files: source/net/yacy/crawler/HostBalancer.java, source/net/yacy/crawler/HostQueue.java, source/net/yacy/crawler/LegacyBalancer.java
Thu Apr 17 16:19:38 CEST 2014
by Michael Peter Christen
increased resource.disk.used.max.steadystate and
resource.disk.used.max.overshot by 4 times because first users reached
that limit and wondered why the crawler was paused automatically :)

The crawler will now stop at 2TB disk usage :)
Changed Files: defaults/yacy.init
Thu Apr 17 12:54:18 CEST 2014
by Michael Peter Christen
- better subgraph handling, less overhead for crawls without the
webgraph
- usage of crawler crawldepth cache for the linkgraph target depth
computation
Changed Files: source/net/yacy/search/schema/CollectionConfiguration.java, source/net/yacy/search/schema/WebgraphConfiguration.java
Thu Apr 17 12:52:54 CEST 2014
by Michael Peter Christen
new Strategies in Balancer:
- doublecheck cache now records the crawl depth as well
- doublecheck cache is available from the outside (made static)
- no more need to crawl hosts with lowest depth first, instead all hosts
which have only singleton entries are preferred to reduce the number of
files.
Changed Files: source/net/yacy/crawler/HostBalancer.java, source/net/yacy/crawler/HostQueue.java, source/net/yacy/crawler/robots/RobotsTxt.java
Thu Apr 17 12:44:05 CEST 2014
by Michael Peter Christen
fix for Table in case that requested file does not exist and paths also
do not exist
Changed Files: source/net/yacy/kelondro/table/Table.java
Thu Apr 17 03:20:29 CEST 2014
by reger
implement gzip input handling directly in defaultservlet
(making reference to legacy httpdemon obsolete)
Changed Files: source/net/yacy/http/servlets/YaCyDefaultServlet.java, source/net/yacy/server/http/HTTPDemon.java
Mon Apr 14 13:32:35 CEST 2014
by Michael Peter Christen
refactoring of the crawl balancer: the balancer is turned into an
interface and the old balancer class is moved into LegacyBalancer to
make room for a fresh implementation of a crawl balancer.
Changed Files: source/net/yacy/crawler/Balancer.java, source/net/yacy/crawler/LegacyBalancer.java, source/net/yacy/crawler/data/NoticedURL.java, source/net/yacy/search/Switchboard.java
Sun Apr 13 07:32:32 CEST 2014
by reger
autoupdate fails to download latest release (1.71) due to default release blacklist
- removed the default version blacklist regex from init (for future versions)

!!! left existing update blacklist setting untouched !!!
(existing installations wanting autoupdate for 1.71 need to change the blacklist in ConfigUpdate_p.html)

- moved old blacklist patch to migration.java
Changed Files: defaults/yacy.init, source/net/yacy/migration.java, source/net/yacy/peers/operation/yacyRelease.java
Fri Apr 11 12:27:21 CEST 2014
by Michael Peter Christen
find depth-matches also for edge targets
Changed Files: source/net/yacy/search/schema/HyperlinkEdges.java
Fri Apr 11 12:09:33 CEST 2014
by Michael Peter Christen
introduction of a data structure for HyperlinkEdges which should use
less memory as it does no double-storage of source links for each edge
of the graph.
Changed Files: htroot/api/linkstructure.java, source/net/yacy/search/schema/HyperlinkEdge.java, source/net/yacy/search/schema/HyperlinkEdges.java, source/net/yacy/search/schema/HyperlinkGraph.java
Fri Apr 11 10:58:37 CEST 2014
by Michael Peter Christen
using MultiProtocolURL for edge data which is faster (hash computation
is now much easier) and smaller in size
Changed Files: source/net/yacy/search/schema/HyperlinkEdge.java, source/net/yacy/search/schema/HyperlinkGraph.java
Fri Apr 11 10:23:48 CEST 2014
by Michael Peter Christen
enhanced hashcode computation for MultiProtocolURL
Changed Files: source/net/yacy/cora/document/id/MultiProtocolURL.java
Fri Apr 11 09:25:18 CEST 2014
by Michael Peter Christen
refactoring of SystemLoad calls (only one backend tool)
Changed Files: source/net/yacy/kelondro/util/MemoryControl.java, source/net/yacy/kelondro/workflow/AbstractBusyThread.java
Thu Apr 10 23:46:35 CEST 2014
by Michael Peter Christen
refactoring
Changed Files: htroot/api/linkstructure.java, source/net/yacy/search/schema/HyperlinkEdge.java, source/net/yacy/search/schema/HyperlinkGraph.java, source/net/yacy/search/schema/HyperlinkType.java
Wed Apr 09 21:59:54 CEST 2014
by Michael Peter Christen
also delete the robots.txt file from the cache when a new crawl is
started
Changed Files: htroot/Crawler_p.java, source/net/yacy/crawler/robots/RobotsTxt.java
Wed Apr 09 18:33:48 CEST 2014
by Michael Peter Christen
fix for robots.txt handling: delete old entry before starting a new
crawl.
Changed Files: htroot/Crawler_p.java, source/net/yacy/crawler/robots/RobotsTxt.java, source/net/yacy/search/schema/CollectionConfiguration.java
Wed Apr 09 17:52:51 CEST 2014
by orbiter
linkstructure refactoring to get more options for clickdepth analysis
Changed Files: htroot/api/linkstructure.java, source/net/yacy/search/index/Segment.java, source/net/yacy/search/schema/HyperlinkGraph.java
Sun Apr 06 22:31:22 CEST 2014
by reger
fix: typo in default charset in metadata2solr
update pom and NB build to Solr 4.7.1 libs
Changed Files: nbproject/project.xml, pom.xml, source/net/yacy/search/schema/CollectionConfiguration.java
Sun Apr 06 11:04:23 CEST 2014
by Michael Peter Christen
do solr optimization independently from memory and load constraints:
- not doing an optimization will likely cause a too many files exception
- without optimization performance will be even worse which would
prevent optimization in the future as well (prevent a deadlock
situation)
Changed Files: source/net/yacy/search/Switchboard.java
Sun Apr 06 03:59:11 CEST 2014
by reger
update commons-compress.jar to 1.8
Changed Files: .classpath, addon/YaCy.app/Contents/Info.plist, build.xml, lib/commons-compress-1.8.License, lib/commons-compress-1.8.jar, nbproject/project.xml, pom.xml
Sun Apr 06 01:20:03 CEST 2014
by Michael Peter Christen
different algorithm to test checkalive as it depends less on the
existence of wget (or curl) on the OS.
Changed Files: bin/checkalive.sh
Sun Apr 06 01:00:09 CEST 2014
by Michael Peter Christen
Emergency bugfix for killYACY.sh: the file yacy00.log does not exist
when a "too many open files" error occurs. In such a case only the file
yacy00.log.lck exists.
In the long term a different solution should be found.
Changed Files: killYACY.sh
Sun Apr 06 00:35:35 CEST 2014
by Michael Peter Christen
test using compound file format, see UseCompoundFile in
https://cwiki.apache.org/confluence/display/solr/IndexConfig+in+SolrConfig
This appears to be necessary as many times a
java.io.FileNotFoundException: (Too many open files) appears.
See also: https://issues.apache.org/jira/browse/SOLR-4 and desperate
users at
http://stackoverflow.com/questions/3828343/too-many-open-file-exception-while-indexin-using-solr
We cannot force users to do a "ulimit -n 1000000", so this action seems
to be required.
Changed Files: defaults/solr/solrconfig.xml
Sun Apr 06 00:32:10 CEST 2014
by Michael Peter Christen
next development version 1.71
It's nowhere explained or declared, but for some time we have followed the
scheme that odd version numbers are used for development versions and
even numbers for release versions. That concept may change sometime but
this is used at this time to distinguish development from main.
Changed Files: build.properties
Sun Apr 06 00:20:12 CEST 2014
by reger
upd version in pom
Changed Files: pom.xml