# Why would we expect libwget to catch on?
What advantages does libwget have compared with libcurl?

Wget and Curl are both old and powerful tools for retrieving web pages.
However, one major difference has always been that curl prefers to provide
shallow support for a wide number of protocols while Wget follows the UNIX
philosophy of, "do one thing and do it well". Wget supports only HTTP(S) and
FTP(S), but allows more complicated actions such as recursive downloading and
filtered downloads.

In a similar fashion, libwget will differ from libcurl. Libcurl is a very
'procotol-centric' library, in that it provides a thin abstraction over lots of
internet protocols, such as HTTP(S), FTP(S), FTP, POP, SMTP, LDAP, etc. But it
leaves most of the application functionality to the client itself. Libwget on
the other hand, intends to offer only HTTP(S) support but with a higher level
of 'understanding' and lots of API tools to easily write programs on top of
HTTP that parse and understand HTML as well.

Libwget now offers API access to all levels of functionality, from basic
algorithms like hashmaps and vectors up to high level download functions. Here
is an incomplete list of API functionality groups:

Doubly linked list
Hashmap / Stringmap
Vector
Base64
String/Memory Buffer
Logger
Portable Threads (wrapper around gnulib’s gthreads, but not limited to)
Decompressors (gzip, bzip2, lzma/xz, brotli, zstd)
URI parsing
Cookie parsing + storage handling
HSTS parsing + storage handling
HPKP parsing + storage handling
TLS session / storage handling
OCSP parsing + storage handling
.netrc parsing
CSS scanning
HTML scanning
XML scanning (e.g. for RSS / Atom feed or streaming formats)
DNS caching
TCP networking
TLS protocol
HTTP parser
GPG Integration for verifying download integrity
Hashing (SHA*, MD*, etc)
Metalink parser
Console Progress bar
Plugin functionality
Statistics data retrieval

Libcurl in comparison offers (from this list) ‘TCP Networking’, ‘TLS protocol’
and in a way ‘HTTP parser’. Of course, it offers much more in terms of support
for various TLS backends and many other protocols. But libwget will make it
much easier for an application developer to write something that uses HTTP only
and deals with HTML / CSS pages. Or it should be much easier to write custom
web crawlers using libwget than it is using libcurl.

Even within the GNU Project, we already maintain a fork of libcurl which is a
stripped down version of it. Having spoken to the maintainers of libgnurl, I
know that they'd be happy to ditch it for a LGPL licensed offering from within
the GNU Project.

> How many lines is the new version of Wget itself?
> How many is libwget?

The library (without tests, fuzzing, gnulib, etc)
$ cloc include libwget
Language                     files          blank        comment           code
-------------------------------------------------------------------------------
C                               54           3682           7372          16208
make                             6            268            240           5499
C/C++ Header                     7            283            833           1985
-------------------------------------------------------------------------------
SUM:                            67           4233           8445          23692

Wget2 (without tests, fuzzing, gnulib, etc)
$ cloc src
Language                      files          blank        comment           code
--------------------------------------------------------------------------------
C                                18           1761           1210           8374
make                              2             82             31           1851
C/C++ Header                     15            134            464            615
Bourne Again Shell                1             27             52            131
--------------------------------------------------------------------------------
SUM:                             36           2004           1757          10971


> What code would be in wget that wouldn’t be in libwget?  What jobs
> does that code do?  What fraction of wget will be libwget and what
> fraction will be the GPL'd rest?

The new version of Wget, (which we are currently calling wget2), has about 1/3
of the total code while libwget contains 2/3. libwget contains all the code
that talks DNS, TCP and HTTP and also the HTML, XML, CSS parsers that are used
for recursion. On the other hand, the Wget2 binary mostly contains code that
handles:

- Command Line Parsing
- Thread and Job management
- Handling recursion
- Basic program logic and state machine
- User interaction

All the code in libwget will be under LGPLv3 while the code in Wget2 is licensed
under GPLv3.


# vim: set ts=4 sts=4 sw=4 tw=79 ft=markdown noet :
