Who this is for

Netflix has a private enterprise GitHub (and goproxy), which means that pkg.go.dev, the public pkgsite instance, does not crawl it. I set out to run pkgsite myself against this private enterprise GitHub.

This article is for anyone else setting out to do the same.

Understanding pkgsite

github.com/golang/pkgsite/blob/master/doc/design.md describes pkgsite’s components. sum.golang.org describes the proxy and index that enable pkgsite.

Code layout

pkgsite is canonically hosted at go.googlesource.com/pkgsite and can be cloned with git clone https://go.googlesource.com/pkgsite. The important parts for this article are:

$ tree
.
|____cmd
| |____frontend
| | |____main.go
| |____pkgsite
| | |____main.go
| |____worker
| | |____main.go
|____migrations
| |____000073_create_documentation_symbols.down.sql
| |____000018_add_module_version_states_status_index.up.sql
| |____...etc...
|____devtools
| |____cmd
| | |____seeddb
| | | |____seed.txt
| | | |____main.go

Let’s quickly describe each:

  • cmd/pkgsite: The simplest and best documented. Running this lets you host a minimal pkgsite. It’s not connected to a database and has no worker searching for modules, so it doesn’t have a usable search feature. It’s useful for looking at one or two of your own projects, but without search it’s not good enough for an entire org.
  • cmd/frontend: The web-serving frontend. When run, it serves the site at localhost:8080.
  • cmd/worker: The data-fetching backend that updates the database. When run, it serves a work management page at localhost:8000.
  • migrations: A series of SQL scripts that, when run in order, set up all the necessary tables and schemas.
  • devtools/cmd/seeddb: A one-shot program that reads a seed.txt file, fetches the modules listed therein, and populates the database with their information.

Modifiable config

The frontend and worker both use internal/config/serverconfig/config.go, which is configurable through environment variables. Here are a few you may care about:

// Proxy config.
ProxyURL:   GetEnv("GO_MODULE_PROXY_URL", "https://proxy.golang.org"),

// Database config.
DBHost:               chooseOne(GetEnv("GO_DISCOVERY_DATABASE_HOST", "localhost")),
DBUser:               GetEnv("GO_DISCOVERY_DATABASE_USER", "postgres"),
DBPassword:           os.Getenv("GO_DISCOVERY_DATABASE_PASSWORD"),
DBSecondaryHost:      chooseOne(os.Getenv("GO_DISCOVERY_DATABASE_SECONDARY_HOST")),
DBPort:               GetEnv("GO_DISCOVERY_DATABASE_PORT", "5432"),
DBName:               GetEnv("GO_DISCOVERY_DATABASE_NAME", "discovery-db"),

Running pkgsite

A simple pkgsite

You can run a simple pkgsite with go run cmd/pkgsite/main.go -proxy=URL, but it only hosts documentation for a single repo. So, we’ll move on from this.

Standing up a real pkgsite

We’ll need five things running:

  • A Go proxy server.
  • A Go index server.
  • A postgres database.
  • The pkgsite frontend.
  • The pkgsite worker.

Proxy

We’ll assume you already have a proxy server. But if not, check out github.com/goproxy/goproxy as one option.
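
If you want a quick sanity check that your proxy speaks the module proxy protocol, one option is to ask it for a module's version list. The sketch below only builds the URL to request. The proxy URL and module path are placeholders, and escapePath and listURL are helper names made up for this example; the escaping rule (an uppercase letter becomes "!" plus its lowercase form) follows the Go modules reference.

```go
package main

import (
	"fmt"
	"strings"
)

// escapePath applies the proxy protocol's case encoding to a module path:
// each uppercase ASCII letter becomes '!' followed by its lowercase form.
func escapePath(path string) string {
	var b strings.Builder
	for _, r := range path {
		if r >= 'A' && r <= 'Z' {
			b.WriteByte('!')
			b.WriteRune(r + 'a' - 'A')
		} else {
			b.WriteRune(r)
		}
	}
	return b.String()
}

// listURL returns the URL at which a proxy serves the list of known
// versions for modulePath.
func listURL(proxy, modulePath string) string {
	return strings.TrimSuffix(proxy, "/") + "/" + escapePath(modulePath) + "/@v/list"
}

func main() {
	// Placeholder proxy URL and module path; substitute your own.
	fmt.Println(listURL("https://url-to-my-proxy", "github.com/Netflix/example"))
}
```

Fetching the printed URL (with curl, say) should return one version per line if the proxy knows the module.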

Index

A simple index server is given at github.com/jeanbza/golang-index. You may wish to build your own, though. Its role is simple, and described at github.com/golang/pkgsite/blob/master/doc/design.md.

Postgres

There are numerous ways to run a postgres database. Here’s something quick, written for a Mac, to get you started:

brew install postgresql@17
# Note: Don't forget the part where homebrew asks you to add to your PATH.
# Note: homebrew may make $USERNAME the default username, not "postgres".
brew services start postgresql@17
createdb discovery-db

# Now, let's set up the necessary tables by running the "up" migrations in order.
# The directory also holds *.down.sql files, which undo migrations, so skip those.
for f in pkgsite/migrations/*.up.sql; do psql -d discovery-db -a -f "$f"; done

Check out your tables with:

psql -d discovery-db
\dt

Frontend

Now, let’s run the frontend:

cd pkgsite
# You may need to set GO_DISCOVERY_DATABASE_USER=myusername.
go run cmd/frontend/main.go \
    -local \
    -proxy_url=https://url-to-my-proxy \
    -bypass_license_check=true

Worker

Finally, let’s run the worker:

# You may need to set GO_DISCOVERY_DATABASE_USER=myusername.
GO_MODULE_PROXY_URL=https://url-to-my-proxy \
    GO_MODULE_INDEX_URL=https://url-to-my-index \
    go run cmd/worker/main.go \
    -bypass_license_check=true

Note: If you’re using github.com/jeanbza/golang-index, the index runs (as of this writing) at http://localhost:8081, which is plain HTTP. Running the worker with a non-HTTPS GO_MODULE_INDEX_URL results in an error. To get around this, make the worker accept a non-https index by commenting out the https validation in the New() function in pkgsite/internal/index/index.go.

Poke the worker

The worker will periodically do work on its own, but you can poke it either by visiting http://localhost:8000 and clicking the poll/enqueue buttons, or with something like:

# Assuming the worker is running at localhost:8000.
while true ; do date && curl localhost:8000/enqueue && curl localhost:8000/poll && sleep 20; done;

See the results

Check the “modules” page at http://localhost:8000 to see how the worker is getting along.

As it progresses, it populates the postgres database. You should start seeing modules through the frontend. To do so, navigate to http://localhost:8080, where the frontend is hosted.

Useful debugging

I found the following db commands useful:

psql -d discovery-db

-- List tables.
\dt

-- Show upcoming processing.
SELECT * FROM module_version_states;
SELECT module_path, last_processed_at, next_processed_after FROM module_version_states;

-- Move next_processed_after two hours earlier so the worker picks modules up sooner.
UPDATE module_version_states SET next_processed_after = next_processed_after - INTERVAL '2 HOUR';

-- Which modules we know about, and how many module versions in total.
SELECT DISTINCT(module_path) FROM modules;
SELECT COUNT(version) FROM modules;

-- Drop it all and start over.
TRUNCATE documentation, documentation_symbols, excluded_prefixes, imports, imports_unique, latest_module_versions, legacy_documentation_symbols, licenses, module_version_states, modules, package_symbols, package_version_states, paths, readmes, search_documents, symbol_history, symbol_names, symbol_search_documents, units, version_map;