Merge pull request #17 from logmanager-oss/bug_fixes
bug fixes
tender-barbarian authored Jan 23, 2025
2 parents 6c618b4 + 1fa3599 commit 5a12fc6
Showing 21 changed files with 206 additions and 129 deletions.
3 changes: 2 additions & 1 deletion .gitignore
@@ -1,2 +1,3 @@

.DS_Store
dist/
.vscode
38 changes: 20 additions & 18 deletions README.md
@@ -23,24 +23,22 @@ There are two components needed to make this work:

```
Usage of ./logveil:
-c value
Path to input file containing custom anonymization mappings
-d value
Path to directory with anonymizing data
-e Change input file type to LM export (default: LM Backup)
-i value
Path to input file containing logs to be anonymized (mandatory - if you don't specify input, code will fail)
Path to input file containing logs to be anonymized
-o value
Path to output file (default: Stdout)
-c value
Path to input file with custom anonymization mapping
-v
Enable verbose logging
-e
Change input file type to LM export (default: LM Backup)
-p
Disable proof writer (default: Enabled)
-r
Disable persistent (per session) replacement map (default: Enabled)
-h
Help for logveil
-p Enable proof writer (default: Disabled)
-r Enable persistent (per session) replacement map (default: Disabled)
-rs int
Size of the reader buffer in Bytes (default 4000000)
-v Enable verbose logging (default: Disabled)
-ws int
Size of the writer buffer in Bytes (default 2000000)
```

**Examples:**
@@ -53,15 +51,15 @@ Usage of ./logveil:

`./logveil -d example_anon_data/ -i lm_backup.gz -o output.txt`

3. Read log data from LM Backup file (GZIP), output anonymization result to `output.txt` file and disable writing anonymization proof.
3. Read log data from LM Backup file (GZIP), output anonymization result to `output.txt` file and enable writing anonymization proof.

`./logveil -d example_anon_data/ -i lm_backup.gz -o output.txt -p`

4. Read log data from LM Export file (CSV), output anonymization result to standard output (STDOUT) and disable writing anonymization proof.
4. Read log data from LM Export file (CSV), output anonymization result to standard output (STDOUT) and enable writing anonymization proof.

`./logveil -d example_anon_data/ -e -i lm_export.csv -p`

5. Read log data from LM Export file (CSV), output anonymization result to standard output (STDOUT), disable writing anonymization proof and enable verbose logging.
5. Read log data from LM Export file (CSV), output anonymization result to standard output (STDOUT), enable writing anonymization proof and verbose logging.

`./logveil -d example_anon_data/ -e -i lm_export.csv -p -v`

@@ -105,6 +103,8 @@ If you want to anonymize values in `organization` and `username` keys, you need

Both files should contain appropriate fake data for the values they will be masking.

**Make sure your filenames DO NOT contain `msg.`**

### Regexp scanning and dynamic fake data generation

LogVeil uses regular expressions to look for common patterns: IP addresses (v4, v6), emails, MAC addresses and URLs. Once such a pattern is found, it is replaced with fake data generated on the fly.
@@ -162,9 +162,11 @@ And anonymization proof:
{"original": "71:e5:41:18:cb:3e", "new": "0f:da:68:92:7f:2b"},
```

## Replacement map and possible memory issues
## Replacement map, possible memory issues and performance

You can use the `-r` flag to enable a persistent replacement map. In that case LogVeil keeps the replacement map in memory for the whole code run (per session) to make sure each unique value gets the same anonymized value each time it is encountered. Depending on the size of the input data, this replacement map can grow quite large, which can degrade performance.

LogVeil keeps a replacement map in memory for each code run (per session) to make sure each unique value gets the same anonymized value each time it is encountered. Depending on the size of input data this replacement map can grow quite large, potentially even exhausting available memory (though unlikely). If you'll encounter a memory issue use `-r` flag to disable persistent replacement map.
In case of performance issues, you can try using the `-rs` and `-ws` flags to change the reader/writer buffer capacity: fewer I/O operations due to larger buffers should improve performance.

## Release

32 changes: 26 additions & 6 deletions cmd/logveil/logveil.go
@@ -2,14 +2,17 @@ package logveil

import (
"bufio"
"context"
"errors"
"fmt"
"io"
"log/slog"
"os"
"os/signal"

"github.com/logmanager-oss/logveil/internal/anonymizer"
"github.com/logmanager-oss/logveil/internal/config"
"github.com/logmanager-oss/logveil/internal/files"
"github.com/logmanager-oss/logveil/internal/handlers"
"github.com/logmanager-oss/logveil/internal/proof"
"github.com/logmanager-oss/logveil/internal/reader"
"github.com/logmanager-oss/logveil/internal/writer"
@@ -25,23 +28,30 @@ func Start() {
slog.SetLogLoggerLevel(slog.LevelDebug)
}

filesHandler := &files.FilesHandler{}
filesHandler := &handlers.Files{}
defer filesHandler.Close()

buffersHandler := &handlers.Buffers{}
defer buffersHandler.Flush()

inputReader, err := reader.CreateInputReader(config, filesHandler)
if err != nil {
slog.Error("initializing input reader", "error", err)
return
}
outputWriter, err := writer.CreateOutputWriter(config, filesHandler)
outputWriter, err := writer.CreateOutputWriter(config, filesHandler, buffersHandler)
if err != nil {
slog.Error("initializing output writer", "error", err)
return
}
proofWriter, err := proof.CreateProofWriter(config, filesHandler)
proofWriter, err := proof.CreateProofWriter(config, filesHandler, buffersHandler)
if err != nil {
slog.Error("initializing proof writer", "error", err)
return
}
anonymizerDoer, err := anonymizer.CreateAnonymizer(config, proofWriter)
if err != nil {
slog.Error("initializing anonymizer", "error", err)
return
}

@@ -55,7 +65,8 @@ func Start() {
}

func RunAnonymizationLoop(inputReader reader.InputReader, outputWriter *bufio.Writer, anonymizerDoer *anonymizer.Anonymizer) error {
defer outputWriter.Flush()
ctx, stop := signal.NotifyContext(context.Background(), os.Interrupt)
defer stop()

for {
logLine, err := inputReader.ReadLine()
@@ -70,7 +81,16 @@ func RunAnonymizationLoop(inputReader reader.InputReader, outputWriter *bufio.Wr

_, err = fmt.Fprintln(outputWriter, anonymizedLogLine)
if err != nil {
return fmt.Errorf("writing log line to buffer: %v", err)
return fmt.Errorf("writing log line %s: %v", anonymizedLogLine, err)
}

select {
case <-ctx.Done():
fmt.Println("\nInterrupt received, closing...")
stop()
return nil
default:
continue
}
}
}
4 changes: 3 additions & 1 deletion cmd/main.go
@@ -1,6 +1,8 @@
package main

import "github.com/logmanager-oss/logveil/cmd/logveil"
import (
"github.com/logmanager-oss/logveil/cmd/logveil"
)

func main() {
logveil.Start()
32 changes: 14 additions & 18 deletions internal/anonymizer/anonymizer.go
@@ -5,6 +5,7 @@ import (
"log/slog"
"maps"
"regexp"
"strings"

"math/rand/v2"

@@ -49,14 +50,14 @@ func CreateAnonymizer(config *config.Config, proofWriter *proof.ProofWriter) (*A
}

func (an *Anonymizer) Anonymize(logLine map[string]string) string {
replacementMap := an.loadAndReplace(logLine, an.replacementMap)
replacementMap := an.loadAnonymizationData(logLine, an.replacementMap)

logLineRaw := logLine["raw"]
replacementMap = an.generateAndReplace(logLineRaw, replacementMap, an.lookup.ValidIpv4, an.generator.GenerateRandomIPv4())
replacementMap = an.generateAndReplace(logLineRaw, replacementMap, an.lookup.ValidIpv6, an.generator.GenerateRandomIPv6())
replacementMap = an.generateAndReplace(logLineRaw, replacementMap, an.lookup.ValidMac, an.generator.GenerateRandomMac())
replacementMap = an.generateAndReplace(logLineRaw, replacementMap, an.lookup.ValidEmail, an.generator.GenerateRandomEmail())
replacementMap = an.generateAndReplace(logLineRaw, replacementMap, an.lookup.ValidUrl, an.generator.GenerateRandomUrl())
replacementMap = an.generateAnonymizationData(logLineRaw, replacementMap, an.lookup.ValidIpv4, an.generator.GenerateRandomIPv4())
replacementMap = an.generateAnonymizationData(logLineRaw, replacementMap, an.lookup.ValidIpv6, an.generator.GenerateRandomIPv6())
replacementMap = an.generateAnonymizationData(logLineRaw, replacementMap, an.lookup.ValidMac, an.generator.GenerateRandomMac())
replacementMap = an.generateAnonymizationData(logLineRaw, replacementMap, an.lookup.ValidEmail, an.generator.GenerateRandomEmail())
replacementMap = an.generateAnonymizationData(logLineRaw, replacementMap, an.lookup.ValidUrl, an.generator.GenerateRandomUrl())

if an.isPersistReplacementMap {
maps.Copy(an.replacementMap, replacementMap)
@@ -65,13 +66,13 @@ func (an *Anonymizer) Anonymize(logLine map[string]string) string {
return an.replace(logLineRaw, replacementMap)
}

func (an *Anonymizer) loadAndReplace(logLine map[string]string, replacementMap map[string]string) map[string]string {
func (an *Anonymizer) loadAnonymizationData(logLine map[string]string, replacementMap map[string]string) map[string]string {
for field, value := range logLine {
if field == "raw" {
continue
}

if value == "" {
if value == "" || value == "-" {
continue
}

@@ -90,7 +91,7 @@ func (an *Anonymizer) loadAndReplace(logLine map[string]string, replacementMap m
return replacementMap
}

func (an *Anonymizer) generateAndReplace(rawLog string, replacementMap map[string]string, regexp *regexp.Regexp, generatedData string) map[string]string {
func (an *Anonymizer) generateAnonymizationData(rawLog string, replacementMap map[string]string, regexp *regexp.Regexp, generatedData string) map[string]string {
values := regexp.FindAllString(rawLog, -1)

for _, value := range values {
@@ -99,23 +100,18 @@ func (an *Anonymizer) generateAndReplace(rawLog string, replacementMap map[strin
}

replacementMap[value] = generatedData
slog.Debug(fmt.Sprintf("Value matched via regexp. Reaplacing from %s to %s.\n", value, generatedData))
}

return replacementMap
}

func (an *Anonymizer) replace(rawLog string, replacementMap map[string]string) string {
for originalValue, newValue := range replacementMap {
// Added word boundary to avoid matching words withing word. For example "test" in "testing".
r := regexp.MustCompile(fmt.Sprintf(`\b%s\b`, originalValue))
occurrencesCount := strings.Count(rawLog, originalValue)

var found bool
rawLog = r.ReplaceAllStringFunc(rawLog, func(originalValue string) string {
found = true
return newValue
})

if found {
if occurrencesCount > 0 {
rawLog = strings.Replace(rawLog, originalValue, newValue, occurrencesCount)
an.proofWriter.Write(originalValue, newValue)
}
}
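
This hunk swaps a word-boundary regular expression for plain substring replacement, which changes what gets matched. The sketch below contrasts the two approaches. Both helper functions are hypothetical, and the use of `regexp.QuoteMeta` is an addition made here for safety, not something taken from the commit.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// replaceSubstring mirrors the substring approach: every occurrence of
// original is replaced, including occurrences inside longer words.
// It also returns how many occurrences were found.
func replaceSubstring(raw, original, replacement string) (string, int) {
	count := strings.Count(raw, original)
	if count == 0 {
		return raw, 0
	}
	return strings.Replace(raw, original, replacement, count), count
}

// replaceWholeWord mirrors the word-boundary approach: original is replaced
// only where it is not embedded in a longer word. QuoteMeta (an assumption
// of this sketch) keeps characters such as '.' from acting as regexp syntax.
func replaceWholeWord(raw, original, replacement string) string {
	re := regexp.MustCompile(`\b` + regexp.QuoteMeta(original) + `\b`)
	return re.ReplaceAllLiteralString(raw, replacement)
}

func main() {
	raw := "user test ran testing"

	replaced, count := replaceSubstring(raw, "test", "XXXX")
	fmt.Println(replaced, count) // prints: user XXXX ran XXXXing 2

	fmt.Println(replaceWholeWord(raw, "test", "XXXX")) // prints: user XXXX ran testing
}
```

Substring replacement avoids compiling a regular expression per value on every line, at the cost of also rewriting values that appear inside longer tokens.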
17 changes: 11 additions & 6 deletions internal/config/config.go
@@ -16,22 +16,27 @@ type Config struct {
IsLmExport bool
IsProofWriter bool
IsPersistReplacementMap bool
ReaderMaxCapacity int
WriterMaxCapacity int
}

// LoadAndValidate loads values from user supplied input into Config struct and validates them
func (c *Config) LoadAndValidate() {
flag.Func("d", "Path to directory with anonymizing data", validateDir(c.AnonymizationDataPath))
flag.Func("d", "Path to directory with anonymizing data", c.validateDirPath())

flag.Func("i", "Path to input file containing logs to be anonymized", validateInput(c.InputPath))
flag.Func("i", "Path to input file containing logs to be anonymized", c.validateInputPath())

flag.Func("c", "Path to input file containing custom anonymization mappings", validateInput(c.CustomReplacementMapPath))
flag.Func("c", "Path to input file containing custom anonymization mappings", c.validateCustomMappingPath())

flag.Func("o", "Path to output file (default: Stdout)", validateOutput(c.OutputPath))
flag.Func("o", "Path to output file (default: Stdout)", c.validateOutput())

flag.BoolVar(&c.IsVerbose, "v", false, "Enable verbose logging (default: Disabled)")
flag.BoolVar(&c.IsLmExport, "e", false, "Change input file type to LM export (default: LM Backup)")
flag.BoolVar(&c.IsProofWriter, "p", true, "Disable proof writer (default: Enabled)")
flag.BoolVar(&c.IsPersistReplacementMap, "r", true, "Disable persistent (per session) replacement map (default: Enabled)")
flag.BoolVar(&c.IsProofWriter, "p", false, "Enable proof writer (default: Disabled)")
flag.BoolVar(&c.IsPersistReplacementMap, "r", false, "Enable persistent (per session) replacement map (default: Disabled)")

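// Bind with IntVar so the fields are filled in by flag.Parse();
// dereferencing the pointer from flag.Int here would only copy the defaults.
flag.IntVar(&c.ReaderMaxCapacity, "rs", 4000000, "Size of the reader buffer in Bytes")
flag.IntVar(&c.WriterMaxCapacity, "ws", 2000000, "Size of the writer buffer in Bytes")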

flag.Parse()
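
A general point about Go's `flag` package is relevant to integer flags such as `rs` and `ws`: the pointer returned by `flag.Int` only holds the user's value after parsing, so dereferencing it at registration time copies the default. The sketch below shows the difference. `sketchConfig`, `parseSketch` and the flag names `a` and `b` are hypothetical and exist only for this illustration.

```go
package main

import (
	"flag"
	"fmt"
)

// sketchConfig is a hypothetical struct used only in this illustration.
type sketchConfig struct {
	EarlyCopy int // filled by dereferencing before Parse
	Bound     int // filled by IntVar during Parse
}

// parseSketch registers two integer flags in two different ways and parses args.
func parseSketch(args []string) (sketchConfig, error) {
	var c sketchConfig
	fs := flag.NewFlagSet("sketch", flag.ContinueOnError)

	// Dereferenced immediately: this copies the default, and the value
	// supplied on the command line never reaches c.EarlyCopy.
	c.EarlyCopy = *fs.Int("a", 4000000, "copied before parsing")

	// Bound to the field: Parse writes the supplied value into c.Bound.
	fs.IntVar(&c.Bound, "b", 4000000, "bound to the struct field")

	err := fs.Parse(args)
	return c, err
}

func main() {
	c, err := parseSketch([]string{"-a", "10", "-b", "10"})
	fmt.Println(c.EarlyCopy, c.Bound, err) // prints: 4000000 10 <nil>
}
```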

33 changes: 27 additions & 6 deletions internal/config/validate.go
@@ -6,7 +6,7 @@ import (
"os"
)

func validateInput(inputPath string) func(string) error {
func (c *Config) validateInputPath() func(string) error {
return func(flagValue string) error {
fileInfo, err := os.Stat(flagValue)
if err != nil {
@@ -17,33 +17,54 @@ func validateInput(inputPath string) func(string) error {
return fmt.Errorf("Input file %s cannot be a directory.\n", flagValue)
}

inputPath = flagValue
c.InputPath = flagValue

return nil
}
}

func validateOutput(outputPath string) func(string) error {
func (c *Config) validateCustomMappingPath() func(string) error {
return func(flagValue string) error {
fileInfo, err := os.Stat(flagValue)
if err != nil {
return err
}

if fileInfo.IsDir() {
return fmt.Errorf("Path to custom mapping file %s cannot be a directory.\n", flagValue)
}

c.CustomReplacementMapPath = flagValue

return nil
}
}

func (c *Config) validateOutput() func(string) error {
return func(flagValue string) error {
fileInfo, err := os.Stat(flagValue)
if err != nil {
// If output path does not exist it's ok - we will create it
if errors.Is(err, os.ErrNotExist) {
c.OutputPath = flagValue
return nil
}
return err
}

// If output path exists check if it's a directory - which would be wrong
if fileInfo.IsDir() {
return fmt.Errorf("Output file %s cannot be a directory.\n", flagValue)
}

outputPath = flagValue
// If output path exists and is not a dir it's ok - file will be truncated
c.OutputPath = flagValue

return nil
}
}

func validateDir(dir string) func(string) error {
func (c *Config) validateDirPath() func(string) error {
return func(flagValue string) error {
fileInfo, err := os.Stat(flagValue)
if err != nil {
@@ -54,7 +75,7 @@ func validateDir(dir string) func(string) error {
return fmt.Errorf("Path to anonymization data %s needs to be a directory.\n", flagValue)
}

dir = flagValue
c.AnonymizationDataPath = flagValue

return nil
}
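
The move from free functions taking a `string` to methods on `*Config` matters because Go passes strings by value: a closure that assigns to its string parameter never updates the caller's struct field. The following minimal sketch shows the effect. `sketchConfig`, `storeByValue`, `storeInField` and the flag names `x` and `y` are hypothetical names chosen for this illustration.

```go
package main

import (
	"flag"
	"fmt"
)

// sketchConfig is a hypothetical struct used only in this illustration.
type sketchConfig struct {
	InputPath string
}

// storeByValue receives a copy of the field, so the assignment inside the
// closure changes only that copy and the struct never sees the flag value.
func storeByValue(target string) func(string) error {
	return func(flagValue string) error {
		target = flagValue
		return nil
	}
}

// storeInField closes over the pointer receiver, so the assignment reaches
// the struct that the caller holds.
func (c *sketchConfig) storeInField() func(string) error {
	return func(flagValue string) error {
		c.InputPath = flagValue
		return nil
	}
}

// parseSketch registers one flag per approach and returns both results.
func parseSketch(args []string) (byValue, byField string, err error) {
	var lost, kept sketchConfig
	fs := flag.NewFlagSet("sketch", flag.ContinueOnError)
	fs.Func("x", "stored through a by-value parameter", storeByValue(lost.InputPath))
	fs.Func("y", "stored through the pointer receiver", kept.storeInField())
	err = fs.Parse(args)
	return lost.InputPath, kept.InputPath, err
}

func main() {
	byValue, byField, err := parseSketch([]string{"-x", "a.log", "-y", "a.log"})
	fmt.Printf("%q %q %v\n", byValue, byField, err) // prints: "" "a.log" <nil>
}
```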
17 changes: 0 additions & 17 deletions internal/files/handler.go

This file was deleted.
